CLI Reference

IsoGraph exposes a single CLI entry point:

isograph --help

The current subcommands are benchmark, freeze-real, fit, compare, export, explain-module, and annotate-structure.

Overrides

benchmark, freeze-real, and fit accept Hydra-style overrides after --:

isograph benchmark -- backend=latent fixture_filter=medium_v1 stage_name=stage2_docs
isograph fit --dataset-path my_cohort --backend vae -- vae.alpha=0.6 vae.hidden_dim=256
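
When the CLI is driven from a script, it can help to keep regular flags and Hydra-style overrides separate until the command is assembled. The helper below is a minimal sketch and not part of IsoGraph: build_command is a hypothetical name, and it only builds an argument list (flags first, then --, then key=value overrides) for the three subcommands listed above.

```python
import shlex

# Subcommands documented above as accepting Hydra-style overrides after `--`.
OVERRIDE_SUBCOMMANDS = {"benchmark", "freeze-real", "fit"}


def build_command(subcommand, flags=None, overrides=None):
    """Hypothetical helper: return an argv list for `isograph <subcommand>`."""
    argv = ["isograph", subcommand]
    # Regular flags come first, as `--name value` pairs.
    for name, value in (flags or {}).items():
        argv += [f"--{name}", str(value)]
    if overrides:
        if subcommand not in OVERRIDE_SUBCOMMANDS:
            raise ValueError(
                f"overrides are only documented for {sorted(OVERRIDE_SUBCOMMANDS)}"
            )
        # Everything after the bare `--` is passed as key=value overrides.
        argv.append("--")
        argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


cmd = build_command(
    "fit",
    flags={"dataset-path": "my_cohort", "backend": "vae"},
    overrides={"vae.alpha": 0.6, "vae.hidden_dim": 256},
)
print(shlex.join(cmd))
# isograph fit --dataset-path my_cohort --backend vae -- vae.alpha=0.6 vae.hidden_dim=256
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues.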

benchmark

Run the bundled fixture suite, or a filtered subset, through a selected backend.

The default backend is vae. Available backends: baseline, latent, graph, vae, wgcna.

Examples:

# VAE on a single fixture (default backend)
isograph benchmark -- fixture_filter=toy_v1 stage_name=vae_toy

# WGCNA on the scale suite
isograph benchmark --config-name stage6_scale_comparison_wgcna

Behavior:

  • With dataset_suite: core_v1, generates the synthetic core_v1 fixtures as needed and freezes the real fixture, unless fixture_filter targets only synthetic datasets.

  • With dataset_suite: multiplex_v1, generates abundance-aware toy, medium, noisy, and large fixtures with explicit truth_switch, truth_abundance, and truth_channel_role tables.

  • With dataset_suite: scale_v1, generates xlarge_v1 (6,000 genes), xxlarge_v1 (12,000 genes), and xxlarge_stress_v1 (12,000 genes, stressed parameters).

  • Writes per-fixture artifacts under artifacts/benchmarks/<stage_name>/.

  • Writes benchmark and runtime summaries under artifacts/reports/.

  • Writes calibration reports when the selected backend emits calibration metadata.
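
To see what a benchmark run wrote, the artifact tree can be walked directly. The sketch below assumes one subdirectory per fixture under artifacts/benchmarks/<stage_name>/, which is suggested by paths such as artifacts/benchmarks/quickstart_baseline/toy_v1 in the compare examples but is not guaranteed. list_fixture_artifacts is a hypothetical helper, not an IsoGraph API.

```python
from pathlib import Path


def list_fixture_artifacts(stage_name, root="artifacts/benchmarks"):
    """Map each fixture subdirectory of a stage to its files (paths relative to the fixture dir)."""
    stage_dir = Path(root) / stage_name
    if not stage_dir.is_dir():
        return {}
    result = {}
    # Assumption: every direct subdirectory of the stage directory is one fixture.
    for fixture_dir in sorted(p for p in stage_dir.iterdir() if p.is_dir()):
        result[fixture_dir.name] = sorted(
            f.relative_to(fixture_dir).as_posix()
            for f in fixture_dir.rglob("*")
            if f.is_file()
        )
    return result


# Prints nothing if the stage directory does not exist yet.
for fixture, files in list_fixture_artifacts("vae_toy").items():
    print(fixture, files)
```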

freeze-real

Freeze the bundled real-data fixture from local source tables.

Example:

isograph freeze-real --suite-name core_v1

The command reads its settings from BenchmarkCommandConfig.real_data in the benchmark config and caches intermediate selections under benchmarks/cache/real_data/.

fit

Fit any backend on a prepared dataset bundle. VAE is the default.

# VAE (default)
isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --output-dir artifacts/fits/vae_default

# Baseline
isograph fit \
  --dataset-path benchmarks/datasets/core_v1/toy_v1 \
  --backend baseline \
  --output-dir artifacts/fits/toy_v1

# VAE with Hydra overrides
isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --backend vae \
  --output-dir artifacts/fits/vae_tuned \
  -- vae.alpha=0.6 vae.hidden_dim=256 vae.n_epochs=400

Available backends: baseline, latent, graph, vae, wgcna.

Outputs:

  • modules.parquet

  • edges.parquet

  • traits.parquet

  • feature_scores.parquet

  • calibration.json (written only by backends that emit calibration metadata: VAE and latent)

  • fit_config.json
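
A quick completeness check on an output directory can catch a failed or interrupted fit before downstream steps run. The sketch below uses the file names from the list above and assumes that every file except calibration.json is always written. check_fit_outputs is a hypothetical helper, not part of IsoGraph.

```python
from pathlib import Path

# File names taken from the Outputs list above.
REQUIRED_OUTPUTS = (
    "modules.parquet",
    "edges.parquet",
    "traits.parquet",
    "feature_scores.parquet",
    "fit_config.json",
)
# Only written by backends that emit calibration metadata.
OPTIONAL_OUTPUTS = ("calibration.json",)


def check_fit_outputs(output_dir):
    """Return (missing_required, present_optional) for a fit output directory."""
    out = Path(output_dir)
    missing = [name for name in REQUIRED_OUTPUTS if not (out / name).is_file()]
    optional = [name for name in OPTIONAL_OUTPUTS if (out / name).is_file()]
    return missing, optional


missing, optional = check_fit_outputs("artifacts/fits/vae_default")
print("missing:", missing, "optional present:", optional)
```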

Default config values for all backends live in configs/fit.yaml and can be overridden with Hydra syntax after --.

explain-module

Explain one or more fitted modules at transcript-feature resolution.

isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --module-ids M000 M001 \
  --output-dir artifacts/explain/run1

With plots and VAE decoder attribution:

isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --plot --output-format png pdf \
  --vae-attribution \
  --output-dir artifacts/explain/run1

With Captum Integrated Gradients (requires pip install isograph[torch-explain]):

isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --integrated-gradients --ig-n-steps 100 \
  --output-dir artifacts/explain/run1

With a structural annotation table (from annotate-structure):

isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --annotation-table transcript_structure_annotations.tsv \
  --output-dir artifacts/explain/run1

Inputs:

  • --artifact-dir must contain modules.parquet and feature_scores.parquet.

  • --feature-table: Parquet with sample IDs as index (or sample_id column), feature IDs as columns.

  • --feature-meta: Parquet or TSV with columns feature_id, gene_id, feature_type (required); gene_name, transcript_id, exon_id, event_id (optional).
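
The column requirements for --feature-meta can be checked before a run. The sketch below covers only the TSV case and uses the standard library; check_feature_meta_tsv is a hypothetical helper, and it assumes the first line of the file is a header row.

```python
import csv

# Column names taken from the --feature-meta description above.
REQUIRED_COLUMNS = ("feature_id", "gene_id", "feature_type")
OPTIONAL_COLUMNS = ("gene_name", "transcript_id", "exon_id", "event_id")


def check_feature_meta_tsv(path):
    """Return (missing_required, present_optional) based on the TSV header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Read only the header; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    optional = [col for col in OPTIONAL_COLUMNS if col in header]
    return missing, optional
```

Calling check_feature_meta_tsv("feature_metadata.tsv") returns an empty first list when all three required columns are present.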

Outputs per module in {output_dir}/{module_id}/:

  • gene_driver_table.parquet — gene-level drivers sorted by |r|

  • transcript_polarity_table.parquet — transcript-level correlations with switch_strength

  • high_vs_low_table.parquet — mean usage contrast between high- and low-module samples

  • vae_drivers.parquet — high-confidence VAE decoder attribution (with --vae-attribution)

  • ig_attributions.parquet — per-feature IG scores (with --integrated-gradients)

  • Plot files (*.png, *.pdf) when --plot is given

Shared manifest: {output_dir}/module_explanation_manifest.json

Key flags:

Flag                        Default  Description
--module-ids                all      Module IDs to explain
--plot                      off      Write publication-ready plot files
--output-format             png      png, pdf, or both
--annotation-table          none     Structural annotation TSV from annotate-structure
--vae-attribution           off      VAE decoder Jacobian attribution (needs checkpoint)
--vae-fdr-threshold         0.05     FDR cutoff for high-confidence VAE drivers
--vae-percentile-threshold  90.0     |decoded_delta| percentile for VAE drivers
--integrated-gradients      off      Captum IG encoder attribution (needs checkpoint + captum)
--ig-n-steps                50       IG interpolation steps
--ig-baseline               zero     IG baseline: zero or mean
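
How --vae-fdr-threshold and --vae-percentile-threshold combine is not spelled out here. One plausible reading, sketched below purely as an assumption, is that a feature counts as a high-confidence driver when its FDR is at or below the cutoff and its |decoded_delta| is at or above the given percentile across all features. The row layout, the fdr key, and the function names are hypothetical; only decoded_delta is named in the flag descriptions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def select_drivers(rows, fdr_threshold=0.05, percentile_threshold=90.0):
    """Keep rows that pass both the FDR cutoff and the |decoded_delta| percentile cutoff."""
    cutoff = percentile([abs(r["decoded_delta"]) for r in rows], percentile_threshold)
    return [
        r for r in rows
        if r["fdr"] <= fdr_threshold and abs(r["decoded_delta"]) >= cutoff
    ]


# Made-up rows for illustration only; "fdr" is an assumed column name.
rows = [
    {"feature_id": "F0", "decoded_delta": 0.1, "fdr": 0.50},
    {"feature_id": "F1", "decoded_delta": -2.4, "fdr": 0.01},
    {"feature_id": "F2", "decoded_delta": 0.3, "fdr": 0.02},
]
print([r["feature_id"] for r in select_drivers(rows)])
```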

annotate-structure

Annotate transcript switch pairs with structural labels from a GTF file.

isograph annotate-structure \
  --gtf gencode.v47.annotation.gtf.gz \
  --switch-pairs switch_pairs.tsv \
  --output transcript_structure_annotations.tsv

Pass --gtf-cache to cache the parsed GTF and skip re-parsing on later runs (parsing raw GENCODE v47 takes about 20 min on NFS; loading the cache takes seconds):

isograph annotate-structure \
  --gtf gencode.v47.annotation.gtf.gz \
  --switch-pairs switch_pairs.tsv \
  --gtf-cache gencode_v47_cache.parquet \
  --output transcript_structure_annotations.tsv

Inputs:

  • --gtf: GTF or GTF.gz annotation file (GENCODE or Ensembl conventions supported).

  • --switch-pairs: TSV with columns gene_id, transcript_id_1, transcript_id_2.
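
A switch-pairs file with the three columns above can be written with the standard library. This is a minimal sketch that assumes the file carries a header row with exactly those column names; write_switch_pairs is a hypothetical helper, and the IDs in the usage note are placeholders.

```python
import csv

# Column names taken from the --switch-pairs description above.
SWITCH_PAIR_COLUMNS = ("gene_id", "transcript_id_1", "transcript_id_2")


def write_switch_pairs(path, pairs):
    """Write (gene_id, transcript_id_1, transcript_id_2) tuples as a tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(SWITCH_PAIR_COLUMNS)  # assumed header row
        for gene_id, tx1, tx2 in pairs:
            writer.writerow((gene_id, tx1, tx2))
```

For example, write_switch_pairs("switch_pairs.tsv", [("GENE_A", "TX_A1", "TX_A2")]) produces a two-line file; the IDs should be replaced with gene and transcript IDs that occur in the GTF passed to --gtf.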

Output (--output): TSV with structural labels per switch pair:

Label                 Type   Description
first_exon_changed    bool   First exon differs between transcripts
last_exon_changed     bool   Last exon differs
internal_exon_diff    bool   Internal exon composition differs
cds_changed           bool   CDS coordinates differ
utr_changed           bool   UTR coordinates differ
biotype_switch        bool   Transcript biotype differs
coding_status_change  bool   One transcript is coding, the other is not
tx_length_delta       float  Transcript length difference (bp)
cds_length_delta      float  CDS length difference (bp)
shared_exon_fraction  float  Fraction of exons shared between the two transcripts

Pass the output to isograph explain-module --annotation-table to merge these labels into the driver tables.
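
The annotation TSV can also be inspected on its own, for example to list the switch pairs that carry a given structural label. The sketch below makes two assumptions that are not stated above: that the file has a header row with the label names as column names, and that booleans are serialized as one of a few common spellings. pairs_with_label is a hypothetical helper.

```python
import csv

# Assumed spellings of a true boolean; adjust to match the actual file.
TRUE_STRINGS = {"true", "1", "yes", "t"}


def pairs_with_label(path, label):
    """Return the rows of the annotation TSV whose boolean `label` column is true."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if label not in (reader.fieldnames or []):
            raise KeyError(f"column {label!r} not found in {path}")
        # Missing cells come back as None, so fall back to an empty string.
        return [
            row for row in reader
            if (row[label] or "").strip().lower() in TRUE_STRINGS
        ]
```

For example, pairs_with_label("transcript_structure_annotations.tsv", "cds_changed") returns one dictionary per switch pair whose CDS coordinates differ.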

compare

Compare either two snapshot directories or two benchmark JSON reports.

Examples:

# Two snapshot directories
isograph compare \
  --reference snapshots/stage0_toy_v1_baseline_v1_seed0000 \
  --candidate artifacts/benchmarks/quickstart_baseline/toy_v1

# Two benchmark JSON reports
isograph compare \
  --reference artifacts/reports/stage2_latent-benchmark.json \
  --candidate artifacts/reports/stage4_vae-benchmark.json

export

Write a JSON summary of a prepared dataset bundle.

Example:

isograph export \
  --dataset-path benchmarks/datasets/core_v1/toy_v1 \
  --output-path artifacts/reports/toy_v1-summary.json