# CLI Reference

IsoGraph exposes a single CLI entry point:

```bash
isograph --help
```

The current subcommands are `benchmark`, `freeze-real`, `fit`, `compare`, `export`,
`explain-module`, and `annotate-structure`.

## Overrides

`benchmark`, `freeze-real`, and `fit` accept Hydra-style overrides after `--`:

```bash
isograph benchmark -- backend=latent fixture_filter=medium_v1 stage_name=stage2_docs
isograph fit --dataset-path my_cohort --backend vae -- vae.alpha=0.6 vae.hidden_dim=256
```
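
To make the `--` convention concrete, here is a minimal sketch of how an argument list could be split into regular CLI flags and `key=value` overrides. It is illustrative only and is not IsoGraph's actual parser; `split_overrides` and `parse_overrides` are hypothetical helper names, and values are kept as raw strings rather than being type-converted the way Hydra would.

```python
from typing import Dict, List, Tuple


def split_overrides(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split an argument list at the first bare `--` separator."""
    if "--" not in argv:
        return list(argv), []
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]


def parse_overrides(overrides: List[str]) -> Dict[str, str]:
    """Turn `key=value` tokens into a dict, keeping values as raw strings."""
    parsed: Dict[str, str] = {}
    for token in overrides:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        parsed[key] = value
    return parsed


# Mirrors the `isograph fit` example above (program name omitted).
argv = ["fit", "--dataset-path", "my_cohort", "--backend", "vae",
        "--", "vae.alpha=0.6", "vae.hidden_dim=256"]
cli_args, overrides = split_overrides(argv)
print(cli_args)
print(parse_overrides(overrides))
```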

## `benchmark`

Run the bundled fixture suite, or a filtered subset, through a selected backend.

The default backend is `vae`. Available backends: `baseline`, `latent`, `graph`, `vae`,
`wgcna`.

Examples:

```bash
# VAE on a single fixture (default backend)
isograph benchmark -- fixture_filter=toy_v1 stage_name=vae_toy

# WGCNA on the scale suite
isograph benchmark --config-name stage6_scale_comparison_wgcna
```

Behavior:

- With `dataset_suite: core_v1`, generates the synthetic `core_v1` fixtures as needed and
  freezes the real fixture unless `fixture_filter` targets only synthetic datasets.
- With `dataset_suite: multiplex_v1`, generates abundance-aware toy, medium, noisy,
  and large fixtures with explicit `truth_switch`, `truth_abundance`, and
  `truth_channel_role` tables.
- With `dataset_suite: scale_v1`, generates `xlarge_v1` (6 000 genes), `xxlarge_v1`
  (12 000 genes), and `xxlarge_stress_v1` (12 000 genes, stressed parameters).
- Writes per-fixture artifacts under `artifacts/benchmarks/<stage_name>/`.
- Writes benchmark and runtime summaries under `artifacts/reports/`.
- Writes calibration reports when the selected backend emits calibration metadata.
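
After a run, the per-fixture artifacts can be located with a few lines of Python. This is a sketch that assumes each fixture gets its own subdirectory named after the fixture under `artifacts/benchmarks/<stage_name>/` (as the path `artifacts/benchmarks/quickstart_baseline/toy_v1` in the `compare` example suggests); `list_fixture_dirs` is a hypothetical helper, not part of IsoGraph.

```python
from pathlib import Path
from typing import List


def list_fixture_dirs(stage_name: str, root: str = "artifacts/benchmarks") -> List[str]:
    """Return the sorted names of the subdirectories under <root>/<stage_name>/.

    Assumption: one subdirectory per fixture. Returns an empty list when the
    stage directory does not exist yet.
    """
    stage_dir = Path(root) / stage_name
    if not stage_dir.is_dir():
        return []
    return sorted(p.name for p in stage_dir.iterdir() if p.is_dir())


# Prints [] until a benchmark with stage_name=vae_toy has been run.
print(list_fixture_dirs("vae_toy"))
```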

## `freeze-real`

Freeze the bundled real-data fixture from local source tables.

Example:

```bash
isograph freeze-real --suite-name core_v1
```

The command takes its real-data settings from `BenchmarkCommandConfig.real_data` in the
benchmark config and caches intermediate selections under `benchmarks/cache/real_data/`.

## `fit`

Fit any backend on a prepared dataset bundle. VAE is the default.

```bash
# VAE (default)
isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --output-dir artifacts/fits/vae_default

# Baseline
isograph fit \
  --dataset-path benchmarks/datasets/core_v1/toy_v1 \
  --backend baseline \
  --output-dir artifacts/fits/toy_v1

# VAE with Hydra overrides
isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --backend vae \
  --output-dir artifacts/fits/vae_tuned \
  -- vae.alpha=0.6 vae.hidden_dim=256 vae.n_epochs=400
```

Available backends: `baseline`, `latent`, `graph`, `vae`, `wgcna`.

Outputs:

- `modules.parquet`
- `edges.parquet`
- `traits.parquet`
- `feature_scores.parquet`
- `calibration.json` (only when the backend emits calibration metadata: VAE, latent)
- `fit_config.json`

Default config values for all backends live in `configs/fit.yaml` and can be
overridden with Hydra syntax after `--`.
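
A quick way to confirm that a fit finished is to check the output directory against the file list above. The sketch below uses only the file names documented in this section; `missing_fit_outputs` and `has_calibration` are hypothetical helpers, and `calibration.json` is treated as optional because not every backend writes it.

```python
from pathlib import Path
from typing import List

# File names taken from the "Outputs" list of the `fit` section.
REQUIRED_FIT_OUTPUTS = (
    "modules.parquet",
    "edges.parquet",
    "traits.parquet",
    "feature_scores.parquet",
    "fit_config.json",
)


def missing_fit_outputs(output_dir: str) -> List[str]:
    """Return the required output files that are absent from output_dir."""
    out = Path(output_dir)
    return [name for name in REQUIRED_FIT_OUTPUTS if not (out / name).is_file()]


def has_calibration(output_dir: str) -> bool:
    """True when the optional calibration.json was written."""
    return (Path(output_dir) / "calibration.json").is_file()


# Lists every required file as missing if the directory does not exist.
print(missing_fit_outputs("artifacts/fits/vae_default"))
```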

## `explain-module`

Explain one or more fitted modules at transcript-feature resolution.

```bash
isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --module-ids M000 M001 \
  --output-dir artifacts/explain/run1
```

With plots and VAE decoder attribution:

```bash
isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --plot --output-format png pdf \
  --vae-attribution \
  --output-dir artifacts/explain/run1
```

With Captum Integrated Gradients (requires the `torch-explain` extra: `pip install "isograph[torch-explain]"`):

```bash
isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --integrated-gradients --ig-n-steps 100 \
  --output-dir artifacts/explain/run1
```

With a structural annotation table (from `annotate-structure`):

```bash
isograph explain-module \
  --artifact-dir artifacts/fits/my_dataset \
  --feature-table features.parquet \
  --feature-meta feature_metadata.parquet \
  --annotation-table transcript_structure_annotations.tsv \
  --output-dir artifacts/explain/run1
```

**Inputs:**

- `--artifact-dir` must contain `modules.parquet` and `feature_scores.parquet`.
- `--feature-table`: Parquet with sample IDs as index (or `sample_id` column), feature IDs
  as columns.
- `--feature-meta`: Parquet or TSV with columns `feature_id`, `gene_id`, `feature_type`
  (required); `gene_name`, `transcript_id`, `exon_id`, `event_id` (optional).
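
The column requirements for `--feature-meta` can be checked before launching a run. This is a minimal sketch for the TSV variant using only the standard library (a Parquet file would need a reader such as pandas or pyarrow); `read_tsv_header` and `check_feature_meta_columns` are hypothetical helpers, and whether IsoGraph tolerates extra columns is not stated here, so they are only reported.

```python
import csv
from typing import List, Sequence, Tuple

REQUIRED_META_COLUMNS = ("feature_id", "gene_id", "feature_type")
OPTIONAL_META_COLUMNS = ("gene_name", "transcript_id", "exon_id", "event_id")


def read_tsv_header(tsv_path: str) -> List[str]:
    """Return the column names from the first line of a TSV file."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def check_feature_meta_columns(columns: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (missing required columns, columns not listed in this reference)."""
    missing = [c for c in REQUIRED_META_COLUMNS if c not in columns]
    known = set(REQUIRED_META_COLUMNS) | set(OPTIONAL_META_COLUMNS)
    unlisted = [c for c in columns if c not in known]
    return missing, unlisted


# Example header that lacks the required `feature_type` column.
missing, unlisted = check_feature_meta_columns(["feature_id", "gene_id", "gene_name"])
print(missing, unlisted)
```

For a file on disk, pass `read_tsv_header(path)` as the `columns` argument.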

**Outputs per module** in `{output_dir}/{module_id}/`:

- `gene_driver_table.parquet` — gene-level drivers sorted by |r|
- `transcript_polarity_table.parquet` — transcript-level correlations with `switch_strength`
- `high_vs_low_table.parquet` — mean usage contrast between high- and low-module samples
- `vae_drivers.parquet` — high-confidence VAE decoder attribution (with `--vae-attribution`)
- `ig_attributions.parquet` — per-feature IG scores (with `--integrated-gradients`)
- Plot files (`*.png`, `*.pdf`) when `--plot` is given

**Shared manifest:** `{output_dir}/module_explanation_manifest.json`
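
The per-module output list above can be expressed as a small lookup keyed on the flags that were passed. This sketch assumes the three driver tables are always written and that the two attribution tables appear only with their respective flags; plot file names are not specified in this reference and are left out. `expected_module_outputs` is a hypothetical helper.

```python
from typing import List


def expected_module_outputs(
    vae_attribution: bool = False,
    integrated_gradients: bool = False,
) -> List[str]:
    """List the Parquet files expected in {output_dir}/{module_id}/."""
    names = [
        "gene_driver_table.parquet",
        "transcript_polarity_table.parquet",
        "high_vs_low_table.parquet",
    ]
    if vae_attribution:  # corresponds to --vae-attribution
        names.append("vae_drivers.parquet")
    if integrated_gradients:  # corresponds to --integrated-gradients
        names.append("ig_attributions.parquet")
    return names


print(expected_module_outputs(vae_attribution=True))
```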

Key flags:

| Flag | Default | Description |
|---|---|---|
| `--module-ids` | all | Module IDs to explain |
| `--plot` | off | Write publication-ready plot files |
| `--output-format` | png | `png`, `pdf`, or both |
| `--annotation-table` | none | Structural annotation TSV from `annotate-structure` |
| `--vae-attribution` | off | VAE decoder Jacobian attribution (needs checkpoint) |
| `--vae-fdr-threshold` | 0.05 | FDR cutoff for high-confidence VAE drivers |
| `--vae-percentile-threshold` | 90.0 | `\|decoded_delta\|` percentile for VAE drivers |
| `--integrated-gradients` | off | Captum IG encoder attribution (needs checkpoint + captum) |
| `--ig-n-steps` | 50 | IG interpolation steps |
| `--ig-baseline` | zero | IG baseline: `zero` or `mean` |
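
When `explain-module` is driven from a script, the flags in the table can be assembled programmatically. The sketch below builds an argument list using only flags documented in this section; `build_explain_argv` is a hypothetical helper, and emitting `--output-format` only together with `--plot` is an assumption about how the two flags interact.

```python
import shlex
from typing import List, Optional, Sequence


def build_explain_argv(
    artifact_dir: str,
    feature_table: str,
    feature_meta: str,
    output_dir: str,
    module_ids: Optional[Sequence[str]] = None,
    plot_formats: Optional[Sequence[str]] = None,
    vae_attribution: bool = False,
    ig_n_steps: Optional[int] = None,
    annotation_table: Optional[str] = None,
) -> List[str]:
    """Assemble an `isograph explain-module` argument list."""
    argv = [
        "isograph", "explain-module",
        "--artifact-dir", artifact_dir,
        "--feature-table", feature_table,
        "--feature-meta", feature_meta,
    ]
    if module_ids:  # omitted means "all modules" (the documented default)
        argv += ["--module-ids", *module_ids]
    if plot_formats:
        argv += ["--plot", "--output-format", *plot_formats]
    if vae_attribution:
        argv.append("--vae-attribution")
    if ig_n_steps is not None:
        argv += ["--integrated-gradients", "--ig-n-steps", str(ig_n_steps)]
    if annotation_table:
        argv += ["--annotation-table", annotation_table]
    argv += ["--output-dir", output_dir]
    return argv


argv = build_explain_argv(
    "artifacts/fits/my_dataset",
    "features.parquet",
    "feature_metadata.parquet",
    "artifacts/explain/run1",
    module_ids=["M000", "M001"],
    plot_formats=["png", "pdf"],
)
print(shlex.join(argv))  # shell-quoted command line for logging
```

The resulting list can be handed to `subprocess.run(argv, check=True)` once `isograph` is installed and on `PATH`.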

## `annotate-structure`

Annotate transcript switch pairs with structural labels from a GTF file.

```bash
isograph annotate-structure \
  --gtf gencode.v47.annotation.gtf.gz \
  --switch-pairs switch_pairs.tsv \
  --output transcript_structure_annotations.tsv
```

Cache the parsed GTF to avoid re-parsing on repeated runs. Parsing the raw GENCODE v47 GTF
takes about 20 min on NFS; loading the cache takes seconds:

```bash
isograph annotate-structure \
  --gtf gencode.v47.annotation.gtf.gz \
  --switch-pairs switch_pairs.tsv \
  --gtf-cache gencode_v47_cache.parquet \
  --output transcript_structure_annotations.tsv
```

**Inputs:**

- `--gtf`: GTF or GTF.gz annotation file (GENCODE or Ensembl conventions supported).
- `--switch-pairs`: TSV with columns `gene_id`, `transcript_id_1`, `transcript_id_2`.
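
A `--switch-pairs` file with the three required columns can be produced with the standard `csv` module. This is a minimal sketch; `write_switch_pairs` is a hypothetical helper, and the IDs in the example are placeholders rather than real GENCODE identifiers.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

SWITCH_PAIR_COLUMNS = ("gene_id", "transcript_id_1", "transcript_id_2")


def write_switch_pairs(handle: TextIO, pairs: Iterable[Tuple[str, str, str]]) -> None:
    """Write (gene_id, transcript_id_1, transcript_id_2) rows as a TSV."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(SWITCH_PAIR_COLUMNS)
    for gene_id, tx1, tx2 in pairs:
        writer.writerow((gene_id, tx1, tx2))


# Placeholder IDs for illustration only.
buffer = io.StringIO()
write_switch_pairs(buffer, [("GENE_A", "TX_A1", "TX_A2"), ("GENE_B", "TX_B1", "TX_B3")])
print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, pass `open("switch_pairs.tsv", "w", newline="", encoding="utf-8")` as `handle`.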

**Output** (`--output`): TSV with structural labels per switch pair:

| Label | Type | Description |
|---|---|---|
| `first_exon_changed` | bool | First exon differs between transcripts |
| `last_exon_changed` | bool | Last exon differs |
| `internal_exon_diff` | bool | Internal exon composition differs |
| `cds_changed` | bool | CDS coordinates differ |
| `utr_changed` | bool | UTR coordinates differ |
| `biotype_switch` | bool | Transcript biotype differs |
| `coding_status_change` | bool | One transcript is coding, the other is not |
| `tx_length_delta` | float | Transcript length difference (bp) |
| `cds_length_delta` | float | CDS length difference (bp) |
| `shared_exon_fraction` | float | Fraction of exons shared between the two transcripts |

Pass the output to `isograph explain-module --annotation-table` to merge these labels
into the driver tables.
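
For downstream scripting, the label columns can be converted to native types while reading the TSV. The sketch below relies on two assumptions that this reference does not confirm: booleans are serialized as `True`/`False` (or `1`/`0`), and no label value is empty. `read_annotations` is a hypothetical helper; columns other than the ten labels are passed through as strings.

```python
import csv
import io
from typing import Any, Dict, List, TextIO

BOOL_COLUMNS = (
    "first_exon_changed", "last_exon_changed", "internal_exon_diff",
    "cds_changed", "utr_changed", "biotype_switch", "coding_status_change",
)
FLOAT_COLUMNS = ("tx_length_delta", "cds_length_delta", "shared_exon_fraction")


def parse_bool(text: str) -> bool:
    """Parse a boolean cell; the accepted spellings are an assumption."""
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def read_annotations(handle: TextIO) -> List[Dict[str, Any]]:
    """Read the annotation TSV and convert the label columns to bool/float."""
    rows: List[Dict[str, Any]] = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        row: Dict[str, Any] = dict(raw)
        for name in BOOL_COLUMNS:
            row[name] = parse_bool(raw[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(raw[name])
        rows.append(row)
    return rows


# Tiny in-memory example with placeholder values.
header = "\t".join(("gene_id",) + BOOL_COLUMNS + FLOAT_COLUMNS)
values = "\t".join(["GENE_A", "True", "False", "False", "True",
                    "False", "False", "False", "120.0", "-30.0", "0.75"])
rows = read_annotations(io.StringIO(header + "\n" + values + "\n"))
print(rows[0]["cds_changed"], rows[0]["tx_length_delta"])
```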

## `compare`

Compare either two snapshot directories or two benchmark JSON reports.

Compare two snapshot directories:

```bash
isograph compare \
  --reference snapshots/stage0_toy_v1_baseline_v1_seed0000 \
  --candidate artifacts/benchmarks/quickstart_baseline/toy_v1
```

Compare two benchmark JSON reports:

```bash
isograph compare \
  --reference artifacts/reports/stage2_latent-benchmark.json \
  --candidate artifacts/reports/stage4_vae-benchmark.json
```

## `export`

Write a JSON summary of a prepared dataset bundle.

Example:

```bash
isograph export \
  --dataset-path benchmarks/datasets/core_v1/toy_v1 \
  --output-path artifacts/reports/toy_v1-summary.json
```