# Multiplex and Stress Benchmarks

Isoform switching and gene expression can change independently. A gene that doubles its
expression without changing its dominant isoform is an abundance event; a gene that
redistributes transcript usage without changing total expression is a switching event.
Both types of co-regulation occur in real RNA-seq data, and treating them as a single
signal conflates distinct biological mechanisms.

IsoGraph separates these signals by constructing two feature channels per gene: an
**abundance channel** (log-CPM–standardized gene counts, available for all genes) and a
**switch channel** (CLR-SVD transcript usage coordinate, available for multi-isoform
genes). A typed feature graph with per-channel edge thresholds and an auto-calibrated
abundance threshold connects genes through whichever channels are active. The `multiplex_v1`
benchmark suite provides planted ground-truth modules with known role assignments to
validate that IsoGraph recovers each type.
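
As a rough illustration of the two channels, the sketch below computes a log-CPM z-score per gene and a CLR-SVD usage coordinate for one multi-isoform gene. It is not IsoGraph's implementation: the function names, the pseudocounts, and the choice of the leading singular component are all assumptions made for this example.

```python
import numpy as np


def abundance_channel(gene_counts, prior=1.0):
    """Hypothetical helper: log2-CPM per gene, z-scored across samples.

    gene_counts is a (genes x samples) array of raw counts.
    """
    lib_size = gene_counts.sum(axis=0, keepdims=True)
    log_cpm = np.log2(gene_counts / lib_size * 1e6 + prior)
    mu = log_cpm.mean(axis=1, keepdims=True)
    sd = log_cpm.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes map to all zeros instead of NaN
    return (log_cpm - mu) / sd


def switch_channel(tx_counts, pseudocount=0.5):
    """Hypothetical helper: one usage coordinate per sample for a single gene.

    tx_counts is an (isoforms x samples) array for ONE multi-isoform gene.
    """
    log_x = np.log(tx_counts + pseudocount)
    # Centered log-ratio: subtract the per-sample mean over isoforms.
    clr = log_x - log_x.mean(axis=0, keepdims=True)
    # Center each isoform across samples before the SVD.
    clr = clr - clr.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(clr, full_matrices=False)
    # Sample scores on the leading component; the sign is arbitrary.
    return s[0] * vt[0]
```

Because CLR works on within-sample ratios, a gene whose isoforms all scale together yields a coordinate near zero, while its abundance channel still moves. That is the separation the two channels are meant to provide.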

The benchmark runner reports overall planted-module recovery plus role-aware recall for:

- `switch_only` — genes whose module membership is driven by switching, not expression
- `abundance_only` — genes driven by expression level changes only
- `coupled` — both channels active and positively correlated (r ≥ 0.2)
- `discordant` — both channels active but anticorrelated (r < −0.2)
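
The role definitions above can be sketched as a small decision function. The function name, its arguments, and the `None` return value are assumptions for illustration; in particular, the list does not say how a gene with both channels active and a correlation between −0.2 and 0.2 is labeled, so this sketch leaves it unassigned.

```python
def classify_role(abundance_active, switch_active, r=None, threshold=0.2):
    """Hypothetical helper mapping channel activity to a planted role.

    r is the correlation between the two channels; it is only consulted
    when both channels are active.
    """
    if abundance_active and switch_active:
        if r is None:
            raise ValueError("r is required when both channels are active")
        if r >= threshold:
            return "coupled"
        if r < -threshold:
            return "discordant"
        return None  # assumption: weakly correlated genes get no role
    if switch_active:
        return "switch_only"
    if abundance_active:
        return "abundance_only"
    return None  # neither channel active


print(classify_role(True, True, r=0.45))   # coupled
print(classify_role(True, True, r=-0.6))   # discordant
print(classify_role(False, True))          # switch_only
```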

## Standard Multiplex Suite

The `multiplex_v1` suite contains four regular fixtures:

| Fixture | Genes | Samples | Planted modules |
|---|---:|---:|---:|
| `toy_multiplex_v1` | 40 | 64 | 2 |
| `medium_multiplex_v1` | 320 | 180 | 6 |
| `noisy_multiplex_v1` | 360 | 110 | 6 |
| `large_multiplex_v1` | 900 | 140 | 10 |

Run the tuned Stage 9 backends:

```bash
isograph benchmark --config-name stage9_multiplex_vae
isograph benchmark --config-name stage9_multiplex_graph
isograph benchmark --config-name stage9_multiplex_latent
isograph benchmark --config-name stage9_multiplex_wgcna
```

Aggregate stress reports against WGCNA:

```bash
python scripts/stress_multiplex_summary.py
```

The summary is written to `artifacts/reports/stress-multiplex-backend-summary.json`.

## Explain Compatibility

Explain modules work on multiplex artifacts. The attribution outputs preserve
`feature_id` and `feature_type` so a gene can be explained through its switch channel,
abundance channel, or both. The stress helper evaluates explain accuracy across the
standard multiplex stress artifacts:

```bash
python scripts/stress_multiplex_explain.py
```

The helper writes `artifacts/explain/stress_multiplex/stress_multiplex_explain_metrics.json`.

## Extra-Large Multiplex Stress Fixture

The 12k-gene fixture is intentionally generated only when requested by name so routine
test and benchmark runs do not pay its cost.

| Fixture | Genes | Samples | Planted modules |
|---|---:|---:|---:|
| `xxlarge_multiplex_v1` | 12,000 | 240 | 16 |

Run VAE and WGCNA:

```bash
isograph benchmark --config-name stress_multiplex_xxlarge_vae
isograph benchmark --config-name stress_multiplex_xxlarge_wgcna
```

The current stress reports show:

| Backend | Detected modules | Recovery | Runtime (s) | Notes |
|---|---:|---:|---:|---|
| VAE | 16 | 0.927 | 7,554.7 | Correct module count; no posterior collapse |
| WGCNA | 1,898 | 0.919 | 877.3 | Severe over-segmentation |

VAE role-aware recall on `xxlarge_multiplex_v1` was `1.0` for `switch_only`, `1.0`
for `coupled`, `1.0` for `discordant`, and `0.725` for `abundance_only`.

## Understanding Role-Aware Recall

Role-aware recall measures the fraction of planted genes in each role that IsoGraph
assigns to the correct module. The calibration gates for VAE on the standard multiplex
suite are:

| Fixture | Backend | Overall recovery | `abundance_only` recall | `switch_only` recall |
|---|---|---:|---:|---:|
| `toy_multiplex_v1` | VAE | ≥ 0.40 | ≥ 0.90 | ≥ 0.90 |
| `medium_multiplex_v1` | VAE | ≥ 0.895 | ≥ 0.90 | ≥ 0.90 |
| `noisy_multiplex_v1` | VAE | ≥ 0.883 | ≥ 0.90 | ≥ 0.90 |
| `large_multiplex_v1` | VAE | ≥ 0.896 | ≥ 0.90 | ≥ 0.90 |
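
A minimal sketch of the recall calculation follows, assuming each planted module is matched to the detected module that holds most of its genes. The matching rule, the function name, and the dictionary inputs are assumptions for this example; the benchmark runner may match modules differently.

```python
from collections import Counter, defaultdict


def role_aware_recall(planted, roles, detected):
    """Hypothetical helper: per-role recall of planted genes.

    planted:  gene -> planted module id
    roles:    gene -> role name (e.g. "switch_only")
    detected: gene -> detected module id (genes may be missing)
    """
    genes_by_module = defaultdict(list)
    for gene, module in planted.items():
        genes_by_module[module].append(gene)

    # Assumed matching rule: majority overlap per planted module.
    best_match = {}
    for module, genes in genes_by_module.items():
        counts = Counter(detected[g] for g in genes if g in detected)
        best_match[module] = counts.most_common(1)[0][0] if counts else None

    hits, totals = Counter(), Counter()
    for gene, module in planted.items():
        totals[roles[gene]] += 1
        match = best_match[module]
        if match is not None and detected.get(gene) == match:
            hits[roles[gene]] += 1
    return {role: hits[role] / totals[role] for role in totals}
```

Each value in the returned dictionary can then be compared with the corresponding gate in the table, for example `recall["abundance_only"] >= 0.90`.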

**Why giant-component fraction matters.** A well-calibrated run produces modules that are
clearly separated in the gene graph. When the abundance threshold is too permissive, many
genes connect into one giant component and distinct modules merge. Check
`giant_component_fraction < 0.05` as a sanity metric; it is reported in the benchmark JSON.
Over-segmentation is the opposite failure mode: WGCNA shows it at 12k scale (1,898
detected modules vs. 16 planted), so also compare the detected module count with the
planted count.
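
For intuition, the sketch below computes a giant-component fraction with a union-find pass over an edge list, assuming the metric is the share of genes that fall in the largest connected component. The function name and input format are hypothetical, and the value IsoGraph reports may be defined differently.

```python
from collections import Counter


def giant_component_fraction(n_nodes, edges):
    """Hypothetical helper: size of the largest component / number of nodes.

    edges is an iterable of (i, j) pairs with 0 <= i, j < n_nodes.
    """
    if n_nodes == 0:
        return 0.0
    parent = list(range(n_nodes))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(i) for i in range(n_nodes))
    return max(sizes.values()) / n_nodes


# Ten genes; the largest component {0, 1, 2} holds three of them.
print(giant_component_fraction(10, [(0, 1), (1, 2), (3, 4)]))  # 0.3
```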

**What low recall means.** If `abundance_only` recall is low (< 0.70) on your real data,
the abundance threshold may be too permissive and may be merging abundance modules with
switch modules. Tighten `alpha_abundance` or narrow the `alpha_abundance_grid` range. If
`switch_only` recall is low, the switch threshold may be too strict; loosen `alpha_switch`.

## Artifact Policy

Generated dataset bundles and per-fixture benchmark directories are reproducible and
ignored by git. Commit compact evidence as JSON reports under `artifacts/reports/`.