Multiplex and Stress Benchmarks

Isoform switching and gene expression can change independently. A gene that doubles its expression without changing its dominant isoform is an abundance event; a gene that redistributes transcript usage without changing total expression is a switching event. Both types of co-regulation occur in real RNA-seq data, and treating them as a single signal conflates distinct biological mechanisms.

IsoGraph separates these signals by constructing two feature channels per gene: an abundance channel (log-CPM–standardized gene counts, available for all genes) and a switch channel (CLR-SVD transcript usage coordinate, available for multi-isoform genes). A typed feature graph with per-channel edge thresholds and an auto-calibrated abundance threshold connects genes through whichever channels are active. The multiplex_v1 benchmark suite provides planted ground-truth modules with known role assignments to validate that IsoGraph recovers each type.
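The two channels described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not IsoGraph's implementation: the function names, the pseudocounts, the use of log2, and the choice of the leading SVD component as the "usage coordinate" are all hypothetical.

```python
import numpy as np

def abundance_channel(gene_counts, pseudocount=1.0):
    """Standardized log-CPM per gene (hypothetical helper).

    gene_counts: array of shape (n_samples, n_genes).
    """
    counts = np.asarray(gene_counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)      # per-sample library size
    log_cpm = np.log2(counts / lib_size * 1e6 + pseudocount)
    mean = log_cpm.mean(axis=0)
    std = log_cpm.std(axis=0)
    std[std == 0] = 1.0                               # constant genes stay at zero
    return (log_cpm - mean) / std

def switch_channel(isoform_counts, pseudocount=0.5):
    """One usage coordinate per sample for a single multi-isoform gene.

    isoform_counts: array of shape (n_samples, n_isoforms).
    Applies a centered log-ratio (CLR) transform to isoform usage, then
    projects onto the leading SVD component (an assumed reading of "CLR-SVD").
    """
    counts = np.asarray(isoform_counts, dtype=float) + pseudocount
    usage = counts / counts.sum(axis=1, keepdims=True)
    log_usage = np.log(usage)
    clr = log_usage - log_usage.mean(axis=1, keepdims=True)
    centered = clr - clr.mean(axis=0)                 # center each isoform column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]
```

The abundance channel is defined for every gene, while the switch channel only makes sense when a gene has at least two isoforms, which matches the availability described above. The sign of the switch coordinate is arbitrary because singular vectors are defined only up to sign.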

The benchmark runner reports overall planted-module recovery plus role-aware recall for:

  • switch_only — genes whose module membership is driven by switching, not expression

  • abundance_only — genes driven by expression level changes only

  • coupled — both channels active and positively correlated (r ≥ 0.2)

  • discordant — both channels active but anticorrelated (r < −0.2)
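The role definitions above can be expressed as a small decision function. This is a sketch under assumptions: the function name is hypothetical, `r` is taken to be the correlation between a gene's abundance and switch channels, and the "inactive" and "unassigned" labels are placeholders for cases the list does not cover (no active channel, or a correlation between the two thresholds).

```python
def classify_role(abundance_active, switch_active, r,
                  coupled_min=0.2, discordant_max=-0.2):
    """Assign a planted-role label from channel activity and correlation r.

    Thresholds follow the list above: coupled if r >= 0.2,
    discordant if r < -0.2. Other labels are hypothetical placeholders.
    """
    if not abundance_active and not switch_active:
        return "inactive"          # placeholder: gene drives no channel
    if switch_active and not abundance_active:
        return "switch_only"
    if abundance_active and not switch_active:
        return "abundance_only"
    # Both channels are active from here on.
    if r >= coupled_min:
        return "coupled"
    if r < discordant_max:
        return "discordant"
    return "unassigned"            # placeholder: weak correlation between channels


for args in [(False, True, 0.0), (True, False, 0.0),
             (True, True, 0.5), (True, True, -0.5), (True, True, 0.0)]:
    print(args, "->", classify_role(*args))
```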

Standard Multiplex Suite

The multiplex_v1 suite contains four regular fixtures:

| Fixture | Genes | Samples | Planted modules |
|---|---|---|---|
| toy_multiplex_v1 | 40 | 64 | 2 |
| medium_multiplex_v1 | 320 | 180 | 6 |
| noisy_multiplex_v1 | 360 | 110 | 6 |
| large_multiplex_v1 | 900 | 140 | 10 |

Run the tuned Stage 9 backends:

isograph benchmark --config-name stage9_multiplex_vae
isograph benchmark --config-name stage9_multiplex_graph
isograph benchmark --config-name stage9_multiplex_latent
isograph benchmark --config-name stage9_multiplex_wgcna

Aggregate stress reports against WGCNA:

python scripts/stress_multiplex_summary.py

The summary is written to artifacts/reports/stress-multiplex-backend-summary.json.

Explain Compatibility

Explain modules work on multiplex artifacts. The attribution outputs preserve feature_id and feature_type so a gene can be explained through its switch channel, abundance channel, or both. The stress helper evaluates explain accuracy across the standard multiplex stress artifacts:

python scripts/stress_multiplex_explain.py

The helper writes artifacts/explain/stress_multiplex/stress_multiplex_explain_metrics.json.

Extra-Large Multiplex Stress Fixture

The 12k-gene fixture is intentionally generated only when requested by name so routine test and benchmark runs do not pay its cost.

| Fixture | Genes | Samples | Planted modules |
|---|---|---|---|
| xxlarge_multiplex_v1 | 12,000 | 240 | 16 |

Run VAE and WGCNA:

isograph benchmark --config-name stress_multiplex_xxlarge_vae
isograph benchmark --config-name stress_multiplex_xxlarge_wgcna

The current stress reports show:

| Backend | Detected modules | Recovery | Runtime (s) | Notes |
|---|---|---|---|---|
| VAE | 16 | 0.9267 | 7554.7 | Correct module count; no posterior collapse |
| WGCNA | 1898 | 0.9192 | 877.3 | Severe over-segmentation |

VAE role-aware recall on xxlarge_multiplex_v1 was 1.0 for switch_only, 1.0 for coupled, 1.0 for discordant, and 0.725 for abundance_only.
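The trade-off in the stress table can be made concrete with a little arithmetic. The sketch below uses only the figures reported above; the dictionary layout and the helper name are illustrative and do not reflect the schema of the benchmark JSON.

```python
PLANTED_MODULES = 16  # planted modules in xxlarge_multiplex_v1

# Figures copied from the stress reports above; this layout is hypothetical.
reports = {
    "VAE": {"detected": 16, "recovery": 0.9266666666666667,
            "runtime_s": 7554.699150358792},
    "WGCNA": {"detected": 1898, "recovery": 0.9191666666666665,
              "runtime_s": 877.2644422741141},
}

def over_segmentation_factor(detected, planted):
    """Detected modules per planted module; 1.0 means the counts match."""
    return detected / planted

vae_factor = over_segmentation_factor(reports["VAE"]["detected"], PLANTED_MODULES)
wgcna_factor = over_segmentation_factor(reports["WGCNA"]["detected"], PLANTED_MODULES)
runtime_ratio = reports["VAE"]["runtime_s"] / reports["WGCNA"]["runtime_s"]
recovery_gap = reports["VAE"]["recovery"] - reports["WGCNA"]["recovery"]

print(f"VAE modules per planted module:   {vae_factor:.1f}")
print(f"WGCNA modules per planted module: {wgcna_factor:.1f}")
print(f"VAE / WGCNA runtime ratio:        {runtime_ratio:.1f}")
print(f"Recovery gap (VAE - WGCNA):       {recovery_gap:.4f}")
```

Read this way, the two backends are nearly tied on overall recovery, WGCNA reports roughly 119 detected modules per planted module, and VAE takes several times longer to run.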

Understanding Role-Aware Recall

Role-aware recall measures the fraction of planted genes in each role that IsoGraph assigns to the correct module. The calibration gates for VAE on the standard multiplex suite are:

| Fixture | Backend | Overall recovery | abundance_only recall | switch_only recall |
|---|---|---|---|---|
| toy_multiplex_v1 | VAE | ≥ 0.40 | ≥ 0.90 | ≥ 0.90 |
| medium_multiplex_v1 | VAE | ≥ 0.895 | ≥ 0.90 | ≥ 0.90 |
| noisy_multiplex_v1 | VAE | ≥ 0.883 | ≥ 0.90 | ≥ 0.90 |
| large_multiplex_v1 | VAE | ≥ 0.896 | ≥ 0.90 | ≥ 0.90 |
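Role-aware recall as defined above can be sketched as follows. How IsoGraph decides which detected module is the "correct" one for a planted module is not stated here, so this example assumes a majority-overlap rule: each planted module is matched to the detected module that holds most of its genes. The function name and the input dictionaries are hypothetical.

```python
from collections import Counter, defaultdict

def role_aware_recall(planted, detected, roles):
    """Per-role fraction of planted genes placed in the matched module.

    planted:  gene -> planted module label
    detected: gene -> detected module label (genes may be missing)
    roles:    gene -> role, e.g. "switch_only" or "abundance_only"
    """
    # Assumed matching rule: a planted module maps to the detected module
    # that contains the largest number of its genes.
    overlap = defaultdict(Counter)
    for gene, module in planted.items():
        if gene in detected:
            overlap[module][detected[gene]] += 1
    best = {m: c.most_common(1)[0][0] for m, c in overlap.items()}

    hits, totals = Counter(), Counter()
    for gene, module in planted.items():
        role = roles[gene]
        totals[role] += 1
        # Unassigned genes never count as hits.
        if gene in detected and detected[gene] == best.get(module):
            hits[role] += 1
    return {role: hits[role] / totals[role] for role in totals}


planted = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B"}
detected = {"g1": 1, "g2": 1, "g3": 2, "g4": 2, "g5": 2}
roles = {"g1": "switch_only", "g2": "abundance_only", "g3": "abundance_only",
         "g4": "switch_only", "g5": "coupled"}
recall = role_aware_recall(planted, detected, roles)
print(recall)
```

A per-role result can then be compared against the gates in the table, for example by requiring `recall["abundance_only"] >= 0.90`. Note that this simple matching rule allows two planted modules to map to the same detected module; a stricter one-to-one assignment would penalize merged modules.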

Why giant-component fraction matters. A well-calibrated run produces modules that are clearly separated in the gene graph. When the abundance threshold is too permissive, many genes connect into one giant component and distinct modules merge. Over-segmentation is the opposite failure: WGCNA shows it at 12k scale (1898 detected modules vs. 16 planted). Check giant_component_fraction < 0.05 as a sanity metric; it is reported in the benchmark JSON.
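A plausible reading of giant_component_fraction is the share of genes that fall into the largest connected component of the gene graph. The exact definition used in the benchmark JSON is not given here, so the sketch below is an assumption; the function name and the edge-list input format are hypothetical.

```python
from collections import Counter

def giant_component_fraction(n_nodes, edges):
    """Fraction of nodes in the largest connected component.

    n_nodes: number of genes, indexed 0..n_nodes-1
    edges:   iterable of (i, j) pairs that survived thresholding
    """
    if n_nodes == 0:
        return 0.0
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(i) for i in range(n_nodes))
    return max(sizes.values()) / n_nodes


# Ten genes: one component of three, one of two, five singletons.
fraction = giant_component_fraction(10, [(0, 1), (1, 2), (3, 4)])
print(fraction)
```

Under this definition a value near 1.0 means that almost every gene is reachable from every other gene, which is the merged-module pattern described above.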

What low recall means. If abundance_only recall is low (< 0.70) on your real data, the abundance threshold may be merging abundance modules with switch modules. Lower alpha_abundance or reduce the alpha_abundance_grid range. If switch_only recall is low, the switch threshold may be too strict; lower alpha_switch.

Artifact Policy

Generated dataset bundles and per-fixture benchmark directories are reproducible and ignored by git. Commit compact evidence as JSON reports under artifacts/reports/.