# Overview

IsoGraph is a research software package for gene-aware network analysis of transcript-level RNA-seq data. The project is organized around reproducible datasets, typed configurations, stable command-line entry points, and model backends that can be compared on the same fixture suite.

## Documentation Map

- Use the `README.md` for project status, installation, and fast orientation.
- Use these reference docs for exact command behavior, data requirements, configuration fields, outputs, and Python APIs.
- Use the GitHub Wiki for tutorial-style walkthroughs, especially when preparing and analyzing your own data.

## Current Scope

IsoGraph currently includes:

- A deterministic **baseline** backend.
- A **latent** probabilistic backend (sklearn Factor Analysis + LedoitWolf partial correlation) with cross-validated component selection and stability selection support.
- A **graph-aware** backend extending the latent model with graph-Laplacian smoothing.
- A **VAE** backend (the default production backend) with nonlinear latent representation, early stopping, posterior-collapse diagnostics, and optional checkpointing. Requires PyTorch.
- A **WGCNA** backend wrapping R's `blockwiseModules` for direct comparison with WGCNA, including blockwise mode for datasets above 5 000 genes.
- Synthetic fixture suites: `core_v1` (24–800 genes), `scale_v1` (6 000–12 000 genes), and `multiplex_v1`, a typed abundance + switch suite with planted ground-truth roles (`switch_only`, `abundance_only`, `coupled`, `discordant`) across four fixtures (40–900 genes) plus an optional `xxlarge_multiplex_v1` at 12 000 genes.
- A real-data fixture freeze workflow for BrainSeq-style bulk RNA-seq inputs.
- **Multiplex network inference** (Stage 9A/9B): each gene contributes a log-CPM abundance channel; multi-isoform genes also contribute a CLR-SVD switch channel.
  A typed feature graph with per-channel edge thresholds and an auto-calibrated abundance threshold (`alpha_abundance_grid`) prevents spurious merging of switch modules. Gene channel role classification (`coupled`, `abundance_only`, `switch_only`, `discordant`) is reported in `module_gene_roles.parquet` for every fit.
- A **module explanation** module (`isograph.explain`) with `isograph explain-module` and `isograph annotate-structure` CLI subcommands for transcript-feature-level driver tables, publication-ready plots, VAE decoder attribution (Stage 8D), Captum Integrated Gradients encoder attribution (Stage 8E), and GTF-based structural annotation of switch pairs. Attribution outputs carry `feature_type` metadata so switch and abundance drivers are distinguished in multiplex fits.
- Multiplex stress reports for VAE, graph, latent, and WGCNA backends, including role-aware recall and giant-component diagnostics.
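
The two multiplex channels described above (a log-CPM abundance channel for every gene, and a CLR-based switch channel for multi-isoform genes only) can be illustrated with a minimal, standard-library sketch. This is not IsoGraph's implementation: the function names `log_cpm`, `clr`, and `build_channels` are hypothetical, the pseudocounts are assumptions, the gene-level count is assumed to be the sum of its isoform counts, and the SVD step that follows the CLR transform in the switch channel is omitted.

```python
import math


def log_cpm(gene_counts):
    """Return log2(CPM + 1) per gene for a single sample.

    gene_counts maps gene id -> total read count in this sample.
    The +1 offset is an assumed pseudocount that keeps zero counts finite.
    """
    library_size = sum(gene_counts.values())
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return {
        gene: math.log2(count / library_size * 1e6 + 1.0)
        for gene, count in gene_counts.items()
    }


def clr(isoform_counts, pseudocount=0.5):
    """Centred log-ratio transform of one gene's isoform counts.

    Each value is the log count minus the mean log count, so the
    result always sums to zero. The pseudocount is an assumption.
    """
    logs = [math.log(c + pseudocount) for c in isoform_counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def build_channels(sample):
    """Build both channels for one sample.

    sample maps gene id -> list of isoform counts. Every gene gets an
    abundance value; only genes with more than one isoform get a
    switch vector (a single-isoform gene has no composition to shift).
    """
    gene_totals = {gene: sum(iso) for gene, iso in sample.items()}
    abundance = log_cpm(gene_totals)
    switch = {gene: clr(iso) for gene, iso in sample.items() if len(iso) > 1}
    return abundance, switch


sample = {"GENE_A": [120.0, 30.0], "GENE_B": [400.0]}
abundance, switch = build_channels(sample)
print(sorted(abundance))  # ['GENE_A', 'GENE_B']
print(sorted(switch))     # ['GENE_A']
```

Keeping the two channels separate is what lets a change in isoform proportions show up in the switch channel even when the gene's total expression is unchanged, and vice versa.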