# Overview

IsoGraph is a research software package for gene-aware network analysis
of transcript-level RNA-seq data. The project is organized around reproducible datasets,
typed configurations, stable command-line entry points, and model backends that can be
compared on the same fixture suite.

## Documentation Map

- Use `README.md` for project status, installation, and quick orientation.
- Use these reference docs for exact command behavior, data requirements, configuration
  fields, outputs, and Python APIs.
- Use the GitHub Wiki for tutorial-style walkthroughs, especially when preparing and
  analyzing your own data.

## Current Scope

IsoGraph currently includes:

- A deterministic **baseline** backend.
- A **latent** probabilistic backend (scikit-learn Factor Analysis plus Ledoit-Wolf shrinkage
  partial correlation) with cross-validated component selection and stability-selection support.
- A **graph-aware** backend extending the latent model with graph-Laplacian smoothing.
- A **VAE** backend (the default production backend) with a nonlinear latent representation,
  early stopping, posterior-collapse diagnostics, and optional checkpointing. Requires PyTorch.
- A **WGCNA** backend wrapping R's `blockwiseModules` for direct comparison with WGCNA,
  including blockwise mode for datasets above 5 000 genes.
- Synthetic fixture suites: `core_v1` (24–800 genes), `scale_v1` (6 000–12 000 genes),
  and `multiplex_v1`, a typed abundance + switch suite with planted ground-truth roles
  (`switch_only`, `abundance_only`, `coupled`, `discordant`) across four fixtures
  (40–900 genes), plus an optional `xxlarge_multiplex_v1` fixture at 12 000 genes.
- A real-data fixture freeze workflow for BrainSeq-style bulk RNA-seq inputs.
- **Multiplex network inference** (Stage 9A/9B): every gene contributes a log-CPM
  abundance channel, and multi-isoform genes also contribute a CLR-SVD switch channel.
  A typed feature graph with per-channel edge thresholds and an auto-calibrated abundance
  threshold (`alpha_abundance_grid`) prevents spurious merging of switch modules.
  Each gene's channel role (`coupled`, `abundance_only`, `switch_only`, or `discordant`)
  is reported in `module_gene_roles.parquet` for every fit.
- A **module explanation** toolkit (`isograph.explain`) with the `isograph explain-module` and
  `isograph annotate-structure` CLI subcommands, providing transcript-feature-level driver tables,
  publication-ready plots, VAE decoder attribution (Stage 8D), Captum Integrated Gradients
  encoder attribution (Stage 8E), and GTF-based structural annotation of switch pairs.
  Attribution outputs carry `feature_type` metadata so that switch and abundance drivers can be
  distinguished in multiplex fits.
- Multiplex stress reports for VAE, graph, latent, and WGCNA backends, including
  role-aware recall and giant-component diagnostics.
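The latent backend is described as combining scikit-learn Factor Analysis with Ledoit-Wolf partial correlation and cross-validated component selection. The sketch below illustrates those two ingredients in isolation, assuming a samples x features matrix; it is not IsoGraph's implementation, and the helper names `select_n_components` and `ledoit_wolf_partial_correlation` are hypothetical. How IsoGraph actually combines the factor model with the partial-correlation step is not specified here.

```python
import numpy as np
from sklearn.covariance import LedoitWolf
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score


def select_n_components(X, candidates, cv=5):
    """Hypothetical helper: pick the Factor Analysis rank with the best
    mean held-out log-likelihood (FactorAnalysis.score)."""
    scores = [
        cross_val_score(FactorAnalysis(n_components=k, random_state=0), X, cv=cv).mean()
        for k in candidates
    ]
    return candidates[int(np.argmax(scores))]


def ledoit_wolf_partial_correlation(X):
    """Hypothetical helper: partial correlations from the Ledoit-Wolf
    shrunk precision matrix, rho_ij = -P_ij / sqrt(P_ii * P_jj)."""
    precision = LedoitWolf().fit(X).precision_
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Toy usage: 200 samples of 10 features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.5 * rng.normal(size=(200, 10))
k = select_n_components(X, [1, 2, 3, 4])
pcor = ledoit_wolf_partial_correlation(X)
```

Shrinkage keeps the precision estimate positive definite even when features outnumber samples, which is why the partial correlations stay well defined and bounded in magnitude by 1.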
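Multiplex inference gives every gene a log-CPM abundance channel and gives multi-isoform genes an additional CLR-SVD switch channel. A minimal sketch of that split follows, assuming a samples x transcripts count matrix and a transcript-to-gene mapping. The pseudocount, the plain log2(CPM + 1) transform, and the choice of the first SVD component as the switch score are illustrative assumptions, and `clr`, `switch_channel`, and `build_channels` are hypothetical names rather than IsoGraph APIs.

```python
import numpy as np


def clr(proportions):
    """Centered log-ratio transform, applied row-wise (per sample)."""
    logp = np.log(proportions)
    return logp - logp.mean(axis=1, keepdims=True)


def switch_channel(iso_counts, pseudocount=0.5):
    """Hypothetical: first CLR-SVD component of one gene's isoform usage
    (samples x isoforms)."""
    shifted = iso_counts + pseudocount  # pseudocount avoids log(0)
    props = shifted / shifted.sum(axis=1, keepdims=True)
    z = clr(props)
    z = z - z.mean(axis=0, keepdims=True)  # center each isoform across samples
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0] * s[0]  # per-sample score; the sign is arbitrary


def build_channels(tx_counts, tx_to_gene):
    """Hypothetical: split samples x transcripts counts into abundance and
    switch channels, both keyed by gene."""
    tx_to_gene = np.asarray(tx_to_gene)
    lib_size = tx_counts.sum(axis=1)
    abundance, switch = {}, {}
    for gene in sorted(set(tx_to_gene.tolist())):
        cols = np.flatnonzero(tx_to_gene == gene)
        gene_counts = tx_counts[:, cols].sum(axis=1)
        # Simple log2(CPM + 1); IsoGraph's exact normalization may differ.
        abundance[gene] = np.log2(gene_counts / lib_size * 1e6 + 1.0)
        if cols.size >= 2:  # only multi-isoform genes get a switch channel
            switch[gene] = switch_channel(tx_counts[:, cols])
    return abundance, switch


# Toy usage: gene "A" has three isoforms; genes "B" and "C" have one each.
rng = np.random.default_rng(1)
tx_counts = rng.poisson(50, size=(12, 5)).astype(float)
abundance, switch = build_channels(tx_counts, ["A", "A", "A", "B", "C"])
```

Working in CLR space separates isoform usage (relative proportions) from total expression, which is why a gene can move in one channel and not the other, as in the `switch_only` and `abundance_only` roles.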
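The graph-aware backend extends the latent model with graph-Laplacian smoothing. One common form of such smoothing penalizes differences between neighbouring nodes via the term `tr(W^T L W)`. The sketch below applies it to a gene x factor loading matrix; whether IsoGraph smooths loadings, latent scores, or another quantity, and how it weights the penalty, is an assumption here. `graph_laplacian` and `laplacian_smooth` are hypothetical helpers.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A for a symmetric, non-negative adjacency."""
    return np.diag(adjacency.sum(axis=1)) - adjacency


def laplacian_smooth(loadings, adjacency, lam=1.0):
    """Return argmin_W ||W - loadings||_F^2 + lam * tr(W^T L W).

    Rows of `loadings` are genes (graph nodes) and columns are latent factors.
    Setting the gradient to zero gives (I + lam * L) W = loadings, which has a
    unique solution because I + lam * L is positive definite for lam >= 0.
    """
    L = graph_laplacian(adjacency)
    return np.linalg.solve(np.eye(L.shape[0]) + lam * L, loadings)


# Toy usage: a 4-gene path graph with one factor loaded only on gene 0.
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]], dtype=float
)
raw = np.array([[1.0], [0.0], [0.0], [0.0]])
smoothed = laplacian_smooth(raw, adjacency, lam=0.5)
```

Because every row of `L` sums to zero, this smoother preserves each factor's total loading while spreading it toward graph neighbours; `lam = 0` returns the unsmoothed loadings.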