isograph.evaluation.selection

Alpha selection via stability selection for real data without ground truth.

Stability selection estimates how reproducibly each gene-gene edge appears when a model is refit on repeated random subsamples of the data. Edges that appear consistently across subsamples are considered stable.

The implementation follows a simple loop:

  • iterate over a user-supplied alpha_grid

  • refit the model on repeated subsamples for each alpha

  • count how often each edge appears

  • report the number of stable edges per alpha
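The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not IsoGraph's implementation: toy_edges is a hypothetical stand-in for refitting the model (it thresholds absolute Pearson correlation rather than partial correlation), and stability_sketch, pearson, and the toy rows are invented for this example. The keyword defaults mirror those documented for stability_selection.

```python
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def toy_edges(rows, alpha):
    """Hypothetical stand-in for a model refit: keep pairs with |r| >= alpha."""
    return {
        frozenset((i, j))
        for i, j in combinations(range(len(rows)), 2)
        if abs(pearson(rows[i], rows[j])) >= alpha
    }


def stability_sketch(rows, alpha_grid, n_iterations=50, subsample_fraction=0.8,
                     stability_threshold=0.6, seed=0):
    """Return (edge_stability, stable_edge_counts) for rows of shape genes x samples."""
    rng = random.Random(seed)
    n_samples = len(rows[0])
    k = max(2, round(subsample_fraction * n_samples))
    edge_stability, stable_edge_counts = {}, []
    for alpha in alpha_grid:                      # iterate over the grid
        hits = Counter()
        for _ in range(n_iterations):             # repeated subsamples per alpha
            cols = rng.sample(range(n_samples), k)  # drawn without replacement
            sub = [[row[c] for c in cols] for row in rows]
            hits.update(toy_edges(sub, alpha))    # count each edge once per round
        freq = {edge: count / n_iterations for edge, count in hits.items()}
        edge_stability[alpha] = freq
        stable_edge_counts.append(
            sum(f >= stability_threshold for f in freq.values())
        )
    return edge_stability, stable_edge_counts


# Toy data: gene 1 is an exact multiple of gene 0, gene 2 is unrelated filler.
rows = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    [5, 1, 4, 2, 6, 3, 9, 7, 10, 8],
]
edge_stability, stable_edge_counts = stability_sketch(rows, [0.5, 0.9])
```

Keying each edge by a frozenset makes the pair order-independent, which matches the frozenset keys in the documented edge_stability type.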

Lower alpha values yield denser networks; higher values yield sparser ones. As recommended_alpha, IsoGraph reports the coarsest alpha in the grid that still yields at least one stable edge.
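One possible reading of this selection rule is sketched below. It assumes that "coarsest" means the largest alpha in the grid, that is, the sparsest network; the source does not confirm this. The helper name pick_recommended_alpha is hypothetical, and returning None when no alpha yields a stable edge is a choice made for the sketch, since the documented behaviour in that case is not stated.

```python
from typing import Optional, Sequence


def pick_recommended_alpha(alpha_grid: Sequence[float],
                           stable_edge_counts: Sequence[int]) -> Optional[float]:
    """Return the largest alpha with at least one stable edge, else None.

    Assumption: "coarsest" is read as "largest alpha"; this is not confirmed
    by the IsoGraph reference.
    """
    candidates = [
        alpha
        for alpha, n_stable in zip(alpha_grid, stable_edge_counts)
        if n_stable >= 1
    ]
    return max(candidates) if candidates else None


# Toy counts: the sparsest setting (0.4) has lost every stable edge.
recommended = pick_recommended_alpha([0.1, 0.2, 0.3, 0.4], [12, 5, 1, 0])
```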

class isograph.evaluation.selection.StabilityResult(alpha_grid, stable_edge_counts, recommended_alpha, edge_stability)

Results from a stability selection run.

Parameters:
  • alpha_grid (list[float])

  • stable_edge_counts (list[int])

  • recommended_alpha (float)

  • edge_stability (dict[float, dict[frozenset, float]])

alpha_grid: list[float]
stable_edge_counts: list[int]
recommended_alpha: float
edge_stability: dict[float, dict[frozenset, float]]
summary_table()
Return type:

DataFrame
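The nested edge_stability mapping can be inspected with ordinary dictionary operations. The sketch below uses a stand-in dataclass with the four documented fields rather than the real class. It assumes that the outer keys are alpha values, that each frozenset holds the two gene identifiers of an edge, and that each float is the fraction of rounds in which that edge appeared. These readings are inferred from the type annotations, not confirmed by the source. The gene names and frequencies are toy values, and stable_edges is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class StabilityResultSketch:
    """Stand-in with the four documented fields; not the IsoGraph class."""
    alpha_grid: list
    stable_edge_counts: list
    recommended_alpha: float
    edge_stability: dict


def stable_edges(result, alpha, threshold=0.6):
    """List (gene_pair, frequency) at `alpha` with frequency >= threshold.

    Pairs are returned most stable first, ties broken by gene names.
    """
    freqs = result.edge_stability[alpha]
    kept = [(tuple(sorted(edge)), f) for edge, f in freqs.items() if f >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy values, invented for illustration only.
result = StabilityResultSketch(
    alpha_grid=[0.1, 0.3],
    stable_edge_counts=[2, 1],
    recommended_alpha=0.3,
    edge_stability={
        0.1: {
            frozenset({"GENE_A", "GENE_B"}): 0.94,
            frozenset({"GENE_A", "GENE_C"}): 0.62,
            frozenset({"GENE_B", "GENE_C"}): 0.20,
        },
        0.3: {frozenset({"GENE_A", "GENE_B"}): 0.80},
    },
)

edges_at_recommended = stable_edges(result, result.recommended_alpha)
```

Sorting each frozenset into a tuple gives a deterministic display order without changing the fact that edges are undirected.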

isograph.evaluation.selection.stability_selection(model, transcript_counts, transcript_table, sample_table, alpha_grid, gene_counts=None, gene_table=None, n_iterations=50, subsample_fraction=0.8, stability_threshold=0.6, seed=0)

Estimate edge stability across subsamples for each alpha in alpha_grid.

Parameters:
  • model (NetworkModel) – A fitted (or unfitted) NetworkModel instance. The model’s config will be temporarily patched with each alpha from the grid.

  • transcript_counts (ndarray) – Shape (n_transcripts, n_samples). Raw count matrix.

  • transcript_table (DataFrame) – Rows describe each transcript. Must contain gene_id and transcript_id columns.

  • sample_table (DataFrame) – One row per sample. Must be aligned with columns of transcript_counts.

  • alpha_grid (list[float]) – Sorted (ascending) list of partial-correlation threshold values to test.

  • n_iterations (int) – Number of subsampling rounds per alpha. More rounds reduce the sampling noise in each edge's selection frequency, at the cost of proportionally more model refits; 50 is sufficient for most datasets.

  • subsample_fraction (float) – Fraction of samples to draw per round (without replacement).

  • stability_threshold (float) – Minimum fraction of rounds in which a gene pair must appear as an edge to be counted as a stable edge.

  • seed (int) – Random seed for reproducibility.

  • gene_counts (ndarray | None)

  • gene_table (DataFrame | None)
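Because sample_table must stay aligned with the columns of transcript_counts, each subsampling round has to apply the same draw to both. The sketch below shows one way to do that with plain lists. The helper subsample_aligned is hypothetical, the rounding of the subsample size is an assumption, and IsoGraph's internal subsampling is not shown in this reference.

```python
import random


def subsample_aligned(counts, sample_rows, fraction, rng):
    """Draw one subsample without replacement and apply it to both inputs.

    counts:      list of rows, each of length n_samples (transcripts x samples)
    sample_rows: list of n_samples per-sample records, in column order
    fraction:    share of samples to keep in this round
    rng:         a random.Random instance, seeded by the caller
    """
    n_samples = len(sample_rows)
    if any(len(row) != n_samples for row in counts):
        raise ValueError("counts columns and sample_rows are not aligned")
    k = max(1, round(fraction * n_samples))  # rounding rule is an assumption
    keep = sorted(rng.sample(range(n_samples), k))  # distinct column indices
    sub_counts = [[row[i] for i in keep] for row in counts]
    sub_samples = [sample_rows[i] for i in keep]
    return sub_counts, sub_samples


# Toy inputs: two transcripts, five samples.
counts = [[10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
sample_rows = ["s0", "s1", "s2", "s3", "s4"]
sub_counts, sub_samples = subsample_aligned(counts, sample_rows, 0.8, random.Random(0))
```

Passing a seeded random.Random instance, rather than reseeding inside the helper, lets a single seed govern every round of a run.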

Returns:

A StabilityResult containing the stable-edge count for each alpha, the per-alpha edge stability values, and the recommended alpha.

Return type:

StabilityResult