Own-Data Data Model

IsoGraph does not ingest arbitrary raw count folders directly. For custom analyses, first package your data as an IsoGraph dataset bundle with aligned metadata tables, feature tables, and dense matrices.

Minimum Practical Bundle

For custom-data work, build a bundle containing at least:

  • a manifest.json

  • a sample table aligned to the matrix columns

  • a gene feature table

  • a transcript feature table with transcript_id and gene_id

  • a transcript_counts dense matrix

For parity with the bundled fixtures and export tooling, also include a gene_counts matrix whose rows align with the gene table. Add psi feature tables and matrices when you have splicing event data that you want to preserve in the bundle.
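If your counts start out in long format (one row per transcript and sample), the dense matrix can be produced with a pivot followed by a reindex against the feature and sample tables. The following is a minimal sketch using only pandas and NumPy. The long_counts table, its column names, and the choice to fill unobserved pairs with 0.0 are illustrative assumptions, not part of IsoGraph.

```python
import pandas as pd

# Hypothetical long-format input: one row per (transcript, sample) count.
long_counts = pd.DataFrame(
    {
        "transcript_id": ["T1", "T1", "T2", "T2", "T3"],
        "sample_id": ["S1", "S2", "S1", "S2", "S2"],
        "count": [70.0, 50.0, 50.0, 30.0, 55.0],
    }
)
transcript_table = pd.DataFrame(
    {"transcript_id": ["T1", "T2", "T3"], "gene_id": ["G1", "G1", "G2"]}
)
sample_table = pd.DataFrame({"sample_id": ["S1", "S2"]})

# Pivot to a features x samples layout.
wide = long_counts.pivot(index="transcript_id", columns="sample_id", values="count")

# Reindex so that row order follows the transcript table and column order
# follows the sample table. Filling unobserved pairs with 0.0 is an
# assumption made for this sketch.
wide = wide.reindex(
    index=transcript_table["transcript_id"],
    columns=sample_table["sample_id"],
).fillna(0.0)

transcript_counts = wide.to_numpy(dtype=float)
print(transcript_counts.shape)
```

Reindexing against the tables, rather than relying on the pivot's sorted order, keeps the matrix aligned even when the tables are not sorted by identifier.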

Required Alignment Rules

  • Rows of each feature table must align with the rows of its matching matrix.

  • Rows of the sample table must align with the columns of every matrix.

  • The transcript table must include transcript_id and gene_id.

  • A gene table should include gene_id.

  • Covariates or traits referenced by model configs, such as Age, Dx, RIN, PMI, mito_mapping_rate, and percent_assigned, must exist in the sample table to be used. Missing columns are skipped rather than inferred.
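The rules above can be checked before a bundle is saved. The sketch below uses only pandas and NumPy; check_alignment and usable_covariates are hypothetical helper names introduced for illustration and are not IsoGraph functions. Shape checks confirm only that the counts match, so row and column order still has to be guaranteed by how the matrix was built.

```python
import numpy as np
import pandas as pd


def check_alignment(sample_table, feature_table, matrix, required_columns=()):
    """Return a list of alignment problems (empty when none are found)."""
    if matrix.ndim != 2:
        return [f"matrix must be 2-D, got {matrix.ndim}-D"]
    problems = []
    n_features, n_samples = matrix.shape
    if n_features != len(feature_table):
        problems.append(
            f"matrix has {n_features} rows but feature table has {len(feature_table)}"
        )
    if n_samples != len(sample_table):
        problems.append(
            f"matrix has {n_samples} columns but sample table has {len(sample_table)} rows"
        )
    for column in required_columns:
        if column not in feature_table.columns:
            problems.append(f"feature table is missing column {column!r}")
    return problems


def usable_covariates(sample_table, requested):
    """Split requested covariate names into those present and those missing."""
    present = [name for name in requested if name in sample_table.columns]
    missing = [name for name in requested if name not in sample_table.columns]
    return present, missing


sample_table = pd.DataFrame(
    {"sample_id": ["S1", "S2", "S3"], "Age": [64.0, 59.0, 71.0], "Dx": ["Control", "SCZD", "Control"]}
)
transcript_table = pd.DataFrame(
    {"transcript_id": ["T1", "T2", "T3", "T4"], "gene_id": ["G1", "G1", "G2", "G2"]}
)
transcript_counts = np.zeros((4, 3))

problems = check_alignment(
    sample_table,
    transcript_table,
    transcript_counts,
    required_columns=("transcript_id", "gene_id"),
)
present, missing = usable_covariates(sample_table, ["Age", "Dx", "RIN"])
print(problems)
print(present, missing)
```

Listing the missing covariates up front makes it visible which columns would be skipped for a given model config.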

Building a Bundle

from pathlib import Path

import numpy as np
import pandas as pd

from isograph.io.artifacts import (
    DatasetBundle,
    build_feature_spec,
    build_matrix_spec,
    save_dataset_bundle,
)
from isograph.validation import DatasetManifest

sample_table = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3"],
        "Age": [64.0, 59.0, 71.0],
        "Dx": ["Control", "SCZD", "Control"],
    }
)

gene_table = pd.DataFrame({"gene_id": ["G1", "G2"]})
transcript_table = pd.DataFrame(
    {
        "transcript_id": ["T1", "T2", "T3", "T4"],
        "gene_id": ["G1", "G1", "G2", "G2"],
    }
)

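# Dense matrices are features x samples: rows follow gene_table (G1, G2),
# columns follow sample_table (S1, S2, S3).
gene_counts = np.array(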
    [
        [120.0, 80.0, 95.0],
        [60.0, 110.0, 90.0],
    ]
)
# Rows follow transcript_table (T1 to T4); columns follow sample_table (S1, S2, S3).
transcript_counts = np.array(
    [
        [70.0, 50.0, 60.0],
        [50.0, 30.0, 35.0],
        [20.0, 55.0, 40.0],
        [40.0, 55.0, 50.0],
    ]
)

# The manifest names the file used for each table and matrix in the bundle.
manifest = DatasetManifest(
    dataset_name="my_cohort_v1",
    suite_name="custom",
    description="Example custom cohort packaged for IsoGraph",
    sample_table="samples.parquet",
    feature_tables=[
        build_feature_spec("gene", "genes.parquet", gene_table),
        build_feature_spec("transcript", "transcripts.parquet", transcript_table),
    ],
    matrices=[
        build_matrix_spec("gene_counts", "gene_counts.npz", gene_counts),
        build_matrix_spec("transcript_counts", "transcript_counts.npz", transcript_counts),
    ],
    provenance={"source": "custom cohort"},
)

bundle = DatasetBundle(
    manifest=manifest,
    sample_table=sample_table,
    feature_tables={
        "gene": gene_table,
        "transcript": transcript_table,
    },
    matrices={
        "gene_counts": gene_counts,
        "transcript_counts": transcript_counts,
    },
    truth_tables={},
)

# Write the bundle to the directory that later fit commands will read from.
save_dataset_bundle(bundle, Path("benchmarks/datasets/custom/my_cohort_v1"))

Running the VAE Backend from the CLI

VAE is the default backend for isograph fit:

isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --output-dir artifacts/fits/my_cohort_v1

To use a different backend, pass --backend <name>:

isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --backend baseline \
  --output-dir artifacts/fits/my_cohort_v1_baseline

Running Backends from Python

from pathlib import Path

from isograph.io.artifacts import load_dataset_bundle
from isograph.models.latent import LatentNetworkModel
from isograph.workflow.config import LatentModelConfig

# Load the bundle saved in "Building a Bundle".
bundle = load_dataset_bundle(Path("benchmarks/datasets/custom/my_cohort_v1"))

model = LatentNetworkModel(LatentModelConfig(alpha=0.05, n_components=5))
artifacts = model.fit(
    transcript_counts=bundle.matrices["transcript_counts"],
    transcript_table=bundle.feature_tables["transcript"],
    sample_table=bundle.sample_table,
)

print(artifacts.module_table.head())
print(artifacts.edge_table.head())

The same pattern applies to GraphNetworkModel and, with PyTorch installed, VaeNetworkModel.