Own-Data Data Model
IsoGraph does not ingest arbitrary raw count folders directly. For custom analyses, first package your data as an IsoGraph dataset bundle with aligned metadata tables, feature tables, and dense matrices.
Minimum Practical Bundle
For custom-data work, build a bundle containing at least:
- a manifest.json
- a sample table aligned to the matrix columns
- a gene feature table
- a transcript feature table with transcript_id and gene_id
- a transcript_counts dense matrix
For parity with the bundled fixtures and export tooling, include gene_counts and a gene
table as well. Add psi feature tables and matrices when you have splicing event data
you want to preserve in the bundle.
Required Alignment Rules
- Rows of each feature table must align with the rows of its matching matrix.
- Rows of the sample table must align with the columns of every matrix.
- The transcript table must include transcript_id and gene_id.
- A gene table should include gene_id.
- Covariates or traits referenced by model configs, such as Age, Dx, RIN, PMI, mito_mapping_rate, and percent_assigned, must exist in the sample table if you want them used. Missing columns are skipped rather than inferred.
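The alignment rules above can be checked before a bundle is saved. The following is a minimal sketch using only pandas and NumPy. The helpers check_bundle_alignment and usable_covariates and the matrix_to_feature mapping are hypothetical names introduced for illustration; they are not part of the IsoGraph API, and IsoGraph's own validation may behave differently.

```python
import numpy as np
import pandas as pd

# Hypothetical helpers for illustration only; not part of the IsoGraph API.
REQUIRED_COLUMNS = {"transcript": ["transcript_id", "gene_id"], "gene": ["gene_id"]}


def check_bundle_alignment(sample_table, feature_tables, matrices, matrix_to_feature):
    """Return a list of alignment problems; an empty list means all checks passed."""
    problems = []
    n_samples = len(sample_table)
    for name, matrix in matrices.items():
        n_rows, n_cols = matrix.shape
        # Matrix columns must line up with sample-table rows.
        if n_cols != n_samples:
            problems.append(f"{name}: {n_cols} columns vs {n_samples} samples")
        # Matrix rows must line up with the rows of the matching feature table.
        feature_name = matrix_to_feature[name]
        n_features = len(feature_tables[feature_name])
        if n_rows != n_features:
            problems.append(f"{name}: {n_rows} rows vs {n_features} {feature_name} features")
    # Identifier columns named in the rules above must be present.
    for feature_name, columns in REQUIRED_COLUMNS.items():
        table = feature_tables.get(feature_name)
        if table is None:
            continue
        for column in columns:
            if column not in table.columns:
                problems.append(f"{feature_name} table: missing column {column}")
    return problems


def usable_covariates(sample_table, requested):
    """Keep only the requested covariates that exist as sample-table columns."""
    return [name for name in requested if name in sample_table.columns]


sample_table = pd.DataFrame({"sample_id": ["S1", "S2", "S3"], "Age": [64.0, 59.0, 71.0]})
feature_tables = {
    "transcript": pd.DataFrame({"transcript_id": ["T1", "T2"], "gene_id": ["G1", "G1"]}),
}
matrices = {"transcript_counts": np.zeros((2, 3))}
matrix_to_feature = {"transcript_counts": "transcript"}

problems = check_bundle_alignment(sample_table, feature_tables, matrices, matrix_to_feature)
covariates = usable_covariates(sample_table, ["Age", "RIN"])
print(problems)
print(covariates)
```

Returning a list of problems rather than raising on the first one makes it easier to fix several misalignments in a single pass. The covariate helper mirrors the documented behavior that missing columns are skipped, so RIN is dropped here because the toy sample table does not contain it.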
Building a Bundle
from pathlib import Path

import numpy as np
import pandas as pd

from isograph.io.artifacts import (
    DatasetBundle,
    build_feature_spec,
    build_matrix_spec,
    save_dataset_bundle,
)
from isograph.validation import DatasetManifest

sample_table = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3"],
        "Age": [64.0, 59.0, 71.0],
        "Dx": ["Control", "SCZD", "Control"],
    }
)

gene_table = pd.DataFrame({"gene_id": ["G1", "G2"]})

transcript_table = pd.DataFrame(
    {
        "transcript_id": ["T1", "T2", "T3", "T4"],
        "gene_id": ["G1", "G1", "G2", "G2"],
    }
)

gene_counts = np.array(
    [
        [120.0, 80.0, 95.0],
        [60.0, 110.0, 90.0],
    ]
)

transcript_counts = np.array(
    [
        [70.0, 50.0, 60.0],
        [50.0, 30.0, 35.0],
        [20.0, 55.0, 40.0],
        [40.0, 55.0, 50.0],
    ]
)

manifest = DatasetManifest(
    dataset_name="my_cohort_v1",
    suite_name="custom",
    description="Example custom cohort packaged for IsoGraph",
    sample_table="samples.parquet",
    feature_tables=[
        build_feature_spec("gene", "genes.parquet", gene_table),
        build_feature_spec("transcript", "transcripts.parquet", transcript_table),
    ],
    matrices=[
        build_matrix_spec("gene_counts", "gene_counts.npz", gene_counts),
        build_matrix_spec("transcript_counts", "transcript_counts.npz", transcript_counts),
    ],
    provenance={"source": "custom cohort"},
)

bundle = DatasetBundle(
    manifest=manifest,
    sample_table=sample_table,
    feature_tables={
        "gene": gene_table,
        "transcript": transcript_table,
    },
    matrices={
        "gene_counts": gene_counts,
        "transcript_counts": transcript_counts,
    },
    truth_tables={},
)

save_dataset_bundle(bundle, Path("benchmarks/datasets/custom/my_cohort_v1"))
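If only transcript-level counts are available, a gene_counts matrix can be derived by summing the transcript rows that share a gene_id. This is a sketch under the assumption that gene-level counts are plain sums of transcript counts, which is how the toy matrices in the example above relate to each other; your quantification pipeline may define gene counts differently. It uses NumPy only, makes no IsoGraph calls, and the variable names are illustrative.

```python
import numpy as np

# Row order of the gene table and the gene_id column of the transcript table.
gene_ids = ["G1", "G2"]
transcript_gene_ids = ["G1", "G1", "G2", "G2"]

transcript_counts = np.array(
    [
        [70.0, 50.0, 60.0],
        [50.0, 30.0, 35.0],
        [20.0, 55.0, 40.0],
        [40.0, 55.0, 50.0],
    ]
)

# Map each transcript row to the row index of its gene in the gene table.
gene_row = {gene_id: row for row, gene_id in enumerate(gene_ids)}
rows = np.array([gene_row[gene_id] for gene_id in transcript_gene_ids])

# Accumulate transcript rows into their gene rows; np.add.at handles
# repeated indices correctly, unlike plain fancy-index assignment.
gene_counts = np.zeros((len(gene_ids), transcript_counts.shape[1]))
np.add.at(gene_counts, rows, transcript_counts)

print(gene_counts)
```

Building the result in gene-table row order keeps the derived matrix aligned with the gene feature table, as the alignment rules require.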
Running the VAE Backend from the CLI
VAE is the default backend for isograph fit:
isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --output-dir artifacts/fits/my_cohort_v1
To use a different backend, pass --backend <name>:
isograph fit \
  --dataset-path benchmarks/datasets/custom/my_cohort_v1 \
  --backend baseline \
  --output-dir artifacts/fits/my_cohort_v1_baseline
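When the same dataset is fitted with several backends from a script, it can help to assemble the command line programmatically. The sketch below only builds argument lists from the three flags shown above (--dataset-path, --backend, --output-dir) and does not run anything; build_fit_command is a hypothetical helper, not something IsoGraph provides.

```python
import shlex


def build_fit_command(dataset_path, output_dir, backend=None):
    """Build an `isograph fit` argument list from the flags documented above."""
    command = ["isograph", "fit", "--dataset-path", str(dataset_path)]
    if backend is not None:
        # Omitting --backend leaves the CLI default (VAE) in effect.
        command += ["--backend", backend]
    command += ["--output-dir", str(output_dir)]
    return command


dataset = "benchmarks/datasets/custom/my_cohort_v1"
default_cmd = build_fit_command(dataset, "artifacts/fits/my_cohort_v1")
baseline_cmd = build_fit_command(
    dataset, "artifacts/fits/my_cohort_v1_baseline", backend="baseline"
)

# Print shell-quoted commands for inspection before running them.
print(shlex.join(default_cmd))
print(shlex.join(baseline_cmd))
```

Each list can be passed to subprocess.run(command, check=True) once the isograph executable is on the PATH. Writing each backend to its own output directory, as in the CLI examples, keeps the fit artifacts from overwriting one another.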
Running Backends from Python
from pathlib import Path

from isograph.io.artifacts import load_dataset_bundle
from isograph.models.latent import LatentNetworkModel
from isograph.workflow.config import LatentModelConfig

bundle = load_dataset_bundle(Path("benchmarks/datasets/custom/my_cohort_v1"))
model = LatentNetworkModel(LatentModelConfig(alpha=0.05, n_components=5))

artifacts = model.fit(
    transcript_counts=bundle.matrices["transcript_counts"],
    transcript_table=bundle.feature_tables["transcript"],
    sample_table=bundle.sample_table,
)

print(artifacts.module_table.head())
print(artifacts.edge_table.head())
The same pattern applies to GraphNetworkModel and, with PyTorch installed,
VaeNetworkModel.