PerturbNMF Pipeline

flowchart TD
    A["Input: counts.h5ad\n(cells x genes)"] --> B["Stage 1: Inference\n(sk-cNMF CPU or torch-cNMF GPU)"]
    B --> D["Output: cNMF_{K}_{thresh}.h5mu\n(MuData with scores + loadings)"]
    D --> E["Stage 2a: Metrics\n(9 metrics)"]
    E --> F["Output: Evaluation/{K}_{thresh}/\n(CSV results per metric)"]
    E --> G["Stage 2b: Perturbation Calibration\n(U-test, CRT, Matched DE)"]
    G --> F
    F --> I["Stage 3a: Plotting\n(K-selection, Program analysis, Perturbation analysis)"]
    I --> L["Output: PDFs + HTML report"]
    F --> S["Stage 3b: Excel Summarization"]
    S --> L
    F --> Q["Stage 3c: Annotation\n(LLM-driven gene program annotation)"]
    Q --> L
    M["Guide Annotation TSV"] --> E
    N["GWAS Data (OpenTargets)"] --> E
    O["Normalized Counts .h5ad"] --> E
    P["Reference GTF (optional)"] -.-> B

For detailed requirements, see: https://docs.google.com/document/d/1eusT8lUCeKl1lTkQ37qd8IoRy3P1798lSVOkpPbyGMU/edit?usp=sharing

Overview

End-to-end pipeline for running, evaluating, and visualizing consensus Non-negative Matrix Factorization (cNMF) on single-cell perturbation data.

Components

Stage 1: Inference

Run cNMF to decompose the cell × gene matrix into gene programs. Pick one:

sk-cNMF: CPU-based implementation using scikit-learn
torch-cNMF: GPU-accelerated implementation using PyTorch

See src/Stage1_Inference/README.md for detailed usage and recommended K selection steps.
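
The consensus idea behind this stage can be illustrated with a minimal NumPy sketch. It is not the sk-cNMF or torch-cNMF implementation: it swaps in a plain multiplicative-update NMF and a toy k-means, and it skips any outlier filtering a real cNMF run may apply. The function names (nmf_multiplicative, consensus_programs), their parameters, and the random demo matrix are all hypothetical.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-10):
    """Plain multiplicative-update NMF so that X is approximately W @ H."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    W = rng.random((n_cells, k)) + eps  # cell x program usage scores
    H = rng.random((k, n_genes)) + eps  # program x gene loadings
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def consensus_programs(X, k, n_runs=5, n_kmeans_iter=20, seed=0):
    """Run NMF with several seeds, cluster all components, take cluster medians."""
    spectra = []
    for r in range(n_runs):
        _, H = nmf_multiplicative(X, k, seed=seed + r)
        # Normalize each component to unit length so runs are comparable.
        spectra.append(H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12))
    S = np.vstack(spectra)  # (n_runs * k, n_genes)

    # Toy k-means, initialized from the components of the first run.
    centers = S[:k].copy()
    labels = np.zeros(len(S), dtype=int)
    for _ in range(n_kmeans_iter):
        dist = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = S[labels == j].mean(axis=0)

    # Consensus program = per-gene median of the components in each cluster.
    consensus = np.vstack([
        np.median(S[labels == j], axis=0) if np.any(labels == j) else centers[j]
        for j in range(k)
    ])
    return consensus, labels


# Hypothetical demo data: 60 cells x 20 genes of non-negative values.
X_demo = np.random.default_rng(0).random((60, 20))
programs, labels = consensus_programs(X_demo, k=3)
print(programs.shape)
```

Taking the median across runs makes the consensus loadings less sensitive to any single random initialization, which is the motivation for repeating the factorization.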

Stage 2: Evaluation

Evaluate the quality of inferred gene programs using comprehensive metrics, with perturbation calibration as part of the evaluation process.

Evaluation metrics:

Categorical association analysis
Perturbation sensitivity testing (default U-test)
Motif enrichment
Trait enrichment analysis (GWAS/OpenTargets)
GO gene set enrichment analysis
Gene set enrichment analysis
Explained variance calculation
Reconstruction error
Stability metrics

See src/Stage2_Evaluation/A_Metrics/README.md for detailed parameters and output format.
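
As a rough illustration of two of the listed metrics, the sketch below computes a reconstruction error and an explained-variance ratio for a factorization X ≈ W @ H. These are common textbook definitions and are an assumption here, not necessarily the formulas this pipeline uses. The function names and the demo matrices are hypothetical.

```python
import numpy as np


def reconstruction_error(X, W, H):
    """Frobenius norm of the residual X - W @ H."""
    return float(np.linalg.norm(X - W @ H))


def explained_variance(X, W, H):
    """1 - (residual sum of squares / total sum of squares around gene means)."""
    resid = X - W @ H
    centered = X - X.mean(axis=0, keepdims=True)
    return 1.0 - float((resid ** 2).sum() / (centered ** 2).sum())


# Hypothetical demo: an exactly factorizable matrix and a noisy copy of it.
rng = np.random.default_rng(0)
W_demo = rng.random((30, 3))   # cells x programs
H_demo = rng.random((3, 10))   # programs x genes
X_exact = W_demo @ H_demo
X_noisy = X_exact + 0.1 * rng.random(X_exact.shape)

err_exact = reconstruction_error(X_exact, W_demo, H_demo)
ev_exact = explained_variance(X_exact, W_demo, H_demo)
err_noisy = reconstruction_error(X_noisy, W_demo, H_demo)
ev_noisy = explained_variance(X_noisy, W_demo, H_demo)
```

With the exact matrix the error is numerically zero and the explained variance is one. Adding noise raises the error and lowers the explained variance.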

Perturbation calibration (pick one method):

U-test: Fast, non-parametric — good for initial exploratory analysis
CRT: Permutation-based, covariate-adjusted — more statistically rigorous
Matched Cell DE: Permutation-based, covariate-adjusted — more statistically rigorous

Calibration checks that the p-values are well calibrated by generating a null distribution from non-targeting guides:

Generate null ("fake") p-values by randomly relabeling a subset of non-targeting guides as targeting, then running the perturbation test
The fake p-values vs uniform distribution QQ-plot should align on the diagonal
The real p-values vs uniform distribution QQ-plot should show enrichment of small p-values (smaller than expected under the null)
If calibrated → proceed to downstream analysis. If not → change the p-value calculation method or use different covariates.

See src/Stage2_Evaluation/B_Calibration/README.md for detailed method descriptions and guidance on choosing a test.
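
The calibration steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the pipeline's code. The U-test is a hand-written normal approximation without tie correction, and the function names, the guide labels (NT_0 ... NT_19), and the simulated program scores are all hypothetical.

```python
import math

import numpy as np


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = np.empty(n1 + n2)
    ranks[np.argsort(combined)] = np.arange(1, n1 + n2 + 1)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def null_p_values(scores, guides, non_targeting, n_draws=200, n_fake=2, seed=0):
    """Relabel random non-targeting guides as 'targeting' and test them."""
    rng = np.random.default_rng(seed)
    nt = np.asarray(non_targeting)
    pvals = []
    for _ in range(n_draws):
        fake = rng.choice(nt, size=n_fake, replace=False)
        is_fake = np.isin(guides, fake)
        is_ctrl = np.isin(guides, nt) & ~is_fake
        pvals.append(mann_whitney_p(scores[is_fake], scores[is_ctrl]))
    return np.array(pvals)


def qq_points(pvals):
    """Expected vs observed -log10(p) pairs for a QQ plot against the uniform."""
    n = len(pvals)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(np.clip(np.sort(pvals), 1e-300, 1.0))
    return expected, observed


# Hypothetical demo: 2000 cells, 20 non-targeting guides, no true effect.
rng = np.random.default_rng(1)
guide_names = np.array([f"NT_{i}" for i in range(20)])
guides = rng.choice(guide_names, size=2000)
scores = rng.normal(size=2000)  # stand-in for one program's per-cell score

null_p = null_p_values(scores, guides, guide_names)
expected, observed = qq_points(null_p)
```

Plotting observed against expected gives the QQ-plot. For a calibrated test the null points should lie close to the diagonal, while the same plot for real targeting guides should rise above it at the small p-value end.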

Stage 3: Interpretation

K-selection plots for optimal K selection
Program plots for per-program quality control
Perturbation plots for visualizing perturbation effects
Excel summarization of results
Annotation: LLM-driven gene program annotation (PubTator3 literature mining, verification of LLM-generated content)

See src/Stage3_Interpretation/README.md for detailed parameters and output format.
