Enhanced multimodal fMRI brain encoding toolkit built on Meta's TRIBE v2
CortexLab extends TRIBE v2 with a reviewer-grade modality-lesion pipeline (causal ablation + permutation tests + BH-FDR), GPU voxelwise ridge regression (torch + Triton), HCP-MMP parcellation, noise-ceiling normalisation, three cortical-surface rendering engines (matplotlib / plotly+WebGL / pyvista+VTK), brain-alignment benchmarking with statistical testing, temporal dynamics analysis, ROI connectivity mapping, cognitive load scoring, and streaming inference. The project runs 283 tests in CI on every push, is published on HuggingFace and PyPI, and ships with an interactive dashboard.
| Feature | What it does |
|---|---|
| Modality-Lesion Pipeline | End-to-end causal ablation per modality on BOLD Moments: GPU voxelwise ridge encoder, row-permutation tests for per-voxel p-values, Benjamini-Hochberg FDR, HCP-MMP parcellation, BOLD Moments noise-ceiling normalisation. CLI orchestrator handles the full pipeline. |
| Cortical Surface Plots | Pluggable renderer with three engines: matplotlib (CPU, always available), plotly + WebGL (GPU), pyvista + VTK (TRIBE-quality smooth-shaded). Static 4-panel figures, rotating-brain GIF/MP4 animations, q-masked variants for FDR-corrected results. |
| GPU Voxelwise Ridge | torch and Triton backends for ridge regression at cortex scale (~327k voxels). 5-100x faster than scikit-learn on a single GPU. |
| Brain-Alignment Benchmark | Score any AI model (CLIP, DINOv2, V-JEPA2, SigLIP2, PaLiGemma2) on how "brain-like" its representations are, with permutation tests, bootstrap CIs, and FDR correction. |
| Foundation-Model Feature Extractors | Vision (CLIP, DINOv2, SigLIP2, V-JEPA2, PaLiGemma2) and text (CLIP-text, SigLIP2-text). Cache-friendly NPZ output that drops directly into the lesion pipeline. |
| Cognitive Load Scorer | Predict visual complexity, auditory demand, language processing, and executive load from brain activation patterns. |
| Temporal Dynamics | Analyze peak response latency per brain region, lag correlations, and sustained vs transient response decomposition. |
| ROI Connectivity | Compute functional connectivity matrices, cluster brain networks, and derive graph metrics (degree, betweenness, modularity). |
| Streaming Inference | Real-time sliding-window predictions from live feature streams for BCI pipelines. |
| Modality Attribution | Per-vertex importance scores revealing which modality (text/audio/video) drives each brain region. |
| Cross-Subject Adaptation | Adapt the model to new subjects with minimal calibration data via ridge regression. |
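
The statistical core of the lesion pipeline (ridge encoder, row-permutation p-values, Benjamini-Hochberg FDR) can be illustrated on synthetic data. This is a minimal NumPy sketch of the general technique, not CortexLab's GPU implementation: the function names `ridge_fit`, `voxelwise_r`, `permutation_pvalues`, and `bh_fdr` are hypothetical, and the data shapes and `alpha` value are assumptions chosen for the toy example.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights for all voxels at once: (X'X + alpha*I)^-1 X'Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_r(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.maximum(denom, 1e-12)

def permutation_pvalues(Y_true, Y_pred, n_permutations=200, seed=0):
    """Shuffle the rows (time points) of Y_true to build a per-voxel null."""
    rng = np.random.default_rng(seed)
    observed = voxelwise_r(Y_true, Y_pred)
    exceed = np.zeros_like(observed)
    for _ in range(n_permutations):
        perm = rng.permutation(Y_true.shape[0])
        exceed += voxelwise_r(Y_true[perm], Y_pred) >= observed
    # The +1 correction keeps p-values strictly positive.
    return observed, (exceed + 1) / (n_permutations + 1)

def bh_fdr(p, q=0.05):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= q * k / m."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject

# Toy data (assumed shapes): 20 voxels driven by the features, 30 pure noise.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels, n_signal = 200, 100, 10, 50, 20
X = rng.standard_normal((n_train + n_test, n_features))
W_true = np.zeros((n_features, n_voxels))
W_true[:, :n_signal] = rng.standard_normal((n_features, n_signal))
Y = X @ W_true + rng.standard_normal((n_train + n_test, n_voxels))

W_hat = ridge_fit(X[:n_train], Y[:n_train], alpha=10.0)
r, p = permutation_pvalues(Y[n_train:], X[n_train:] @ W_hat)
significant = bh_fdr(p, q=0.05)
print(f"{significant[:n_signal].sum()} of {n_signal} signal voxels survive FDR")
```

At cortex scale the same closed form applies, but the solve and the permutation loop are the parts a GPU backend would batch.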
Brain alignment comparison across 4 AI models (synthetic benchmark):
=== Brain Alignment Comparison Results ===
clip-vit-b32:
  rsa: +0.0407 (p=0.104, CI=[0.0109, 0.2032])
  cka: +0.8561 (p=0.174, CI=[0.9025, 0.9367])
dinov2-vit-s:
  rsa: -0.0052 (p=0.542, CI=[-0.0421, 0.1636])
  cka: +0.8434 (p=0.403, CI=[0.8948, 0.9315])
vjepa2-vit-g:
  rsa: +0.0121 (p=0.333, CI=[-0.0099, 0.1662])
  cka: +0.8731 (p=0.438, CI=[0.9151, 0.9442])
llama-3.2-3b:
  rsa: -0.0075 (p=0.642, CI=[-0.0257, 0.1445])
  cka: +0.8848 (p=0.731, CI=[0.9217, 0.9493])
Run it yourself: python -m experiments.brain_alignment_comparison --config experiments/config/brain_alignment_comparison.yaml
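
For readers unfamiliar with the two scores reported above, the following is a minimal NumPy sketch of RSA, linear CKA, and a stimulus-permutation test. It illustrates the general techniques only and is not the code behind `BrainAlignmentBenchmark`: the function names, the correlation-distance RDM, the rank-based (Spearman-style) comparison, and the synthetic data are all assumptions.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson r between stimuli (rows)."""
    return 1.0 - np.corrcoef(features)

def upper_triangle(matrix):
    """Off-diagonal upper triangle as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def rank(values):
    """Ordinal ranks (ties broken by position, adequate for continuous data)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks

def rsa_score(model_features, brain_responses):
    """Rank correlation between the model RDM and the brain RDM."""
    a = rank(upper_triangle(rdm(model_features)))
    b = rank(upper_triangle(rdm(brain_responses)))
    return float(np.corrcoef(a, b)[0, 1])

def linear_cka(X, Y):
    """Linear centered kernel alignment between two (n_stimuli, n_dims) matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

def permutation_test(model_features, brain_responses, score_fn, n_permutations=200, seed=0):
    """Shuffle the stimulus order of the model features to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = score_fn(model_features, brain_responses)
    null = np.array([
        score_fn(model_features[rng.permutation(len(model_features))], brain_responses)
        for _ in range(n_permutations)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, float(p_value)

# Synthetic example: model and "brain" share an 8-dimensional latent structure.
rng = np.random.default_rng(0)
latent = rng.standard_normal((60, 8))
model_features = latent @ rng.standard_normal((8, 30))
brain_responses = latent @ rng.standard_normal((8, 40)) + 0.5 * rng.standard_normal((60, 40))

for name, fn in [("rsa", rsa_score), ("cka", linear_cka)]:
    observed, p_value = permutation_test(model_features, brain_responses, fn)
    print(f"{name}: {observed:+.4f} (p={p_value:.3f})")
```

Linear CKA is bounded in [0, 1] and tends to be high even for unrelated representations when dimensionality is large relative to the number of stimuli, so the permutation p-value is usually more informative than the raw score.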
The pretrained TRIBE v2 model uses LLaMA 3.2-3B as its text encoder. You must accept Meta's LLaMA license before using it:
- Visit llama.meta.com and accept the license
- Request access on HuggingFace
- Authenticate:
huggingface-cli login
# From PyPI
pip install cortexlab-toolkit
# From source (with analysis extras)
git clone https://github.com/siddhant-rajhans/cortexlab.git
cd cortexlab
pip install -e ".[analysis]"

Optional extras:
- [analysis]: scipy for statistical helpers
- [viz]: plotly + kaleido for the WebGL renderer (lighter-weight than [plotting])
- [plotting]: full brain-viz stack (nilearn, matplotlib, pyvista, scikit-image, mne)
- [training]: PyTorch Lightning, W&B, torchmetrics
- [streaming]: av for live video capture
- [dev]: pytest, ruff, coverage
from cortexlab.inference.predictor import TribeModel
model = TribeModel.from_pretrained("facebook/tribev2", device="auto")
events = model.get_events_dataframe(video_path="clip.mp4")
preds, segments = model.predict(events)

from cortexlab.analysis import BrainAlignmentBenchmark
bench = BrainAlignmentBenchmark(brain_predictions, roi_indices=roi_indices)
result = bench.score_model(clip_features, method="rsa")
# Statistical testing
observed, p_value = bench.permutation_test(clip_features, method="rsa", n_permutations=1000)
score, ci_lower, ci_upper = bench.bootstrap_ci(clip_features, method="rsa")

from cortexlab.analysis import CognitiveLoadScorer
scorer = CognitiveLoadScorer(roi_indices)
result = scorer.score_predictions(predictions)
# result.visual_complexity, result.auditory_demand, result.language_processing, result.executive_load

from cortexlab.analysis import TemporalDynamicsAnalyzer
analyzer = TemporalDynamicsAnalyzer(roi_indices, tr_seconds=1.0)
result = analyzer.analyze(predictions, model_features)
# result.peak_latencies, result.temporal_correlations, result.sustained_components

from cortexlab.analysis import ROIConnectivityAnalyzer
conn = ROIConnectivityAnalyzer(roi_indices)
result = conn.analyze(predictions, n_clusters=4, threshold=0.3)
# result.correlation_matrix, result.clusters, result.graph_metrics

from cortexlab.inference import StreamingPredictor
sp = StreamingPredictor(model._model, window_trs=40, step_trs=1, device="cuda")
for features in live_feature_stream():
    pred = sp.push_frame(features)
    if pred is not None:
        visualize(pred)  # (n_vertices,)

# 1. Build feature cache once (vision + text features for all 1102 stimuli).
python -m experiments.build_feature_cache \
--cache-dir $CORTEXLAB_RESULTS/features \
--vision-preset dinov2-vit-l \
--text-preset clip-text-vit-l-14
# 2. Run the lesion pipeline with parcellation + ceiling + permutation + FDR.
python -m experiments.causal_modality_ablation \
--subjects 1 2 3 4 5 6 7 8 9 10 \
--data-root $CORTEXLAB_DATA/bold_moments \
--feature-cache $CORTEXLAB_RESULTS/features \
--modalities vision,text \
--parcellation hcp-mmp --lh-annot $ATLAS/lh.HCPMMP1.annot --rh-annot $ATLAS/rh.HCPMMP1.annot \
--noise-ceiling bold-moments \
--permutations 1000 --fdr \
--device cuda --backend torch \
--output $CORTEXLAB_RESULTS/lesion/$(date +%Y%m%d_%H%M%S)
# 3. Render TRIBE-quality cortical surface PNGs (and a rotating GIF).
python scripts/plot_cortical_maps.py --results-dir $CORTEXLAB_RESULTS/lesion/<run> --engine pyvista
python scripts/animate_cortical_maps.py --results-dir $CORTEXLAB_RESULTS/lesion/<run> --format gif

src/cortexlab/
  core/        Model (return_attn, gradient checkpointing, fp16, ONNX export)
  data/        Dataset loading, HCP ROI utilities, parcellations, 4 fMRI studies
  features/    Foundation-model extractors (CLIP, DINOv2, SigLIP2, V-JEPA2,
               PaLiGemma2 vision; CLIP-text, SigLIP2-text)
  gpu/         Voxelwise ridge encoder (torch + Triton backends)
  training/    PyTorch Lightning pipeline (FSDP, W&B)
  inference/   Predictor, streaming, modality attribution
  analysis/    Brain alignment (RSA/CKA/Procrustes + stats), causal lesion,
               noise ceiling, permutation tests, BH-FDR, cognitive load,
               temporal dynamics, ROI connectivity
  viz/         Cortical surface renderer (matplotlib / plotly+WebGL /
               pyvista+VTK), brain-region visualization
experiments/   Lesion orchestrator, feature cache builder, alignment comparison
scripts/       Cortical surface plotting + animation, post-processing
A futuristic Streamlit dashboard with glassmorphism UI, 3D brain visualization, and live inference:
- 3D Brain Viewer: rotatable fsaverage brain with activation overlays, publication-quality 4-panel views
- Brain Alignment: scores with error bars, null distributions, RDM visualization, FDR correction
- Cognitive Load: timeline with confidence bands, dimension correlation, comparison mode
- Temporal Dynamics: raw timecourses, processing hierarchy, cross-ROI lag matrix
- Connectivity: partial correlation, dendrogram, modularity, network graph
- Live Inference: real-time brain prediction from webcam, screen capture, or video file
Try the Live Demo | Dashboard Repository
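
The sliding-window behaviour behind live inference (and the `StreamingPredictor` example earlier in this README, where `push_frame` returns `None` until a prediction is ready) can be sketched as below. This is an illustrative stand-in, not the CortexLab class: `SlidingWindowPredictor` and its `predict_fn` argument are hypothetical, and the real predictor's buffering and stepping rules may differ.

```python
from collections import deque

import numpy as np

class SlidingWindowPredictor:
    """Buffers the latest `window_trs` feature frames and emits a prediction
    every `step_trs` frames once the buffer is full (hypothetical sketch)."""

    def __init__(self, predict_fn, window_trs=40, step_trs=1):
        self.predict_fn = predict_fn          # maps (window_trs, n_features) -> prediction
        self.window_trs = window_trs
        self.step_trs = step_trs
        self.buffer = deque(maxlen=window_trs)  # oldest frame is dropped automatically
        self.frames_since_prediction = 0

    def push_frame(self, features):
        self.buffer.append(np.asarray(features))
        self.frames_since_prediction += 1
        if len(self.buffer) < self.window_trs:
            return None  # still warming up
        if self.frames_since_prediction < self.step_trs:
            return None  # not yet time for the next prediction
        self.frames_since_prediction = 0
        window = np.stack(list(self.buffer))  # (window_trs, n_features)
        return self.predict_fn(window)

# Toy usage: the "model" is just the window mean; real use would call the encoder.
predictor = SlidingWindowPredictor(lambda window: window.mean(axis=0), window_trs=4, step_trs=2)
outputs = [predictor.push_frame(np.full(3, float(t))) for t in range(6)]
for t, out in enumerate(outputs):
    print(t, "warming up / skipped" if out is None else out)
```

A bounded `deque` keeps memory constant regardless of stream length, which is the property that matters for long-running BCI or webcam sessions.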
pip install -e ".[dev,plotting]"
pytest tests/ -v # 280 tests, 3 CUDA-gated
ruff check src/ tests/ # lint

See CHANGELOG.md for the release history.
See CONTRIBUTING.md. Check issues labeled "good first issue" to get started.
CC BY-NC 4.0 (inherited from TRIBE v2). See LICENSE and NOTICE.
This project is for non-commercial use only. The pretrained weights are hosted by Meta at facebook/tribev2 and are not redistributed by this project.
Built on TRIBE v2 by Meta FAIR.
d'Ascoli et al., "A foundation model of vision, audition, and language for in-silico neuroscience", 2026.
See NOTICE for full attribution and third-party licenses.
