simonsobs · liuyiqiandrew · Apr 14, 2026 · Apr 14, 2026 · Apr 14, 2026 · Apr 14, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,12 @@
-bbpower/__pycache__/
-test/test_out/
+__pycache__/
+*.py[cod]
+*.egg-info/
+.pytest_cache/
+
+test/test_out*/
+
+.conda-py313/
+.claude/
+
+.nfs*
 *~
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,83 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+BBPower is a power-spectrum-based component separation pipeline for constraining primordial B-modes from multi-frequency CMB polarization data. It is built on the [BBPipe](https://github.com/simonsobs/BBPipe) framework, which defines pipeline stages with typed inputs/outputs and connects them via YAML configuration.
+
+## Build and Install
+
+```bash
+pip install -e .                  # core dependencies only
+pip install -e ".[all]"           # includes healpy, pymaster, fgbuster, getdist
+```
+
+The package uses `pyproject.toml` (setuptools backend). There is no Makefile or CI configuration.
+
+## Running Pipeline Stages
+
+Each stage is invoked via `python -m bbpower <StageName>` with explicit `--input=path` and `--config=path` arguments. There is no single `bbpipe` orchestration command used in tests; instead, stages are run individually in shell scripts.
+
+```bash
+# Example: run the sampling test end-to-end
+bash test/run_sampling_test.sh
+
+# Example: run a single stage
+python -m bbpower BBCompSep \
+  --cells_coadded=./test/test_out/cls_coadd.fits \
+  --cells_noise=./test/test_out/cls_noise.fits \
+  --cells_fiducial=./test/test_out/cls_fid.fits \
+  --cells_coadded_cov=./test/test_out/cls_coadd.fits \
+  --output_dir=./test/test_out \
+  --config_copy=./test/test_out/config_copy.yml \
+  --config=./test/test_config_sampling.yml
+```
+
+## Tests
+
+Tests are shell scripts in `test/` (no pytest). Each script generates synthetic data, runs one or more stages, checks for expected output files, then cleans up `test/test_out/`.
+
+| Script | What it tests |
+|---|---|
+| `run_sampling_test.sh` | BBCompSep (maximum_likelihood) + BBPlotter |
+| `run_power_specter_test.sh` | Full pipeline: BBPowerSpecter -> BBPowerSummarizer -> BBCompSep -> BBPlotter |
+| `run_compsep_test.sh` | BBCompSep single-point chi2 check |
+| `run_predicted_spectra_test.sh` | BBCompSep predicted spectra output |
+| `run_polychord_test.sh` | Full pipeline with PolyChord sampler |
+
+Most tests require generating 100 simulated maps first (`run_power_specter_test.sh`, `run_polychord_test.sh`), which is slow. The fastest test to validate basic functionality is `run_sampling_test.sh`.
+
+## Architecture
+
+### Pipeline Stage System
+
+All stages inherit from `bbpipe.PipelineStage` and declare typed `inputs`, `outputs`, and `config_options`. The stage registry lives in `bbpower/_stages.py` which maps stage names to modules. Stages are lazy-loaded via `__init__.py`'s `__getattr__`. File types are defined in `bbpower/types.py` (FitsFile, TextFile, NpzFile, DirFile, etc.).
+
+The four pipeline stages, in execution order:
+
+1. **BBPowerSpecter** (`power_specter.py`) - Computes all cross-frequency/split/polarization power spectra from maps using NaMaster (`pymaster`). Produces SACC-format output with bandpower windows.
+2. **BBPowerSummarizer** (`power_summarizer.py`) - Coadds split spectra, computes noise spectra (total minus cross-only), builds null tests, and estimates covariance matrices from simulations.
+3. **BBCompSep** (`compsep.py`) - Foreground-cleaning likelihood analysis. Supports multiple samplers (emcee, PolyChord, Fisher, maximum_likelihood, single_point). Uses `ParameterManager` for prior handling and `FGModel` for foreground SEDs/spectra. Likelihood evaluation is in `likelihood.py` (`Likelihood` class) and sampler backends are in `samplers.py` (dispatched via `SAMPLERS` dict).
+4. **BBPlotter** (`plotter.py`) - Generates an HTML page with diagnostic plots (bandpasses, coadded spectra, nulls, likelihood contours via getdist).
+
+### Key Internal Modules
+
+- **`likelihood.py`** (`Likelihood`) - Wraps the model function and data to compute chi-squared or Hamimeche & Lewis likelihood values. Used by all samplers.
+- **`samplers.py`** - Standalone sampler backend functions (`run_emcee`, `run_polychord`, `run_minimizer`, `run_fisher`, `run_singlepoint`, `run_timing`, `run_predicted_spectra`). Registered in `SAMPLERS` dict and dispatched from `BBCompSep.run()`.
+- **`param_manager.py`** (`ParameterManager`) - Parses YAML config to separate fixed vs. free parameters, builds prior functions (tophat/Gaussian), and maps flat parameter vectors back to named dictionaries.
+- **`fg_model.py`** (`FGModel`) - Loads foreground SED models from `fgbuster` and power spectrum models from `fgcls.py`. Handles cross-component correlations, decorrelation, and moment expansion.
+- **`fgcls.py`** - Symbolic power spectrum models using `sympy`. `ClAnalytic` parses string expressions into lambdified numpy functions. `ClPowerLaw` is the standard foreground Cl template.
+- **`bandpasses.py`** (`Bandpass`) - Bandpass convolution with SED models, including systematics (frequency shift, gain, polarization angle rotation, frequency-dependent birefringence).
+
+### Data Flow
+
+All inter-stage data uses [SACC](https://github.com/LSSTDESC/sacc) format (`.fits` files) for power spectra, covariances, tracers (bandpasses + beams), and bandpower windows. Configuration is passed via YAML files with a `global` section and per-stage sections (e.g., `BBCompSep:`).
+
+### Configuration Structure
+
+Config YAML files have two layers:
+- **Pipeline file** (e.g., `test/test_sampling.yml`): declares stages, inputs, and output directories for BBPipe orchestration.
+- **Stage config file** (e.g., `test/test_config_sampling.yml`): contains `global` parameters (nside, compute_dell) and per-stage blocks defining the CMB model, foreground model (components with SEDs, Cl templates, priors), and sampler settings.
+
+Parameter definitions in config follow the pattern: `param_name: ['internal_name', 'prior_type', [prior_args]]` where prior_type is `'fixed'`, `'tophat'`, or `'gaussian'`.
diff --git a/README.md b/README.md
@@ -1,21 +1,232 @@
-BBPower - the C_ell-based pipeline for BB
------------------------------------------
+# BBPower
 
-This repo hosts a pipeline that carries out a maps-to-params analysis of multi-frequency polarization data to constrain primordial B-modes using a power-spectrum-based component separation scheme. The pipeline is built following the BBPipe framework. [BBPipe](https://github.com/simonsobs/BBPipe) is a pipeline constructor used to connect different pipeline stages in terms of their outputs and required inputs, and it's one of BBPower's dependencies.
+Power-spectrum-based component separation pipeline for constraining primordial B-modes from multi-frequency CMB polarization data.
 
-### Dependencies
-You should install the following non-standard python packages in order to use BBPower:
-- [BBPipe](https://github.com/simonsobs/BBPipe)
-- [sacc](https://pypi.org/project/sacc/)
-- [fgbuster](https://github.com/fgbuster)
-- [emcee](https://pypi.org/project/emcee/)
+BBPower performs a maps-to-parameters analysis: it computes cross-frequency bandpower spectra from HEALPix maps, coadds splits, estimates covariances from simulations, fits a parametric foreground + CMB model, and produces diagnostic plots. The pipeline is built on the [BBPipe](https://github.com/simonsobs/BBPipe) framework.
 
-### Using the code
-First of all, have a look at the BBPipe documentation to get a broad idea of how the pipeline structure works. Then, have a look at one of the test suites in the `test` directory. For instance, a quick pipeline that takes calculated power spectra, runs an MCMC on them and creates a bunch of plots, is contained in the following files:
-- [test/test_sampling.yml](test/test_sampling.yml) describes the raw inputs of the pipeline (in this case, multi-frequency power spectra) and its stages (in this case, two stages, a likelihood sampling stage and a plotting stage).
-- [test/test_config_sampling.yml](test/test_config_sampling.yml) describes the configuration options for the different stages. This includes, for instance, the cosmological and foreground model, parameter priors, sampler options etc. All the possible options are thoroughly described there.
-- [test/run_sampling_test.sh](test/run_sampling_test.sh) contains the commands that would be needed to run this pipeline. The commands of the form `python -m bbpower ...` are output by `BBPipe` if you just run `bbpipe ./test/test_sampling.yml --dry-run`.
+## Installation
 
-### Credits and questions
-Get in touch with Max Abitbol (mabitbol), David Alonso (damonge) or anyone else in the SO BB AWG if you have questions or queries about the code.
+Create and activate a conda environment first, then install the extra dependencies that match the stages you want to run.
 
+```bash
+conda create -n bbpower -c conda-forge python=3.13 pip setuptools wheel
+conda activate bbpower
+python -m pip install --upgrade pip
+```
+
+### Choose the right install
+
+| Workflow | Stages covered | Install command |
+|---|---|---|
+| Shared code + lightweight utilities | Package import, config parsing, `BBPowerSummarizer` | `pip install -e .` |
+| Spectra-to-parameters | `BBCompSep` | `pip install -e ".[compsep]"` |
+| Spectra-to-parameters + plots | `BBCompSep`, `BBPlotter` | `pip install -e ".[compsep,plotting]"` |
+| Moment models or Fisher runs | `BBCompSep` with `use_moments: true` or `sampler: fisher` | `pip install -e ".[compsep,plotting,sampling]"` |
+| Full maps-to-parameters pipeline | All four stages | `pip install -e ".[all]"` |
+
+Notes:
+- `BBCompSep` always needs `fgbuster`, so `pip install -e .` by itself is **not** enough for component separation.
+- Moment-expanded foreground models need `pyshtools` because `BBCompSep` uses Wigner 3-j symbols for the moment terms.
+- `BBPlotter` can generate spectra plots without MCMC contours, but triangle plots require `getdist`.
+- `BBPowerSpecter` and the full maps-to-parameters workflow need `healpy` and NaMaster (`pymaster`). Prefer conda-forge binaries for these heavy dependencies: `conda install -c conda-forge healpy namaster`.
+- `sampler: polychord` requires a separate PolyChord installation; it is not installed by the package extras.
+
+```bash
+# Quick checks
+python -m bbpower --help
+python -c "import bbpower; print(bbpower.__file__)"
+```
+
+Requires Python >= 3.10. See [pyproject.toml](pyproject.toml) for the full dependency list.
+See [docs/setup.md](docs/setup.md) for a setup checklist, stage-by-stage dependency guide, and troubleshooting notes.
+
+## Quick Start
+
+The fastest way to run BBPower is on pre-computed power spectra (skipping the map-level stages):
+
+```bash
+# 1. Generate synthetic Simons Observatory bandpowers
+mkdir -p output
+python examples/generate_SO_spectra.py output
+
+# 2. Run component separation (maximum-likelihood fit)
+python -m bbpower BBCompSep \
+  --cells_coadded=output/cls_coadd.fits \
+  --cells_noise=output/cls_noise.fits \
+  --cells_fiducial=output/cls_fid.fits \
+  --cells_coadded_cov=output/cls_coadd.fits \
+  --output_dir=output \
+  --config_copy=output/config_copy.yml \
+  --config=test/test_config_sampling_legacy.yml
+
+# 3. Generate diagnostic plots
+python -m bbpower BBPlotter \
+  --cells_coadded_total=output/cls_coadd.fits \
+  --cells_coadded=output/cls_coadd.fits \
+  --cells_noise=output/cls_noise.fits \
+  --cells_null=output/cls_coadd.fits \
+  --cells_fiducial=output/cls_fid.fits \
+  --param_chains=output/chi2.npz \
+  --plots=output/plots.dir \
+  --plots_page=output/plots_page.html \
+  --config=test/test_config_sampling_legacy.yml
+```
+
+## Pipeline Stages
+
+BBPower has four stages that run in sequence. Each reads typed inputs and produces typed outputs in [SACC](https://github.com/LSSTDESC/sacc) format.
+
+| Stage | Module | Purpose |
+|---|---|---|
+| **BBPowerSpecter** | `power_specter.py` | Compute cross-frequency bandpower spectra from HEALPix Q/U maps using NaMaster |
+| **BBPowerSummarizer** | `power_summarizer.py` | Coadd splits, compute noise/null spectra, estimate covariances from simulations |
+| **BBCompSep** | `compsep.py` | Fit a parametric CMB + foreground model to the bandpowers |
+| **BBPlotter** | `plotter.py` | Generate diagnostic plots and an HTML summary page |
+
+You can run the full pipeline (maps to parameters) or enter at any stage with pre-computed inputs. See [docs/architecture.md](docs/architecture.md) for the data flow and module interactions.
+
+### Common entry points
+
+| If you already have... | Start at | Required main files |
+|---|---|---|
+| HEALPix Q/U maps, mask, beams, bandpasses | `BBPowerSpecter` | `splits_list`, `masks_apodized`, `bandpasses_list`, `beams_list`, `sims_list` |
+| Split-level SACC spectra from maps/sims | `BBPowerSummarizer` | `cells_all_splits`, `cells_all_sims`, `splits_list`, `bandpasses_list` |
+| Coadded spectra + covariance | `BBCompSep` | `cells_coadded`, `cells_noise`, `cells_coadded_cov`, stage config |
+| Existing BBPower outputs | `BBPlotter` | `cells_coadded*`, `cells_fiducial`, `param_chains`, plot paths |
+
+For most users, the lowest-friction path is to start at `BBCompSep` with pre-computed SACC spectra instead of running the map-level stages.
+
+## Configuration
+
+BBPower uses two YAML files:
+
+1. **Pipeline file** (e.g., `test/test_sampling_legacy.yml`) -- declares stages, input file paths, and output directories for BBPipe orchestration.
+2. **Stage config file** (e.g., `test/test_config_sampling_legacy.yml`) -- defines the physical model (CMB templates, foreground components, priors) and sampler settings.
+
+The stage config has a `global` section (shared by all stages) and per-stage sections:
+
+```yaml
+global:
+  nside: 64
+  compute_dell: true
+
+BBCompSep:
+  sampler: 'maximum_likelihood'     # emcee | polychord | fisher | single_point | timing
+  likelihood_type: 'h&l'            # chi2 | h&l
+  pol_channels: ['E', 'B']
+  l_min: 30
+  l_max: 300
+
+  cmb_model:
+    cmb_templates:
+      - "./examples/data/camb_lens_nobb.dat"
+      - "./examples/data/camb_lens_r1.dat"
+    params:
+      r_tensor: ['r_tensor', 'tophat', [-0.1, 0.0, 0.1]]
+      A_lens:   ['A_lens',   'tophat', [0.0,  1.0, 2.0]]
+
+  fg_model:
+    component_1:
+      name: Dust
+      sed: Dust
+      cl: { EE: ClPowerLaw, BB: ClPowerLaw }
+      sed_parameters:
+        beta_d: ['beta_d', 'Gaussian', [1.59, 0.11]]
+        temp_d: ['temp',   'fixed',    [19.6]]
+        nu0_d:  ['nu0',    'fixed',    [353.]]
+      cl_parameters:
+        BB:
+          amp_d_bb:   ['amp',   'tophat', [0., 5., 10.]]
+          alpha_d_bb: ['alpha', 'tophat', [-1., -0.2, 0.]]
+          l0_d_bb:    ['ell0',  'fixed',  [80.]]
+```
+
+See [docs/configuration.md](docs/configuration.md) for the complete reference.
+
+## What Each Stage Writes
+
+These are the outputs you will typically inspect when wiring the pipeline together:
+
+| Stage | Main outputs |
+|---|---|
+| `BBPowerSpecter` | `cells_all_splits.fits`, `cells_all_sims.txt`, `mcm*` |
+| `BBPowerSummarizer` | `cells_coadded.fits`, `cells_coadded_total.fits`, `cells_noise.fits`, `cells_null.fits` |
+| `BBCompSep` | `emcee.npz`, `chi2.npz`, `single_point.npz`, `fisher.npz`, `cells_model.npz`, `config_copy.yml` |
+| `BBPlotter` | `plots.dir/`, `plots_page.html`, and optionally `triangle.png` |
+
+`BBCompSep` always expects `--output_dir` and `--config_copy` to point to an existing writable directory. The common pattern is to create that directory before invoking the stage.
+
+## Parameter Format
+
+Every model parameter is defined as a three-element list:
+
+```yaml
+param_name: ['internal_name', 'prior_type', [prior_args]]
+```
+
+| Prior type | Args | Description |
+|---|---|---|
+| `fixed` | `[value]` | Not sampled; held constant |
+| `tophat` | `[lower, center, upper]` | Uniform prior; `center` is the initial value |
+| `Gaussian` | `[mean, sigma]` | Gaussian prior; `mean` is the initial value |
+
+## Samplers
+
+| Name | Config key | Extra options | Output |
+|---|---|---|---|
+| emcee MCMC | `emcee` | `nwalkers`, `n_iters` | `emcee.npz` (chain, names) |
+| PolyChord nested sampling | `polychord` | `nlive`, `nrepeat` | `polychord/` directory |
+| Maximum likelihood | `maximum_likelihood` | -- | `chi2.npz` (best-fit params) |
+| Fisher matrix | `fisher` | -- | `fisher.npz` (params, fisher) |
+| Single-point chi2 | `single_point` | -- | `single_point.npz` |
+| Timing benchmark | `timing` | -- | `timing.npz` |
+
+`emcee` runtime notes:
+
+- `BBCompSep` uses a thread pool by default for `sampler: emcee`. This avoids pickling failures that can appear with process pools when the likelihood contains `fgbuster` bandpass wrappers.
+- `BBPOWER_EMCEE_WORKERS` chooses the requested worker count, but BBPower caps the effective value to about half the walkers, `ceil(nwalkers / 2)`, because emcee's default stretch move updates one red-blue split at a time.
+- `BBPOWER_EMCEE_POOL=thread` is the recommended default. `serial` is useful for debugging, and `process` should only be used when the likelihood is known to be fully picklable.
+- Native math-library thread settings such as `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `MKL_NUM_THREADS` are separate from the emcee worker count.
+- Only one emcee process may write a given `output_dir/emcee.npz.h5` at a time. Resubmission is fine after the earlier process exits, but running two jobs against the same output directory concurrently can corrupt the backend.
+- See [docs/threading.md](docs/threading.md) for the full explanation of parameter precedence, bash defaults, the worker cap, and cluster examples such as `32` CPUs with `40` walkers.
+
+## Tests
+
+Tests are shell scripts in `test/`. The fastest current smoke tests are:
+
+```bash
+bash test/run_compsep_test.sh
+bash test/run_predicted_spectra_test.sh
+```
+
+See [docs/examples.md](docs/examples.md) for descriptions of all test scripts and example workflows.
+
+For setup validation, the most useful smoke tests are:
+
+```bash
+# BBCompSep only
+bash test/run_compsep_test.sh
+
+# Legacy direct spectra -> BBCompSep + BBPlotter workflow
+bash test/run_sampling_test.sh
+
+# Predicted spectra output
+bash test/run_predicted_spectra_test.sh
+```
+
+## Documentation
+
+- [docs/setup.md](docs/setup.md) -- Installation by workflow, stage entry points, and troubleshooting
+- [docs/threading.md](docs/threading.md) -- Detailed guide to `BBCompSep` emcee workers, BLAS/OpenMP threads, and cluster setup
+- [docs/architecture.md](docs/architecture.md) -- Module interactions, data flow, and class relationships
+- [docs/configuration.md](docs/configuration.md) -- Complete configuration reference
+- [docs/examples.md](docs/examples.md) -- Step-by-step usage examples and test descriptions
+- [docs/refactor_stack.md](docs/refactor_stack.md) -- Detailed branch-by-branch refactor integration notes
+
+## Credits
+
+Developed by the Simons Observatory BB Analysis Working Group. Questions and contributions welcome -- contact Max Abitbol (mabitbol), David Alonso (damonge), or open an issue.
+
+## License
+
+BSD 3-Clause. See [LICENSE](LICENSE) for details.
diff --git a/bbpower/__init__.py b/bbpower/__init__.py
@@ -1,5 +1,19 @@
+from __future__ import annotations
+
+from importlib import import_module
+from typing import Any
+
 from bbpipe import PipelineStage  # noqa
-from .power_specter import BBPowerSpecter  # noqa
-from .power_summarizer import BBPowerSummarizer  # noqa
-from .compsep import BBCompSep  # noqa
-from .plotter import BBPlotter  # noqa
+
+from ._stages import STAGE_MODULES
+
+__all__ = ["PipelineStage", *STAGE_MODULES]
+
+
+def __getattr__(name: str) -> Any:
+    module_name = STAGE_MODULES.get(name)
+    if module_name is None:
+        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+
+    module = import_module(module_name)
+    return getattr(module, name)