
Stain normalization: Macenko/Vahadane decomposition + background estimator#1196

Open
timtreis wants to merge 8 commits into scverse:main from timtreis:feature/stain-decomposition

@timtreis
Member

@timtreis timtreis commented Jun 1, 2026

Third PR of the H&E stain-normalization arc. Implements the Macenko and Vahadane decompositions behind the dispatchers from #1191 and sharpens the module's UX.

Workflow

```python
import squidpy as sq

# sdata: a SpatialData object holding an H&E image under "he"
sq.experimental.im.detect_tissue(sdata, "he")  # prerequisite: tissue mask for the fits
ref = sq.experimental.im.fit_stain_reference(sdata, "he", method="macenko")

sq.experimental.im.normalize_stains(sdata, "he", ref)  # -> writes sdata.images["he_normalized"]
sq.experimental.im.decompose_stains(sdata, "he", ref)  # -> writes he_hematoxylin / he_eosin / he_residual
maps = sq.experimental.im.decompose_stains(sdata, "he", ref, inplace=False)  # -> dict of (y, x) maps
```
  • inplace + *_key_added, as in detect_tissue: inplace=True (default) writes the result into sdata under a key that defaults to f"{key}_normalized" (normalize_stains), or uses key itself as the output prefix (decompose_stains); inplace=False returns the result instead. decompose_stains validates that all target keys are free before writing any of them, so the write is atomic.
  • Renames: apply_stain_normalization -> normalize_stains; background_intensity/estimate_background_intensity -> white_point/estimate_white_point.
  • New parameters: output_dtype (default: the source dtype for normalize_stains, float16 for decompose_stains), tissue_mask_key (default f"{key}_tissue"), and preserve_background=True (non-tissue pixels stay byte-identical to the source).

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 91.34860% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.07%. Comparing base (da789d0) to head (1f59452).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...c/squidpy/experimental/im/_stain/_decomposition.py | 92.24% | 5 Missing and 5 partials ⚠️ |
| src/squidpy/experimental/im/_stain/_normalize.py | 92.23% | 5 Missing and 3 partials ⚠️ |
| src/squidpy/experimental/im/_stain/_reference.py | 72.72% | 3 Missing and 3 partials ⚠️ |
| src/squidpy/experimental/im/_stain/_validation.py | 92.85% | 2 Missing and 2 partials ⚠️ |
| src/squidpy/experimental/im/_utils.py | 78.94% | 1 Missing and 3 partials ⚠️ |
| src/squidpy/experimental/im/_stain/_white_point.py | 92.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1196      +/-   ##
==========================================
+ Coverage   75.33%   76.07%   +0.74%     
==========================================
  Files          56       59       +3     
  Lines        7922     8255     +333     
  Branches     1292     1343      +51     
==========================================
+ Hits         5968     6280     +312     
- Misses       1444     1453       +9     
- Partials      510      522      +12     
| Files with missing lines | Coverage Δ |
|---|---|
| src/squidpy/experimental/im/_make_tiles.py | 73.97% <100.00%> (+1.24%) ⬆️ |
| src/squidpy/experimental/im/_stain/_conversion.py | 100.00% <100.00%> (ø) |
| src/squidpy/experimental/im/_stain/_mask.py | 100.00% <100.00%> (ø) |
| src/squidpy/experimental/im/_stain/_reinhard.py | 100.00% <100.00%> (ø) |
| src/squidpy/experimental/im/_stain/_white_point.py | 92.00% <92.00%> (ø) |
| src/squidpy/experimental/im/_stain/_validation.py | 92.85% <92.85%> (ø) |
| src/squidpy/experimental/im/_utils.py | 67.26% <78.94%> (+4.59%) ⬆️ |
| src/squidpy/experimental/im/_stain/_reference.py | 90.32% <72.72%> (-9.68%) ⬇️ |
| src/squidpy/experimental/im/_stain/_normalize.py | 93.70% <92.23%> (-6.30%) ⬇️ |
| ...c/squidpy/experimental/im/_stain/_decomposition.py | 92.24% <92.24%> (ø) |

timtreis force-pushed the feature/stain-decomposition branch 4 times, most recently from 3f42347 to a088394 on June 1, 2026 18:27
timtreis and others added 6 commits June 2, 2026 04:37
Fills the decomposition branches the Reinhard PR left as dispatch stubs;
no public signature from that PR changes.

- _validation.py: StainFittingError (carries image_key for cohort fitting),
  validate_stain_matrix, reorder_to_canonical, complement_third_column.
- _background.py: estimate_background_intensity (per-channel high-percentile
  white point; auto-used by the decomposition fits when not supplied).
- _mask.py: absorbance_foreground_mask (OD-space tissue mask) alongside the
  luminosity variant.
- _decomposition.py: MacenkoParams/VahadaneParams, Macenko (SVD angular
  extremes) and Vahadane (sklearn sparse NMF) stain-matrix fits, fit/apply.
  Apply maps source->reference absorbance via one (3,3) operator and stays
  lazy; the source matrix is fit on a coarse level, never the full image.
- _reference.py: max_concentrations field (decomposition only) so the target
  concentration scale travels with the reference.
- _normalize.py: method-keyed param resolver, filled fit/apply branches, and
  decompose_stains -> (hematoxylin, eosin, residual) concentration channels.
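The Macenko fit named in the `_decomposition.py` bullet (SVD plane, then angular extremes) can be sketched in plain NumPy. This sketch assumes the optical-density rows are already restricted to tissue pixels. The function name `macenko_stain_matrix` and the red-channel ordering heuristic are illustrative; they are not the PR's `MacenkoParams`-driven implementation. The trailing lines follow the spirit of the synthetic-recovery tests described below: they plant known H&E vectors and refit them.

```python
import numpy as np


def macenko_stain_matrix(od: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Return a (2, 3) stain matrix [hematoxylin; eosin] with unit-norm rows.

    od: (n_pixels, 3) optical densities of tissue pixels (mask already applied).
    alpha: percentile for the robust angular extremes (Macenko et al. use 1).
    """
    # The two leading right-singular vectors span the plane holding both stain vectors.
    _, _, vt = np.linalg.svd(od, full_matrices=False)
    plane = vt[:2].T.copy()  # (3, 2) orthonormal basis
    if plane[:, 0].sum() < 0:  # point the first basis vector into positive OD
        plane[:, 0] *= -1

    # In-plane angle of every pixel; its robust extremes are the two stain directions.
    proj = od @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100 - alpha])
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])

    # Heuristic ordering: hematoxylin has the larger red-channel optical density.
    stains = np.stack([v_lo, v_hi] if v_lo[0] > v_hi[0] else [v_hi, v_lo])
    return stains / np.linalg.norm(stains, axis=1, keepdims=True)


# Planted-matrix recovery: synthesise OD from known H&E vectors and refit them.
rng = np.random.default_rng(0)
planted = np.array([[0.65, 0.70, 0.29], [0.07, 0.99, 0.11]])  # Ruifrok H, E
planted /= np.linalg.norm(planted, axis=1, keepdims=True)
conc = rng.uniform(0.0, 1.0, size=(20_000, 2))
fitted = macenko_stain_matrix(conc @ planted)
angles = np.degrees(np.arccos(np.clip((fitted * planted).sum(axis=1), -1.0, 1.0)))
```

Because the noise-free data lie exactly in the stain plane, the residual angle comes only from using the 1st/99th percentiles rather than the absolute extremes.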

Decomposition correctness is gated by synthetic-recovery tests (planted
matrix recovered within an angle tolerance); macenko/vahadane also fit, apply
and decompose end-to-end on the Visium H&E image.
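The "one (3,3) operator" in the `_decomposition.py` bullet follows from linear algebra. Write each pixel's absorbance as `od = conc @ W.T`, with one stain per column of `W`. Mapping source concentrations onto the reference stains then collapses into a single constant matrix `T = inv(W_src).T @ diag(scale) @ W_ref.T`. A constant 3x3 matrix can be applied chunk by chunk, which is consistent with the apply path staying lazy. The sketch below is an assumption, not the PR's code. It uses hypothetical names, takes the per-stain scale as the ratio of reference to source `max_concentrations` (in Macenko's method typically a robust maximum such as the 99th percentile), and leaves the residual column unscaled.

```python
import numpy as np


def complete_stain_matrix(he: np.ndarray) -> np.ndarray:
    """(3, 2) H&E columns -> (3, 3) with a unit cross-product residual column."""
    he = he / np.linalg.norm(he, axis=0, keepdims=True)
    third = np.cross(he[:, 0], he[:, 1])
    return np.column_stack([he, third / np.linalg.norm(third)])


def source_to_reference_operator(w_src, w_ref, max_src, max_ref) -> np.ndarray:
    """Build T so that od_ref = od_src @ T for OD stored as (..., 3) rows."""
    scale = np.ones(3)
    scale[:2] = np.asarray(max_ref, dtype=float) / np.asarray(max_src, dtype=float)
    # od = conc @ w.T  =>  conc = od @ inv(w).T; rescale; re-mix with the reference stains
    return np.linalg.inv(w_src).T @ np.diag(scale) @ w_ref.T


rng = np.random.default_rng(0)
w_src = complete_stain_matrix(np.array([[0.65, 0.07], [0.70, 0.99], [0.29, 0.11]]))
w_ref = complete_stain_matrix(np.array([[0.60, 0.10], [0.75, 0.95], [0.28, 0.15]]))
conc = rng.uniform(0.0, 2.0, size=(64, 64, 3))  # (y, x, stain)
od_src = conc @ w_src.T                          # forward Beer-Lambert mixing
T = source_to_reference_operator(w_src, w_ref, max_src=[1.8, 1.5], max_ref=[1.2, 1.0])
od_out = od_src @ T
expected = (conc * np.array([1.2 / 1.8, 1.0 / 1.5, 1.0])) @ w_ref.T
```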

Mandate a detect_tissue mask for the fits (contract change).

The module's own absorbance/luminosity threshold masks degenerate on real
data - on the Visium H&E the absorbance mask kept 99.1% of pixels (including
the dark fiducial ring), which fed the fit garbage and was the real cause of
Macenko's >45deg validation failure on that image. Squidpy already ships a
tested detect_tissue; the stain fits now consume it instead of thresholding
their own mask.
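The masking failure described above is easy to reproduce with a threshold mask in optical-density space. The sketch assumes a Macenko-style filter that keeps pixels whose OD exceeds `beta` in every channel; `beta=0.15` is a common choice, and the function names are hypothetical. Glass is rejected, but a dark fiducial spot absorbs strongly and passes the threshold exactly like tissue.

```python
import numpy as np


def rgb_to_od(rgb: np.ndarray, white_point: float = 255.0) -> np.ndarray:
    """Beer-Lambert optical density OD = -log10(I / I_0) per channel."""
    intensity = np.maximum(np.asarray(rgb, dtype=np.float64), 1.0)  # avoid log10(0)
    return -np.log10(intensity / white_point)


def od_threshold_mask(rgb: np.ndarray, beta: float = 0.15) -> np.ndarray:
    """Keep pixels whose OD exceeds beta in every channel (Macenko-style filter)."""
    return np.all(rgb_to_od(rgb) > beta, axis=-1)


pixels = np.array(
    [
        [240, 240, 238],  # glass / background
        [120, 80, 160],   # hematoxylin-rich tissue
        [20, 20, 20],     # dark fiducial spot
    ],
    dtype=np.uint8,
)
mask = od_threshold_mask(pixels)
```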

- fit_stain_reference / apply_stain_normalization / decompose_stains gain a
  tissue_mask_key argument. A tissue mask is required: the sdata-level
  functions resolve tissue_mask_key (or f"{image_key}_tissue") via
  resolve_tissue_mask(auto_create=False) and raise an actionable error asking
  the caller to run detect_tissue if none exists. This changes the Reinhard
  contract (was: auto luminosity mask).
- The DataArray-layer primitives (_tissue_od/fit/apply, fit/apply_reinhard)
  take an optional tissue_mask and fall back to the threshold mask when it is
  None, so the synthetic-image algorithm tests are unchanged.
- apply_reinhard now reduces its source statistics on a coarse fit_rgb (like
  apply_decomposition), so the mask and stats stay small on whole slides.
- Reuse: _choose_label_scale_for_image moves from _make_tiles to
  experimental/im/_utils.py; resolve_tissue_mask gains auto_create.
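The new contract (use an explicit `tissue_mask_key`, otherwise fall back to `f"{image_key}_tissue"`, and fail with guidance rather than auto-creating a mask) can be sketched against a plain mapping. This is a hypothetical helper, not squidpy's `resolve_tissue_mask`. It assumes the mask lives in a mapping such as `sdata.labels`.

```python
from __future__ import annotations

from collections.abc import Mapping


def resolve_mask_key(
    labels: Mapping[str, object], image_key: str, tissue_mask_key: str | None = None
) -> str:
    """Return the tissue-mask key to use, or fail with an actionable message."""
    key = tissue_mask_key if tissue_mask_key is not None else f"{image_key}_tissue"
    if key not in labels:
        raise KeyError(
            f"No tissue mask {key!r} found. Run "
            f"sq.experimental.im.detect_tissue(sdata, {image_key!r}) first "
            "or pass tissue_mask_key= to point at an existing mask."
        )
    return key
```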

With a real detect_tissue mask (33% of the H&E, fiducial ring dropped)
Macenko fits cleanly and agrees with Vahadane - the earlier "Macenko fails on
this image" was a masking artifact - so the H&E smoke test exercises both
methods again.

Keep the background white: fixed I_0 default + output composite.

Two parity fixes so normalization no longer tints non-tissue/white pixels
(matching HistomicsTK):

- Default I_0 (background_intensity) is now a fixed full-white [255, 255, 255]
  (DEFAULT_BACKGROUND_INTENSITY), not an image-derived high percentile. The
  percentile estimate returned ~130 on a dim scan, so true-white pixels got
  negative absorbance and could only reconstruct as far as 130 (grey, then
  tinted). estimate_background_intensity stays as an opt-in helper for slides
  with a known non-white background.
- apply_stain_normalization gains preserve_background (default True): the
  global colour map would recolour every non-I_0 pixel, so non-tissue pixels
  are composited back from the source verbatim (HistomicsTK's mask_out). The
  composite stays lazy via an output-resolution tissue mask. Set
  preserve_background=False for full-frame normalization.
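Both fixes can be seen on single pixels. With I_0 = 255, true white has zero absorbance. With an underestimated I_0 of 130, the same pixel gets negative absorbance. The `preserve_background` composite then restores source pixels wherever the tissue mask is False. This NumPy sketch is not the PR's lazy implementation, and the helper names are hypothetical.

```python
import numpy as np


def rgb_to_od(rgb, white_point) -> np.ndarray:
    """Optical density relative to the white point I_0: OD = -log10(I / I_0)."""
    intensity = np.maximum(np.asarray(rgb, dtype=np.float64), 1.0)
    return -np.log10(intensity / np.asarray(white_point, dtype=np.float64))


def composite_background(
    normalized: np.ndarray, source: np.ndarray, tissue_mask: np.ndarray
) -> np.ndarray:
    """Take normalized colours on tissue and the untouched source pixels elsewhere."""
    return np.where(tissue_mask[..., None], normalized, source)


white = np.array([255, 255, 255], dtype=np.uint8)
od_fixed = rgb_to_od(white, [255, 255, 255])  # zero absorbance: white carries no stain
od_dim = rgb_to_od(white, [130, 130, 130])    # negative absorbance from an underestimated I_0

source = np.array([[[250, 249, 247], [150, 90, 170]]], dtype=np.uint8)      # (y=1, x=2, rgb)
normalized = np.array([[[240, 236, 244], [140, 80, 180]]], dtype=np.uint8)  # global map tints both
tissue = np.array([[False, True]])
out = composite_background(normalized, source, tissue)
```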

Verified: a colour-cast query slide normalized through the public API keeps
its background byte-identical to the input ([160.5,101.1,141.1] vs
[161.9,101.4,141.2]) while the tissue is recoloured.

decompose_stains: store each stain as its own image; optional residual.

Rather than one 3-channel image, decompose_stains now writes a separate
single-channel image per stain (image_key_added as a prefix ->
f"{prefix}_hematoxylin", f"{prefix}_eosin", f"{prefix}_residual") and returns
a dict of named (y, x) maps when not writing. include_residual (default True)
controls the residual channel; set it to False to drop it. The residual is a
decomposition-quality diagnostic (absorbance not explained by H or E: extra
chromogen, artifacts, or a poor fit), not a biological stain. Provenance
(method, stain matrix, white point) lives on the
StainReference, not on the element (custom element attrs don't survive the
zarr round-trip).
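Splitting absorbance into per-stain maps amounts to inverting the per-pixel mixing `od = conc @ W.T` with the completed (3,3) stain matrix, then naming each channel. The sketch below returns the same kind of dict the inplace=False path is described as returning. It also shows the prefix naming, with "he" standing in for `image_key_added`. The function name is hypothetical, and the stain matrix is built from the Ruifrok H/E vectors for illustration.

```python
from __future__ import annotations

import numpy as np


def decompose(od: np.ndarray, w: np.ndarray, include_residual: bool = True) -> dict[str, np.ndarray]:
    """Split (y, x, 3) optical density into named (y, x) concentration maps.

    w: (3, 3) stain matrix with one stain per column: hematoxylin, eosin, residual.
    """
    conc = od @ np.linalg.inv(w).T  # inverts the per-pixel model od = conc @ w.T
    names = ["hematoxylin", "eosin", "residual"]
    if not include_residual:
        names = names[:2]
    return {name: conc[..., i] for i, name in enumerate(names)}


h = np.array([0.65, 0.70, 0.29])
e = np.array([0.07, 0.99, 0.11])
r = np.cross(h, e)
w = np.column_stack([v / np.linalg.norm(v) for v in (h, e, r)])

rng = np.random.default_rng(1)
planted = rng.uniform(0.0, 1.5, size=(4, 5, 3))  # (y, x, stain)
maps = decompose(planted @ w.T, w)
keys = [f"he_{name}" for name in maps]             # prefix naming with image_key_added="he"
```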

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Behaviour-preserving renames from the UX-sharpening pass
(plans/stain-pr3-decomposition.md):

- apply_stain_normalization -> normalize_stains (verb, not noun-phrase)
- background_intensity -> white_point (the I_0 reference, not the image's
  background region) across params, the StainReference field, rgb_to_sda,
  and docstrings
- estimate_background_intensity -> estimate_white_point;
  DEFAULT_BACKGROUND_INTENSITY -> DEFAULT_WHITE_POINT
- _stain/_background.py -> _stain/_white_point.py (+ its test)

No behaviour change; 121 stain tests pass unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Default white point is now dtype-aware: dtype_max() gives the full-white
  value (255 / 65535 / 1.0) and default_white_point() raises with guidance when
  the data clearly doesn't match its dtype's range (8-bit-in-uint16, 0-255 float).
- Bit-depth-agnostic reconstruction: sda_to_rgb / lab_ruderman_to_rgb take an
  out_dtype and clip to that dtype's valid range (dtype_max) rather than a
  hardcoded 255 - threaded from the source image dtype through apply. Per review,
  the dtype (not a derived max_value float) is the threaded parameter.
- estimate_white_point is now sdata-level and samples the per-channel MEDIAN over
  non-tissue (background) pixels via the tissue mask (HistomicsTK semantics),
  replacing the whole-image percentile that under-estimated on dim scans.
- Renamed _background.py docstring/semantics to white-point; test fixtures are
  uint8 (real H&E) rather than float-0-255 (which the [0,1]-float convention
  would flag).
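The dtype-aware default and the background-median estimate can be sketched on plain arrays. `dtype_max` and `default_white_point` reuse the names from the commit message, but these bodies are assumptions. The real `estimate_white_point` is sdata-level and reads the tissue mask itself, whereas this version takes the RGB array and a boolean mask directly.

```python
import numpy as np


def dtype_max(dtype) -> float:
    """Full-white value: 255 for uint8, 65535 for uint16, 1.0 for floats."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return float(np.iinfo(dtype).max)
    return 1.0


def default_white_point(dtype) -> np.ndarray:
    """Pure defaulter: one full-white value per RGB channel, no data inspection."""
    return np.full(3, dtype_max(dtype))


def estimate_white_point(rgb: np.ndarray, tissue_mask: np.ndarray) -> np.ndarray:
    """Per-channel median over non-tissue (background) pixels."""
    background = rgb[~tissue_mask]  # (n_background, 3)
    return np.median(background, axis=0)


rgb = np.array(
    [[[120, 80, 160], [240, 236, 230]],
     [[244, 240, 234], [250, 246, 240]]],
    dtype=np.uint8,
)
tissue = np.array([[True, False], [False, False]])
white_point = estimate_white_point(rgb, tissue)
```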

125 stain tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- FIX (correctness): the bit-depth range check now runs on the APPLY/estimate
  paths too, not only fit. A float image holding 0-255 values previously slipped
  through normalize_stains and clipped its reconstruction to [0,1]
  (dtype_max(float)=1.0), silently destroying the output. Extracted the check
  into validate_rgb_range() and call it from fit / normalize_stains /
  estimate_white_point. + regression test.
- default_white_point() is now a pure defaulter (no max() reduction, no raise) -
  validation lives in validate_rgb_range(), separating the two concerns.
- Extracted _resolve_mask_key_and_scale() shared by the two tissue-mask
  consumers (dedup).
- Documented that estimate_white_point materialises its level (keep it coarse).
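The check matters because clipping a 0-255 float reconstruction to the float range [0, 1] saturates nearly every value. Below is a minimal range check covering the two cases named earlier (0-255 data in a float array, 8-bit data in uint16). The exact thresholds are assumptions, not the PR's `validate_rgb_range`.

```python
import numpy as np


def validate_rgb_range(rgb: np.ndarray) -> None:
    """Raise if the values clearly don't match the dtype's expected range."""
    observed_max = float(np.max(rgb))
    if np.issubdtype(rgb.dtype, np.floating) and observed_max > 1.0:
        raise ValueError(
            f"float image peaks at {observed_max:g}; expected values in [0, 1]. "
            "Divide by 255 or cast to uint8."
        )
    if rgb.dtype == np.uint16 and observed_max <= 255:
        raise ValueError("uint16 image never exceeds 255; this looks like 8-bit data stored as uint16.")


# What slipped through before: 0-255 floats clipped to dtype_max(float) == 1.0.
clipped = np.clip(np.array([12.0, 200.0]), 0.0, 1.0)
```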

128 stain tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (Step 3)

Align the stain entry points with the detect_tissue inplace+key idiom and
finish the bit-depth cast that Step 2 deferred.

normalize_stains:
- inplace=True (default) writes the result to sdata.images[image_key_added],
  image_key_added defaulting to f"{image_key}_normalized"; inplace=False
  returns the lazy DataArray and leaves sdata untouched.
- output_dtype (default = source dtype) is the clip range and the final cast.
- cast-at-boundary: the reconstruction previously stayed float (only clipped to
  range); it is now rounded (for integer dtypes) and cast at the write boundary,
  so the stored image has the requested dtype and integer background pixels are
  byte-identical to the source.

decompose_stains:
- inplace=True (default) writes each stain as a single-channel image under the
  image_key_added prefix (default = image_key); the write is atomic - all
  target keys are validated free before any is written.
- output_dtype (default float16; float32 for strict quantification).
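The atomic multi-key write described above amounts to checking every target key before writing any of them. This is a hypothetical helper over a plain mutable mapping standing in for `sdata.images`, not the PR's code.

```python
from __future__ import annotations

from collections.abc import Mapping, MutableMapping


def write_stains_atomically(
    images: MutableMapping[str, object], prefix: str, maps: Mapping[str, object]
) -> list[str]:
    """Write every f"{prefix}_{stain}" entry, or none of them."""
    targets = {f"{prefix}_{name}": data for name, data in maps.items()}
    taken = [key for key in targets if key in images]
    if taken:
        # Validate all keys up front so a failure never leaves a partial decomposition.
        raise KeyError(f"keys already exist: {taken}; choose another image_key_added")
    images.update(targets)
    return list(targets)
```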

_conversion.cast_to_image_dtype performs the deferred rounding+cast, kept lazy.
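Rounding before the cast is what keeps integer output faithful: a bare `astype(np.uint8)` would truncate 254.6 to 254. The sketch below rounds and clips only for integer targets and simply casts for float targets, on the assumption stated above that the reconstruction is already clipped to range. It is written against NumPy; dask arrays support the same `clip`/`rint`/`astype` operations. The body is an assumption, not the PR's `cast_to_image_dtype`.

```python
import numpy as np


def cast_to_image_dtype(values: np.ndarray, dtype) -> np.ndarray:
    """Round (integer targets only), clip to the dtype's range, then cast."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        values = np.clip(np.rint(values), info.min, info.max)
    return values.astype(dtype)


recon = np.array([254.6, -0.4, 300.0, 17.4, 200.0])
as_uint8 = cast_to_image_dtype(recon, np.uint8)
as_float16 = cast_to_image_dtype(recon / 255.0, np.float16)
```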

Tests updated to the inplace=True default; added coverage for the derived-key
defaults, output_dtype overrides, and the atomic-abort path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…uild)

The tissue-mask mandate added `:func:` cross-references to detect_tissue in the
now-documented fit_stain_reference / estimate_white_point docstrings. detect_tissue
is not in docs/api.md (documenting it would cascade into its FelzenszwalbParams /
WekaParams / BackgroundDetectionParams / DetectTissueMethod surface - out of scope,
deferred to the docs PR), so the references resolved to nothing and the `-W` docs
build failed on 3 warnings. Suppress the cross-reference with the `!` prefix; the
name still renders, just without a (dead) link.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@timtreis timtreis force-pushed the feature/stain-decomposition branch from ccc8e77 to e7abe1d Compare June 2, 2026 02:39
timtreis and others added 2 commits June 2, 2026 15:24
…lt (Step 4)

Three guard/UX changes to fit_stain_reference and the channel check:

- Default `method` flips reinhard -> **macenko**: the no-choice path now supports
  both normalize and decompose, and macenko's one documented weakness (artifact
  pixels) is exactly what the mandatory tissue mask removes. reinhard stays the
  explicit fast colour-transfer opt-out.
- Expose the H/E sanity gate: `max_angle_deg` (deviation tolerance) and
  `canonical_reference` (the Ruifrok H/E vectors) are now documented kwargs on
  fit_stain_reference, threaded into reorder_to_canonical + validate_stain_matrix
  for the decomposition methods. Defaults unchanged (45 deg / Ruifrok).
- Strict 3-channel RGB: the channel-dim check now raises a clear, RGB-specific
  message naming the RGBA/multi-channel case instead of a generic "length 3".
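The H/E sanity gate reduces to per-stain angles against the canonical vectors: reorder the fitted rows if swapping brings them closer, then reject any row beyond `max_angle_deg`. The sketch uses the Ruifrok & Johnston H and E vectors (the same values scikit-image uses for its HED conversion). It defines a local stand-in for `StainFittingError` and adds a minimal RGB channel check. The function bodies are assumptions, not the PR's `reorder_to_canonical`/`validate_stain_matrix`.

```python
import numpy as np

# Ruifrok & Johnston H and E optical-density vectors.
RUIFROK_HE = np.array([[0.65, 0.70, 0.29], [0.07, 0.99, 0.11]])


class StainFittingError(ValueError):
    """Local stand-in for the PR's StainFittingError."""


def angles_deg(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise angle in degrees between two stacks of 3-vectors."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.degrees(np.arccos(np.clip(np.sum(a * b, axis=-1), -1.0, 1.0)))


def reorder_to_canonical(stains: np.ndarray, canonical: np.ndarray = RUIFROK_HE) -> np.ndarray:
    """Swap the two fitted rows if that brings them closer to (H, E)."""
    if angles_deg(stains[::-1], canonical).sum() < angles_deg(stains, canonical).sum():
        return stains[::-1]
    return stains


def validate_stain_matrix(stains, canonical=RUIFROK_HE, max_angle_deg: float = 45.0) -> np.ndarray:
    """Return per-stain deviations in degrees, raising if any exceeds max_angle_deg."""
    deviation = angles_deg(stains, canonical)
    if np.any(deviation > max_angle_deg):
        raise StainFittingError(
            f"stain vectors deviate {np.round(deviation, 1).tolist()} deg from the "
            f"canonical H/E vectors (max_angle_deg={max_angle_deg})"
        )
    return deviation


def check_rgb_channels(n_channels: int) -> None:
    if n_channels != 3:
        kind = "RGBA" if n_channels == 4 else f"{n_channels}-channel"
        raise ValueError(f"expected a 3-channel RGB image, got {kind}; drop alpha or select the RGB channels")


fitted = np.array([[0.10, 0.97, 0.14], [0.60, 0.72, 0.30]])  # E first, as a fit might return it
ordered = reorder_to_canonical(fitted)
deviation = validate_stain_matrix(ordered)
```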

Tests: pin method="reinhard" on the reinhard-oriented cases (random fixture /
reinhard smoke+visual), add default-is-macenko, max_angle_deg-too-strict,
canonical_reference passthrough, and an RGBA-rejection test; update the two
mask/conversion channel-length assertions to the new message.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-W build)

Step 4 added a `:class:`StainFittingError`` reference to the fit_stain_reference
docstring. StainFittingError is exported from the _stain package but not surfaced
at the public squidpy.experimental.im level, so it has no autosummary target and
the `-W` docs build failed. Surfacing it publicly is a deliberate API decision for
the docs PR; for now suppress the cross-reference with `!`, consistent with the
detect_tissue refs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@timtreis timtreis requested a review from selmanozleyen June 3, 2026 13:29