Stain normalization: Macenko/Vahadane decomposition + background estimator#1196
Open
timtreis wants to merge 8 commits into
Fills the decomposition branches the Reinhard PR left as dispatch stubs;
no public signature from that PR changes.
- _validation.py: StainFittingError (carries image_key for cohort fitting),
validate_stain_matrix, reorder_to_canonical, complement_third_column.
- _background.py: estimate_background_intensity (per-channel high-percentile
white point; auto-used by the decomposition fits when not supplied).
- _mask.py: absorbance_foreground_mask (OD-space tissue mask) alongside the
luminosity variant.
- _decomposition.py: MacenkoParams/VahadaneParams, Macenko (SVD angular
extremes) and Vahadane (sklearn sparse NMF) stain-matrix fits, fit/apply.
Apply maps source->reference absorbance via one (3,3) operator and stays
lazy; the source matrix is fit on a coarse level, never the full image.
- _reference.py: max_concentrations field (decomposition only) so the target
concentration scale travels with the reference.
- _normalize.py: method-keyed param resolver, filled fit/apply branches, and
decompose_stains -> (hematoxylin, eosin, residual) concentration channels.
Decomposition correctness is gated by synthetic-recovery tests (planted
matrix recovered within an angle tolerance); macenko/vahadane also fit, apply
and decompose end-to-end on the Visium H&E image.
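A minimal sketch of the synthetic-recovery idea described above, written in plain NumPy and not taken from the PR: a known stain matrix is planted, a Macenko-style fit (principal plane of the absorbance cloud, then angular extremes) recovers it, and the per-stain angle error is checked. The function names, the 1% percentile, and the Ruifrok-style planted vectors are illustrative assumptions.

```python
import numpy as np


def fit_macenko_sketch(od, alpha=1.0):
    """Hypothetical helper: od is (n, 3) absorbance; returns (3, 2) unit stain vectors."""
    # Principal plane: the two leading eigenvectors of the absorbance covariance.
    # eigh returns eigenvalues in ascending order, so columns 2 and 1 are the largest two.
    _, eigvecs = np.linalg.eigh(np.cov(od, rowvar=False))
    plane = eigvecs[:, [2, 1]]
    # Orient the first axis so that pixels project positively onto it.
    if od.mean(axis=0) @ plane[:, 0] < 0:
        plane[:, 0] *= -1
    proj = od @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    # Robust angular extremes (percentiles instead of min/max).
    lo, hi = np.percentile(phi, [alpha, 100 - alpha])
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])
    # Assumed ordering rule: hematoxylin absorbs more red than eosin.
    h, e = (v_lo, v_hi) if v_lo[0] > v_hi[0] else (v_hi, v_lo)
    return np.stack([h, e], axis=1)


def angle_deg(a, b):
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


rng = np.random.default_rng(0)
# Planted (3, 2) matrix: columns are hematoxylin and eosin absorbance vectors.
planted = np.array([[0.65, 0.07], [0.70, 0.99], [0.29, 0.11]])
conc = rng.uniform(0.0, 1.0, size=(5000, 2))  # noise-free concentrations
od = conc @ planted.T

recovered = fit_macenko_sketch(od)
errors = [angle_deg(recovered[:, k], planted[:, k]) for k in range(2)]
```

The tolerance used by the PR's tests is not stated here; a check such as `max(errors) < 5.0` is one plausible form of the gate.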
Mandate a detect_tissue mask for the fits (contract change).
The module's own absorbance/luminosity threshold masks degenerate on real
data - on the Visium H&E the absorbance mask kept 99.1% of pixels (including
the dark fiducial ring), which fed the fit garbage and was the real cause of
Macenko's >45deg validation failure on that image. Squidpy already ships a
tested detect_tissue; the stain fits now consume it instead of thresholding
their own mask.
- fit_stain_reference / apply_stain_normalization / decompose_stains gain a
tissue_mask_key argument. A tissue mask is required: the sdata-level
functions resolve tissue_mask_key (or f"{image_key}_tissue") via
resolve_tissue_mask(auto_create=False) and raise an actionable error asking
the caller to run detect_tissue if none exists. This changes the Reinhard
contract (was: auto luminosity mask).
- The DataArray-layer primitives (_tissue_od/fit/apply, fit/apply_reinhard)
take an optional tissue_mask and fall back to the threshold mask when it is
None, so the synthetic-image algorithm tests are unchanged.
- apply_reinhard now reduces its source statistics on a coarse fit_rgb (like
apply_decomposition), so the mask and stats stay small on whole slides.
- Reuse: _choose_label_scale_for_image moves from _make_tiles to
experimental/im/_utils.py; resolve_tissue_mask gains auto_create.
With a real detect_tissue mask (33% of the H&E, fiducial ring dropped)
Macenko fits cleanly and agrees with Vahadane - the earlier "Macenko fails on
this image" was a masking artifact - so the H&E smoke test exercises both
methods again.
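The key-resolution rule in this commit (explicit tissue_mask_key, else f"{image_key}_tissue", else an error pointing at detect_tissue) can be sketched as follows. This is an illustration only: `resolve_mask_key_sketch` is a hypothetical name, and a plain dict stands in for wherever the masks are actually stored.

```python
def resolve_mask_key_sketch(masks, image_key, tissue_mask_key=None):
    """Hypothetical stand-in for the lookup; `masks` is any mapping of key -> mask."""
    key = tissue_mask_key if tissue_mask_key is not None else f"{image_key}_tissue"
    if key not in masks:
        # No auto-creation: fail with a message that tells the caller what to do.
        raise KeyError(
            f"No tissue mask {key!r} found for image {image_key!r}. "
            "Run detect_tissue first, or pass tissue_mask_key explicitly."
        )
    return key


masks = {"he_tissue": "mask-placeholder"}
default_key = resolve_mask_key_sketch(masks, "he")

try:
    resolve_mask_key_sketch(masks, "other_image")
    missing_raised = False
except KeyError:
    missing_raised = True
```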
Keep the background white: fixed I_0 default + output composite.
Two parity fixes so normalization no longer tints non-tissue/white pixels
(matching HistomicsTK):
- Default I_0 (background_intensity) is now a fixed full-white [255, 255, 255]
(DEFAULT_BACKGROUND_INTENSITY), not an image-derived high percentile. The
percentile estimate returned ~130 on a dim scan, so true-white pixels got
negative absorbance and could only reconstruct as far as 130 (grey, then
tinted). estimate_background_intensity stays as an opt-in helper for slides
with a known non-white background.
- apply_stain_normalization gains preserve_background (default True): the
global colour map would recolour every non-I_0 pixel, so non-tissue pixels
are composited back from the source verbatim (HistomicsTK's mask_out). The
composite stays lazy via an output-resolution tissue mask. Set
preserve_background=False for full-frame normalization.
Verified: a colour-cast query slide normalized through the public API keeps
its background byte-identical to the input ([160.5,101.1,141.1] vs
[161.9,101.4,141.2]) while the tissue is recoloured.
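The two fixes above can be illustrated with a small NumPy sketch. It assumes the usual Beer-Lambert form OD = -log(I / I_0) and assumes that negative absorbance is clipped to zero before reconstruction; the helper names are hypothetical and not the PR's functions.

```python
import numpy as np


def rgb_to_od(rgb, white_point):
    # Absorbance relative to the white point I_0; clamp to 1 to avoid log(0).
    rgb = np.maximum(rgb.astype(np.float64), 1.0)
    return -np.log(rgb / white_point)


def od_to_rgb(od, white_point):
    return np.clip(white_point * np.exp(-od), 0.0, 255.0)


white_pixel = np.array([255, 255, 255], dtype=np.uint8)

# With a dim image-derived I_0 of 130, a true-white pixel gets negative absorbance...
od_dim = rgb_to_od(white_pixel, 130.0)
# ...and, if negative absorbance is clipped to 0 (assumption), reconstructs as grey 130.
recon_dim = od_to_rgb(np.maximum(od_dim, 0.0), 130.0)

# With the fixed full-white I_0 the same pixel has zero absorbance and stays white.
od_full = rgb_to_od(white_pixel, 255.0)
recon_full = od_to_rgb(od_full, 255.0)

# Background-preserving composite: non-tissue pixels are copied from the source.
source = np.array([[[250, 250, 250], [120, 60, 140]]], dtype=np.uint8)      # (y, x, c)
normalized = np.array([[[230, 240, 235], [110, 70, 150]]], dtype=np.uint8)
tissue = np.array([[False, True]])                                          # (y, x)
composite = np.where(tissue[..., None], normalized, source)
```

In the composite, the first (non-tissue) pixel is taken verbatim from `source` and the second (tissue) pixel from `normalized`.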
decompose_stains: store each stain as its own image; optional residual.
Rather than one 3-channel image, decompose_stains now writes a separate
single-channel image per stain (image_key_added as a prefix ->
f"{prefix}_hematoxylin", f"{prefix}_eosin", f"{prefix}_residual") and returns
a dict of named (y, x) maps when not writing. include_residual (default True)
keeps the residual; set it to False to drop it. The residual is a
decomposition-quality diagnostic (absorbance not explained by H or E: extra
chromogen, artifacts, or a poor fit), not a biological stain. Provenance
(method, stain matrix, white point) lives on the
StainReference, not on the element (custom element attrs don't survive the
zarr round-trip).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Behaviour-preserving renames from the UX-sharpening pass
(plans/stain-pr3-decomposition.md):
- apply_stain_normalization -> normalize_stains (verb, not noun-phrase)
- background_intensity -> white_point (the I_0 reference, not the image's
background region) across params, the StainReference field, rgb_to_sda, and
docstrings
- estimate_background_intensity -> estimate_white_point;
DEFAULT_BACKGROUND_INTENSITY -> DEFAULT_WHITE_POINT
- _stain/_background.py -> _stain/_white_point.py (+ its test)
No behaviour change; 121 stain tests pass unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Default white point is now dtype-aware: dtype_max() gives the full-white
value (255 / 65535 / 1.0) and default_white_point() raises with guidance when
the data clearly doesn't match its dtype's range (8-bit-in-uint16, 0-255
float).
- Bit-depth-agnostic reconstruction: sda_to_rgb / lab_ruderman_to_rgb take an
out_dtype and clip to that dtype's valid range (dtype_max) rather than a
hardcoded 255 - threaded from the source image dtype through apply. Per
review, the dtype (not a derived max_value float) is the threaded parameter.
- estimate_white_point is now sdata-level and samples the per-channel MEDIAN
over non-tissue (background) pixels via the tissue mask (HistomicsTK
semantics), replacing the whole-image percentile that under-estimated on dim
scans.
- Renamed _background.py docstring/semantics to white-point; test fixtures are
uint8 (real H&E) rather than float-0-255 (which the [0,1]-float convention
would flag).
125 stain tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
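As an illustration of the dtype-aware full-white value (255 / 65535 / 1.0), here is a minimal sketch using NumPy's dtype introspection. `full_white_sketch` is a hypothetical name and is not claimed to match the PR's dtype_max() implementation; the [0, 1] convention for floats is taken from the commit message.

```python
import numpy as np


def full_white_sketch(dtype):
    """Hypothetical helper: largest valid intensity for an image dtype."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return float(np.iinfo(dtype).max)
    if np.issubdtype(dtype, np.floating):
        return 1.0  # float images are assumed to live in [0, 1]
    raise TypeError(f"Unsupported image dtype: {dtype}")


white_u8 = full_white_sketch(np.uint8)
white_u16 = full_white_sketch(np.uint16)
white_f32 = full_white_sketch(np.float32)
```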
- FIX (correctness): the bit-depth range check now runs on the APPLY/estimate
paths too, not only fit. A float image holding 0-255 values previously slipped
through normalize_stains and clipped its reconstruction to [0,1]
(dtype_max(float)=1.0), silently destroying the output. Extracted the check
into validate_rgb_range() and call it from fit / normalize_stains /
estimate_white_point. + regression test.
- default_white_point() is now a pure defaulter (no max() reduction, no raise)
- validation lives in validate_rgb_range(), separating the two concerns.
- Extracted _resolve_mask_key_and_scale() shared by the two tissue-mask
consumers (dedup).
- Documented that estimate_white_point materialises its level (keep it
coarse).
128 stain tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
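The failure mode fixed here (a float image holding 0-255 values being clipped to [0, 1]) suggests a range check along the following lines. This is a sketch under assumptions: the function name and the exact thresholds are hypothetical, and the real validate_rgb_range() may use different criteria.

```python
import numpy as np


def check_rgb_range_sketch(image):
    """Hypothetical check: raise when pixel values clearly do not match the dtype's range."""
    observed = float(image.max())
    if np.issubdtype(image.dtype, np.floating) and observed > 1.0:
        raise ValueError(
            f"Float image has max {observed:g} but floats are expected in [0, 1]; "
            "rescale it or cast to uint8 first."
        )
    if image.dtype == np.uint16 and observed <= 255:
        # Heuristic (assumption): a uint16 image that never exceeds 255 is probably 8-bit data.
        raise ValueError("uint16 image looks like 8-bit data; cast to uint8 first.")


ok_image = np.array([[0.1, 0.9]], dtype=np.float32)
check_rgb_range_sketch(ok_image)  # passes silently

bad_image = np.array([[12.0, 240.0]], dtype=np.float32)
try:
    check_rgb_range_sketch(bad_image)
    float_rejected = False
except ValueError:
    float_rejected = True
```

Running the same check on the fit, apply, and estimate paths is what prevents the silent clipping described in the first bullet.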
… (Step 3)
Align the stain entry points with the detect_tissue inplace+key idiom and
finish the bit-depth cast that Step 2 deferred.
normalize_stains:
- inplace=True (default) writes the result to sdata.images[image_key_added],
image_key_added defaulting to f"{image_key}_normalized"; inplace=False
returns the lazy DataArray and leaves sdata untouched.
- output_dtype (default = source dtype) is the clip range and the final cast.
- cast-at-boundary: the reconstruction stayed in float (clipped to range);
it is now rounded (integer dtypes) and cast at the write boundary, so the
stored image is the requested dtype and integer background is byte-identical.
decompose_stains:
- inplace=True (default) writes each stain as a single-channel image under the
image_key_added prefix (default = image_key); the write is atomic - all
target keys are validated free before any is written.
- output_dtype (default float16; float32 for strict quantification).
_conversion.cast_to_image_dtype performs the deferred rounding+cast, kept lazy.
Tests updated to the inplace=True default; added coverage for the derived-key
defaults, output_dtype overrides, and the atomic-abort path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
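Two ideas from this commit, cast-at-boundary and the atomic multi-key write, can be sketched in a few lines. Both helpers below are hypothetical illustrations on eager NumPy arrays and a plain dict; the PR's cast_to_image_dtype is described as lazy and is not reproduced here.

```python
import numpy as np


def cast_at_boundary_sketch(values, out_dtype):
    """Round (integer dtypes only), clip to the dtype's range, then cast."""
    out_dtype = np.dtype(out_dtype)
    if np.issubdtype(out_dtype, np.integer):
        info = np.iinfo(out_dtype)
        return np.clip(np.rint(values), info.min, info.max).astype(out_dtype)
    return values.astype(out_dtype)


def write_all_or_nothing(store, new_items):
    """Validate that every target key is free before writing any of them."""
    taken = sorted(k for k in new_items if k in store)
    if taken:
        raise KeyError(f"Refusing to overwrite existing keys: {taken}")
    store.update(new_items)


# Float reconstruction -> stored uint8 image.
recon = np.array([254.6, 129.4, 300.0, -3.2])
as_uint8 = cast_at_boundary_sketch(recon, np.uint8)

# One key already exists, so nothing at all is written.
store = {"he_eosin": "existing"}
stains = {"he_hematoxylin": "H", "he_eosin": "E", "he_residual": "R"}
try:
    write_all_or_nothing(store, stains)
    aborted = False
except KeyError:
    aborted = True
```

Rounding before the cast matters for integer outputs: a plain `astype` truncates toward zero, which would shift values that should round up.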
…uild)
The tissue-mask mandate added `:func:` cross-references to detect_tissue in
the now-documented fit_stain_reference / estimate_white_point docstrings.
detect_tissue is not in docs/api.md (documenting it would cascade into its
FelzenszwalbParams / WekaParams / BackgroundDetectionParams /
DetectTissueMethod surface - out of scope, deferred to the docs PR), so the
references resolved to nothing and the `-W` docs build failed on 3 warnings.
Suppress the cross-reference with the `!` prefix; the name still renders, just
without a (dead) link.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lt (Step 4)
Three guard/UX changes to fit_stain_reference and the channel check:
- Default `method` flips reinhard -> **macenko**: the no-choice path now
supports both normalize and decompose, and macenko's one documented weakness
(artifact pixels) is exactly what the mandatory tissue mask removes. reinhard
stays the explicit fast colour-transfer opt-out.
- Expose the H/E sanity gate: `max_angle_deg` (deviation tolerance) and
`canonical_reference` (the Ruifrok H/E vectors) are now documented kwargs on
fit_stain_reference, threaded into reorder_to_canonical +
validate_stain_matrix for the decomposition methods. Defaults unchanged
(45 deg / Ruifrok).
- Strict 3-channel RGB: the channel-dim check now raises a clear, RGB-specific
message naming the RGBA/multi-channel case instead of a generic "length 3".
Tests: pin method="reinhard" on the reinhard-oriented cases (random fixture /
reinhard smoke+visual), add default-is-macenko, max_angle_deg-too-strict,
canonical_reference passthrough, and an RGBA-rejection test; update the two
mask/conversion channel-length assertions to the new message.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
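The H/E sanity gate can be pictured as an angle comparison against canonical vectors. The sketch below is an assumption-laden illustration, not the PR's reorder_to_canonical / validate_stain_matrix: the function names are hypothetical, and the canonical matrix uses the commonly quoted Ruifrok H&E absorbance vectors.

```python
import numpy as np

# Commonly quoted Ruifrok vectors (columns: hematoxylin, eosin); treated here as an assumption.
RUIFROK_HE = np.array([[0.65, 0.07], [0.70, 0.99], [0.29, 0.11]])


def column_angles_deg(a, b):
    """Angle in degrees between matching columns of two (3, 2) matrices."""
    a = a / np.linalg.norm(a, axis=0)
    b = b / np.linalg.norm(b, axis=0)
    cos = np.clip(np.sum(a * b, axis=0), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def reorder_and_validate_sketch(stains, canonical=RUIFROK_HE, max_angle_deg=45.0):
    """Put columns in (H, E) order, then reject fits that stray too far from canonical."""
    straight = column_angles_deg(stains, canonical)
    swapped = column_angles_deg(stains[:, ::-1], canonical)
    if swapped.sum() < straight.sum():
        stains, straight = stains[:, ::-1], swapped
    if straight.max() > max_angle_deg:
        raise ValueError(
            f"Stain vectors deviate {straight.max():.1f} deg from the canonical "
            f"reference (limit {max_angle_deg} deg)."
        )
    return stains


# A fit that came back in (E, H) order is put back into canonical order.
fixed = reorder_and_validate_sketch(RUIFROK_HE[:, ::-1])

# A slightly perturbed fit passes the default gate but fails a too-strict one.
perturbed = RUIFROK_HE + np.array([[0.05, 0.0], [0.0, 0.0], [0.0, 0.05]])
try:
    reorder_and_validate_sketch(perturbed, max_angle_deg=0.1)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```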
…-W build)
Step 4 added a `:class:`StainFittingError`` reference to the
fit_stain_reference docstring. StainFittingError is exported from the _stain
package but not surfaced at the public squidpy.experimental.im level, so it
has no autosummary target and the `-W` docs build failed. Surfacing it
publicly is a deliberate API decision for the docs PR; for now suppress the
cross-reference with `!`, consistent with the detect_tissue refs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Third PR of the H&E stain-normalization arc. Implements Macenko/Vahadane behind the #1191 dispatchers and sharpens the module's UX.
Workflow
- inplace + *_key_added like detect_tissue: inplace=True default, keys default
to f"{key}_normalized" / the key prefix; inplace=False returns instead.
decompose_stains validates all target keys are free before any write (atomic).
- apply_stain_normalization -> normalize_stains; background_intensity /
estimate_background_intensity -> white_point / estimate_white_point.
- output_dtype (normalize: source dtype; decompose: float16), tissue_mask_key
(default f"{key}_tissue"), preserve_background=True (non-tissue stays
byte-identical).