[2/4] Add calculate_image_features with skimage + squidpy-native features#1190
[2/4] Add calculate_image_features with skimage + squidpy-native features#1190timtreis wants to merge 8 commits into
Conversation
Introduces sq.experimental.im.calculate_image_features for per-cell feature extraction from segmentation masks. This PR ships the orchestration scaffold and the two non-cp_measure feature paths: - skimage: regionprops (mask + label+image, group-level and fine-grained) - squidpy: summary statistics, GLCM texture, colour histograms cp_measure integration (`cpmeasure:*` feature flags) and the `align_mode="rasterize"` branch land in follow-up PRs. Both are recognised here so users get a clear `NotImplementedError` rather than "unknown feature". `features=None` likewise raises -- the cp_measure-as-default behaviour from PR scverse#982 lands with cp_measure. `_classify_dropped_cells` (the per-cell drop-reason diagnostic) is deferred to a follow-up; `DropReport` still tracks `empty_tiles`. No new hard dependencies. Test plan: 29 tests in tests/experimental/test_calculate_image_features.py cover orchestration, skimage features, squidpy-native features, align_mode='strict' (happy path + raises-on-mismatch), cpmeasure:* rejection, align_mode='rasterize' rejection, channel selection, tiled-vs-single equivalence.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1190 +/- ##
==========================================
- Coverage 75.33% 75.28% -0.06%
==========================================
Files 56 57 +1
Lines 7922 8212 +290
Branches 1292 1360 +68
==========================================
+ Hits 5968 6182 +214
- Misses 1444 1488 +44
- Partials 510 542 +32
🚀 New features to boost your workflow:
|
- Trim DropReport to `empty_tiles` (the only field this code ever increments) - Narrow `align_mode` Literal to "strict" and add a runtime guard that catches dynamic callers passing other values - Exclude `cpmeasure:*` flag names from the "available" list in the unknown-feature error (they always raise NotImplementedError; listing them as available is misleading) - Raise on ambiguous mixes of `skimage:label` with `skimage:label:<prop>` (and same for `skimage:label+image`); the previous order-dependent behaviour silently expanded the narrowing form - Collapse `pd.concat -> replace([inf,-inf],0) -> fillna(0) -> .values.astype(float32)` into a single numpy pass via `np.nan_to_num`; saves two full-table copies - Use `pd.Categorical.from_codes` for the region column to avoid allocating an N-element Python list for a one-level categorical - Hoist the labels_key/shapes_key XOR pick to one local - Add `experimental.im.calculate_image_features` to docs/api.md Also: remove all references to the multi-PR split (follow-up PRs, PR-2, PR-4, "in this PR", "cp_measure-as-default behaviour from PR scverse#982", TestPR982Concerns, "Concern N" markers) from code and tests.
| labels_key: str | None = None, | ||
| shapes_key: str | None = None, | ||
| scale: str | None = None, | ||
| channels: list[str] | list[int] | None = None, |
There was a problem hiding this comment.
list[int] errors I think, is this defered to the next PR's?
There was a problem hiding this comment.
Yeah exactly. Splitting the functionality up in some many PRs massively bloats the PR quite a bit so I didn't want to add code just to remove it tomorrow
| features | ||
| Which features to compute. Required (``None`` is rejected). | ||
| Accepts a list of strings: | ||
|
|
||
| - ``"skimage:label"`` (all mask props), ``"skimage:label:area"`` | ||
| (single prop), ``"skimage:label+image"`` (all intensity props), | ||
| ``"skimage:label+image:intensity_mean"`` (single prop) | ||
| - ``"squidpy:summary"``, ``"squidpy:texture"``, | ||
| ``"squidpy:color_hist"`` | ||
|
|
||
| ``cpmeasure:*`` flag names are recognised but currently raise |
There was a problem hiding this comment.
There should be ideally an explanation on each one here.
There was a problem hiding this comment.
Will defer this to the notebook that showcases the functionality. The names are fairly descriptive, these are distinct feature groups within the packages
selmanozleyen
left a comment
There was a problem hiding this comment.
label + image naming
Names skimage:label and skimage:label+image really don't make sense to me. If we are always going to need a label and an image why would we need to name it label+image? Why not call it morphology for "label" and "intensity" for "label+image" for example? Adding the + sign makes the syntax more confusing and is named based on implementation specific jargon.
mandatory image_key
I can see users getting annoyed when they just want "skimage:label" but don't want to provide image_key. Why is it not optional? I am not sure why the implementation here always needs an image for features that don't need an image. This needs to be solved somehow. Would this make sense calculate_image_features(sdata, labels_key="seg", features=["skimage:morphology"]) ?
|
Another comment I'd like to make is the folder and file structure. At this point I would rather have smaller but more files and folder nesting. I am saying this in general but also for this function I am not sure why it's not called |
| for ch_idx, ch_name in enumerate(channel_names): | ||
| regions = measure.regionprops(labels, intensity_image=image[ch_idx]) | ||
| rows = {r.label: _regionprops_to_row(r, intensity_props) for r in regions} |
There was a problem hiding this comment.
This is potentially very slow!
Use skimage.measure.regionprops_table (which the legacy API used). It executes the loops internally in Cython, returning flat NumPy arrays directly. You can pass the multi-channel image as intensity_image=np.moveaxis(image, 0, -1) to get all channels at once, which is orders of magnitude faster and memory-efficient.
Check out ImageContainer, I think this is done in a more optimized way there
There was a problem hiding this comment.
Yeah, I think I was doing it that way for the granularity in selecting features, but if it's a lot slower, I might engineer around that
|
|
||
| # Large single-scale → tiled centroid computation | ||
| logg.info("Computing centroids in tiled mode (large single-scale labels).") | ||
| return compute_cell_info_tiled(labels_da) |
There was a problem hiding this comment.
compute_cell_info_tiled might need a second look for optimization.
There was a problem hiding this comment.
Yeah, I think once everyhting is in, I'll do another full optimisation scan
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| def _compute_squidpy_per_cell( |
There was a problem hiding this comment.
This function can be numbafied. In fact I can do it if you want. (Maybe even in a separate PR)
There was a problem hiding this comment.
Yeah let's do that in a follow-up
Responds to @selmanozleyen's review of calculate_image_features. - Perf (L322): rewrite _compute_skimage_features to skimage.measure.regionprops_table - one vectorised Cython pass per branch instead of a per-region Python getattr loop; intensity props use a single multichannel call (moveaxis -> (y,x,c)) computing all channels at once, then remapped to <prop>_<channel>. Morphology props now use skimage's native flattened names (centroid-0, inertia_tensor-0-0). Verified value parity against the old path on an adversarial fixture (degenerate 1-2px cells, empty tile, multi-dim props, multi/single channel); regionprops_table is robust on degenerate cells (no raise/NaN), so no special-casing needed. Deletes _regionprops_to_row (+ its now-unused Any import). - API (L587): narrow `channels: list[str] | list[int]` -> `list[str]` (the body already rejects ints and the docstring already says names-only; the hint was lying). - Docs (L632): rewrite the `features` docstring into a brief grouping paragraph saying what each flag computes; add a short Examples block; drop the `cpmeasure:*` mention (the code still recognizes them and raises NotImplementedError - just undocumented until the cp-measure PR lands). Deferred (discussed separately with reviewer): numba-ifying _compute_squidpy_per_cell (L337) and the compute_cell_info_tiled centroid path (L573, in already-merged _tiling.py). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both branches of _compute_skimage_features repeated the same
table.pop('label') + pd.DataFrame(table, index=index) idiom (the intensity
branch with an extra column rename). Fold it into a small helper with an
optional rename callable. Behaviour-preserving (parity unchanged).
…s-scaffold # Conflicts: # src/squidpy/experimental/im/__init__.py
PR 2/4 of the #982 split (plan: PR-1 tests, PR-2 skimage+squidpy features, PR-3 alignment, PR-4 cp_measure).
Introduces
squidpy.experimental.im.calculate_image_featuresfor per-cell feature extraction from segmentation masks. This is the foundation that PR-3 (alignment) and PR-4 (cp_measure) build on; cp_measure integration itself lands in PR-4 so the new hard-dependency conversation (cp-measure,centrosome) is isolated.What's in
regionpropsfeatures (skimage:label,skimage:label+image, fine-grained per-property).squidpy:summary,squidpy:texture(GLCM),squidpy:color_hist.DropReport, validation, lazy_prepare_lazy, centroid computation, parallel tile dispatch viajoblib.calculate_image_features(sdata, image_key, labels_key=None, shapes_key=None, features=..., align_mode="strict", ...).What's deferred to follow-up PRs
cpmeasure:*feature flags (PR-4). Flag names are recognised here and raiseNotImplementedErrorpointing to the follow-up, not "unknown feature".align_mode="rasterize"(PR-3). RaisesNotImplementedError. PR-2 supports onlyalign_mode="strict"(image and labels must share the same pixel grid)._classify_dropped_cells(per-cell drop-reason diagnostic). Deferred;DropReportstill tracksempty_tiles.features=Nonenow raises. The cp_measure-as-default behaviour lands with PR-4.No new hard dependencies
PR-2 adds zero new
pyproject.tomldeps.cp-measureandcentrosomeland with PR-4.