
[2/3] Add assign_stitch_groups (tile-cut stitching algorithm) #1193

Open
timtreis wants to merge 17 commits into scverse:main from timtreis:feature/tiling-stitch-algo

Conversation

@timtreis
Member

@timtreis timtreis commented May 29, 2026

What

Second PR of the 3-way split of #1170. Adds sq.experimental.tl.assign_stitch_groups (+ StitchParams): given the is_outlier=True cells flagged by calculate_tiling_qc (PR-A, #1188), it pairs facing cut edges across tile boundaries, scores each candidate pair, and assembles high-confidence pairs into stitch groups via union-find.
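
A minimal sketch of the union-find assembly step described above. It assumes candidate pairs arrive as (label_a, label_b, confidence) tuples and that a pair passes when its confidence is at least min_confidence; the helper names, the tuple layout, and the sample pairs are hypothetical, not the PR's internals.

```python
def find(parent, x):
    # Walk to the root, halving the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def assemble_groups(pairs, min_confidence=0.7):
    """Union labels joined by pairs whose confidence passes the threshold."""
    parent = {}
    for a, b, conf in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if conf >= min_confidence:  # assumption: inclusive threshold
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)  # smaller label becomes the root
    groups = {}
    for lab in parent:
        groups.setdefault(find(parent, lab), []).append(lab)
    return sorted(sorted(g) for g in groups.values())


# Made-up pairs: 1-2 and 2-3 pass, 4-5 does not.
pairs = [(1, 2, 0.9), (2, 3, 0.8), (4, 5, 0.4)]
groups = assemble_groups(pairs)
print(groups)
```

Pieces 1, 2 and 3 end up in one group through transitive unions, while 4 and 5 stay solo because their pair falls below the threshold.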

It only annotates .obs (four columns) plus a .uns["tiling_stitch"] audit block; the labels element is never modified. Materialising a stitched labels element is PR-C (make_stitched_labels, follows this one).

sq.experimental.tl.calculate_tiling_qc(sdata, "labels")      # PR-A (merged)
sq.experimental.tl.assign_stitch_groups(sdata, "labels")     # this PR
# sq.experimental.im.make_stitched_labels(sdata, "labels")   # PR-C (next)

The score: a transparent flat mean, not a fitted model

@selmanozleyen asked to review the math/intuition of stitch_confidence and whether it is a novel/fitted formula. Short answer: it is not fitted. An earlier prototype on this branch (ba60119f) shipped an L2-regularised logistic regression with coefficients fit on 2197 synthetic disk pairs. We removed it deliberately — fixture-fit weights claim a calibration they cannot honour on real data, and squidpy should not silently encode a synthetic distribution. The score is now the flat (unweighted) mean of five standard, bounded geometric descriptors, each in [0, 1]. The features are recorded in .uns["tiling_stitch"].

For a facing cut-edge pair (each a 1-D chord extent=(lo, hi) at perpendicular coord):

  1. iou: 1-D extent IoU of the two chords, overlap / (len_a + len_b - overlap). Do the two cut faces line up laterally?
  2. endpoint_match: max(0, 1 - (|a_lo-b_lo| + |a_hi-b_hi|) / max(len_a, len_b)). Do the faces start/end at the same lateral positions?
  3. merge_compactness: min(4*pi*A / P^2, 1) of the closed union mask's largest component. A re-joined real cell is compact; a false merge is elongated. Strongest single discriminator.
  4. merge_solidity: area / convex_hull_area of the same component (clamped to 1). A false merge creates a concave waist at the join.
  5. gap_proximity: clip(1 - gap / (2*close_radius), 0, 1). The seam gap relative to the closing reach (the scale at which morphological closing could actually bridge it), independent of the max_gap search radius; neutral (1.0) when closing is disabled.

stitch_confidence(pair) = mean(the five features). Per-cell group confidence is the min over the group's pairwise confidences (weakest link).
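
As a rough illustration of the formulas above, the sketch below computes gap_proximity, the flat mean, and the weakest-link group confidence. The function names, the convention that close_radius <= 0 means "closing disabled", and all feature values are assumptions made for this example only.

```python
from statistics import mean

FEATURES = ("iou", "endpoint_match", "merge_compactness", "merge_solidity", "gap_proximity")


def gap_proximity(gap, close_radius):
    # Assumption: a non-positive radius stands for "closing disabled" -> neutral 1.0.
    if close_radius <= 0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - gap / (2 * close_radius)))


def stitch_confidence(features):
    # Flat (unweighted) mean of the five bounded features.
    return mean(features[name] for name in FEATURES)


def group_confidence(pair_confidences):
    # Weakest link: a group is only as trustworthy as its worst pair.
    return min(pair_confidences)


# Made-up feature values for two pairs in one hypothetical group.
pair_ab = {"iou": 1.0, "endpoint_match": 1.0, "merge_compactness": 0.5,
           "merge_solidity": 1.0, "gap_proximity": gap_proximity(3, 3)}
pair_bc = {"iou": 1.0, "endpoint_match": 0.5, "merge_compactness": 0.5,
           "merge_solidity": 0.5, "gap_proximity": 0.5}

conf_ab = stitch_confidence(pair_ab)
conf_bc = stitch_confidence(pair_bc)
group_conf = group_confidence([conf_ab, conf_bc])
print(conf_ab, conf_bc, group_conf)
```

Because every feature lies in [0, 1], the mean does too, and a single weak pair pulls the whole group's reported confidence down.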

Runnable MREs (pure numpy/scipy/skimage, no squidpy)

The 1-D edge features:

def extent_iou(a, b):
    o = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    u = (a[1] - a[0]) + (b[1] - b[0]) - o
    return o / u if u > 0 else 0.0

def endpoint_match(a, b):
    d = abs(a[0] - b[0]) + abs(a[1] - b[1])   # total endpoint offset
    m = max(a[1] - a[0], b[1] - b[0])         # longer chord length
    return max(0.0, 1.0 - d / m) if m > 0 else 0.0

# true cut: two faces span the same range          false pair: barely overlapping
assert extent_iou((0, 10), (0, 10)) == 1.0
assert extent_iou((0, 10), (8, 18)) < 0.2
assert endpoint_match((0, 10), (0, 10)) == 1.0

The 2-D merged-shape features (error handling elided):

import numpy as np
from scipy.ndimage import binary_closing
from skimage.morphology import disk
from skimage.measure import label, regionprops

def merge_shape(mask, close_radius=3):
    closed = binary_closing(mask, structure=disk(close_radius))
    cc = label(closed, connectivity=2)
    sizes = np.bincount(cc.ravel())
    sizes[0] = 0  # ignore background when picking the largest component
    r = regionprops((cc == sizes.argmax()).astype(np.uint8))[0]
    p = max(r.perimeter, 1.0)  # guard against a zero perimeter
    # returns (solidity, compactness), both clamped to 1
    return min(r.solidity, 1.0), min(4 * np.pi * r.area / (p * p), 1.0)

# true cut: two halves of one disk, 2px seam -> high solidity + compactness
yy, xx = np.ogrid[:40, :40]
disk_mask = (yy - 20) ** 2 + (xx - 20) ** 2 <= 12 ** 2
true_cut = disk_mask.copy(); true_cut[19:21, :] = False     # cut into two halves
# false merge: two separate disks side by side -> concave waist, low solidity
false_merge = ((yy - 20) ** 2 + (xx - 12) ** 2 <= 7 ** 2) | ((yy - 20) ** 2 + (xx - 30) ** 2 <= 7 ** 2)
print("true  cut :", merge_shape(true_cut))      # ~ (solidity 0.96, compactness 0.92)
print("false pair:", merge_shape(false_merge))   # ~ (solidity 0.73, compactness 0.47) - concave join
[Figure: shape_features (merge_compactness/merge_solidity separating a true cut from a false merge)]

Validation (sweep, not fit)

Weights are fixed by reasoning (flat-equal); the ground-truth fixture only validates them. Sweeping min_confidence over 3 synthetic layouts (recall = fraction of cut pieces stitched; precision = cut / (cut + intact-false-merge)):

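| min_confidence | recall (5-feature) | precision | intact false-merges |
|----------------|--------------------|-----------|---------------------|
| 0.3 – 0.7      | 0.64               | 1.00      | 0                   |
| 0.8            | 0.58               | 1.00      | 0                   |
| 0.9            | 0.06               | 1.00      | 0                   |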

(The recall ceiling is < 1 because not every cut piece has a stitchable partner.) Default min_confidence=0.7 gives full attainable recall with zero false merges and headroom before the high-threshold drop-off. Users should still tune it for their data — the score is heuristic, not a calibrated probability.
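
For readers who want to reproduce the two sweep metrics on their own fixtures, here is a small sketch of the definitions given above. It assumes "cut" in the precision formula means the number of cut pieces that were stitched; the function name and the counts are invented for illustration and are not the fixture's real numbers.

```python
def sweep_metrics(n_cut_total, n_cut_stitched, n_intact_false_merged):
    """Recall and precision as defined for the min_confidence sweep."""
    # recall = fraction of cut pieces that were stitched
    recall = n_cut_stitched / n_cut_total if n_cut_total else 0.0
    # precision = cut / (cut + intact-false-merge)
    denom = n_cut_stitched + n_intact_false_merged
    # Assumption: report 1.0 when nothing was stitched at all.
    precision = n_cut_stitched / denom if denom else 1.0
    return recall, precision


# Invented counts: 10 cut pieces, 6 stitched, 2 intact cells wrongly merged.
recall, precision = sweep_metrics(10, 6, 2)
print(recall, precision)
```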

[Figure: validation_sweep (recall/precision over the min_confidence sweep on synthetic layouts)]

Obs / uns contract (consumed additively by PR-C)

  • .obs: stitch_group_id (int), is_stitched (bool), n_pieces (int32), stitch_confidence (float64; NaN = not an outlier, 1.0 = evaluated-solo, composite = stitched).
  • .uns["tiling_stitch"]: params, score_features, and the run counts.

PR-C's make_stitched_labels reads only this contract (it does not import _tiling_stitch).

Notes for review

  • Codecov: expect the patch % to dip on _tiling_stitch.py (defensive branches in the geometry helpers); the algorithm paths are covered by tests/experimental/test_tiling_stitch.py.
  • hatch-test.py3.13-pre: the pre-release-deps job is a known upstream flake (pre-existing on #1170 and #1188), not caused by this PR.
  • Visual test: TestStitchVisual renders a seam recolour (before by label_id, after by stitch_group_id); its baseline PNG is bootstrapped from the first CI run's artifact (not generated locally).
  • Also extracts three shared helpers (resolve_params, equivalent_diameter + largest_contour, iter_chunked_regionprops) into experimental/utils/, reused by _tiling_qc/_tiling to remove duplication. QC numerics are unchanged — the committed QC/tiling visual baselines pass.

Out of scope

  • make_stitched_labels + materialisation — PR-C (next, stacks on this).

@selmanozleyen
Member

selmanozleyen commented Jun 1, 2026

I have a problem with the uns keys we give so far, like .uns["tiling_stitch"]. Can't we make a rule that whatever we write to uns as metadata uses the name of the function that wrote it? It would be more obvious imo. Also, a test fails atm.

Comment thread src/squidpy/experimental/tl/_tiling_stitch.py
Comment thread src/squidpy/experimental/tl/_tiling_stitch.py
Member

@selmanozleyen selmanozleyen left a comment


There can be probably more performance issues but I think we just need to see how this performs in real life. So we can tackle the real bottlenecks

timtreis and others added 10 commits June 2, 2026 23:16
Add sq.experimental.tl.assign_stitch_groups + StitchParams: groups tile-cut
cell pieces flagged by calculate_tiling_qc by pairing facing cut edges and
scoring each pair with a transparent weighted mean of five geometric features
(iou, endpoint_match, merge_compactness, merge_solidity, gap_proximity). Only
.obs columns + a tiling_stitch audit block are written; the labels element is
never modified. Weights default flat-equal and are tunable via feature_weights;
no coefficients are fitted or shipped.

Also:
- wire the calculate_tiling_qc hook warning about dropped stitch columns
- extract shared helpers (resolve_params, equivalent_diameter, largest_contour,
  iter_chunked_regionprops) into experimental/utils, reused by _tiling_qc and
  _tiling to remove duplication; QC numerics unchanged (visual baselines pass)
- save_diagnostics writes a zarr-safe per-pair dict-of-arrays to .uns
- clamp merge_solidity and keep gap_proximity neutral when closing is disabled

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Lock the validation-sweep outcome: at min_confidence=0.5 the deterministic
fixture recovers >=50% of cut pieces with no intact false-merges. min_confidence
default stays 0.7 (full attainable recall, zero false merges); gap_proximity
kept in the 5-feature score.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two figures answering the reviewer's request to understand the stitch_confidence
math: merge_compactness/merge_solidity separating a true cut from a false merge,
and the min_confidence validation sweep (recall/precision over synthetic layouts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop granular unit tests of private internals (gap_proximity arithmetic,
feature_weights validation matrix, param-resolution already covered by PR-1,
determinism, numpy coercion). Keep the obs/uns contract PR-C consumes, error
paths, idempotency/inplace, the QC-rerun hook, multiscale, the diagnostics
zarr round-trip, and the visual. 33 tests/552 lines -> 15 tests/259 lines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… not the repo)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per /simplify: revert the merged _tiling.py and _tiling_qc.py back to main
(keep only the _warn_if_dropping_stitch_columns hook the scverse#1170 review asked
for); the shared util helpers are now consumed by the new stitch module only,
no churn in merged files. Also drop the redundant score_formula string (it is
derivable from the recorded feature_weights + score_features) and loop the
per-feature diagnostics fill instead of one line per feature.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Slim the v1 surface to the core feature: stitch_confidence is the flat
(unweighted) mean of the five geometric features; no feature_weights knob and
no save_diagnostics path. Both can land later if a concrete need appears. Also
drop the changelog entry (batch docs separately).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Consolidate the test file into one behavioural class + a visual class (matching
test_tiling_qc.py) with a one-line module docstring. Drop the `# ----` banner
dividers (used only by the recent tiling modules, in no core squidpy file) for
plain section comments, and rename the "Stage N" section labels to plain
descriptions (the numbering had a confusing gap).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the `# ----...` section-divider banners (used in only a few experimental
files, in no core squidpy module) for plain single-line section comments.
Comment-only change; no behaviour affected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… crops

Two performance fixes from @selmanozleyen's review of assign_stitch_groups,
both behavior-preserving (the existing public tests pin the results).

- Early-prune in _score_pairs: compute an optimistic upper bound on the
  flat-mean score from the cheap geometry features (the two shape features are
  each <= 1) and skip the costly union reconstruction when even the best case
  can't reach min_confidence. Sound: the bound never underestimates the real
  score, so no passing pair is dropped. New unit test asserts the shape step is
  skipped for a below-threshold pair and still runs for a passing one.

- Stop re-reading the labels array per pair: _extract_cut_edges already reads
  each outlier cell's bbox crop for contour tracing, so it now also returns a
  {label_id -> boolean bbox mask} dict. _merge_shape_features reconstructs the
  merge union in memory from those crops (placing each at its bbox offset, exact
  border clamping preserved) instead of fetching a fresh union crop from the
  (dask-backed) array on every candidate. Fetches are now bounded by the number
  of outlier cells, not the number of candidate pairs.
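
The in-memory reconstruction in the second item can be pictured with the sketch below. It assumes each cached crop is a boolean mask with a half-open (row0, col0, row1, col1) bbox in full-image coordinates; the function name and data layout are hypothetical, and the border clamping mentioned above is not modelled.

```python
import numpy as np


def union_from_crops(crops, bboxes):
    """Paste per-label boolean crops into one canvas covering all their bboxes."""
    r0 = min(b[0] for b in bboxes.values())
    c0 = min(b[1] for b in bboxes.values())
    r1 = max(b[2] for b in bboxes.values())
    c1 = max(b[3] for b in bboxes.values())
    canvas = np.zeros((r1 - r0, c1 - c0), dtype=bool)
    for lab, mask in crops.items():
        br0, bc0, br1, bc1 = bboxes[lab]
        # Place each crop at its offset relative to the union bbox.
        canvas[br0 - r0:br1 - r0, bc0 - c0:bc1 - c0] |= mask
    return canvas


# Two made-up pieces separated by a one-row seam.
crops = {1: np.ones((2, 3), dtype=bool), 2: np.ones((2, 3), dtype=bool)}
bboxes = {1: (10, 5, 12, 8), 2: (13, 5, 15, 8)}
union = union_from_crops(crops, bboxes)
print(union.astype(int))
```

With this layout, the labels array is read once per outlier cell when the crops are built, and every candidate pair reuses them.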

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@timtreis timtreis force-pushed the feature/tiling-stitch-algo branch from f65dd9b to 08e9fdf Compare June 2, 2026 21:26
timtreis and others added 2 commits June 2, 2026 23:41
…ake_stitched_labels xref)

The -W docs build failed on two unresolved references in autodoc'd symbols:

- Each StitchParams field docstring started with "Advanced: ...". napoleon parses
  a leading "word:" as a typed field, so it tried to cross-reference "Advanced"
  as a class. Drop the redundant prefix (the class docstring already frames these
  as advanced knobs); verified locally with an isolated sphinx build that the
  six warnings disappear.
- assign_stitch_groups referenced make_stitched_labels, which is the unmerged
  PR-C function (not in the API docs). Suppress the link with the `!` prefix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The early-prune in _score_pairs hardcoded the scoring internals (the `+2.0` for
the two shape features and the `/n_features` flat mean). Extract _max_achievable_score,
built on _score_pair_features with the deferred _SHAPE_FEATURES assumed at their
1.0 max, so the bound and the real score share one definition and stay in sync if
the feature set changes. Also dedupes the per-candidate feature dict (`known`
reused for both the bound and the final score). Behavior unchanged.
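
A hedged sketch of the idea behind the shared bound: score the cheap features as known, assume the two deferred shape features sit at their 1.0 maximum, and skip the shape step when even that cannot reach min_confidence. All names here are illustrative stand-ins, not the private functions named above, and the strict "<" comparison is an assumption.

```python
DEFERRED = ("merge_compactness", "merge_solidity")  # the costly shape features


def flat_mean(features):
    return sum(features.values()) / len(features)


def max_achievable(known):
    # Optimistic bound: deferred features assumed at their 1.0 maximum.
    return flat_mean({**known, **{name: 1.0 for name in DEFERRED}})


def score_with_prune(known, compute_shape, min_confidence=0.7):
    if max_achievable(known) < min_confidence:
        return None  # pruned: the shape step is never run
    return flat_mean({**known, **compute_shape()})


calls = []


def fake_shape():
    # Stand-in for the expensive union reconstruction.
    calls.append(1)
    return {"merge_compactness": 0.5, "merge_solidity": 0.5}


weak = {"iou": 0.0, "endpoint_match": 0.0, "gap_proximity": 0.5}
strong = {"iou": 1.0, "endpoint_match": 1.0, "gap_proximity": 1.0}
pruned = score_with_prune(weak, fake_shape)
kept = score_with_prune(strong, fake_shape)
print(pruned, kept, len(calls))
```

Since the real shape features never exceed 1.0, the bound cannot underestimate the true score, so pruning on it drops no pair that would have passed.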

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@timtreis
Member Author

timtreis commented Jun 2, 2026

> There can be probably more performance issues but I think we just need to see how this performs in real life. So we can tackle the real bottlenecks

Agree, the chopping into sub-PR thing bloated everything a bit. The final 3/3 PR now will hopefully tie everything back together so that we can actually optimise

The StitchVisual_seam_group_recolor PlotTester had no committed baseline, so the
stable CI jobs failed with 'Baseline image ... does not exist' (1 failed, 1036
passed). Add the baseline rendered by CI (ubuntu-latest artifact), per the
project's reference-image workflow - never generated locally (a local macOS
render differs by ~RMS 53 vs the tolerance 50 from font/AA platform variance;
the Linux baseline matches the Linux CI run).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 73.83367% with 129 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.30%. Comparing base (da789d0) to head (5ea2cb1).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/squidpy/experimental/tl/_tiling_stitch.py | 75.29% | 65 Missing and 39 partials ⚠️ |
| src/squidpy/experimental/utils/_params.py | 41.17% | 9 Missing and 1 partial ⚠️ |
| src/squidpy/experimental/utils/_labels.py | 68.00% | 6 Missing and 2 partials ⚠️ |
| src/squidpy/experimental/tl/_tiling_qc.py | 75.00% | 2 Missing and 3 partials ⚠️ |
| src/squidpy/experimental/utils/_geometry.py | 80.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1193      +/-   ##
==========================================
- Coverage   75.33%   75.30%   -0.03%     
==========================================
  Files          56       59       +3     
  Lines        7922     8415     +493     
  Branches     1292     1392     +100     
==========================================
+ Hits         5968     6337     +369     
- Misses       1444     1524      +80     
- Partials      510      554      +44     
| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/squidpy/experimental/im/_tiling.py | 86.06% <ø> (ø) |
| src/squidpy/experimental/utils/_geometry.py | 80.00% <80.00%> (ø) |
| src/squidpy/experimental/tl/_tiling_qc.py | 71.33% <75.00%> (+1.30%) ⬆️ |
| src/squidpy/experimental/utils/_labels.py | 73.68% <68.00%> (+4.45%) ⬆️ |
| src/squidpy/experimental/utils/_params.py | 41.17% <41.17%> (ø) |
| src/squidpy/experimental/tl/_tiling_stitch.py | 75.29% <75.29%> (ø) |

timtreis and others added 4 commits June 3, 2026 00:21
The macOS CI render differed from the Linux baseline by RMS 53 (> tol 50), and
the failed-diff showed the *whole* imshow region misaligned - including the
"before" panel, which has no stitching, so it could only be a rendering shift,
not an algorithm difference. Cause: tight_layout sizes the axes from the title
text extents, which differ across platforms (fonts), shifting the image
sub-pixel so every high-contrast cell edge mismatches.

Fix the layout instead of papering over it with a high tolerance: drop the
titles and tight_layout, pin the geometry with a fixed subplots_adjust. The
figure now has no text and the image renders to identical pixels everywhere, so
the default tolerance passes on both platforms. Baseline regenerated from CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pling

Even after dropping titles/tight_layout, Linux vs macOS still differed by RMS ~28,
localized to a row band - the signature of nearest-neighbour resampling when the
100px zoom is upscaled to the axes height: the two matplotlib versions round the
source row differently at the boundary. (Confirmed it was rendering, not the
algorithm: the "before" panel, which has no stitching, differed *more* than the
"after" panel.)

Build the before/after panels as numpy arrays, draw the dashed seam into the
array, and imshow on a full-figure axis sized exactly to the data (figsize * DPI
== array shape). 1:1 nearest with no text and no line AA renders identically on
every platform/matplotlib version, so the default tolerance passes with no
per-test override.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Linux render from the ubuntu CI artifact (204x100, 1:1). Cross-platform delta to
the macOS render is RMS ~30 < the default tolerance 50 (down from the original 53
once the tight_layout/title layout shift was removed); the residual is matplotlib
Agg cross-version edge rasterization, not algorithm (verified: the no-stitching
'before' panel differs as much, and no integer pixel shift reconciles them).

The min_confidence early-prune is behavior-preserving, so the existing public
tests already lock the result; a dedicated test that monkeypatches a private
function and hand-builds _CutEdge objects is exactly the internal coupling this
suite was trimmed away from. _max_achievable_score (sharing _score_pair_features)
makes the prune's soundness self-evident without it.
2 participants