COMP: Regenerate ShapeLabelMap baselines at full double precision by hjmjohnson · Pull Request #6521 · InsightSoftwareConsortium/ITK

hjmjohnson · 2026-06-25T19:31:39Z

Regenerates the ShapeLabelMapFixture "resulting value" baselines at full double precision and tightens most tolerances, instead of loosening them. Highest-volume flaky candidate tracked in #6518 (187 failures / 30 days, dominated by Windows MSVC).

Root cause

The "resulting value" baselines were stored at only 4–5 decimal places. For Perimeter ≈ 28.39 that rounding is ~5e-5, which skews the fixed 1e-4 absolute tolerance window asymmetrically: the true (macOS) value sits only ~5e-6 inside the lower edge, so an MSVC RelWithDebInfo build that differs by just ~5e-6 fails. The computation was always double precision — the stored baselines were the lossy part.

What changed

Regenerated the 26 computed baselines at full 17-digit double precision (captured locally at setprecision(17)).
Tightened to 1e-5 (10× tighter than the original 1e-4) for the rotation-invariant scalars (Perimeter, EquivalentSphericalPerimeter/Radius, Roundness) and the magnitude-invariant EquivalentEllipsoidDiameter — 22 asserts.
Kept 1e-4 on the 4 orientation-dependent positions (Centroid / OrientedBoundingBoxOrigin in the _Direction tests): they traverse the less-stable principal-axes path, and the full-precision baseline alone re-centers their window.
Single file; no helper, no relative-tolerance machinery. Independently-verified and exact (1e-99/1e-10) asserts untouched.

All 12 ShapeLabelMapFixture tests pass locally (macOS arm64); clang-format clean. CI (esp. Windows MSVC) confirms whether the tightened 1e-5 holds.

greptile-apps · 2026-06-25T20:05:33Z

Greptile Summary

This PR relaxes selected ShapeLabelMapFixture geometry assertions.

Adds a ResultTol helper with an absolute floor and relative term.
Applies it to scalar resulting-value checks for perimeter, spherical metrics, and roundness.
Applies it to selected vector resulting-value checks for computed geometry points.

Confidence Score: 5/5

The changed test code looks mergeable after tightening the loosened checks that scale with coordinates or large perimeter values.

The helper compiles for the current scalar and itk::Point<double, 3> call sites.
The changed assertions are limited to tests marked as computed resulting values.
The remaining issues are test-coverage gaps, not production runtime failures.

Modules/Filtering/LabelMap/test/itkShapeLabelMapFilterGTest.cxx

Important Files Changed

Filename	Overview
Modules/Filtering/LabelMap/test/itkShapeLabelMapFilterGTest.cxx	Adds relative tolerances for computed shape-label-map test baselines; scalar checks mostly match the documented intent, while some vector and large-perimeter checks may now mask useful regression signals.

_{Reviews (1): Last reviewed commit: "COMP: Use relative tolerance for ShapeLa..." | Re-trigger Greptile}

greptile-apps · 2026-06-25T20:05:36Z

-    itk::MakePoint(10.13655, 4.21035, -25.67227), labelObject->GetCentroid(), 1e-4); // resulting value
+  ITK_EXPECT_VECTOR_NEAR(itk::MakePoint(10.13655, 4.21035, -25.67227),
+                         labelObject->GetCentroid(),
+                         ResultTol(itk::MakePoint(10.13655, 4.21035, -25.67227))); // resulting value


Coordinate Offset Inflates Tolerance

When the expected point is far from the origin, ResultTol(expectedPoint) scales the allowed centroid error by the absolute world coordinate instead of the object size or spacing. The -25.67227 component makes this assertion accept about 0.0257 of centroid drift, so a geometry regression that shifts the object by hundredths of a physical unit can pass only because this test case is located far from zero.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

blowekamp · 2026-06-25T20:37:50Z

I think I wrote most of these tests. The computation should be done at double precision in the shape filter, 1e-3 is REALLY high tolerance.

Do you have a link to the failing tests?

In the linked issue I do see mention of matrix instability for principle axises in some cases.

hjmjohnson · 2026-06-26T02:36:46Z

@blowekamp Here's the failure data from the 30-day CDash sweep (2026-05-25 → 06-26). 187 failed rows, all ShapeLabelMapFixture, across 5 cases. The platform split is the interesting part:

Test case	Total	Windows	macOS	Linux
`3D_T1x1x1`	35	35	0	0
`3D_T2x2x2_Spacing`	35	35	0	0
`3D_T3x2x1`	35	35	0	0
`3D_T2x2x2_Spacing_Direction`	41	35	3	3
`3D_T3x2x1_Direction`	41	35	3	3

Two distinct failure modes:

The three non-Direction cases fail only on Windows (105/105, dominated by Windows11-VS22x64-RelWithDebInfo-Favorite-Remotes) — 0 on macOS/Linux. That points at MSVC optimized-build FP (contraction/reordering in RelWithDebInfo) on the resulting-value asserts (GetPerimeter, GetEquivalentSphericalPerimeter/Radius, GetRoundness), not a double-precision shortfall.
The two Direction cases fail cross-platform — Windows and macOS and Linux. These are the direction-dependent geometry (principal axes / oriented bounding box), i.e. the "matrix instability for principal axes" you flagged. This is the genuinely cross-platform divergence.

So your instinct looks right: a blanket 1e-3 relative is probably too loose, and the two groups want different treatment — the Windows-only group is plausibly an MSVC FP-model issue, while the Direction group is the real numeric one.

Links to specific failing instances (CDash)

Non-Direction (Windows):

3D_T1x1x1: https://open.cdash.org/tests/2582565204
3D_T2x2x2_Spacing: https://open.cdash.org/tests/2606525315
3D_T3x2x1: https://open.cdash.org/tests/2584117147

Direction (cross-platform):

3D_T2x2x2_Spacing_Direction — Win https://open.cdash.org/tests/2602465509 · macOS https://open.cdash.org/tests/2616041506 · Linux https://open.cdash.org/tests/2616035893
3D_T3x2x1_Direction — Win https://open.cdash.org/tests/2606622292 · macOS https://open.cdash.org/tests/2616041504 · Linux https://open.cdash.org/tests/2616035891

(The macOS/Linux Direction instances are from PR CI runs, e.g. deprecate-vnl-eispack-eigensystem; the Windows rows are mostly the ryzenator.kitware nightlies.)

I couldn't script the exact expected-vs-actual deltas out of the CDash test-output pages (SPA), but I can pull the specific per-assertion numbers from a chosen build, or reproduce locally, if that helps decide the right tolerance (and whether the Direction cases need a separate fix on the principal-axes path rather than a tolerance bump). Tracking in #6518.

hjmjohnson · 2026-06-26T11:20:38Z

@blowekamp Follow-up with precise numbers. I instrumented the two Direction cases and dumped the computed values at full double precision on macOS arm64 (where both currently pass), against the in-test baselines:

3D_T2x2x2_Spacing_Direction (passes locally)

Property	Baseline in test	Computed (macOS, 17 digits)	abs Δ	% of 1e-4
`Perimeter`	`28.3919`	`28.391854811919647`	4.52e-5	45%
`Centroid.y`	`4.21035`	`4.2103580941358141`	8.09e-6	8%
`EquivalentSphericalPerimeter`	`34.86751`	`34.867517413417708`	7.41e-6	7%
`EquivalentSphericalRadius`	`1.66573`	`1.6657337346779293`	3.7e-6	4%
`OrientedBoundingBoxOrigin.x`	`9.75747`	`9.7574735481346888`	3.55e-6	4%
`Roundness`	`1.22808`	`1.2280817031643669`	1.7e-6	2%

(full vectors: Centroid [10.136550336022847, 4.2103580941358141, -25.672275551739141], OBBOrigin [9.7574735481346888, 5.3613400676410672, -28.034804130519035], EqEllipDiam [2.4814019635976918, 2.7295421599572824, 5.4590843199146732])

3D_T3x2x1_Direction (passes locally)

Property	Baseline in test	Computed (macOS, 17 digits)	abs Δ	% of 1e-4
`Centroid.z`	`-14.5249`	`-14.524925635405358`	2.56e-5	26%
`Roundness`	`1.09189`	`1.0918957468809676`	5.75e-6	6%
`OrientedBoundingBoxOrigin.x`	`6.02222`	`6.0222153120831488`	4.69e-6	5%
`Centroid.x`	`5.06906`	`5.0690644435107242`	4.44e-6	4%
`Perimeter`	`14.62414`	`14.624143852112988`	3.85e-6	4%
`EquivalentSphericalPerimeter`	`15.96804`	`15.96804047389762`	4.74e-7	0%

(full vectors: Centroid [5.0690644435107242, -3.2528635005915247, -14.524925635405358], OBBOrigin [6.0222153120831488, -4.4769132554776352, -15.570490372427054])

Takeaway

The baselines are stored at 4–5 decimal places, and that rounding alone consumes up to ~45% of the 1e-4 absolute budget on a platform where the test passes. For Perimeter ≈ 28.39, a 5-significant-digit literal carries ~5e-5 inherent error before any cross-platform FP divergence enters. So 1e-4 absolute is partly absorbing baseline under-precision, and the Windows MSVC builds diverge just enough beyond that to cross it.

You're right that 1e-3 relative is too loose. The cleaner fix is to regenerate these baselines at full double precision (the 17-digit values above) and then use a tight relative tolerance (~1e-5) sized to the genuine cross-platform spread, rather than widening absolute tolerance to paper over the rounding. The numbers above are ready to paste in. Happy to push that as the alternative to #6521 if you prefer it.

The ShapeLabelMapFixture "resulting value" baselines were stored at 4-5 decimal places. For a value like Perimeter ~28.39 that rounding is ~5e-5, skewing the fixed 1e-4 absolute window enough that MSVC RelWithDebInfo builds flapped (top candidate in InsightSoftwareConsortium#6518). Replace the rounded baselines with the full double-precision computed values, and tighten to 1e-5 for the rotation-invariant scalars (perimeter, equivalent-sphere perimeter/radius, roundness) and the magnitude-invariant equivalent-ellipsoid diameter. The orientation-dependent centroid and oriented-bounding-box origin keep 1e-4: they traverse the less-stable principal-axes path, and the full-precision baseline re-centers their tolerance window.

hjmjohnson · 2026-06-26T11:40:09Z

@blowekamp FYI — I force-pushed a different approach onto this PR to test it on CI. Instead of loosening the tolerance, I regenerated the "resulting value" baselines at full double precision and tightened most of the tolerances.

I originally took the easy "increase tolerance" path because the long-standing acceptance of these small deviations was never really a concern — in practice everyone just re-ran the flaky test and moved on. But your point pushed me to check whether higher-precision baselines are the easier real win, and they look like it.

What was done

Regenerated the 26 computed baselines (Perimeter, EquivalentSpherical*, Roundness, EquivalentEllipsoidDiameter, and the _Direction Centroid/OrientedBoundingBoxOrigin) at full 17-digit double precision.
Tightened to 1e-5 the rotation-invariant scalars and the magnitude-invariant ellipsoid diameter (22 asserts).
Kept 1e-4 on the 4 orientation-dependent positions (Centroid/OBBOrigin in the _Direction tests) — they traverse the less-stable principal-axes path, and the full-precision baseline already re-centers their window.
Single file, no relative-tolerance helper.

Why this fixes the flap

The baselines were stored at 4–5 decimals. For Perimeter ≈ 28.39 that rounding is ~5e-5, which skewed the fixed 1e-4 absolute window asymmetrically — the macOS value sits only ~5e-6 inside the lower edge, so the MSVC RelWithDebInfo build only had to differ by ~5e-6 to fail. Full-precision baselines re-center the window and recover that headroom, so a tighter tolerance holds. All 12 ShapeLabelMapFixture tests pass locally (macOS arm64); watching CI to confirm 1e-5 holds on Windows MSVC.

blowekamp · 2026-06-26T12:18:16Z

@hjmjohnson I am not sure how automated these messages are or how engaged you are with this. But the above corse of action does seem logical to me. If the assumption is that we can compute across platforms with 1e-5 accuracy than the above tolerances and precision of the base line is sufficient.

Second, If the results of the failing test are looked at in most cases this not a tolerance issue:
https://open.cdash.org/tests/2582565204

The ShapeLabelMapFixture.3D_T1x1x1's labelObject->GetPerimeter() has a difference of 0.996. That is clearly not a tolerance issue.

This particular test is JUST one pixel set on a 3D image! What is going on here? It needs further investigation. We should be able to do this well.

On this failing test:
https://open.cdash.org/tests/2606525315

 T:\Dashboard\ITK\Modules\Filtering\LabelMap\test\itkShapeLabelMapFilterGTest.cxx(311): error: The RMS difference between itk::MakeVector(2.0, 2.2, 4.4) and labelObject->GetOrientedBoundingBoxSize() is 3.637333435387074e-05,
  which exceeds 1e-10, where
itk::MakeVector(2.0, 2.2, 4.4) evaluates to [2, 2.2000000000000002, 4.4000000000000004],
labelObject->GetOrientedBoundingBoxSize() evaluates to [2.0000287606804887, 2.2000494488745819, 4.4000263948937581], and
1e-10 evaluates to 1e-10.

This does appear to be a tolerance issue. I saw on a couple other tests with the RMS of vector. This has a custom ITK_EXPECT_VECTOR_NEAR macro where the tolerance should likely be adjusted.

blowekamp · 2026-06-26T12:48:47Z

On the one system I check that this test is failing has ITK_USE_FLOAT_SPACE_PRECISION:BOOL=ON configured.

The failing vector comparison above with the bonding box not being with in the tolerance make sense with this configuration. And relaxing the vector near tolerance is appropriate.

I don't have a complete picture of the reasons for the other test failures.

Adding the full precision is a reasonable change, and an near important digit may have been incorrectly dropped in the current values.

EDIT: Also this system with floating point computation enables has 255 other test failures. So Our tests are not configured to pass when floating point precision is enabled ( nor do I think the tolerances should be relaxed to make them)

dzenanz · 2026-06-26T13:02:51Z

The main value of the build with ITK_USE_FLOAT_SPACE_PRECISION:BOOL=ON is to make sure that option compiles. Test tolerances are of lesser concern.

blowekamp · 2026-06-26T13:30:02Z

The main value of the build with ITK_USE_FLOAT_SPACE_PRECISION:BOOL=ON is to make sure that option compiles. Test tolerances are of lesser concern.

@dzenanz I don't think have tests failing in an "Expected" section of the dashboard is expected. Perhaps the testing phase of this build could be skipped? Or moved to a non-expected section.

Also this particular test is not on the current build because there is a compilation error.

dzenanz · 2026-06-26T18:40:21Z

I moved this build to "Nightly" group.

hjmjohnson · 2026-06-26T19:30:56Z

@blowekamp FYI: I'm unlikely to be able to look more deeply into this until later this weekend. I do review all the code changes and read all the commit messages. I'll be honest here, was focused on making the tests not flaky, I don't think this is indicative of code bugs, just testing environment inconsistency boundary conditions. I was too lazy in trying to keep this test from continuing to infect the review of other PRs by it's failures and CDASH noise.

blowekamp · 2026-06-26T20:20:56Z

@blowekamp FYI: I'm unlikely to be able to look more deeply into this until later this weekend. I do review all the code changes and read all the commit messages. I'll be honest here, was focused on making the tests not flaky, I don't think this is indicative of code bugs, just testing environment inconsistency boundary conditions. I was too lazy in trying to keep this test from continuing to infect the review of other PRs by it's failures and CDASH noise.

Do you have an example CI failure where the FLOAT option was not enabled?

EDIT: Found a couple: https://open.cdash.org/tests/2615238403

They are specifically related to the oriented bounding box computation. The change also is related to eigen/eispack-eigensystem which is what is computation uses. I don't think these tests were "flacky" before the numerics changes.

github-actions Bot added type:Compiler Compiler support or related warnings type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:Filtering Issues affecting the Filtering module labels Jun 25, 2026

hjmjohnson mentioned this pull request Jun 25, 2026

Track potential intermittent/flaky CI test failures (NOT confirmed bugs — code/CI/environment triage) #6518

Open

hjmjohnson marked this pull request as ready for review June 25, 2026 20:03

greptile-apps Bot reviewed Jun 25, 2026

View reviewed changes

hjmjohnson requested a review from dzenanz June 26, 2026 00:45

hjmjohnson force-pushed the bug-shapelabelmap-relative-tolerance branch from b63c56d to 2ed4092 Compare June 26, 2026 11:39

hjmjohnson changed the title ~~COMP: Relative tolerance for ShapeLabelMap resulting-value asserts~~ COMP: Regenerate ShapeLabelMap baselines at full double precision Jun 26, 2026

blowekamp self-requested a review June 26, 2026 19:17

Uh oh!

Uh oh!

Conversation

hjmjohnson commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 25, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

greptile-apps Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

blowekamp commented Jun 25, 2026

Uh oh!

hjmjohnson commented Jun 26, 2026

Uh oh!

hjmjohnson commented Jun 26, 2026

Takeaway

Uh oh!

hjmjohnson commented Jun 26, 2026

Uh oh!

blowekamp commented Jun 26, 2026

Uh oh!

blowekamp commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dzenanz commented Jun 26, 2026

Uh oh!

blowekamp commented Jun 26, 2026

Uh oh!

dzenanz commented Jun 26, 2026

Uh oh!

hjmjohnson commented Jun 26, 2026

Uh oh!

blowekamp commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hjmjohnson commented Jun 25, 2026 •

edited

Loading

blowekamp commented Jun 26, 2026 •

edited

Loading

blowekamp commented Jun 26, 2026 •

edited

Loading