Prepare for large-tree support #193

Open
ms609 wants to merge 6 commits into main from large-tree-support

@ms609 ms609 commented Apr 1, 2026

…port

The two splitbit b_complement[SL_MAX_SPLITS][SL_MAX_BINS] stack arrays in robinson_foulds_distance() and robinson_foulds_info() would overflow the stack when compiled against TreeTools with SL_MAX_TIPS = 32768 (about 128 MB per array, assuming 64-bit bins: roughly 32768 splits × 512 bins × 8 bytes).

Replace with std::vector<splitbit> sized to actual dimensions (b.n_splits * n_bins). These are serial per-pair paths (reportMatching = TRUE), so heap allocation cost is negligible.
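
A minimal sketch of the flat-buffer layout described above, not the code in this PR. It assumes `splitbit` is a 64-bit unsigned integer; `complement_splits` and `last_bin_mask` are hypothetical names used only for illustration. The buffer is sized from the actual split and bin counts and indexed row-major as `i * n_bins + bin`, in place of a fixed `[SL_MAX_SPLITS][SL_MAX_BINS]` array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using splitbit = std::uint64_t;  // assumption: bins are 64 bits wide

// Hypothetical helper: returns the bitwise complement of every split,
// stored in one heap-allocated, row-major buffer of n_splits * n_bins bins.
std::vector<splitbit> complement_splits(const std::vector<splitbit>& state,
                                        std::size_t n_splits,
                                        std::size_t n_bins,
                                        splitbit last_bin_mask) {
  assert(state.size() == n_splits * n_bins);
  std::vector<splitbit> out(n_splits * n_bins);
  for (std::size_t i = 0; i < n_splits; ++i) {
    for (std::size_t bin = 0; bin < n_bins; ++bin) {
      out[i * n_bins + bin] = ~state[i * n_bins + bin];
    }
    // Clear the padding bits that lie beyond the final tip.
    if (n_bins > 0) {
      out[i * n_bins + n_bins - 1] &= last_bin_mask;
    }
  }
  return out;
}
```

Because the vector is sized at run time, memory use scales with the trees being compared rather than with the compile-time maximum.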

Also upgrade assert() to static_assert() in tree_distances.h for the int16 width checks — these now fire at compile time rather than silently passing in release builds.
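
As an illustration of the difference, here is a sketch under stated assumptions rather than the checks in tree_distances.h: `int16` is taken to be `std::int16_t`, and `kMaxSplits` and `fits_in_int16` are hypothetical. A plain `assert()` compiles to nothing when `NDEBUG` is defined, whereas `static_assert` stops compilation if the condition is false.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using int16 = std::int16_t;  // assumption: a 16-bit signed index type

// Hypothetical compile-time limit, used only for this example.
constexpr long kMaxSplits = 2048 - 3;

// Evaluated by the compiler: a value that does not fit fails the build,
// regardless of whether NDEBUG is defined.
static_assert(kMaxSplits <= std::numeric_limits<int16>::max(),
              "split count must fit in int16");

// Hypothetical helper for values that are only known at run time.
constexpr bool fits_in_int16(long value) {
  return value >= std::numeric_limits<int16>::min() &&
         value <= std::numeric_limits<int16>::max();
}
```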

ms609 added 3 commits April 1, 2026 14:08
All 7 serial per-pair distance functions in tree_distances.cpp now check
for user interrupt every 1024 iterations of the outer split loop.  This
allows Ctrl+C to break long-running single-pair computations on large
trees (e.g. 25 000 tips) that previously ran uninterruptibly.

The LAP solver already had interrupt support (allow_interrupt = true).
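
The polling pattern can be sketched as follows. This is an illustration, not the code in this PR: `visit_splits` and the `check_interrupt` callback are hypothetical, and the per-split work is a placeholder. In an Rcpp-based package the callback would presumably be `Rcpp::checkUserInterrupt()`; a callback is used here so the sketch compiles without R.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical outer loop over splits that polls for an interrupt once
// every 1024 iterations, so the check costs little per iteration.
std::size_t visit_splits(std::size_t n_splits,
                         const std::function<void()>& check_interrupt) {
  constexpr std::size_t kInterval = 1024;
  std::size_t processed = 0;
  for (std::size_t i = 0; i < n_splits; ++i) {
    if (i % kInterval == 0) {
      check_interrupt();  // may throw or long-jump out in a real R session
    }
    ++processed;  // placeholder for the per-split work
  }
  return processed;
}
```

With 3000 splits this sketch polls at iterations 0, 1024 and 2048.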

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 95.38462% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.92%. Comparing base (8803015) to head (3d9e1d1).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tree_distances.cpp | 91.42% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #193      +/-   ##
==========================================
+ Coverage   95.78%   95.92%   +0.13%     
==========================================
  Files          57       57              
  Lines        5508     5549      +41     
==========================================
+ Hits         5276     5323      +47     
+ Misses        232      226       -6     

github-actions bot commented Apr 1, 2026

⚠️ This benchmark result is outdated. See the latest comment below.

Performance benchmark results

| Call | Status | Change | Time (ms) |
|---|---|---|---|
| ClusteringInfoDistance(tr200) | ⚪ NSD | -1.13% | 15.2 → 15.4, 15.4 |
| ClusteringInfoDistance(tr50) | ⚪ NSD | 0.29% | 11.7 → 11.5, 11.6 |
| LAPJV(test2000) | ⚪ NSD | -0.87% | 93.6 → 96.4, 93.6 |
| LAPJV(test40) | ⚪ NSD | -0.51% | 0.0177 → 0.0179, 0.0178 |
| LAPJV(test400) | ⚪ NSD | -0.02% | 3.19 → 3.18, 3.2 |
| MutualClusteringInfo(tr200) | ⚪ NSD | 1.56% | 22.4 → 22.3, 21.9 |
| MutualClusteringInfo(tr50) | ⚪ NSD | 0.93% | 23.3 → 23.3, 22.8 |
| PathDist(postTrees) | ⚪ NSD | 3.64% | 3.47 → 3.35, 3.34 |
| PhylogeneticInfoDistance(tr200) | 🟣 ~same | -1.48% | 235 → 238, 239 |
| PhylogeneticInfoDistance(tr50) | ⚪ NSD | -0.23% | 81.2 → 81.5, 81.4 |
| RobinsonFoulds(tr200) | ⚪ NSD | -1.23% | 2.87 → 2.93, 2.88 |
| RobinsonFoulds(tr200) | ⚪ NSD | -2.36% | 2.63 → 2.68, 2.7 |
| RobinsonFoulds(tr50) | ⚪ NSD | -2.26% | 4.35 → 4.44, 4.44 |

github-actions bot commented Apr 1, 2026

⚠️ This benchmark result is outdated. See the latest comment below.

Performance benchmark results

| Call | Status | Change | Time (ms) |
|---|---|---|---|
| ClusteringInfoDistance(tr200) | ⚪ NSD | 0.69% | 15.3 → 15.2, 15.2 |
| ClusteringInfoDistance(tr50) | ⚪ NSD | -0.74% | 11.5 → 11.6, 11.5 |
| LAPJV(test2000) | 🟠 Slower 🙁 | -5.76% | 90.3 → 95.5, 95.2 |
| LAPJV(test40) | ⚪ NSD | -0.06% | 0.0176 → 0.0176, 0.0177 |
| LAPJV(test400) | ⚪ NSD | -0.08% | 3.18 → 3.18, 3.18 |
| MutualClusteringInfo(tr200) | ⚪ NSD | 0.27% | 21.5 → 21.4, 21.4 |
| MutualClusteringInfo(tr50) | ⚪ NSD | -2.45% | 21.2 → 22.5, 21.3 |
| PathDist(postTrees) | ⚪ NSD | -1.41% | 3.33 → 3.41, 3.36 |
| PhylogeneticInfoDistance(tr200) | ⚪ NSD | -0.31% | 233 → 233, 234 |
| PhylogeneticInfoDistance(tr50) | ⚪ NSD | 0.47% | 81.6 → 81.5, 81 |
| RobinsonFoulds(tr200) | ⚪ NSD | -0.66% | 2.84 → 2.87, 2.85 |
| RobinsonFoulds(tr200) | ⚪ NSD | -0.23% | 2.63 → 2.65, 2.63 |
| RobinsonFoulds(tr50) | ⚪ NSD | -1.56% | 4.34 → 4.41, 4.39 |

github-actions bot commented Apr 1, 2026

Performance benchmark results

| Call | Status | Change | Time (ms) |
|---|---|---|---|
| ClusteringInfoDistance(tr200) | ⚪ NSD | -0.18% | 15.2 → 15.3, 15.1 |
| ClusteringInfoDistance(tr50) | ⚪ NSD | -1.16% | 11.4 → 11.3, 11.6 |
| LAPJV(test2000) | 🟣 ~same | -3.76% | 89.2 → 92, 93.4 |
| LAPJV(test40) | ⚪ NSD | -1.88% | 0.0176 → 0.018, 0.0179 |
| LAPJV(test400) | 🟣 ~same | -1.38% | 3.17 → 3.22, 3.21 |
| MutualClusteringInfo(tr200) | ⚪ NSD | -1.4% | 21.1 → 21.4, 21.3 |
| MutualClusteringInfo(tr50) | ⚪ NSD | 2.39% | 21.3 → 20.9, 20.7 |
| PathDist(postTrees) | ⚪ NSD | -3.39% | 3.31 → 3.42, 3.38 |
| PhylogeneticInfoDistance(tr200) | 🟣 ~same | 0.38% | 234 → 233, 233 |
| PhylogeneticInfoDistance(tr50) | 🟣 ~same | -0.63% | 80.7 → 80.9, 81.2 |
| RobinsonFoulds(tr200) | ⚪ NSD | -0.98% | 2.83 → 2.85, 2.86 |
| RobinsonFoulds(tr200) | ⚪ NSD | -1.65% | 2.61 → 2.64, 2.66 |
| RobinsonFoulds(tr50) | ⚪ NSD | -0.91% | 4.3 → 4.35, 4.32 |
