
fix: CUDA bitpacked sliced output allocation #8622

Merged: 0ax1 merged 1 commit into develop from ad/fix-cuda-bitpacked-slice-offset on Jun 29, 2026.

Commit f8a35cc: Fix CUDA bitpacked sliced output allocation
CodSpeed HQ / CodSpeed Performance Analysis failed Jun 29, 2026 in 0s

1 benchmark regressed

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 2 improved benchmarks
❌ 1 regressed benchmark
✅ 1592 untouched benchmarks
⏩ 4 skipped benchmarks (see footnote 1)

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | slice_empty_vortex | 339.4 ns | 397.8 ns | -14.66% |
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 26.3 µs | 15.9 µs | +65.8% |
| Simulation | encode_varbin[(1000, 32)] | 163.7 µs | 146.9 µs | +11.45% |



Comparing ad/fix-cuda-bitpacked-slice-offset (f8a35cc) with develop (a9f77d1)


Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.