perf: cache static identifiers to avoid per-call interner locks#8614
Conversation
`id()` methods interned their `Id` via `Id::new`/`Id::new_static` on every call, locking the global interner on hot paths. Use a `CachedId` static that interns once and returns the cached `Id` lock-free; same value. Refs: vortex-data#8380 Signed-off-by: Han Damin <miniex@daminstudio.net>
Merging this PR will improve performance by 10.1%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
169.1 µs | 205.6 µs | -17.73% |
| ❌ | Simulation | slice_empty_vortex |
339.4 ns | 397.8 ns | -14.66% |
| ❌ | Simulation | compact_sliced[(4096, 90)] |
750 ns | 866.7 ns | -13.46% |
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
26.3 µs | 16.7 µs | +57.19% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 186.1 ns | +31.34% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
304.7 ns | 246.4 ns | +23.68% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[2048] |
398.6 ns | 340.3 ns | +17.14% |
| ⚡ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
259.7 µs | 224.3 µs | +15.78% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] |
306.7 µs | 271.5 µs | +12.98% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing miniex:perf/cache-static-ids (ab41b71) with develop (bf2be52)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Summary
id()methods for encodings, scalar/aggregate functions, layouts and extension dtypes built theirIdviaId::new/Id::new_static, which interns into a globally locked table (ThreadedRodeo) on every call. As noted in #8380 this made identifier construction a hot-path bottleneck under multi-threaded and random access. They now read aCachedIdstatic that interns once on first access and returns the cachedIdlock-free thereafter; the interned value is unchanged, so encoding and serialization behaviour is identical.This is the migration half of #8380; the workspace
clippylint that forbids the locking constructors (so the regression can't return) follows in a separate PR.Refs: #8380
Testing
cargo nextest run -p vortex-session -p vortex-array -p vortex-layout -p vortex-geo -p vortex-tensor -p vortex-jsonpasses;cargo +nightly fmt --all --checkandcargo clippy --all-targetsclean.vortex-cudacould not be built locally (no CUDA toolchain); its one migratedid()uses the same pattern.I'm Korean, so sorry if any wording reads a little awkward.