Reduce runtime allocation churn #348

Merged: KKould merged 6 commits into main from optimize-runtime-allocations on Jun 6, 2026

@KKould KKould commented Jun 5, 2026

What problem does this PR solve?

Reduce allocator churn in binder/planner/optimizer/storage hot paths observed in TPCC LMDB heaptrack runs.

Issue link:

What is changed and how it works?

This PR reduces short-lived allocation pressure by:

  • reusing column-pruning outcome buffers and required-column state
  • avoiding repeated metadata/container clones in DML execution and table scan planning
  • caching histogram bound comparators
  • reusing HEP optimizer local-rule state across batches
  • merging primary-key column inclusion into storage deserializer construction
  • avoiding unnecessary lowercase string allocations when identifiers are already lowercase
  • adding a tpcc-lmdb-heaptrack Makefile target for repeatable profiling
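
As an illustration of the lowercase-identifier item above, here is a minimal sketch of the general technique, not the code from this PR. The helper name `lowercase_ident` is hypothetical. It borrows the input when lowercasing would not change it and allocates a new `String` only otherwise.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not taken from the PR): returns the identifier in
/// lowercase, borrowing the input when no character would change.
fn lowercase_ident(ident: &str) -> Cow<'_, str> {
    // A character is "already lowercase" if its lowercase mapping is itself.
    let already_lower = ident
        .chars()
        .all(|c| c.to_lowercase().eq(std::iter::once(c)));

    if already_lower {
        // No allocation: hand back the original slice.
        Cow::Borrowed(ident)
    } else {
        // Allocate only for identifiers that actually need folding.
        Cow::Owned(ident.to_lowercase())
    }
}
```

Callers can treat the result as a `&str` through `as_ref()` or deref, so the common case of an already-lowercase identifier costs one scan and no heap allocation.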

heaptrack_print comparison, fresh main vs this branch with columns_len deserializer capacity:

| metric | main | this branch | diff |
| --- | --- | --- | --- |
| allocation calls | 918,551,655 | 801,203,429 | -117,348,226 (-12.78%) |
| temporary allocations | 185,423,184 | 135,002,184 | -50,421,000 (-27.19%) |
| peak heap | 578.98M | 579.19M | +217.53K (+0.036%) |
| runtime | 337.55s | 332.01s | -5.54s |

Notable stack changes from the same reports:

  • Transaction::create_deserializers -> RawVec::grow_one: 1,007,139 allocation calls on main; no longer appears in this branch's report after using table.columns_len() capacity.
  • HepOptimizer::apply_local_rules: -52,118,907 allocation calls, -46,689,484 temporary allocations.
  • BTreeMap::clone_subtree / TableCatalog::clone: -51,039,510 allocation calls.
  • TableScanOperator::build: 39,275,715 -> 22,915,483 allocation calls (-41.65%).
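
The disappearance of `RawVec::grow_one` in the first item is what one would expect from pre-sizing a vector. The sketch below shows that general pattern only. `ColumnDeserializer` and `build_deserializers` are hypothetical stand-ins, and the `columns_len` parameter plays the role that `table.columns_len()` plays in the description above.

```rust
/// Hypothetical stand-in for a per-column deserializer (not the PR's type).
#[derive(Debug, Clone, PartialEq)]
struct ColumnDeserializer {
    column_index: usize,
}

/// Builds one deserializer per projected column.
/// Assumes `projected.len() <= columns_len`, so the single up-front
/// reservation covers every push and the vector never has to grow.
fn build_deserializers(projected: &[usize], columns_len: usize) -> Vec<ColumnDeserializer> {
    // One allocation sized for the whole table instead of repeated growth.
    let mut out = Vec::with_capacity(columns_len);
    for &column_index in projected {
        out.push(ColumnDeserializer { column_index });
    }
    out
}
```

The trade-off is that the vector may reserve more slots than the projection uses, which is consistent with the near-zero change in peak heap reported above.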

TPCC 720s comparison using the current benchmark fixes and RocksDB default RepeatableRead on both sides (origin/main ef9a534 + benchmark/default-RR patch vs this branch):

| build | backend | TpmC | vs patched main |
| --- | --- | --- | --- |
| patched main | LMDB | 63,246 | baseline |
| this branch | LMDB | 68,394 | +5,148 (+8.1%) |
| patched main | RocksDB | 27,540 | baseline |
| this branch | RocksDB | 30,387 | +2,847 (+10.3%) |

TPCC p90 latency from the same 720s runs:

| build | backend | New-Order | Payment | Order-Status | Delivery | Stock-Level |
| --- | --- | --- | --- | --- | --- | --- |
| patched main | LMDB | 0.001s | 0.001s | 0.001s | 0.002s | 0.001s |
| this branch | LMDB | 0.001s | 0.001s | 0.001s | 0.002s | 0.001s |
| patched main | RocksDB | 0.001s | 0.001s | 0.002s | 0.015s | 0.003s |
| this branch | RocksDB | 0.001s | 0.001s | 0.001s | 0.015s | 0.002s |

Code changes

  • Has Rust code change
  • Has CI related scripts change

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual test / profiling:

cargo fmt --check
cargo test --lib storage::
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_main_fresh.zst > /tmp/tpcc_lmdb_heaptrack_main_fresh.report.txt
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_current_columns_len.zst > /tmp/tpcc_lmdb_heaptrack_current_columns_len.report.txt
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_current_columns_len.zst --diff /tmp/tpcc_lmdb_heaptrack_main_fresh.zst > /tmp/tpcc_lmdb_heaptrack_current_columns_len.diff_main.txt

# TPCC 720s comparison, num_ware=1, default max_retry=5.
target/release/tpcc --backend kitesql-lmdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_branch_lmdb
target/release/tpcc --backend kitesql-rocksdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_branch_rocksdb
# Repeated in a temporary origin/main worktree with the same benchmark/default-RR patch applied.
target/release/tpcc --backend kitesql-lmdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_main_lmdb
target/release/tpcc --backend kitesql-rocksdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_main_rocksdb

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Note for reviewer

The optimization mainly reduces short-lived allocations. Peak heap is essentially unchanged, which matches the shape of the changes.
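
To make that shape concrete, here is a minimal sketch of buffer reuse, the general pattern behind several items in this PR. It is illustrative only: `RuleScratch` and `collect_matches` are hypothetical names, and the even-number filter is a placeholder for real rule matching. Clearing a `Vec` keeps its capacity, so later batches reuse the same heap block instead of allocating and freeing a fresh one.

```rust
/// Hypothetical scratch state kept alive across batches (not the PR's type).
struct RuleScratch {
    matched: Vec<usize>,
}

impl RuleScratch {
    fn new() -> Self {
        Self { matched: Vec::new() }
    }

    /// Collects the indices of even values into the reused buffer.
    /// The even-number test is a placeholder for a real rule predicate.
    fn collect_matches(&mut self, nodes: &[u32]) -> &[usize] {
        // `clear` drops the elements but keeps the allocation.
        self.matched.clear();
        self.matched.extend(
            nodes
                .iter()
                .enumerate()
                .filter(|(_, n)| **n % 2 == 0)
                .map(|(i, _)| i),
        );
        &self.matched
    }
}
```

Once the buffer has grown to the largest batch seen, subsequent calls perform no allocation, which lowers the count of temporary allocations without changing how much memory is live at the peak.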

@KKould KKould self-assigned this Jun 5, 2026
@KKould KKould added the perf label Jun 5, 2026
@KKould KKould merged commit c0e63a0 into main Jun 6, 2026
14 checks passed