Reduce runtime allocation churn by KKould · Pull Request #348 · KipData/KiteSQL

KKould · 2026-06-05T21:43:27Z

What problem does this PR solve?

Reduce allocator churn in binder/planner/optimizer/storage hot paths observed in TPCC LMDB heaptrack runs.

Issue link:

What is changed and how it works?

This PR reduces short-lived allocation pressure by:

reusing column-pruning outcome buffers and required-column state
avoiding repeated metadata/container clones in DML execution and table scan planning
caching histogram bound comparators
reusing HEP optimizer local-rule state across batches
merging primary-key column inclusion into storage deserializer construction
avoiding unnecessary lowercase string allocations when identifiers are already lowercase
adding a tpcc-lmdb-heaptrack Makefile target for repeatable profiling
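
The lowercase-identifier item above can be illustrated with a minimal sketch. It is not the code from this PR: the helper name `lowercase_ident` is hypothetical, and it assumes ASCII-only case folding. The idea is to borrow the input when it is already lowercase and allocate a new `String` only when a character actually has to change.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not KiteSQL's actual API).
/// Assumption: identifiers only need ASCII case folding.
/// Returns the input borrowed when it contains no ASCII uppercase byte,
/// and allocates a lowercased copy otherwise.
fn lowercase_ident(ident: &str) -> Cow<'_, str> {
    if ident.bytes().any(|b| b.is_ascii_uppercase()) {
        // Slow path: at least one byte must change, so allocate once.
        Cow::Owned(ident.to_ascii_lowercase())
    } else {
        // Fast path: already lowercase, no allocation.
        Cow::Borrowed(ident)
    }
}
```

Callers that only read the identifier can treat the result as `&str` through deref, so the fast path never touches the allocator.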

heaptrack_print comparison of a fresh main build against this branch (with deserializer capacity taken from columns_len):

metric	main	this branch	diff
allocation calls	918,551,655	801,203,429	-117,348,226 (-12.78%)
temporary allocations	185,423,184	135,002,184	-50,421,000 (-27.19%)
peak heap	578.98M	579.19M	+217.53K (+0.036%)
runtime	337.55s	332.01s	-5.54s
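
For readers checking the diff column, the percentages appear to be plain relative changes against main. A small sketch, with a hypothetical function name, that reproduces the first two rows:

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the branch is lower than main.
fn relative_change_percent(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Applied to the allocation-call counts (918,551,655 on main, 801,203,429 on this branch) this gives roughly -12.78%, and applied to temporary allocations roughly -27.19%, matching the table.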

Notable stack changes from the same reports:

Transaction::create_deserializers -> RawVec::grow_one: 1,007,139 allocation calls on main; no longer appears in this branch's report after using table.columns_len() capacity.
HepOptimizer::apply_local_rules: -52,118,907 allocation calls, -46,689,484 temporary allocations.
BTreeMap::clone_subtree / TableCatalog::clone: -51,039,510 allocation calls.
TableScanOperator::build: 39,275,715 -> 22,915,483 allocation calls (-41.65%).
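
The `RawVec::grow_one` entry comes from a `Vec` growing while elements are pushed into it. A minimal sketch of the fix pattern, assuming the element count is known up front: `ColumnDeserializer` and `build_deserializers` are hypothetical stand-ins, not the types or functions in this PR, and `columns_len` plays the role of `table.columns_len()`.

```rust
/// Hypothetical stand-in for a per-column deserializer.
#[derive(Debug, Clone, PartialEq)]
struct ColumnDeserializer {
    column_index: usize,
}

/// Build one deserializer per column.
/// Sizing the Vec once with `with_capacity` means the pushes below
/// never have to grow the backing buffer.
fn build_deserializers(columns_len: usize) -> Vec<ColumnDeserializer> {
    let mut deserializers = Vec::with_capacity(columns_len);
    for column_index in 0..columns_len {
        deserializers.push(ColumnDeserializer { column_index });
    }
    deserializers
}
```

Without the capacity hint, a `Vec` that starts empty reallocates several times on its way to the final length, and each of those reallocations shows up as a separate allocation call in heaptrack.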

TPCC 720s comparison using the current benchmark fixes and RocksDB default RepeatableRead on both sides (origin/main ef9a534 + benchmark/default-RR patch vs this branch):

build	backend	TpmC	vs patched main
patched main	LMDB	63,246	baseline
this branch	LMDB	68,394	+5,148 (+8.1%)
patched main	RocksDB	27,540	baseline
this branch	RocksDB	30,387	+2,847 (+10.3%)

TPCC p90 latency from the same 720s runs:

build	backend	New-Order	Payment	Order-Status	Delivery	Stock-Level
patched main	LMDB	0.001s	0.001s	0.001s	0.002s	0.001s
this branch	LMDB	0.001s	0.001s	0.001s	0.002s	0.001s
patched main	RocksDB	0.001s	0.001s	0.002s	0.015s	0.003s
this branch	RocksDB	0.001s	0.001s	0.001s	0.015s	0.002s

Code changes

Has Rust code change
Has CI related scripts change

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Manual test / profiling:

cargo fmt --check
cargo test --lib storage::
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_main_fresh.zst > /tmp/tpcc_lmdb_heaptrack_main_fresh.report.txt
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_current_columns_len.zst > /tmp/tpcc_lmdb_heaptrack_current_columns_len.report.txt
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_current_columns_len.zst --diff /tmp/tpcc_lmdb_heaptrack_main_fresh.zst > /tmp/tpcc_lmdb_heaptrack_current_columns_len.diff_main.txt

# TPCC 720s comparison, num_ware=1, default max_retry=5.
target/release/tpcc --backend kitesql-lmdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_branch_lmdb
target/release/tpcc --backend kitesql-rocksdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_branch_rocksdb
# Repeated in a temporary origin/main worktree with the same benchmark/default-RR patch applied.
target/release/tpcc --backend kitesql-lmdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_main_lmdb
target/release/tpcc --backend kitesql-rocksdb --measure-time 720 --num-ware 1 --path /tmp/kitesql_tpcc_bench_main_rocksdb

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Note for reviewer

The optimization mainly reduces short-lived allocations. Peak heap is essentially unchanged, which matches the shape of the changes.
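
A minimal sketch of why buffer reuse lowers allocation counts without moving peak heap. The `Scratch` type is hypothetical and not taken from this PR. `Vec::clear` drops the elements but keeps the backing allocation, so refilling the buffer with the same or a smaller amount of data makes no allocator call, and the buffer is no larger than it would have been if freshly allocated.

```rust
/// Hypothetical scratch state kept alive across batches.
struct Scratch {
    required_columns: Vec<usize>,
}

impl Scratch {
    fn new() -> Self {
        Self { required_columns: Vec::new() }
    }

    /// Refill the buffer for one batch.
    /// `clear` keeps the existing capacity, so this only allocates
    /// when `column_ids` is longer than anything seen so far.
    fn fill(&mut self, column_ids: &[usize]) -> &[usize] {
        self.required_columns.clear();
        self.required_columns.extend_from_slice(column_ids);
        &self.required_columns
    }
}
```

Creating a new `Vec` per batch instead would cost one allocation and one free each time, which is the kind of short-lived traffic heaptrack reports as temporary allocations.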

KKould added 3 commits June 6, 2026 05:42

5f2b12a  Reduce runtime metadata overhead
d7a9f90  Optimize column pruning outcome reuse
71e5d63  Reduce runtime allocation churn

KKould self-assigned this Jun 5, 2026

KKould added the perf label Jun 5, 2026

KKould added 3 commits June 6, 2026 23:32

2d25a9f  Update TPCC benchmark results
38ec5e9  Add RocksDB TPCC comparison path
7962591  test: make storage visibility test use read committed

KKould merged commit c0e63a0 into main Jun 6, 2026
14 checks passed

KKould merged 6 commits into main from optimize-runtime-allocations
