Chainbase internal optimizations: session allocs, spinlock, snapshot preallocate by heifner · Pull Request #283 · Wire-Network/wire-sysio

heifner · 2026-04-03T22:13:44Z

Summary

Eliminate per-transaction heap allocations in undo sessions — Replace vector<unique_ptr<abstract_session>> (18 heap allocations per transaction) with a lightweight database* + bool pair. The abstract_session / session_impl virtual dispatch layer was redundant since database::undo() and database::squash() already iterate _index_list with the same dispatch through abstract_index.
Remove null terminator from shared_cow_string — shared_blob stores binary data (KV keys, values, ABI blobs) where null termination is unnecessary. Saves 1 byte per allocation; with 8-byte slab bucket rounding this saves 8 bytes per allocation that crosses a bucket boundary (e.g., 24-byte keys: 33 -> 32 bytes).
Replace std::mutex with spinlock in small_size_allocator — Reduces per-bucket overhead from ~40 bytes (pthread_mutex_t) to 1 byte (atomic_flag), saving ~5KB across 128 buckets in shared memory. Uncontended spinlock is ~7-10ns vs ~25ns for mutex, saving ~15ns per alloc+dealloc cycle.
Preallocate chainbase node storage during snapshot loading — Expose per-section row count from snapshot readers, then batch-allocate node storage upfront before the row creation loop. Avoids repeated get_some() calls to the segment manager during row-by-row insertion. Covers all index loading paths (controller, KV, authorization, resource limits).

Replace vector<unique_ptr<abstract_session>> with a lightweight database* + bool pair. The abstract_session / session_impl virtual dispatch layer was redundant — database::undo() and database::squash() already iterate _index_list with the same virtual dispatch through abstract_index. Removes 18 heap allocations per transaction (1 vector + 17 session_impl objects for each registered index type).
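The shape of this change can be sketched as follows. This is a minimal illustration, not the chainbase source: `counting_index`, `add_index()`, and `start_undo()` are hypothetical stand-ins, and only the idea of a session holding a `database*` plus a `bool` comes from the description above.

```cpp
#include <vector>

// Hypothetical stand-in for chainbase's abstract_index interface.
struct abstract_index {
    virtual ~abstract_index() = default;
    virtual void start_undo() = 0;
    virtual void undo() = 0;
    virtual void squash() = 0;
};

// Toy index: one int of state plus a stack of saved values.
struct counting_index : abstract_index {
    int value = 0;
    std::vector<int> saved;
    void start_undo() override { saved.push_back(value); }
    void undo() override {
        if (!saved.empty()) { value = saved.back(); saved.pop_back(); }
    }
    void squash() override {
        if (!saved.empty()) saved.pop_back();
    }
};

class database {
public:
    // The session owns no per-index objects: just a pointer and a flag.
    class session {
    public:
        session(session&& other) noexcept : _db(other._db), _apply(other._apply) {
            other._apply = false;
        }
        session(const session&) = delete;
        session& operator=(const session&) = delete;
        ~session() { if (_apply) _db->undo(); }

        void push()   { _apply = false; }                         // keep the changes
        void squash() { if (_apply) _db->squash(); _apply = false; }
        void undo()   { if (_apply) _db->undo();   _apply = false; }

    private:
        friend class database;
        explicit session(database* db) : _db(db), _apply(true) {}
        database* _db;
        bool      _apply;
    };

    void add_index(abstract_index* idx) { _index_list.push_back(idx); }

    session start_undo_session() {
        for (auto* idx : _index_list) idx->start_undo();
        return session(this);   // no heap allocation for the session itself
    }

    // The database already dispatches to every index, so the session
    // does not need its own per-index virtual layer.
    void undo()   { for (auto* idx : _index_list) idx->undo(); }
    void squash() { for (auto* idx : _index_list) idx->squash(); }

private:
    std::vector<abstract_index*> _index_list;
};
```

In this sketch a session that goes out of scope without `push()` rolls every registered index back through `database::undo()`, which is the same dispatch path the removed `session_impl` objects would have taken.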

shared_cow_string is used as shared_blob for binary data (KV keys, values, ABI blobs) where null termination is unnecessary. No c_str() method exists — all access is via data() + size(). Saves 1 byte per allocation, which with 8-byte slab bucket rounding saves 8 bytes per allocation that crosses a bucket boundary (e.g., 24-byte keys: 33 -> 32 bytes, fitting in a smaller bucket).
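The bucket arithmetic behind the 33 -> 32 example can be checked with a small helper. This is only a sketch: `round_up_to_bucket` is a hypothetical name, the 8-byte width is taken from the text, and the totals 33 and 32 are used as given without assuming how they split into header and payload.

```cpp
#include <cstddef>

// Bucket width taken from the "8-byte slab bucket rounding" in the text.
constexpr std::size_t bucket_width = 8;

// Hypothetical helper: round a request up to the next multiple of bucket_width.
constexpr std::size_t round_up_to_bucket(std::size_t n) {
    return (n + bucket_width - 1) / bucket_width * bucket_width;
}

// With the null terminator the 33-byte request lands in the 40-byte bucket;
// without it the 32-byte request fits the 32-byte bucket exactly.
static_assert(round_up_to_bucket(33) == 40, "33 bytes rounds up to 40");
static_assert(round_up_to_bucket(32) == 32, "32 bytes is already aligned");
```

Requests that do not sit on a bucket boundary (for example 34 -> 33 bytes) round to the same bucket either way, which is why the 8-byte saving applies only to allocations that cross a boundary.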

Replacing std::mutex with a spinlock in small_size_allocator reduces per-bucket overhead from ~40 bytes (pthread_mutex_t) to 1 byte (atomic_flag), saving ~5KB across 128 buckets in shared memory (128 x ~39 bytes). An uncontended spinlock costs ~7-10ns vs ~25ns for a mutex, saving ~15ns per alloc+dealloc cycle.

Expose per-section row_count from snapshot readers via section_reader::row_count(), then call preallocate() before the row creation loop for all index types. This batch-allocates node storage from the segment manager upfront, avoiding repeated get_some() calls during row-by-row insertion. Covers controller_index_set, kv_database_index_set, authorization_index_set, and resource_index_set loading paths.
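The effect of preallocating can be illustrated with a toy model. Everything here is hypothetical except the names `row_count()` and `preallocate()`, which come from the description above; the batch size of 4 and the `toy_index` bookkeeping are assumptions made only to have something to count.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a snapshot section reader.
struct section_reader {
    std::vector<int> rows;
    std::size_t pos = 0;
    std::size_t row_count() const { return rows.size(); }
    bool empty() const { return pos == rows.size(); }
    int read_row() { return rows[pos++]; }
};

// Toy index that obtains node storage from a backing allocator in batches.
struct toy_index {
    std::size_t free_nodes = 0;      // nodes obtained but not yet used
    std::size_t batch_requests = 0;  // trips to the backing allocator
    std::vector<int> rows;

    // Obtain storage for n nodes in a single request.
    void preallocate(std::size_t n) {
        if (n > free_nodes) { ++batch_requests; free_nodes = n; }
    }

    void emplace(int v) {
        if (free_nodes == 0) { ++batch_requests; free_nodes = 4; }  // assumed default batch
        --free_nodes;
        rows.push_back(v);
    }
};

// Load one section, optionally preallocating from the reader's row count.
std::size_t load_section(section_reader& reader, toy_index& index, bool use_preallocate) {
    if (use_preallocate) index.preallocate(reader.row_count());
    while (!reader.empty()) index.emplace(reader.read_row());
    return index.batch_requests;
}
```

In this model, loading 100 rows takes 25 allocator requests without preallocation and 1 with it; the real saving depends on the segment manager's actual batch behavior, which this sketch does not reproduce.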

The session destructor calls undo(), which throws if _read_only_mode is true. This crashes nodeop when it receives SIGTERM during a read window while a block-building session is still alive. Add undo_from_session() and squash_from_session(), which bypass the read-only guard so that RAII cleanup always succeeds regardless of database mode.
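A minimal sketch of the guard bypass, under stated assumptions: `toy_database`, `do_undo()`, and `undo_count()` are hypothetical, and only the names `undo()`, `undo_from_session()`, and `_read_only_mode` come from the description. C++ destructors are implicitly `noexcept`, so an exception escaping one calls `std::terminate`, which is consistent with the crash described.

```cpp
#include <stdexcept>

class toy_database {
public:
    void set_read_only_mode(bool on) { _read_only_mode = on; }

    // Public entry point: guarded, throws while in read-only mode.
    void undo() {
        if (_read_only_mode)
            throw std::logic_error("attempting to undo in read-only mode");
        do_undo();
    }

    int undo_count() const { return _undo_count; }

    class session {
    public:
        explicit session(toy_database& db) : _db(&db) {}
        session(const session&) = delete;
        session& operator=(const session&) = delete;

        // RAII cleanup takes the unguarded path, so it cannot throw
        // because of the database mode.
        ~session() { if (_apply) _db->undo_from_session(); }

        void push() { _apply = false; }

    private:
        toy_database* _db;
        bool          _apply = true;
    };

private:
    // Session-only path that skips the read-only check.
    void undo_from_session() { do_undo(); }
    void do_undo() { ++_undo_count; }

    bool _read_only_mode = false;
    int  _undo_count = 0;
};
```

Keeping the guard on the public `undo()` preserves the protection for ordinary callers, while the private session path only performs cleanup of state the session itself created.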

The bare `while (_flag.test_and_set(acquire));` busy-wait can livelock under ASan or on heavily loaded CI runners: when the holder thread is preempted, the spinner burns its entire time slice on the atomic flag and the holder cannot make progress. Use test-and-test-and-set (TTAS) to avoid cache-line ping-pong on test_and_set, pause for short waits (x86 PAUSE / ARM YIELD), and yield to the scheduler after ~16 spins so the holder can run.
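The backoff strategy described above might look roughly like this. It is a sketch, not the PR's code: it uses `std::atomic<bool>` instead of `atomic_flag` so that the read-only test compiles before C++20, and `cpu_relax()` and `try_lock()` are hypothetical additions. The threshold of 16 spins is the figure from the text.

```cpp
#include <atomic>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Hint to the CPU that this is a spin-wait; a no-op on other targets.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#elif defined(__aarch64__)
    __asm__ __volatile__("yield");
#endif
}

class spinlock {
public:
    void lock() {
        for (unsigned spins = 0;; ++spins) {
            // Test first with a plain load (stays in the local cache line),
            // and only then attempt the atomic exchange.
            if (!_locked.load(std::memory_order_relaxed) &&
                !_locked.exchange(true, std::memory_order_acquire))
                return;

            if (spins < 16)
                cpu_relax();                 // short wait: stay on the CPU
            else
                std::this_thread::yield();   // long wait: let the holder run
        }
    }

    bool try_lock() {
        return !_locked.load(std::memory_order_relaxed) &&
               !_locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() { _locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> _locked{false};
};
```

Because the class provides `lock()`, `try_lock()`, and `unlock()`, it can be used with `std::lock_guard<spinlock>` in the usual way.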

Two fixes to the billing-accumulation loop:
1. Move the threshold check to the top of the loop body so we break before calling push_trx when billing has already crossed the limit. With the check at the bottom, push_trx can throw tx_cpu_usage_exceeded on the last iteration instead of sysio_assert_message_exception, tripping the BOOST_CHECK_THROW and failing the test.
2. Increase num_itrs from 1000 to 5000. delta_per_action guarantees >= 1 us billed to `other` per transaction, so the 1000 us threshold needs at most 1000 iterations. The original bound was exact; 5x headroom covers rounding variation on ASan / sys-vm-oc CI builds.
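The loop shape and the iteration bound can be sketched as below. This is an illustration only: `accumulate_until` and `worst_case_iterations` are hypothetical names, and the callable stands in for the test's push_trx by returning the microseconds billed for one transaction.

```cpp
#include <cstdint>

// Upper bound on iterations when every transaction bills at least min_delta_us.
constexpr std::int64_t worst_case_iterations(std::int64_t threshold_us,
                                             std::int64_t min_delta_us) {
    return (threshold_us + min_delta_us - 1) / min_delta_us;
}

// Push transactions until billing reaches the threshold or num_itrs runs out.
// Returns the number of transactions actually pushed.
template <typename PushTrx>
int accumulate_until(std::int64_t threshold_us, int num_itrs,
                     std::int64_t& billed_us, PushTrx push_trx) {
    int pushed = 0;
    for (int i = 0; i < num_itrs; ++i) {
        // Check at the top: never push once the limit has been crossed.
        if (billed_us >= threshold_us) break;
        billed_us += push_trx();
        ++pushed;
    }
    return pushed;
}
```

With a 1000 us threshold and 1 us billed per transaction, `worst_case_iterations(1000, 1)` is 1000, so a bound of 5000 leaves the 5x headroom mentioned above.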

heifner added 7 commits April 3, 2026 17:12

Replace std::mutex with spinlock in small_size_allocator

ae46c01


heifner wants to merge 7 commits into master from feature/chainbase-internals
