Add cpu heap mode (CPU-overhead-controlled heap sizing)#77
Open
Adds MMTK_HEAP_MODE=cpu, a dynamic heap-sizing policy that grows or shrinks the heap after each GC cycle to keep measured GC CPU overhead near a configurable target.

The control law follows Tavakolisomeh et al., 'Heap Size Adjustment with CPU Control', MPLR '23: a sigmoid of the (averaged) GC CPU overhead error in (-inf, +inf) maps to a heap-size adjustment factor in (0.5, 1.5). Implementation lives alongside the existing 'ruby' delegated trigger in gc/mmtk/src/heap/.

T_GC is wall-clock GC duration; T_APP is process CPU time delta read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID), which correctly credits multi-threaded mutator parallelism. Nursery-only generational GCs are skipped so the trigger only re-sizes at full collections.

Configuration:
- MMTK_GC_CPU_TARGET: target GC CPU overhead, percent. Default 5.
- MMTK_GC_CPU_WINDOW: number of recent cycles averaged. Default 3.

The default target of 5 differs from the paper's recommended 15. The paper targets ZGC, a concurrent generational collector; MMTk-Ruby currently ships stop-the-world Immix, where every percent of GC CPU also blocks the mutator. An empirical sweep of MMTK_GC_CPU_TARGET across ruby-bench (railsbench, lobsters, psych-load, liquid-render, lee) found 5-6 to be Pareto-optimal vs the existing 'ruby' heap mode: about 6 percent geomean throughput improvement at essentially equal peak RSS. Targets >=10 trade large amounts of throughput for modest RSS savings on this collector.

bin/smoke-test, bin/ruby-mmtk-mode, bin/compare-heap-modes, and doc/testing-cpu-heap-mode.md are included so reviewers and future contributors can reproduce the sweep against ruby/ruby-bench.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Note
Authorship disclosure: All code, tests, benchmarks, and documentation in this PR were written by Claude Opus 4.7 acting as a coding assistant, based on the paper cited below and on iterative dialogue with the human author.
The human author (me) directed the work, made design decisions, ran the benchmarks, validated the results, and is taking responsibility for the contribution — but did not personally write the diff. Reviewers should apply whatever scrutiny that level of authorship warrants.
Summary
Adds a new heap mode, MMTK_HEAP_MODE=cpu, that sizes the heap dynamically based on measured GC CPU overhead. After each (non-nursery) GC cycle, the trigger compares the recent windowed-average GC CPU overhead against a configurable target and adjusts the heap size by a sigmoid-bounded factor in (0.5, 1.5).

Implementation follows Tavakolisomeh, Shimchenko, Österlund, Bruno, Ferreira, Wrigstad: "Heap Size Adjustment with CPU Control", MPLR '23 (https://doi.org/10.1145/3617651.3622988).
Algorithm
After each non-nursery GC cycle:
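The trigger recomputes the heap size from the latest overhead sample. As a minimal sketch of that step (not the code in cpu_heap_trigger.rs), the Rust below keeps a window of overhead samples, maps the averaged error through a sigmoid to a factor in (0.5, 1.5), and applies the clamp described in this section. The names CpuHeapSizer and clamp_heap_pages are hypothetical, and the overhead definition 100 * T_GC / T_APP and the use of percentage points as the sigmoid input are assumptions made for illustration; the PR text does not spell out the exact formula.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the trigger state; not the PR's actual type.
struct CpuHeapSizer {
    target_percent: f64,    // MMTK_GC_CPU_TARGET (default 5)
    window: usize,          // MMTK_GC_CPU_WINDOW (default 3)
    samples: VecDeque<f64>, // recent GC CPU overhead samples, in percent
}

impl CpuHeapSizer {
    fn new(target_percent: f64, window: usize) -> Self {
        Self {
            target_percent,
            window: window.max(1), // always average at least one sample
            samples: VecDeque::new(),
        }
    }

    /// Record one full-GC cycle and return the heap-size adjustment factor.
    /// ASSUMPTION: overhead is 100 * T_GC / T_APP, in percent.
    fn record_cycle(&mut self, t_gc_secs: f64, t_app_secs: f64) -> f64 {
        let overhead = if t_app_secs > 0.0 {
            100.0 * t_gc_secs / t_app_secs
        } else {
            0.0
        };
        // Drop the oldest sample once the window is full.
        if self.samples.len() == self.window {
            self.samples.pop_front();
        }
        self.samples.push_back(overhead);

        let avg = self.samples.iter().sum::<f64>() / self.samples.len() as f64;
        let error = avg - self.target_percent; // percentage points (assumed unit)
        // The sigmoid lies in (0, 1); shifting by 0.5 gives a factor in (0.5, 1.5).
        0.5 + 1.0 / (1.0 + (-error).exp())
    }
}

/// Clamp to [max(1.1 * used_pages, heap_min), heap_max].
/// If the lower bound exceeds heap_max, heap_max wins (a choice made in this sketch).
fn clamp_heap_pages(proposed: usize, used_pages: usize, heap_min: usize, heap_max: usize) -> usize {
    // 10% headroom in integer math, rounded up.
    let floor = (used_pages + (used_pages + 9) / 10).max(heap_min);
    proposed.max(floor).min(heap_max)
}

fn main() {
    let mut sizer = CpuHeapSizer::new(5.0, 3);
    let current_pages: usize = 4000;
    // 2 s of GC against 10 s of process CPU time: 20% overhead, above target.
    let factor = sizer.record_cycle(2.0, 10.0);
    let proposed = (current_pages as f64 * factor).round() as usize;
    let new_pages = clamp_heap_pages(proposed, 1000, 0, 10_000);
    println!("factor = {factor:.3}, new heap = {new_pages} pages");
}
```

In this sketch, overhead on target yields a factor of 1.0 and leaves the heap unchanged; overhead above target pushes the factor toward 1.5 (grow), and overhead below target pushes it toward 0.5 (shrink).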
- T_GC is wall-clock GC duration (between on_gc_start and on_gc_end).
- T_APP is process CPU time elapsed since the previous GC cycle ended, read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID). This sums CPU time across all threads of the process, so multi-threaded workloads (Ractors, parallel GC workers) are correctly attributed.

The new heap size is clamped to [max(1.1 * used_pages, MMTK_HEAP_MIN), MMTK_HEAP_MAX], giving 10% headroom above current live memory so we never resize so small that we'd immediately re-trigger GC.

Configuration
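| Variable | Default | Meaning |
| --- | --- | --- |
| MMTK_HEAP_MODE=cpu | | Selects the cpu heap mode (opt-in) |
| MMTK_GC_CPU_TARGET | 5 | Target GC CPU overhead, in percent |
| MMTK_GC_CPU_WINDOW | 3 | Number of recent cycles averaged |
| MMTK_HEAP_MIN, MMTK_HEAP_MAX | | Bounds used when clamping the new heap size |

Why default 5 instead of the paper's 15?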
The paper targets ZGC, a concurrent generational collector. MMTk-Ruby currently ships stop-the-world Immix; every percent of GC CPU is also a percent of wall-clock time the mutator is blocked.
I swept MMTK_GC_CPU_TARGET ∈ {1, 2, 3, 4, 5, 6, 7, 15, 25, 40} across five GC-sensitive ruby-bench benchmarks (railsbench, lobsters, psych-load, liquid-render, lee). Geometric means across benchmarks:

[Table: geometric means per target, compared against ruby mode]

Target=5% emerges as the cleanest Pareto-optimal point: ~6% throughput win with negligible RSS change versus the existing ruby heap mode. Target=15% (the paper's recommendation) is significantly worse than the existing ruby mode on this collector.

If we ever wire up ConcurrentImmix (already exists in mmtk-core), the optimal target will likely shift up toward the paper's 15%.

Implementation notes
- The trigger lives in gc/mmtk/src/heap/cpu_heap_trigger.rs, modeled on the existing RubyHeapTrigger.
- It is selected through GCTriggerSelector::Delegated, like the ruby mode. To dispatch between the two delegated triggers, create_gc_trigger checks which OnceCell config (RUBY_HEAP_TRIGGER_CONFIG vs CPU_HEAP_TRIGGER_CONFIG) was populated by the MMTK_HEAP_MODE parser.

Tooling
- bin/smoke-test: minimal allocation loop reporting wall time, CPU time, GC count, and peak RSS via getrusage. Useful for quick verification.
- bin/ruby-mmtk-mode: POSIX shell wrapper that invokes a given Ruby with RUBY_GC_LIBRARY=mmtk and a chosen MMTK_HEAP_MODE. Shaped to fit ruby-bench's -e "name::cmd" syntax.
- bin/compare-heap-modes: driver that wires the wrapper into ruby/ruby-bench so you can compare any two heap modes on a configurable benchmark set with --rss.
- doc/testing-cpu-heap-mode.md: end-to-end walkthrough covering building a modular-GC Ruby, installing the binding, running smoke tests, and running the comparison sweep.

Tests
Local verification on this branch:
- cargo test: 10 unit tests pass, including 5 new ones for the trigger math.
- cargo build --release: clean.
- cargo clippy --all-targets: no warnings.
- bundle exec rake test: 19 tests, 75 assertions, 0 failures, 0 errors, 0 omissions (against a Ruby built with --with-modular-gc, MMTk binding installed). Includes the new test_MMTK_HEAP_MODE_cpu configuration test.

End-to-end verified by running bin/compare-heap-modes against ruby/ruby-bench for railsbench, lobsters, psych-load, liquid-render, and lee. Raw output JSONs are available on request.

Caveats / future work
- target=5 as the cross-platform default.
- ruby-bench's harness/harness-common.rb:11 calls GC.auto_compact = ... unconditionally, which raises NotImplementedError under MMTk. Benchmarks here ran with a local one-line rescue patch in that file. That's a separate ruby-bench bug worth reporting upstream; it's not specific to this PR.

Risk
Low. The new mode is opt-in via MMTK_HEAP_MODE=cpu; the existing default (dynamic) and the existing ruby mode are untouched. Dispatch in create_gc_trigger falls through to the existing RubyHeapTrigger path when the cpu config singleton isn't set.
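
The fall-through dispatch described above can be illustrated with a small sketch. It is not the code in create_gc_trigger: the types CpuTriggerConfig and DelegatedTrigger and the functions parse_heap_mode and select_trigger are hypothetical, and std::sync::OnceLock stands in for the OnceCell the PR mentions. The only behaviour it shows is that the cpu path is taken when its config cell was populated, and the ruby path otherwise.

```rust
use std::sync::OnceLock;

/// Hypothetical settings for the cpu trigger; field names are illustrative.
#[derive(Debug, PartialEq)]
struct CpuTriggerConfig {
    target_percent: f64,
    window: usize,
}

/// Which delegated trigger the dispatcher would build (hypothetical enum).
#[derive(Debug, PartialEq)]
enum DelegatedTrigger {
    Cpu,
    Ruby,
}

/// Stand-in for the MMTK_HEAP_MODE parser: populate the cpu cell only for "cpu".
fn parse_heap_mode(mode: &str, cpu_cell: &OnceLock<CpuTriggerConfig>) {
    if mode == "cpu" {
        // set() returns Err if the cell is already populated; the first value wins.
        let _ = cpu_cell.set(CpuTriggerConfig {
            target_percent: 5.0,
            window: 3,
        });
    }
}

/// Choose the cpu trigger when its config exists, else fall through to ruby.
fn select_trigger(cpu_cell: &OnceLock<CpuTriggerConfig>) -> DelegatedTrigger {
    match cpu_cell.get() {
        Some(_) => DelegatedTrigger::Cpu,
        None => DelegatedTrigger::Ruby,
    }
}

fn main() {
    let cpu_cell = OnceLock::new();

    parse_heap_mode("ruby", &cpu_cell);
    println!("after ruby mode: {:?}", select_trigger(&cpu_cell));

    parse_heap_mode("cpu", &cpu_cell);
    println!("after cpu mode: {:?}", select_trigger(&cpu_cell));
    if let Some(cfg) = cpu_cell.get() {
        println!("target = {}%, window = {}", cfg.target_percent, cfg.window);
    }
}
```

Because the cell is empty unless the mode string is cpu, every other mode reaches the existing path unchanged in this sketch, which is the property the risk assessment relies on.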