
Add cpu heap mode (CPU-overhead-controlled heap sizing)#77

Open
paracycle wants to merge 1 commit into ruby:main from paracycle:cpu-heap-mode

Conversation

@paracycle

@paracycle paracycle commented May 7, 2026

Note

Authorship disclosure: All code, tests, benchmarks, and documentation in this PR were written by Claude Opus 4.7 acting as a coding assistant, based on the paper cited below and on iterative dialogue with the human author.

The human author (me) directed the work, made design decisions, ran the benchmarks, validated the results, and is taking responsibility for the contribution — but did not personally write the diff. Reviewers should apply whatever scrutiny that level of authorship warrants.

Summary

Adds a new heap mode, MMTK_HEAP_MODE=cpu, that sizes the heap dynamically based on measured GC CPU overhead. After each (non-nursery) GC cycle, the trigger compares the recent windowed-average GC CPU overhead against a configurable target and adjusts the heap size by a sigmoid-bounded factor in (0.5, 1.5).

Implementation follows Tavakolisomeh, Shimchenko, Österlund, Bruno, Ferreira,
Wrigstad — "Heap Size Adjustment with CPU Control", MPLR '23
(https://doi.org/10.1145/3617651.3622988).

Algorithm

After each non-nursery GC cycle:

GC_CPU             = mean(T_GC) / mean(T_APP) over a window of N cycles  (Eq. 1)
overhead_error     = GC_CPU - target                                    (Eq. 2)
sigmoid_error      = 1 / (1 + e^(-overhead_error))                      (Eq. 3)
adjustment_factor  = sigmoid_error + 0.5     ; in (0.5, 1.5)            (Eq. 4)
new_size           = current_size * adjustment_factor                   (Eq. 5)

T_GC is wall-clock GC duration (between on_gc_start and on_gc_end). T_APP is process CPU time elapsed since the previous GC cycle ended, read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID). This sums CPU time across all threads of the process, so multi-threaded workloads (Ractors, parallel GC workers) are correctly attributed.
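The T_APP measurement can be sketched with a raw clock_gettime FFI call. This is a sketch under stated assumptions, not the binding's actual code: it assumes Linux x86_64, where CLOCK_PROCESS_CPUTIME_ID is clock id 2 (it is 12 on macOS), and declares the syscall directly rather than going through the libc crate.

```rust
// Sketch only: Linux x86_64 struct layout and clock id assumed.
#[repr(C)]
struct Timespec {
    tv_sec: i64,  // time_t
    tv_nsec: i64, // long
}

extern "C" {
    fn clock_gettime(clk_id: i32, tp: *mut Timespec) -> i32;
}

const CLOCK_PROCESS_CPUTIME_ID: i32 = 2; // Linux value; differs per OS

/// Total CPU time consumed by all threads of this process, in seconds.
/// This is what makes multi-threaded mutator work attributable to T_APP.
fn process_cpu_time_secs() -> f64 {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    let rc = unsafe { clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &mut ts) };
    assert_eq!(rc, 0, "clock_gettime failed");
    ts.tv_sec as f64 + ts.tv_nsec as f64 * 1e-9
}

fn main() {
    let t0 = process_cpu_time_secs();
    // Burn a little CPU so the delta (the T_APP analogue) is positive.
    let mut acc: u64 = 0;
    for i in 0..5_000_000u64 {
        acc = acc.wrapping_add(i);
    }
    let t1 = process_cpu_time_secs();
    assert!(t1 >= t0);
    println!("T_APP-style delta = {:.6} s (acc = {})", t1 - t0, acc);
}
```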

The new heap size is clamped to [max(1.1 * used_pages, MMTK_HEAP_MIN), MMTK_HEAP_MAX], giving 10% headroom above current live memory so we never resize so small that we'd immediately re-trigger GC.

Configuration

Env var                       Default     Description
MMTK_HEAP_MODE=cpu            (n/a)       Selects this mode
MMTK_GC_CPU_TARGET            5           Target GC CPU overhead, percent
MMTK_GC_CPU_WINDOW            3           Number of recent GC cycles averaged
MMTK_HEAP_MIN, MMTK_HEAP_MAX  (existing)  Used as clamp bounds
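Reading these knobs can be sketched as below; `env_or` is a hypothetical helper, not the binding's actual option parser, and it silently falls back to the default on a missing or unparsable value.

```rust
use std::env;

/// Illustrative helper (an assumption, not the real parser): read an
/// env var, falling back to `default` if unset or unparsable.
fn env_or<T: std::str::FromStr>(name: &str, default: T) -> T {
    env::var(name).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn main() {
    let target: f64 = env_or("MMTK_GC_CPU_TARGET", 5.0); // percent
    let window: usize = env_or("MMTK_GC_CPU_WINDOW", 3); // GC cycles
    println!("target = {}% over a window of {} cycles", target, window);
}
```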

Why default 5 instead of the paper's 15?

The paper targets ZGC, a concurrent generational collector. MMTk-Ruby currently ships stop-the-world Immix; every percent of GC CPU is also a percent of wall-clock time the mutator is blocked.

We swept MMTK_GC_CPU_TARGET ∈ {1, 2, 3, 4, 5, 6, 7, 15, 25, 40} across five GC-sensitive ruby-bench benchmarks (railsbench, lobsters, psych-load, liquid-render, lee). Geometric means across benchmarks:

Target  Geomean throughput vs ruby mode      Geomean RSS vs ruby mode
1%      +8.8% faster                         +33.5% RSS
2%      +7.3% faster                         +26.9% RSS
5%      +6.2% faster                         −0.2% RSS
6%      +7.0% faster                         −0.8% RSS
7%      +2.1% faster                         −6.2% RSS
15%     −13–30% slower (depending on bench)  ~5–8% RSS savings

Target=5% emerges as the cleanest Pareto-optimal point: ~6% throughput win with negligible RSS change versus the existing ruby heap mode. Target=15% (the paper's recommendation) is significantly worse than the existing ruby mode on this collector.

If we ever wire up ConcurrentImmix (already exists in mmtk-core), the optimal target will likely shift up toward the paper's 15%.

Implementation notes

  • Lives in gc/mmtk/src/heap/cpu_heap_trigger.rs, modeled on the existing RubyHeapTrigger.
  • Uses GCTriggerSelector::Delegated like the ruby mode. To dispatch between the two delegated triggers, create_gc_trigger checks which OnceCell config (RUBY_HEAP_TRIGGER_CONFIG vs CPU_HEAP_TRIGGER_CONFIG) was populated by the MMTK_HEAP_MODE parser.
  • Skips nursery-only GCs in generational plans (consistent with MemBalancer).
  • 5 unit tests cover sigmoid math, factor bounds, windowed-mean correctness, window eviction, and empty-window behavior.
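The windowed mean, window eviction, and empty-window behaviors that the unit tests cover can be sketched with a VecDeque; the `OverheadWindow` type and its methods are illustrative names, not the trigger's actual types.

```rust
use std::collections::VecDeque;

/// Fixed-size window of recent per-cycle GC CPU overhead samples.
/// Illustrative shape, not the actual cpu_heap_trigger.rs type.
struct OverheadWindow {
    samples: VecDeque<f64>,
    capacity: usize,
}

impl OverheadWindow {
    fn new(capacity: usize) -> Self {
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push a sample, evicting the oldest one once the window is full.
    fn push(&mut self, overhead: f64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(overhead);
    }

    /// Windowed mean; None before the first full GC cycle has landed.
    fn mean(&self) -> Option<f64> {
        if self.samples.is_empty() {
            None
        } else {
            Some(self.samples.iter().sum::<f64>() / self.samples.len() as f64)
        }
    }
}

fn main() {
    let mut w = OverheadWindow::new(3);
    assert_eq!(w.mean(), None);        // empty-window behavior
    w.push(0.04);
    w.push(0.06);
    w.push(0.08);
    w.push(0.10);                      // window is full: evicts 0.04
    let m = w.mean().unwrap();         // mean of {0.06, 0.08, 0.10}
    println!("windowed mean = {}", m);
}
```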

Tooling

  • bin/smoke-test — minimal allocation loop reporting wall time, CPU time, GC count, and peak RSS via getrusage. Useful for quick verification.
  • bin/ruby-mmtk-mode — POSIX shell wrapper that invokes a given Ruby with RUBY_GC_LIBRARY=mmtk and a chosen MMTK_HEAP_MODE. Shaped to fit ruby-bench's -e "name::cmd" syntax.
  • bin/compare-heap-modes — driver that wires the wrapper into ruby/ruby-bench so you can compare any two heap modes on a configurable benchmark set with --rss.
  • doc/testing-cpu-heap-mode.md — end-to-end walkthrough: building a modular-GC Ruby, installing the binding, running smoke tests, and running the comparison sweep.

Tests

Local verification on this branch:

  • cargo test: 10 unit tests pass, including 5 new ones for the trigger math.
  • cargo build --release: clean.
  • cargo clippy --all-targets: no warnings.
  • bundle exec rake test: 19 tests, 75 assertions, 0 failures, 0 errors, 0 omissions (against a Ruby built with --with-modular-gc, MMTk binding installed). Includes the new test_MMTK_HEAP_MODE_cpu configuration test.

End-to-end verified by running bin/compare-heap-modes against ruby/ruby-bench for railsbench, lobsters, psych-load, liquid-render, and lee. Raw output JSONs are available on request.

Caveats / future work

  • The empirical sweep was run on a single Apple M-series workstation. A wider comparison (Linux x86_64, more benchmarks) would strengthen the case for target=5 as the cross-platform default.
  • ruby-bench's harness/harness-common.rb:11 calls GC.auto_compact = ... unconditionally, which raises NotImplementedError under MMTk. Benchmarks here ran with a local one-line rescue patch in that file. That's a separate ruby-bench bug worth reporting upstream — it's not specific to this PR.

Risk

Low. The new mode is opt-in via MMTK_HEAP_MODE=cpu; the existing default (dynamic) and the existing ruby mode are untouched. Dispatch in create_gc_trigger falls through to the existing RubyHeapTrigger path when the cpu config singleton isn't set.
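The fall-through dispatch described above can be sketched with std's OnceLock (the binding uses OnceCell, but the populated-or-not pattern is the same); the config struct and the string return are stand-ins for constructing the real trigger objects.

```rust
use std::sync::OnceLock;

/// Illustrative stand-in for the cpu-mode config cell that the
/// MMTK_HEAP_MODE parser would populate.
#[allow(dead_code)]
struct CpuTriggerConfig {
    target_percent: f64,
    window: usize,
}

static CPU_HEAP_TRIGGER_CONFIG: OnceLock<CpuTriggerConfig> = OnceLock::new();

/// Sketch of the dispatch in create_gc_trigger: if the cpu config
/// cell was populated, build the cpu trigger; otherwise fall through
/// to the existing RubyHeapTrigger path.
fn select_trigger() -> &'static str {
    match CPU_HEAP_TRIGGER_CONFIG.get() {
        Some(_) => "CpuHeapTrigger",
        None => "RubyHeapTrigger",
    }
}

fn main() {
    assert_eq!(select_trigger(), "RubyHeapTrigger"); // cell not set yet
    let _ = CPU_HEAP_TRIGGER_CONFIG.set(CpuTriggerConfig {
        target_percent: 5.0,
        window: 3,
    });
    assert_eq!(select_trigger(), "CpuHeapTrigger");
    println!("dispatch ok");
}
```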

Adds MMTK_HEAP_MODE=cpu, a dynamic heap-sizing policy that grows
or shrinks the heap after each GC cycle to keep measured GC CPU
overhead near a configurable target. The control law follows
Tavakolisomeh et al., 'Heap Size Adjustment with CPU Control', MPLR
'23: a sigmoid of the (averaged) GC CPU overhead error in (-inf, +inf)
maps to a heap-size adjustment factor in (0.5, 1.5).

Implementation lives alongside the existing 'ruby' delegated trigger
in gc/mmtk/src/heap/. T_GC is wall-clock GC duration; T_APP is process
CPU time delta read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID), which
correctly credits multi-threaded mutator parallelism. Nursery-only
generational GCs are skipped so the trigger only re-sizes at full
collections.

Configuration:

  MMTK_GC_CPU_TARGET   target GC CPU overhead, percent. Default 5.
  MMTK_GC_CPU_WINDOW   number of recent cycles averaged. Default 3.

The default differs from the paper's recommended 15. The paper
targets ZGC, a concurrent generational collector; MMTk-Ruby currently
ships stop-the-world Immix, where every percent of GC CPU also blocks
the mutator. An empirical sweep of MMTK_GC_CPU_TARGET across
ruby-bench (railsbench, lobsters, psych-load, liquid-render, lee)
found 5-6 to be Pareto-optimal vs the existing 'ruby' heap mode:
about 6 percent geomean throughput improvement at essentially equal
peak RSS. Targets >=10 trade large amounts of throughput for modest
RSS savings on this collector.

bin/smoke-test, bin/ruby-mmtk-mode, bin/compare-heap-modes, and
doc/testing-cpu-heap-mode.md are included so reviewers and future
contributors can reproduce the sweep against ruby/ruby-bench.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
