
Add cpu heap mode (CPU-overhead-controlled heap sizing)#77

Open
paracycle wants to merge 1 commit into ruby:main from paracycle:cpu-heap-mode

Conversation

@paracycle

@paracycle paracycle commented May 7, 2026

Note

Authorship disclosure: All code, tests, benchmarks, and documentation in this PR were written by Claude Opus 4.7 acting as a coding assistant, based on the paper cited below and on iterative dialogue with the human author.

The human author (me) directed the work, made design decisions, ran the benchmarks, validated the results, and is taking responsibility for the contribution — but did not personally write the diff. Reviewers should apply whatever scrutiny that level of authorship warrants.

Summary

Adds a new heap mode, MMTK_HEAP_MODE=cpu, that sizes the heap dynamically based on measured GC CPU overhead. After each (non-nursery) GC cycle, the trigger compares the recent windowed-average GC CPU overhead against a configurable target and adjusts the heap size by a sigmoid-bounded factor in (0.5, 1.5).

Implementation follows Tavakolisomeh, Shimchenko, Österlund, Bruno, Ferreira,
Wrigstad — "Heap Size Adjustment with CPU Control", MPLR '23
(https://doi.org/10.1145/3617651.3622988).

Algorithm

After each non-nursery GC cycle:

GC_CPU             = mean(T_GC) / mean(T_APP) over a window of N cycles  (Eq. 1)
overhead_error     = GC_CPU - target                                    (Eq. 2)
sigmoid_error      = 1 / (1 + e^(-overhead_error))                      (Eq. 3)
adjustment_factor  = sigmoid_error + 0.5     ; in (0.5, 1.5)            (Eq. 4)
new_size           = current_size * adjustment_factor                   (Eq. 5)

T_GC is wall-clock GC duration (between on_gc_start and on_gc_end). T_APP is process CPU time elapsed since the previous GC cycle ended, read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID). This sums CPU time across all threads of the process, so multi-threaded workloads (Ractors, parallel GC workers) are correctly attributed.
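The T_APP measurement can be sketched with a raw clock_gettime FFI call. This is a sketch under stated assumptions, not the binding's actual code: it assumes Linux x86_64, where CLOCK_PROCESS_CPUTIME_ID is clock id 2 (it is 12 on macOS), and declares the syscall directly rather than going through the libc crate.

```rust
// Sketch only: Linux x86_64 struct layout and clock id assumed.
#[repr(C)]
struct Timespec {
    tv_sec: i64,  // time_t
    tv_nsec: i64, // long
}

extern "C" {
    fn clock_gettime(clk_id: i32, tp: *mut Timespec) -> i32;
}

const CLOCK_PROCESS_CPUTIME_ID: i32 = 2; // Linux value; differs per OS

/// Total CPU time consumed by all threads of this process, in seconds.
/// This is what makes multi-threaded mutator work attributable to T_APP.
fn process_cpu_time_secs() -> f64 {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    let rc = unsafe { clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &mut ts) };
    assert_eq!(rc, 0, "clock_gettime failed");
    ts.tv_sec as f64 + ts.tv_nsec as f64 * 1e-9
}

fn main() {
    let t0 = process_cpu_time_secs();
    // Burn a little CPU so the delta (the T_APP analogue) is positive.
    let mut acc: u64 = 0;
    for i in 0..5_000_000u64 {
        acc = acc.wrapping_add(i);
    }
    let t1 = process_cpu_time_secs();
    assert!(t1 >= t0);
    println!("T_APP-style delta = {:.6} s (acc = {})", t1 - t0, acc);
}
```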

The new heap size is clamped to [max(1.1 * used_pages, MMTK_HEAP_MIN), MMTK_HEAP_MAX], giving 10% headroom above current live memory so we never resize so small that we'd immediately re-trigger GC.

Configuration

Env var                       Default     Description
MMTK_HEAP_MODE=cpu            (n/a)       Selects this mode
MMTK_GC_CPU_TARGET            5           Target GC CPU overhead, percent
MMTK_GC_CPU_WINDOW            3           Number of recent GC cycles averaged
MMTK_HEAP_MIN, MMTK_HEAP_MAX  (existing)  Used as clamp bounds
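Reading these knobs can be sketched as below; `env_or` is a hypothetical helper, not the binding's actual option parser, and it silently falls back to the default on a missing or unparsable value.

```rust
use std::env;

/// Illustrative helper (an assumption, not the real parser): read an
/// env var, falling back to `default` if unset or unparsable.
fn env_or<T: std::str::FromStr>(name: &str, default: T) -> T {
    env::var(name).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn main() {
    let target: f64 = env_or("MMTK_GC_CPU_TARGET", 5.0); // percent
    let window: usize = env_or("MMTK_GC_CPU_WINDOW", 3); // GC cycles
    println!("target = {}% over a window of {} cycles", target, window);
}
```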

Why default 5 instead of the paper's 15?

The paper targets ZGC, a concurrent generational collector. MMTk-Ruby currently ships stop-the-world Immix; every percent of GC CPU is also a percent of wall-clock time the mutator is blocked.

We swept MMTK_GC_CPU_TARGET ∈ {1, 2, 3, 4, 5, 6, 7, 15, 25, 40} across five GC-sensitive ruby-bench benchmarks (railsbench, lobsters, psych-load, liquid-render, lee). Geometric means across benchmarks:

Target  Geomean throughput vs ruby mode      Geomean RSS vs ruby mode
1%      +8.8% faster                         +33.5% RSS
2%      +7.3% faster                         +26.9% RSS
5%      +6.2% faster                         −0.2% RSS
6%      +7.0% faster                         −0.8% RSS
7%      +2.1% faster                         −6.2% RSS
15%     −13–30% slower (depending on bench)  ~5–8% RSS savings

Target=5% emerges as the cleanest Pareto-optimal point: ~6% throughput win with negligible RSS change versus the existing ruby heap mode. Target=15% (the paper's recommendation) is significantly worse than the existing ruby mode on this collector.

If we ever wire up ConcurrentImmix (already exists in mmtk-core), the optimal target will likely shift up toward the paper's 15%.

Implementation notes

  • Lives in gc/mmtk/src/heap/cpu_heap_trigger.rs, modeled on the existing RubyHeapTrigger.
  • Uses GCTriggerSelector::Delegated like the ruby mode. To dispatch between the two delegated triggers, create_gc_trigger checks which OnceCell config (RUBY_HEAP_TRIGGER_CONFIG vs CPU_HEAP_TRIGGER_CONFIG) was populated by the MMTK_HEAP_MODE parser.
  • Skips nursery-only GCs in generational plans (consistent with MemBalancer).
  • 5 unit tests cover sigmoid math, factor bounds, windowed-mean correctness, window eviction, and empty-window behavior.
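The windowed mean, window eviction, and empty-window behaviors that the unit tests cover can be sketched with a VecDeque; the `OverheadWindow` type and its methods are illustrative names, not the trigger's actual types.

```rust
use std::collections::VecDeque;

/// Fixed-size window of recent per-cycle GC CPU overhead samples.
/// Illustrative shape, not the actual cpu_heap_trigger.rs type.
struct OverheadWindow {
    samples: VecDeque<f64>,
    capacity: usize,
}

impl OverheadWindow {
    fn new(capacity: usize) -> Self {
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push a sample, evicting the oldest one once the window is full.
    fn push(&mut self, overhead: f64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(overhead);
    }

    /// Windowed mean; None before the first full GC cycle has landed.
    fn mean(&self) -> Option<f64> {
        if self.samples.is_empty() {
            None
        } else {
            Some(self.samples.iter().sum::<f64>() / self.samples.len() as f64)
        }
    }
}

fn main() {
    let mut w = OverheadWindow::new(3);
    assert_eq!(w.mean(), None);        // empty-window behavior
    w.push(0.04);
    w.push(0.06);
    w.push(0.08);
    w.push(0.10);                      // window is full: evicts 0.04
    let m = w.mean().unwrap();         // mean of {0.06, 0.08, 0.10}
    println!("windowed mean = {}", m);
}
```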

Tooling

  • bin/smoke-test — minimal allocation loop reporting wall time, CPU time, GC count, and peak RSS via getrusage. Useful for quick verification.
  • bin/ruby-mmtk-mode — POSIX shell wrapper that invokes a given Ruby with RUBY_GC_LIBRARY=mmtk and a chosen MMTK_HEAP_MODE. Shaped to fit ruby-bench's -e "name::cmd" syntax.
  • bin/compare-heap-modes — driver that wires the wrapper into ruby/ruby-bench so you can compare any two heap modes on a configurable benchmark set with --rss.
  • doc/testing-cpu-heap-mode.md — end-to-end walkthrough: building a modular-GC Ruby, installing the binding, running smoke tests, and running the comparison sweep.

Tests

Local verification on this branch:

  • cargo test: 10 unit tests pass, including 5 new ones for the trigger math.
  • cargo build --release: clean.
  • cargo clippy --all-targets: no warnings.
  • bundle exec rake test: 19 tests, 75 assertions, 0 failures, 0 errors, 0 omissions (against a Ruby built with --with-modular-gc, MMTk binding installed). Includes the new test_MMTK_HEAP_MODE_cpu configuration test.

End-to-end verified by running bin/compare-heap-modes against ruby/ruby-bench for railsbench, lobsters, psych-load, liquid-render, and lee. Raw output JSONs are available on request.

Caveats / future work

  • The empirical sweep was run on a single Apple M-series workstation. A wider comparison (Linux x86_64, more benchmarks) would strengthen the case for target=5 as the cross-platform default.
  • ruby-bench's harness/harness-common.rb:11 calls GC.auto_compact = ... unconditionally, which raises NotImplementedError under MMTk. Benchmarks here ran with a local one-line rescue patch in that file. That's a separate ruby-bench bug worth reporting upstream — it's not specific to this PR.

Risk

Low. The new mode is opt-in via MMTK_HEAP_MODE=cpu; the existing default (dynamic) and the existing ruby mode are untouched. Dispatch in create_gc_trigger falls through to the existing RubyHeapTrigger path when the cpu config singleton isn't set.
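The fall-through dispatch described above can be sketched with std's OnceLock (the binding uses OnceCell, but the populated-or-not pattern is the same); the config struct and the string return are stand-ins for constructing the real trigger objects.

```rust
use std::sync::OnceLock;

/// Illustrative stand-in for the cpu-mode config cell that the
/// MMTK_HEAP_MODE parser would populate.
#[allow(dead_code)]
struct CpuTriggerConfig {
    target_percent: f64,
    window: usize,
}

static CPU_HEAP_TRIGGER_CONFIG: OnceLock<CpuTriggerConfig> = OnceLock::new();

/// Sketch of the dispatch in create_gc_trigger: if the cpu config
/// cell was populated, build the cpu trigger; otherwise fall through
/// to the existing RubyHeapTrigger path.
fn select_trigger() -> &'static str {
    match CPU_HEAP_TRIGGER_CONFIG.get() {
        Some(_) => "CpuHeapTrigger",
        None => "RubyHeapTrigger",
    }
}

fn main() {
    assert_eq!(select_trigger(), "RubyHeapTrigger"); // cell not set yet
    let _ = CPU_HEAP_TRIGGER_CONFIG.set(CpuTriggerConfig {
        target_percent: 5.0,
        window: 3,
    });
    assert_eq!(select_trigger(), "CpuHeapTrigger");
    println!("dispatch ok");
}
```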

Adds MMTK_HEAP_MODE=cpu, a dynamic heap-sizing policy that grows
or shrinks the heap after each GC cycle to keep measured GC CPU
overhead near a configurable target. The control law follows
Tavakolisomeh et al., 'Heap Size Adjustment with CPU Control', MPLR
'23: a sigmoid of the (averaged) GC CPU overhead error in (-inf, +inf)
maps to a heap-size adjustment factor in (0.5, 1.5).

Implementation lives alongside the existing 'ruby' delegated trigger
in gc/mmtk/src/heap/. T_GC is wall-clock GC duration; T_APP is process
CPU time delta read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID), which
correctly credits multi-threaded mutator parallelism. Nursery-only
generational GCs are skipped so the trigger only re-sizes at full
collections.

Configuration:

  MMTK_GC_CPU_TARGET   target GC CPU overhead, percent. Default 5.
  MMTK_GC_CPU_WINDOW   number of recent cycles averaged. Default 3.

The default differs from the paper's recommended 15. The paper
targets ZGC, a concurrent generational collector; MMTk-Ruby currently
ships stop-the-world Immix, where every percent of GC CPU also blocks
the mutator. An empirical sweep of MMTK_GC_CPU_TARGET across
ruby-bench (railsbench, lobsters, psych-load, liquid-render, lee)
found 5-6 to be Pareto-optimal vs the existing 'ruby' heap mode:
about 6 percent geomean throughput improvement at essentially equal
peak RSS. Targets >=10 trade large amounts of throughput for modest
RSS savings on this collector.

bin/smoke-test, bin/ruby-mmtk-mode, bin/compare-heap-modes, and
doc/testing-cpu-heap-mode.md are included so reviewers and future
contributors can reproduce the sweep against ruby/ruby-bench.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
