Make the CLFUS RAM cache adapt to a shifting working set #13235

Draft
phongn wants to merge 3 commits into apache:master from phongn:fix-clfus-resident-aging

@phongn phongn commented Jun 3, 2026

Summary

Building on #13233 (which restored the CLFUS value metric), this makes the CLFUS RAM cache actually follow a working set that changes over time. Previously CLFUS captured an initial set of objects and then effectively froze on it: on a working-set change it kept serving the stale set and never admitted the new one.

Stacked on #13233 — please review/merge that first; this branch contains the value-metric fix as its base commit.

Note: By keeping the history list, this fix departs somewhat from CLFUS's original design. It 'fixes' the behavior, but a better-designed algorithm such as Window-TinyLFU or S3-FIFO might be a better solution (compare FastLZ with more modern algorithms such as LZ4).

Root cause

Two independent problems, both of which had to be fixed:

  1. Resident frequency never ages. A resident object's hits only ever increased, so an object that was hot days ago kept winning replacement long after going cold. Aging existed only for the history/ghost list (_tick()).
  2. New candidates can't be admitted. _tick() freed a history (ghost) entry the moment its aged hits reached 0, so the ghost list stayed ~1 entry; a re-requested key was forgotten before it could accumulate the value needed for admission, and incumbents were restored on every attempt.

With the value metric fixed, this is stark: on an abrupt 100% working-set change CLFUS scored a 0.125 hit rate on the new set vs LRU's 1.0, while retaining 100% of the now-cold set.

Fix

Two small, complementary changes in RamCacheCLFUS.cc:

  1. Admission — keep the history list. _tick() now ages the oldest ghost entry and keeps it, freeing only to hold the list at its target size, so a recently evicted/seen key is remembered long enough to be re-admitted.
  2. Aging — decay resident counts. Once per "turnover" (one Put per resident object), _age_resident() halves every resident object's hits and also halves _average_value (the admission bar must fall in step with the values it gates, or the decay is invisible to it).
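
The two changes can be sketched roughly as follows. This is a simplified model, not the code in RamCacheCLFUS.cc: `ClfusSketch`, `Entry`, `on_put()` and the containers are hypothetical names, ghost aging is assumed to be a simple decrement, and trimming is assumed to drop the oldest ghosts first.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical, simplified entry; the real CLFUS entry carries more state.
struct Entry {
  uint64_t key;
  uint32_t hits;
};

struct ClfusSketch {
  std::vector<Entry> resident;       // cached objects
  std::deque<Entry>  history;        // ghost entries, oldest at the front
  float  average_value    = 0.0f;    // floating admission bar
  size_t puts_since_aging = 0;
  static constexpr size_t kHistoryDivisor = 4; // mirrors HISTORY_DIVISOR (4)

  size_t history_target() const { return resident.size() / kHistoryDivisor; }

  // Admission: age the oldest ghost and keep it, even when its count hits 0.
  // Entries are freed only to hold the list at its target size.
  void tick() {
    if (!history.empty()) {
      Entry e = history.front();
      history.pop_front();
      if (e.hits > 0) {
        --e.hits;                    // assumed aging step
      }
      history.push_back(e);          // re-queue instead of freeing
    }
    while (history.size() > history_target()) {
      history.pop_front();           // trim to the bound
    }
  }

  // Aging: halve every resident count and the admission bar together.
  void age_resident() {
    for (Entry &e : resident) {
      e.hits >>= 1;
    }
    average_value /= 2.0f;
  }

  // One "turnover" is one Put per resident object.
  void on_put() {
    tick();
    if (!resident.empty() && ++puts_since_aging >= resident.size()) {
      age_resident();
      puts_since_aging = 0;
    }
  }
};
```

With 8 resident objects, the eighth call to `on_put()` halves the resident counts and the bar, while a two-entry history stays at two entries rather than draining to empty.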

Memory

Ghost entries are ~88 bytes each and are not counted against proxy.config.cache.ram_cache.size. A full cache-worth of history would be a large unbudgeted cost for caches of many small objects, so the history is bounded to _objects / HISTORY_DIVISOR (4). Testing showed a quarter preserves adaptivity (an eighth begins to slip); the seen-filter threshold tracks the same bound. Indicative cost for a 32 GB cache of 1 KB objects: ~700 MB, vs ~2.8 GB unbounded.
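
The indicative figures can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: `ghost_bytes()` is a hypothetical helper, it assumes binary units (32 GiB, 1 KiB), and it ignores per-object overhead on the resident side.

```cpp
#include <cstdint>

// Approximate ghost entry size quoted in the text.
constexpr uint64_t kGhostEntryBytes = 88;

// Hypothetical helper: bytes of ghost history for a cache of `cache_bytes`
// holding objects of `object_bytes`, with the history bounded to
// objects / divisor. A divisor of 1 models a full cache-worth of history.
constexpr uint64_t ghost_bytes(uint64_t cache_bytes, uint64_t object_bytes, uint64_t divisor) {
  return (cache_bytes / object_bytes) / divisor * kGhostEntryBytes;
}

constexpr uint64_t kCache  = 32ULL << 30; // 32 GiB
constexpr uint64_t kObject = 1ULL << 10;  // 1 KiB

// Bounded to a quarter: 738,197,504 bytes (704 MiB).
static_assert(ghost_bytes(kCache, kObject, 4) == 738197504ULL, "bounded history");
// Full cache-worth: 2,952,790,016 bytes (2.75 GiB).
static_assert(ghost_bytes(kCache, kObject, 1) == 2952790016ULL, "unbounded history");
```

Both results line up with the ~700 MB and ~2.8 GB figures quoted above.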

Tests

Adds two regression tests in CacheTest.cc, each comparing CLFUS to the LRU RAM cache (synthetic; higher is better except A-retained):

| test | LRU | CLFUS before | CLFUS after |
|---|---|---|---|
| gradual-drift hit rate | 0.969 | 0.391 | 0.902 |
| abrupt B-hit-rate | 1.000 | 0.125 | 1.000 |
| abrupt A-retained | 15/112 | 112/112 | 14/112 |
| steady-state 16 MB var | 0.795 | 0.790 | 0.839 |

The existing ram_cache test still passes; CLFUS now also beats LRU on steady-state Zipfian, its intended strength.

Docs

Updates doc/developer-guide/cache-architecture/ram-cache.en.rst: the History List section no longer matched the code. Adds the value metric and floating admission bar, the CLOCK aging (_tick, _age_resident), "Following a shifting working set," and "Memory overhead."

Notes

  • Validated on synthetic access patterns, not production traces.
  • A further, unimplemented lever remains if ever needed: relaxing the incumbent bias (re-queue second-chance + cost/benefit) on a detected shift — not required to pass the tests, so left out to keep the change minimal.
  • Possible follow-ups: budget the ghost RAM against ram_cache.size; expose HISTORY_DIVISOR as a config knob.

phongn and others added 3 commits June 3, 2026 19:19

PR apache#11733 rewrote the CACHE_VALUE_HITS_SIZE cast so static_cast<float>
wraps the whole quotient, making (hits + 1) / (size + overhead) integer
division. It truncates to 0 for normal object sizes, zeroing the value
metric and collapsing CLFUS to FIFO: no promote-on-hit, no clock second
chance, and no value-based ghost re-admission.

Bind the cast to the numerator to restore floating-point division, and
add the ram_cache_clfus_value regression test as a guard (it fails on
the pre-fix macro and passes after).
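
The effect of where the cast binds can be seen in a small standalone sketch. The names below (`Entry`, `kOverhead`, `value_broken`, `value_fixed`) are hypothetical stand-ins, not the real macro or its operands, and the overhead value is an arbitrary assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the fields the value metric reads.
struct Entry {
  uint32_t hits;
  uint32_t size;
};

constexpr uint32_t kOverhead = 256; // assumed per-entry overhead, illustration only

// Cast wraps the whole quotient: the division happens in integers first,
// so the result truncates to 0 whenever hits + 1 < size + overhead.
inline float value_broken(const Entry &e) {
  return static_cast<float>((e.hits + 1) / (e.size + kOverhead));
}

// Cast binds to the numerator: the division happens in floating point.
inline float value_fixed(const Entry &e) {
  return static_cast<float>(e.hits + 1) / (e.size + kOverhead);
}
```

For an entry with 3 hits and a 4096-byte body, `value_broken` returns 0 while `value_fixed` returns a small positive value, so only the latter lets a hit raise an object's value.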

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLFUS could not follow a changing working set: a once-hot object kept
its frequency forever (resident hit counts were never aged) and new
candidates were never admitted (the history/ghost list was emptied as
fast as it filled). On a working-set shift the cache froze on whatever
it had captured -- e.g. 0.125 vs LRU's 1.0 hit rate on the new set.

Two complementary changes fix it:

* Admission: _tick() used to free a history entry the moment its aged
  count reached 0, so the ghost list stayed ~empty and a re-requested
  key was forgotten before it could be re-admitted. Keep entries and
  free only to hold the list at its target size.

* Aging: halve all resident hit counts (and _average_value, the
  admission bar, in step) once per turnover, so a cold-but-once-hot
  object's advantage decays and warmer newcomers can take over.

The history list is capped at _objects / HISTORY_DIVISOR (4) rather
than a full cache-worth: ghost entries are ~88 bytes each and
unbudgeted (not counted against ram_cache.size), so a full cache-worth
is a large memory cost for caches of many small objects, and testing
showed a quarter preserves adaptivity.

Adds ram_cache_adaptivity (abrupt shift) and ram_cache_drift (gradual
rolling window) regression tests; both, plus the existing ram_cache,
now show CLFUS tracking the working set like LRU and beating it on
steady-state Zipfian.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The developer guide only sketched CLFUS and its History List section no
longer matched the code. Document the value metric and the floating
admission bar, the cached and history lists and their CLOCK aging
(_tick, _age_resident), how a shifting working set is followed, and the
per-object memory overhead -- including the bounded (HISTORY_DIVISOR)
history list and its unbudgeted ghost entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@phongn phongn added the Cache label Jun 4, 2026