Make the CLFUS RAM cache adapt to a shifting working set #13235
Draft
phongn wants to merge 3 commits into
PR apache#11733 rewrote the CACHE_VALUE_HITS_SIZE cast so that static_cast<float> wraps the whole quotient, which makes (hits + 1) / (size + overhead) an integer division. The quotient truncates to 0 for normal object sizes, zeroing the value metric and collapsing CLFUS to FIFO: no promote-on-hit, no clock second chance, and no value-based ghost re-admission.
Bind the cast to the numerator to restore floating-point division, and add the ram_cache_clfus_value regression test as a guard (it fails on the pre-fix macro and passes after).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
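A minimal sketch of the cast-placement issue described in this commit message. The function names and the uint64_t operand types are assumptions made for illustration; the actual CACHE_VALUE_HITS_SIZE macro is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pre-fix form: the cast wraps the whole
// quotient, so the division is performed on integers and truncates first.
inline float value_cast_quotient(uint64_t hits, uint64_t size, uint64_t overhead)
{
  assert(size + overhead > 0); // avoid dividing by zero in this sketch
  return static_cast<float>((hits + 1) / (size + overhead));
}

// Hypothetical stand-in for the fixed form: the cast binds to the
// numerator, so the division is carried out in floating point.
inline float value_cast_numerator(uint64_t hits, uint64_t size, uint64_t overhead)
{
  assert(size + overhead > 0); // avoid dividing by zero in this sketch
  return static_cast<float>(hits + 1) / (size + overhead);
}
```

With illustrative inputs such as hits = 3, size = 1024 and overhead = 128, the first form yields 0 and the second a small positive value, so only the second can rank objects against each other.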
CLFUS could not follow a changing working set: a once-hot object kept its frequency forever (resident hit counts were never aged) and new candidates were never admitted (the history/ghost list was emptied as fast as it filled). On a working-set shift the cache froze on whatever it had captured -- e.g. 0.125 vs LRU's 1.0 hit rate on the new set. Two complementary changes fix it:
* Admission: _tick() used to free a history entry the moment its aged count reached 0, so the ghost list stayed ~empty and a re-requested key was forgotten before it could be re-admitted. Keep entries and free only to hold the list at its target size.
* Aging: halve all resident hit counts (and _average_value, the admission bar, in step) once per turnover, so a cold-but-once-hot object's advantage decays and warmer newcomers can take over.
The history list is capped at _objects / HISTORY_DIVISOR (4) rather than a full cache-worth: ghost entries are ~88 bytes each and unbudgeted (not counted against ram_cache.size), so a full cache-worth is a large memory cost for caches of many small objects, and testing showed a quarter preserves adaptivity.
Adds ram_cache_adaptivity (abrupt shift) and ram_cache_drift (gradual rolling window) regression tests; both, plus the existing ram_cache, now show CLFUS tracking the working set like LRU and beating it on steady-state Zipfian.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
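The admission change in this commit message can be sketched roughly as follows. This is not the RamCacheCLFUS.cc code: GhostEntry, GhostList, the deque layout, the rotate-to-back step and the halving of the ghost's count are all assumptions chosen to show the idea of keeping an aged entry and trimming only to the cap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical ghost (history) entry: an evicted key and its aged hit count.
struct GhostEntry {
  uint64_t key;
  uint32_t hits;
};

// Hypothetical bounded history list; the front of the deque is the oldest entry.
struct GhostList {
  static constexpr std::size_t HISTORY_DIVISOR = 4; // value quoted in the commit message
  std::deque<GhostEntry> entries;

  // Age the oldest entry but keep it, then free entries only until the
  // list is back at its target size (resident_objects / HISTORY_DIVISOR).
  void tick(std::size_t resident_objects)
  {
    if (!entries.empty()) {
      GhostEntry oldest = entries.front();
      entries.pop_front();
      oldest.hits /= 2;           // aging by halving is an assumption of this sketch
      entries.push_back(oldest);  // kept even when its count has reached 0
    }
    const std::size_t cap = resident_objects / HISTORY_DIVISOR;
    while (entries.size() > cap) {
      entries.pop_front();        // trim from the oldest end
    }
    assert(entries.size() <= cap);
  }
};
```

The point of the sketch is the order of operations: an entry whose count has aged to 0 is no longer freed on the spot, so a key that is requested again can still be found in the list.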
The developer guide only sketched CLFUS and its History List section no longer matched the code. Document the value metric and the floating admission bar, the cached and history lists and their CLOCK aging (_tick, _age_resident), how a shifting working set is followed, and the per-object memory overhead -- including the bounded (HISTORY_DIVISOR) history list and its unbudgeted ghost entries.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Building on #13233 (which restored the CLFUS value metric), this makes the CLFUS RAM cache actually follow a working set that changes over time. Previously CLFUS captured an initial set of objects and then effectively froze on it: on a working-set change it kept serving the stale set and never admitted the new one.
Note: By keeping the history list populated, this fix departs somewhat from CLFUS's original design. It 'fixes' the problem, but a better-designed algorithm such as Window-TinyLFU or S3-FIFO might be a better solution (compare FastLZ with more modern algorithms such as LZ4).
Root cause
Two independent problems, both of which had to be fixed:
* hits only ever increased, so an object that was hot days ago kept winning replacement long after going cold. Aging existed only for the history/ghost list (_tick()).
* _tick() freed a history (ghost) entry the moment its aged hits reached 0, so the ghost list stayed at ~1 entry; a re-requested key was forgotten before it could accumulate the value needed for admission, and incumbents were restored on every attempt.
With the value metric fixed, this is stark: on an abrupt 100% working-set change CLFUS scored a 0.125 hit rate on the new set vs LRU's 1.0, while retaining 100% of the now-cold set.
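The first problem can be made concrete with a small sketch. If hit counts never decay, a once-hot object's count stays ahead of any newcomer indefinitely; halving the counts periodically, which is the remedy this PR describes, erases that lead after a handful of passes. The Resident struct and both function names below are hypothetical, not taken from RamCacheCLFUS.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical resident entry; only the hit counter matters here.
struct Resident {
  uint64_t key;
  uint32_t hits;
};

// Halve every resident hit count and the admission bar together, so the
// bar falls in step with the values it gates.
inline void age_residents(std::vector<Resident> &residents, float &average_value)
{
  assert(average_value >= 0.0f);
  for (Resident &r : residents) {
    r.hits /= 2;
  }
  average_value /= 2.0f;
}

// Number of halving passes before a stale count no longer exceeds a
// newcomer's count (assuming neither object is hit in the meantime).
inline int passes_until_overtaken(uint32_t stale_hits, uint32_t newcomer_hits)
{
  int passes = 0;
  while (stale_hits > newcomer_hits) {
    stale_hits /= 2;
    ++passes;
  }
  return passes;
}
```

For example, under these assumptions a stale count of 1000 needs 8 halvings to fall to a newcomer's count of 3, whereas without aging it would never fall at all.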
Fix
Two small, complementary changes in RamCacheCLFUS.cc:
* _tick() now ages the oldest ghost entry and keeps it, freeing only to hold the list at its target size, so a recently evicted/seen key is remembered long enough to be re-admitted.
* Once per turnover (one Put per resident object), _age_resident() halves every resident hits and _average_value (the admission bar must fall in step with the values it gates, or the decay is invisible to it).
Memory
Ghost entries are ~88 bytes each and are not counted against proxy.config.cache.ram_cache.size. A full cache-worth of history would be a large unbudgeted cost for caches of many small objects, so the history is bounded to _objects / HISTORY_DIVISOR (4). Testing showed a quarter preserves adaptivity (an eighth begins to slip); the seen-filter threshold tracks the same bound. Indicative cost for a 32 GB cache of 1 KB objects: ~700 MB, vs ~2.8 GB unbounded.
Tests
Adds two regression tests in CacheTest.cc, ram_cache_adaptivity (abrupt shift) and ram_cache_drift (gradual rolling window), each comparing CLFUS to the LRU RAM cache (synthetic; higher is better except A-retained).
The existing ram_cache test still passes; CLFUS now also beats LRU on steady-state Zipfian, its intended strength.
Docs
Updates doc/developer-guide/cache-architecture/ram-cache.en.rst: the History List section no longer matched the code. Adds the value metric and floating admission bar, the CLOCK aging (_tick, _age_resident), "Following a shifting working set," and "Memory overhead."
Notes
ram_cache.size; expose HISTORY_DIVISOR as a config knob.
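The indicative figures in the Memory section can be reproduced with a back-of-the-envelope helper, which also shows how the cost would move if HISTORY_DIVISOR became tunable. The function name and parameters are hypothetical, per-object overhead inside the cache is ignored, and 88 bytes per ghost entry is the approximate figure quoted in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Rough estimate of the memory held by ghost entries, in bytes.
// history_divisor = 1 models an unbounded (full cache-worth) history.
inline uint64_t history_bytes(uint64_t cache_bytes, uint64_t object_bytes,
                              uint64_t bytes_per_ghost, uint64_t history_divisor)
{
  assert(object_bytes > 0 && history_divisor > 0);
  const uint64_t resident_objects = cache_bytes / object_bytes;
  return (resident_objects / history_divisor) * bytes_per_ghost;
}
```

For a 32 GiB cache of 1 KiB objects this gives 738,197,504 bytes (704 MiB) with a divisor of 4 and 2,952,790,016 bytes (2.75 GiB) with a divisor of 1, in line with the ~700 MB and ~2.8 GB quoted above.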