Skip to content

Fix CLFUS RAM cache value metric broken by integer division#13233

Open
phongn wants to merge 1 commit into
apache:masterfrom
phongn:fix-clfus-cache-value-int-division
Open

Fix CLFUS RAM cache value metric broken by integer division#13233
phongn wants to merge 1 commit into
apache:masterfrom
phongn:fix-clfus-cache-value-int-division

Conversation

@phongn
Copy link
Copy Markdown
Collaborator

@phongn phongn commented Jun 3, 2026

Summary

The CLFUS RAM cache ranks objects by a value density, value = (hits + 1) / (size + overhead), compared against a running double average on every eviction/admission decision. Since #11733 this has been computed with integer division and truncates to 0 for essentially every object, silently disabling CLFUS's frequency/size/recency logic and degrading it to FIFO.

Root cause

#11733 ("Fix C-style cast in Cache subsystem") modernized the macro:

-#define CACHE_VALUE_HITS_SIZE(_h, _s) ((float)((_h) + 1) / ((_s) + ENTRY_OVERHEAD))
+#define CACHE_VALUE_HITS_SIZE(_h, _s) (static_cast<float>(((_h) + 1) / ((_s) + ENTRY_OVERHEAD)))

The original C cast bound only to the numerator, so the division was floating point. The static_cast<float>(...) wraps the whole quotient, so (hits + 1) / (size + overhead) is now integer division. hits is on the order of tens while size + overhead is thousands+, so the result truncates to 0 for any normal object.

With value ≡ 0, _average_value stays 0 and every value-gated branch is dead (0 > 0 is false):

  • no promote-to-MRU on hit (get()),
  • no CLOCK second-chance / requeue (put()),
  • no value-based re-admission from the history (ghost) list.

CLFUS effectively becomes FIFO, which generally performs worse than the simpler LRU it is meant to beat. This has affected every CLFUS deployment since the regression merged on 2024-08-28.

Fix

Bind the cast to the numerator only, restoring floating-point division (the
pre-#11733 semantics):

#define CACHE_VALUE_HITS_SIZE(_h, _s) (static_cast<float>((_h) + 1) / ((_s) + ENTRY_OVERHEAD))

Test

Adds ram_cache_clfus_value, a focused regression test asserting the value metric is a non-zero fraction and is monotonic in hits and in size. It fails on the pre-fix macro and passes after:

  • buggy macro → FAILED ("value metric truncated to zero", etc.)
  • fixed macro → PASSED

The existing ram_cache regression still passes. With the fix, CLFUS's size-aware behavior is restored — in the bundled ram_cache test at the 16 MB size, its variable-size hit rate rises to ~0.82, slightly ahead of LRU (~0.80) on the same workload.

Notes

  • Severity: correctness regression affecting all users of the non-default proxy.config.cache.ram_cache.algorithm = 0 (CLFUS) since Fix C-style cast in Cache subsystem #11733; a backport may be warranted.
  • This restores the intended algorithm. A separate, long-standing limitation — resident entries' hit counts are only aged on the ghost list, never while resident — is not addressed here and is tracked separately.

Fixes a regression introduced in #11733.

PR apache#11733 rewrote the CACHE_VALUE_HITS_SIZE cast so static_cast<float>
wraps the whole quotient, making (hits + 1) / (size + overhead) integer
division. It truncates to 0 for normal object sizes, zeroing the value
metric and collapsing CLFUS to FIFO: no promote-on-hit, no clock second
chance, and no value-based ghost re-admission.

Bind the cast to the numerator to restore floating-point division, and
add the ram_cache_clfus_value regression test as a guard (it fails on
the pre-fix macro and passes after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant