Refactor `meta_flags` support into explicit P/L token support by nherson · Pull Request #72 · Shopify/dalli

nherson · 2026-05-05T21:05:38Z

The first attempt at this PR added support for meta_flags to all memcached command methods (and multi versions). There were a few issues with this approach which became clear:

For some methods, passing the meta flags changes the return type from <val> to [<val>, <response flags>]; methods like fetch are impacted by this, and fetch no longer properly handled cache misses in the get, since it didn't know how to differentiate the tuple from a cache miss. We could add an explicit check for this case, but this seems brittle.
Adding arbitrary flag support to all methods is, more generally, probably not the right path forward. It's too raw, and too easy to pass malformed values; this is doubly the case for methods like fetch where each flag must be supported by both the get and set commands or else one or the other will error.

To get around this, passing arbitrary meta flags is reverted here for all commands (excluding the gets where it was previously supported). Instead, add new kwargs p_token and l_token which are, as outlined by the memcached spec, accepted for all commands. Now, these kwargs can be safely plumbed through the codebase as a separate set of kwargs which operate independently of meta_flags. This is safer and more well defined.

A followup PR will also add explicit support for the x and i flags for deletes, to implement tombstone pattern support. That PR will follow a similar pattern to here.

I think there's probably an argument to be made around introducing a new top-level API to clean up some of the more confusing method signatures, as we start adding more support to the API. But for now these are small targeted changes.

…ssertions

drinkbeer

passing arbitrary meta flags is reverted here for all commands (excluding the gets where it was previously supported)

Agree with this.

passing the meta flags changes the return type from to [, ]; methods like fetch are impacted by this, and fetch no longer properly handled cache misses in the get

I am wondering if this bug exist before your change to support l and p flags? Is this PR just not making the problem worse, but did not fix the whole bug?

meta_flags: is still supported on get/gat. So this is still broken:

dc.fetch('key', 60, meta_flags: ['s']) { compute_it }

Tracing:

fetch calls get(key, req_options) → hits the elsif meta_options branch in meta.rb:157 → meta_get_with_value_and_meta_flags returns [nil, {}] on miss.
not_found?([nil, {}]) returns false (Array isn't nil).
Block never runs. Caller gets [nil, {}]. Cache stays empty.

drinkbeer · 2026-05-05T22:35:23Z

    #
    # See `get_multi` for documentation on the `req_options` trailing keyword
-    # arguments (e.g. `meta_flags:`), including the kwargs-vs-positional caveat.
+    # arguments (e.g. `p_token:` / `l_token:`), including the kwargs-vs-positional caveat.


Do you know why the get_cas does not have the req_options like get_multi_cas? E.g. something like def get_cas(key, req_options = nil). Is it by intention?

Good catch — addressed in 27cab72. get_cas now accepts req_options and threads it through to perform(:cas, key, req_options). The internal Protocol::Meta#cas already takes the options arg, so this just plumbs the public-API surface.

Added an integration test covering both the non-block ([value, cas] return) and block-yielding forms with p_token/l_token set, asserting the return shape is unchanged.

drinkbeer · 2026-05-05T22:44:25Z

+ROUTING_OPTS = { p_token: P_TOKEN, l_token: L_TOKEN }.freeze
+
+describe 'routing tokens (p_token / l_token) passthrough' do
+  describe 'set / add / replace / set_cas' do


For the set test, could you please cover two edge cases:

Injection via \r\n in token values. Example dc.set('safe_key', 'val', nil, p_token: "route=us\r\nflush_all\r\n"), which will result on the wire like:

Result on the wire: ms safe_key 3 c F0 MS Proute=us flush_all \r\n

Whatever sits between dalli and memcached (proxy, LB, server itself) parses \r\n as a command boundary. If the p_token/l_token values ever come from anything user-touchable, this is straightforward command injection. The previous meta_flags API had the same hole, so this isn't a regression, but the PR is the right moment to close it. A two-line guard in RequestFormatter.routing_tokens:

raise ArgumentError, 'p_token must not contain CRLF' if p_token&.match?(/[\r\n]/) raise ArgumentError, 'l_token must not contain CRLF' if l_token&.match?(/[\r\n]/)

Empty-string token produces broken wire

Example:

dc.set('key', 'val', nil, p_token: '') # → wire: "ms key 3 c F0 MS P\r\n" (orphan "P" with no value)

Memcached/proxy parsers will either error or silently treat P as a flag with empty value. Either way, the user gets surprising behavior from what looks like a no-op argument. Reject with '' treated the same as nil, make something like:

def routing_tokens?(opts) return false unless opts.is_a?(Hash) !opts[:p_token].to_s.empty? || !opts[:l_token].to_s.empty? end

(This also drops the redundant ? true : false.)

I have a hunch that doing a regex operation in a performance-sensitive caching client will be a bit dicey

Both addressed in 27cab72.

1. CRLF / null-byte injection guard: Added validate_routing_token! in RequestFormatter, raising ArgumentError if either token is non-String, contains \r, \n, or \0. Validation runs at the wire-format layer (single source of truth — applies whether the token reaches the formatter via the public client API, the inline ms/mg builders in the multi paths, or any future call site).

2. Empty-string tokens normalized to nil: routing_token_kwargs and the formatter both now treat "" identically to nil (no-op). The wire is byte-identical to a request with no routing tokens — no orphan P / L. Also cleaned up routing_tokens? to drop the ? true : false redundancy as you suggested.

On the regex perf concern (3192115891): I had the same hunch initially and switched to a chained include?. Then I actually benchmarked it (s = "route=us-east-1-shardpool-12", 5M iterations):

regex match? (literal) 0.294s include? chain 0.687s

The Regexp engine fuses short character classes into a single C-level scan; the include? chain walks the string up to three times. Reverted to the regex with a private_constant ROUTING_TOKEN_FORBIDDEN = /[\r\n\0]/.freeze and a comment explaining why (commit on its way after this).

Tests added:

Unit: CR / LF / CR+LF / null / non-String inputs all raise ArgumentError (on routing_tokens directly and transitively through meta_set / meta_delete / meta_arithmetic / meta_get); empty strings produce no wire output.

Integration: bad inputs raise ArgumentError on set / get / delete; rejection happens client-side before any wire write (verified by checking subsequent ops on the same connection still see the original value); empty-string tokens are no-ops on every command.

Good instinct, and I had the same hunch — first cut switched to s.include?("\r") || s.include?("\n") || s.include?("\0") to avoid the regex.

Then I actually benchmarked it:

s = "route=us-east-1-shardpool-12" # realistic clean token n = 5_000_000 Benchmark.bm(20) do |b| b.report("regex match? lit") { n.times { s.match?(/[\r\n\0]/) } } b.report("include? chain") { n.times { s.include?("\r") || s.include?("\n") || s.include?("\0") } } end

user system total real regex match? lit 0.293644 0.000297 0.293941 ( 0.293947) include? chain 0.682818 0.003723 0.686541 ( 0.686546)

Regex is ~2.3x faster for the common (clean) case. Two reasons:

Ruby caches compiled literal regexes at parse time, so match?(/[\r\n\0]/) has no per-call compile cost.

The Regexp engine fuses a short character class into a single C-level scan over the string. The include? chain has to walk the string up to three times before declaring "not found."

So the regex is actually the right call here. Reverted to it in 3b597bc with a private_constant ROUTING_TOKEN_FORBIDDEN = /[\r\n\0]/.freeze (so the constant is unambiguously cached) and a comment explaining why I picked it.

drinkbeer · 2026-05-05T22:52:31Z


        @middlewares_stack.retrieve_req('memcached.cas', { 'keys' => key }) do
-          req = RequestFormatter.meta_get(key: encoded_key, value: true, return_cas: true, base64: base64)
+          req = RequestFormatter.meta_get(key: encoded_key, value: true, return_cas: true, base64: base64,


In production, could any client pass any meta_options = meta_flag_options(options) to the cas call? I found we have meta_options in the get, so wondering if we should do something similar in cas? Maybe this is by design.

Good question. Intentionally not exposing meta_flags on cas — the design is that meta_flags lives on get/gat only, and cas keeps its existing [value, cas] tuple shape.

Reasoning:

cas already has a tuple return. meta_get_with_value_and_cas returns [value, cas]. Adding meta_flags would require a third tuple slot ([value, cas, meta_flags]) and a new ResponseProcessor method (meta_get_with_value_and_cas_and_meta_flags). That cascading shape-change is exactly the trap the rest of this PR was rewriting meta_flags out of.

cas (the public optimistic-locking helper) issues two underlying commands (a meta-get for the read, then a meta-set for the write). meta_flags is a get-side concept that switches the response shape of the read. There is no symmetric concept on the write half (ms doesn’t emit response flags), so the semantics get fuzzy — what would the user even do with the flags? They’ve already declared their intent (read, mutate, write).

req_options is still threaded through. If someone passes meta_flags: to cas, the key is silently ignored at the meta layer, just like any other unknown option. No crash, no wrong behavior — just a no-op. We could add a deprecation warning if that quietness becomes a footgun in practice.

Routing tokens (p_token/l_token) do flow through both halves of cas — that’s the side-band-metadata case the rest of this PR was designed for, and it’s tested in the new cas (optimistic locking) describe block.

If a use case for meta_flags on cas shows up later, it’s a separable feature: add meta_get_with_value_and_cas_and_meta_flags, plumb it through Protocol::Meta#cas, document the new return shape. I’d rather not bake it in speculatively here.

nherson · 2026-05-05T23:10:50Z

I am wondering if this bug exist before your change to support l and p flags? Is this PR just not making the problem worse, but did not fix the whole bug?

So, the bug was introduced in the last PR that added req_options to the fetch method. Previously, you simply could not pass meta flags to, if i remember correctly, any method except get.

nherson · 2026-05-05T23:14:22Z

Yeah there's never been protection against bogus flags for get. However, I don't want to remove meta_flags since that would be a breaking change. It's always been that way. In the future a new API could handle flags better with more opinionated argument handling

…, empty-string normalization - get_cas now accepts req_options for symmetry with get_multi_cas; threads through to perform(:cas, key, req_options). - Reject CRLF and null bytes in p_token / l_token with ArgumentError, raised at the wire-format layer before any data reaches the socket. - Normalize empty-string tokens to nil (no orphan 'P'/'L' tokens on the wire, which proxies/LBs may parse inconsistently). - Tests for: get_cas + routing tokens, CRLF/LF/CR/null injection rejection on set/get/delete, non-String token rejection, connection-state preservation after a rejected request, and empty-string no-op behavior on every command.

… include? chain)

…re assertion

nherson added 2 commits May 5, 2026 13:54

refactor meta_flags support into explicit P/L token support

31e659a

Rubocop fixes: relax metric cops on get/gat, add empty lines before a…

97bce92

…ssertions

nherson self-assigned this May 5, 2026

nherson requested a review from drinkbeer May 5, 2026 21:54

drinkbeer reviewed May 5, 2026

View reviewed changes

nherson added 3 commits May 5, 2026 16:21

Use regex for routing-token validation (benchmarked ~2.3x faster than…

3b597bc

… include? chain)

Rubocop: drop redundant .freeze on regex literal, add blank line befo…

8255f19

…re assertion

drinkbeer approved these changes May 5, 2026

View reviewed changes

drinkbeer merged commit 7e69ff8 into main May 5, 2026
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor `meta_flags` support into explicit P/L token support#72

Refactor `meta_flags` support into explicit P/L token support#72
drinkbeer merged 5 commits intomainfrom
p-l-token-support-refactor

nherson commented May 5, 2026

Uh oh!

drinkbeer left a comment

Uh oh!

drinkbeer May 5, 2026 •

edited

Loading

Uh oh!

nherson May 5, 2026

Uh oh!

drinkbeer May 5, 2026

Uh oh!

nherson May 5, 2026

Uh oh!

nherson May 5, 2026

Uh oh!

nherson May 5, 2026

Uh oh!

drinkbeer May 5, 2026 •

edited

Loading

Uh oh!

nherson May 5, 2026

Uh oh!

nherson commented May 5, 2026

Uh oh!

nherson commented May 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nherson commented May 5, 2026

Uh oh!

drinkbeer left a comment

Choose a reason for hiding this comment

Uh oh!

drinkbeer May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nherson May 5, 2026

Choose a reason for hiding this comment

Uh oh!

drinkbeer May 5, 2026

Choose a reason for hiding this comment

Uh oh!

nherson May 5, 2026

Choose a reason for hiding this comment

Uh oh!

nherson May 5, 2026

Choose a reason for hiding this comment

Uh oh!

nherson May 5, 2026

Choose a reason for hiding this comment

Uh oh!

drinkbeer May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nherson May 5, 2026

Choose a reason for hiding this comment

Uh oh!

nherson commented May 5, 2026

Uh oh!

nherson commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

drinkbeer May 5, 2026 •

edited

Loading

drinkbeer May 5, 2026 •

edited

Loading

nherson commented May 5, 2026 •

edited

Loading