Skip to content

Refactor meta_flags support into explicit P/L token support#72

Merged
drinkbeer merged 5 commits intomainfrom
p-l-token-support-refactor
May 5, 2026
Merged

Refactor meta_flags support into explicit P/L token support#72
drinkbeer merged 5 commits intomainfrom
p-l-token-support-refactor

Conversation

@nherson
Copy link
Copy Markdown

@nherson nherson commented May 5, 2026

The first attempt at this PR added support for meta_flags to all memcached command methods (and multi versions). There were a few issues with this approach which became clear:

  1. For some methods, passing the meta flags changes the return type from <val> to [<val>, <response flags>]; methods like fetch are impacted by this, and fetch no longer properly handled cache misses in the get, since it didn't know how to differentiate the tuple from a cache miss. We could add an explicit check for this case, but this seems brittle.
  2. Adding arbitrary flag support to all methods is, more generally, probably not the right path forward. It's too raw, and too easy to pass malformed values; this is doubly the case for methods like fetch where each flag must be supported by both the get and set commands or else one or the other will error.

To get around this, passing arbitrary meta flags is reverted here for all commands (excluding the gets where it was previously supported). Instead, add new kwargs p_token and l_token which are, as outlined by the memcached spec, accepted for all commands. Now, these kwargs can be safely plumbed through the codebase as a separate set of kwargs which operate independently of meta_flags. This is safer and more well defined.

A followup PR will also add explicit support for the x and i flags for deletes, to implement tombstone pattern support. That PR will follow a similar pattern to here.

I think there's probably an argument to be made around introducing a new top-level API to clean up some of the more confusing method signatures, as we start adding more support to the API. But for now these are small targeted changes.

@nherson nherson self-assigned this May 5, 2026
@nherson nherson requested a review from drinkbeer May 5, 2026 21:54
Copy link
Copy Markdown

@drinkbeer drinkbeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing arbitrary meta flags is reverted here for all commands (excluding the gets where it was previously supported)

Agree with this.

passing the meta flags changes the return type from to [, ]; methods like fetch are impacted by this, and fetch no longer properly handled cache misses in the get

I am wondering if this bug exist before your change to support l and p flags? Is this PR just not making the problem worse, but did not fix the whole bug?

meta_flags: is still supported on get/gat. So this is still broken:

dc.fetch('key', 60, meta_flags: ['s']) { compute_it }

Tracing:

  1. fetch calls get(key, req_options) → hits the elsif meta_options branch in meta.rb:157 → meta_get_with_value_and_meta_flags returns [nil, {}] on miss.
  2. not_found?([nil, {}]) returns false (Array isn't nil).
  3. Block never runs. Caller gets [nil, {}]. Cache stays empty.

Comment thread lib/dalli/client.rb
#
# See `get_multi` for documentation on the `req_options` trailing keyword
# arguments (e.g. `meta_flags:`), including the kwargs-vs-positional caveat.
# arguments (e.g. `p_token:` / `l_token:`), including the kwargs-vs-positional caveat.
Copy link
Copy Markdown

@drinkbeer drinkbeer May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why the get_cas does not have the req_options like get_multi_cas? E.g. something like def get_cas(key, req_options = nil). Is it by intention?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — addressed in 27cab72. get_cas now accepts req_options and threads it through to perform(:cas, key, req_options). The internal Protocol::Meta#cas already takes the options arg, so this just plumbs the public-API surface.

Added an integration test covering both the non-block ([value, cas] return) and block-yielding forms with p_token/l_token set, asserting the return shape is unchanged.

ROUTING_OPTS = { p_token: P_TOKEN, l_token: L_TOKEN }.freeze

describe 'routing tokens (p_token / l_token) passthrough' do
describe 'set / add / replace / set_cas' do
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the set test, could you please cover two edge cases:

  1. Injection via \r\n in token values. Example dc.set('safe_key', 'val', nil, p_token: "route=us\r\nflush_all\r\n"), which will result on the wire like:
  Result on the wire:
  ms safe_key 3 c F0 MS Proute=us
  flush_all
  \r\n

Whatever sits between dalli and memcached (proxy, LB, server itself) parses \r\n as a command boundary. If the p_token/l_token values ever come from anything user-touchable, this is straightforward command injection. The previous meta_flags API had the same hole, so this isn't a regression, but the PR is the right moment to close it. A two-line guard in RequestFormatter.routing_tokens:

  raise ArgumentError, 'p_token must not contain CRLF' if p_token&.match?(/[\r\n]/)
  raise ArgumentError, 'l_token must not contain CRLF' if l_token&.match?(/[\r\n]/)
  1. Empty-string token produces broken wire

Example:

dc.set('key', 'val', nil, p_token: '')
# → wire: "ms key 3 c F0 MS P\r\n"  (orphan "P" with no value)

Memcached/proxy parsers will either error or silently treat P as a flag with empty value. Either way, the user gets surprising behavior from what looks like a no-op argument. Reject with '' treated the same as nil, make something like:

  def routing_tokens?(opts)
    return false unless opts.is_a?(Hash)
    !opts[:p_token].to_s.empty? || !opts[:l_token].to_s.empty?
  end

(This also drops the redundant ? true : false.)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hunch that doing a regex operation in a performance-sensitive caching client will be a bit dicey

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both addressed in 27cab72.

1. CRLF / null-byte injection guard: Added validate_routing_token! in RequestFormatter, raising ArgumentError if either token is non-String, contains \r, \n, or \0. Validation runs at the wire-format layer (single source of truth — applies whether the token reaches the formatter via the public client API, the inline ms/mg builders in the multi paths, or any future call site).

2. Empty-string tokens normalized to nil: routing_token_kwargs and the formatter both now treat "" identically to nil (no-op). The wire is byte-identical to a request with no routing tokens — no orphan P / L. Also cleaned up routing_tokens? to drop the ? true : false redundancy as you suggested.

On the regex perf concern (3192115891): I had the same hunch initially and switched to a chained include?. Then I actually benchmarked it (s = "route=us-east-1-shardpool-12", 5M iterations):

regex match? (literal)   0.294s
include? chain           0.687s

The Regexp engine fuses short character classes into a single C-level scan; the include? chain walks the string up to three times. Reverted to the regex with a private_constant ROUTING_TOKEN_FORBIDDEN = /[\r\n\0]/.freeze and a comment explaining why (commit on its way after this).

Tests added:

  • Unit: CR / LF / CR+LF / null / non-String inputs all raise ArgumentError (on routing_tokens directly and transitively through meta_set / meta_delete / meta_arithmetic / meta_get); empty strings produce no wire output.
  • Integration: bad inputs raise ArgumentError on set / get / delete; rejection happens client-side before any wire write (verified by checking subsequent ops on the same connection still see the original value); empty-string tokens are no-ops on every command.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good instinct, and I had the same hunch — first cut switched to s.include?("\r") || s.include?("\n") || s.include?("\0") to avoid the regex.

Then I actually benchmarked it:

s = "route=us-east-1-shardpool-12"  # realistic clean token
n = 5_000_000
Benchmark.bm(20) do |b|
  b.report("regex match? lit") { n.times { s.match?(/[\r\n\0]/) } }
  b.report("include? chain")   { n.times { s.include?("\r") || s.include?("\n") || s.include?("\0") } }
end
                           user     system      total        real
regex match? lit       0.293644   0.000297   0.293941 (  0.293947)
include? chain         0.682818   0.003723   0.686541 (  0.686546)

Regex is ~2.3x faster for the common (clean) case. Two reasons:

  • Ruby caches compiled literal regexes at parse time, so match?(/[\r\n\0]/) has no per-call compile cost.
  • The Regexp engine fuses a short character class into a single C-level scan over the string. The include? chain has to walk the string up to three times before declaring "not found."

So the regex is actually the right call here. Reverted to it in 3b597bc with a private_constant ROUTING_TOKEN_FORBIDDEN = /[\r\n\0]/.freeze (so the constant is unambiguously cached) and a comment explaining why I picked it.


@middlewares_stack.retrieve_req('memcached.cas', { 'keys' => key }) do
req = RequestFormatter.meta_get(key: encoded_key, value: true, return_cas: true, base64: base64)
req = RequestFormatter.meta_get(key: encoded_key, value: true, return_cas: true, base64: base64,
Copy link
Copy Markdown

@drinkbeer drinkbeer May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In production, could any client pass any meta_options = meta_flag_options(options) to the cas call? I found we have meta_options in the get, so wondering if we should do something similar in cas? Maybe this is by design.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Intentionally not exposing meta_flags on cas — the design is that meta_flags lives on get/gat only, and cas keeps its existing [value, cas] tuple shape.

Reasoning:

  1. cas already has a tuple return. meta_get_with_value_and_cas returns [value, cas]. Adding meta_flags would require a third tuple slot ([value, cas, meta_flags]) and a new ResponseProcessor method (meta_get_with_value_and_cas_and_meta_flags). That cascading shape-change is exactly the trap the rest of this PR was rewriting meta_flags out of.

  2. cas (the public optimistic-locking helper) issues two underlying commands (a meta-get for the read, then a meta-set for the write). meta_flags is a get-side concept that switches the response shape of the read. There is no symmetric concept on the write half (ms doesn’t emit response flags), so the semantics get fuzzy — what would the user even do with the flags? They’ve already declared their intent (read, mutate, write).

  3. req_options is still threaded through. If someone passes meta_flags: to cas, the key is silently ignored at the meta layer, just like any other unknown option. No crash, no wrong behavior — just a no-op. We could add a deprecation warning if that quietness becomes a footgun in practice.

  4. Routing tokens (p_token/l_token) do flow through both halves of cas — that’s the side-band-metadata case the rest of this PR was designed for, and it’s tested in the new cas (optimistic locking) describe block.

If a use case for meta_flags on cas shows up later, it’s a separable feature: add meta_get_with_value_and_cas_and_meta_flags, plumb it through Protocol::Meta#cas, document the new return shape. I’d rather not bake it in speculatively here.

@nherson
Copy link
Copy Markdown
Author

nherson commented May 5, 2026

I am wondering if this bug exist before your change to support l and p flags? Is this PR just not making the problem worse, but did not fix the whole bug?

So, the bug was introduced in the last PR that added req_options to the fetch method. Previously, you simply could not pass meta flags to, if i remember correctly, any method except get.

@nherson
Copy link
Copy Markdown
Author

nherson commented May 5, 2026

Yeah there's never been protection against bogus flags for get. However, I don't want to remove meta_flags since that would be a breaking change. It's always been that way. In the future a new API could handle flags better with more opinionated argument handling

nherson added 3 commits May 5, 2026 16:21
…, empty-string normalization

- get_cas now accepts req_options for symmetry with get_multi_cas; threads
  through to perform(:cas, key, req_options).
- Reject CRLF and null bytes in p_token / l_token with ArgumentError, raised
  at the wire-format layer before any data reaches the socket.
- Normalize empty-string tokens to nil (no orphan 'P'/'L' tokens on the wire,
  which proxies/LBs may parse inconsistently).
- Tests for: get_cas + routing tokens, CRLF/LF/CR/null injection rejection on
  set/get/delete, non-String token rejection, connection-state preservation
  after a rejected request, and empty-string no-op behavior on every command.
@drinkbeer drinkbeer merged commit 7e69ff8 into main May 5, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants