Refactor meta_flags support into explicit P/L token support#72
Refactor meta_flags support into explicit P/L token support#72
meta_flags support into explicit P/L token support#72Conversation
drinkbeer
left a comment
There was a problem hiding this comment.
passing arbitrary meta flags is reverted here for all commands (excluding the gets where it was previously supported)
Agree with this.
passing the meta flags changes the return type from to [, ]; methods like fetch are impacted by this, and fetch no longer properly handled cache misses in the get
I am wondering if this bug exist before your change to support l and p flags? Is this PR just not making the problem worse, but did not fix the whole bug?
meta_flags: is still supported on get/gat. So this is still broken:
dc.fetch('key', 60, meta_flags: ['s']) { compute_it }
Tracing:
fetchcallsget(key, req_options)→ hits the elsifmeta_optionsbranch in meta.rb:157 →meta_get_with_value_and_meta_flagsreturns[nil, {}]on miss.not_found?([nil, {}])returnsfalse(Array isn't nil).- Block never runs. Caller gets
[nil, {}]. Cache stays empty.
| # | ||
| # See `get_multi` for documentation on the `req_options` trailing keyword | ||
| # arguments (e.g. `meta_flags:`), including the kwargs-vs-positional caveat. | ||
| # arguments (e.g. `p_token:` / `l_token:`), including the kwargs-vs-positional caveat. |
There was a problem hiding this comment.
Do you know why the get_cas does not have the req_options like get_multi_cas? E.g. something like def get_cas(key, req_options = nil). Is it by intention?
There was a problem hiding this comment.
Good catch — addressed in 27cab72. get_cas now accepts req_options and threads it through to perform(:cas, key, req_options). The internal Protocol::Meta#cas already takes the options arg, so this just plumbs the public-API surface.
Added an integration test covering both the non-block ([value, cas] return) and block-yielding forms with p_token/l_token set, asserting the return shape is unchanged.
| ROUTING_OPTS = { p_token: P_TOKEN, l_token: L_TOKEN }.freeze | ||
|
|
||
| describe 'routing tokens (p_token / l_token) passthrough' do | ||
| describe 'set / add / replace / set_cas' do |
There was a problem hiding this comment.
For the set test, could you please cover two edge cases:
- Injection via
\r\nin token values. Exampledc.set('safe_key', 'val', nil, p_token: "route=us\r\nflush_all\r\n"), which will result on the wire like:
Result on the wire:
ms safe_key 3 c F0 MS Proute=us
flush_all
\r\n
Whatever sits between dalli and memcached (proxy, LB, server itself) parses \r\n as a command boundary. If the p_token/l_token values ever come from anything user-touchable, this is straightforward command injection. The previous meta_flags API had the same hole, so this isn't a regression, but the PR is the right moment to close it. A two-line guard in RequestFormatter.routing_tokens:
raise ArgumentError, 'p_token must not contain CRLF' if p_token&.match?(/[\r\n]/)
raise ArgumentError, 'l_token must not contain CRLF' if l_token&.match?(/[\r\n]/)
- Empty-string token produces broken wire
Example:
dc.set('key', 'val', nil, p_token: '')
# → wire: "ms key 3 c F0 MS P\r\n" (orphan "P" with no value)
Memcached/proxy parsers will either error or silently treat P as a flag with empty value. Either way, the user gets surprising behavior from what looks like a no-op argument. Reject with '' treated the same as nil, make something like:
def routing_tokens?(opts)
return false unless opts.is_a?(Hash)
!opts[:p_token].to_s.empty? || !opts[:l_token].to_s.empty?
end
(This also drops the redundant ? true : false.)
There was a problem hiding this comment.
I have a hunch that doing a regex operation in a performance-sensitive caching client will be a bit dicey
There was a problem hiding this comment.
Both addressed in 27cab72.
1. CRLF / null-byte injection guard: Added validate_routing_token! in RequestFormatter, raising ArgumentError if either token is non-String, contains \r, \n, or \0. Validation runs at the wire-format layer (single source of truth — applies whether the token reaches the formatter via the public client API, the inline ms/mg builders in the multi paths, or any future call site).
2. Empty-string tokens normalized to nil: routing_token_kwargs and the formatter both now treat "" identically to nil (no-op). The wire is byte-identical to a request with no routing tokens — no orphan P / L. Also cleaned up routing_tokens? to drop the ? true : false redundancy as you suggested.
On the regex perf concern (3192115891): I had the same hunch initially and switched to a chained include?. Then I actually benchmarked it (s = "route=us-east-1-shardpool-12", 5M iterations):
regex match? (literal) 0.294s
include? chain 0.687s
The Regexp engine fuses short character classes into a single C-level scan; the include? chain walks the string up to three times. Reverted to the regex with a private_constant ROUTING_TOKEN_FORBIDDEN = /[\r\n\0]/.freeze and a comment explaining why (commit on its way after this).
Tests added:
- Unit: CR / LF / CR+LF / null / non-String inputs all raise
ArgumentError(onrouting_tokensdirectly and transitively throughmeta_set/meta_delete/meta_arithmetic/meta_get); empty strings produce no wire output. - Integration: bad inputs raise
ArgumentErroronset/get/delete; rejection happens client-side before any wire write (verified by checking subsequent ops on the same connection still see the original value); empty-string tokens are no-ops on every command.
There was a problem hiding this comment.
Good instinct, and I had the same hunch — first cut switched to s.include?("\r") || s.include?("\n") || s.include?("\0") to avoid the regex.
Then I actually benchmarked it:
s = "route=us-east-1-shardpool-12" # realistic clean token
n = 5_000_000
Benchmark.bm(20) do |b|
b.report("regex match? lit") { n.times { s.match?(/[\r\n\0]/) } }
b.report("include? chain") { n.times { s.include?("\r") || s.include?("\n") || s.include?("\0") } }
end user system total real
regex match? lit 0.293644 0.000297 0.293941 ( 0.293947)
include? chain 0.682818 0.003723 0.686541 ( 0.686546)
Regex is ~2.3x faster for the common (clean) case. Two reasons:
- Ruby caches compiled literal regexes at parse time, so
match?(/[\r\n\0]/)has no per-call compile cost. - The Regexp engine fuses a short character class into a single C-level scan over the string. The
include?chain has to walk the string up to three times before declaring "not found."
So the regex is actually the right call here. Reverted to it in 3b597bc with a private_constant ROUTING_TOKEN_FORBIDDEN = /[\r\n\0]/.freeze (so the constant is unambiguously cached) and a comment explaining why I picked it.
|
|
||
| @middlewares_stack.retrieve_req('memcached.cas', { 'keys' => key }) do | ||
| req = RequestFormatter.meta_get(key: encoded_key, value: true, return_cas: true, base64: base64) | ||
| req = RequestFormatter.meta_get(key: encoded_key, value: true, return_cas: true, base64: base64, |
There was a problem hiding this comment.
In production, could any client pass any meta_options = meta_flag_options(options) to the cas call? I found we have meta_options in the get, so wondering if we should do something similar in cas? Maybe this is by design.
There was a problem hiding this comment.
Good question. Intentionally not exposing meta_flags on cas — the design is that meta_flags lives on get/gat only, and cas keeps its existing [value, cas] tuple shape.
Reasoning:
-
casalready has a tuple return.meta_get_with_value_and_casreturns[value, cas]. Addingmeta_flagswould require a third tuple slot ([value, cas, meta_flags]) and a newResponseProcessormethod (meta_get_with_value_and_cas_and_meta_flags). That cascading shape-change is exactly the trap the rest of this PR was rewritingmeta_flagsout of. -
cas(the public optimistic-locking helper) issues two underlying commands (a meta-get for the read, then a meta-set for the write).meta_flagsis a get-side concept that switches the response shape of the read. There is no symmetric concept on the write half (msdoesn’t emit response flags), so the semantics get fuzzy — what would the user even do with the flags? They’ve already declared their intent (read, mutate, write). -
req_optionsis still threaded through. If someone passesmeta_flags:tocas, the key is silently ignored at the meta layer, just like any other unknown option. No crash, no wrong behavior — just a no-op. We could add a deprecation warning if that quietness becomes a footgun in practice. -
Routing tokens (
p_token/l_token) do flow through both halves ofcas— that’s the side-band-metadata case the rest of this PR was designed for, and it’s tested in the newcas (optimistic locking)describe block.
If a use case for meta_flags on cas shows up later, it’s a separable feature: add meta_get_with_value_and_cas_and_meta_flags, plumb it through Protocol::Meta#cas, document the new return shape. I’d rather not bake it in speculatively here.
So, the bug was introduced in the last PR that added |
|
Yeah there's never been protection against bogus flags for |
…, empty-string normalization - get_cas now accepts req_options for symmetry with get_multi_cas; threads through to perform(:cas, key, req_options). - Reject CRLF and null bytes in p_token / l_token with ArgumentError, raised at the wire-format layer before any data reaches the socket. - Normalize empty-string tokens to nil (no orphan 'P'/'L' tokens on the wire, which proxies/LBs may parse inconsistently). - Tests for: get_cas + routing tokens, CRLF/LF/CR/null injection rejection on set/get/delete, non-String token rejection, connection-state preservation after a rejected request, and empty-string no-op behavior on every command.
The first attempt at this PR added support for
meta_flagsto all memcached command methods (and multi versions). There were a few issues with this approach which became clear:<val>to[<val>, <response flags>]; methods likefetchare impacted by this, andfetchno longer properly handled cache misses in theget, since it didn't know how to differentiate the tuple from a cache miss. We could add an explicit check for this case, but this seems brittle.fetchwhere each flag must be supported by both thegetandsetcommands or else one or the other will error.To get around this, passing arbitrary meta flags is reverted here for all commands (excluding the gets where it was previously supported). Instead, add new kwargs
p_tokenandl_tokenwhich are, as outlined by the memcached spec, accepted for all commands. Now, these kwargs can be safely plumbed through the codebase as a separate set of kwargs which operate independently ofmeta_flags. This is safer and more well defined.A followup PR will also add explicit support for the
xandiflags for deletes, to implement tombstone pattern support. That PR will follow a similar pattern to here.I think there's probably an argument to be made around introducing a new top-level API to clean up some of the more confusing method signatures, as we start adding more support to the API. But for now these are small targeted changes.