perf: bound token-validation cost on adversarial input by rmyndharis · Pull Request #121 · google-labs-code/design.md

rmyndharis · 2026-06-25T15:12:28Z

Summary

The CLI parses arbitrary user-supplied markdown/YAML (file or stdin). Three validation paths can be driven into super-linear CPU or unbounded recursion by a crafted DESIGN.md. Each fix is a small guard that leaves results for legitimate input unchanged.

1. Quadratic backtracking in dimension validation

parseDimensionParts (model/spec.ts) uses /^(-?\d*\.?\d+)([a-zA-Z%]+)$/; the same shape is in token-like-ignored's CSS_DIMENSION_RE. On a long all-digit string with no unit, \d* and \d+ compete for the same run of digits. When the unit group fails to match, the engine retries every way of splitting the digits between them, which is clean O(n²) and measurable in seconds at ~80k chars. Real CSS dimensions are a handful of characters, so a leaf value longer than 64 chars is now rejected before the regex runs.
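A minimal TypeScript sketch of the guard follows. It assumes a parseDimensionParts that returns null for anything that is not a dimension. The return shape, the MAX_DIMENSION_LENGTH name, and the DimensionParts type are hypothetical; only the regex and the 64-character cap come from the PR. What matters is the ordering: the cheap length check runs first, so the backtracking regex never sees adversarially long input.

```typescript
// Hypothetical constant name; the 64-character cap is the value stated in the PR.
const MAX_DIMENSION_LENGTH = 64;

// Same pattern as quoted in the PR for parseDimensionParts.
const DIMENSION_RE = /^(-?\d*\.?\d+)([a-zA-Z%]+)$/;

// Hypothetical result shape for illustration.
interface DimensionParts {
  value: number;
  unit: string;
}

// Hypothetical signature: returns null for anything that is not a valid dimension.
function parseDimensionParts(raw: string): DimensionParts | null {
  // Reject oversized input before the regex runs. No real CSS dimension is
  // this long, so legitimate values are unaffected, and the O(n^2)
  // backtracking path is never reached.
  if (raw.length > MAX_DIMENSION_LENGTH) return null;
  const m = DIMENSION_RE.exec(raw);
  if (!m) return null;
  return { value: Number(m[1]), unit: m[2] };
}
```

Because the cap sits far above any real dimension (16px, -0.5rem, 50%), legitimate values still reach the regex and parse exactly as before.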

2. Unbounded Levenshtein cost for unknown keys

unknown-key builds a full (m+1)×(n+1) DP matrix against every schema key for each unknown top-level key, with no length guard. Since edit distance is always ≥ the length difference, a key whose length differs from a schema key by more than MAX_TYPO_DISTANCE can never be a typo — those comparisons are now skipped. Suggestions are provably unchanged.
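Here is a sketch of the pruned comparison, assuming the rule has a list of schema keys and a hypothetical suggestKeys helper. The MAX_TYPO_DISTANCE value of 2 and the example keys are assumptions, since the PR does not state them. The DP below keeps two rows instead of the full (m+1)×(n+1) matrix, which is a separate memory choice. The length check before the DP is the part that mirrors the PR.

```typescript
// Hypothetical threshold value; the PR names MAX_TYPO_DISTANCE but not its value.
const MAX_TYPO_DISTANCE = 2;

// Standard Levenshtein DP keeping only two rows: O(m*n) time, O(n) memory.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: schema keys within MAX_TYPO_DISTANCE of `key`.
function suggestKeys(key: string, schemaKeys: readonly string[]): string[] {
  const out: string[] = [];
  for (const candidate of schemaKeys) {
    // Edit distance >= |len(key) - len(candidate)|, so this skip never drops
    // a real suggestion; it only avoids the O(m*n) DP for hopeless pairs.
    if (Math.abs(key.length - candidate.length) > MAX_TYPO_DISTANCE) continue;
    if (levenshtein(key, candidate) <= MAX_TYPO_DISTANCE) out.push(candidate);
  }
  return out;
}
```

The skip is safe because turning a string of length m into one of length n needs at least |m - n| insertions or deletions. Any candidate it drops would have failed the distance check anyway, and a 50k-character key now costs one length comparison per schema key.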

3. Unbounded `color-mix()` recursion

parseCssColor recurses into each inner color of a color-mix() with no depth bound; a deeply nested value throws RangeError: Maximum call stack size exceeded, which the model's catch-all turns into a single generic error that discards every other finding for the file. A depth counter (cap 32) makes an over-deep value resolve to a normal "invalid color" finding instead, leaving the rest of the model intact.
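The following is a simplified sketch of threading a depth counter through a recursive color parser. It accepts only #rrggbb and an even 50/50 color-mix(in srgb, A, B), which is far narrower than the real CSS grammar. The names parseColorSketch and splitTopLevel and the Rgb type are hypothetical; only the cap of 32 comes from the PR. Past the cap the function returns null (an invalid color) instead of recursing further, so a hostile value fails locally rather than throwing RangeError.

```typescript
// Cap value from the PR; the constant name is hypothetical.
const MAX_COLOR_MIX_DEPTH = 32;

interface Rgb { r: number; g: number; b: number; }

// Split on commas that are not nested inside parentheses.
function splitTopLevel(s: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else if (ch === "," && depth === 0) {
      parts.push(s.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(s.slice(start).trim());
  return parts;
}

// Hypothetical, simplified parser. `depth` is passed to every recursive call;
// past the cap the value is treated as invalid instead of growing the stack.
function parseColorSketch(value: string, depth = 0): Rgb | null {
  if (depth > MAX_COLOR_MIX_DEPTH) return null;
  const v = value.trim();
  const hex = /^#([0-9a-f]{6})$/i.exec(v);
  if (hex) {
    const n = parseInt(hex[1], 16);
    return { r: (n >> 16) & 0xff, g: (n >> 8) & 0xff, b: n & 0xff };
  }
  const prefix = "color-mix(";
  if (v.startsWith(prefix) && v.endsWith(")")) {
    const args = splitTopLevel(v.slice(prefix.length, -1));
    if (args.length !== 3 || args[0] !== "in srgb") return null;
    const left = parseColorSketch(args[1], depth + 1);
    const right = parseColorSketch(args[2], depth + 1);
    if (!left || !right) return null; // an invalid inner color invalidates the mix
    return {
      r: Math.round((left.r + right.r) / 2),
      g: Math.round((left.g + right.g) / 2),
      b: Math.round((left.b + right.b) / 2),
    };
  }
  return null;
}
```

Returning null rather than throwing is what keeps the failure scoped to one token: a caller can record an invalid-color finding for that value and keep resolving its siblings.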

Testing

bun test: 285 pass, 1 skip, 0 fail (added 3 tests). The suite completes in ~0.3s, including a 100k-char dimension, a 50k-char unknown key, and a 50-deep color-mix; all three were previously slow or threw, and now complete instantly or fail gracefully.
bun run lint (tsc --noEmit): clean.
The new color-mix test asserts the over-deep value is rejected per-token and that a sibling valid color still resolves (i.e. no model-wide collapse).

Three small guards so a hostile DESIGN.md cannot pin CPU or exhaust the call stack. All inputs are at the documented untrusted boundary (arbitrary file/stdin), and none of the changes alter results for legitimate input.

- parseDimensionParts (and token-like-ignored's CSS_DIMENSION_RE) backtrack quadratically on long all-digit strings. Cap the value length at 64 chars before matching; real CSS dimensions are far shorter.
- unknown-key runs an O(n*m) Levenshtein DP against every schema key for each unknown key. Skip a schema key whose length differs by more than the typo threshold: edit distance is at least the length difference, so the set of suggestions is unchanged.
- parseCssColor recurses for nested color-mix() with no depth bound. Thread a depth counter and stop at 32, so an over-deep value resolves to an invalid color (a precise error finding) instead of a RangeError that collapses the whole model build.

rmyndharis wants to merge 1 commit into google-labs-code:main from rmyndharis:fix/harden-token-parsing-cost
