perf: bound token-validation cost on adversarial input #121
Open
rmyndharis wants to merge 1 commit into
Three small guards so a hostile DESIGN.md cannot pin CPU or exhaust the call stack. All inputs are at the documented untrusted boundary (arbitrary file/stdin), and none of the changes alter results for legitimate input.

- parseDimensionParts (and token-like-ignored's CSS_DIMENSION_RE) backtrack quadratically on long all-digit strings. Cap value length to 64 chars before matching; real CSS dimensions are far shorter.
- unknown-key runs an O(n*m) Levenshtein DP against every schema key for each unknown key. Skip a schema key whose length differs by more than the typo threshold: edit distance is at least the length difference, so the set of suggestions is unchanged.
- parseCssColor recurses for nested color-mix() with no depth bound. Thread a depth counter and stop at 32, so an over-deep value resolves to an invalid color (a precise error finding) instead of a RangeError that collapses the whole model build.
Summary
The CLI parses arbitrary user-supplied markdown/YAML (file or stdin). Three validation paths can be driven into super-linear CPU or unbounded recursion by a crafted DESIGN.md. Each fix is a small guard that leaves results for legitimate input unchanged.
1. Quadratic backtracking in dimension validation
`parseDimensionParts` (`model/spec.ts`) uses `/^(-?\d*\.?\d+)([a-zA-Z%]+)$/`; the same shape is in `token-like-ignored`'s `CSS_DIMENSION_RE`. On a long all-digit string the engine backtracks every split point (clean O(n²), measurable seconds at ~80k chars). Real CSS dimensions are a handful of characters, so a leaf value over 64 chars is rejected before the regex runs.
2. Unbounded Levenshtein cost for unknown keys
`unknown-key` builds a full `(m+1)×(n+1)` DP matrix against every schema key for each unknown top-level key, with no length guard. Since edit distance is always ≥ the length difference, a key whose length differs from a schema key by more than `MAX_TYPO_DISTANCE` can never be a typo, so those comparisons are now skipped. Suggestions are provably unchanged.
3. Unbounded `color-mix()` recursion
`parseCssColor` recurses into each inner color of a `color-mix()` with no depth bound; a deeply nested value throws `RangeError: Maximum call stack size exceeded`, which the model's catch-all turns into a single generic error that discards every other finding for the file. A depth counter (cap 32) makes an over-deep value resolve to a normal "invalid color" finding instead, leaving the rest of the model intact.
Testing
`bun test`: 285 pass, 1 skip, 0 fail (added 3 tests). The suite completes in ~0.3s, including a 100k-char dimension, a 50k-char unknown key, and a 50-deep `color-mix`, all previously slow/throwing, now instant/graceful.
`bun run lint` (`tsc --noEmit`): clean.
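For reviewers who want to see the shape of the guard in section 1, here is a minimal sketch of a length cap applied before the dimension regex. It is not the code from this PR: `parseDimension`, `DimensionParts`, and `MAX_DIMENSION_LENGTH` are hypothetical names. Only the regex and the 64-char limit come from the description above.

```typescript
// Hypothetical names; only the regex and the 64-char cap are taken from the PR text.
const MAX_DIMENSION_LENGTH = 64;
const DIMENSION_RE = /^(-?\d*\.?\d+)([a-zA-Z%]+)$/;

interface DimensionParts {
  value: number;
  unit: string;
}

function parseDimension(raw: string): DimensionParts | null {
  // Reject over-long input before the regex runs, so the quadratic
  // backtracking on long digit runs is never reached.
  if (raw.length > MAX_DIMENSION_LENGTH) return null;

  const match = DIMENSION_RE.exec(raw);
  if (match === null) return null;

  const [, num = "", unit = ""] = match;
  return { value: Number(num), unit };
}

console.log(parseDimension("16px"));
console.log(parseDimension("-.5rem"));
// A 100k-digit string is rejected by the length check alone.
console.log(parseDimension("9".repeat(100_000)));
```

The cap sits in front of the regex rather than rewriting the pattern, which keeps matching behaviour for short inputs exactly as it was.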
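The length-difference argument in section 2 can be illustrated with a small sketch. This is an assumption-laden example, not the PR's implementation: `levenshtein` and `suggestKeys` are hypothetical names, the value `2` for `MAX_TYPO_DISTANCE` is assumed (the description names the constant but not its value), and the distance uses a two-row DP instead of the full matrix mentioned above.

```typescript
// Assumed value; the PR names MAX_TYPO_DISTANCE but does not state it.
const MAX_TYPO_DISTANCE = 2;

// Classic Levenshtein distance, keeping only two DP rows at a time.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: schema keys within the typo threshold of `unknown`.
function suggestKeys(unknown: string, schemaKeys: readonly string[]): string[] {
  const suggestions: string[] = [];
  for (const key of schemaKeys) {
    // Edit distance >= |length difference|, so this key cannot qualify;
    // skipping it avoids the DP without changing the result.
    if (Math.abs(key.length - unknown.length) > MAX_TYPO_DISTANCE) continue;
    if (levenshtein(unknown, key) <= MAX_TYPO_DISTANCE) suggestions.push(key);
  }
  return suggestions;
}

const schemaKeys = ["colors", "typography", "spacing"];
console.log(suggestKeys("colrs", schemaKeys));
// A 50k-char key differs in length from every schema key by far more than
// the threshold, so no DP is run at all.
console.log(suggestKeys("x".repeat(50_000), schemaKeys));
```

Because the skipped keys could never have met the threshold, the suggestion list is the same with or without the guard; only the work done changes.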
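The depth counter from section 3 might look roughly like the sketch below. It is a deliberately simplified validator, not `parseCssColor`: it accepts only six-digit hex colors and `color-mix(in <space>, A, B)` without percentages, and `isValidColor`, `splitTopLevel`, `nest`, and `MAX_COLOR_MIX_DEPTH` are hypothetical names. The cap of 32 comes from the description; whether depth 32 itself is accepted is an assumption here.

```typescript
// Cap taken from the PR text; the constant name is hypothetical.
const MAX_COLOR_MIX_DEPTH = 32;

// Split on commas that are not enclosed in parentheses.
function splitTopLevel(input: string): string[] {
  const parts: string[] = [];
  let parens = 0;
  let start = 0;
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "(") parens++;
    else if (ch === ")") parens--;
    else if (ch === "," && parens === 0) {
      parts.push(input.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(input.slice(start).trim());
  return parts;
}

function isValidColor(value: string, depth = 0): boolean {
  // Over-deep nesting becomes an ordinary "invalid" result instead of
  // recursing until the call stack overflows.
  if (depth > MAX_COLOR_MIX_DEPTH) return false;

  const v = value.trim();
  if (/^#[0-9a-fA-F]{6}$/.test(v)) return true;

  const prefix = "color-mix(";
  if (!v.startsWith(prefix) || !v.endsWith(")")) return false;

  const args = splitTopLevel(v.slice(prefix.length, -1));
  if (args.length !== 3) return false;
  const [space = "", first = "", second = ""] = args;
  if (!space.startsWith("in ")) return false;

  // Thread the depth counter through each inner color.
  return isValidColor(first, depth + 1) && isValidColor(second, depth + 1);
}

// Build a color-mix() nested `levels` deep around a hex color.
function nest(levels: number): string {
  let value = "#ff0000";
  for (let i = 0; i < levels; i++) {
    value = `color-mix(in srgb, ${value}, #0000ff)`;
  }
  return value;
}

console.log(isValidColor(nest(3)));
console.log(isValidColor(nest(50)));
```

Returning a plain `false` keeps the failure local to the one value, which matches the goal stated above of leaving the rest of the model's findings intact.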