fix(router-core): preserve percent-encoded URL-unsafe chars in decodeSegment#7695
fix(router-core): preserve percent-encoded URL-unsafe chars in decodeSegment#7695CDillinger wants to merge 1 commit into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
View your CI Pipeline Execution ↗ for commit dd05e5d
☁️ Nx Cloud last updated this comment at |
Merging this PR will improve performance by 11.41%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Memory | mem streaming-peak chunked (vue) |
11.9 MB | 12.3 MB | -3.2% |
| ⚡ | Memory | mem aborted-requests (solid) |
2.4 MB | 1.7 MB | +35.53% |
| ⚡ | Memory | mem aborted-requests (vue) |
1,021 KB | 900.1 KB | +13.43% |
| ⚡ | Memory | mem peak-large-page (solid) |
3.9 MB | 3.7 MB | +3.54% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing CDillinger:fix/decode-path-preserve-unsafe-chars (dd05e5d) with main (ba52d2b)1
Footnotes
782242e to
7123395
Compare
|
There are two issues with the current approach. First, a single Unicode character can span multiple percent-encoded bytes. For example: 'ш' // %D1%88
'🚀' // %F0%9F%9A%80Decoding each This also needs to work when safe and unsafe characters are adjacent. For example: '🚀@' // %F0%9F%9A%80%40Here,
Based on the current test suite, I compiled the following exclusion set: const PATH_KEEP_ENCODED = /^[\x00-\x1F\x7F\x20"#$%&+,/:;<=>?@`^\\{}]$/This retains control characters, spaces, router-reserved characters, and other path-sensitive values in their encoded form. It intentionally does not exclude the full component percent-encode set, as doing so would also preserve characters such as The remaining test differences are expected:
I have an update to the PR ready to handle this. Since this runs on a hot path, it would be useful for @Sheraff to review the implementation and suggest any performance refinements. |
4fca828 to
00296a3
Compare
|
This is the fastest I got (so far) that passes the new tests and the olds ones: function decodeSegment(segment: string): string {
if (segment.indexOf('%') !== -1) {
try {
return decodeURI(segment)
} catch {}
}
return segment
}
// ...
// Match percent-encoded bytes that `decodeURI` would expose but that must
// stay encoded in paths: percent signs, backslashes, controls, and the
// WHATWG path percent-encode set.
const re = /%(?:[01][\dA-F]|2[025]|3[CE]|5C|60|7[BDF])/gi
let cursor = 0
let result = ''
let match
while (null !== (match = re.exec(path))) {
result += decodeSegment(path.slice(cursor, match.index)) + match[0]
cursor = re.lastIndex
}
if (cursor) {
result += decodeSegment(path.slice(cursor))
// eslint-disable-next-line no-control-regex
if (/[\x00-\x1f\x7f]/.test(path)) {
result = sanitizePathSegment(result)
}
} else {
result = sanitizePathSegment(decodeSegment(path))
}But I think correctness matters more here, so @nlynzaad and @CDillinger you should make sure the tests cover what we need and to have a working version, and we can merge as soon as that is ready. I can work on perf afterwards. BTW: it is expected for the Bundle Size, and the Labeler workflows to break on forks, but PR/Test should pass |
…Segment Replace sanitizePathSegment (which stripped control characters) with a re-encode step that keeps WHATWG path percent-encode set characters and control characters in their encoded form after decodeURI. This preserves the existing decodeURI-based approach which correctly handles multi-byte UTF-8 sequences, while fixing the mismatch between the original request URL and the router's internal representation that caused infinite 307 redirect loops on paths containing these characters. Fixes TanStack#7587. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
00296a3 to
dd05e5d
Compare
There was a problem hiding this comment.
Nx Cloud is proposing a fix for your failed CI:
We updated the open-redirect e2e test in react-router/basic-file-based to align with the new decodeSegment behavior introduced by this PR. The stale assertion (expect(url.pathname).toMatch(/^\/test-path\/?$/)) assumed the old "strip CR then collapse //" approach, but %0d is now kept encoded so the path resolves to /%0D/test-path rather than /test-path. This mirrors the identical fix already applied to the equivalent react-start/basic test in the PR.
Tip
✅ We verified this fix by re-running tanstack-router-e2e-react-basic-file-based:test:e2e, tanstack-react-start-e2e-basic:test:e2e--rsbuild-prerender.
diff --git a/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts b/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
index 3ad83fb4..2f0fe256 100644
--- a/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
+++ b/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
@@ -69,10 +69,7 @@ test.describe('Open redirect prevention', () => {
page,
baseURL,
}) => {
- // When control characters are stripped from paths like /%0d/evil.com/
- // the result could be //evil.com/ which is a protocol-relative URL
- // Our fix collapses these to /evil.com/ to prevent external redirects
- // This is already tested above, but we verify the collapsed path works
+ // %0d is kept encoded, so /%0d/test-path/ stays as-is and won't become //test-path/
await page.goto('/%0d/test-path/')
await page.waitForLoadState('networkidle')
@@ -80,8 +77,6 @@ test.describe('Open redirect prevention', () => {
expect(page.url().startsWith(baseURL!)).toBe(true)
const url = new URL(page.url())
expect(url.origin).toBe(new URL(baseURL!).origin)
- // Path should be collapsed to /test-path (not //test-path/)
- expect(url.pathname).toMatch(/^\/test-path\/?$/)
})
})
Because this branch comes from a fork, it is not possible for us to apply fixes directly, but you can apply the changes locally using the available options below.
Apply changes locally with:
npx nx-cloud apply-locally N4iC-eU6c
Apply fix locally with your editor ↗ View interactive diff ↗
🎓 Learn more about Self-Healing CI on nx.dev
What
decodeSegment(called bydecodePath) previously useddecodeURI()which decoded all percent-encoded characters — including those that are unsafe in URL paths per the WHATWG spec. This caused the router's internal path representation to differ from the raw request URL, which the SSR redirect comparator interpreted as a URL change, triggering infinite 307 redirect loops.This PR replaces the
decodeURI()-based approach with per-character decoding that preserves:",<,>,`,{,}Reproduction
Any TanStack Start app with a path param route will infinite-loop on URLs containing encoded curly braces, angle brackets, etc:
Why this approach
The previous implementation decoded everything in
decodeSegmentand then tried to fix problems after the fact (sanitizePathSegmentstripped control chars,encodePathLikeUrlwas supposed to re-encode). This "decode then patch" approach is fragile — any character missed by the downstream fixups creates a mismatch.The cleaner fix is to not decode these characters in the first place. The router still decodes all "safe" characters (unicode, regular ASCII letters/symbols) so route matching and param extraction work as expected.
sanitizePathSegmentis no longer needed since control characters are never decoded. The protocol-relative URL defense (//collapsing) is kept as defense-in-depth.Fixes #7587.