fix(zhipu): switch the 429/529 fallback backoff jitter from Full Jitter to Equal Jitter #263
Merged
ThreeFish-AI merged 1 commit on Jun 30, 2026
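Fix the non-monotonic retry delays on 529 (overloaded) responses (measured 418.8 → 1857.7 → 961.6 → 3769.7 ms, not increasing). The root cause is that calculate_delay uses Full Jitter (random.uniform(0, ceiling)), which is inherently non-monotonic, and 529 responses usually carry no Retry-After header, so they fall into this fallback branch. With Equal Jitter (temp/2 + random(0, temp/2)) the delay bands become [500,1000] → [1000,2000] → [2000,4000] → [4000,8000] ms, which is monotonically non-decreasing. 429 and 529 share the backoff path, so both benefit; the retry-after priority is unchanged.
 - Add tests/test_retry.py with standalone unit tests (calculate_delay previously had zero coverage)
 - Add test_zhipu.py::test_529_equal_jitter_delay_in_expected_band
 - Update the retry.py / zhipu.py docstrings, CHANGELOG, and issue.md
🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang <threefish.ai@gmail.com>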
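Background
When cc calls zhipu and a 529 (overloaded) response triggers retries, the delay sequence is non-monotonic (measured 418.8 ms → 1857.7 ms → 961.6 ms → 3769.7 ms; the third delay is shorter than the second), unlike the clean exponential backoff seen for 429. The expectation is that 529 follows the same exponential backoff rule as 429.
Root cause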
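On inspection, 429 and 529 already share a single backoff path in code, ZhipuVendor._compute_retry_delay_from_headers; there are not "two sets of rules". The perceived difference comes from two factors combined:
- 429 responses typically carry a Retry-After header → they take the deterministic server-guided path (retry_after * 1.1), which looks "cleanly increasing". 529 (overloaded) responses usually do not carry that header → they fall into the jittered fallback branch, calculate_delay.
- calculate_delay uses Full Jitter (random.uniform(0, ceiling)), which is inherently non-monotonic: each delay is drawn uniformly from [0, ceiling]. The measured values match, one for one, random(0,1000) / (0,2000) / (0,4000) / (0,8000) for attempt 0/1/2/3.
Fix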
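Switch the jitter in calculate_delay from Full Jitter to Equal Jitter (AWS, M. Brooker, "Exponential Backoff And Jitter," 2015), that is, temp/2 + random(0, temp/2).
Under the Zhipu configuration, the per-attempt delay bands narrow from [0,1000] / [0,2000] / [0,4000] / [0,8000] ms to [500,1000] / [1000,2000] / [2000,4000] / [4000,8000] ms. Adjacent bands touch only at their boundaries, so the delays are almost surely monotonically non-decreasing. 429 and 529 share the path, so both benefit; jitter is retained to prevent a thundering herd; the retry-after priority chain is untouched.
Changes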
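- src/coding/proxy/routing/retry.py: core change to calculate_delay (1 line) plus module/function docstrings (including a statement of the limits of the monotonicity contract)
- src/coding/proxy/vendors/zhipu.py: 2 docstrings updated (Full Jitter → Equal Jitter)
- tests/test_retry.py: new file with 6 standalone unit tests, filling the previous zero-coverage gap for calculate_delay (exact exponential without jitter, Equal Jitter band boundaries, max cap, monotonic non-decreasing, very small initial, reproducibility)
- tests/test_zhipu.py: adds test_529_equal_jitter_delay_in_expected_band (for a 529 without retry-after, the first delay falls in [0.5, 1.0] s)
- CHANGELOG.md / docs/.agents/issue.md: change record and root-cause write-up
Verification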
- uv run pytest: 1608 passed, zero regressions
- uv run ruff check / format + pre-commit hooks: all pass
Design trade-offs
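- Modify calculate_delay directly (Option A) rather than add a configurable jitter field (Option B): a repository-wide grep confirms that calculate_delay is called only by Zhipu. Option B is both YAGNI and would worsen the repository's existing "duplicate RetryConfig dead code" debt (routing/retry.py is live, config/resiliency.py is dead code), which is recorded in issue.md for later cleanup.
- Limits of the monotonicity contract: it holds only when backoff_multiplier ≥ 2.0 and the max_delay_ms cap is not reached (unreachable with the current max_retries=4). This is stated in the docstring.
🤖 Generated with Claude Code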
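For readers who want to see the difference between the two strategies concretely, here is a minimal sketch contrasting Full Jitter with Equal Jitter. It is not the repository's calculate_delay: the RetryConfig shape, the field name initial_delay_ms, the 1000 ms initial delay (inferred from the bands quoted above), the 60 s cap, and the function names full_jitter_delay and equal_jitter_delay are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    # Hypothetical config; values chosen to reproduce the bands quoted in this PR.
    initial_delay_ms: float = 1000.0   # assumed name and value
    backoff_multiplier: float = 2.0
    max_delay_ms: float = 60_000.0     # assumed cap, not reached for attempts 0..3


def _ceiling_ms(attempt: int, cfg: RetryConfig) -> float:
    """Exponential ceiling for a zero-based attempt index, capped at max_delay_ms."""
    return min(cfg.initial_delay_ms * cfg.backoff_multiplier ** attempt,
               cfg.max_delay_ms)


def full_jitter_delay(attempt: int, cfg: RetryConfig, rng: random.Random) -> float:
    """Full Jitter: uniform over [0, ceiling]; successive delays may shrink."""
    return rng.uniform(0.0, _ceiling_ms(attempt, cfg))


def equal_jitter_delay(attempt: int, cfg: RetryConfig, rng: random.Random) -> float:
    """Equal Jitter: half the ceiling is fixed, the other half is random."""
    half = _ceiling_ms(attempt, cfg) / 2.0
    return half + rng.uniform(0.0, half)


if __name__ == "__main__":
    cfg = RetryConfig()
    rng = random.Random()
    for attempt in range(4):
        print(attempt,
              round(full_jitter_delay(attempt, cfg, rng), 1),
              round(equal_jitter_delay(attempt, cfg, rng), 1))
```

With a multiplier of 2.0 and no cap in play, the lower bound of each Equal Jitter band equals the upper bound of the previous one, which is why the sequence cannot decrease. With a multiplier below 2.0, or once the cap is hit, adjacent bands overlap and that guarantee no longer holds.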