feat(zhipu): add per-model concurrency limit, default 3 parallel requests with FIFO queuing #248
Merged
ThreeFish-AI merged 2 commits on May 25, 2026
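- Add ZhipuConcurrencyConfig and ModelConcurrencyLimiter, which keep an independent asyncio.Semaphore per mapped model name (e.g. glm-5v-turbo / glm-5.1 / glm-4.5-air); when all slots are taken, new requests wait in a FIFO queue;
- ZhipuVendor's streaming and non-streaming entry points share the same semaphore and remain compatible with the existing 429 retry mechanism (the slot stays held throughout retries);
- VendorConfig gains a concurrency field, which the factory forwards to ZhipuConfig; when unset it falls back to default=3, and concurrency=None disables limiting entirely;
- Update docs/arch/config-reference.md and CHANGELOG.md accordingly, and add 18 dedicated tests (covering the config layer, limiter units, streaming/non-streaming integration, and release on exceptions).

🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>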
… warning; 🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist) Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>
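Background
Without client-side concurrency control, concurrent requests from multiple Claude Code instances can instantly overwhelm the Zhipu API, frequently triggering 429 Rate Limit responses and amplifying failover overhead. This PR introduces a concurrency cap for zhipu, keyed by mapped model, to bring upstream pressure down to a controllable level.
Main changes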
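- config/vendors.py adds ZhipuConcurrencyConfig (two levels: default + models); ZhipuConfig and VendorConfig gain a concurrency field, which the factory forwards as needed in case "zhipu"; config.default.yaml enables default: 3 out of the box.
- vendors/concurrency.py: ModelConcurrencyLimiter is built on asyncio.Semaphore and lazily creates an independent semaphore per mapped model name (glm-5v-turbo / glm-5.1 / glm-4.5-air, etc.), which gives FIFO fair queuing by construction; it also provides get_diagnostics(), which reports limit / in_use / available for each model.
- send_message and send_message_stream resolve the target model via self.map_model() at the entry point and acquire a slot; try/finally guarantees the slot is released on exception paths as well; streaming and non-streaming share the same semaphore; 429 retries run while the slot is held (a retry is treated as a continuation of the same request, so the slot is not released).
- docs/arch/config-reference.md adds §5.5 with the ZhipuConcurrencyConfig parameter reference and a YAML example; CHANGELOG.md records the change under Unreleased.
- tests/test_zhipu_concurrency.py contains 18 cases covering config-layer validation, limiter units, non-streaming/streaming concurrency caps, cross-model independence, release on exceptions, 429 retry compatibility, and the full bypass when concurrency=None.
Implementation notes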
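To make the limiter described under the main changes more concrete, here is a minimal sketch of a per-model limiter with lazily created semaphores and a streaming path that holds its slot via manual acquire/release. It is an illustration only, not the code in vendors/concurrency.py: the constructor arguments, the acquire/release method names, limit_for, and stream_with_slot are all assumptions made for this example; only the class name, get_diagnostics(), and the limit / in_use / available keys come from the description.

```python
import asyncio
from typing import Optional


class ModelConcurrencyLimiter:
    """Hypothetical sketch: one asyncio.Semaphore per mapped model name."""

    def __init__(self, default: int = 3, models: Optional[dict] = None) -> None:
        self._default = default              # fallback limit for unlisted models
        self._models = dict(models or {})    # per-model overrides (assumed shape)
        self._semaphores = {}                # created lazily, keyed by model name
        self._in_use = {}                    # slots currently held, per model

    def limit_for(self, model: str) -> int:
        return self._models.get(model, self._default)

    def _semaphore(self, model: str) -> asyncio.Semaphore:
        # Lazy creation: a model gets its semaphore on first use only.
        if model not in self._semaphores:
            self._semaphores[model] = asyncio.Semaphore(self.limit_for(model))
            self._in_use[model] = 0
        return self._semaphores[model]

    async def acquire(self, model: str) -> None:
        await self._semaphore(model).acquire()   # waits while all slots are taken
        self._in_use[model] += 1

    def release(self, model: str) -> None:
        self._in_use[model] -= 1
        self._semaphores[model].release()

    def get_diagnostics(self) -> dict:
        return {
            model: {
                "limit": self.limit_for(model),
                "in_use": used,
                "available": self.limit_for(model) - used,
            }
            for model, used in self._in_use.items()
        }


async def stream_with_slot(limiter, model, chunks):
    """Async generator that holds one slot for the lifetime of the stream."""
    await limiter.acquire(model)
    try:
        for chunk in chunks:
            await asyncio.sleep(0)   # stand-in for waiting on upstream bytes
            yield chunk
    finally:
        limiter.release(model)       # runs on normal exit, error, or aclose()


async def demo() -> dict:
    limiter = ModelConcurrencyLimiter(default=3, models={"glm-5.1": 1})
    active = 0
    peak = 0

    async def call(model: str) -> None:
        nonlocal active, peak
        await limiter.acquire(model)
        try:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)   # simulated upstream request
        finally:
            active -= 1
            limiter.release(model)

    await asyncio.gather(*(call("glm-4.5-air") for _ in range(8)))
    return {"peak": peak, "diagnostics": limiter.get_diagnostics()}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, eight simultaneous calls against one model never exceed three in flight, and a model that is never requested (glm-5.1 here) never gets a semaphore allocated.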
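- map_model() is a pure synchronous dictionary lookup, so it is safe to call before waiting on the Semaphore and keeps the queuing key stable.
- Instead of @asynccontextmanager, slots are managed with manual acquire/release plus try/finally; a 429 is raised during the status-code check (before any chunk has been emitted), so retries do not pollute the downstream byte stream.
- When concurrency=None, _concurrency_limiter is None and behavior is identical to the previous version, so the change is backward compatible.
Verification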
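One behavior the tests are described as covering, 429 retries that keep their slot, can be checked along the following lines. This is a hypothetical sketch and is not taken from tests/test_zhipu_concurrency.py: RateLimited, send_with_slot, and the retry parameters are names invented here, and a bare asyncio.Semaphore stands in for the real limiter.

```python
import asyncio


class RateLimited(Exception):
    """Hypothetical stand-in for an upstream HTTP 429 response."""


async def send_with_slot(sem, attempt_fn, max_retries=3, backoff=0.0):
    """Acquire one slot, then retry on 429 without releasing it in between."""
    await sem.acquire()
    try:
        for attempt in range(max_retries + 1):
            try:
                return await attempt_fn(attempt)
            except RateLimited:
                if attempt == max_retries:
                    raise                      # retries exhausted: propagate
                await asyncio.sleep(backoff)   # slot stays held while backing off
    finally:
        sem.release()                          # released exactly once per request


async def demo():
    sem = asyncio.Semaphore(1)
    held_during_attempts = []

    async def flaky(attempt):
        # Record whether the single slot is taken at the time of each attempt.
        held_during_attempts.append(sem.locked())
        if attempt < 2:
            raise RateLimited()
        return "ok"

    result = await send_with_slot(sem, flaky)
    return result, held_during_attempts, sem.locked()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Treating the retry as a continuation of the same request means a rate-limited call cannot be overtaken by queued requests that would hit the same upstream limit.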
- uv run pytest tests/test_zhipu_concurrency.py -v → 18 passed
- uv run pytest tests/ → 1516 passed, 17 deselected (existing zhipu / native_vendors / config_loader regression suites all green)
- uv run ruff check + ruff format pass
Configuration example