feat(zhipu): add per-model concurrency limit, default 3 parallel requests with FIFO queuing #248

Merged
ThreeFish-AI merged 2 commits into feature/1.x.x from ThreeFish-AI/zhipu-model-concurrency-limit
May 25, 2026
Conversation

@ThreeFish-AI (Owner) commented May 25, 2026

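Background

Without client-side concurrency control, multiple Claude Code instances issuing requests at the same time can momentarily overwhelm the Zhipu API, frequently triggering 429 Rate Limit responses and amplifying failover overhead. This PR introduces a concurrency cap for zhipu, keyed by the mapped model, to bring upstream pressure down to a controllable level.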

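Main changes

  • Configuration model: config/vendors.py adds ZhipuConcurrencyConfig (two levels: default + models). ZhipuConfig and VendorConfig gain a concurrency field, and the factory forwards it as needed in case "zhipu". config.default.yaml ships with default: 3 enabled.
  • Concurrency controller: new vendors/concurrency.py. ModelConcurrencyLimiter uses asyncio.Semaphore to lazily create an independent semaphore per mapped model name (glm-5v-turbo / glm-5.1 / glm-4.5-air, etc.), which gives fair FIFO queuing naturally. It also provides get_diagnostics(), which reports limit/in_use/available for each model.
  • ZhipuVendor integration: send_message and send_message_stream resolve the target model via self.map_model() at entry and acquire a slot; try/finally guarantees the slot is released on exception paths as well. Streaming and non-streaming requests share the same semaphore. 429 retries run while the slot is held (a retry is treated as a continuation of the same request, so the slot is not released).
  • Observability and documentation: docs/arch/config-reference.md adds §5.5 with the ZhipuConcurrencyConfig parameter reference and a YAML example; CHANGELOG.md records the change under Unreleased.
  • Tests: new tests/test_zhipu_concurrency.py with 18 cases covering config-layer validation, limiter unit tests, non-streaming/streaming concurrency caps, cross-model independence, release on exception, 429 retry compatibility, and full bypass when concurrency=None.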
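
As a rough illustration of the limiter described above, here is a minimal sketch of a per-model semaphore registry. It is not the code in vendors/concurrency.py: the constructor signature, `limit_for`, the `slot` context manager, and the `peak_concurrency` helper are assumptions made for this example; only the class name, `get_diagnostics()`, and the limit/in_use/available fields come from the PR description.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager


class ModelConcurrencyLimiter:
    """Hypothetical sketch: one lazily created semaphore per mapped model name."""

    def __init__(self, default: int = 3, models: dict[str, int] | None = None):
        self._default = default
        self._models = dict(models or {})
        self._semaphores: dict[str, asyncio.Semaphore] = {}
        self._in_use: dict[str, int] = {}

    def limit_for(self, model: str) -> int:
        # A per-model override wins; otherwise fall back to the default.
        return self._models.get(model, self._default)

    def _semaphore(self, model: str) -> asyncio.Semaphore:
        # Lazy creation: a semaphore exists only for models that were requested.
        sem = self._semaphores.get(model)
        if sem is None:
            sem = asyncio.Semaphore(self.limit_for(model))
            self._semaphores[model] = sem
            self._in_use[model] = 0
        return sem

    async def acquire(self, model: str) -> None:
        await self._semaphore(model).acquire()
        self._in_use[model] += 1

    def release(self, model: str) -> None:
        self._in_use[model] -= 1
        self._semaphores[model].release()

    @asynccontextmanager
    async def slot(self, model: str):
        # Convenience wrapper for non-streaming callers (assumed helper).
        await self.acquire(model)
        try:
            yield
        finally:
            self.release(model)

    def get_diagnostics(self) -> dict[str, dict[str, int]]:
        return {
            model: {
                "limit": self.limit_for(model),
                "in_use": used,
                "available": self.limit_for(model) - used,
            }
            for model, used in self._in_use.items()
        }


async def peak_concurrency(limit: int, requests: int) -> int:
    """Run fake requests against one model and report the observed peak."""
    limiter = ModelConcurrencyLimiter(default=limit)
    running = peak = 0

    async def fake_request() -> None:
        nonlocal running, peak
        async with limiter.slot("glm-5.1"):
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for the upstream call
            running -= 1

    await asyncio.gather(*(fake_request() for _ in range(requests)))
    return peak
```

Calling `asyncio.run(peak_concurrency(2, 6))` starts six fake requests against a limit of two; the remaining four wait on the semaphore until a slot is released.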

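Implementation notes

  • Semaphores are keyed by the mapped model name, so different Claude inputs routed to the same GLM model are not split across separate slot pools, and the limit stays aligned with the model that actually carries the load upstream.
  • map_model() is a purely synchronous dictionary lookup, so calling it before waiting on the Semaphore is safe and keeps the queue key stable.
  • The streaming path cannot use @asynccontextmanager, so it uses manual acquire/release with try/finally instead. A 429 is raised at the status-code check (before any chunk has been emitted), so retries do not pollute the downstream byte stream.
  • With concurrency=None, _concurrency_limiter is None and behavior is identical to the previous version, so the change is backward compatible.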
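
The streaming note above can be sketched as an async generator that holds one slot across both the 429 retries and the chunk iteration. This is an illustration under assumptions, not the ZhipuVendor code: `RateLimited`, `open_stream`, `max_retries`, and `stream_with_slot` are hypothetical names, and the real retry policy (backoff, limits) is not described in this PR.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


class RateLimited(Exception):
    """Hypothetical stand-in for an upstream 429 response."""


async def stream_with_slot(
    sem: asyncio.Semaphore,
    open_stream: Callable[[], Awaitable[AsyncIterator[bytes]]],
    max_retries: int = 2,
) -> AsyncIterator[bytes]:
    # Manual acquire: the slot must stay held while chunks are yielded.
    await sem.acquire()
    try:
        for attempt in range(max_retries + 1):
            try:
                # Assumed to raise RateLimited at the status-code check,
                # that is, before any chunk has been produced.
                stream = await open_stream()
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                # The retry is part of the same request: the slot is kept.
                await asyncio.sleep(0)
        async for chunk in stream:
            yield chunk
    finally:
        # Runs on normal exhaustion, on errors, and when the generator is closed.
        sem.release()
```

One caveat with this shape: if a consumer abandons the generator without exhausting or closing it, the `finally` block runs only when the generator is finalized, so callers may want to close the stream explicitly.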

Verification

  • uv run pytest tests/test_zhipu_concurrency.py -v → 18 passed
  • uv run pytest tests/ → 1516 passed, 17 deselected (including the existing zhipu / native_vendors / config_loader regression suites, all green)
  • uv run ruff check + ruff format pass

Configuration example

- vendor: zhipu
  concurrency:
    default: 3
    models:
      glm-5v-turbo: 5
      glm-5.1: 2
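
To make the default + models lookup concrete, the sketch below resolves limits for the configuration above. It uses a plain dataclass named `ConcurrencySettings`, which is a hypothetical stand-in: the actual ZhipuConcurrencyConfig may be defined and validated differently, and the positive-integer check is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConcurrencySettings:
    """Hypothetical stand-in for the default + models configuration."""

    default: int = 3
    models: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed validation: every limit must be a positive integer.
        for name, limit in [("default", self.default), *self.models.items()]:
            if not isinstance(limit, int) or limit < 1:
                raise ValueError(f"concurrency limit for {name!r} must be >= 1")

    def limit_for(self, mapped_model: str) -> int:
        # Per-model override first, then the shared default.
        return self.models.get(mapped_model, self.default)


# Mirrors the YAML example: two overrides plus the default of 3.
settings = ConcurrencySettings(default=3, models={"glm-5v-turbo": 5, "glm-5.1": 2})
```

With these settings, glm-5v-turbo resolves to 5 slots, glm-5.1 to 2, and any other mapped model (for example glm-4.5-air) to the default of 3.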

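- Add ZhipuConcurrencyConfig and ModelConcurrencyLimiter, which maintain an independent asyncio.Semaphore per mapped model name (e.g. glm-5v-turbo / glm-5.1 / glm-4.5-air); when all slots are taken, new requests wait in a FIFO queue;
- The ZhipuVendor streaming and non-streaming entry points share the same semaphore and remain compatible with the existing 429 retry mechanism (the slot stays occupied during retries);
- VendorConfig gains a concurrency field that the factory forwards to ZhipuConfig; when unconfigured it falls back to default=3, and concurrency=None disables limiting entirely;
- Update docs/arch/config-reference.md and CHANGELOG.md accordingly, and add 18 dedicated tests (covering the config layer, limiter unit tests, streaming/non-streaming integration, and release on exception).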

🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>
@ThreeFish-AI ThreeFish-AI merged commit f46c21a into feature/1.x.x May 25, 2026
6 checks passed
@ThreeFish-AI ThreeFish-AI deleted the ThreeFish-AI/zhipu-model-concurrency-limit branch May 26, 2026 02:08
@ThreeFish-AI ThreeFish-AI mentioned this pull request May 27, 2026