feat(dashboard): extend Model Calling real-time monitoring to all vendors and fix the invisible queue length #253
Merged
ThreeFish-AI merged 4 commits on May 27, 2026
Refactor ModelConcurrencyLimiter into ModelConcurrencyController, adding a monitor mode (config=None: count only, no throttling) alongside the existing limited mode (for rate-limited vendors such as Zhipu).
- _ConcurrencySlot gains a limit=None branch whose fast path simply increments in_use;
- add a track() async context manager and a mode property so the executor layer can wrap calls in async with;
- add a _peak_samples sliding window; get_diagnostics reports the peak pending count over the last 10s;
- BaseVendor injects a default monitor controller and gains default track_in_flight() / update_concurrency() implementations; get_diagnostics automatically merges in a concurrency field;
- ZhipuVendor drops its internal _maybe_acquire_concurrency_slot and delegates to BaseVendor plus the executor-layer wrapper, so a slot is naturally held across 429 retries;
- keep ModelConcurrencyLimiter as an alias for backward compatibility.
🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>
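The dual-mode slot described in this commit could look roughly like the following. This is a minimal sketch, not the project's code: the class name `ConcurrencySlotSketch` and its internals are assumptions, and it uses `asyncio.Condition` with `notify_all`, which does not try to reproduce any FIFO ordering guarantee of the real limiter.

```python
import asyncio
from contextlib import asynccontextmanager


class ConcurrencySlotSketch:
    """Hypothetical per-model slot; limit=None selects monitor mode."""

    def __init__(self, limit=None):
        self.limit = limit
        self.in_use = 0
        self.pending = 0
        self._cond = asyncio.Condition()

    @property
    def mode(self):
        return "monitor" if self.limit is None else "limited"

    async def acquire(self):
        if self.limit is None:
            # Monitor fast path: count only, never wait.
            self.in_use += 1
            return
        async with self._cond:
            if self.in_use >= self.limit:
                self.pending += 1
                try:
                    await self._cond.wait_for(lambda: self.in_use < self.limit)
                finally:
                    self.pending -= 1
            self.in_use += 1

    async def release(self):
        if self.limit is None:
            self.in_use -= 1
            return
        async with self._cond:
            self.in_use -= 1
            # Wake all waiters; each re-checks the predicate under the lock.
            self._cond.notify_all()

    @asynccontextmanager
    async def track(self):
        # Hold the slot for the whole body, including on exceptions.
        await self.acquire()
        try:
            yield self
        finally:
            await self.release()
```

Because `track()` releases in a `finally` block, a caller that wraps an entire request (retries included) in `async with slot.track():` keeps exactly one slot for the duration of that request.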
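Extend the Model Calling card from monitoring only Zhipu to every model of every vendor (CC scenarios only).
- executor: in execute_stream / execute_message, wrap the vendor.send_message[_stream] call in async with vendor.track_in_flight(mapped_model), covering all vendors (including Copilot / Antigravity, which fully override the send methods);
- dashboard: updateModelCalling renders according to the mode field:
  - limited mode (Zhipu) keeps the progress bar, the editable limit, and the ⏳ pending badge;
  - monitor mode (other vendors) shows only an in_use count badge, with no editable controls;
  - add an mc-badge-peak "🕘 曾排队 N" ("previously queued N") gray afterglow badge for the case where pending=0 but peak_pending_recent>0;
  - shorten the polling interval from 5000ms to 1500ms to make transient queuing more observable;
- routes: PUT /api/concurrency returns 422 for monitor-only vendors (distinct from the 400 returned for invalid parameters), with an error message stating explicitly that adjustment is not supported.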
🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>
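The executor-level wrapping can be illustrated with a small stand-in. Everything below is a hypothetical sketch: `InFlightCounter`, `SketchVendor`, and the free function `execute_message` are invented for illustration, and only the `track_in_flight(mapped_model)` call shape and the no-op behavior for an empty model name come from the commit messages.

```python
import asyncio
from contextlib import asynccontextmanager


class InFlightCounter:
    """Hypothetical monitor-only controller: counts in-flight calls per model."""

    def __init__(self):
        self.in_use = {}

    @asynccontextmanager
    async def track(self, model):
        self.in_use[model] = self.in_use.get(model, 0) + 1
        try:
            yield
        finally:
            self.in_use[model] -= 1


class SketchVendor:
    """Hypothetical vendor exposing the default track_in_flight behavior."""

    def __init__(self):
        self.controller = InFlightCounter()
        self.seen = []  # in_use values observed while a call is running

    @asynccontextmanager
    async def track_in_flight(self, model):
        if not model:
            # An empty model name is a no-op.
            yield
            return
        async with self.controller.track(model):
            yield

    async def send_message(self, model, payload):
        self.seen.append(self.controller.in_use.get(model, 0))
        return f"{model}:{payload}"


async def execute_message(vendor, mapped_model, payload):
    # The wrapper lives in the executor, so a vendor that overrides
    # send_message entirely is still counted.
    async with vendor.track_in_flight(mapped_model):
        return await vendor.send_message(mapped_model, payload)
```

Placing the `async with` in the executor rather than inside each vendor is what lets one code path cover every vendor, regardless of how it implements its send methods.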
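- test_zhipu_concurrency: adapt to the Controller refactor (import alias plus updated diagnostics field assertions) and add test_peak_pending_recent_tracking to verify afterglow recording; the ZhipuVendor tests use the _send_with_tracking / _stream_with_tracking helpers to simulate the executor-layer track_in_flight wrapper, with equivalent behavior;
- test_concurrency_monitor (new): cover monitor-mode scenarios (acquire never blocks, pending is always 0, set_limit raises ValueError, the peak is correct under 100 concurrent calls) and the default BaseVendor.track_in_flight behavior (no-op for an empty model name, delegation to the controller otherwise);
- test_executor_in_flight_tracking (new): use a probe-style fake controller to verify that the executor enters the track context before calling send_message[_stream] and exits it correctly once the call finishes (including on exceptions);
- test_router_executor: _mock_vendor gains a track_in_flight field (nullcontext) to match the executor's calling convention.
🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>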
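A probe-style check of the enter/call/exit ordering might be written along these lines. This is only a sketch of the idea: `ProbeVendor` and the local `execute_message` stand-in are hypothetical and are not taken from the repository's test files.

```python
import asyncio
from contextlib import asynccontextmanager


class ProbeVendor:
    """Hypothetical probe: records the order of track and send events."""

    def __init__(self, fail=False):
        self.events = []
        self.fail = fail

    @asynccontextmanager
    async def track_in_flight(self, model):
        self.events.append("enter")
        try:
            yield
        finally:
            self.events.append("exit")

    async def send_message(self, model):
        self.events.append("send")
        if self.fail:
            raise RuntimeError("simulated upstream error")
        return "ok"


async def execute_message(vendor, model):
    # Stand-in for the executor under test (assumed shape).
    async with vendor.track_in_flight(model):
        return await vendor.send_message(model)


def test_track_wraps_successful_call():
    vendor = ProbeVendor()
    assert asyncio.run(execute_message(vendor, "m")) == "ok"
    assert vendor.events == ["enter", "send", "exit"]


def test_track_exits_on_exception():
    vendor = ProbeVendor(fail=True)
    try:
        asyncio.run(execute_message(vendor, "m"))
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")
    # The context must still be exited when the call raises.
    assert vendor.events == ["enter", "send", "exit"]
```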
- Monitoring extension: every vendor shows a live in_use count in CC scenarios (monitor mode);
- Throttling scope: Zhipu keeps its limit and FIFO queue (limited mode); other vendors are observed only;
- Queue visibility: add peak_pending_recent afterglow tracking and shorten the polling interval from 5s to 1.5s;
- Unified abstraction: ModelConcurrencyLimiter → ModelConcurrencyController (alias kept).
🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>
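The afterglow idea (report the recent peak even when pending has already dropped back to 0) can be sketched with a time-stamped deque. The class `PeakWindow`, its method names, and the injectable clock are assumptions made for illustration; only the 10s window comes from the commits.

```python
import time
from collections import deque


class PeakWindow:
    """Hypothetical sliding window over (timestamp, pending) samples."""

    def __init__(self, window_s=10.0, clock=time.monotonic):
        self.window_s = window_s
        self._clock = clock
        self._samples = deque()

    def record(self, pending):
        # Call whenever the pending count changes.
        now = self._clock()
        self._samples.append((now, pending))
        self._evict(now)

    def peak(self):
        # Highest pending value seen within the window, 0 if none.
        self._evict(self._clock())
        return max((p for _, p in self._samples), default=0)

    def _evict(self, now):
        cutoff = now - self.window_s
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()
```

With a window like this, a poll that arrives shortly after a burst still sees a non-zero peak, which is the situation the gray badge is meant to surface.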
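Background
The "📡 Model Calling 实时状态" (Model Calling real-time status) card on Dashboard / Overview currently shows only Zhipu; other vendors have no visualization of parallelism at all in CC (Claude Code) scenarios. In addition, the queue length cannot be observed while Zhipu is throttling and queuing: the 5-second polling interval is too coarse, so transient queuing is missed entirely.
Changes
Architecture refactor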
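- ModelConcurrencyLimiter → ModelConcurrencyController: unify monitor and limited into one dual-mode abstraction, keeping the alias for backward compatibility;
- _ConcurrencySlot gains a limit=None branch (monitor mode: the fast path only counts, with no queuing);
- BaseVendor injects a default monitor controller, a track_in_flight() async context manager, and a default update_concurrency() implementation; get_diagnostics() automatically merges in the concurrency field;
- ZhipuVendor drops its internal _maybe_acquire_concurrency_slot; concurrency control is delegated to BaseVendor plus the executor-layer wrapper, so a slot is naturally held across 429 retries (async with wraps the entire send_message[_stream] call chain);
- _RouteExecutor wraps the vendor call in execute_stream / execute_message with async with tier.vendor.track_in_flight(mapped_model), covering all vendors (including Copilot / Antigravity, which fully override the send methods).
Dashboard / API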
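- updateModelCalling renders according to the mode field: monitor mode shows only an in_use count badge (no limit, progress bar, or edit controls), while limited mode keeps the existing presentation;
- peak_pending_recent 10s sliding-window tracking plus the "🕘 曾排队 N" ("previously queued N") gray afterglow badge, covering transient queuing where pending=0 but the peak is greater than 0;
- polling interval: 5000ms → 1500ms;
- PUT /api/concurrency returns 422 for monitor-only vendors (distinct from the 400 returned for invalid parameters).
Tests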
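- tests/test_concurrency_monitor.py: covers monitor mode (acquire never blocks, pending is always 0, set_limit raises ValueError, the peak is correct under 100 concurrent calls, and so on);
- tests/test_executor_in_flight_tracking.py: a probe-style fake controller verifies that the executor enters track before the call and exits it correctly once the call finishes (including on exceptions);
- tests/test_zhipu_concurrency.py is adapted to the refactor (the _send_with_tracking helper simulates the executor-layer wrapper, with equivalent behavior) and adds test_peak_pending_recent_tracking;
- _mock_vendor gains a track_in_flight = nullcontext() field to match the executor's calling convention.
Design decisions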
- async with track_in_flight
- limit=None means monitor
- async with wraps the entire call chain
Verification
- uv run pytest: 1539 passed, 17 deselected in 15.44s;
- system-instruction-you-are-working-imperative-taco.md.
Test plan
- uv run pytest passes in full
- uv run ruff check src/ tests/ is clean
- uv run ruff format --check src/ tests/ is clean
- 3/3 + ⏳ 2; the afterglow badge appears after release
- PUT /api/concurrency returns 422 for non-zhipu vendors
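The 422-versus-400 distinction checked in the last item can be expressed as a framework-agnostic decision function. This is a hedged sketch only: `handle_put_concurrency`, its parameters, the response bodies, the validation rule for `raw_limit`, and the order of the two checks are all assumptions, not the project's route code.

```python
def handle_put_concurrency(mode, raw_limit):
    """Hypothetical handler core: returns (status_code, body)."""
    if mode == "monitor":
        # Well-formed request, but this vendor has no adjustable limit.
        return 422, {"error": "concurrency limit is not adjustable for this vendor"}
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw_limit, bool) or not isinstance(raw_limit, int) or raw_limit < 1:
        return 400, {"error": "limit must be a positive integer"}
    return 200, {"limit": raw_limit}
```

Keeping the two failure codes separate lets a client tell "this request was malformed" apart from "this vendor does not support the operation".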