feat(dashboard): extend Model Calling real-time monitoring to all vendors and fix the invisible queue length by ThreeFish-AI · Pull Request #253 · ThreeFish-AI/coding-proxy

ThreeFish-AI · 2026-05-27T05:59:59Z

Background

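The "📡 Model Calling real-time status" card on Dashboard / Overview currently shows Zhipu only. Other vendors have no concurrency visualization at all in CC (Claude Code) scenarios. In addition, the queue length cannot be observed while Zhipu is throttling and queuing requests: 5-second polling is too coarse and misses transient queueing entirely.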

Changes

Architecture refactor

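ModelConcurrencyLimiter → ModelConcurrencyController: a single abstraction covering both monitor and limited modes, with an alias kept for backward compatibility;
_ConcurrencySlot gains a limit=None branch (monitor mode: the fast path only counts and never queues);
BaseVendor injects a default monitor controller, a track_in_flight() async context manager, and a default update_concurrency() implementation; get_diagnostics() automatically merges in a concurrency field;
ZhipuVendor drops its internal _maybe_acquire_concurrency_slot. Concurrency control is delegated to BaseVendor plus the executor-layer wrapping, so the slot is naturally held across 429 retries (async with wraps the entire send_message[_stream] call chain);
_RouteExecutor wraps the vendor call in execute_stream / execute_message with async with tier.vendor.track_in_flight(mapped_model), covering every vendor (including Copilot / Antigravity, which fully override the send methods).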
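
The monitor / limited split can be pictured with a small asyncio sketch. This is not the project's code: `ConcurrencySlot`, `ConcurrencyController`, the `ConcurrencyLimiter` alias, and the single per-controller `limit` argument are hypothetical stand-ins for `_ConcurrencySlot`, `ModelConcurrencyController`, and its alias. With `limit=None` the slot takes a counting fast path that never blocks. With an integer limit, callers queue on an `asyncio.Condition` and are exposed as `pending`.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict, Optional


class ConcurrencySlot:
    """Hypothetical per-model slot; limit=None means monitor mode (count only)."""

    def __init__(self, limit: Optional[int] = None) -> None:
        self.limit = limit
        self.in_use = 0
        self.pending = 0
        self._cond = asyncio.Condition()

    async def acquire(self) -> None:
        if self.limit is None:
            # Monitor fast path: just count, never block or queue.
            self.in_use += 1
            return
        async with self._cond:
            self.pending += 1
            try:
                # Queue until a slot frees up under the current limit.
                await self._cond.wait_for(lambda: self.in_use < self.limit)
            finally:
                self.pending -= 1
            self.in_use += 1

    async def release(self) -> None:
        if self.limit is None:
            self.in_use -= 1
            return
        async with self._cond:
            self.in_use -= 1
            # Wake all waiters; each re-checks the predicate under the lock.
            self._cond.notify_all()


class ConcurrencyController:
    """Hypothetical controller: limit=None -> "monitor", int -> "limited"."""

    def __init__(self, limit: Optional[int] = None) -> None:
        self._limit = limit
        self._slots: Dict[str, ConcurrencySlot] = {}

    @property
    def mode(self) -> str:
        return "monitor" if self._limit is None else "limited"

    def _slot(self, model: str) -> ConcurrencySlot:
        # Created lazily, i.e. inside the running event loop.
        if model not in self._slots:
            self._slots[model] = ConcurrencySlot(self._limit)
        return self._slots[model]

    def set_limit(self, model: str, limit: int) -> None:
        if self.mode == "monitor":
            raise ValueError("monitor-only controller: limits cannot be set")
        # Simplification: queued waiters see the new limit on the next release.
        self._slot(model).limit = limit

    @asynccontextmanager
    async def track(self, model: str) -> AsyncIterator[None]:
        slot = self._slot(model)
        await slot.acquire()
        try:
            yield
        finally:
            await slot.release()

    def get_diagnostics(self) -> Dict[str, Any]:
        return {
            "mode": self.mode,
            "models": {
                name: {"in_use": s.in_use, "pending": s.pending, "limit": s.limit}
                for name, s in self._slots.items()
            },
        }


# Backward-compatible alias, mirroring the idea of keeping the old class name.
ConcurrencyLimiter = ConcurrencyController
```

With `limit=2` and five concurrent callers, two run while three wait as `pending`. With `limit=None`, all five are counted as `in_use` immediately and `pending` stays 0.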

Dashboard / API

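updateModelCalling renders according to the mode field: monitor mode shows only an in_use count badge (no limit, progress bar, or edit controls), and limited mode keeps the existing behavior;
Adds peak_pending_recent, a 10 s sliding-window tracker, plus a gray "🕘 曾排队 N" ("previously queued N") afterglow badge that covers transient queueing where pending=0 but the peak is > 0;
Model Calling polling interval reduced from 5000 ms to 1500 ms;
/api/concurrency PUT returns 422 for monitor-only vendors (distinct from the 400 returned for invalid parameters).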
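
The afterglow badge exists because a queue that forms and drains between two polls is never seen by the dashboard, so the server keeps track of the recent peak. The following is a minimal sketch that assumes a sample is recorded on every change of `pending`, including the drop back to 0. The class `PeakPendingWindow`, its injectable `clock`, and the helper `queue_badge` are hypothetical, and the badge text is simplified.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class PeakPendingWindow:
    """Hypothetical tracker: max `pending` seen within the last `window` seconds."""

    def __init__(self, window: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self._clock = clock
        self._samples: Deque[Tuple[float, int]] = deque()

    def record(self, pending: int) -> None:
        # Call on every change of pending, including the drop back to 0.
        now = self._clock()
        self._samples.append((now, pending))
        self._evict(now)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window
        # Keep the newest sample at/before the cutoff: pending is a step
        # function, so that sample is the value in force when the window starts.
        while len(self._samples) > 1 and self._samples[1][0] <= cutoff:
            self._samples.popleft()

    def peak(self) -> int:
        self._evict(self._clock())
        return max((p for _, p in self._samples), default=0)


def queue_badge(pending: int, peak_recent: int) -> str:
    """Render rule: live queue first, else the gray afterglow, else nothing."""
    if pending > 0:
        return f"⏳ {pending}"
    if peak_recent > 0:
        return f"🕘 {peak_recent}"
    return ""
```

Because pending is treated as a step function, a queue that lasted longer than the window still registers as a peak at the moment it drains.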

Tests

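Added tests/test_concurrency_monitor.py: covers monitor-mode acquire never blocking, pending always 0, set_limit raising ValueError, a correct peak under 100 concurrent requests, and more;
Added tests/test_executor_in_flight_tracking.py: a probe-style fake controller verifies that the executor enters tracking before the call and exits correctly once the call ends (including on exceptions);
tests/test_zhipu_concurrency.py adapted to the refactor (the _send_with_tracking helper simulates the executor-layer wrapping, with equivalent behavior), plus a new test_peak_pending_recent_tracking;
_mock_vendor gains a track_in_flight = nullcontext() field to match the executor calling convention;
All 1539 test cases pass; lint and format checks are clean.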
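
The executor-tracking test idea can be sketched with a probe vendor that logs events. `ProbeVendor` and `execute_message_sketch` are hypothetical and model only the message path of `_RouteExecutor`. The point is that `async with` enters tracking before the vendor call, and the exit in its `finally` still runs when the call raises.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class ProbeVendor:
    """Hypothetical fake vendor whose track_in_flight logs enter/exit events."""

    def __init__(self) -> None:
        self.events: List[str] = []

    @asynccontextmanager
    async def track_in_flight(self, model: str) -> AsyncIterator[None]:
        self.events.append(f"enter:{model}")
        try:
            yield
        finally:
            # Runs on normal return and when the wrapped call raises.
            self.events.append(f"exit:{model}")

    async def send_message(self, model: str, fail: bool = False) -> str:
        self.events.append("send")
        if fail:
            raise RuntimeError("upstream failure")
        return "ok"


async def execute_message_sketch(vendor: ProbeVendor, mapped_model: str,
                                 fail: bool = False) -> str:
    # Executor-layer wrapping: tracking is entered before the vendor call.
    async with vendor.track_in_flight(mapped_model):
        return await vendor.send_message(mapped_model, fail=fail)
```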

Design decisions

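| Decision point | Choice | Rationale |
| --- | --- | --- |
| Monitoring integration layer | Executor-layer `async with track_in_flight` | Copilot / Antigravity fully override the send methods without calling super(), so a BaseVendor wrapper would miss them; the executor is the single convergence point on the CC path |
| Concurrency abstraction | A single Controller, with `limit=None` meaning monitor | One data channel and one diagnostics payload; the frontend renders according to mode |
| Zhipu holding the slot across 429s | Executor `async with` wraps the whole call chain | Behavior is equivalent to the former internal acquire, and a redundant abstraction is removed |
| Queue visibility | Shorter polling plus a server-side peak sliding window | Two safeguards: near-real-time updates in the steady state, plus an afterglow for transient queueing |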
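
The first row of the table depends on ordinary method overriding. In this toy sketch, where all names are hypothetical, a counter placed inside the base class's `send_message` never runs for a subclass that overrides the method without calling `super()`. A counter at the call site, which is where the executor sits, sees every vendor.

```python
import asyncio
from typing import Dict


class ToyBaseVendor:
    def __init__(self) -> None:
        self.base_tracked = 0  # counted by a wrapper inside the base class

    async def _send_impl(self) -> str:
        return "base"

    async def send_message(self) -> str:
        # Base-class tracking only runs if subclasses go through this method.
        self.base_tracked += 1
        return await self._send_impl()


class ToyOverridingVendor(ToyBaseVendor):
    async def send_message(self) -> str:
        # Full override without super(): the base-class counter never runs.
        return "override"


async def execute_sketch(vendor: ToyBaseVendor, counter: Dict[str, int]) -> str:
    # Call-site (executor-level) tracking sees every vendor, overridden or not.
    counter["calls"] += 1
    return await vendor.send_message()
```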

Verification

Full test suite: uv run pytest reports 1539 passed, 17 deselected in 15.44s;
End-to-end: see the plan file system-instruction-you-are-working-imperative-taco.md.

Test plan

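uv run pytest passes in full
uv run ruff check src/ tests/ is clean
uv run ruff format --check src/ tests/ is clean
Manual check: start the proxy and fire concurrent Zhipu requests; the Dashboard shows 3/3 + ⏳ 2, and the afterglow badge appears after the requests are released
Manual check: switch to kimi/copilot or another vendor and fire concurrent requests; the Dashboard shows a plain count badge (no limit denominator)
Manual check: /api/concurrency PUT returns 422 for non-zhipu vendors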
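
The 400 vs 422 split from the last manual check can be written as a small status-selection function. The function is framework-agnostic and hypothetical. The `limit` body field, the `"monitor"` mode string, and validating the payload before checking the vendor mode are assumptions, not the route's actual code.

```python
from typing import Any, Dict, Tuple

OK = 200
BAD_REQUEST = 400
UNPROCESSABLE_ENTITY = 422


def check_concurrency_put(vendor_mode: str, body: Dict[str, Any]) -> Tuple[int, str]:
    """Hypothetical status selection for PUT /api/concurrency."""
    limit = body.get("limit")
    # Malformed input -> 400 (bool is excluded because it subclasses int).
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        return BAD_REQUEST, "limit must be a positive integer"
    # Well-formed request, but the vendor only supports monitoring -> 422.
    if vendor_mode == "monitor":
        return UNPROCESSABLE_ENTITY, "vendor is monitor-only; limit cannot be adjusted"
    return OK, "limit updated"
```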

Refactor ModelConcurrencyLimiter into ModelConcurrencyController, adding a monitor mode (config=None: count only, no throttling) alongside the existing limited mode (throttling scenarios such as Zhipu).
- _ConcurrencySlot gains a limit=None branch whose fast path simply increments in_use;
- Add a track() async context manager and a mode property so the executor layer can wrap calls with async with;
- Add a _peak_samples sliding window; get_diagnostics reports the peak pending count over the last 10 s;
- BaseVendor injects a default monitor controller and adds default track_in_flight() / update_concurrency() implementations; get_diagnostics automatically merges in the concurrency field;
- ZhipuVendor drops its internal _maybe_acquire_concurrency_slot and delegates to BaseVendor plus the executor-layer wrapping, so the slot is naturally held across 429 retries;
- Keep a ModelConcurrencyLimiter alias for backward compatibility.

🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang <threefish.ai@gmail.com>
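
Holding the slot across 429 retries follows from where `async with` sits. If a single acquisition wraps the whole retry loop, every attempt runs inside the same slot. The following is a minimal sketch in which `CountingSlot`, `RateLimited`, and `send_with_retries` are hypothetical names and `asyncio.sleep(0)` stands in for real backoff.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List, Tuple


class CountingSlot:
    """Hypothetical slot that counts occupancy and acquisitions."""

    def __init__(self) -> None:
        self.in_use = 0
        self.acquisitions = 0

    @asynccontextmanager
    async def track(self) -> AsyncIterator[None]:
        self.in_use += 1
        self.acquisitions += 1
        try:
            yield
        finally:
            self.in_use -= 1


class RateLimited(Exception):
    """Stand-in for an upstream HTTP 429."""


async def send_with_retries(slot: CountingSlot, fail_times: int,
                            max_retries: int = 3) -> Tuple[str, List[int]]:
    attempts = 0
    observed: List[int] = []

    async def upstream() -> str:
        nonlocal attempts
        attempts += 1
        observed.append(slot.in_use)  # slot occupancy seen by each attempt
        if attempts <= fail_times:
            raise RateLimited("429 Too Many Requests")
        return "ok"

    # One acquisition wraps the whole call chain, retries included.
    async with slot.track():
        for attempt in range(max_retries + 1):
            try:
                return await upstream(), observed
            except RateLimited:
                if attempt == max_retries:
                    raise
                await asyncio.sleep(0)  # placeholder for real backoff
    raise AssertionError("unreachable")
```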

Extend the Model Calling card from Zhipu-only monitoring to every model of every vendor (CC scenarios only).
- executor: in execute_stream / execute_message, wrap the call to vendor.send_message[_stream] with async with vendor.track_in_flight(mapped_model), covering every vendor (including Copilot / Antigravity, which fully override the send methods);
- dashboard: updateModelCalling renders according to the mode field:
  - limited mode (Zhipu) keeps the progress bar, the editable limit, and the ⏳ pending badge;
  - monitor mode (other vendors) shows only the in_use count badge, with no edit controls;
  - new mc-badge-peak: a gray "🕘 曾排队 N" ("previously queued N") afterglow badge covering the case where pending=0 but peak_pending_recent>0;
  - polling interval shortened from 5000 ms to 1500 ms to make transient queueing observable;
- routes: /api/concurrency PUT returns 422 on monitor-only vendors (distinct from the 400 for invalid parameters), with an error message stating explicitly that adjustment is not supported.

- test_zhipu_concurrency: adapted to the Controller refactor (import alias, updated diagnostics field assertions); the new test_peak_pending_recent_tracking verifies afterglow recording; ZhipuVendor tests use the _send_with_tracking / _stream_with_tracking helpers to simulate the executor-layer track_in_flight wrapping, with equivalent behavior;
- test_concurrency_monitor (new): covers monitor-mode acquire not blocking, pending staying at 0, set_limit raising ValueError, a correct peak under 100 concurrent calls, and the default BaseVendor.track_in_flight behavior (no-op for an empty model name, delegation to the controller otherwise);
- test_executor_in_flight_tracking (new): uses a probe-style fake controller to verify that the executor enters the track context before calling send_message[_stream] and exits correctly after the call ends (including on exceptions);
- test_router_executor: _mock_vendor gains a track_in_flight field (nullcontext) to match the executor calling convention.

- Monitoring extension: every vendor shows a real-time in_use count in CC scenarios (monitor mode);
- Throttling consolidation: Zhipu keeps its limit and FIFO queueing (limited mode), while other vendors are only observed;
- Queue visibility: new peak_pending_recent afterglow tracking, and the polling interval is cut from 5 s to 1.5 s;
- Unified abstraction: ModelConcurrencyLimiter → ModelConcurrencyController (alias kept).

ThreeFish-AI added 4 commits May 27, 2026 13:55

ThreeFish-AI merged commit 804fe92 into feature/1.x.x May 27, 2026
6 checks passed

ThreeFish-AI deleted the ThreeFish-AI/extend-model-monitoring-queue-fix branch May 27, 2026 09:52

ThreeFish-AI merged 4 commits into feature/1.x.x from ThreeFish-AI/extend-model-monitoring-queue-fix

ThreeFish-AI commented May 27, 2026
