feat(concurrency): support changing per-model concurrency at runtime (#251)
Merged
Refactor ModelConcurrencyLimiter to replace asyncio.Semaphore with a custom _ConcurrencySlot that supports adjusting the limit at runtime through set_limit(). Add a PUT /api/concurrency endpoint; in the Dashboard's Model Calling module, the limit number can be clicked and edited in place (1-20) with no process restart.
🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang<threefish.ai@gmail.com>
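A minimal sketch of a resizable slot along the lines this commit describes, for readers who have not opened the diff. Only the names `_ConcurrencySlot`, `acquire()`, `release()`, `set_limit()`, and `available` come from the PR; the fields `_limit`, `_active`, and `_changed` and the `demo()` helper are assumptions made for illustration, and the sketch does not attempt the FIFO fairness of the real class.

```python
import asyncio


class _ConcurrencySlot:
    """Illustrative resizable slot; not the PR's actual implementation."""

    def __init__(self, limit: int) -> None:
        self._limit = limit              # current upper bound (assumed field name)
        self._active = 0                 # permits currently held (assumed field name)
        self._changed = asyncio.Event()  # set whenever capacity may have opened up

    @property
    def available(self) -> int:
        return max(0, self._limit - self._active)

    async def acquire(self) -> "_ConcurrencySlot":
        # Re-check the condition after every wake-up: the limit may have
        # changed, or another waiter may have taken the freed permit.
        while self._active >= self._limit:
            self._changed.clear()
            await self._changed.wait()
        self._active += 1
        return self  # callers can keep using slot.release()

    def release(self) -> None:
        self._active -= 1
        self._changed.set()

    def set_limit(self, new_limit: int) -> None:
        # Lowering the limit does not cancel running calls; it only stops
        # new acquisitions until enough permits are released.
        self._limit = new_limit
        self._changed.set()


async def demo():
    slot = _ConcurrencySlot(limit=1)
    held = await slot.acquire()                   # takes the only permit
    waiter = asyncio.create_task(slot.acquire())  # must wait for capacity
    await asyncio.sleep(0)                        # let the waiter start and block
    blocked = not waiter.done()
    slot.set_limit(2)                             # raise the limit at runtime
    await asyncio.wait_for(waiter, timeout=1)
    admitted = waiter.done()
    held.release()
    slot.release()
    return blocked, admitted, slot.available
```

Unlike asyncio.Semaphore, whose capacity is fixed at construction, the loop above reads the limit on every iteration, which is what lets set_limit() take effect for coroutines that are already waiting.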
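When Escape cancels the edit, restore() removes the input element, which makes the browser fire a blur event; the blur handler then calls submit() through setTimeout 50ms later and sends the cancelled value to the server. Introduce a _cancelled flag that is set on Escape and checked both at the entry of submit() and inside the blur callback, so that the cancel is never ignored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>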
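The race described in this commit can be modelled in a few lines. The real editor is browser JavaScript inside server/dashboard.py; the sketch below is only a Python model of the same control flow, with the class `InlineLimitEditor`, the `guard` switch, the `sent` list, and the helper `press_escape()` all invented for illustration. Only `restore()`, `submit()`, `_cancelled`, and the 50 ms delay come from the commit message.

```python
import asyncio


class InlineLimitEditor:
    """Python model of the inline editor; the real one is browser JavaScript."""

    BLUR_DELAY = 0.05  # the 50 ms setTimeout mentioned in the commit message

    def __init__(self, original: int, guard: bool = True) -> None:
        self.value = original
        self.guard = guard        # False reproduces the behaviour before the fix
        self._cancelled = False
        self.sent = []            # values that would have reached the server

    def submit(self) -> None:
        if self.guard and self._cancelled:  # guard 1: entry of submit()
            return
        self.sent.append(self.value)

    def on_blur(self) -> None:
        def later() -> None:
            if self.guard and self._cancelled:  # guard 2: inside the blur callback
                return
            self.submit()

        asyncio.get_running_loop().call_later(self.BLUR_DELAY, later)

    def restore(self) -> None:
        # Removing the focused input makes the browser fire blur;
        # the model calls the blur handler directly.
        self.on_blur()

    def on_escape(self) -> None:
        self._cancelled = True  # set the flag before the blur can fire
        self.restore()


async def press_escape(guard: bool) -> list:
    editor = InlineLimitEditor(original=4, guard=guard)
    editor.value = 9            # the user typed 9, then pressed Escape
    editor.on_escape()
    await asyncio.sleep(0.2)    # wait well past the 50 ms delay
    return editor.sent
```

With `guard=False` the model submits the abandoned value after the delay; with the flag checked in both places nothing is sent.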
Summary
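Refactor the concurrency-control core of ModelConcurrencyLimiter: a custom _ConcurrencySlot (built on asyncio.Event) replaces asyncio.Semaphore so that each model's concurrency limit can be adjusted at runtime. Also add a PUT /api/concurrency API endpoint and an editable UI in the Dashboard frontend, so concurrency can be changed without a restart.
Changes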
Backend
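- vendors/concurrency.py: add the _ConcurrencySlot class, which uses asyncio.Event plus a while loop to implement FIFO fair queueing and provides set_limit() for adjusting the limit dynamically; ModelConcurrencyLimiter gains set_limit(model, new_limit), which updates the config and the slot state together
- vendors/zhipu.py: ZhipuVendor gains an update_concurrency(model, limit) proxy method
- server/routes.py: add the PUT /api/concurrency endpoint, which accepts a {tier, model, limit} request body, validates that limit is in the range 1-20, and iterates over the tiers to find the target vendor and apply the update
Frontend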
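- server/dashboard.py: in the Model Calling module, the limit number is rendered as a clickable .mc-limit-editable element; clicking it expands an inline number input that supports Enter to confirm, Escape to cancel, and confirm on blur, with a green flash animation on success and a red one on failure
Tests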
- tests/test_zhipu_concurrency.py: adapted to the new API (_get_or_create_slot, slot.available); all 1520 tests pass with no regressions
Design decisions
- acquire() returns self, so the existing call pattern (slot.release()) is unchanged and ZhipuVendor's callers need no changes
Test plan
🤖 Generated with Claude Code, CodeX, Gemini
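For reference, the request handling described for server/routes.py (validate limit in 1-20, then walk the tiers to find the target vendor) might look roughly like the framework-free sketch below. The `{tier, model, limit}` body, the 1-20 range, and `update_concurrency(model, limit)` come from the PR; `FakeVendor`, `handle_put_concurrency()`, the shape of `tiers`, the status codes, and the error payloads are assumptions for illustration only.

```python
from dataclasses import dataclass, field

MIN_LIMIT, MAX_LIMIT = 1, 20  # range stated in the PR description


@dataclass
class FakeVendor:
    """Hypothetical stand-in for a vendor such as ZhipuVendor."""

    limits: dict = field(default_factory=dict)

    def update_concurrency(self, model: str, limit: int) -> None:
        self.limits[model] = limit


def handle_put_concurrency(body: dict, tiers: dict) -> tuple:
    """Return (status_code, payload) for a PUT /api/concurrency-style body."""
    tier, model, limit = body.get("tier"), body.get("model"), body.get("limit")
    if not isinstance(tier, str) or not isinstance(model, str):
        return 400, {"error": "tier and model must be strings"}
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        return 400, {"error": "limit must be an integer"}
    if not MIN_LIMIT <= limit <= MAX_LIMIT:
        return 400, {"error": f"limit must be between {MIN_LIMIT} and {MAX_LIMIT}"}
    # Walk the tiers to find the vendor that serves the requested model.
    for tier_name, vendor in tiers.items():
        if tier_name == tier and model in vendor.limits:
            vendor.update_concurrency(model, limit)
            return 200, {"tier": tier, "model": model, "limit": limit}
    return 404, {"error": "unknown tier or model"}
```

Rejecting out-of-range values before touching any vendor keeps a bad request from partially applying; the tier name `primary` and model name `model-a` used to exercise the sketch are placeholders, not names from the repository.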