diff --git a/.gitignore b/.gitignore index 475b250..593147e 100644 --- a/.gitignore +++ b/.gitignore @@ -27,4 +27,4 @@ config.yaml .playwright-mcp/ # Log files (dual-write logging) -coding-proxy.log* +.logs/ diff --git a/AGENTS.md b/AGENTS.md index 30d9d7a..ea86087 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,15 +2,11 @@ ## Collaboration Protocol (协作协议) -本文件旨在规范 AI Agent(Claude Code、Antigravity 等)在本项目中的代码与文档协作行为。 +本文件旨在规范 AI Agent(Claude Code、Antigravity 等)在本项目中的代码与文档协作行为。项目定位详见 [README.md](./README.md)。 - **Core Language**: Output MUST be in **Chinese (Simplified)** unless serving code/technical constraints. - **Tone**: Professional, precise, and evidence-based. -## Project Positioning (项目定位) - -参考 README.md - ## Engineering Code of Conduct (工程行为准则) **Core Philosophy**: **Entropy Reduction (熵减)**. 通过上下文锚定、复用驱动与标准化流水线,对抗软件系统的无序熵增。 @@ -19,67 +15,44 @@ - **Context-Driven (上下文驱动)**: 上下文是第一性要素 (Context Quality First)。任何变更需建立在深度理解之上(CDD),拒绝基于关键字匹配的机械式修改。 - **Minimal Intervention (最小干预)**: 遵循奥卡姆剃刀与 YAGNI 原则,仅实施必要的变更,推崇演进式设计 (Evolutionary Design) 而非过度设计。 -- **Evidence-Based (循证工程)**: 杜绝主观臆断,核心决策需以权威文献(IEEE 格式)为佐证,构建 Feedback Loops 以验证假设。 -- **Systemic Integrity (系统完整性)**: 具备全局视角与二阶思维 (Second-Order Thinking),评估变更对上下游依赖及整个生态(Engine, Adapter, Agent, UI)的“涟漪效应”,优先保障整体稳定性与逻辑自洽。 +- **Evidence-Based (循证工程)**: 杜绝主观臆断,核心决策需以**最新**且**权威**的文献(IEEE 格式)为佐证,构建“设计-实现-验证”的完整反馈闭环,确保每一项工程行动都能产生可观测的反馈信号(测试、日志、监控),以验证假设并指导迭代。 +- **Systemic Integrity (系统完整性)**: 具备全局视角与二阶思维 (Second-Order Thinking),评估变更对上下游依赖及整个生态(Engine, Adapter, Agent, UI)的“涟漪效应”,不只关注变更的直接结果,更要预测“结果的结果”(如引入缓存导致的陈旧数据、重试机制引发的雪崩),优先保障整体稳定性与逻辑自洽。 +- **Knowledge Crystallization (知识结晶)**: 将系统视为有机体,通过将工程错误与 AI 失败案例转化为经验约束 (Negative Prompts) 和持久化知识,驱动系统的自我进化与持续熵减。 +- **Proactive Navigation (主动导航)**: 智能体不应止步于被动响应,需即时转化为“领航者”。在交付任务结果的同时,**必须**基于上下文预判并提出**下一步最佳行动建议 (Next Best Action)**,不仅交付“答案”,更要交付“路径”,消除用户决策的认知摩擦。 ### 法 (Strategy - 架构原则) -- **Plan Node Default (默认规划模式)**: 面对任何非琐碎任务(预估步骤 > 3 或涉及架构级决策),**必须**率先进入 Plan 
模式。规划产物需明确界定:功能边界、边缘 Case 应对策略、与现有逻辑的交互锚点以及预计改动的爆炸半径。 +- **Plan-First Default (规划先行)**: 面对任何非琐碎任务(预估步骤 > 3 或涉及架构级决策),**必须**率先进入 Plan 模式。规划产物需明确界定:功能边界、边缘 Case 应对策略、与现有逻辑的交互锚点以及预计改动的爆炸半径。 - **Subagent Strategy (子代理并发策略)**: 面对高复杂度命题,严禁主 Agent 单点统揽。应贯彻“算力换空间”思路,果断编排 Subagent 进行任务拆解与并行攻坚,主 Agent 的职责需严格收敛于上下文协同与最终成果的组装整合。 - **Verification Before Done (交付前验证定式)**: 严禁在缺乏确凿运行证据的情况下标记任务为“已完成”。交付阶段**强制要求**提供客观自证材料:Diff 变更分析、测试用例覆盖、实施日志截图及核心链路边缘 Case 验证结果,并时刻以“方案是否能通过 Staff Engineer 严格审查”的视角自检。 -- **Reuse-Driven (复用驱动)**: Composition over Construction。系统变更**必须**主动参考业界经典设计模式与最佳实践。在进入实质性编码前,需率先对相关领域的成熟范式进行深度调研,并结合当前项目上下文输出充分的关联分析与方案梳理。坚决贯彻“拿来主义”,优先通过组合与集成来构建系统,防范闭门造车与重复造轮子。 +- **Reuse-Driven (复用驱动)**: Compose over Reinvent。系统变更**必须**主动参考业界经典设计模式与最佳实践。在进入实质性编码前,需率先对相关领域的成熟范式进行深度调研,并结合当前项目上下文输出充分的关联分析与方案梳理。坚决贯彻“拿来主义”,优先通过组合与集成来构建系统,防范闭门造车与重复造轮子。 - **Boundary Management (边界管理)**: 严控模块/Agent 间的职责边界与契约,确保高内聚低耦合,防范隐式依赖穿透。 - **Orthogonal Decomposition (正交分解)**: 坚持“正交地提取概念主体”。识别系统中独立变化的维度并进行解耦(如机制与策略分离),确保单一概念主体的变更具备局部性,避免逻辑纠缠。 -- **Feedback Loops (反馈闭环)**:构建“设计-实现-验证”的完整闭环,确保每一项工程行动都能产生可观测的反馈信号(测试、日志、监控),以验证假设并指导迭代。 -- **Evolutionary Design (演进式设计)**: 将系统视为有机体,通过将 AI 错误转化为经验约束 (Negative Prompts) 和持久化知识,实现系统的自我进化与熵减。 -- **Second-Order Thinking (二阶思维)**:不只关注变更的直接结果,更要预测“结果的结果”(如引入缓存导致的陈旧数据、重试机制引发的雪崩),未雨绸缪防范隐性风险。 - **Single Source of Truth (单一事实源)**:严格维护唯一的权威定义源。引用时**必须**使用轻量级指针 (Link/ID) 而非数据副本 (Copy-Paste),从根源消除断裂 (Split-Brain) 风险。 -- **Proactive Navigation (主动导航)**: 智能体不应止步于被动响应,需即时转化为“领航者”。在交付任务结果的同时,**必须**基于上下文预判并提出**下一步最佳行动建议 (Next Best Action)**。不仅交付“答案”,更要交付“路径”,消除用户决策的认知摩擦,确保持续的熵减动量。 ### 术 (Tactics - 执行规范) -- **Vibe Coding Pipeline**: 遵循 **Specification-Driven (规划驱动)** + **Context-Anchored (上下文锚定)** + **AI-Pair (AI 结对)** 模式,将开发固化为可审计的流水线,避免代码腐化为无法维护的“大泥球 (Big Ball of Mud)”。 -- **Visual Documentation (图文并茂)**: 对于复杂逻辑,优先使用 Mermaid 图表(Sequence/Flowchart/Class)辅助说明,构建“图文并茂”的直观文档。 -- **Direct Hyperlinking (直接跳转)**: 在文档中提及 Repo 内其他资源(文档/代码)时,**必须**构建可跳转的相对路径链接(如 `[Doc 
Name](./path.md)`),严禁使用“死文本”引用,以降低信息检索熵。 +- **Structured AI-Pair Pipeline (规范化 AI 结对流水线)**: 遵循 **Specification-Driven (规约驱动)** + **Context-Anchored (上下文锚定)** + **AI-Pair (AI 结对)** 模式,将开发固化为可审计的流水线,避免代码腐化为无法维护的“大泥球 (Big Ball of Mud)”。 - **Operational Excellence (卓越运营)**: - 1. **Git Hygiene**: 如非显性要求,严禁调用 git commit; + 1. **Git Discipline**: 默认严禁调用 git commit;当用户显式要求提交时,一律使用 Claude Code 的自定义 Slash Command: `/commit-no-push` 进行操作(若非 Claude Code 运行环境,则读取 /commit-no-push 命令中的规则执行)。严禁执行 Rebase; 2. **Temp Management**: 临时产物(执行计划等)一律收敛至 `.temp/` 并及时清理; 3. **Link Validity**: 确保所有引用的 URL 可访问且具备明确的上下文价值; - 4. **Git Commit**: 在需要提交变更到 Git 时,一律使用 Shell 调用 Claude Code 的自定义 Slash Command: `/commit` 进行 git commit 操作(若环境中未安装 Claude Code,则直接读取 `~/.claude/commands/commit.md`,按照其中的规则进行 git commit 操作)。不要执行 Rebase。 - 5. **Pre-commit Hooks**: 克隆仓库后执行 `uv run pre-commit install` 激活本地 Git hooks,使 Ruff lint(含 auto-fix)、Ruff format 及通用代码卫生检查在每次 commit 前自动运行。若 hooks 自动修复了问题,提交会被中断,执行 `git add -p` 审阅修复内容后重新提交即可。 - 6. **Issue**: 在 docs/issue.md 中维护你处理过的 Issue 摘要(问题描述、表因根因、处理方式、后续防范、同类问题影响与处理注意实现等),便于同类问题的跨上下文处理;注意识别相同 Issue,不要同 Issue 多处维护。 + 4. **Testing**: 统一在 tests/ 下维护测试用例,区分单元测试(unit)和集成测试(integration),所有测试的本地运行总时间控制在 3 min 以内; + 5. **Pre-commit Hooks**: 首次克隆仓库使用 `uv run pre-commit install` 激活本地 Git hooks,使 Ruff lint(含 auto-fix)、Ruff format 及通用代码卫生检查在每次 commit 前自动运行。若 hooks 自动修复了问题,提交会被中断,执行 `git add -p` 审阅修复内容后重新提交即可; + 6. **Issue**: 在 [issue.md](docs/agents/issue.md) 中维护你处理过的 Issue 摘要(问题描述、表因根因、处理方式、后续防范、同类问题影响与处理注意事项等),便于同类问题的跨上下文处理;注意识别相同 Issue,不要同 Issue 多处维护; - **Package Management Standardization (包管理规范)**: 1. **Python**: 严禁使用 pip/poetry,**必须**统一使用 `uv` 进行包管理与脚本执行(如 `uv run`); - 2. **JavaScript/TypeScript**: 严禁使用 npm/yarn,**必须**统一使用 `pnpm` 进行包管理与脚本执行。 + 2. 
**JavaScript/TypeScript**: 严禁使用 npm/yarn,**必须**统一使用 `pnpm` 进行包管理与脚本执行; - **Database Management**: 谨慎操作,数据迁移、测试等操作严禁将现有数据删除,谨慎操作数据迁移的回滚,防止数据被清理。 - **In-depth and close to the facts**:系统且全面地进行问题的分析,深入贴近事实,如有疑问,需先发问,不要乱做决定。 - -## Documentation Standards (文档规范) - -### Mermaid Visualization Norms (Mermaid 可视化规范) - -- **色彩语义与兼容性**:为图表节点配置具备语义辨识度的色彩,并确保在深色模式(Dark Mode)下具有极高的对比度与清晰度。 -- **逻辑模块化解构**:针对业务跨度较大的架构流程,强制采用 `subgraph` 容器进行层级解构与边界划分,以增强图表的自解说(Self-explaining)能力。 - -### Reference Specifications (IEEE) - -为保障工程决策的可追溯性与学术严谨性,核心引用需遵循 **IEEE 标准引用格式**。 - -> **模版准则**:[编号] 作者缩写. 姓, "文章标题," _刊名/会议名缩写 (斜体)_, 卷号, 期数, 页码, 年份. - -```latex -[1] A. Author, B. Author, and C. Author, "Title of paper," *Abbrev. Title of Journal*, vol. X, no. Y, pp. XX–XX, Year. -``` - -**引用实践** - -- **文内锚定**:采用标准上标链接形式:`描述内容[[1]](#ref1)`。 -- **文献索引**:底层采用 HTML 锚点 `id` 实现跳转稳定性。 - -```latex -[1] A. Vaswani et al., "Attention is all you need," Adv. Neural Inf. Process. Syst., vol. 30, pp. 5998–6008, 2017. -``` - -## Knowledge Map (知识索引) - -(WIP) +- **Browser Validation Protocol (浏览器验证准则)**:Agent 不得自行完成、绕过或模拟任何 OAuth / SSO 认证流程,所有登录态均来源于用户已认证的 Chrome 主 profile(真实用户登录态)。完整协议(连通性自检、凭证管理、E2E 集成、实机回归等)详见 [浏览器验证协议](./docs/agents/browser-validation.md); + 1. **安全红线**:禁止在 Sandbox 浏览器中跳转 Google 同意屏;禁止以模拟用户或第三方账号替代真实登录态;禁止要求用户在 chat 中粘贴密码、Cookie 或验证码; +- **Knowledge Map (知识索引)**:项目所有文档索引统一维护在 [知识索引](./docs/agents/knowledge-map.md),并在文档目录变更时即时同步更新; +- **Documentation Standards (文档规范)**: + 1. **Visual Documentation (图文并茂)**: 对于复杂逻辑,优先 **Mermaid Visualization Norms (Mermaid 可视化规范)**,构建“图文并茂”的直观文档; + - **色彩语义与兼容性**:为图表节点配置具备语义辨识度的色彩,并确保在深色模式(Dark Mode)下具有极高的对比度与清晰度; + - **逻辑模块化解构**:针对业务跨度较大的架构流程,强制采用 `subgraph` 容器进行层级解构与边界划分,以增强图表的自解说(Self-explaining)能力; + 2. **语言叙事**:用语精准,叙事完备,行文专业,聚焦核心,篇幅精炼,形象具体,体现真实作用与用户吸引性,字数恰当; + 3. **Direct Hyperlinking (直接跳转)**: 在文档中提及 Repo 内其他资源(文档/代码)时,**必须**构建可跳转的相对路径链接(如 `[Doc Name](./path.md)`),严禁使用“死文本”引用,以降低信息检索熵; + 4. 
**实操截图**:文档需要引入必要的浏览器实操截图时,需自行通过默认浏览器打开相关页面,通过实操现场截图并保留到文档路径进行文档引用; +- **Reference Specifications (IEEE)**:为保障工程决策的可追溯性与学术严谨性,核心引用需遵循 [reference-specifications.md](docs/agents/reference-specifications.md)IEEE 标准引用格式; diff --git a/CHANGELOG.md b/CHANGELOG.md index 0fb0f1d..f745ec1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,30 @@ ## [Unreleased] +## [v0.5.0](https://github.com/ThreeFish-AI/coding-proxy/releases/tag/v0.5.0) - 2026-05-27 + +> [!IMPORTANT] +> +> **🚀 Model Calling 实时状态!** +> +> 模型并发与排队深度一目了然,运行时动态调整每个模型并行度,预防 vendor 侧的 429 幺蛾子。 + +![model-calling](assets/model-calling-v0.5.0.png) + +### ✨ 核心亮点 + +- feat(concurrency): 新增 Model Calling 实时状态模块,可视化每模型并发与排队深度,支持运行时动态修改每模型并行度 (#250) (#251) +- feat(zhipu): 新增每模型并发限制,默认 3 个并行请求 FIFO 排队 (#248) +- feat(zhipu): 为 429 Rate Limit 添加指数退避重试挽回机制 (#242) + +### 🔧 更多特性 + +- fix(antigravity): 修复 v1internal 模式检测逻辑并新增 E2E 测试; (#234) +- fix(routes): 修复 count_tokens 路由对 target_vendor.name 的错误属性访问; (#235) +- fix(vendor-channels): 修复 zhipu→anthropic 通道 tool_use/tool_result 配对漏洞; (#236) +- fix(native-api): 修复 Gemini :verb 路径中 %3A URL 编码导致上游 400 的兼容问题; (#237) +- fix(zhipu): 诊断首选 tier 语义拒绝降级问题,增强可观测性并提取跨供应商清洗共享函数 (#243) + ## [v0.4.0](https://github.com/ThreeFish-AI/coding-proxy/releases/tag/v0.4.0) — 2026-05-01 > [!IMPORTANT] diff --git a/README.md b/README.md index 1383338..6cb7211 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,7 @@ When you're deeply immersed in your coding "zone" with **Claude Code** (or any A ## 🌟 Core Features
- +
- **⛓️ N-tier Chained Failover**: Autonomous descending sequence, supporting Claude's official plans, as well as Coding Plans from GitHub Copilot, Google Antigravity, Z AI, MiniMax, Alibaba Qwen, Xiaomi, Kimi, Doubao, etc. diff --git a/assets/dashboard-v0.2.4.png b/assets/dashboard-v0.2.4.png deleted file mode 100644 index aef75f7..0000000 Binary files a/assets/dashboard-v0.2.4.png and /dev/null differ diff --git a/assets/dashboard-v0.4.0.png b/assets/dashboard-v0.4.0.png new file mode 100644 index 0000000..14e985b Binary files /dev/null and b/assets/dashboard-v0.4.0.png differ diff --git a/assets/model-calling-v0.5.0.png b/assets/model-calling-v0.5.0.png new file mode 100644 index 0000000..1b1e31b Binary files /dev/null and b/assets/model-calling-v0.5.0.png differ diff --git a/docs/agents/browser-validation.md b/docs/agents/browser-validation.md new file mode 100644 index 0000000..ee4b705 --- /dev/null +++ b/docs/agents/browser-validation.md @@ -0,0 +1,172 @@ +# Browser Validation Protocol(浏览器验证协议) + +> 由 [AGENTS.md §Browser Validation Protocol](../../AGENTS.md) 锚定的浏览器自动化与认证态使用协议。本协议是工程行为准则的子集,**任何 AI Agent 在执行浏览器自动化任务前必须完整遵循**。 +> +> **协议版本**:v1.0 | **生效范围**:所有面向本仓库的 AI Agent 协作场景 +> +> **关联工具**:`chrome-devtools` MCP、`claude-in-chrome` MCP、`playwright` MCP + +[TOC] + +--- + +## 1. 协议目的 + +为 AI Agent 在浏览器自动化场景下提供**统一、可审计、不可绕过**的认证态使用规范,解决以下问题: + +- AI Agent 不应也不可代用户决策"我是谁"——所有登录态归属问题必须由用户本人主导 +- 浏览器自动化能力一旦失控,可能在用户毫不知情时产生不可撤销的副作用(消息发送、订单提交、权限变更等) +- OAuth / SSO 同意屏在自动化上下文中存在被绕过的潜在风险,违反平台 ToS 与基本伦理 + +本协议通过"原则—红线—操作流程—验证"四层结构,将上述问题约束在工程可控范围内。 + +--- + +## 2. 
核心原则 + +| 原则 | 具体含义 | +| -------------------------- | ----------------------------------------------------------------------------------------------------- | +| **登录态归属于用户** | Agent 不得自行完成、绕过或模拟任何 OAuth / SSO 认证流程;所有登录态来源于用户已认证的 Chrome 主 profile | +| **真实主 profile 优先** | 浏览器自动化默认接入用户日常使用的 Chrome 主 profile,复用其 Cookie / Session / SSO 状态 | +| **可审计、可回放** | 浏览器路径关键操作(点击、表单填写、跳转)应留下可被 GIF 回放或日志追溯的痕迹 | +| **最小副作用** | 优先以只读方式(查看、提取、断言)完成任务;写操作(提交、发送)需在协议第 5 节框架下显式确认 | + +--- + +## 3. 安全红线 + +> 以下条款**不可协商**,违反任一条款即视为协议违反。 + +1. **禁止跳转 Google 同意屏**:在 Sandbox / 自动化浏览器环境内**严禁**触发 Google OAuth 同意屏跳转。同意屏只能在用户主 profile 的真实浏览会话中由用户本人完成。 +2. **禁止模拟身份**:禁止以模拟用户身份、虚构 Cookie、第三方账号或测试账号替代真实登录态完成任务。 +3. **禁止凭证泄露**:禁止要求用户在 chat 中粘贴密码、Cookie、Session Token、二维码扫描结果或任何形式的验证码(含 6 位数字、短信、TOTP)。 +4. **禁止跨账号操作**:在多用户环境下,Agent 不得在未经显式确认的情况下切换 profile 或账号身份。 +5. **禁止规避 ToS**:不得通过 Headless 模式、UA 伪装、Captcha 自动求解等方式规避目标站点的服务条款。 +6. **禁止下载执行**:浏览器路径触发的任何文件下载需在主对话中显式确认;下载文件不得自动执行或注入到项目目录。 + +--- + +## 4. 连通性自检(Connectivity Probe) + +执行浏览器自动化任务前,Agent **必须**完成以下自检序列: + +| 步骤 | 操作 | 通过判据 | +| --------------------- | ----------------------------------------------------------------- | --------------------------------------------------------- | +| 4.1 工具可用性 | 列出当前会话可用的 MCP 工具 | 至少存在 `chrome-devtools` / `claude-in-chrome` 之一 | +| 4.2 主 profile 加载 | 通过工具调用获取当前 Tab 列表或 Page 列表 | 返回非空,且 Tab 标题来自用户真实浏览历史而非空白会话 | +| 4.3 目标域名可达 | 通过 `navigate_page` 或 `browser_navigate` 访问目标域名首页 | HTTP 200 / 已登录态正常加载 | +| 4.4 登录态识别 | 在目标域名首页定位"已登录"标识(头像、用户名、退出按钮) | 能在 Snapshot / AOM 中找到一致标识 | +| 4.5 异常路径分类 | 若 4.4 失败,按"未登录 vs 会话过期 vs 拒绝服务"分类,**不**自动重登 | 输出明确分类,转入第 5 节的用户接力流程 | + +> **失败处置**:自检任一步骤失败,Agent **必须**停止任务、向用户输出诊断结论,**不得**尝试 OAuth / 凭证补救。 + +--- + +## 5. 
凭证管理(Credential Lifecycle) + +### 5.1 发现路径 + +凭证通过以下路径**被动**发现,Agent **不**主动读取、导出或日志化: + +- 浏览器 Cookie / LocalStorage(仅由浏览器引擎内部使用) +- 浏览器扩展(如 Claude in Chrome)持有的 Session +- 用户在 chat 中以"我刚登录了 X"形式给出的事实陈述(非凭证本身) + +### 5.2 过期检测信号 + +| 信号 | 处置 | +| ---------------------------------------- | ----------------------------------------------- | +| HTTP 401 / 403 | 转 5.3 接力流程 | +| 重定向到登录页(含 `/login`、`/signin`) | 转 5.3 接力流程 | +| 同意屏触发(OAuth scope 变更) | **立即停止**,由用户在主 profile 完成同意 | +| Captcha 出现 | **立即停止**,输出"需用户介入" | + +### 5.3 用户接力流程(Re-authentication Handoff) + +``` +1. Agent 检测到登录态失效 +2. Agent 向用户输出:(a)失效域名 (b)建议在用户主 profile 完成登录的指引 +3. Agent 暂停浏览器任务,**不**触发任何登录流程 +4. 用户在真实浏览器完成登录后,回到 chat 通知 Agent +5. Agent 重新执行第 4 节连通性自检 +6. 自检通过后恢复任务 +``` + +### 5.4 凭证刷新约束 + +- Agent **不**调用任何 refresh_token / device_code 接口 +- Agent **不**触发邮箱链接、短信验证码、TOTP 输入 +- 凭证刷新由用户在原始登录路径自主完成 + +--- + +## 6. E2E 集成(End-to-End Integration) + +### 6.1 与项目 OAuth 模块的边界 + +本项目内置 GitHub Device Flow 与 Google OAuth 模块(`src/coding/proxy/auth/`)。浏览器协议与之的边界如下: + +- **项目 OAuth 模块**:服务端运行时凭证管理,由 `coding-proxy auth login/reauth` CLI 触发,目标是给 **proxy 自身**获取上游 API 凭证 +- **本协议**:客户端浏览器自动化场景,目标是让 **Agent 协助用户**完成日常任务(如查文档、填表单) + +二者**互不调用**:Agent 不调用 `coding-proxy auth` 替用户完成项目 OAuth;项目 OAuth 流程也不依赖本协议第 4 节自检。 + +### 6.2 与 CLI 命令的协同 + +| 场景 | 由谁触发 | +| ------------------------------- | ------------------------- | +| 给 proxy 注入 GitHub PAT | 用户运行 `auth login` | +| 给 proxy 注入 Google OAuth | 用户运行 `auth login` | +| 凭证过期重认证 | 用户运行 `auth reauth` | +| 浏览器查看 GitHub Token 状态 | Agent 通过本协议浏览器访问 | + +### 6.3 测试用例的浏览器隔离 + +- 单元测试(`tests/unit/`)**不**触发任何浏览器路径 +- 集成测试(`tests/integration/`)**不**触发任何浏览器路径 +- 浏览器路径仅在交互式 Agent 会话中触发,不进入 CI 自动化测试链路 + +--- + +## 7. 
实机回归(Real-Device Regression) + +### 7.1 提交前的浏览器路径自检清单 + +涉及浏览器路径的改动在提交前需手工核验: + +- [ ] 第 4 节连通性自检在用户主 profile 通过 +- [ ] 第 3 节安全红线未被触碰(特别是同意屏、密码粘贴) +- [ ] 浏览器路径的关键操作有 GIF / Snapshot 留痕 +- [ ] 失败路径输出明确的用户接力指引 + +### 7.2 与 CI 的边界 + +CI 流水线(详见 [ops/ci-cd.md](../ops/ci-cd.md))**不**触发浏览器自动化路径。所有浏览器侧验证均在本地实机完成。 + +### 7.3 回归失败上报 + +若实机回归失败: + +1. 在 [docs/issue.md](../issue.md) 记录现象、根因、防范 +2. 若涉及协议本身缺陷,提交 PR 修订本文件并同步 [AGENTS.md](../../AGENTS.md) 锚点 +3. 不通过的 Agent 行为应在 [knowledge-map.md](./knowledge-map.md) 标注为已知问题 + +--- + +## 8. 引用规范 + +- 本协议章节可被 [AGENTS.md](../../AGENTS.md) / [CLAUDE.md](../../CLAUDE.md) 通过标题锚点形式引用 +- 修订本协议**必须**在 [docs/issue.md](../issue.md) 留存背景与决策记录 +- 协议条款发生变更时,需同步检查 [AGENTS.md §Browser Validation Protocol](../../AGENTS.md) 的兜底原则与本协议是否一致 + +--- + +## 附录 A:术语对照 + +| 术语 | 说明 | +| ------------------- | ----------------------------------------------------------------- | +| 主 profile | 用户日常使用的 Chrome / Edge 浏览器档案,含真实登录态 | +| Sandbox 浏览器 | 自动化工具启动的临时/隔离浏览器,无真实用户态 | +| 同意屏(Consent) | OAuth 流程中用户授予权限范围的页面 | +| 接力流程 | Agent 停止 → 用户介入完成 → Agent 恢复 的三段式协作 | +| 实机回归 | 在用户真实终端(非 CI)完成的端到端验证 | diff --git a/docs/agents/issue.md b/docs/agents/issue.md new file mode 100644 index 0000000..c202b8a --- /dev/null +++ b/docs/agents/issue.md @@ -0,0 +1,280 @@ +# Issue 处理档案 + +> 维护已处理过的 Issue 摘要(问题描述、表因根因、处理方式、后续防范、同类问题影响与处理注意事项),便于同类问题的跨上下文处理。识别相同 Issue 时应在原条目追加复盘,避免同 Issue 多处维护。 + +--- + +## streaming usage parse failed: 'NoneType' object has no attribute 'get' + +**问题描述** + +OpenAI 兼容 SSE 流式响应过程中,单次请求日志反复刷出数十条 WARNING: + +``` +WARNING streaming usage parse failed: 'NoneType' object has no attribute 'get' +``` + +警告本身被上层 `try/except` 吞掉不影响主链路,但日志噪声严重,且每帧都丢失了 usage 累加。 + +**表因** + +`StreamingUsageAccumulator.feed` 调用 `parse_usage_from_chunk` 解析 SSE chunk 时抛出 `AttributeError`。 + +**根因** + +`src/coding/proxy/routing/usage_parser.py::parse_usage_from_chunk` 中 Anthropic message_start 与 Anthropic message_delta / OpenAI 两条分支都使用了脆弱的判空模式: + +```python +if "usage" in data: # 仅判断 
key 存在 + u = data["usage"] # 但值可能是 null + u.get("output_tokens", 0) # AttributeError +``` + +部分上游(含某些 OpenAI 兼容供应商)在中间 chunk 显式发送 `"usage": null` 占位帧,`in` 检查通过但取出的是 `None`。 + +**处理方式** + +将两处 guard 统一改为 `u = container.get("usage"); if isinstance(u, dict):`,既排除缺省也排除 null,并顺手移除内部冗余的 `if isinstance(u, dict):` 包装层(已被外层 guard 覆盖)。同时新增三个回归用例覆盖 `data.usage = null` / `message.usage = null` / null 帧后跟有效帧三种场景。 + +**后续防范** + +- 解析外部 SSE / JSON 结构时, 不要单独使用 `if key in data` 作为安全 guard, 应统一采用 `value = data.get(key); if isinstance(value, dict):` 的双重保护, 同时排除缺省与显式 null。 +- 对 try/except 包裹的 WARNING 路径要保持警觉: 异常被吞不代表无害,重复刷屏的同类警告往往暗示防御性 guard 过窄,需要回溯至根因修复,而非依赖 except 兜底。 + +**同类问题影响与处理注意事项** + +- 本仓库内 `parse_usage_from_chunk` 的 Gemini `usageMetadata` 分支 (line ~219) 已经使用 `isinstance(um, dict)` 防御, 不受影响, 可作为参考实现。 +- 检查其他解析器 (如 routing / vendor adapter 层) 是否还有 `if "key" in data: v = data["key"]; v.get(...)` 这种模式, 必要时同步加固。 + +--- + +## anthropic 400: `tool_use` ids were found without `tool_result` blocks immediately after + +**问题描述** + +zhipu → anthropic 通道流式请求偶发 400, 错误形如: + +``` +WARNING anthropic stream error: status=400 body=... + messages.3: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_normalized_2. 
+INFO Failover: anthropic → zhipu (reason: HTTP 400) +INFO Tier zhipu stream succeeded (took over from failed tier: anthropic) +``` + +同一请求伴随 `Applied transition channel zhipu → anthropic: rewritten_N_srvtoolu_ids, misplaced_tool_result_relocated, stripped_M_thinking_blocks` 的 adaptations 但**没有 `orphaned_tool_use_repaired`**, 即转换层主观上认为已配对、但 Anthropic 仍判定结构不合规。Failover 至 zhipu 后请求成功, 证明上游消息体本身没有损坏, 问题出在 zhipu→anthropic 通道转换过程引入了不一致。 + +**表因** + +`src/coding/proxy/convert/vendor_channels.py::_rewrite_srvtoolu_ids` 在单遍循环中同时承担 Case A (assistant 端 `server_tool_use` → `tool_use` 与 `srvtoolu_*` ID 重写) 与 Case B (任意位置 `tool_result.tool_use_id` 同步重写)。Case B 依赖 `id_map` 已被 Case A 填入。 + +**根因** + +Zhipu GLM-5 流式响应偶发将 inline `tool_result` 块输出在**对应的 `server_tool_use` 块之前** (同 assistant content 内乱序), 或将 `tool_result` 放在更早的 user 消息中而对应 `tool_use` 在更晚的 assistant 消息。两种乱序下, 单遍扫描遍历到 `tool_result` 时 `id_map` 还是空 → `tool_result.tool_use_id` 不被改写, 停留在 `srvtoolu_X`; 随后 Case A 把对应 `tool_use.id` 改写为 `toolu_normalized_N`。 + +后续 `enforce_anthropic_tool_pairing` Step A 提取这条 misplaced tool_result 时使用**旧 ID** 作为 `extracted_tool_results` 字典 key, Step F 用新 ID 去查 → 不命中 → 走 `existing_result_ids` 分支, 因为相邻 user 的 tool_result 已经被改写到新 ID, 该 uid 命中 `existing_result_ids` 被 continue 跳过, 于是 enforce 错误地认为完成配对、不产生 `orphaned_tool_use_repaired` 标签, 而被默默丢弃的 misplaced tool_result 本应填补到的 user 槽位实际上**仍然缺位**。最终 body 中某条 assistant 的 tool_use 在下一条 user 中找不到对应 tool_result → Anthropic 400。 + +**处理方式** + +1. `_rewrite_srvtoolu_ids` 改为**两遍扫描**: Pass 1 仅遍历 assistant 消息收集 `id_map` (按 assistant 出现顺序分配, 保持序号兼容性); Pass 2 全量遍历改写任意 `tool_result.tool_use_id`。以"先建表、后改写"的次序消除时序耦合。 +2. 在 `enforce_anthropic_tool_pairing` 主循环末尾追加独立 helper `_enforce_pairing_sanity_pass`, 仅做检测+合成 `is_error=True` 占位 (不剥离、不重定位), 命中追加 `pairing_sanity_repaired` adaptation 并打 WARNING (含 message index 与 uid)。这层作为纵深防御, 在主循环未来重构时仍能稳定守住 Anthropic 配对约束。 +3. 
新增回归测试覆盖三类场景: 同 assistant content 内乱序、跨消息边界 tool_result 早于 tool_use、端到端复现日志故障形态。新增 `TestEnforcePairingSanityPass` 独立测试套件确保兜底分支具备正向回归保护。 + +**后续防范** + +- 任何在多 content block 之间存在**前向引用** (后出现的块定义的标识符被前面的块引用) 的就地改写逻辑, 都必须采用两遍扫描或全局表先建后用, 不可依赖遍历位置上 "上一次循环已经写入" 的隐含次序。 +- 纵深防御层 (sanity helper) 必须**独立可单测**, 而不是把 sanity 内嵌在主路径内部 — 否则主路径的快速通道会让 sanity 分支永远走不到正向测试, 缺乏回归保护。 +- adaptations 标签 (`pairing_sanity_repaired`) 与主循环标签 (`orphaned_tool_use_repaired`) 分离, 便于运维聚合时按层归因。 + +**同类问题影响与处理注意事项** + +- 历史教训: commit `9061cd0` 曾经实现"两遍扫描 + sanity helper"修复了正是这类问题, 但 commit `2bac9a7` revert 至 v0.3.0 时**连带回滚**了它 — revert 的真实目标是去除 `f497077` / `fdd4a92` / `43488a1` 引入的"zhipu 自清理通道"和"tool_result.id 注入"副作用, 两遍扫描属无辜方。**后续若再次需要 revert `vendor_channels.py`**, 必须先 `grep _enforce_pairing_sanity_pass` 与 `Pass 1` / `Pass 2` 注释, 确认这两段是核心修复而非可以一起回滚的实验性代码。 +- 类似 "vendor 私有 ID 跨消息体改写" 场景 (如 doubao、minimax 未来若引入类似机制), 实现时同样应当遵循"先全局收集 id_map、后统一改写"的两阶段模式。 +- 单元测试覆盖"块顺序敏感"类 bug 时, 建议在用例命名中显式标注顺序条件 (如 `test_two_pass_handles_inline_tool_result_before_server_tool_use`), 让未来 reviewer 一眼看出测试的边界价值。 + +--- + +## count_tokens 路由 `AttributeError: 'ZhipuVendor' object has no attribute 'name'` + +**问题描述** + +后台日志反复出现 `POST /v1/messages/count_tokens?beta=true 500 Internal Server Error`,并伴随: + +``` +File ".../coding/proxy/server/routes.py", line 153, in count_tokens + channel_fn = get_transition_channel(source, target_vendor.name) +AttributeError: 'ZhipuVendor' object has no attribute 'name' +``` + +同一时间窗口内大量请求 200 OK、少量请求 500,呈"间歇性"故障特征。 + +**表因** + +`src/coding/proxy/server/routes.py` 的 `count_tokens` 在 153 / 160 两处访问 `target_vendor.name`,触发 `AttributeError` 被 ASGI 中间件捕获返回 500。 + +**根因** + +`BaseVendor` 仅暴露**抽象方法** `get_name() -> str`(`src/coding/proxy/vendors/base.py:75-77`),所有派生类(`AnthropicVendor`、`ZhipuVendor`、`CopilotVendor`、`MinimaxVendor`、`DoubaoVendor`、`KimiVendor` 等)均通过 `_vendor_name` 类属性配合 `get_name()` 返回名称 —— **并无 `name` 实例属性**。该错误访问在 lint/类型检查阶段无告警(因 `BaseVendor` 未在类型系统中约束 `name` 字段),仅在运行时触发。 + +间歇性原因:第 
152 行 `if source:` 是守卫;`source` 由 `infer_source_vendor_from_body(body)`(`src/coding/proxy/convert/vendor_channels.py:357-394`)从请求体启发式推断,仅当出现 zhipu 私有产物(`srvtoolu_*` 形式的 `tool_use.id` 或 `server_tool_use` / `server_tool_use_delta` 类型 content block)时返回 `"zhipu"`,否则 `None`。纯净的首轮 count_tokens 请求 `source is None` 自然绕过 153 行,因此 200/500 共存。 + +**处理方式** + +1. `routes.py:153,160` 将 `target_vendor.name` 改为 `target_vendor.get_name()`,并将结果提取到局部变量 `target_name` 复用,避免重复方法调用与日志/调用点不一致风险。 +2. `tests/test_app_routes.py` 新增 `test_count_tokens_triggers_zhipu_to_target_channel`:通过注入 `server_tool_use` + `srvtoolu_*` 让 `infer_source_vendor_from_body` 返回 `"zhipu"`,断言返回 200 且 debug 日志含 `"count_tokens channel zhipu → anthropic"`,证明通道被实际触发。此前 6 个 count_tokens 测试的请求体都是纯净的、未触达该分支,是 bug 长期漏过的根因。 + +**后续防范** + +- 跨模块引用 Vendor 实例字段时,**统一通过 `BaseVendor` 暴露的方法**(`get_name()`、`map_model()` 等),避免直接访问派生类未定义的"假属性"。 +- 长期演进可考虑在 `BaseVendor` 增加 `@property name` 指向 `get_name()`,将契约前移到类型系统由 mypy / pyright 拦截 —— 该重构属"演进式设计"范畴,不在本次最小干预范围内。 +- 测试覆盖原则:路由层涉及"内容感知"分支(如 `infer_source_vendor_from_body`)时,至少补一个让分支命中的最小用例,避免守卫掩盖代码缺陷。 + +**同类问题影响与处理注意事项** + +- 已 `grep -rn "vendor\.name\b" src/` 全仓扫描,确认 `target_vendor.name | vendor.name` 误用仅 routes.py 的这两处,已随本次修复一并消除。`/v1/messages` 主链路在 executor 中调用 `tier.name`(`Tier` 对象的合法 dataclass 属性),与 vendor 实例 `name` 无关,不受影响。 +- 若未来新增 Vendor 子类,仍只需实现 `get_name()` 抽象方法;外部调用方应遵循同一契约,本档案的修复模式可作为参考。 + +--- + +## Gemini embedding 透传至 Vertex AI 上游返回 `request body doesn't contain valid prompts` + +**问题描述** + +通过本代理调用 Gemini embedding 模型时,上游返回 400: + +``` +litellm.BadRequestError: GeminiException BadRequestError - +{"error":{"message":"request body doesn't contain valid prompts"}} +POST /api/gemini/v1beta/models/gemini-embedding-001%3AbatchEmbedContents 400 +``` + +litellm 报错日志中 URL 路径是 `:batchEmbedContents`,调用端疑似格式不兼容。 + +**表因** + +litellm 按 Google AI Studio 格式构造请求: +- 路径:`POST {api_base}/v1beta/models/{model}:batchEmbedContents` +- Body:`{"requests": [{"model": "models/...", 
"content": {"parts": [{"text": "..."}]}}]}` + +但实际上游(如 `llms.as-in.io` 这类 Vertex AI 风格网关)只接受 Vertex AI 格式: +- 路径:`POST {api_base}/v1beta1/publishers/google/models/{model}:embedContent` +- Body:`{"content": {"parts": [{"text": "..."}]}}` + +且无 `batchEmbedContents` 端点。 + +**根因** + +1. 代理 `NativeProxyHandler.dispatch()` 是字节级透传,对 embedding 端点未做协议适配,直接把 Google AI Studio 格式的 URL/Body 转给 Vertex AI 上游,路由不匹配。 +2. litellm `_check_custom_proxy()` 在自定义 `api_base` 场景下会丢失 `v1beta/` 版本前缀,发送 `{api_base}/models/{model}:verb`,使代理原有的 `OperationClassifier` 正则(要求 `v1beta/` 前缀)失配,进而走原始透传分支再次失败。 + +**处理方式** + +1. `src/coding/proxy/native_api/operation.py`:放宽 Gemini 路径正则中的 `v1(?:beta1?)?/` 段为可选,兼容 litellm 丢失版本前缀的异常路径。 +2. `src/coding/proxy/native_api/handler.py`:在 `dispatch()` 中新增 Gemini embedding Vertex AI 适配分支: + - 仅当 `provider == "gemini"`、`operation in {"embedding", "embedding.batch"}`、且 `base_url` 非官方 `generativelanguage.googleapis.com` 时启用; + - `embedContent` → 重写路径为 `v1beta1/publishers/google/models/{model}:embedContent`,剥离 body 中的 `model` 字段; + - `batchEmbedContents` → 拆分为多次并发 `embedContent` 调用(`asyncio.gather`),聚合响应为 `{"embeddings": [...]}` 返回; + - 用量抽取累加各子请求的 `usageMetadata`。 +3. 
`tests/test_native_api_handler.py`:新增 3 个回归测试覆盖单次 / 批量 / 官方上游透传不变三类场景。 + +**后续防范** + +- 协议适配层只对**非官方上游**生效,官方 `generativelanguage.googleapis.com` 仍走字节级透传,避免引入不必要的转换开销与协议偏差。 +- 上游路径分支的判定优先用 base_url 域名而非依赖网关行为特征,便于后续扩展(如 Vertex Express、其他 LLM gateway)时的精确匹配。 +- 真实链路验证:使用 litellm `embedding(api_base=..., api_key=...)` 单输入 / 多输入分别调用,确认返回 3072 维向量及正确批量计数。 + +**同类问题影响与处理注意事项** + +- litellm 在 Gemini 其他端点(`generateContent` / `countTokens`)同样存在 `_check_custom_proxy` 丢失 `v1beta/` 前缀的 bug;本次仅放宽了 `operation.py` 中的路径正则(让分类器能识别此类异常路径),未对这些端点做格式转换,因为非 embedding 端点的 Google AI Studio / Vertex AI 请求体差异较小,多数上游兼容。如未来出现类似失配再做针对性适配。 +- 若上游网关同时支持 OpenAI `/v1/embeddings` 与 Vertex AI 路径,建议优先在客户端配置 OpenAI 兼容路径,减少协议转换链路。 + +--- + +## Dashboard Sessions 页 `Tokens` 列漏算缓存 Token + +**问题描述** + +Dashboard 的 **Sessions** 标签页中,每条会话的 `Tokens` 列与展开详情卡的 `Tokens` 值,仅统计 `input + output`,遗漏了 `cache_creation`(写缓存)与 `cache_read`(读缓存)。在长链路 Anthropic Prompt Cache 场景下,读取命中常常是 input/output 的数倍,导致 Sessions 页总量被显著低估,与 Overview 标签页(卡片、Token 时序图)跨页口径分裂。 + +**表因** + +前端 `dashboard.py:1597 / 1614` 直接渲染 `s.total_tokens`,该值由 `/api/dashboard/sessions` 透传自 `token_logger.query_recent_sessions()` 的聚合结果。 + +**根因** + +`src/coding/proxy/logging/db.py` 中两条按 `session_key` 分组的聚合 SQL 使用了不完整的求和口径: + +```sql +SUM(input_tokens + output_tokens) AS total_tokens -- 第 607 行(query_recent_sessions) +SUM(input_tokens + output_tokens) AS total_tokens -- 第 634 行(query_session_profile) +``` + +而同文件内 `query_usage()`(第 465–466 行分别 `SUM(...)` 四列)与 `query_total_tokens_by_vendor()`(第 584 行 `SUM(input + output + cache_creation + cache_read)`)已采用完整四项口径,构成了同文件内的口径双标。 + +**处理方式** + +复用 `query_total_tokens_by_vendor` 的四项求和表达式,将两处 `total_tokens` 改写为: + +```sql +SUM(input_tokens + output_tokens + + cache_creation_tokens + cache_read_tokens) AS total_tokens +``` + +不改动 API 返回结构、不新增字段、不改前端 detail-card——前端 `fmtTokens(s.total_tokens)` 调用无须变更。同时在 `tests/test_session_aware.py` 的 `test_query_recent_sessions_basic` / `test_query_session_profile_found` 中追加 
`cache_creation_tokens` / `cache_read_tokens` 入参与完整口径断言,覆盖回归。 + +**后续防范** + +- SQL 聚合层涉及"总 Tokens"概念时,必须保持**单一权威定义**(Single Source of Truth):要么所有视图共用同一求和表达式,要么抽取为常量片段集中引用,杜绝多处独立维护造成的语义漂移。 +- 未来若引入新的 token 维度(如 reasoning_tokens、tool_tokens 等),需要全文检索 `SUM(input_tokens + output_tokens` 这一历史模式并同步补齐,避免出现新的口径分裂点。 + +**同类问题影响与处理注意事项** + +- 历次 PR 中 cache token 字段的引入是渐进式的(schema 已有四列、`log()` 入参齐全、Overview 已全口径消费),但部分聚合视图的口径升级被遗漏;任何向 `usage_log` 增列后,**必须**审计所有 `SUM(input_tokens` / `SUM(output_tokens` 出现处的聚合表达式是否需要同步更新。 +- 跨标签页同一指标(如"总 Tokens")的口径一致性,建议在添加新视图时主动与 Overview 现有口径做交叉核对,必要时在 SQL 注释中标注口径来源,便于后续 review。 + +--- + +## Zhipu vendor 间歇性 `[1210][API 调用参数有误]` 拒绝(诊断阶段) + +**问题描述** + +Zhipu vendor 作为首选 tier 时,处理 `claude-haiku-* → glm-5-turbo` 的部分请求被上游直接拒绝: + +``` +WARNING Tier zhipu semantic rejection + (type=invalid_request_error, + msg=[1210][API 调用参数有误,请检查文档。][...]) + [model=claude-haiku-4-5-20251001, messages=1], trying next tier without recording failure +INFO Tier anthropic message succeeded (took over from failed tier: zhipu) +``` + +失败请求统一表现为 `duration<1s + tokens=[0 0 0 0]`,被 zhipu 在入口校验阶段直接拒绝、未消耗任何 token。两次观察窗口失败率分别为 4%(2026-05-23 22:24,glm-4.7 旧映射)与 27%(2026-05-25 17:26+,glm-5-turbo 当前映射),均触发降级至 anthropic / copilot。 + +**表因** + +`is_semantic_rejection` 检测到 zhipu 返回 `invalid_request_error + 1210` 含「API 调用参数有误」中文标记,判定为语义拒绝,跳过下一层 tier。1210 是智谱官方错误码,[官方文档](https://docs.bigmodel.cn/cn/api/api-code) 定义为「参数格式/类型不符规范」(区别于 1213「必需字段缺失」、1214「字段参数非法」)。 + +**根因(已定位,修复中)** + +PR #247 (Step 1 v2) 部署后,2026-05-26 16:30–16:31 的诊断日志显示 8 次连续拒绝**全部携带 `thinking={"type": "adaptive"}`**(Anthropic Claude 4.x 新增的参数类型),而同一时段其他会话的请求持续成功。之前 curl 测试仅验证了 `{"type": "enabled"}`,未覆盖 `adaptive` 类型。GLM 可能不支持此特定类型值,导致 [1210] 参数校验失败。 + +**处理方式(分阶段)** + +- **Step 1(PR #244,已合并)**:在 `executor.py::_build_semantic_rejection_diagnostic` 中输出 thinking / cache_control 相关字段 — 但证据反转,覆盖不足以定位真因。 +- **Step 1 v2(PR #247,已合并)**:扩展诊断函数覆盖 `system_kind|blocks(+cc)` / `tools` / `tool_choice` / 采样参数 / `stream` / 
`metadata_keys` / `content_types` / `body_bytes` 等维度。所有项「仅存在时输出」以控制日志噪声。配套 14 个单元测试(`TestBuildSemanticRejectionDiagnostic`)覆盖各字段组合。 +- **Step 2(进行中)**:基于 Step 1 v2 的日志证据,在 `ZhipuVendor._prepare_request` 中实现 **兼容转换**(而非移除): + - `thinking.type="adaptive"` → `{"type": "enabled", "budget_tokens": 16000}`(保留 thinking 能力) + - 新增 `_build_zhipu_request_snapshot` 诊断快照,同时覆盖成功/失败请求,建立可对比证据链 + - 扩展语义拒绝日志的错误体截断限制(200 → 500 字符),保留完整字段级诊断 + - `metadata` 暂不处理(待进一步诊断确认兼容性) + +**后续防范** + +- **「无证据,不下结论」**:当初版诊断字段无法覆盖根因时,禁止反复猜测,应优先扩展诊断维度抓取更多线索。本次先扩展再修复的迭代节奏可作为同类「黑盒 API 报错」问题的范式。 +- **诊断字段设计原则**:所有诊断项应「仅存在时输出」,避免常态化噪声;输出格式紧凑(`key=val`)便于日志检索;参数值用 `!r:.N` 截断防止巨型对象灌入日志。 +- **错误码差异化**:智谱 12xx 系列错误码语义并不等价(1210 ≠ 1213 ≠ 1214),未来面对类似 `[code][message]` 形式的供应商错误时,应优先查阅其官方错误码字典,避免基于错误消息字面意思的误判。 + +**同类问题影响与处理注意事项** + +- 其他薄透传 vendor(minimax / kimi / doubao / alibaba / xiaomi)共用 `NativeAnthropicVendor._prepare_request`,若它们也开始报「参数错误」类语义拒绝,可复用本次扩展的诊断函数定位差异。 +- 若证据指向 `tools` 字段(如工具 schema 不兼容)、`metadata` 字段(如自定义键被 zhipu 拒收)等具体路径,修复时应优先复用 `convert/vendor_channels.py` 中已有的 `normalize_for_zhipu` / `strip_thinking_blocks` 工具,避免在 vendor 内部重复实现剥离逻辑。 +- 部署 Step 1 v2 后,建议观察至少 48 小时收集足够样本(>20 次失败),通过失败/成功请求形态对比统计找出**唯一差异维度**,再进入 Step 2。 diff --git a/docs/agents/knowledge-map.md b/docs/agents/knowledge-map.md new file mode 100644 index 0000000..08bd983 --- /dev/null +++ b/docs/agents/knowledge-map.md @@ -0,0 +1,95 @@ +# Knowledge Map(知识索引) + +> 项目所有文档的统一入口与权威索引。由 [AGENTS.md §Knowledge Map](../../AGENTS.md) 锚定,文档目录变更时**必须**即时同步更新本文件。 +> +> **使用方式**:按"受众 × 目的"二维定位所需文档;不确定起点时,从「入口导航」开始。 + +[TOC] + +--- + +## 1. 
入口导航 + +| 文档 | 角色 | 受众 | +| --------------------------------------------- | ----------------------------------------------- | --------------- | +| [README.md](../../README.md) | 项目首页(英文版门面) | 公开访客 | +| [docs/zh-CN/README.md](../zh-CN/README.md) | 项目首页中文镜像(与英文版功能对等) | 中文公开访客 | +| [docs/user-guide.md](../user-guide.md) | 用户操作上位导航 + 配置概览速查 | 终端用户 | +| [docs/framework.md](../framework.md) | 架构枢纽(项目动机、设计目标、模块清单) | 架构师/贡献者 | + +--- + +## 2. 用户向([docs/guide/](../guide/)) + +> 面向最终用户的操作手册,按"安装 → 配置 → 运行 → 观测 → 排障"线性铺陈。 + +| 文档 | 主旨 | +| ------------------------------------------------- | --------------------------------------------------- | +| [guide/quickstart.md](../guide/quickstart.md) | 环境要求、安装、最小配置、启动、Claude Code 集成 | +| [guide/vendors.md](../guide/vendors.md) | 全部 9 种供应商配置详情、模型映射、定价表 | +| [guide/cli-reference.md](../guide/cli-reference.md) | start / status / usage / reset / auth 全部命令 | +| [guide/api-reference.md](../guide/api-reference.md) | /v1/messages、health、status、reset、dashboard 等 | +| [guide/dashboard.md](../guide/dashboard.md) | Web 可视化看板功能与交互 | +| [guide/monitoring.md](../guide/monitoring.md) | 日志、用量统计、性能调优、常见场景、故障排查 | + +--- + +## 3. 架构向([docs/arch/](../arch/)) + +> 面向贡献者与维护者的架构与实现细节,从 [framework.md](../framework.md) 正交分解而来。 + +| 文档 | 主旨 | +| ----------------------------------------------------- | ----------------------------------------------------- | +| [arch/config-reference.md](../arch/config-reference.md) | 配置参数权威定义(Single Source of Truth) | +| [arch/design-patterns.md](../arch/design-patterns.md) | 13 种设计模式详解(熔断器、状态机、Composite 等) | +| [arch/routing.md](../arch/routing.md) | 路由引擎 12 个子模块职责 | +| [arch/vendors.md](../arch/vendors.md) | Vendor 类层次结构与 9 种实现 | +| [arch/convert.md](../arch/convert.md) | Anthropic ↔ Gemini ↔ OpenAI 三向格式转换 | +| [arch/testing.md](../arch/testing.md) | 测试覆盖矩阵与工具链 | + +--- + +## 4. 
运维向([docs/ops/](../ops/)) + +> 面向运维与发布工程的流程文档。 + +| 文档 | 主旨 | +| ----------------------------------- | ------------------------------------------------- | +| [ops/ci-cd.md](../ops/ci-cd.md) | 发布流程、热修复、回滚、CI/CD 故障排查 | + +--- + +## 5. Agent 协作([docs/agents/](./)) + +> AGENTS.md 工程行为准则的卫星文件,定义 AI Agent 协作过程中的规范与协议。 + +| 文档 | 主旨 | +| --------------------------------------------------------------- | --------------------------------------------- | +| [agents/knowledge-map.md](./knowledge-map.md) | 本文件——项目文档统一索引 | +| [agents/reference-specifications.md](./reference-specifications.md) | IEEE 文献引用格式模板与实践指南 | +| [agents/browser-validation.md](./browser-validation.md) | 浏览器验证协议(连通性自检、凭证管理、E2E) | + +--- + +## 6. 问题档案 + +| 文档 | 主旨 | +| --------------------------------- | ----------------------------------------------------- | +| [docs/issue.md](../issue.md) | 已处理 Issue 摘要档案(表因、根因、防范) | + +--- + +## 7. 工程规范(顶层) + +| 文档 | 主旨 | +| --------------------------------- | ----------------------------------------------------- | +| [AGENTS.md](../../AGENTS.md) | 工程行为准则与 AI Agent 协作协议(与 CLAUDE.md 同源) | +| [CHANGELOG.md](../../CHANGELOG.md)| 版本历史与变更日志 | + +--- + +## 维护约束 + +1. **同步原则**:新增/删除/重命名 `docs/` 下任意 .md 文件时,**必须**同步本索引。 +2. **路径基准**:本文件位于 `docs/agents/`,所有相对路径以此为基准(向上一级 `../` 访问 `docs/`,向上两级 `../../` 访问仓库根)。 +3. **链接验证**:维护者修改本文件后应通过 grep 自检:所有 `[...](path)` 中的 `path` 文件存在。 diff --git a/docs/agents/reference-specifications.md b/docs/agents/reference-specifications.md new file mode 100644 index 0000000..896b866 --- /dev/null +++ b/docs/agents/reference-specifications.md @@ -0,0 +1,16 @@ +# Reference Specifications (IEEE) + +> **模版准则**:[编号] 作者缩写. 姓, "文章标题," _刊名/会议名缩写 (斜体)_, 卷号, 期数, 页码, 年份. + +```latex +[1] A. Author, B. Author, and C. Author, "Title of paper," *Abbrev. Title of Journal*, vol. X, no. Y, pp. XX–XX, Year. +``` + +**引用实践** + +- **文内锚定**:采用标准上标链接形式:`描述内容[[1]](#ref1)`。 +- **文献索引**:底层采用 HTML 锚点 `id` 实现跳转稳定性。 + +```latex +[1] A. 
Vaswani et al., "Attention is all you need," Adv. Neural Inf. Process. Syst., vol. 30, pp. 5998–6008, 2017. +``` diff --git a/docs/arch/config-reference.md b/docs/arch/config-reference.md index 24e11e5..1f4460f 100644 --- a/docs/arch/config-reference.md +++ b/docs/arch/config-reference.md @@ -89,12 +89,13 @@ flowchart TD ## 5. VendorConfig 弹性字段 -| 字段 | 类型 | 默认值 | 说明 | -| -------------------- | -------------- | -------------------- | --------------------------- | -| `circuit_breaker` | config \| None | `None` | 熔断器配置(None = 终端层) | -| `retry` | config | `RetryConfig()` | 重试策略配置 | -| `quota_guard` | config | `QuotaGuardConfig()` | 日度配额守卫配置 | -| `weekly_quota_guard` | config | `QuotaGuardConfig()` | 周度配额守卫配置 | +| 字段 | 类型 | 默认值 | 说明 | +| -------------------- | -------------- | -------------------- | ----------------------------------- | +| `circuit_breaker` | config \| None | `None` | 熔断器配置(None = 终端层) | +| `retry` | config | `RetryConfig()` | 重试策略配置 | +| `quota_guard` | config | `QuotaGuardConfig()` | 日度配额守卫配置 | +| `weekly_quota_guard` | config | `QuotaGuardConfig()` | 周度配额守卫配置 | +| `concurrency` | config \| None | `None` | `[zhipu]` 每模型并发限制(详见 5.5) | @@ -143,6 +144,33 @@ flowchart TD | `error_types` | list[str] | `["rate_limit_error", "overloaded_error", "api_error"]` | | `error_message_patterns` | list[str] | `["quota", "limit exceeded", "usage cap", "capacity", "internal network failure"]` | +### 5.5 ZhipuConcurrencyConfig — Zhipu 每模型并发参数 + +仅对 `vendor: zhipu` 生效,基于 `asyncio.Semaphore` 实现 FIFO 公平排队。 + +| 字段 | 类型 | 默认值 | 说明 | +| --------- | -------------- | ------ | -------------------------------------------------------------------------------- | +| `default` | int | `3` | 全局默认并行度(适用于所有未在 `models` 中显式覆盖的模型);取值范围 `[1, 20]` | +| `models` | map[str → int] | `{}` | 按映射后模型名(如 `glm-5v-turbo` / `glm-5.1` / `glm-4.5-air`)自定义并行度上限 | + +YAML 示例: + +```yaml +- vendor: zhipu + concurrency: + default: 3 + models: + glm-5v-turbo: 5 + glm-5.1: 2 +``` + +行为语义: + +- 
信号量按**映射后模型名**键控,与上游真实承载模型对齐;流式与非流式请求共用同一槽位。 +- 槽位满时新请求按 FIFO 顺序排队,直到任一在途请求释放槽位才被唤醒。 +- 429 重试期间持续占用槽位(重试视为同一请求的延续)。 +- 顶层 `concurrency` 字段缺省为 `None` → 转发至 `ZhipuConfig` 时回退默认值 `default=3`;如需完全关闭限流,可在 `ZhipuConfig` 构造层显式置 `null`(一般无需操作)。 + --- ## 6. 供应商专属字段 diff --git a/docs/arch/vendors.md b/docs/arch/vendors.md index 2ec79ad..0e0d862 100644 --- a/docs/arch/vendors.md +++ b/docs/arch/vendors.md @@ -1,7 +1,7 @@ # 供应商模块(vendors/) > 路径约定:相对于 `src/coding/proxy/` -> 定位:从 [framework.md](./framework.md) 提取,详述供应商分类体系与各供应商实现。 +> 定位:从 [framework.md](../framework.md) 提取,详述供应商分类体系与各供应商实现。 [TOC] diff --git a/docs/guide/monitoring.md b/docs/guide/monitoring.md index 7e89341..e11e648 100644 --- a/docs/guide/monitoring.md +++ b/docs/guide/monitoring.md @@ -31,7 +31,7 @@ ```yaml logging: level: "DEBUG" # 查看详细的模型映射和路由决策 - file: "coding-proxy.log" # 输出到文件 + file: ".logs/coding-proxy.log" # 输出到文件 max_bytes: 5242880 # 单文件 5 MB,触发轮转 backup_count: 5 # 保留 5 个 gzip 压缩备份 ``` diff --git a/docs/issue.md b/docs/issue.md deleted file mode 100644 index c8f9765..0000000 --- a/docs/issue.md +++ /dev/null @@ -1,47 +0,0 @@ -# Issue 处理档案 - -> 维护已处理过的 Issue 摘要(问题描述、表因根因、处理方式、后续防范、同类问题影响与处理注意事项),便于同类问题的跨上下文处理。识别相同 Issue 时应在原条目追加复盘,避免同 Issue 多处维护。 - ---- - -## streaming usage parse failed: 'NoneType' object has no attribute 'get' - -**问题描述** - -OpenAI 兼容 SSE 流式响应过程中,单次请求日志反复刷出数十条 WARNING: - -``` -WARNING streaming usage parse failed: 'NoneType' object has no attribute 'get' -``` - -警告本身被上层 `try/except` 吞掉不影响主链路,但日志噪声严重,且每帧都丢失了 usage 累加。 - -**表因** - -`StreamingUsageAccumulator.feed` 调用 `parse_usage_from_chunk` 解析 SSE chunk 时抛出 `AttributeError`。 - -**根因** - -`src/coding/proxy/routing/usage_parser.py::parse_usage_from_chunk` 中 Anthropic message_start 与 Anthropic message_delta / OpenAI 两条分支都使用了脆弱的判空模式: - -```python -if "usage" in data: # 仅判断 key 存在 - u = data["usage"] # 但值可能是 null - u.get("output_tokens", 0) # AttributeError -``` - -部分上游(含某些 OpenAI 兼容供应商)在中间 chunk 显式发送 `"usage": null` 占位帧,`in` 检查通过但取出的是 
`None`。 - -**处理方式** - -将两处 guard 统一改为 `u = container.get("usage"); if isinstance(u, dict):`,既排除缺省也排除 null,并顺手移除内部冗余的 `if isinstance(u, dict):` 包装层(已被外层 guard 覆盖)。同时新增三个回归用例覆盖 `data.usage = null` / `message.usage = null` / null 帧后跟有效帧三种场景。 - -**后续防范** - -- 解析外部 SSE / JSON 结构时, 不要单独使用 `if key in data` 作为安全 guard, 应统一采用 `value = data.get(key); if isinstance(value, dict):` 的双重保护, 同时排除缺省与显式 null。 -- 对 try/except 包裹的 WARNING 路径要保持警觉: 异常被吞不代表无害,重复刷屏的同类警告往往暗示防御性 guard 过窄,需要回溯至根因修复,而非依赖 except 兜底。 - -**同类问题影响与处理注意事项** - -- 本仓库内 `parse_usage_from_chunk` 的 Gemini `usageMetadata` 分支 (line ~219) 已经使用 `isinstance(um, dict)` 防御, 不受影响, 可作为参考实现。 -- 检查其他解析器 (如 routing / vendor adapter 层) 是否还有 `if "key" in data: v = data["key"]; v.get(...)` 这种模式, 必要时同步加固。 diff --git a/docs/ci-cd.md b/docs/ops/ci-cd.md similarity index 98% rename from docs/ci-cd.md rename to docs/ops/ci-cd.md index 6b35b38..65d0464 100644 --- a/docs/ci-cd.md +++ b/docs/ops/ci-cd.md @@ -211,7 +211,7 @@ CI 流水线中使用的工具及其版本均与项目实际配置严格对齐 | 工具 | 版本 / 引用 | 来源 (Action) | 与项目配置的对齐关系 | | -------------- | ----------------------------------- | ---------------------------------------- | -------------------------------------------------------------------------- | -| Python | `["3.12", "3.13", "3.14"]` (matrix) | `actions/setup-python@v5` | 对齐 [`pyproject.toml`](../pyproject.toml) 中 `requires-python = ">=3.12"` | +| Python | `["3.12", "3.13", "3.14"]` (matrix) | `actions/setup-python@v5` | 对齐 [`pyproject.toml`](../../pyproject.toml) 中 `requires-python = ">=3.12"` | | uv | latest (v4) | `astral-sh/setup-uv@v4` | 项目强制包管理器(见 AGENTS.md 包管理规范) | | build | latest | `uv pip install --system build` | PEP 517 构建前端,后端为 hatchling | | twine | latest | `uv pip install --system twine` | 包元数据校验与上传工具 | @@ -435,7 +435,7 @@ flowchart TD ### 4.1 promote.yml 工作流架构 -[`promote.yml`](../.github/workflows/promote.yml) 由两个 Job 组成,形成 **Validate → Promote** 的串行管线: +[`promote.yml`](../../.github/workflows/promote.yml) 由两个 Job 组成,形成 **Validate → Promote** 的串行管线: 
#### Job 1:validate(验证门控) @@ -629,7 +629,7 @@ flowchart TD | 问题现象 | 可能原因 | 排查步骤 | 解决方案 | | ------------------------------- | ------------------------------------------------------ | ----------------------------------------------------- | --------------------------------------------------------- | | `release.yml` 未触发 | Release 创建时未触发 `published` 事件(如 Draft 状态) | 检查 Actions 页面是否有该 workflow run | 确保 Release 为非 Draft 状态;或重新发布 | -| `build` Job 失败 | `twine check` 报错(包元数据不合规) | 查看 build Job 日志中的 twine 输出 | 修复 [`pyproject.toml`](../pyproject.toml) 中的元数据字段 | +| `build` Job 失败 | `twine check` 报错(包元数据不合规) | 查看 build Job 日志中的 twine 输出 | 修复 [`pyproject.toml`](../../pyproject.toml) 中的元数据字段 | | publish 失败 (HTTP 400) | 包名或版本号冲突(目标仓库已有同版本) | 查看 verbose 日志中的响应体(已启用 `verbose: true`) | 检查 TestPyPI/PyPI 是否已有同版本;使用递增版本号 | | publish 失败 (HTTP 403) | 认证失败(Token 无效或缺失) | 检查 Job 日志中的认证错误详情 | 验证 Secret 配置或 Trusted Publisher 设置(参见 §7.2) | | `promote.yml` validate 失败 | Target release 不是 prerelease(已是 stable) | 查看 validate Job 错误信息 | 确认输入的 `tag_name` 对应的是 prerelease release | @@ -701,7 +701,7 @@ CI 流水线中的工具版本选择并非随意,每一项都与项目配置 | CI 配置 | 项目配置 | 对齐关系 | | ------------------------------------------------ | --------------------------------------------------------------------- | --------------------------------------------------------------------------- | -| `python-version: "${{ matrix.python-version }}"` | `requires-python = ">=3.12"` in [`pyproject.toml`](../pyproject.toml) | CI 构建环境必须满足项目的最低 Python 版本要求(matrix: 3.12 / 3.13 / 3.14) | +| `python-version: "${{ matrix.python-version }}"` | `requires-python = ">=3.12"` in [`pyproject.toml`](../../pyproject.toml) | CI 构建环境必须满足项目的最低 Python 版本要求(matrix: 3.12 / 3.13 / 3.14) | | `hatchling.build` (build-backend) | `[build-system] requires = ["hatchling"]` | 构建后端声明必须一致 | | `uv pip install --system` | AGENTS.md 强制使用 `uv` | GitHub Actions Runner 默认无激活的 virtualenv,需 `--system` 标志 | | `retention-days: 14` | — | Artifact 保留两周,覆盖正常的验证窗口期(通常 1-3 天) | @@ -714,7 +714,7 
@@ CI 流水线中的工具版本选择并非随意,每一项都与项目配置 ### 8.1 release.yml 结构索引 -[`.github/workflows/release.yml`](../.github/workflows/release.yml) 文件结构一览: +[`.github/workflows/release.yml`](../../.github/workflows/release.yml) 文件结构一览: | 行范围 | 区块 | 内容摘要 | | ------- | ----------------------- | ---------------------------------------------------------------------------------------------------- | @@ -729,7 +729,7 @@ CI 流水线中的工具版本选择并非随意,每一项都与项目配置 ### 8.2 promote.yml 结构索引 -[`.github/workflows/promote.yml`](../.github/workflows/promote.yml) 文件结构一览: +[`.github/workflows/promote.yml`](../../.github/workflows/promote.yml) 文件结构一览: | 行范围 | 区块 | 内容摘要 | | ------ | --------------- | -------------------------------------------------------------- | diff --git a/docs/user-guide.md b/docs/user-guide.md index 81bbba1..f9ecad8 100644 --- a/docs/user-guide.md +++ b/docs/user-guide.md @@ -202,7 +202,7 @@ database: logging: level: "INFO" # DEBUG / INFO / WARNING / ERROR - # file: "coding-proxy.log" # 输出到文件 + # file: ".logs/coding-proxy.log" # 输出到文件 # max_bytes: 5242880 # 单文件 5 MB # backup_count: 5 # 保留 5 个备份 ``` diff --git a/docs/zh-CN/README.md b/docs/zh-CN/README.md index 658e27f..4b32986 100644 --- a/docs/zh-CN/README.md +++ b/docs/zh-CN/README.md @@ -30,7 +30,7 @@ ## 🌟 核心特性 (Core Features)
- +
- **⛓️ N-tier 链式故障转移 (Failover)**:自主降序序列,支持 Claude 官方 Plans,以及 GitHub Copilot、Google Antigravity、智谱、MiniMax、阿里千问、小米、Kimi、豆包等的 Coding Plan。 diff --git a/pyproject.toml b/pyproject.toml index 24630e1..14dcba1 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "coding-proxy" -version = "0.4.0" +version = "0.5.0" description = "A High-Availability, Transparent, and Smart Multi-Vendor Proxy for Claude Code. Support Claude Plans, GitHub Copilot, Google Antigravity, ZAI/GLM, MiniMax, Qwen, Xiaomi, Kimi, Doubao..." readme = "README.md" requires-python = ">=3.12" @@ -84,7 +84,10 @@ docstring-code-format = true [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] -addopts = "-v --tb=short" +addopts = "-v --tb=short -m 'not e2e'" +markers = [ + "e2e: marks tests as end-to-end (deselect with '-m \"not e2e\"')", +] filterwarnings = [ "ignore::DeprecationWarning", ] diff --git a/src/coding/proxy/cli/__init__.py b/src/coding/proxy/cli/__init__.py index 3b479fb..b51f089 100644 --- a/src/coding/proxy/cli/__init__.py +++ b/src/coding/proxy/cli/__init__.py @@ -109,7 +109,7 @@ def start( print_banner(console, host=cfg.server.host, port=cfg.server.port) # 解析文件日志路径:未显式配置时使用默认值 - _file_path: str | None = cfg.logging.file or "coding-proxy.log" + _file_path: str | None = cfg.logging.file or ".logs/coding-proxy.log" uvicorn.run( fastapi_app, host=cfg.server.host, diff --git a/src/coding/proxy/config/config.default.yaml b/src/coding/proxy/config/config.default.yaml index 40808fd..d945125 100644 --- a/src/coding/proxy/config/config.default.yaml +++ b/src/coding/proxy/config/config.default.yaml @@ -8,7 +8,7 @@ server: logging: level: "INFO" - # file: "coding-proxy.log" # 文件日志路径;设为 null 或空字符串禁用 + # file: ".logs/coding-proxy.log" # 文件日志路径;设为 null 或空字符串禁用 # max_bytes: 5242880 # 单文件上限(5 MB),触发轮转 # backup_count: 5 # 保留 gzip 压缩备份文件数 @@ -119,6 +119,14 @@ vendors: window_hours: 24.0 threshold_percent: 95.0 probe_interval_seconds: 300 + # 每模型并发限制:默认 3 
个并行请求;超出则按 FIFO 排队等待 + # 可通过 models 字段覆盖单个模型的限制(如 glm-5.1: 5) + concurrency: + default: 3 + # models: + # glm-5v-turbo: 3 + # glm-5.1: 3 + # glm-4.5-air: 3 # Vendor 4: MiniMax(默认禁用,需手动启用并添加到 tiers) - vendor: minimax diff --git a/src/coding/proxy/config/routing.py b/src/coding/proxy/config/routing.py index 3326a0b..2c29363 100644 --- a/src/coding/proxy/config/routing.py +++ b/src/coding/proxy/config/routing.py @@ -9,6 +9,7 @@ from pydantic import BaseModel, BeforeValidator, Field, PrivateAttr, model_validator from .resiliency import CircuitBreakerConfig, QuotaGuardConfig, RetryConfig +from .vendors import ZhipuConcurrencyConfig # ── 价格字段解析($ / ¥ 前缀支持) ────────────────────────── @@ -64,13 +65,13 @@ def _detect_currency(v: Any) -> str | None: "api_key", } ) -# 向后兼容别名 -_ZHIPU_FIELDS = _NATIVE_ANTHROPIC_FIELDS +# Zhipu 独占字段:在通用 api_key 基础上增加每模型并发限制 +_ZHIPU_FIELDS: frozenset[str] = _NATIVE_ANTHROPIC_FIELDS | frozenset({"concurrency"}) _VENDOR_EXCLUSIVE_FIELDS: dict[str, frozenset[str]] = { "copilot": _COPILOT_FIELDS, "antigravity": _ANTIGRAVITY_FIELDS, - "zhipu": _NATIVE_ANTHROPIC_FIELDS, + "zhipu": _ZHIPU_FIELDS, "minimax": _NATIVE_ANTHROPIC_FIELDS, "kimi": _NATIVE_ANTHROPIC_FIELDS, "doubao": _NATIVE_ANTHROPIC_FIELDS, @@ -285,6 +286,12 @@ class VendorConfig(BaseModel): quota_guard: QuotaGuardConfig = Field(default_factory=QuotaGuardConfig) weekly_quota_guard: QuotaGuardConfig = Field(default_factory=QuotaGuardConfig) + # ── Zhipu 专属:每模型并发限制 ─────────────────────────── + concurrency: ZhipuConcurrencyConfig | None = Field( + default=None, + description="[zhipu] 每模型并发限制;None 表示不限并发", + ) + @model_validator(mode="after") def _warn_irrelevant_fields(self) -> VendorConfig: """对非当前 vendor 类型的非空专属字段发出 warning.""" diff --git a/src/coding/proxy/config/schema.py b/src/coding/proxy/config/schema.py index ee21ee7..40e5428 100644 --- a/src/coding/proxy/config/schema.py +++ b/src/coding/proxy/config/schema.py @@ -54,6 +54,7 @@ KimiConfig, MinimaxConfig, XiaomiConfig, + 
ZhipuConcurrencyConfig, ZhipuConfig, ) @@ -318,6 +319,7 @@ def compat_state_path(self) -> Path: "CopilotConfig", "AntigravityConfig", "ZhipuConfig", + "ZhipuConcurrencyConfig", # resiliency "CircuitBreakerConfig", "RetryConfig", diff --git a/src/coding/proxy/config/server.py b/src/coding/proxy/config/server.py index 7d67207..6fa3e8f 100644 --- a/src/coding/proxy/config/server.py +++ b/src/coding/proxy/config/server.py @@ -21,7 +21,7 @@ class LoggingConfig(BaseModel): Attributes: level: 控制台日志级别(INFO / WARNING / DEBUG 等)。 - file: 文件日志路径。为 ``None`` 时使用默认值 ``coding-proxy.log``; + file: 文件日志路径。为 ``None`` 时使用默认值 ``.logs/coding-proxy.log``; 设为空字符串可禁用文件日志。 max_bytes: 单个日志文件最大字节数(触发轮转)。默认 5 MB。 backup_count: 保留的已压缩备份文件数。默认 5。 diff --git a/src/coding/proxy/config/vendors.py b/src/coding/proxy/config/vendors.py index 4f15531..a1c0280 100644 --- a/src/coding/proxy/config/vendors.py +++ b/src/coding/proxy/config/vendors.py @@ -2,7 +2,21 @@ from __future__ import annotations -from pydantic import BaseModel +from pydantic import BaseModel, Field + + +class ZhipuConcurrencyConfig(BaseModel): + """Zhipu 每模型并发限制配置.""" + + default: int = Field(default=3, ge=1, le=20, description="全局默认并行度") + models: dict[str, int] = Field( + default_factory=dict, + description="按映射后模型名自定义并行度(覆盖 default)", + ) + + def get_limit(self, model: str) -> int: + """获取指定模型的并行度限制.""" + return self.models.get(model, self.default) class AnthropicConfig(BaseModel): @@ -48,6 +62,7 @@ class ZhipuConfig(BaseModel): base_url: str = "https://open.bigmodel.cn/api/anthropic" api_key: str = "" timeout_ms: int = 3000000 + concurrency: ZhipuConcurrencyConfig = Field(default_factory=ZhipuConcurrencyConfig) class MinimaxConfig(BaseModel): @@ -100,6 +115,7 @@ class AlibabaConfig(BaseModel): "CopilotConfig", "AntigravityConfig", "ZhipuConfig", + "ZhipuConcurrencyConfig", "MinimaxConfig", "KimiConfig", "DoubaoConfig", diff --git a/src/coding/proxy/convert/vendor_channels.py b/src/coding/proxy/convert/vendor_channels.py index 
bec46f7..456a9b3 100644 --- a/src/coding/proxy/convert/vendor_channels.py +++ b/src/coding/proxy/convert/vendor_channels.py @@ -219,9 +219,114 @@ def enforce_anthropic_tool_pairing( ", ".join(synthesized_ids), ) + # 纵深防御: sanity 兜底,捕获主循环未覆盖的边角配对漏洞 + adaptations.extend(_enforce_pairing_sanity_pass(messages_list)) + return adaptations +def _enforce_pairing_sanity_pass( + messages_list: list[dict[str, Any]], +) -> list[str]: + """``enforce_anthropic_tool_pairing`` 主循环之后的纯检测兜底 helper. + + 职责正交于主循环(不剥离 tool_result、不插入新 user 消息),仅做两件事: + + 1. 遍历每个 ``role == "assistant"`` 且包含 ``tool_use`` 块的消息, + 检查 ``messages[i+1]`` 是否为 ``user`` 且包含所有 ``tool_use.id`` 对应 + ``tool_result.tool_use_id``。 + 2. 缺失项在该 user 消息末尾追加 ``is_error=True`` 占位块;如果 next 消息根本 + 不是 user(主循环未触达此分支的退化场景),同样不做插入,仅记录 WARNING + 供运维定位 —— 该路径正常情况下永不命中(主循环已保证 next user 存在)。 + + 本 helper 单独抽出的目的有两个: + + - 直接构造"绕过主循环"的输入做单元测试,确保 sanity 分支具备**正向回归保护** + (历史教训: ``9061cd0`` 引入两遍扫描+sanity 后被 ``2bac9a7`` 连带回滚, + 重要原因之一是缺乏对兜底路径的独立单测)。 + - 在主循环 A-F 步骤未来重构时,sanity 仍能稳定守住 Anthropic 配对约束。 + + Args: + messages_list: 消息列表(就地修改)。 + + Returns: + 新增的 adaptation 标签列表(命中则为 ``["pairing_sanity_repaired"]``,否则空列表)。 + """ + repaired: list[tuple[int, str]] = [] + + for i, msg in enumerate(messages_list): + if not isinstance(msg, dict) or msg.get("role") != "assistant": + continue + content = msg.get("content") + if not isinstance(content, list): + continue + tool_use_ids = [ + b["id"] + for b in content + if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id") + ] + if not tool_use_ids: + continue + + next_idx = i + 1 + if ( + next_idx >= len(messages_list) + or not isinstance(messages_list[next_idx], dict) + or messages_list[next_idx].get("role") != "user" + ): + # 主循环正常情况下已保证 next 为 user;此处仅日志告警,不做隐式插入 + # 以避免与主循环职责重叠。 + logger.warning( + "Sanity pass: assistant at messages[%d] has tool_use without " + "user next message (tool_use_ids=%s). 
Main enforce loop may have a regression.", + i, + ", ".join(tool_use_ids), + ) + continue + + user_msg = messages_list[next_idx] + user_content = user_msg.get("content") + if not isinstance(user_content, list): + # 主循环 D 步已将 string content 归一化为 list;这里防御性兜底 + user_msg["content"] = ( + [{"type": "text", "text": user_content}] + if isinstance(user_content, str) + else [] + ) + user_content = user_msg["content"] + + existing_result_ids = { + b["tool_use_id"] + for b in user_content + if isinstance(b, dict) + and b.get("type") == "tool_result" + and b.get("tool_use_id") + } + for uid in tool_use_ids: + if uid in existing_result_ids: + continue + user_content.append( + { + "type": "tool_result", + "tool_use_id": uid, + "content": "", + "is_error": True, + } + ) + repaired.append((i, uid)) + + if not repaired: + return [] + + logger.warning( + "Sanity pass repaired %d unpaired tool_use(s) missed by main enforce loop. " + "Affected: %s", + len(repaired), + ", ".join(f"messages[{idx}]:{uid}" for idx, uid in repaired), + ) + return ["pairing_sanity_repaired"] + + def _strip_cache_control(body: dict[str, Any]) -> int: """从 system/messages/tools 中移除 cache_control 字段(就地). @@ -262,6 +367,59 @@ def _strip_cache_control(body: dict[str, Any]) -> int: return removed +# ── zhipu 共享清洗函数 ────────────────────────────────────────── + +# 跨供应商转换时主动剥离的顶层参数。 +# 首选 tier 场景的 thinking.type=adaptive 兼容转换由 +# ZhipuVendor._prepare_request 处理(转换为 enabled + budget,保留功能), +# 此处仅负责 failover 路径的全量剥离(跨供应商 thinking signature 失效)。 +_ZHIPU_UNSUPPORTED_PARAMS: frozenset[str] = frozenset( + {"thinking", "extended_thinking", "reasoning_effort"} +) + + +def normalize_for_zhipu(body: dict[str, Any]) -> tuple[dict[str, Any], list[str]]: + """为 zhipu GLM 的 Anthropic 兼容端点清洗请求体(就地,不 deep copy). + + 为跨供应商转换通道 ``prepare_copilot_to_zhipu`` 提供请求体清洗。 + + 清洗内容: + 1. 剥离 cache_control 字段(GLM 静默忽略,主动剥离以减少噪音) + 2. 
移除顶层 thinking/extended_thinking/reasoning_effort 参数(GLM 原生支持 + thinking、静默忽略 reasoning_effort,但跨供应商场景下这些参数来自原供应商 + 的协议语义,主动剥离以确保请求语义一致性) + 3. 强制 tool_use/tool_result 配对约束 + + 不包含 thinking blocks 剥离:跨供应商场景下 history 中的 thinking blocks + 来自原供应商(签名失效),由调用方在调用本函数之前通过 + ``strip_thinking_blocks`` 单独处理。 + + 所有操作均为幂等,安全地在已清洗的请求体上重复调用。 + + Returns: + (body, adaptations) — body 为就地修改后的同一引用,adaptations 为变换描述列表。 + """ + adaptations: list[str] = [] + + # Step 1: 剥离 cache_control + removed_cc = _strip_cache_control(body) + if removed_cc: + adaptations.append(f"removed_{removed_cc}_cache_control_fields") + + # Step 2: 移除不支持的顶层参数 + for param in _ZHIPU_UNSUPPORTED_PARAMS: + if param in body: + del body[param] + adaptations.append(f"removed_{param}_param") + + # Step 3: 强制 tool_use/tool_result 配对 + pairing_fixes = enforce_anthropic_tool_pairing(body.get("messages", [])) + if pairing_fixes: + adaptations.extend(pairing_fixes) + + return body, adaptations + + def _remove_vendor_blocks(body: dict[str, Any], block_types: set[str]) -> int: """从 messages[].content[] 中就地移除指定 type 的内容块. 
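For illustration, the top-level parameter stripping that `normalize_for_zhipu` performs in Step 2 can be sketched in isolation. This is a minimal sketch under stated assumptions, not the project's implementation: `strip_unsupported_params` and `UNSUPPORTED_PARAMS` are hypothetical names, and only the three parameter names and the `removed_{param}_param` label format are taken from the diff above. It shows why the step is idempotent: a second call on an already-cleaned body reports no adaptations.

```python
from typing import Any

# Parameter names copied from the diff; the constant name itself is hypothetical.
UNSUPPORTED_PARAMS: frozenset[str] = frozenset(
    {"thinking", "extended_thinking", "reasoning_effort"}
)


def strip_unsupported_params(body: dict[str, Any]) -> list[str]:
    """Remove unsupported top-level params in place and return adaptation labels."""
    adaptations: list[str] = []
    # Sorted iteration keeps the label order deterministic (a frozenset has no order).
    for param in sorted(UNSUPPORTED_PARAMS):
        if param in body:
            del body[param]
            adaptations.append(f"removed_{param}_param")
    return adaptations


request = {
    "model": "glm-5.1",
    "thinking": {"type": "adaptive"},
    "reasoning_effort": "high",
}
first = strip_unsupported_params(request)
second = strip_unsupported_params(request)  # body is already clean, nothing to report
```

Sorting is a choice made in this sketch for reproducible labels; the function in the diff iterates the frozenset directly, so its label order is not guaranteed.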
@@ -294,8 +452,22 @@ def _rewrite_srvtoolu_ids(body: dict[str, Any]) -> tuple[int, dict[str, str]]: Anthropic API 要求 tool_use 类型与 ``toolu_*`` 格式的 ID。Zhipu 的 ``server_tool_use`` + ``srvtoolu_*`` 在上游 Anthropic 兼容端点可用,但无法 - 透传至其他供应商;同时还需重写紧随其后 user 消息中 ``tool_result.tool_use_id`` - 引用,保持配对关系。 + 透传至其他供应商;同时还需重写所有 ``tool_result.tool_use_id`` 引用,保持配对关系。 + + **两遍扫描(消除块顺序敏感性)**: + + - Pass 1: 仅遍历 ``role == "assistant"`` 的消息,按 assistant 出现顺序为每个 + 待改写的 tool_use 分配 ``toolu_normalized_N`` 新 ID,建立完整 ``id_map``。 + - Pass 2: 全量遍历消息,对任意 ``tool_result.tool_use_id ∈ id_map`` 的块 + 原地改写为新 ID(不分 user / assistant,覆盖 misplaced 与跨消息边界场景)。 + + 单遍方案在 GLM-5 偶发将 inline ``tool_result`` 输出在对应 ``server_tool_use`` + 之前的乱序场景下,会因 Case B 时 ``id_map`` 尚未填入而漏改 ``tool_use_id``, + 导致 ``enforce_anthropic_tool_pairing`` 后 ``extracted_tool_results`` 的 key + 与 ``tool_use_ids`` 不一致,进而把本应配对的 misplaced tool_result 默默丢弃, + 最终触发 Anthropic ``messages.x: tool_use ids were found without tool_result + blocks immediately after`` 400 错误。两遍扫描以"先建表、后改写"的次序消除该 + 时序耦合。 Returns: (rewritten_count, id_map) — 重写次数与 {原 ID: 新 ID} 映射。 @@ -308,45 +480,56 @@ def next_id() -> str: counter += 1 return f"toolu_normalized_{counter}" + # Pass 1: 扫描 assistant 消息,改写 tool_use / server_tool_use 的 id 与 type, + # 按出现顺序填充 id_map(保持与单遍版本相同的序号分配,避免破坏既有断言)。 for message in body.get("messages", []): - if not isinstance(message, dict): + if not isinstance(message, dict) or message.get("role") != "assistant": continue content = message.get("content") if not isinstance(content, list): continue - role = message.get("role") for block in content: if not isinstance(block, dict): continue block_type = block.get("type") + if block_type not in {"tool_use", "server_tool_use"}: + continue block_id = block.get("id") - - # Case A: assistant 消息里的 server_tool_use / srvtoolu_* → 改写 - if role == "assistant" and block_type in {"tool_use", "server_tool_use"}: - if isinstance(block_id, str) and _ANTHROPIC_SERVER_TOOL_USE_ID_RE.match( - block_id - ): - new_id = 
next_id() - id_map[block_id] = new_id - block["id"] = new_id - block["type"] = "tool_use" - elif ( - isinstance(block_id, str) - and block_id - and not _ANTHROPIC_TOOL_USE_ID_RE.match(block_id) - and block.get("name") - ): - # 非标准 ID(非 toolu_ / srvtoolu_),且具备 name 可改写 - new_id = next_id() - id_map[block_id] = new_id - block["id"] = new_id - block["type"] = "tool_use" - elif block_type == "server_tool_use" and isinstance(block_id, str): - # 兜底: 类型是 server_tool_use 但 ID 已是标准 toolu_ 形式,仅纠正类型 - block["type"] = "tool_use" - - # Case B: user 消息里的 tool_result.tool_use_id 同步重写 - if block_type == "tool_result": + if isinstance(block_id, str) and _ANTHROPIC_SERVER_TOOL_USE_ID_RE.match( + block_id + ): + new_id = next_id() + id_map[block_id] = new_id + block["id"] = new_id + block["type"] = "tool_use" + elif ( + isinstance(block_id, str) + and block_id + and not _ANTHROPIC_TOOL_USE_ID_RE.match(block_id) + and block.get("name") + ): + # 非标准 ID(非 toolu_ / srvtoolu_),且具备 name 可改写 + new_id = next_id() + id_map[block_id] = new_id + block["id"] = new_id + block["type"] = "tool_use" + elif block_type == "server_tool_use" and isinstance(block_id, str): + # 兜底: 类型是 server_tool_use 但 ID 已是标准 toolu_ 形式,仅纠正类型 + block["type"] = "tool_use" + + # Pass 2: 全量扫描,对任意 tool_result.tool_use_id 命中 id_map 的块同步改写。 + if id_map: + for message in body.get("messages", []): + if not isinstance(message, dict): + continue + content = message.get("content") + if not isinstance(content, list): + continue + for block in content: + if not isinstance(block, dict): + continue + if block.get("type") != "tool_result": + continue tool_use_id = block.get("tool_use_id") if isinstance(tool_use_id, str) and tool_use_id in id_map: block["tool_use_id"] = id_map[tool_use_id] @@ -414,26 +597,14 @@ def prepare_copilot_to_zhipu( prepared = copy.deepcopy(body) adaptations: list[str] = [] - # Step 1: 剥离 thinking/redacted_thinking 块 + # Step 1: 剥离 thinking/redacted_thinking 块(跨供应商签名失效) stripped = strip_thinking_blocks(prepared) 
if stripped: adaptations.append(f"stripped_{stripped}_thinking_blocks") - # Step 2: 移除 cache_control 字段 - removed_cc = _strip_cache_control(prepared) - if removed_cc: - adaptations.append(f"removed_{removed_cc}_cache_control_fields") - - # Step 3: 移除顶层 thinking/extended_thinking 参数(GLM-5 不支持) - for param in ("thinking", "extended_thinking"): - if param in prepared: - del prepared[param] - adaptations.append(f"removed_{param}_param") - - # Step 4: 强制 tool_use/tool_result 配对 - pairing_fixes = enforce_anthropic_tool_pairing(prepared.get("messages", [])) - if pairing_fixes: - adaptations.extend(pairing_fixes) + # Step 2: 共享清洗(cache_control、不支持的顶层参数、tool pairing) + _, norm_adaptations = normalize_for_zhipu(prepared) + adaptations.extend(norm_adaptations) return prepared, adaptations diff --git a/src/coding/proxy/logging/db.py b/src/coding/proxy/logging/db.py index ffe9b2c..8470966 100644 --- a/src/coding/proxy/logging/db.py +++ b/src/coding/proxy/logging/db.py @@ -190,6 +190,14 @@ def _local_month_udf(ts_str: str) -> str: ); """ +_CREATE_SESSION_META = """ +CREATE TABLE IF NOT EXISTS session_meta ( + session_key TEXT PRIMARY KEY, + title TEXT NOT NULL DEFAULT '', + created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')) +); +""" + _CREATE_INDEXES = """ CREATE INDEX IF NOT EXISTS idx_usage_ts ON usage_log(ts); CREATE INDEX IF NOT EXISTS idx_usage_vendor ON usage_log(vendor); @@ -245,6 +253,7 @@ async def init(self) -> None: self._db.row_factory = aiosqlite.Row await self._db.execute("PRAGMA journal_mode=WAL") await self._db.executescript(_CREATE_TABLES) + await self._db.executescript(_CREATE_SESSION_META) # 迁移必须在建索引之前执行,确保 vendor 列已存在 await self._migrate_rename_backend_to_vendor() await self._migrate_add_failover_from() @@ -316,6 +325,28 @@ async def _migrate_rename_backend_to_vendor(self) -> None: "Migration: renamed 'backend' column to 'vendor' in %s", table ) + async def set_session_title(self, session_key: str, title: str) -> None: + """为新 session 
设置标题(幂等,仅首次写入).""" + if not self._db or not title or not session_key: + return + await self._db.execute( + "INSERT OR IGNORE INTO session_meta (session_key, title) VALUES (?, ?)", + (session_key, title), + ) + await self._db.commit() + + async def get_session_titles(self, session_keys: list[str]) -> dict[str, str]: + """批量查询 session 标题.""" + if not self._db or not session_keys: + return {} + placeholders = ",".join("?" for _ in session_keys) + cursor = await self._db.execute( + f"SELECT session_key, title FROM session_meta WHERE session_key IN ({placeholders})", + session_keys, + ) + rows = await cursor.fetchall() + return {row["session_key"]: row["title"] for row in rows} + async def log( self, vendor: str, @@ -604,7 +635,8 @@ async def query_recent_sessions( MIN(ts) AS first_seen_ts, MAX(ts) AS last_active_ts, COUNT(*) AS total_requests, - SUM(input_tokens + output_tokens) AS total_tokens, + SUM(input_tokens + output_tokens + + cache_creation_tokens + cache_read_tokens) AS total_tokens, SUM(input_tokens) AS total_input, SUM(output_tokens) AS total_output, GROUP_CONCAT(DISTINCT model_served) AS models, @@ -620,7 +652,13 @@ async def query_recent_sessions( (cutoff_iso, limit), ) rows = await cursor.fetchall() - return [dict(row) for row in rows] + sessions = [dict(row) for row in rows] + if sessions: + keys = [s["session_key"] for s in sessions] + titles = await self.get_session_titles(keys) + for s in sessions: + s["title"] = titles.get(s["session_key"], "") + return sessions async def query_session_profile(self, session_key: str) -> dict | None: """查询单个会话的完整聚合数据.""" @@ -631,7 +669,8 @@ async def query_session_profile(self, session_key: str) -> dict | None: MIN(ts) AS first_seen_ts, MAX(ts) AS last_active_ts, COUNT(*) AS total_requests, - SUM(input_tokens + output_tokens) AS total_tokens, + SUM(input_tokens + output_tokens + + cache_creation_tokens + cache_read_tokens) AS total_tokens, SUM(input_tokens) AS total_input, SUM(output_tokens) AS total_output, 
GROUP_CONCAT(DISTINCT model_served) AS models, diff --git a/src/coding/proxy/native_api/handler.py b/src/coding/proxy/native_api/handler.py index 790c5f2..ab7b344 100644 --- a/src/coding/proxy/native_api/handler.py +++ b/src/coding/proxy/native_api/handler.py @@ -13,11 +13,14 @@ from __future__ import annotations +import asyncio import json import logging +import re import time from collections.abc import AsyncIterator from typing import TYPE_CHECKING +from urllib.parse import unquote import httpx @@ -172,8 +175,16 @@ async def dispatch( ) method = request.method.upper() - operation = OperationClassifier.classify(provider, method, rest_path) - endpoint = rest_path if rest_path.startswith("/") else f"/{rest_path}" + # 防御性 URL 解码:确保 %3A → : 以兼容 Gemini :verb 路径语法。 + # ASGI 规范要求 scope["path"] 已解码,但部分服务器/反向代理对 + # 合法路径字符(如冒号)可能保留编码形态。 + decoded_rest_path = unquote(rest_path) + operation = OperationClassifier.classify(provider, method, decoded_rest_path) + endpoint = ( + decoded_rest_path + if decoded_rest_path.startswith("/") + else f"/{decoded_rest_path}" + ) upstream_headers = _filter_request_headers(dict(request.headers)) # 强制 identity —— 阻止上游压缩(httpx 默认会自动补 gzip,deflate; @@ -185,6 +196,28 @@ async def dispatch( start_ts = time.perf_counter() client = self._get_client(provider) + # ── Gemini embedding Vertex AI 格式转换 ────────────────── + # 当上游非官方 Google AI Studio(generativelanguage.googleapis.com)时, + # litellm 发送的 Google AI Studio 格式(v1beta/models/{model}:batchEmbedContents) + # 需转换为 Vertex AI 格式(v1beta1/publishers/google/models/{model}:embedContent)。 + vertex_rewrite = ( + provider == "gemini" + and operation in ("embedding", "embedding.batch") + and cfg.base_url + and "generativelanguage.googleapis.com" not in cfg.base_url + ) + if vertex_rewrite: + return await self._dispatch_gemini_vertex_embedding( + client=client, + operation=operation, + endpoint=endpoint, + body_bytes=body_bytes, + upstream_headers=upstream_headers, + query_string=query_string, + 
provider=provider, + start_ts=start_ts, + ) + # 构造上游 URL(保留 query) upstream_url = endpoint if query_string: @@ -286,6 +319,313 @@ async def dispatch( media_type=content_type or None, ) + # ── Gemini embedding → Vertex AI 格式转换 ────────────────── + + # Google AI Studio 路径正则:[v1beta/]models/{model}:{verb} + # 版本段允许缺失以兼容 litellm `_check_custom_proxy` 丢失 v1beta 前缀的 bug。 + _GEMINI_EMBED_PATH_RE = re.compile( + r"^/?(?:v1(?:beta1?)?/)?models/(?P<model>[^/:]+)(?::|%3A)(?P<verb>embedContent|batchEmbedContents)/?$" + ) + + async def _dispatch_gemini_vertex_embedding( + self, + *, + client: httpx.AsyncClient, + operation: str, + endpoint: str, + body_bytes: bytes, + upstream_headers: dict[str, str], + query_string: str, + provider: str, + start_ts: float, + ) -> StarletteResponse: + """将 Google AI Studio 格式的 embedding 请求转换为 Vertex AI 格式. + + Google AI Studio: + POST v1beta/models/{model}:batchEmbedContents + Body: {"requests": [{"model": "models/{model}", "content": {...}}]} + + Vertex AI: + POST v1beta1/publishers/google/models/{model}:embedContent + Body: {"content": {...}} + """ + from fastapi.responses import Response as FastAPIResponse + + match = self._GEMINI_EMBED_PATH_RE.match(endpoint) + if not match: + return FastAPIResponse( + content=json.dumps( + { + "error": { + "message": f"unrecognized gemini embedding path: {endpoint}" + } + } + ).encode(), + status_code=400, + media_type="application/json", + ) + + model_name = match.group("model") + verb = match.group("verb") + + # 解析原始请求体 + try: + body = json.loads(body_bytes) if body_bytes else {} + except (json.JSONDecodeError, UnicodeDecodeError): + return FastAPIResponse( + content=json.dumps( + {"error": {"message": "invalid JSON body for embedding request"}} + ).encode(), + status_code=400, + media_type="application/json", + ) + + if verb == "batchEmbedContents": + return await self._vertex_batch_embed( + client=client, + model_name=model_name, + body=body, + upstream_headers=upstream_headers, + query_string=query_string, +
provider=provider, + operation=operation, + endpoint=endpoint, + start_ts=start_ts, + ) + + # 单次 embedContent:直接转换 + content = body.get("content", body) + return await self._vertex_single_embed( + client=client, + model_name=model_name, + content=content, + upstream_headers=upstream_headers, + query_string=query_string, + provider=provider, + operation=operation, + endpoint=endpoint, + start_ts=start_ts, + ) + + async def _vertex_single_embed( + self, + *, + client: httpx.AsyncClient, + model_name: str, + content: dict, + upstream_headers: dict[str, str], + query_string: str, + provider: str, + operation: str, + endpoint: str, + start_ts: float, + ) -> StarletteResponse: + """发送单次 Vertex AI embedContent 请求.""" + from fastapi.responses import Response as FastAPIResponse + + vertex_path = f"/v1beta1/publishers/google/models/{model_name}:embedContent" + vertex_url = vertex_path + if query_string: + vertex_url = f"{vertex_path}?{query_string}" + + vertex_body = json.dumps({"content": content}).encode() + + req = client.build_request( + method="POST", + url=vertex_url, + content=vertex_body, + headers=upstream_headers, + ) + + try: + upstream_resp = await client.send(req, stream=True) + except ( + httpx.TimeoutException, + httpx.ConnectError, + httpx.ReadError, + httpx.RemoteProtocolError, + ) as exc: + duration_ms = int((time.perf_counter() - start_ts) * 1000) + await self._record_failure( + provider=provider, + operation=operation, + endpoint=endpoint, + duration_ms=duration_ms, + reason=str(exc), + ) + return FastAPIResponse( + content=json.dumps( + { + "error": { + "message": f"upstream unreachable: {exc}", + "type": "api_error", + } + } + ).encode(), + status_code=502, + media_type="application/json", + ) + + try: + raw_body = await upstream_resp.aread() + finally: + await upstream_resp.aclose() + + duration_ms = int((time.perf_counter() - start_ts) * 1000) + status = upstream_resp.status_code + content_type = upstream_resp.headers.get("content-type", "").lower() + 
resp_headers = _filter_response_headers(dict(upstream_resp.headers)) + + # 用量抽取 + extraction = ExtractionResult() + if "application/json" in content_type and raw_body: + try: + parsed = json.loads(raw_body.decode("utf-8", errors="replace")) + if isinstance(parsed, dict): + extraction = extract_usage( + provider, operation, parsed, status, dict(upstream_resp.headers) + ) + except (json.JSONDecodeError, UnicodeDecodeError): + pass + + vendor_label = _VENDOR_LABEL[provider] + await self._record_usage( + provider=provider, + operation=operation, + endpoint=endpoint, + duration_ms=duration_ms, + status=status, + extraction=extraction, + evidence_records=_build_nonstream_evidence( + vendor=vendor_label, extraction=extraction + ), + ) + + return FastAPIResponse( + content=raw_body, + status_code=status, + headers=resp_headers, + media_type=content_type or None, + ) + + async def _vertex_batch_embed( + self, + *, + client: httpx.AsyncClient, + model_name: str, + body: dict, + upstream_headers: dict[str, str], + query_string: str, + provider: str, + operation: str, + endpoint: str, + start_ts: float, + ) -> StarletteResponse: + """将 batchEmbedContents 拆分为多次 embedContent 调用并聚合响应.""" + from fastapi.responses import Response as FastAPIResponse + + requests_list = body.get("requests", []) + if not requests_list: + return FastAPIResponse( + content=json.dumps( + { + "error": { + "message": "batchEmbedContents requires non-empty 'requests' field" + } + } + ).encode(), + status_code=400, + media_type="application/json", + ) + + vertex_path = f"/v1beta1/publishers/google/models/{model_name}:embedContent" + vertex_url = vertex_path + if query_string: + vertex_url = f"{vertex_path}?{query_string}" + + # 并发发送所有 embedContent 请求 + async def _single(req_body: dict) -> tuple[dict, int]: + content = req_body.get("content", req_body) + vertex_body = json.dumps({"content": content}).encode() + req = client.build_request( + method="POST", + url=vertex_url, + content=vertex_body, + 
headers=upstream_headers, + ) + try: + resp = await client.send(req, stream=False) + except ( + httpx.TimeoutException, + httpx.ConnectError, + httpx.ReadError, + httpx.RemoteProtocolError, + ) as exc: + return {"error": {"message": f"upstream unreachable: {exc}"}}, 502 + try: + return resp.json(), resp.status_code + except Exception: + return {"error": {"message": resp.text[:200]}}, resp.status_code + + results = await asyncio.gather(*[_single(r) for r in requests_list]) + + # 检查是否有失败的请求 + embeddings = [] + for resp_json, resp_status in results: + if resp_status != 200: + # 返回第一个错误 + return FastAPIResponse( + content=json.dumps(resp_json).encode(), + status_code=resp_status, + media_type="application/json", + ) + embedding_data = resp_json.get("embedding", {}) + embeddings.append(embedding_data) + + # 聚合为 batchEmbedContents 响应格式 + batch_response = {"embeddings": embeddings} + duration_ms = int((time.perf_counter() - start_ts) * 1000) + + # 用量抽取 + extraction = ExtractionResult() + for resp_json, _ in results: + if isinstance(resp_json, dict): + ext = extract_usage(provider, operation, resp_json, 200, {}) + extraction = ExtractionResult( + input_tokens=extraction.input_tokens + ext.input_tokens, + output_tokens=extraction.output_tokens + ext.output_tokens, + cache_creation_tokens=extraction.cache_creation_tokens + + ext.cache_creation_tokens, + cache_read_tokens=extraction.cache_read_tokens + + ext.cache_read_tokens, + request_id=ext.request_id or extraction.request_id, + model_served=ext.model_served or extraction.model_served, + raw_usage=ext.raw_usage or extraction.raw_usage, + source_field_map=ext.source_field_map + or extraction.source_field_map, + evidence_kind=ext.evidence_kind or extraction.evidence_kind, + extra_usage=ext.extra_usage or extraction.extra_usage, + ) + + vendor_label = _VENDOR_LABEL[provider] + await self._record_usage( + provider=provider, + operation=operation, + endpoint=endpoint, + duration_ms=duration_ms, + status=200, + 
extraction=extraction, + evidence_records=_build_nonstream_evidence( + vendor=vendor_label, extraction=extraction + ), + ) + + return FastAPIResponse( + content=json.dumps(batch_response).encode(), + status_code=200, + media_type="application/json", + ) + # ── SSE 流式转发(同时累加 usage) ───────────────────────── async def _stream_and_accumulate( diff --git a/src/coding/proxy/native_api/operation.py b/src/coding/proxy/native_api/operation.py index 12f3307..2080b6c 100644 --- a/src/coding/proxy/native_api/operation.py +++ b/src/coding/proxy/native_api/operation.py @@ -48,30 +48,34 @@ class _Rule: ) # ── Gemini ──────────────────────────────────────────────────────── -# Gemini 的方法动词作为路径后缀(``:generateContent``),通过正则提取 +# Gemini 的方法动词作为路径后缀(``:generateContent``),通过正则提取。 +# ``v1(?:beta1?)?/`` 前缀允许缺失,以兼容 litellm `_check_custom_proxy` 在 +# 自定义 ``api_base`` 场景下丢失版本段的 bug(参考 litellm issue #17759)。 _GEMINI_RULES: tuple[_Rule, ...] = ( _Rule( - re.compile(r"^/?v1(?:beta)?/models/[^/]+:streamGenerateContent/?$"), + re.compile( + r"^/?(?:v1(?:beta1?)?/)?models/[^/]+(?:%3A|:)streamGenerateContent/?$" + ), "generate_content", ), _Rule( - re.compile(r"^/?v1(?:beta)?/models/[^/]+:generateContent/?$"), + re.compile(r"^/?(?:v1(?:beta1?)?/)?models/[^/]+(?:%3A|:)generateContent/?$"), "generate_content", ), _Rule( - re.compile(r"^/?v1(?:beta)?/models/[^/]+:countTokens/?$"), + re.compile(r"^/?(?:v1(?:beta1?)?/)?models/[^/]+(?:%3A|:)countTokens/?$"), "count_tokens", ), _Rule( - re.compile(r"^/?v1(?:beta)?/models/[^/]+:embedContent/?$"), + re.compile(r"^/?(?:v1(?:beta1?)?/)?models/[^/]+(?:%3A|:)embedContent/?$"), "embedding", ), _Rule( - re.compile(r"^/?v1(?:beta)?/models/[^/]+:batchEmbedContents/?$"), + re.compile(r"^/?(?:v1(?:beta1?)?/)?models/[^/]+(?:%3A|:)batchEmbedContents/?$"), "embedding.batch", ), _Rule( - re.compile(r"^/?v1(?:beta)?/models/[^/]+:predict/?$"), + re.compile(r"^/?(?:v1(?:beta1?)?/)?models/[^/]+(?:%3A|:)predict/?$"), "predict", ), _Rule( @@ -159,7 +163,8 @@ def 
is_stream_path(provider: str, path: str) -> bool: normalized = path if path.startswith("/") else f"/{path}" return bool( re.match( - r"^/?v1(?:beta)?/models/[^/]+:streamGenerateContent/?$", normalized + r"^/?v1(?:beta)?/models/[^/]+(?:%3A|:)streamGenerateContent/?$", + normalized, ) ) diff --git a/src/coding/proxy/routing/executor.py b/src/coding/proxy/routing/executor.py index 9d33ca9..4c37f02 100644 --- a/src/coding/proxy/routing/executor.py +++ b/src/coding/proxy/routing/executor.py @@ -6,7 +6,9 @@ from __future__ import annotations +import json import logging +import re import time from collections.abc import AsyncIterator from typing import Any @@ -43,10 +45,320 @@ # Backward-compatible aliases BackendResponse = VendorResponse NoCompatibleBackendError = NoCompatibleVendorError -from ..compat.canonical import CompatibilityStatus, build_canonical_request +from ..compat.canonical import ( + CanonicalPartType, + CompatibilityStatus, + build_canonical_request, +) +from ..model.compat import CanonicalRequest logger = logging.getLogger(__name__) +_SESSION_TITLE_MAX_LEN = 30 + +# "Noise" tags injected by Claude Code: system-level context that should not end up in the session title. +# The CC harness splices these tags into the content of the first user message; they are highly uniform, +# so using them directly as the title would make titles indistinguishable across sessions. +_NOISE_TAG_PATTERN = re.compile( + r"<(?P<tag>system-reminder|user-preferences|" + r"local-command-stdout|local-command-stderr|" + r"bash-input|bash-stdout|bash-stderr|" + r"ide_selection|stdin|system_instruction)\b[^>]*>" + r".*?</(?P=tag)>", + flags=re.DOTALL | re.IGNORECASE, +) + +# Slash command sub-tags: used to recognize command-style invocations such as /commit and /review +# and to compose a "command + args" title. +_CMD_NAME_PATTERN = re.compile(r"<command-name>(.*?)</command-name>", flags=re.DOTALL) +_CMD_ARGS_PATTERN = re.compile(r"<command-args>(.*?)</command-args>", flags=re.DOTALL) +# Strip leftover command-* wrapper tags (secondary tags such as command-message / command-stdout). +_CMD_WRAPPER_PATTERN = re.compile( + r"<command-[\w-]+>.*?</command-[\w-]+>", flags=re.DOTALL +) + + +def _sanitize_user_text(raw: str) -> str: + """Strip the system-level XML blocks injected by Claude Code to recover the real user input. + + Processing order: + 1. Slash command detection first: if <command-name> is found, compose a "command + args" + title (the leftover text is usually empty, so the tag content is more meaningful). + 2. 
Generic noise stripping: remove system-reminder and the other tags on the known list. + 3. Leftover command-* wrapper cleanup: as a fallback, remove secondary tags such as command-message. + 4. Whitespace normalization: collapse runs of whitespace into a single space, which helps the 30-character truncation. + """ + if not raw: + return "" + + # Stage 1: slash command short-circuit + cmd = _CMD_NAME_PATTERN.search(raw) + if cmd: + name = cmd.group(1).strip() + args_match = _CMD_ARGS_PATTERN.search(raw) + args = args_match.group(1).strip() if args_match else "" + composed = f"{name} {args}".strip() if args else name + if composed: + return composed + + # Stage 2: generic noise stripping + cleaned = _NOISE_TAG_PATTERN.sub("", raw) + cleaned = _CMD_WRAPPER_PATTERN.sub("", cleaned) + + # Stage 3: whitespace collapsing + return re.sub(r"\s+", " ", cleaned).strip() + + +def _extract_session_title(request: CanonicalRequest) -> str: + """Extract the text of the first user message from the canonical request as the session title. + + Skips the system-level XML blocks injected by Claude Code (system-reminder, user-preferences, etc.) + so the title reflects the user's real input rather than a highly uniform system template. + """ + for part in request.messages: + if part.role != "user" or part.type != CanonicalPartType.TEXT: + continue + cleaned = _sanitize_user_text(part.text) + if cleaned: + return cleaned[:_SESSION_TITLE_MAX_LEN] + return "" + + +def _build_semantic_rejection_diagnostic(body: dict[str, Any]) -> str: + """Build diagnostic context about the request body for a semantic rejection. 
+ + 在 semantic rejection 日志中附加请求体的可疑参数快照, + 用于定位供应商参数校验失败的具体祸根参数。 + + 覆盖范围: + * 模型 / messages 数(baseline) + * thinking 系列顶层参数 + history thinking_blocks 数 + * system 形态(string / blocks,含 cache_control 计数) + * tools 数量 + tool_choice 形态 + * 采样参数(max_tokens / temperature / top_p / top_k / stop_sequences) + * stream / metadata 形态 + * cache_control 存在性 + * messages.content 类型分布 + * 请求体大小估算(json.dumps 字节数) + """ + parts: list[str] = [] + + # ── 模型 + 消息数(baseline,始终输出)── + parts.append(f"model={body.get('model', 'N/A')}") + parts.append(f"messages={len(body.get('messages', []))}") + + # ── 顶层 thinking 系列参数 ── + for key in ("thinking", "extended_thinking", "reasoning_effort"): + if key in body: + val = body[key] + parts.append(f"{key}={val!r:.80}") + + # ── system 形态 ── + system = body.get("system") + if isinstance(system, str): + parts.append(f"system_kind=string(len={len(system)})") + elif isinstance(system, list): + cc_count = sum( + 1 for item in system if isinstance(item, dict) and "cache_control" in item + ) + if cc_count: + parts.append(f"system_blocks={len(system)},cc={cc_count}") + else: + parts.append(f"system_blocks={len(system)}") + + # ── tools 与 tool_choice ── + tools = body.get("tools") + if isinstance(tools, list): + parts.append(f"tools={len(tools)}") + tool_choice = body.get("tool_choice") + if tool_choice is not None: + parts.append(f"tool_choice={tool_choice!r:.60}") + + # ── 采样参数(仅存在时输出)── + for key in ("max_tokens", "temperature", "top_p", "top_k"): + if key in body: + parts.append(f"{key}={body[key]!r:.40}") + stop_sequences = body.get("stop_sequences") + if isinstance(stop_sequences, list) and stop_sequences: + parts.append(f"stop_sequences={len(stop_sequences)}") + + # ── stream / metadata ── + if "stream" in body: + parts.append(f"stream={body['stream']}") + metadata = body.get("metadata") + if isinstance(metadata, dict) and metadata: + parts.append(f"metadata_keys={len(metadata)}") + + # ── 会话历史中的 thinking blocks 与 content_types 分布 ── + 
thinking_count = 0 + content_type_counts: dict[str, int] = {} + for msg in body.get("messages", []): + content = msg.get("content") + if isinstance(content, str): + content_type_counts["string"] = content_type_counts.get("string", 0) + 1 + continue + if not isinstance(content, list): + continue + for block in content: + if not isinstance(block, dict): + continue + btype = block.get("type") + if isinstance(btype, str): + content_type_counts[btype] = content_type_counts.get(btype, 0) + 1 + if btype in ("thinking", "redacted_thinking"): + thinking_count += 1 + if thinking_count: + parts.append(f"thinking_blocks_in_history={thinking_count}") + if content_type_counts: + type_repr = ",".join(f"{k}:{v}" for k, v in sorted(content_type_counts.items())) + parts.append(f"content_types={{{type_repr}}}") + + # ── cache_control 存在检测(messages / tools,不含 system 因已单独统计)── + has_cc = False + sections: list[Any] = [] + for m in body.get("messages", []): + if isinstance(m.get("content"), list): + sections.append(m["content"]) + if isinstance(body.get("tools"), list): + sections.append(body["tools"]) + for section in sections: + for item in section: + if isinstance(item, dict) and "cache_control" in item: + has_cc = True + break + if has_cc: + break + if has_cc: + parts.append("cache_control_fields=present") + + # ── 请求体大小估算 ── + try: + body_bytes = len(json.dumps(body, ensure_ascii=False).encode("utf-8")) + parts.append(f"body_bytes={body_bytes}") + except (TypeError, ValueError): + # 极少数情况下 body 含非可序列化对象,跳过 + pass + + return f" [{', '.join(parts)}]" if parts else "" + + +def _build_semantic_rejection_diagnostic(body: dict[str, Any]) -> str: + """构建语义拒绝的请求体诊断上下文. 
+ + 在 semantic rejection 日志中附加请求体的可疑参数快照, + 用于定位供应商参数校验失败的具体祸根参数。 + + 覆盖范围: + * 模型 / messages 数(baseline) + * thinking 系列顶层参数 + history thinking_blocks 数 + * system 形态(string / blocks,含 cache_control 计数) + * tools 数量 + tool_choice 形态 + * 采样参数(max_tokens / temperature / top_p / top_k / stop_sequences) + * stream / metadata 形态 + * cache_control 存在性 + * messages.content 类型分布 + * 请求体大小估算(json.dumps 字节数) + """ + parts: list[str] = [] + + # ── 模型 + 消息数(baseline,始终输出)── + parts.append(f"model={body.get('model', 'N/A')}") + parts.append(f"messages={len(body.get('messages', []))}") + + # ── 顶层 thinking 系列参数 ── + for key in ("thinking", "extended_thinking", "reasoning_effort"): + if key in body: + val = body[key] + parts.append(f"{key}={val!r:.80}") + + # ── system 形态 ── + system = body.get("system") + if isinstance(system, str): + parts.append(f"system_kind=string(len={len(system)})") + elif isinstance(system, list): + cc_count = sum( + 1 for item in system if isinstance(item, dict) and "cache_control" in item + ) + if cc_count: + parts.append(f"system_blocks={len(system)},cc={cc_count}") + else: + parts.append(f"system_blocks={len(system)}") + + # ── tools 与 tool_choice ── + tools = body.get("tools") + if isinstance(tools, list): + parts.append(f"tools={len(tools)}") + tool_choice = body.get("tool_choice") + if tool_choice is not None: + parts.append(f"tool_choice={tool_choice!r:.60}") + + # ── 采样参数(仅存在时输出)── + for key in ("max_tokens", "temperature", "top_p", "top_k"): + if key in body: + parts.append(f"{key}={body[key]!r:.40}") + stop_sequences = body.get("stop_sequences") + if isinstance(stop_sequences, list) and stop_sequences: + parts.append(f"stop_sequences={len(stop_sequences)}") + + # ── stream / metadata ── + if "stream" in body: + parts.append(f"stream={body['stream']}") + metadata = body.get("metadata") + if isinstance(metadata, dict) and metadata: + parts.append(f"metadata_keys={len(metadata)}") + + # ── 会话历史中的 thinking blocks 与 content_types 分布 ── + 
thinking_count = 0 + content_type_counts: dict[str, int] = {} + for msg in body.get("messages", []): + content = msg.get("content") + if isinstance(content, str): + content_type_counts["string"] = content_type_counts.get("string", 0) + 1 + continue + if not isinstance(content, list): + continue + for block in content: + if not isinstance(block, dict): + continue + btype = block.get("type") + if isinstance(btype, str): + content_type_counts[btype] = content_type_counts.get(btype, 0) + 1 + if btype in ("thinking", "redacted_thinking"): + thinking_count += 1 + if thinking_count: + parts.append(f"thinking_blocks_in_history={thinking_count}") + if content_type_counts: + type_repr = ",".join(f"{k}:{v}" for k, v in sorted(content_type_counts.items())) + parts.append(f"content_types={{{type_repr}}}") + + # ── cache_control 存在检测(messages / tools,不含 system 因已单独统计)── + has_cc = False + sections: list[Any] = [] + for m in body.get("messages", []): + if isinstance(m.get("content"), list): + sections.append(m["content"]) + if isinstance(body.get("tools"), list): + sections.append(body["tools"]) + for section in sections: + for item in section: + if isinstance(item, dict) and "cache_control" in item: + has_cc = True + break + if has_cc: + break + if has_cc: + parts.append("cache_control_fields=present") + + # ── 请求体大小估算 ── + try: + body_bytes = len(json.dumps(body, ensure_ascii=False).encode("utf-8")) + parts.append(f"body_bytes={body_bytes}") + except (TypeError, ValueError): + # 极少数情况下 body 含非可序列化对象,跳过 + pass + + return f" [{', '.join(parts)}]" if parts else "" + def _log_http_error_detail( tier_name: str, @@ -341,10 +653,16 @@ async def execute_stream( failed_tier_name: str | None = None request_caps = build_request_capabilities(body) canonical_request = build_canonical_request(body, headers) - session_record = await self._session_mgr.get_or_create_record( + session_record, is_new_session = await self._session_mgr.get_or_create_record( canonical_request.session_key, 
canonical_request.trace_id, ) + if is_new_session: + title = _extract_session_title(canonical_request) + if title: + await self._recorder.set_session_title( + canonical_request.session_key, title + ) incompatible_reasons: list[str] = [] effective_tiers = self._resolve_effective_tiers(canonical_request.session_key) last_idx = len(effective_tiers) - 1 @@ -512,10 +830,16 @@ async def execute_message( failed_tier_name: str | None = None request_caps = build_request_capabilities(body) canonical_request = build_canonical_request(body, headers) - session_record = await self._session_mgr.get_or_create_record( + session_record, is_new_session = await self._session_mgr.get_or_create_record( canonical_request.session_key, canonical_request.trace_id, ) + if is_new_session: + title = _extract_session_title(canonical_request) + if title: + await self._recorder.set_session_title( + canonical_request.session_key, title + ) incompatible_reasons: list[str] = [] effective_tiers = self._resolve_effective_tiers(canonical_request.session_key) last_idx = len(effective_tiers) - 1 @@ -601,10 +925,17 @@ async def execute_message( ) if not is_last and is_semantic: + diagnostic = _build_semantic_rejection_diagnostic(body) + # zhipu 等供应商的错误体含字段级诊断(如 [1210] 错误码 + request_id), + # 500 字符足以覆盖完整错误体,避免截断丢失关键细节 + err_msg = (resp.error_message or "N/A")[:500] logger.warning( - "Tier %s semantic rejection (%s), trying next tier without recording failure", + "Tier %s semantic rejection (type=%s, msg=%s)%s, " + "trying next tier without recording failure", tier.name, resp.error_type or resp.status_code, + err_msg, + diagnostic, ) failed_tier_name = tier.name continue @@ -836,6 +1167,20 @@ async def _handle_http_error( ) if semantic_rejection and not is_last: + if request_body is not None: + diagnostic = _build_semantic_rejection_diagnostic(request_body) + stream_err_msg = ( + error.get("message") if isinstance(error, dict) else "N/A" + ) + # 扩展至 500 字符以保留完整字段级诊断信息 + logger.warning( + "Tier %s stream 
semantic rejection (type=%s, msg=%s)%s, " + "trying next tier without recording failure", + tier.name, + error.get("type") if isinstance(error, dict) else None, + stream_err_msg[:500], + diagnostic, + ) return True, tier.name, exc rl_info = parse_rate_limit_headers( diff --git a/src/coding/proxy/routing/session_manager.py b/src/coding/proxy/routing/session_manager.py index 845ac87..aaef0ba 100644 --- a/src/coding/proxy/routing/session_manager.py +++ b/src/coding/proxy/routing/session_manager.py @@ -19,13 +19,18 @@ def __init__(self, compat_session_store: CompatSessionStore | None = None) -> No async def get_or_create_record( self, session_key: str, trace_id: str - ) -> CompatSessionRecord | None: + ) -> tuple[CompatSessionRecord | None, bool]: + """获取或创建兼容性会话记录. + + Returns: + (record, is_new) — is_new 为 True 表示本次创建的新会话。 + """ if self._store is None: - return None + return None, False record = await self._store.get(session_key) if record is not None: - return record - return CompatSessionRecord(session_key=session_key, trace_id=trace_id) + return record, False + return CompatSessionRecord(session_key=session_key, trace_id=trace_id), True def apply_compat_context( self, diff --git a/src/coding/proxy/routing/usage_recorder.py b/src/coding/proxy/routing/usage_recorder.py index 525a6c1..8887c09 100644 --- a/src/coding/proxy/routing/usage_recorder.py +++ b/src/coding/proxy/routing/usage_recorder.py @@ -28,6 +28,11 @@ def __init__( def set_pricing_table(self, table: PricingTable) -> None: self._pricing_table = table + async def set_session_title(self, session_key: str, title: str) -> None: + """为新 session 设置标题(委托给 TokenLogger).""" + if self._token_logger: + await self._token_logger.set_session_title(session_key, title) + # ── 用量信息构建 ────────────────────────────────────── @staticmethod diff --git a/src/coding/proxy/server/dashboard.py b/src/coding/proxy/server/dashboard.py index 07bd6a3..75dd812 100644 --- a/src/coding/proxy/server/dashboard.py +++ 
b/src/coding/proxy/server/dashboard.py @@ -411,6 +411,7 @@ def _build_favicon() -> bytes: .session-table td.cell-tags { white-space: normal; overflow: visible; text-overflow: clip; line-height: 1.8; vertical-align: middle; } .session-table tr:hover td { background: var(--bg-card-hover); } .session-table .session-key { font-family: 'JetBrains Mono', monospace; font-size: 12px; color: var(--accent-blue); cursor: default; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; } + .session-table .session-title { font-size: 12px; color: var(--text-secondary); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; max-width: 0; } .session-id { display: flex; align-items: center; gap: 4px; } .session-id-text { overflow: hidden; text-overflow: ellipsis; } .copy-btn { background: none; border: none; color: var(--text-tertiary); cursor: pointer; padding: 2px; border-radius: 4px; font-size: 12px; line-height: 1; opacity: .5; flex-shrink: 0; } @@ -556,6 +557,126 @@ def _build_favicon() -> bytes: .tab-btn:focus-visible { outline: 2px solid var(--accent-blue); outline-offset: 2px; } .tab-pane { display: none; } .tab-pane.active { display: block; } + + /* ── Model Calling 实时状态 ────────────────────────── */ + .model-calling-card { + margin-bottom: 5px; + } + .mc-empty { + text-align: center; + color: var(--text-muted); + padding: 16px 0; + font-size: 13px; + } + .mc-grid { + display: grid; + grid-template-columns: repeat(auto-fill, minmax(320px, 1fr)); + gap: 8px; + } + .mc-model-row { + display: flex; + align-items: center; + gap: 10px; + padding: 8px 12px; + background: var(--bg-secondary); + border-radius: var(--radius-sm); + border: 1px solid var(--border-subtle); + } + .mc-model-name { + font-family: 'JetBrains Mono', monospace; + font-size: 12px; + color: var(--text-primary); + min-width: 140px; + white-space: nowrap; + overflow: hidden; + text-overflow: ellipsis; + } + .mc-bar-wrap { + flex: 1; + min-width: 60px; + height: 6px; + background: 
rgba(255,255,255,.06); + border-radius: 3px; + overflow: hidden; + } + .mc-bar-fill { + height: 100%; + border-radius: 3px; + transition: width .3s ease, background .3s ease; + } + .mc-bar-fill.mc-low { background: var(--accent-green); } + .mc-bar-fill.mc-mid { background: var(--accent-yellow); } + .mc-bar-fill.mc-high { background: var(--accent-red); } + .mc-stats { + display: flex; + align-items: center; + gap: 6px; + font-size: 11px; + font-family: 'JetBrains Mono', monospace; + color: var(--text-muted); + white-space: nowrap; + } + .mc-badge { + display: inline-flex; + align-items: center; + padding: 1px 6px; + border-radius: 4px; + font-size: 10px; + font-weight: 600; + font-family: 'JetBrains Mono', monospace; + } + .mc-badge-pending { + background: rgba(251,146,60,.15); + color: #fb923c; + } + .mc-badge-active { + background: rgba(74,222,128,.12); + color: #4ade80; + } + .mc-vendor-tag { + font-size: 10px; + color: var(--text-muted); + background: rgba(255,255,255,.06); + padding: 1px 6px; + border-radius: 3px; + } + .mc-limit-editable { + cursor: pointer; + border-bottom: 1px dashed rgba(74,222,128,.4); + transition: border-color .2s, color .2s; + } + .mc-limit-editable:hover { + border-bottom-color: #4ade80; + color: #4ade80; + } + .mc-limit-input { + width: 36px; + background: var(--bg-primary); + border: 1px solid var(--accent-blue); + border-radius: 3px; + color: var(--text-primary); + font-size: 10px; + font-family: 'JetBrains Mono', monospace; + text-align: center; + padding: 0 2px; + outline: none; + -moz-appearance: textfield; + } + .mc-limit-input::-webkit-outer-spin-button, + .mc-limit-input::-webkit-inner-spin-button { + -webkit-appearance: none; + margin: 0; + } + .mc-limit-flash-ok { animation: mc-flash-ok .6s ease; } + .mc-limit-flash-err { animation: mc-flash-err .6s ease; } + @keyframes mc-flash-ok { + 0%,100% { color: inherit; } + 40% { color: #4ade80; } + } + @keyframes mc-flash-err { + 0%,100% { color: inherit; } + 40% { color: #f87171; } 
+ } @@ -625,6 +746,14 @@ def _build_favicon() -> bytes: + +
+
📡 Model Calling 实时状态
+
+
加载中…
+
+
+
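The usage bar in this card is colored by `updateModelCalling`, added further down in this diff: it rounds `in_use / limit` to a whole percentage and maps it to `mc-low` (up to 50), `mc-mid` (up to 80) or `mc-high`. A minimal Python sketch of that rule, useful for reasoning about the thresholds. The function name `usage_bar_class` is hypothetical, and integer round-half-up is used here as a stand-in for JavaScript's `Math.round`:

```python
def usage_bar_class(in_use: int, limit: int) -> tuple[int, str]:
    """Return (percentage, CSS class) for one model's concurrency bar.

    Hypothetical helper that mirrors the thresholds used by updateModelCalling.
    """
    if limit <= 0:
        # The dashboard treats a missing or zero limit as 0% usage.
        pct = 0
    else:
        # Round half up with integer arithmetic (valid for non-negative inputs).
        pct = (in_use * 200 + limit) // (2 * limit)
    if pct <= 50:
        return pct, "mc-low"
    if pct <= 80:
        return pct, "mc-mid"
    return pct, "mc-high"


if __name__ == "__main__":
    for in_use, limit in [(0, 0), (1, 8), (4, 5), (5, 5)]:
        print(in_use, limit, usage_bar_class(in_use, limit))
```

Because the boundaries are inclusive, a model running at exactly half of its limit still renders green, and one at exactly 80% renders yellow rather than red.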
@@ -676,20 +805,22 @@ def _build_favicon() -> bytes:
- - + + + + + + - - - - - - + + + + @@ -702,7 +833,7 @@ def _build_favicon() -> bytes: - +
Session IDTitle Last Active Requests Tokens
Loading...
Loading...
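The Title column added to this table presumably shows the value stored through `set_session_title`, which `_extract_session_title` in `routing/executor.py` derives from the first user message. A simplified, standalone sketch of that derivation follows. The names `sketch_session_title`, `NOISE`, `CMD_NAME` and `CMD_ARGS` are hypothetical, only two noise tags are listed, and the `<command-name>` / `<command-args>` tag shapes are assumed for illustration:

```python
import re

MAX_LEN = 30  # mirrors _SESSION_TITLE_MAX_LEN in the diff

# Assumed tag shapes, reduced to two noise tags for brevity.
NOISE = re.compile(
    r"<(?P<tag>system-reminder|user-preferences)\b[^>]*>.*?</(?P=tag)>",
    flags=re.DOTALL | re.IGNORECASE,
)
CMD_NAME = re.compile(r"<command-name>(.*?)</command-name>", flags=re.DOTALL)
CMD_ARGS = re.compile(r"<command-args>(.*?)</command-args>", flags=re.DOTALL)


def sketch_session_title(raw: str) -> str:
    """Derive a short title from the raw text of a first user message."""
    if not raw:
        return ""
    # Slash commands short-circuit: the title becomes "command args".
    cmd = CMD_NAME.search(raw)
    if cmd:
        name = cmd.group(1).strip()
        args_match = CMD_ARGS.search(raw)
        args = args_match.group(1).strip() if args_match else ""
        composed = f"{name} {args}".strip()
        if composed:
            return composed[:MAX_LEN]
    # Otherwise drop the injected blocks and collapse whitespace.
    cleaned = NOISE.sub("", raw)
    return re.sub(r"\s+", " ", cleaned).strip()[:MAX_LEN]


if __name__ == "__main__":
    print(sketch_session_title("<system-reminder>ctx</system-reminder>  Fix the   login bug"))
```

Truncating after sanitizing means the 30-character budget is spent on the user's own words rather than on injected markup.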
@@ -1131,6 +1262,148 @@ def _build_favicon() -> bytes: }).join(''); } +// ── Model Calling 实时状态 ──────────────────────────────── +function updateModelCalling(status) { + var wrap = document.getElementById('model-calling-wrap'); + if (!wrap) return; + var tiers = status.tiers || []; + + // 收集所有带 concurrency 诊断的模型 + var models = []; + for (var i = 0; i < tiers.length; i++) { + var tier = tiers[i]; + var diag = tier.diagnostics || {}; + var conc = diag.concurrency; + if (!conc) continue; + var names = Object.keys(conc); + for (var j = 0; j < names.length; j++) { + var model = names[j]; + var d = conc[model]; + models.push({ + vendor: tier.name, + model: model, + limit: d.limit || 0, + in_use: d.in_use || 0, + available: d.available || 0, + pending: d.pending || 0, + }); + } + } + + if (!models.length) { + wrap.innerHTML = '
无活跃模型调用
'; + return; + } + + var html = '
'; + for (var k = 0; k < models.length; k++) { + var m = models[k]; + var pct = m.limit > 0 ? Math.round((m.in_use / m.limit) * 100) : 0; + var barClass = pct <= 50 ? 'mc-low' : (pct <= 80 ? 'mc-mid' : 'mc-high'); + + html += '
' + + '' + escapeHtml(m.vendor + '/' + m.model) + '' + + '
' + + '
' + + '' + m.in_use + + '/' + m.limit + '' + + (m.pending > 0 ? '⏳ ' + m.pending + '' : '') + + '
' + + '
'; + } + html += '
'; + wrap.innerHTML = html; +} + +// Model Calling 独立短间隔轮询 +var _mcTimer = null; +function startModelCallingPoll() { + stopModelCallingPoll(); + function tick() { + fetchJSON('/api/status').then(function(status) { + updateModelCalling(status); + }).catch(function() {}); + } + tick(); + _mcTimer = setInterval(tick, 5000); +} +function stopModelCallingPoll() { + if (_mcTimer) { clearInterval(_mcTimer); _mcTimer = null; } +} + +// ── 并行度运行时编辑 ────────────────────────────────────── +var _mcEditing = false; +document.addEventListener('click', function(e) { + if (_mcEditing) return; + var el = e.target.closest('.mc-limit-editable'); + if (!el) return; + e.preventDefault(); + _mcEditing = true; + var oldVal = el.getAttribute('data-limit'); + var tier = el.getAttribute('data-tier'); + var model = el.getAttribute('data-model'); + var input = document.createElement('input'); + input.type = 'number'; + input.className = 'mc-limit-input'; + input.min = '1'; + input.max = '20'; + input.value = oldVal; + el.style.display = 'none'; + el.parentNode.insertBefore(input, el.nextSibling); + input.focus(); + input.select(); + + var _cancelled = false; + + function restore() { + _mcEditing = false; + if (input.parentNode) input.parentNode.removeChild(input); + el.style.display = ''; + } + + function flash(cls) { + el.classList.add(cls); + setTimeout(function() { el.classList.remove(cls); }, 600); + } + + input.addEventListener('keydown', function(ev) { + if (ev.key === 'Escape') { _cancelled = true; restore(); return; } + if (ev.key !== 'Enter') return; + ev.preventDefault(); + submit(); + }); + + input.addEventListener('blur', function() { + setTimeout(function() { if (!_cancelled) submit(); }, 50); + }); + + function submit() { + if (_cancelled) return; + var v = parseInt(input.value, 10); + if (isNaN(v) || v < 1 || v > 20) { restore(); flash('mc-limit-flash-err'); return; } + if (String(v) === oldVal) { restore(); return; } + fetch('/api/concurrency', { + method: 'PUT', + headers: 
{'Content-Type': 'application/json'}, + body: JSON.stringify({tier: tier, model: model, limit: v}) + }).then(function(res) { + if (res.ok) { + return res.json().then(function() { + el.textContent = v; + el.setAttribute('data-limit', v); + flash('mc-limit-flash-ok'); + }); + } else { + flash('mc-limit-flash-err'); + } + }).catch(function() { + flash('mc-limit-flash-err'); + }).finally(function() { + restore(); + }); + } +}); + // ── 按 tiers 顺序排序 vendor 列表 ───────────────────────── function sortByTierOrder(vendors, tierOrder) { if (!tierOrder || !tierOrder.length) return vendors.sort(); @@ -1573,7 +1846,7 @@ def _build_favicon() -> bytes: var tbody = document.getElementById('sessions-tbody'); if (!total) { - tbody.innerHTML = '
📭
No session data'; + tbody.innerHTML = '
📭
No session data'; } else { tbody.innerHTML = page.map(function(s) { var parsed = parseSessionKey(s.session_key); @@ -1582,6 +1855,7 @@ def _build_favicon() -> bytes: var modelsFull = (s.models || '').split(',').map(function(c){return c.trim();}); var vendorsFull = (s.vendors || '').split(',').map(function(v){return formatVendorLabel(v.trim());}); var sr = s.success_rate != null ? Math.round(s.success_rate) : null; + var sessionTitle = s.title || ''; return '' + '' + '
' + @@ -1592,6 +1866,7 @@ def _build_favicon() -> bytes: 'dev:' + escapeHtml(shortId(parsed.device_id, 8)) + ' · acct:' + escapeHtml(shortId(parsed.account_uuid, 8)) + '
' + '' + + '' + (sessionTitle ? escapeHtml(sessionTitle) : '–') + '' + '' + relativeTime(s.last_active_ts) + '' + '' + fmtNum(s.total_requests) + '' + '' + fmtTokens(s.total_tokens) + '' + @@ -1602,9 +1877,10 @@ def _build_favicon() -> bytes: '' + selectHtml + '' + '' + formatCategories(s.client_categories) + '' + '' + - '
' + + '
' + '
' + '
Session ID
' + escapeHtml(parsed.session_id || s.session_key) + '
' + + '
Title
' + (sessionTitle ? escapeHtml(sessionTitle) : '–') + '
' + '
Device
' + (parsed.device_id ? escapeHtml(parsed.device_id) : '–') + '
' + '
Account
' + (parsed.account_uuid ? escapeHtml(parsed.account_uuid) : '–') + '
' + '
' + @@ -1707,6 +1983,7 @@ def _build_favicon() -> bytes: updateKPI(summary); updateVendorStatus(status); + updateModelCalling(status); updateChartTitles(days); const rows = timeline.rows || []; @@ -1782,6 +2059,8 @@ def _build_favicon() -> bytes: currentTab = name; applyTabState(name); syncTabUrl(name); + // Model Calling 轮询随页签切换启停 + if (name === 'overview') { startModelCallingPoll(); } else { stopModelCallingPoll(); } refresh(); } @@ -1801,6 +2080,7 @@ def _build_favicon() -> bytes: }).catch(function(){}); refresh(); // 仅加载初始页签的数据 setInterval(refresh, 600000); // 每 10 分钟刷新当前页签 + if (initial === 'overview') startModelCallingPoll(); })(); diff --git a/src/coding/proxy/server/factory.py b/src/coding/proxy/server/factory.py index a1f64a3..4e7632d 100644 --- a/src/coding/proxy/server/factory.py +++ b/src/coding/proxy/server/factory.py @@ -156,13 +156,17 @@ def _create_vendor_from_config( cfg = _resolve_antigravity_credentials(cfg, token_store) return AntigravityVendor(cfg, failover_cfg, mapper) case "zhipu": - cfg = ZhipuConfig( - enabled=vendor_cfg.enabled, - base_url=vendor_cfg.base_url + zhipu_kwargs: dict[str, Any] = { + "enabled": vendor_cfg.enabled, + "base_url": vendor_cfg.base_url or "https://open.bigmodel.cn/api/anthropic", - api_key=vendor_cfg.api_key, - timeout_ms=vendor_cfg.timeout_ms, - ) + "api_key": vendor_cfg.api_key, + "timeout_ms": vendor_cfg.timeout_ms, + } + # 仅当显式配置了 concurrency 时转发,否则使用 ZhipuConfig 默认值 + if vendor_cfg.concurrency is not None: + zhipu_kwargs["concurrency"] = vendor_cfg.concurrency + cfg = ZhipuConfig(**zhipu_kwargs) return ZhipuVendor(cfg, mapper, failover_cfg) case "minimax": cfg = MinimaxConfig( diff --git a/src/coding/proxy/server/routes.py b/src/coding/proxy/server/routes.py index 7f157f0..7c13d2f 100644 --- a/src/coding/proxy/server/routes.py +++ b/src/coding/proxy/server/routes.py @@ -150,14 +150,15 @@ async def count_tokens(request: Request) -> Response: source = infer_source_vendor_from_body(body) if source: - channel_fn = 
get_transition_channel(source, target_vendor.name) + target_name = target_vendor.get_name() + channel_fn = get_transition_channel(source, target_name) if channel_fn is not None: body, adaptations = channel_fn(body) if adaptations: logger.debug( "count_tokens channel %s → %s: %s", source, - target_vendor.name, + target_name, ", ".join(adaptations), ) @@ -224,6 +225,61 @@ async def status() -> dict: return result +def register_concurrency_route(app: Any, router: Any) -> None: + """注册运行时并发限制调整路由.""" + + @app.put("/api/concurrency") + async def update_concurrency(request: Request) -> Response: + try: + body = await request.json() + except Exception: + return json_error_response( + 400, error_type="invalid_request_error", message="body must be JSON" + ) + tier_name = body.get("tier") + model = body.get("model") + limit = body.get("limit") + if not tier_name or not model or limit is None: + return json_error_response( + 400, + error_type="invalid_request_error", + message="requires tier, model, limit", + ) + if not isinstance(limit, int) or limit < 1 or limit > 20: + return json_error_response( + 400, + error_type="invalid_request_error", + message="limit must be an integer between 1 and 20", + ) + for tier in router.tiers: + if tier.name == tier_name: + vendor = tier.vendor + update_fn = getattr(vendor, "update_concurrency", None) + if update_fn is None: + return json_error_response( + 400, + error_type="invalid_request_error", + message=f"vendor '{tier_name}' does not support concurrency", + ) + try: + update_fn(model, limit) + except (ValueError, AttributeError) as exc: + return json_error_response( + 400, error_type="invalid_request_error", message=str(exc) + ) + return Response( + content=json.dumps( + {"ok": True, "tier": tier_name, "model": model, "limit": limit}, + ensure_ascii=False, + ).encode(), + status_code=200, + media_type="application/json", + ) + return json_error_response( + 404, error_type="not_found", message=f"tier '{tier_name}' not found" + ) + + 
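Review note on the `limit` check in `update_concurrency` above: in Python, `bool` is a subclass of `int`, so a JSON body containing `"limit": true` passes both `isinstance(limit, int)` and the range test, and is then treated as a limit of 1. The following is a minimal sketch of a stricter check, assuming the same 1 to 20 bounds as the route. `validate_limit`, `MIN_LIMIT`, and `MAX_LIMIT` are hypothetical names used only for illustration; they are not part of this diff.

```python
from typing import Any

# Assumed bounds, mirroring the 1..20 range used by the route and the UI input.
MIN_LIMIT = 1
MAX_LIMIT = 20


def validate_limit(value: Any) -> int:
    """Return value unchanged if it is a plain int in [MIN_LIMIT, MAX_LIMIT]."""
    # bool is a subclass of int, so JSON `true`/`false` must be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("limit must be an integer between 1 and 20")
    if not MIN_LIMIT <= value <= MAX_LIMIT:
        raise ValueError("limit must be an integer between 1 and 20")
    return value
```

A handler could map the `ValueError` to the same 400 `invalid_request_error` response the route already returns for out-of-range values.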
def register_copilot_routes(app: Any, router: Any) -> None: """注册 Copilot 诊断与模型探测路由.""" from .factory import _find_copilot_vendor @@ -456,6 +512,7 @@ def register_all_routes( register_core_routes(app, router) register_health_routes(app) register_status_route(app, router) + register_concurrency_route(app, router) register_copilot_routes(app, router) register_admin_routes(app, router) register_session_vendor_routes(app, router) diff --git a/src/coding/proxy/vendors/antigravity.py b/src/coding/proxy/vendors/antigravity.py index b9bbfb5..b4d7199 100644 --- a/src/coding/proxy/vendors/antigravity.py +++ b/src/coding/proxy/vendors/antigravity.py @@ -141,7 +141,14 @@ def __init__( config.refresh_token, ) TokenBackendMixin.__init__(self, token_manager) - BaseVendor.__init__(self, config.base_url, config.timeout_ms, failover_config) + # v1internal 模式:base_url 需要去除 /v1internal 路径后缀, + # 因为 endpoint 使用完整路径 /v1internal:generateContent(冒号格式)。 + # httpx 会将 base_url path 与 endpoint path 拼接, + # 如果 base_url 含 /v1internal 会导致路径重复。 + init_base_url = config.base_url + if init_base_url.rstrip("/").endswith("/v1internal"): + init_base_url = init_base_url.rstrip("/").removesuffix("/v1internal") + BaseVendor.__init__(self, init_base_url, config.timeout_ms, failover_config) self._model_endpoint = config.model_endpoint self._model_mapper = model_mapper self._default_model = config.model_endpoint.removeprefix("models/") @@ -149,6 +156,7 @@ def __init__( self._safety_settings = config.safety_settings # v1internal 协议字段 self._project_id: str = config.project_id + self._v1internal_enabled: bool = "v1internal" in config.base_url self._session_id: str = uuid.uuid4().hex[:16] self._message_count: int = 0 # project_id 自动发现状态 @@ -159,8 +167,11 @@ def get_name(self) -> str: return "antigravity" def _is_v1internal_mode(self) -> bool: - """检测是否启用 v1internal 协议模式(与 Antigravity-Manager 对齐).""" - return bool(self._effective_project_id) and "v1internal" in self._base_url + """检测是否启用 v1internal 协议模式(与 
Antigravity-Manager 对齐). + + v1internal 协议由原始配置的 base_url 路径或 project_id 自动发现触发。 + """ + return self._v1internal_enabled @property def _effective_project_id(self) -> str: @@ -229,7 +240,11 @@ async def _discover_project_id(self, access_token: str) -> str: return "" # 发现成功:原子性切换到 v1internal 模式 - self._base_url = _V1INTERNAL_BASE_URL + # base_url 只保留域名部分(去除 /v1internal 路径后缀) + self._base_url = _V1INTERNAL_BASE_URL.rstrip("/").removesuffix( + "/v1internal" + ) + self._v1internal_enabled = True self._project_id_discovered = project_id # 重建 HTTP 客户端(base_url 是初始化参数) @@ -339,8 +354,13 @@ async def _prepare_request( self._last_request_adaptations = converted.adaptations token = await self._token_manager.get_token() - # 懒加载:未配置 project_id 时自动发现并切换 v1internal 模式 - if not self._project_id and not self._project_discovery_attempted: + # 懒加载:未配置 project_id 时尝试自动发现(仅标准 GLA 模式需要) + # v1internal 模式不依赖 project_id,跳过发现 + if ( + not self._project_id + and not self._project_discovery_attempted + and not self._v1internal_enabled + ): discovered = await self._discover_project_id(token) if discovered: logger.info( @@ -450,11 +470,11 @@ async def send_message( body, prepared_headers = await self._prepare_request(request_body, headers) client = self._get_client() resolved_model = self._last_resolved_model - endpoint = ( - ":generateContent" - if self._is_v1internal_mode() - else f"/models/{resolved_model}:generateContent" - ) + if self._is_v1internal_mode(): + # v1internal 端点需要完整路径(冒号格式)覆盖 base_url 的 path 部分 + endpoint = "/v1internal:generateContent" + else: + endpoint = f"/models/{resolved_model}:generateContent" logger.debug("send_message: POST %s", endpoint) response = await client.post(endpoint, json=body, headers=prepared_headers) @@ -496,11 +516,10 @@ async def send_message_stream( body, prepared_headers = await self._prepare_request(request_body, headers) client = self._get_client() resolved_model = self._last_resolved_model - endpoint = ( - ":streamGenerateContent?alt=sse" - if 
self._is_v1internal_mode() - else f"/models/{resolved_model}:streamGenerateContent?alt=sse" - ) + if self._is_v1internal_mode(): + endpoint = "/v1internal:streamGenerateContent?alt=sse" + else: + endpoint = f"/models/{resolved_model}:streamGenerateContent?alt=sse" logger.debug("send_message_stream: POST %s", endpoint) diff --git a/src/coding/proxy/vendors/concurrency.py b/src/coding/proxy/vendors/concurrency.py new file mode 100644 index 0000000..7944bdd --- /dev/null +++ b/src/coding/proxy/vendors/concurrency.py @@ -0,0 +1,162 @@ +"""每模型并发限制器 — 支持运行时动态调整的公平排队. + +为每个映射后的模型(如 ``glm-5v-turbo``)独立维护一个 ``_ConcurrencySlot``, +确保同一时间点该模型的并行请求数不超过配置的上限。当所有槽位被占满时, +新请求按 FIFO 顺序排队等待,直到有槽位释放。 + +设计要点: + - **惰性创建**:仅在首次请求到达时才为该模型创建 Slot,避免冷启动开销 + - **FIFO 公平**:``asyncio.Event`` + while 循环天然满足 FIFO 排队语义 + - **动态调整**:支持运行时修改 per-model limit,无需重启进程 + - **按映射后模型名键控**:与上游真实承载能力对齐,而非按客户端请求名 +""" + +from __future__ import annotations + +import asyncio +import logging + +from ..config.vendors import ZhipuConcurrencyConfig + +logger = logging.getLogger(__name__) + + +class _ConcurrencySlot: + """支持动态 limit 的并发槽位. + + 使用 ``asyncio.Event`` 作为等待/通知原语,在 ``acquire`` 中 await 等待, + 在 ``release`` / ``set_limit`` 中唤醒。``set_limit`` 修改上限后立即唤醒 + 所有等待者,由它们重新判断是否可获得槽位。 + """ + + def __init__(self, limit: int) -> None: + self._limit = limit + self._in_use: int = 0 + self._pending: int = 0 + self._wake = asyncio.Event() + self._wake.set() + + async def acquire(self) -> _ConcurrencySlot: + """获取一个并发槽位,必要时阻塞排队. 
+ + 返回 ``self``,调用方在请求完成后调用 ``release()``。 + """ + # Fast path + if self._in_use < self._limit: + self._in_use += 1 + return self + # Slow path — 等待槽位释放 + self._pending += 1 + try: + while True: + self._wake.clear() + await self._wake.wait() + if self._in_use < self._limit: + self._in_use += 1 + return self + finally: + self._pending -= 1 + + def release(self) -> None: + """释放一个并发槽位.""" + self._in_use = max(0, self._in_use - 1) + self._wake.set() + + def set_limit(self, new_limit: int) -> None: + """动态调整并发上限. + + 增大 limit 时立即唤醒等待者;缩小时已持有的槽位不受影响, + 新 limit 在后续 acquire 中自然生效。 + """ + self._limit = new_limit + self._wake.set() + + @property + def limit(self) -> int: + return self._limit + + @property + def in_use(self) -> int: + return self._in_use + + @property + def available(self) -> int: + return max(0, self._limit - self._in_use) + + @property + def pending(self) -> int: + return self._pending + + +class ModelConcurrencyLimiter: + """按模型名提供独立并发槽位的限制器. + + 用法:: + + limiter = ModelConcurrencyLimiter(config) + slot = await limiter.acquire("glm-5v-turbo") + try: + ... # 执行请求 + finally: + slot.release() + """ + + def __init__(self, config: ZhipuConcurrencyConfig) -> None: + self._config = config + self._slots: dict[str, _ConcurrencySlot] = {} + + def _get_or_create_slot(self, model: str) -> _ConcurrencySlot: + """获取(或惰性创建)指定模型的并发槽位.""" + slot = self._slots.get(model) + if slot is None: + limit = self._config.get_limit(model) + slot = _ConcurrencySlot(limit) + self._slots[model] = slot + logger.debug( + "ModelConcurrencyLimiter: created slot model=%s limit=%d", + model, + limit, + ) + return slot + + async def acquire(self, model: str) -> _ConcurrencySlot: + """获取指定模型的并发槽位,必要时阻塞排队. + + 返回已获取的 Slot 实例,调用方负责在请求完成后调用 ``release()``。 + """ + slot = self._get_or_create_slot(model) + await slot.acquire() + return slot + + def set_limit(self, model: str, new_limit: int) -> None: + """运行时修改指定模型的并发上限. 
+ + 同时更新 config.models 以确保后续惰性创建使用新值。 + """ + slot = self._slots.get(model) + if slot is None: + slot = _ConcurrencySlot(new_limit) + self._slots[model] = slot + else: + slot.set_limit(new_limit) + self._config.models[model] = new_limit + logger.info( + "ModelConcurrencyLimiter: updated limit model=%s new_limit=%d", + model, + new_limit, + ) + + def get_diagnostics(self) -> dict[str, dict[str, int]]: + """返回每个模型的并发状态快照(用于可观测性).""" + snapshot: dict[str, dict[str, int]] = {} + for model, slot in self._slots.items(): + snapshot[model] = { + "limit": slot.limit, + "in_use": slot.in_use, + "available": slot.available, + "pending": slot.pending, + } + return snapshot + + +__all__ = ["ModelConcurrencyLimiter"] diff --git a/src/coding/proxy/vendors/zhipu.py b/src/coding/proxy/vendors/zhipu.py index 528cabf..64407ba 100644 --- a/src/coding/proxy/vendors/zhipu.py +++ b/src/coding/proxy/vendors/zhipu.py @@ -1,23 +1,64 @@ -"""智谱 GLM 供应商 — 原生 Anthropic 兼容端点薄透传代理. +"""智谱 GLM 供应商 — 原生 Anthropic 兼容端点代理(兼容转换 + 429 重试). -官方端点 (https://open.bigmodel.cn/api/anthropic) 已完整支持 -Anthropic Messages API 协议,本模块仅做两项最小适配: +官方端点 (https://open.bigmodel.cn/api/anthropic) 支持大部分 +Anthropic Messages API 协议,本模块做以下适配: 1. 模型名映射(Claude -> GLM) 2. 认证头替换(x-api-key) + 3. 
首选 tier 参数兼容转换(_prepare_request) + +实测验证 GLM 对 Anthropic 扩展参数的处理方式: +- thinking.type="enabled":原生支持(GLM 有自己的 thinking 机制) +- thinking.type="adaptive":不支持,触发 [1210] 参数错误 → 转换为 enabled + budget +- cache_control 字段:静默忽略(GLM 使用隐式自动缓存) +- reasoning_effort 参数:静默忽略 +- metadata 字段:暂不处理(待进一步诊断确认兼容性) + +额外提供 429 Rate Limit 专用重试挽回机制: + - max_attempts = 5(1 初始 + 4 重试) + - 指数退避 + Full Jitter(1s → 2s → 4s → 8s) + - 优先尊重 server retry-after header """ from __future__ import annotations +import asyncio +import json +import logging +from collections.abc import AsyncIterator +from typing import Any + +import httpx + from ..config.schema import FailoverConfig, ZhipuConfig from ..routing.model_mapper import ModelMapper +from ..routing.rate_limit import ( + compute_effective_retry_seconds, + parse_rate_limit_headers, +) +from ..routing.retry import RetryConfig, calculate_delay +from .base import VendorResponse +from .concurrency import ModelConcurrencyLimiter from .native_anthropic import NativeAnthropicVendor +logger = logging.getLogger(__name__) + +# 429 Rate Limit 重试默认配置 +_RATE_LIMIT_RETRY = RetryConfig( + max_retries=4, # 4 次重试 + 1 次初始 = 5 总尝试 + initial_delay_ms=1000, + max_delay_ms=30000, + backoff_multiplier=2.0, + jitter=True, +) + class ZhipuVendor(NativeAnthropicVendor): - """智谱 GLM 原生 Anthropic 兼容端点供应商(薄透传). + """智谱 GLM 原生 Anthropic 兼容端点供应商(薄透传 + 429 重试挽回). 
通过官方 /api/anthropic 端点转发请求, 仅替换模型名和认证头,其余原样透传。 + + 429 Rate Limit 时自动重试(指数退避),降低 failover 频率。 """ _vendor_name = "zhipu" @@ -30,7 +71,269 @@ def __init__( failover_config: FailoverConfig | None = None, ) -> None: super().__init__(config, model_mapper, failover_config) + self._rl_retry = _RATE_LIMIT_RETRY + # 每模型并发限制器(config.concurrency 为 None 时禁用) + self._concurrency_limiter: ModelConcurrencyLimiter | None = ( + ModelConcurrencyLimiter(config.concurrency) + if config.concurrency is not None + else None + ) + + # ── 首选 tier 参数兼容转换 ──────────────────────────────── + + # adaptive thinking → enabled 的默认预算(Anthropic 推荐的 adaptive 等价值) + _ADAPTIVE_THINKING_BUDGET = 16000 + + async def _prepare_request( + self, + request_body: dict[str, Any], + headers: dict[str, Any], + ) -> tuple[dict[str, Any], dict[str, str]]: + """深拷贝 + 模型映射 + 认证头替换 + GLM 兼容转换. + + 当 zhipu 作为首选 tier 时(source_vendor=None),请求体来自原始客户端, + 不经过跨供应商转换通道。此处对已知的 GLM 不兼容参数做兼容转换(而非移除), + 保留完整的 CC (Claude Code) 功能特性。 + """ + body, new_headers = await super()._prepare_request(request_body, headers) + + adaptations: list[str] = [] + + # thinking.type="adaptive" 是 Anthropic Claude 4.x 新增的类型, + # GLM 不支持此类型值,会触发 [1210] 参数错误。 + # 转换为 enabled + budget 保留 thinking 能力。 + thinking = body.get("thinking") + if isinstance(thinking, dict) and thinking.get("type") == "adaptive": + body["thinking"] = { + "type": "enabled", + "budget_tokens": self._ADAPTIVE_THINKING_BUDGET, + } + adaptations.append( + f"converted_thinking_adaptive→enabled" + f"(budget={self._ADAPTIVE_THINKING_BUDGET})" + ) + + if adaptations: + logger.debug( + "ZhipuVendor first-tier compat: %s%s", + ", ".join(adaptations), + _build_zhipu_request_snapshot(body), + ) + + return body, new_headers + + # ── 非流式:429 重试 ──────────────────────────────────── + + async def send_message( + self, + request_body: dict[str, Any], + headers: dict[str, str], + ) -> VendorResponse: + """非流式请求,429 时自动重试. 
+ + 在 429 重试循环外层套上每模型并发槽位获取,确保同一时间点同一模型的 + 在途请求数不超过配置上限;超过时新请求 FIFO 排队等待。 + """ + sem = await self._maybe_acquire_concurrency_slot(request_body) + try: + return await self._send_message_with_retry(request_body, headers) + finally: + if sem is not None: + sem.release() + + async def _send_message_with_retry( + self, + request_body: dict[str, Any], + headers: dict[str, str], + ) -> VendorResponse: + """原 send_message 主体逻辑(不含并发控制).""" + max_attempts = self._rl_retry.max_attempts + + for attempt in range(max_attempts): + resp = await super().send_message(request_body, headers) + if resp.status_code != 429: + return resp + + if attempt == max_attempts - 1: + logger.warning( + "Zhipu 429 rate limit exhausted after %d attempts", + max_attempts, + ) + return resp + + delay = self._compute_retry_delay_from_headers( + resp.response_headers, attempt + ) + logger.info( + "Zhipu 429 rate limit, retry %d/%d in %.1fms", + attempt + 1, + max_attempts - 1, + delay, + ) + await asyncio.sleep(delay / 1000.0) + + return resp # pragma: no cover + + # ── 流式:429 重试 ────────────────────────────────────── + + async def send_message_stream( + self, + request_body: dict[str, Any], + headers: dict[str, str], + ) -> AsyncIterator[bytes]: + """流式请求,429 时自动重试. 
+ + 安全性:429 在 BaseVendor.send_message_stream 中于 + status code 检查阶段即 raise(在任何 chunk yield 之前), + 因此重试不会导致已发出数据不一致。 + + 在 429 重试循环外层套上每模型并发槽位获取,确保流式请求与非流式请求 + 共用同一信号量,统一限制同一模型的总在途并发数。 + """ + sem = await self._maybe_acquire_concurrency_slot(request_body) + max_attempts = self._rl_retry.max_attempts + + try: + for attempt in range(max_attempts): + try: + # 429 在 status code 检查阶段即 raise(在任何 chunk 之前), + # 因此 __anext__ 安全:要么拿到首个 chunk,要么抛异常。 + ait = super().send_message_stream(request_body, headers) + head = await ait.__anext__() + except StopAsyncIteration: + return + except httpx.HTTPStatusError as exc: + if exc.response is None or exc.response.status_code != 429: + raise + if attempt == max_attempts - 1: + logger.warning( + "Zhipu 429 stream rate limit exhausted after %d attempts", + max_attempts, + ) + raise + + delay = self._compute_retry_delay_from_response( + exc.response, attempt + ) + logger.info( + "Zhipu 429 stream rate limit, retry %d/%d in %.1fms", + attempt + 1, + max_attempts - 1, + delay, + ) + await asyncio.sleep(delay / 1000.0) + continue + + # yield 在 try/except 之外,避免捕获外部 athrow 的异常 + yield head + async for chunk in ait: + yield chunk + return + finally: + if sem is not None: + sem.release() + + # ── 并发控制 ──────────────────────────────────────────── + + async def _maybe_acquire_concurrency_slot( + self, + request_body: dict[str, Any], + ) -> asyncio.Semaphore | None: + """按映射后模型名获取并发槽位;未配置 concurrency 时返回 None. 
+ + ``map_model()`` 是纯同步字典查找,在 Semaphore 等待前调用是安全的, + 且能确保排队键与上游真实承载模型对齐。 + """ + if self._concurrency_limiter is None: + return None + raw_model = request_body.get("model", "") if request_body else "" + mapped_model = self.map_model(raw_model) if raw_model else "" + if not mapped_model: + return None + return await self._concurrency_limiter.acquire(mapped_model) + + # ── 诊断信息 ───────────────────────────────────────────── + + def get_diagnostics(self) -> dict[str, Any]: + """返回供应商运行时诊断信息,包含每模型并发状态.""" + diagnostics = super().get_diagnostics() + if self._concurrency_limiter is not None: + diagnostics["concurrency"] = self._concurrency_limiter.get_diagnostics() + return diagnostics + + def update_concurrency(self, model: str, limit: int) -> None: + """运行时更新指定模型的并发限制.""" + if self._concurrency_limiter is None: + msg = "Concurrency limiter is not enabled for this vendor" + raise ValueError(msg) + self._concurrency_limiter.set_limit(model, limit) + + # ── 延迟计算 ──────────────────────────────────────────── + + def _compute_retry_delay_from_headers( + self, + headers: dict[str, str] | None, + attempt: int, + ) -> float: + """计算重试延迟(毫秒),优先使用 server retry-after.""" + rl_info = parse_rate_limit_headers(headers, 429, None) + server_delay_s = compute_effective_retry_seconds(rl_info) + if server_delay_s is not None: + return min(server_delay_s * 1000, self._rl_retry.max_delay_ms) + return calculate_delay(attempt, self._rl_retry) + + def _compute_retry_delay_from_response( + self, + response: httpx.Response, + attempt: int, + ) -> float: + """计算重试延迟(毫秒),从 httpx.Response 提取 header.""" + rl_info = parse_rate_limit_headers( + response.headers, + response.status_code, + response.text[:500] if response.text else None, + ) + server_delay_s = compute_effective_retry_seconds(rl_info) + if server_delay_s is not None: + return min(server_delay_s * 1000, self._rl_retry.max_delay_ms) + return calculate_delay(attempt, self._rl_retry) # 向后兼容别名 ZhipuBackend = ZhipuVendor + + +def 
_build_zhipu_request_snapshot(body: dict[str, Any]) -> str: + """构建发往 zhipu 请求的轻量参数快照,用于诊断日志. + + 输出格式与 executor._build_semantic_rejection_diagnostic 一致, + 使成功请求和失败请求的日志可直接 diff 对比,定位差异维度。 + + 仅在转换发生时输出(DEBUG 级别),避免常态化日志噪声。 + """ + parts: list[str] = [] + parts.append(f"messages={len(body.get('messages', []))}") + + thinking = body.get("thinking") + if isinstance(thinking, dict): + parts.append(f"thinking_type={thinking.get('type', 'unknown')}") + + metadata = body.get("metadata") + if isinstance(metadata, dict) and metadata: + parts.append(f"metadata_keys={len(metadata)}") + + tools = body.get("tools") + if isinstance(tools, list): + parts.append(f"tools={len(tools)}") + + system = body.get("system") + if isinstance(system, list): + parts.append(f"system_blocks={len(system)}") + + try: + body_bytes = len(json.dumps(body, ensure_ascii=False).encode("utf-8")) + parts.append(f"body_bytes={body_bytes}") + except (TypeError, ValueError): + pass + + return f" [{', '.join(parts)}]" if parts else "" diff --git a/tests/e2e/__init__.py b/tests/e2e/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tests/e2e/conftest.py b/tests/e2e/conftest.py new file mode 100644 index 0000000..cf41f45 --- /dev/null +++ b/tests/e2e/conftest.py @@ -0,0 +1,199 @@ +"""E2E 集成测试共享 fixtures — Antigravity 真实凭证加载与测试对象构建.""" + +from __future__ import annotations + +import os +from typing import Any + +import pytest + +# ── 模块级门控:未设置环境变量时跳过整个 e2e 包 ── + +_SKIP_REASON = "Set RUN_ANTIGRAVITY_E2E=1 to enable Antigravity E2E tests" + + +def pytest_configure(config: pytest.Config) -> None: + config.addinivalue_line( + "markers", "e2e: End-to-end tests requiring real Antigravity credentials" + ) + + +def _load_real_credentials() -> dict[str, str] | None: + """从 ~/.coding-proxy/ 加载真实的 Google OAuth 凭证.""" + from coding.proxy.auth.providers.google import ( + _DEFAULT_CLIENT_ID, + _DEFAULT_CLIENT_SECRET, + ) + from coding.proxy.auth.store import TokenStoreManager + from 
coding.proxy.config.loader import load_config + + try: + token_store = TokenStoreManager() + token_store.load() + google_tokens = token_store.get("google") + if not google_tokens.refresh_token: + return None + + config = load_config() + + # 从 vendors 列表查找 antigravity 配置 + client_id = "" + client_secret = "" + base_url = "" + model_endpoint = "models/claude-sonnet-4-20250514" + project_id = "" + + for vc in config.vendors: + if vc.vendor == "antigravity": + client_id = vc.client_id or _DEFAULT_CLIENT_ID + client_secret = vc.client_secret or _DEFAULT_CLIENT_SECRET + base_url = ( + vc.base_url or "https://generativelanguage.googleapis.com/v1beta" + ) + model_endpoint = vc.model_endpoint or model_endpoint + break + + # 优先使用 config.yaml 中的 refresh_token,否则使用 token store + refresh_token = "" + for vc in config.vendors: + if vc.vendor == "antigravity" and vc.refresh_token: + refresh_token = vc.refresh_token + break + if not refresh_token: + refresh_token = google_tokens.refresh_token + + return { + "client_id": client_id, + "client_secret": client_secret, + "refresh_token": refresh_token, + "base_url": base_url, + "model_endpoint": model_endpoint, + "project_id": project_id, + } + except Exception: + return None + + +# ── Fixtures ── + + +@pytest.fixture(scope="session") +def e2e_credentials() -> dict[str, str]: + """加载真实 Antigravity OAuth 凭证,失败则跳过.""" + if os.environ.get("RUN_ANTIGRAVITY_E2E") != "1": + pytest.skip(_SKIP_REASON) + creds = _load_real_credentials() + if creds is None: + pytest.skip("No valid Antigravity credentials found in ~/.coding-proxy/") + return creds + + +@pytest.fixture(scope="session") +def antigravity_config(e2e_credentials: dict[str, str]) -> Any: + """构建标准 GLA 模式的 AntigravityConfig.""" + from coding.proxy.config.vendors import AntigravityConfig + + return AntigravityConfig( + enabled=True, + client_id=e2e_credentials["client_id"], + client_secret=e2e_credentials["client_secret"], + refresh_token=e2e_credentials["refresh_token"], + 
base_url=e2e_credentials["base_url"], + model_endpoint=e2e_credentials["model_endpoint"], + timeout_ms=60000, + ) + + +@pytest.fixture(scope="session") +def antigravity_config_v1internal(e2e_credentials: dict[str, str]) -> Any: + """构建 v1internal 模式的 AntigravityConfig(无 project_id;v1internal 模式下跳过自动发现).""" + from coding.proxy.config.vendors import AntigravityConfig + + return AntigravityConfig( + enabled=True, + client_id=e2e_credentials["client_id"], + client_secret=e2e_credentials["client_secret"], + refresh_token=e2e_credentials["refresh_token"], + base_url="https://cloudcode-pa.googleapis.com/v1internal", + model_endpoint=e2e_credentials["model_endpoint"], + timeout_ms=60000, + ) + + +@pytest.fixture +async def antigravity_vendor(antigravity_config: Any) -> Any: + """构建标准 GLA 模式的 AntigravityVendor(function scope,每次测试独立).""" + from coding.proxy.config.schema import FailoverConfig + from coding.proxy.routing.model_mapper import ModelMapper + from coding.proxy.vendors.antigravity import AntigravityVendor + + vendor = AntigravityVendor(antigravity_config, FailoverConfig(), ModelMapper([])) + yield vendor + await vendor.close() + + +@pytest.fixture +async def antigravity_vendor_v1internal(antigravity_config_v1internal: Any) -> Any: + """构建 v1internal 模式的 AntigravityVendor.""" + from coding.proxy.config.schema import FailoverConfig + from coding.proxy.routing.model_mapper import ModelMapper + from coding.proxy.vendors.antigravity import AntigravityVendor + + vendor = AntigravityVendor( + antigravity_config_v1internal, FailoverConfig(), ModelMapper([]) + ) + yield vendor + await vendor.close() + + +@pytest.fixture +def minimal_request_body() -> dict[str, Any]: + """最小 Anthropic 格式请求体(用于最小化 token 消耗).""" + return { + "model": "claude-sonnet-4-20250514", + "messages": [{"role": "user", "content": "Say exactly: pong"}], + "max_tokens": 32, + } + + +@pytest.fixture(scope="session") +def e2e_app(e2e_credentials: dict[str, str]) -> Any: + """构建仅启用 Antigravity 的 FastAPI 应用(临时 
DB).""" + 
import tempfile + + from coding.proxy.config.schema import ProxyConfig + from coding.proxy.server.app import create_app + + tmpdir = tempfile.mkdtemp(prefix="e2e-antigravity-") + db_path = os.path.join(tmpdir, "usage.db") + compat_path = os.path.join(tmpdir, "compat.db") + + config = ProxyConfig( + vendors=[ + { + "vendor": "antigravity", + "enabled": True, + "client_id": e2e_credentials["client_id"], + "client_secret": e2e_credentials["client_secret"], + "refresh_token": e2e_credentials["refresh_token"], + "base_url": "https://cloudcode-pa.googleapis.com/v1internal", + "model_endpoint": e2e_credentials["model_endpoint"], + "timeout_ms": 60000, + }, + ], + tiers=["antigravity"], + database={"path": db_path, "compat_state_path": compat_path}, + ) + return create_app(config) + + +@pytest.fixture +async def e2e_client(e2e_app: Any) -> Any: + """构建异步 HTTP 客户端(支持 SSE 流式测试).""" + import httpx + + transport = httpx.ASGITransport(app=e2e_app) + async with httpx.AsyncClient( + transport=transport, base_url="http://test", timeout=60.0 + ) as client: + yield client diff --git a/tests/e2e/test_e2e_http.py b/tests/e2e/test_e2e_http.py new file mode 100644 index 0000000..fe84db5 --- /dev/null +++ b/tests/e2e/test_e2e_http.py @@ -0,0 +1,263 @@ +"""Level 3 E2E: 完整 HTTP 端到端 — 模拟 Claude Code 通过 coding-proxy 使用 Antigravity.""" + +from __future__ import annotations + +import json + +import pytest + +# Claude Code 发送的典型 headers +CLAUDE_CODE_HEADERS = { + "anthropic-version": "2023-06-01", + "content-type": "application/json", + "x-api-key": "sk-ant-placeholder", +} + + +def _is_quota_exhausted(response: object) -> bool: + """检查响应是否为配额耗尽 (429).""" + if response.status_code != 429: + return False + try: + body = response.json() + err = body.get("error", {}) + msg = err.get("message", "").lower() + return "resource" in msg or "quota" in msg or "exhausted" in msg + except Exception: + return False + + +def _is_scope_error(response: object) -> bool: + """检查响应是否为 scope 不足 (403).""" + if 
response.status_code != 403: + return False + try: + body = response.json() + err = body.get("error", {}) + return "scope" in json.dumps(err).lower() + except Exception: + return False + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_http_non_streaming( + e2e_client: object, + minimal_request_body: dict, +) -> None: + """POST /v1/messages 非流式 → 验证协议对接正确.""" + response = await e2e_client.post( + "/v1/messages", + json=minimal_request_body, + headers=CLAUDE_CODE_HEADERS, + ) + + if _is_scope_error(response): + pytest.skip("GLA 端点 scope 不足,需要 v1internal 模式") + if _is_quota_exhausted(response): + print("\n[E2E] HTTP non-streaming: 协议对接正确,但配额已耗尽 (429)") + return + + assert response.status_code == 200, ( + f"预期 200,实际 {response.status_code}: {response.text[:300]}" + ) + + body = response.json() + assert body["type"] == "message", f"预期 type=message,实际: {body.get('type')}" + assert body["role"] == "assistant" + assert len(body["content"]) > 0, "content 为空" + assert body["content"][0]["type"] == "text" + assert body["usage"]["input_tokens"] > 0, "input_tokens 应 > 0" + + print( + f"\n[E2E] HTTP non-streaming 成功: model={body.get('model')}, " + f"input={body['usage']['input_tokens']}, output={body['usage']['output_tokens']}" + ) + print(f" content: {body['content'][0].get('text', '')[:100]}") + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_http_streaming(e2e_client: object) -> None: + """POST /v1/messages (stream=true) → 验证 SSE 协议.""" + body = { + "model": "claude-sonnet-4-20250514", + "messages": [{"role": "user", "content": "Say exactly: pong"}], + "max_tokens": 32, + "stream": True, + } + + events: list[str] = [] + content_chunks: list[str] = [] + + try: + async with e2e_client.stream( + "POST", "/v1/messages", json=body, headers=CLAUDE_CODE_HEADERS + ) as response: + if response.status_code == 429: + print("\n[E2E] HTTP streaming: 协议对接正确,但配额已耗尽 (429)") + return + + assert response.status_code == 200, f"预期 200,实际 {response.status_code}" + + async for 
line in response.aiter_lines(): + line = line.strip() + if not line: + continue + if line.startswith("event:"): + events.append(line[6:].strip()) + elif line.startswith("data:"): + payload = line[5:].strip() + if payload == "[DONE]": + continue + try: + data = json.loads(payload) + if data.get("type") == "content_block_delta": + delta = data.get("delta", {}) + if delta.get("type") == "text_delta": + content_chunks.append(delta.get("text", "")) + except json.JSONDecodeError: + pass + + assert "message_start" in events, f"缺少 message_start,实际: {events[:10]}" + assert "content_block_delta" in events, "缺少 content_block_delta" + assert "message_stop" in events, "缺少 message_stop" + + full_text = "".join(content_chunks) + print( + f"\n[E2E] HTTP streaming 成功: events={len(events)}, content='{full_text[:100]}'" + ) + except Exception as exc: + error_str = str(exc) + if "429" in error_str or "exhausted" in error_str.lower(): + print("\n[E2E] HTTP streaming: 协议对接正确,但配额已耗尽 (429)") + return + raise + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_http_with_tools(e2e_client: object) -> None: + """POST /v1/messages 带 tools 定义 → 请求正常往返.""" + body = { + "model": "claude-sonnet-4-20250514", + "messages": [ + {"role": "user", "content": "What is 2+2? 
Reply with just the number."} + ], + "max_tokens": 128, + "tools": [ + { + "name": "calculator", + "description": "Performs arithmetic", + "input_schema": { + "type": "object", + "properties": {"expression": {"type": "string"}}, + "required": ["expression"], + }, + } + ], + } + response = await e2e_client.post( + "/v1/messages", json=body, headers=CLAUDE_CODE_HEADERS + ) + + if _is_scope_error(response): + pytest.skip("GLA 端点 scope 不足") + if _is_quota_exhausted(response): + print("\n[E2E] HTTP with tools: 协议对接正确,配额耗尽") + return + + assert response.status_code == 200, ( + f"预期 200,实际 {response.status_code}: {response.text[:300]}" + ) + + resp_body = response.json() + assert resp_body["type"] == "message" + assert len(resp_body["content"]) > 0 + content_types = [b["type"] for b in resp_body["content"]] + print(f"\n[E2E] HTTP with tools 成功: content_types={content_types}") + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_http_health_probe(e2e_client: object) -> None: + """HEAD / 和 GET /health → 200(Claude Code 连通性探测).""" + head_resp = await e2e_client.head("/") + assert head_resp.status_code == 200, ( + f"HEAD / 预期 200,实际 {head_resp.status_code}" + ) + + get_resp = await e2e_client.get("/") + assert get_resp.status_code == 200, f"GET / 预期 200,实际 {get_resp.status_code}" + + health_resp = await e2e_client.get("/health") + assert health_resp.status_code == 200 + assert health_resp.json() == {"status": "ok"} + + print("\n[E2E] HTTP health probe 成功: HEAD /=200, GET /=200, /health=ok") + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_http_status_diagnostics(e2e_client: object) -> None: + """GET /api/status → 包含 antigravity tier 诊断信息.""" + response = await e2e_client.get("/api/status") + assert response.status_code == 200 + + data = response.json() + assert "tiers" in data + antigravity_tiers = [t for t in data["tiers"] if t["name"] == "antigravity"] + assert len(antigravity_tiers) == 1, ( + f"预期 1 个 antigravity tier,实际: {len(antigravity_tiers)}" + ) + + 
tier = antigravity_tiers[0] + assert "diagnostics" in tier, "缺少 diagnostics" + + diag = tier["diagnostics"] + print("\n[E2E] status diagnostics:") + for k, v in diag.items(): + if isinstance(v, dict): + print(f" {k}: {json.dumps(v, ensure_ascii=False)[:200]}") + else: + print(f" {k}: {v}") + + # token_manager 诊断可能为空(若未发生错误),仅验证其存在性 + if "token_manager" in diag: + print(" token_manager diagnostics present") + else: + print(" (token_manager diagnostics empty — no token errors)") + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_http_claude_code_headers(e2e_client: object) -> None: + """带完整 Claude Code headers 的请求正常(验证 x-api-key 不干扰 Antigravity).""" + headers = { + "anthropic-version": "2023-06-01", + "content-type": "application/json", + "x-api-key": "sk-ant-api03-fake-key-for-testing", + "accept": "application/json", + } + body = { + "model": "claude-sonnet-4-20250514", + "messages": [{"role": "user", "content": "Say: ok"}], + "max_tokens": 16, + } + response = await e2e_client.post("/v1/messages", json=body, headers=headers) + + if _is_quota_exhausted(response): + print("\n[E2E] Claude Code headers: 协议对接正确,配额耗尽") + return + + assert response.status_code == 200, ( + f"预期 200,实际 {response.status_code}: {response.text[:300]}" + ) + + resp_body = response.json() + assert resp_body["type"] == "message" + assert len(resp_body["content"]) > 0 + + print( + f"\n[E2E] Claude Code headers 成功: content='{resp_body['content'][0].get('text', '')[:80]}'" + ) diff --git a/tests/e2e/test_e2e_token.py b/tests/e2e/test_e2e_token.py new file mode 100644 index 0000000..dd3bb7b --- /dev/null +++ b/tests/e2e/test_e2e_token.py @@ -0,0 +1,93 @@ +"""Level 1 E2E: Google OAuth2 Token 刷新 — 验证真实凭证链路.""" + +from __future__ import annotations + +import pytest + +from coding.proxy.vendors.antigravity import GoogleOAuthTokenManager +from coding.proxy.vendors.token_manager import TokenAcquireError, TokenErrorKind + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def 
test_real_token_refresh(e2e_credentials: dict[str, str]) -> None: + """真实 refresh_token 应返回有效的 access_token(ya29. 前缀).""" + tm = GoogleOAuthTokenManager( + e2e_credentials["client_id"], + e2e_credentials["client_secret"], + e2e_credentials["refresh_token"], + ) + try: + token = await tm.get_token() + assert token, "access_token 为空" + assert token.startswith("ya29."), f"access_token 前缀异常: {token[:10]}..." + print(f"[E2E DIAG] access_token={token[:10]}... (len={len(token)})") + finally: + await tm.close() + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_real_token_caching(e2e_credentials: dict[str, str]) -> None: + """连续调用 get_token() 应返回缓存的同一 token.""" + tm = GoogleOAuthTokenManager( + e2e_credentials["client_id"], + e2e_credentials["client_secret"], + e2e_credentials["refresh_token"], + ) + try: + token1 = await tm.get_token() + token2 = await tm.get_token() + assert token1 == token2, "缓存未生效,两次返回不同 token" + assert tm._expires_at > 0, "expires_at 未被设置" + print(f"[E2E DIAG] caching OK: expires_at={tm._expires_at}") + finally: + await tm.close() + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_invalid_refresh_token_raises(e2e_credentials: dict[str, str]) -> None: + """错误的 refresh_token 应抛出 TokenAcquireError(INVALID_CREDENTIALS).""" + tm = GoogleOAuthTokenManager( + e2e_credentials["client_id"], + e2e_credentials["client_secret"], + "1//invalid_token_for_e2e_test_00000000", + ) + try: + with pytest.raises(TokenAcquireError) as exc_info: + await tm.get_token() + assert exc_info.value.kind == TokenErrorKind.INVALID_CREDENTIALS, ( + f"预期 INVALID_CREDENTIALS,实际: {exc_info.value.kind}" + ) + assert exc_info.value.needs_reauth is True + print(f"[E2E DIAG] invalid_grant 正确捕获: {exc_info.value}") + finally: + await tm.close() + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_token_invalidation_triggers_refresh( + e2e_credentials: dict[str, str], +) -> None: + """invalidate() 后重新获取应成功.""" + tm = GoogleOAuthTokenManager( + 
e2e_credentials["client_id"], + e2e_credentials["client_secret"], + e2e_credentials["refresh_token"], + ) + try: + token1 = await tm.get_token() + assert token1, "首次获取失败" + + tm.invalidate() + assert tm._expires_at == 0.0, "invalidate 后 expires_at 应为 0" + + token2 = await tm.get_token() + assert token2, "invalidate 后重新获取失败" + print( + f"[E2E DIAG] invalidation OK: token1={token1[:10]}... token2={token2[:10]}..." + ) + finally: + await tm.close() diff --git a/tests/e2e/test_e2e_vendor.py b/tests/e2e/test_e2e_vendor.py new file mode 100644 index 0000000..1781235 --- /dev/null +++ b/tests/e2e/test_e2e_vendor.py @@ -0,0 +1,327 @@ +"""Level 2 E2E: AntigravityVendor 直接调用 — 验证 GLA 和 v1internal 协议端到端.""" + +from __future__ import annotations + +import json + +import pytest + + +def _print_diagnostics(vendor: object, label: str) -> None: + diag = vendor.get_diagnostics() + print(f"\n[E2E DIAG] {label}:") + for k, v in diag.items(): + if isinstance(v, dict): + print(f" {k}: {json.dumps(v, ensure_ascii=False)[:200]}") + else: + print(f" {k}: {v}") + + +def _is_quota_exhausted(resp: object) -> bool: + """检查响应是否为配额耗尽(429 RESOURCE_EXHAUSTED). 
+ + 429 表示协议对接正确但配额已用完,测试应标记为预期行为。 + """ + if resp.status_code != 429: + return False + error_msg = (resp.error_message or "").lower() + return "resource" in error_msg or "quota" in error_msg or "exhausted" in error_msg + + +def _is_scope_error(resp: object) -> bool: + """检查响应是否为 scope 不足错误.""" + if resp.status_code != 403: + return False + return "scope" in (resp.error_message or "").lower() + + +# ── 标准 GLA 模式 ── + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_gla_non_streaming_text( + antigravity_vendor: object, + minimal_request_body: dict, +) -> None: + """GLA 模式非流式请求 — 验证协议对接正确.""" + resp = await antigravity_vendor.send_message(minimal_request_body, {}) + _print_diagnostics(antigravity_vendor, "GLA non-streaming") + + # 403 scope 不足说明 GLA 端点不适用于当前凭证(正常,需要 v1internal) + if _is_scope_error(resp): + pytest.skip("GLA 端点 scope 不足,需要 v1internal 模式") + + # 429 配额耗尽 = 协议对接正确,仅配额问题 + if _is_quota_exhausted(resp): + print("\n[E2E] GLA non-streaming: 协议对接正确,但配额已耗尽 (429)") + return + + assert resp.status_code == 200, ( + f"预期 200,实际 {resp.status_code}: {resp.error_message}" + ) + + body = json.loads(resp.raw_body) + assert body["type"] == "message", f"预期 type=message,实际: {body.get('type')}" + assert body["role"] == "assistant" + assert len(body["content"]) > 0, "content 为空" + assert body["content"][0]["type"] == "text" + assert body["stop_reason"] in ("end_turn", "max_tokens") + assert body["usage"]["input_tokens"] > 0, "input_tokens 应 > 0" + + print( + f"\n[E2E] GLA non-streaming 成功: model={body.get('model')}, " + f"input={body['usage']['input_tokens']}, output={body['usage']['output_tokens']}, " + f"stop_reason={body['stop_reason']}" + ) + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_gla_streaming_text( + antigravity_vendor: object, + minimal_request_body: dict, +) -> None: + """GLA 模式流式请求 — 验证 SSE 协议对接.""" + minimal_request_body["stream"] = True + + events: list[str] = [] + content_chunks: list[str] = [] + quota_exhausted = False + + try: + 
async for chunk in antigravity_vendor.send_message_stream( + minimal_request_body, {} + ): + text = chunk.decode("utf-8", errors="replace") + for line in text.split("\n"): + line = line.strip() + if line.startswith("event:"): + events.append(line[6:].strip()) + elif line.startswith("data:"): + try: + data = json.loads(line[5:].strip()) + if data.get("type") == "content_block_delta": + delta = data.get("delta", {}) + if delta.get("type") == "text_delta": + content_chunks.append(delta.get("text", "")) + except json.JSONDecodeError: + pass + except Exception as exc: + error_str = str(exc).lower() + if "403" in error_str and "scope" in error_str: + pytest.skip("GLA 端点 scope 不足,需要 v1internal 模式") + if "429" in error_str or "quota" in error_str or "exhausted" in error_str: + quota_exhausted = True + print("\n[E2E] GLA streaming: 协议对接正确,但配额已耗尽 (429)") + else: + raise + + if not quota_exhausted: + _print_diagnostics(antigravity_vendor, "GLA streaming") + assert "message_start" in events, ( + f"缺少 message_start 事件,实际事件: {events[:10]}" + ) + assert "content_block_delta" in events, "缺少 content_block_delta 事件" + assert "message_stop" in events, "缺少 message_stop 事件" + + full_text = "".join(content_chunks) + print( + f"\n[E2E] GLA streaming 成功: events={len(events)}, content='{full_text[:100]}'" + ) + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_gla_with_system_prompt( + antigravity_vendor: object, + minimal_request_body: dict, +) -> None: + """GLA 模式带 system prompt 的请求正常.""" + minimal_request_body["system"] = ( + "You are a test assistant. Always respond with exactly one word." 
+ ) + resp = await antigravity_vendor.send_message(minimal_request_body, {}) + + if _is_scope_error(resp): + pytest.skip("GLA 端点 scope 不足") + if _is_quota_exhausted(resp): + print("\n[E2E] GLA with system prompt: 协议对接正确,配额耗尽") + return + + assert resp.status_code == 200, ( + f"预期 200,实际 {resp.status_code}: {resp.error_message}" + ) + body = json.loads(resp.raw_body) + assert body["type"] == "message" + assert len(body["content"]) > 0 + + print( + f"\n[E2E] GLA with system prompt 成功: content='{body['content'][0].get('text', '')[:80]}'" + ) + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_gla_with_tools( + antigravity_vendor: object, + minimal_request_body: dict, +) -> None: + """GLA 模式带 tools 定义的请求正常往返.""" + minimal_request_body["tools"] = [ + { + "name": "calculator", + "description": "Performs arithmetic", + "input_schema": { + "type": "object", + "properties": {"expression": {"type": "string"}}, + "required": ["expression"], + }, + } + ] + minimal_request_body["messages"] = [ + {"role": "user", "content": "What is 2+2? 
Reply with just the number."} + ] + resp = await antigravity_vendor.send_message(minimal_request_body, {}) + + if _is_scope_error(resp): + pytest.skip("GLA 端点 scope 不足") + if _is_quota_exhausted(resp): + print("\n[E2E] GLA with tools: 协议对接正确,配额耗尽") + return + + assert resp.status_code == 200, ( + f"预期 200,实际 {resp.status_code}: {resp.error_message}" + ) + body = json.loads(resp.raw_body) + assert body["type"] == "message" + assert len(body["content"]) > 0 + + _print_diagnostics(antigravity_vendor, "GLA with tools") + print( + f"\n[E2E] GLA with tools 成功: content_types={[b['type'] for b in body['content']]}" + ) + + +# ── v1internal 模式 ── + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_v1internal_non_streaming( + antigravity_vendor_v1internal: object, + minimal_request_body: dict, +) -> None: + """v1internal 模式非流式请求 — 验证协议对接.""" + resp = await antigravity_vendor_v1internal.send_message(minimal_request_body, {}) + + _print_diagnostics(antigravity_vendor_v1internal, "v1internal non-streaming") + + # 429 = 协议对接正确,仅配额问题 + if _is_quota_exhausted(resp): + diag = antigravity_vendor_v1internal.get_diagnostics() + print( + f"\n[E2E] v1internal non-streaming: 协议对接正确 (is_v1internal={diag.get('is_v1internal_mode')}),但配额已耗尽 (429)" + ) + return + + assert resp.status_code == 200, ( + f"预期 200,实际 {resp.status_code}: {resp.error_message}" + ) + body = json.loads(resp.raw_body) + assert body["type"] == "message" + assert body["role"] == "assistant" + assert len(body["content"]) > 0 + + diag = antigravity_vendor_v1internal.get_diagnostics() + print( + f"\n[E2E] v1internal non-streaming 成功: " + f"is_v1internal={diag.get('is_v1internal_mode')}, " + f"project_id_source={diag.get('project_id_source')}, " + f"input={body['usage']['input_tokens']}, output={body['usage']['output_tokens']}" + ) + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_v1internal_streaming( + antigravity_vendor_v1internal: object, + minimal_request_body: dict, +) -> None: + """v1internal 模式流式请求 
— 验证 SSE 协议.""" + minimal_request_body["stream"] = True + + events: list[str] = [] + content_chunks: list[str] = [] + quota_exhausted = False + + try: + async for chunk in antigravity_vendor_v1internal.send_message_stream( + minimal_request_body, {} + ): + text = chunk.decode("utf-8", errors="replace") + for line in text.split("\n"): + line = line.strip() + if line.startswith("event:"): + events.append(line[6:].strip()) + elif line.startswith("data:"): + try: + data = json.loads(line[5:].strip()) + if data.get("type") == "content_block_delta": + delta = data.get("delta", {}) + if delta.get("type") == "text_delta": + content_chunks.append(delta.get("text", "")) + except json.JSONDecodeError: + pass + except Exception as exc: + error_str = str(exc) + if "429" in error_str: + quota_exhausted = True + print("\n[E2E] v1internal streaming: 协议对接正确,但配额已耗尽 (429)") + else: + raise + + if not quota_exhausted: + _print_diagnostics(antigravity_vendor_v1internal, "v1internal streaming") + assert "message_start" in events, "缺少 message_start" + assert "content_block_delta" in events, "缺少 content_block_delta" + assert "message_stop" in events, "缺少 message_stop" + + full_text = "".join(content_chunks) + print( + f"\n[E2E] v1internal streaming 成功: events={len(events)}, content='{full_text[:100]}'" + ) + + +@pytest.mark.e2e +@pytest.mark.asyncio +async def test_project_id_auto_discovery( + antigravity_vendor_v1internal: object, + minimal_request_body: dict, +) -> None: + """首次请求后 v1internal 模式状态和 project_id 发现结果.""" + resp = await antigravity_vendor_v1internal.send_message(minimal_request_body, {}) + + diag = antigravity_vendor_v1internal.get_diagnostics() + source = diag.get("project_id_source", "unknown") + is_v1 = diag.get("is_v1internal_mode", False) + + print(f"\n[E2E] project_id discovery: source={source}, is_v1internal={is_v1}") + + # v1internal 模式应已启用(由 base_url 配置驱动) + assert is_v1 is True, "v1internal 模式应已启用" + assert source in ("discovered", "none", "configured"), ( + f"未知的 
project_id_source: {source}" + ) + + # 请求应到达了 API 端点(429 配额耗尽或 200 成功都说明协议对接正确) + assert resp.status_code in (200, 429), ( + f"预期 200/429,实际 {resp.status_code}: {(resp.error_message or '')[:200]}" + ) + + if resp.status_code == 429: + print(" 配额已耗尽 (429),但协议对接验证正确") + elif source == "discovered": + print(f" discovered_project_id={diag.get('discovered_project_id')}") + elif source == "none": + print(" 未发现 project_id,v1internal 无需 project_id") diff --git a/tests/test_antigravity.py b/tests/test_antigravity.py index 6256bfb..cc93127 100644 --- a/tests/test_antigravity.py +++ b/tests/test_antigravity.py @@ -384,12 +384,12 @@ def test_is_v1internal_mode_with_project_id_and_v1internal_url(): def test_is_v1internal_mode_without_project_id(): - """未配置 project_id 时即使 URL 含 v1internal 也不启用.""" + """v1internal 模式由 base_url 驱动,无需 project_id(与参考项目对齐).""" config = AntigravityConfig( base_url="https://cloudcode-pa.googleapis.com/v1internal", ) vendor = AntigravityVendor(config, FailoverConfig(), ModelMapper([])) - assert vendor._is_v1internal_mode() is False + assert vendor._is_v1internal_mode() is True def test_is_v1internal_mode_standard_gla_url(): @@ -527,7 +527,7 @@ async def test_discover_project_id_single_active_project(): assert result == "my-gcp-123" assert vendor._project_id_discovered == "my-gcp-123" - assert vendor._base_url == "https://cloudcode-pa.googleapis.com/v1internal" + assert vendor._base_url == "https://cloudcode-pa.googleapis.com" assert vendor._is_v1internal_mode() is True @@ -743,20 +743,19 @@ async def mock_discover(token): def test_is_v1internal_mode_uses_effective_project_id(): - """_is_v1internal_mode 应基于 _effective_project_id 判断.""" + """_is_v1internal_mode 应基于 base_url 判断(不再依赖 project_id).""" config = AntigravityConfig(base_url=_V1INTERNAL_BASE_URL) vendor = AntigravityVendor(config, FailoverConfig(), ModelMapper([])) - # 未配置、未发现 → False - assert vendor._is_v1internal_mode() is False + # base_url 含 v1internal → True(即使无 project_id) + assert 
vendor._is_v1internal_mode() is True - # 发现后 → True + # 发现 project_id 不影响 v1internal 模式判断 vendor._project_id_discovered = "found-it" assert vendor._is_v1internal_mode() is True - # 配置值覆盖发现值 + # 清除发现值也不影响 vendor._project_id_discovered = "" - vendor._project_id = "manual" assert vendor._is_v1internal_mode() is True diff --git a/tests/test_app_routes.py b/tests/test_app_routes.py index 8df0277..4c460e3 100644 --- a/tests/test_app_routes.py +++ b/tests/test_app_routes.py @@ -286,6 +286,76 @@ def test_count_tokens_falls_back_to_tiers0_on_cold_start(): assert resp.json()["input_tokens"] == 88 +def test_count_tokens_triggers_zhipu_to_target_channel(caplog): + """count_tokens 请求体含 zhipu 私有产物时,应触发跨供应商通道并返回 200. + + 回归测试:routes.py 历史上错误访问 target_vendor.name(BaseVendor 仅暴露 get_name() + 方法,并无 name 属性),当 infer_source_vendor_from_body() 推断出非空 source 时 + 会抛 AttributeError 返回 500。本用例通过注入 zhipu 私有产物(srvtoolu_* id 与 + server_tool_use 块)触发该路径,断言 200 且 adaptations 日志被打印。 + """ + config = ProxyConfig( + tiers=[ + {"vendor": "anthropic", "enabled": True, "api_key": "sk-ant-test"}, + ], + database={"path": "/tmp/test-count-tokens-zhipu-channel.db"}, + ) + app = create_app(config) + + mock_response = MagicMock() + mock_response.content = b'{"input_tokens": 99}' + mock_response.status_code = 200 + + body_with_zhipu_artifact = { + "model": "claude-sonnet-4-20250514", + "messages": [ + {"role": "user", "content": "Hello"}, + { + "role": "assistant", + "content": [ + { + "type": "server_tool_use", + "id": "srvtoolu_abc123", + "name": "web_search", + "input": {"query": "test"}, + }, + ], + }, + { + "role": "user", + "content": [ + { + "type": "tool_result", + "tool_use_id": "srvtoolu_abc123", + "content": "result", + }, + ], + }, + ], + } + + with TestClient(app) as client: + with patch.object( + httpx.AsyncClient, + "post", + new_callable=AsyncMock, + return_value=mock_response, + ): + with caplog.at_level(logging.DEBUG, logger="coding.proxy.server.routes"): + resp = client.post( + 
"/v1/messages/count_tokens?beta=true", + json=body_with_zhipu_artifact, + headers={"authorization": "Bearer sk-test"}, + ) + assert resp.status_code == 200 + assert resp.json()["input_tokens"] == 99 + # 通道被实际触发的证据:debug 日志含 "count_tokens channel zhipu → anthropic" + assert any( + "count_tokens channel zhipu" in record.message + for record in caplog.records + ), "expected zhipu→anthropic channel adaptation log" + + def test_status_exposes_vendor_diagnostics(): """状态接口暴露供应商诊断信息,便于排查凭证交换异常.""" config = ProxyConfig( diff --git a/tests/test_native_api_handler.py b/tests/test_native_api_handler.py index d8db031..14be66c 100644 --- a/tests/test_native_api_handler.py +++ b/tests/test_native_api_handler.py @@ -14,6 +14,7 @@ from __future__ import annotations +import json from collections.abc import Iterator import httpx @@ -372,3 +373,191 @@ def factory(make_transport): r = client.request(method, "/api/openai/v1/files/abc") assert r.status_code == 200 assert captured[0].method == method + + +# ── Gemini batchEmbedContents 端到端 ───────────────────────────── + + +def test_gemini_batch_embed_forwards_correctly() -> None: + """Gemini batchEmbedContents 端点(字面冒号)正确转发.""" + + def route(request: httpx.Request) -> httpx.Response: + return httpx.Response( + 200, + json={"embeddings": [{"values": [0.1, 0.2]}]}, + ) + + def factory(make_transport): + cfg = NativeApiConfig( + gemini=NativeProviderConfig( + enabled=True, base_url="https://generativelanguage.googleapis.com" + ), + ) + transport = make_transport(route) + return NativeProxyHandler(cfg, transport=transport), transport + + for client, captured in _make_app(factory): + r = client.post( + "/api/gemini/v1beta/models/gemini-embedding-001:batchEmbedContents?key=secret123", + json={ + "requests": [ + { + "model": "models/gemini-embedding-001", + "content": {"parts": [{"text": "hello"}]}, + } + ] + }, + ) + assert r.status_code == 200 + assert r.json()["embeddings"][0]["values"] == [0.1, 0.2] + upstream = captured[0] + # 上游 URL 
必须含字面冒号,不含 %3A + upstream_str = str(upstream.url) + assert ":batchEmbedContents" in upstream_str + assert "%3A" not in upstream_str + assert upstream.url.params.get("key") == "secret123" + + +def test_gemini_url_encoded_colon_decoded_for_upstream() -> None: + """当 %3A 到达代理时,上游必须收到字面冒号.""" + + def route(request: httpx.Request) -> httpx.Response: + return httpx.Response(200, json={"ok": True}) + + def factory(make_transport): + cfg = NativeApiConfig( + gemini=NativeProviderConfig( + enabled=True, base_url="https://generativelanguage.googleapis.com" + ), + ) + transport = make_transport(route) + return NativeProxyHandler(cfg, transport=transport), transport + + for client, captured in _make_app(factory): + r = client.post( + "/api/gemini/v1beta/models/gemini-embedding-001%3AbatchEmbedContents?key=k", + json={"requests": []}, + ) + assert r.status_code == 200 + upstream = captured[0] + upstream_str = str(upstream.url) + # 上游 URL 必须含字面冒号,不含 %3A + assert "%3A" not in upstream_str + assert ":batchEmbedContents" in upstream_str + + +# ── Gemini embedding Vertex AI 格式转换 ───────────────────────── + + +def test_gemini_vertex_embed_content_single() -> None: + """非官方上游时,embedContent 转为 Vertex AI 格式.""" + + def route(request: httpx.Request) -> httpx.Response: + body = json.loads(request.content) + assert "content" in body + assert "model" not in body + assert "requests" not in body + assert ":embedContent" in str(request.url) + assert "v1beta1/publishers/google/models" in str(request.url) + return httpx.Response(200, json={"embedding": {"values": [0.1, 0.2]}}) + + def factory(make_transport): + cfg = NativeApiConfig( + gemini=NativeProviderConfig(enabled=True, base_url="http://llms.as-in.io"), + ) + transport = make_transport(route) + return NativeProxyHandler(cfg, transport=transport), transport + + for client, captured in _make_app(factory): + r = client.post( + "/api/gemini/v1beta/models/gemini-embedding-2-preview:embedContent", + json={ + "model": 
"models/gemini-embedding-2-preview", + "content": {"parts": [{"text": "hello"}]}, + }, + ) + assert r.status_code == 200 + assert "embedding" in r.json() + + +def test_gemini_vertex_batch_embed_contents() -> None: + """非官方上游时,batchEmbedContents 拆分为多次 embedContent 并聚合.""" + + call_count = 0 + + def route(request: httpx.Request) -> httpx.Response: + nonlocal call_count + call_count += 1 + body = json.loads(request.content) + assert "content" in body + assert ":embedContent" in str(request.url) + assert "v1beta1/publishers/google/models" in str(request.url) + return httpx.Response( + 200, + json={"embedding": {"values": [float(call_count), 0.5]}}, + ) + + def factory(make_transport): + cfg = NativeApiConfig( + gemini=NativeProviderConfig(enabled=True, base_url="http://llms.as-in.io"), + ) + transport = make_transport(route) + return NativeProxyHandler(cfg, transport=transport), transport + + for client, captured in _make_app(factory): + r = client.post( + "/api/gemini/v1beta/models/gemini-embedding-2-preview:batchEmbedContents", + json={ + "requests": [ + { + "model": "models/gemini-embedding-2-preview", + "content": {"parts": [{"text": "hello"}]}, + }, + { + "model": "models/gemini-embedding-2-preview", + "content": {"parts": [{"text": "world"}]}, + }, + ] + }, + ) + assert r.status_code == 200 + data = r.json() + assert "embeddings" in data + assert len(data["embeddings"]) == 2 + assert data["embeddings"][0]["values"] == [1.0, 0.5] + assert data["embeddings"][1]["values"] == [2.0, 0.5] + assert call_count == 2 + + +def test_gemini_vertex_embed_official_upstream_unchanged() -> None: + """官方上游时,batchEmbedContents 走原始透传路径,不做格式转换.""" + + def route(request: httpx.Request) -> httpx.Response: + return httpx.Response(200, json={"embeddings": [{"values": [0.1, 0.2]}]}) + + def factory(make_transport): + cfg = NativeApiConfig( + gemini=NativeProviderConfig( + enabled=True, base_url="https://generativelanguage.googleapis.com" + ), + ) + transport = make_transport(route) + 
return NativeProxyHandler(cfg, transport=transport), transport + + for client, captured in _make_app(factory): + r = client.post( + "/api/gemini/v1beta/models/gemini-embedding-001:batchEmbedContents?key=k", + json={ + "requests": [ + { + "model": "models/gemini-embedding-001", + "content": {"parts": [{"text": "hello"}]}, + } + ] + }, + ) + assert r.status_code == 200 + # 官方上游走原始路径,URL 保持 v1beta/models/ 格式 + upstream = captured[0] + assert "v1beta/models" in str(upstream.url) + assert "v1beta1/publishers" not in str(upstream.url) diff --git a/tests/test_native_api_operation.py b/tests/test_native_api_operation.py index 64cd160..fc237bc 100644 --- a/tests/test_native_api_operation.py +++ b/tests/test_native_api_operation.py @@ -55,6 +55,12 @@ def test_classify_openai(path: str, expected: str) -> None: ("/v1beta/models/text-embedding-004:embedContent", "embedding"), ("/v1beta/models/text-embedding-004:batchEmbedContents", "embedding.batch"), ("/v1beta/models/imagegeneration@006:predict", "predict"), + # %3A (URL 编码冒号) 兼容性 + ("/v1beta/models/gemini-embedding-001%3AbatchEmbedContents", "embedding.batch"), + ("/v1beta/models/text-embedding-004%3AembedContent", "embedding"), + ("/v1beta/models/gemini-2.0-flash%3AgenerateContent", "generate_content"), + ("/v1beta/models/gemini-2.0-flash%3AstreamGenerateContent", "generate_content"), + ("/v1beta/models/gemini-1.5-pro%3AcountTokens", "count_tokens"), ("/v1beta/cachedContents", "cache"), ("/v1beta/cachedContents/cachedContents-xyz", "cache"), ("/v1beta/files", "file"), @@ -128,3 +134,14 @@ def test_is_stream_path() -> None: # OpenAI / Anthropic 不走路径判定(以响应 content-type 为准) assert not OperationClassifier.is_stream_path("openai", "/v1/chat/completions") assert not OperationClassifier.is_stream_path("anthropic", "/v1/messages") + + +def test_is_stream_path_with_encoded_colon() -> None: + """%3A (URL 编码冒号) 也应被 is_stream_path 识别.""" + assert OperationClassifier.is_stream_path( + "gemini", 
"/v1beta/models/gemini-2.0-flash%3AstreamGenerateContent" + ) + # %3A + 非流式路径仍应返回 False + assert not OperationClassifier.is_stream_path( + "gemini", "/v1beta/models/gemini-2.0-flash%3AgenerateContent" + ) diff --git a/tests/test_router_executor.py b/tests/test_router_executor.py index 1e40ea6..9506e67 100644 --- a/tests/test_router_executor.py +++ b/tests/test_router_executor.py @@ -20,11 +20,15 @@ build_canonical_request, ) from coding.proxy.routing.executor import ( + _SESSION_TITLE_MAX_LEN, _VENDOR_PROTOCOL_LABEL_MAP, + _build_semantic_rejection_diagnostic, + _extract_session_title, _has_tool_results, _is_likely_request_format_error, _log_vendor_response_error, _RouteExecutor, + _sanitize_user_text, ) from coding.proxy.routing.session_manager import RouteSessionManager from coding.proxy.routing.tier import VendorTier @@ -222,7 +226,7 @@ async def test_eligible_when_all_checks_pass(self): headers = {} caps = RequestCapabilities() req = build_canonical_request(body, headers) - session_record = await exec_inst._session_mgr.get_or_create_record( + session_record, _is_new = await exec_inst._session_mgr.get_or_create_record( req.session_key, req.trace_id ) reasons: list[str] = [] @@ -246,7 +250,7 @@ async def test_skip_when_capability_unsupported(self): body = {"model": "test"} headers = {} req = build_canonical_request(body, headers) - session_record = await exec_inst._session_mgr.get_or_create_record( + session_record, _is_new = await exec_inst._session_mgr.get_or_create_record( req.session_key, req.trace_id ) reasons: list[str] = [] @@ -275,7 +279,7 @@ async def test_skip_when_unsafe_compatibility(self): body = {"model": "test", "thinking": {"type": "enabled"}} headers = {} req = build_canonical_request(body, headers) - session_record = await exec_inst._session_mgr.get_or_create_record( + session_record, _is_new = await exec_inst._session_mgr.get_or_create_record( req.session_key, req.trace_id ) reasons: list[str] = [] @@ -651,9 +655,10 @@ class 
TestRouteSessionManagerIntegration: @pytest.mark.asyncio async def test_get_or_create_without_store(self): mgr = RouteSessionManager(compat_session_store=None) - record = await mgr.get_or_create_record("sk_test", "trace_1") - # 无 store 时返回 None(由 executor 层面处理空 record 场景) + record, is_new = await mgr.get_or_create_record("sk_test", "trace_1") + # 无 store 时返回 (None, False) assert record is None + assert is_new is False @pytest.mark.asyncio async def test_persist_session_without_store_is_noop(self): @@ -1948,3 +1953,374 @@ def test_returns_body_for_unknown_tier(self): result = exec_inst._prepare_body_for_tier(body, tier, source_vendor="zhipu") assert result is body + + +class TestBuildSemanticRejectionDiagnostic: + """覆盖 _build_semantic_rejection_diagnostic 函数 — 用于诊断 [1210] 等供应商语义拒绝. + + 重点验证: + - baseline 字段(model / messages)始终输出 + - 仅当参数存在时才输出相关项(避免日志噪声) + - 各字段输出格式稳定 + """ + + def test_baseline_minimal_body(self): + """最小请求体:仅输出 model + messages.""" + body = {"model": "glm-5-turbo", "messages": [{"role": "user", "content": "hi"}]} + result = _build_semantic_rejection_diagnostic(body) + assert "model=glm-5-turbo" in result + assert "messages=1" in result + # 不应输出未使用的字段 + assert "thinking" not in result + assert "tools" not in result + assert "cache_control" not in result + + def test_includes_thinking_param(self): + body = { + "model": "glm-5-turbo", + "messages": [], + "thinking": {"type": "enabled", "budget_tokens": 1024}, + } + result = _build_semantic_rejection_diagnostic(body) + assert "thinking=" in result + assert "budget_tokens" in result + + def test_includes_system_string(self): + body = { + "model": "glm-5-turbo", + "messages": [], + "system": "You are helpful." 
* 5, + } + result = _build_semantic_rejection_diagnostic(body) + assert "system_kind=string(len=" in result + + def test_includes_system_blocks_with_cache_control(self): + body = { + "model": "glm-5-turbo", + "messages": [], + "system": [ + { + "type": "text", + "text": "rule1", + "cache_control": {"type": "ephemeral"}, + }, + {"type": "text", "text": "rule2"}, + ], + } + result = _build_semantic_rejection_diagnostic(body) + assert "system_blocks=2,cc=1" in result + + def test_includes_tools_and_tool_choice(self): + body = { + "model": "glm-5-turbo", + "messages": [], + "tools": [{"name": "a"}, {"name": "b"}, {"name": "c"}], + "tool_choice": {"type": "auto"}, + } + result = _build_semantic_rejection_diagnostic(body) + assert "tools=3" in result + assert "tool_choice=" in result + + def test_includes_sampling_params(self): + body = { + "model": "glm-5-turbo", + "messages": [], + "max_tokens": 8192, + "temperature": 0.7, + "top_p": 0.9, + "top_k": 40, + "stop_sequences": ["\n\n", "END"], + } + result = _build_semantic_rejection_diagnostic(body) + assert "max_tokens=8192" in result + assert "temperature=0.7" in result + assert "top_p=0.9" in result + assert "top_k=40" in result + assert "stop_sequences=2" in result + + def test_includes_stream_and_metadata(self): + body = { + "model": "glm-5-turbo", + "messages": [], + "stream": True, + "metadata": {"user_id": "x", "session_id": "y"}, + } + result = _build_semantic_rejection_diagnostic(body) + assert "stream=True" in result + assert "metadata_keys=2" in result + + def test_content_type_distribution(self): + body = { + "model": "glm-5-turbo", + "messages": [ + { + "role": "user", + "content": [ + {"type": "text", "text": "hi"}, + {"type": "text", "text": "bye"}, + {"type": "image", "source": {}}, + ], + }, + { + "role": "assistant", + "content": [ + {"type": "tool_use", "id": "t1", "name": "x", "input": {}}, + ], + }, + ], + } + result = _build_semantic_rejection_diagnostic(body) + # 排序为字母序 + assert 
"content_types={image:1,text:2,tool_use:1}" in result + + def test_content_type_string_messages(self): + """messages.content 为 string 时计入 string:N.""" + body = { + "model": "glm-5-turbo", + "messages": [ + {"role": "user", "content": "hello"}, + {"role": "assistant", "content": "hi"}, + ], + } + result = _build_semantic_rejection_diagnostic(body) + assert "content_types={string:2}" in result + + def test_thinking_blocks_in_history(self): + body = { + "model": "glm-5-turbo", + "messages": [ + { + "role": "assistant", + "content": [ + {"type": "thinking", "thinking": "..."}, + {"type": "redacted_thinking", "data": "..."}, + {"type": "text", "text": "result"}, + ], + } + ], + } + result = _build_semantic_rejection_diagnostic(body) + assert "thinking_blocks_in_history=2" in result + + def test_cache_control_in_messages_or_tools(self): + body = { + "model": "glm-5-turbo", + "messages": [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "x", + "cache_control": {"type": "ephemeral"}, + }, + ], + } + ], + } + result = _build_semantic_rejection_diagnostic(body) + assert "cache_control_fields=present" in result + + def test_body_bytes_estimated(self): + body = {"model": "glm-5-turbo", "messages": [{"role": "user", "content": "ok"}]} + result = _build_semantic_rejection_diagnostic(body) + assert "body_bytes=" in result + + def test_body_bytes_skipped_when_unserializable(self): + """请求体含非可序列化对象时不抛异常.""" + + class NonSerializable: + pass + + body = { + "model": "glm-5-turbo", + "messages": [], + "metadata": {"obj": NonSerializable()}, + } + # 不应抛异常 + result = _build_semantic_rejection_diagnostic(body) + assert "model=glm-5-turbo" in result + + def test_combined_real_world_failure_case(self): + """模拟真实失败请求形态(messages=1,无 thinking/cache_control,含 system + tools).""" + body = { + "model": "glm-5-turbo", + "messages": [{"role": "user", "content": "需要修复一个 bug"}], + "system": [{"type": "text", "text": "You are Claude Code."}], + "tools": [{"name": "Read"}, 
{"name": "Edit"}], + "max_tokens": 8192, + "temperature": 1.0, + "metadata": {"user_id": "x"}, + "stream": True, + } + result = _build_semantic_rejection_diagnostic(body) + assert "model=glm-5-turbo" in result + assert "messages=1" in result + assert "system_blocks=1" in result + assert "tools=2" in result + assert "max_tokens=8192" in result + assert "temperature=1.0" in result + assert "metadata_keys=1" in result + assert "stream=True" in result + # 不应包含未出现的项 + assert "thinking_blocks_in_history" not in result + assert "cache_control_fields" not in result + + +# ── Session 标题清洗与抽取测试 ───────────────────────────────── + + +class TestSanitizeUserText: + """``_sanitize_user_text`` — 剥离 CC 注入的系统级 XML 块. + + 覆盖典型 system-reminder/user-preferences 噪声、slash command + 短路、空白折叠与边界场景。 + """ + + def test_strips_system_reminder(self): + raw = "MCP 指令这是用户真实输入" + assert _sanitize_user_text(raw) == "这是用户真实输入" + + def test_strips_user_preferences(self): + raw = "用户问题遵循 AGENTS.md" + assert _sanitize_user_text(raw) == "用户问题" + + def test_strips_multiple_noise_blocks(self): + raw = ( + "A" + "B" + "C" + "D" + "真实输入文本" + "P" + ) + assert _sanitize_user_text(raw) == "真实输入文本" + + def test_strips_multiline_system_reminder(self): + """多行 system-reminder 块需被 DOTALL 完整匹配剥离.""" + raw = ( + "\n" + "# MCP Server Instructions\n" + "Use this server to fetch ...\n" + "\n" + "TITLE 中的 Session 标题应当取自用户输入" + ) + assert _sanitize_user_text(raw) == "TITLE 中的 Session 标题应当取自用户输入" + + def test_strips_tag_with_attributes(self): + """容忍标签携带属性(如 ).""" + raw = 'noise真实' + assert _sanitize_user_text(raw) == "真实" + + def test_slash_command_with_args(self): + raw = ( + "commit (user)" + "/commit" + "修复标题" + ) + assert _sanitize_user_text(raw) == "/commit 修复标题" + + def test_slash_command_no_args(self): + raw = "/review" + assert _sanitize_user_text(raw) == "/review" + + def test_collapses_whitespace(self): + raw = "X\n\n 多余 空白\t\t折叠 " + assert _sanitize_user_text(raw) == "多余 空白 折叠" + + def 
test_empty_after_strip(self): + raw = "仅噪声" + assert _sanitize_user_text(raw) == "" + + def test_empty_input(self): + assert _sanitize_user_text("") == "" + + def test_preserves_user_xml_like_content(self): + """用户输入中合法的 XML/HTML 片段(非白名单标签)需完整保留.""" + raw = "请帮我审查这段代码:
hello
是否符合规范?" + assert _sanitize_user_text(raw) == raw + + def test_strips_local_command_output(self): + raw = "build ok构建后的下一步问题" + assert _sanitize_user_text(raw) == "构建后的下一步问题" + + +class TestExtractSessionTitle: + """``_extract_session_title`` — 端到端从 CanonicalRequest 抽取标题.""" + + @staticmethod + def _build_request(messages: list[dict]): + return build_canonical_request({"model": "test", "messages": messages}, {}) + + def test_truncates_to_max_len(self): + long_text = "用户输入文本" * 20 + req = self._build_request([{"role": "user", "content": long_text}]) + title = _extract_session_title(req) + assert len(title) == _SESSION_TITLE_MAX_LEN + assert title == long_text[:_SESSION_TITLE_MAX_LEN] + + def test_strips_noise_from_first_user_message(self): + raw = ( + "MCP 指令" + "偏好" + "测试标题 ABC" + ) + req = self._build_request([{"role": "user", "content": raw}]) + assert _extract_session_title(req) == "测试标题 ABC" + + def test_handles_real_cc_first_message_shape(self): + """模拟 CC 真实首条消息(多个连续 system-reminder + 用户文本).""" + raw = ( + "\n# MCP Server Instructions\n..." 
+ "\nThe following skills...\n" + "\nPlan mode is active...\n" + "\n\nTITLE 中的 Session 标题应当取自用户输入的信息前 30 个字\n\n" + "始终遵循 AGENTS.md" + ) + req = self._build_request([{"role": "user", "content": raw}]) + title = _extract_session_title(req) + assert title.startswith("TITLE 中的 Session") + assert len(title) <= _SESSION_TITLE_MAX_LEN + + def test_extracts_slash_command(self): + raw = ( + "/commit" + "feat: 新增标题清洗" + ) + req = self._build_request([{"role": "user", "content": raw}]) + assert _extract_session_title(req) == "/commit feat: 新增标题清洗" + + def test_returns_empty_when_only_noise(self): + raw = "纯噪声" + req = self._build_request([{"role": "user", "content": raw}]) + assert _extract_session_title(req) == "" + + def test_returns_empty_for_no_user_messages(self): + req = self._build_request([{"role": "assistant", "content": "你好"}]) + assert _extract_session_title(req) == "" + + def test_skips_noise_only_part_to_find_real_input(self): + """首个 user text part 全噪声时,fallback 到下一个非空 user part.""" + messages = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "noise", + }, + {"type": "text", "text": "真实问题"}, + ], + } + ] + req = self._build_request(messages) + assert _extract_session_title(req) == "真实问题" + + def test_skips_assistant_role(self): + """assistant 角色的文本不应被作为标题候选.""" + messages = [ + {"role": "assistant", "content": "上一轮回答"}, + {"role": "user", "content": "新的用户问题"}, + ] + req = self._build_request(messages) + assert _extract_session_title(req) == "新的用户问题" diff --git a/tests/test_schema.py b/tests/test_schema.py index ae7120e..30d691c 100644 --- a/tests/test_schema.py +++ b/tests/test_schema.py @@ -31,7 +31,8 @@ def test_antigravity_fields_set(): def test_zhipu_fields_set(): assert "api_key" in _ZHIPU_FIELDS - assert len(_ZHIPU_FIELDS) == 1 + assert "concurrency" in _ZHIPU_FIELDS + assert len(_ZHIPU_FIELDS) == 2 def test_vendor_exclusive_fields_mapping_complete(): diff --git a/tests/test_session_aware.py b/tests/test_session_aware.py index 
0c08449..29518e5 100644 --- a/tests/test_session_aware.py +++ b/tests/test_session_aware.py @@ -160,6 +160,8 @@ async def test_query_recent_sessions_basic(logger): model_served="claude-sonnet", input_tokens=100 * (i + 1), output_tokens=50 * (i + 1), + cache_creation_tokens=10 * (i + 1), + cache_read_tokens=1000 * (i + 1), session_key="session-alpha", duration_ms=100 + i * 50, ) @@ -186,9 +188,15 @@ async def test_query_recent_sessions_basic(logger): alpha = next(s for s in sessions if s["session_key"] == "session-alpha") assert alpha["total_requests"] == 3 - assert alpha["total_tokens"] == (100 + 200 + 300) + (50 + 100 + 150) - assert alpha["total_input"] == 100 + 200 + 300 - assert alpha["total_output"] == 50 + 100 + 150 + expected_input = 100 + 200 + 300 + expected_output = 50 + 100 + 150 + expected_cache_creation = 10 + 20 + 30 + expected_cache_read = 1000 + 2000 + 3000 + assert alpha["total_tokens"] == ( + expected_input + expected_output + expected_cache_creation + expected_cache_read + ) + assert alpha["total_input"] == expected_input + assert alpha["total_output"] == expected_output assert "claude-sonnet" in alpha["models"] assert "anthropic" in alpha["vendors"] assert alpha["success_rate"] == 100.0 @@ -269,12 +277,15 @@ async def test_query_session_profile_found(logger): model_served="m", input_tokens=100, output_tokens=50, + cache_creation_tokens=20, + cache_read_tokens=400, session_key="profile-test", ) profile = await logger.query_session_profile("profile-test") assert profile is not None assert profile["session_key"] == "profile-test" assert profile["total_requests"] == 1 + assert profile["total_tokens"] == 100 + 50 + 20 + 400 @pytest.mark.asyncio diff --git a/tests/test_vendor_channels.py b/tests/test_vendor_channels.py index 774b85a..f9c9bb5 100644 --- a/tests/test_vendor_channels.py +++ b/tests/test_vendor_channels.py @@ -15,12 +15,14 @@ from coding.proxy.convert.vendor_channels import ( VENDOR_TRANSITIONS, + _enforce_pairing_sanity_pass, 
_remove_vendor_blocks, _rewrite_srvtoolu_ids, _strip_cache_control, enforce_anthropic_tool_pairing, get_transition_channel, infer_source_vendor_from_body, + normalize_for_zhipu, prepare_copilot_to_zhipu, prepare_zhipu_to_anthropic, prepare_zhipu_to_copilot, @@ -1008,6 +1010,91 @@ def test_skips_non_matching_user_tool_result(self): assert count == 0 assert body["messages"][0]["content"][0]["tool_use_id"] == "toolu_other" + def test_two_pass_handles_inline_tool_result_before_server_tool_use(self): + """乱序回归: 同一 assistant content 内 tool_result 出现在 server_tool_use 之前. + + Zhipu GLM-5 流式响应中已观察到的真实形态。若使用单遍扫描, + Case B 在 tool_result 块上执行时 ``id_map`` 尚未被 Case A 填入, + 会漏改 ``tool_result.tool_use_id``,留下旧的 ``srvtoolu_*`` 引用, + 最终触发 Anthropic API 的 ``messages.x: tool_use ids were found + without tool_result blocks immediately after`` 400 错误。 + + 修复后的两遍扫描必须保证 ``id_map`` 在 Pass 1 完整建立、 + Pass 2 再统一改写 tool_result.tool_use_id, 与块出现顺序无关。 + """ + body = { + "messages": [ + {"role": "user", "content": "ask"}, + { + "role": "assistant", + "content": [ + { + "type": "tool_result", + "tool_use_id": "srvtoolu_oof", + "content": "out", + }, + { + "type": "server_tool_use", + "id": "srvtoolu_oof", + "name": "bash", + "input": {}, + }, + ], + }, + ], + } + count, id_map = _rewrite_srvtoolu_ids(body) + assert count == 1 + new_id = id_map["srvtoolu_oof"] + assert new_id.startswith("toolu_normalized_") + + blocks = body["messages"][1]["content"] + tool_result_block = next(b for b in blocks if b.get("type") == "tool_result") + tool_use_block = next(b for b in blocks if b.get("type") == "tool_use") + assert tool_result_block["tool_use_id"] == new_id + assert tool_use_block["id"] == new_id + assert tool_use_block["type"] == "tool_use" + + def test_two_pass_handles_tool_result_in_earlier_user_message(self): + """跨消息边界乱序: tool_result 在更早的 user 消息中先出现. 
+ + 旧单遍扫描遍历到 msg[1] 的 user tool_result 时 ``id_map`` 还未含 + ``srvtoolu_late``(对应 tool_use 在 msg[2]),导致漏改; + 两遍扫描必须保证此场景下 tool_result.tool_use_id 仍能正确改写. + """ + body = { + "messages": [ + { + "role": "user", + "content": [ + { + "type": "tool_result", + "tool_use_id": "srvtoolu_late", + "content": "prefetched", + }, + ], + }, + { + "role": "assistant", + "content": [ + { + "type": "server_tool_use", + "id": "srvtoolu_late", + "name": "bash", + "input": {}, + }, + ], + }, + ], + } + count, id_map = _rewrite_srvtoolu_ids(body) + assert count == 1 + new_id = id_map["srvtoolu_late"] + assert body["messages"][0]["content"][0]["tool_use_id"] == new_id, ( + "Pass 2 必须改写出现位置早于 tool_use 的 tool_result.tool_use_id" + ) + assert body["messages"][1]["content"][0]["id"] == new_id + # ── infer_source_vendor_from_body 单元测试 ───────────────────────── @@ -1582,6 +1669,209 @@ def test_next_message_is_assistant_inserts_user(self): assert messages[2]["role"] == "assistant" +# ── _enforce_pairing_sanity_pass 单元测试(纵深防御兜底层) ───────────── + + +class TestEnforcePairingSanityPass: + """``_enforce_pairing_sanity_pass`` 单元测试. 
+ + 这层是 enforce 主循环结束后的纵深防御。直接以 helper 为被测单元, + 确保即使主循环未来重构出现遗漏,sanity 仍能稳定守住 Anthropic 配对约束。 + """ + + def test_noop_when_all_paired(self): + """所有 tool_use 都已正确配对时返回空列表,不修改输入.""" + messages = [ + { + "role": "assistant", + "content": [ + {"type": "tool_use", "id": "toolu_x", "name": "bash", "input": {}} + ], + }, + { + "role": "user", + "content": [ + {"type": "tool_result", "tool_use_id": "toolu_x", "content": "ok"} + ], + }, + ] + snapshot = copy.deepcopy(messages) + result = _enforce_pairing_sanity_pass(messages) + assert result == [] + assert messages == snapshot + + def test_appends_is_error_placeholder_when_user_lacks_tool_result(self): + """assistant tool_use 但 user 缺 tool_result 时追加 is_error 占位.""" + messages = [ + { + "role": "assistant", + "content": [ + {"type": "tool_use", "id": "toolu_x", "name": "bash", "input": {}} + ], + }, + {"role": "user", "content": [{"type": "text", "text": "ok"}]}, + ] + result = _enforce_pairing_sanity_pass(messages) + assert result == ["pairing_sanity_repaired"] + user_content = messages[1]["content"] + appended = next(b for b in user_content if b.get("type") == "tool_result") + assert appended == { + "type": "tool_result", + "tool_use_id": "toolu_x", + "content": "", + "is_error": True, + } + + def test_repairs_only_missing_ids_when_partially_paired(self): + """3 tool_use 但 user 只配 2 个 tool_result 时仅补缺失项.""" + messages = [ + { + "role": "assistant", + "content": [ + {"type": "tool_use", "id": "toolu_a", "name": "bash", "input": {}}, + {"type": "tool_use", "id": "toolu_b", "name": "read", "input": {}}, + {"type": "tool_use", "id": "toolu_c", "name": "write", "input": {}}, + ], + }, + { + "role": "user", + "content": [ + {"type": "tool_result", "tool_use_id": "toolu_a", "content": "a"}, + {"type": "tool_result", "tool_use_id": "toolu_c", "content": "c"}, + ], + }, + ] + result = _enforce_pairing_sanity_pass(messages) + assert result == ["pairing_sanity_repaired"] + result_ids = { + b["tool_use_id"] + for b in 
messages[1]["content"] + if b.get("type") == "tool_result" + } + assert result_ids == {"toolu_a", "toolu_b", "toolu_c"} + # 仅 toolu_b 是兜底合成的 is_error 占位 + b_block = next( + b for b in messages[1]["content"] if b.get("tool_use_id") == "toolu_b" + ) + assert b_block.get("is_error") is True + a_block = next( + b for b in messages[1]["content"] if b.get("tool_use_id") == "toolu_a" + ) + assert a_block.get("is_error") is not True + + def test_warns_when_next_message_not_user(self, caplog): + """next 非 user 时只发 WARNING、不修改、不返回 adaptation. + + 主循环正常情况下已保证 next 为 user;这是退化场景的可观测性兜底。 + """ + messages = [ + { + "role": "assistant", + "content": [ + {"type": "tool_use", "id": "toolu_x", "name": "bash", "input": {}} + ], + }, + { + "role": "assistant", + "content": [{"type": "text", "text": "weird"}], + }, + ] + snapshot = copy.deepcopy(messages) + import logging + + with caplog.at_level( + logging.WARNING, logger="coding.proxy.convert.vendor_channels" + ): + result = _enforce_pairing_sanity_pass(messages) + assert result == [] + assert messages == snapshot + assert any("Sanity pass" in rec.message for rec in caplog.records) + + def test_normalizes_user_string_content_before_repair(self): + """user content 为 string 时归一化为 list 再补占位.""" + messages = [ + { + "role": "assistant", + "content": [ + {"type": "tool_use", "id": "toolu_x", "name": "bash", "input": {}} + ], + }, + {"role": "user", "content": "ack"}, + ] + result = _enforce_pairing_sanity_pass(messages) + assert result == ["pairing_sanity_repaired"] + user_content = messages[1]["content"] + assert isinstance(user_content, list) + assert user_content[0] == {"type": "text", "text": "ack"} + assert user_content[1]["tool_use_id"] == "toolu_x" + assert user_content[1]["is_error"] is True + + def test_skips_non_assistant_messages(self): + """user / system / 异常消息一律跳过.""" + messages = [ + {"role": "user", "content": "hi"}, + {"role": "system", "content": "ctx"}, + "not a dict", # type: ignore[list-item] + ] + snapshot = 
copy.deepcopy(messages) + result = _enforce_pairing_sanity_pass(messages) + assert result == [] + assert messages == snapshot + + def test_skips_assistant_without_tool_use(self): + """assistant 纯文本(无 tool_use)短路,不影响下一条 user.""" + messages = [ + { + "role": "assistant", + "content": [{"type": "text", "text": "just chatting"}], + }, + {"role": "user", "content": "ok"}, + ] + snapshot = copy.deepcopy(messages) + result = _enforce_pairing_sanity_pass(messages) + assert result == [] + assert messages == snapshot + + def test_enforce_main_loop_chains_sanity_helper(self): + """主 enforce 流程末尾应当调用 sanity helper,标签会出现在 adaptations.""" + # 构造主循环无法剥离/合成的退化场景:直接放一个未配对 tool_use, + # 且 user 端事先放无关 tool_result,绕过主循环的 existing check + messages = [ + { + "role": "assistant", + "content": [ + { + "type": "tool_use", + "id": "toolu_main", + "name": "bash", + "input": {}, + } + ], + }, + { + "role": "user", + "content": [ + { + "type": "tool_result", + "tool_use_id": "toolu_unrelated", + "content": "x", + } + ], + }, + ] + fixes = enforce_anthropic_tool_pairing(messages) + # 主循环 F 步会先合成 orphaned_tool_use_repaired, sanity 不再触发 + assert "orphaned_tool_use_repaired" in fixes + assert "pairing_sanity_repaired" not in fixes + # 但 toolu_main 必须最终有对应 tool_result + result_ids = { + b["tool_use_id"] + for b in messages[1]["content"] + if b.get("type") == "tool_result" + } + assert "toolu_main" in result_ids + + # ── 通道层端到端集成(zhipu 产物全量清洗) ─────────────────────────── @@ -1687,6 +1977,135 @@ def test_full_zhipu_artifacts_combined(self): assert relocated[0]["tool_use_id"] == new_id assert any("misplaced_tool_result_relocated" in a for a in adaptations) + def test_handles_out_of_order_inline_tool_result_end_to_end(self): + """端到端复现日志故障场景: assistant content 内 tool_result 排在 server_tool_use 之前. + + 生产日志 `messages.3: tool_use ids were found without tool_result blocks + immediately after: toolu_normalized_2` 错误的等价最小复现. 
+ + 旧单遍 ``_rewrite_srvtoolu_ids`` 会漏改这种 misplaced tool_result 的 + ``tool_use_id``,使 enforce 在 extracted_tool_results 字典中以旧 ID 作 key, + 而 tool_use_ids 已是新 ID,造成 pairing 错位; 修复后两遍扫描确保 + 每个 assistant.tool_use_id 与下一条 user.tool_result.tool_use_id + 一一匹配,且消息体内不再残留任何 ``srvtoolu_*`` / ``server_tool_use``。 + """ + body = { + "messages": [ + {"role": "user", "content": "begin"}, + # 第一轮: 普通配对,建立 toolu_normalized_1 + { + "role": "assistant", + "content": [ + { + "type": "thinking", + "thinking": "...", + "signature": "zhipu_sig_1", + }, + { + "type": "server_tool_use", + "id": "srvtoolu_first", + "name": "bash", + "input": {}, + }, + ], + }, + { + "role": "user", + "content": [ + { + "type": "tool_result", + "tool_use_id": "srvtoolu_first", + "content": "first ok", + } + ], + }, + # 第二轮: 故障形态,tool_result 内联在 server_tool_use 之前 + { + "role": "assistant", + "content": [ + { + "type": "thinking", + "thinking": "...", + "signature": "zhipu_sig_2", + }, + { + "type": "tool_result", + "tool_use_id": "srvtoolu_second", + "content": "inline glm5", + }, + { + "type": "server_tool_use", + "id": "srvtoolu_second", + "name": "bash", + "input": {}, + }, + ], + }, + {"role": "user", "content": "continue"}, + ], + } + prepared, adaptations = prepare_zhipu_to_anthropic(body) + messages = prepared["messages"] + + # 所有 assistant 消息不得残留 server_tool_use / srvtoolu_* / tool_result + for msg in messages: + if msg.get("role") != "assistant": + continue + for b in msg.get("content", []): + assert isinstance(b, dict) + assert b.get("type") != "server_tool_use" + assert b.get("type") != "tool_result" + bid = b.get("id") + if isinstance(bid, str): + assert not bid.startswith("srvtoolu_"), ( + f"assistant content 残留 srvtoolu_* ID: {bid}" + ) + + # 任意 tool_result.tool_use_id 不得保留为 srvtoolu_* 形式 + for msg in messages: + for b in msg.get("content") or []: + if isinstance(b, dict) and b.get("type") == "tool_result": + tid = b.get("tool_use_id") + assert isinstance(tid, str) + assert not 
tid.startswith("srvtoolu_"), ( + f"tool_result 残留旧 srvtoolu_* 引用: {tid}" + ) + + # 每个 assistant 的 tool_use.id 都能在下一条 user 的 tool_result 中找到匹配 + for i, msg in enumerate(messages): + if msg.get("role") != "assistant": + continue + tool_use_ids = [ + b["id"] + for b in (msg.get("content") or []) + if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id") + ] + if not tool_use_ids: + continue + next_msg = messages[i + 1] + assert next_msg.get("role") == "user" + next_tool_result_ids = { + b["tool_use_id"] + for b in (next_msg.get("content") or []) + if isinstance(b, dict) + and b.get("type") == "tool_result" + and b.get("tool_use_id") + } + for uid in tool_use_ids: + assert uid in next_tool_result_ids, ( + f"messages[{i}].tool_use_id={uid} 在 messages[{i + 1}] 中" + f"找不到对应 tool_result(next ids = {next_tool_result_ids})" + ) + + # adaptations 覆盖关键变换 + assert any("srvtoolu_ids" in a for a in adaptations) + assert any("misplaced_tool_result_relocated" in a for a in adaptations) + assert any("thinking_blocks" in a for a in adaptations) + # sanity 不应触发: 两遍扫描 + 主 enforce 已经把所有配对补齐 + assert "pairing_sanity_repaired" not in adaptations + # 主 enforce 应当能正确把内联 tool_result 重定位、配对完整 + assert "orphaned_tool_use_repaired" not in adaptations + class TestZhipuToCopilotChannelFullCleanup: """验证 prepare_zhipu_to_copilot 对 zhipu 产物的完整清洗.""" @@ -1729,3 +2148,80 @@ def test_rewrites_srvtoolu_and_strips_vendor_delta(self): assert prepared["messages"][1]["content"][0]["tool_use_id"] == new_id assert any("zhipu_vendor_blocks" in a for a in adaptations) assert any("srvtoolu_ids" in a for a in adaptations) + + +# ── normalize_for_zhipu 共享清洗函数 ──────────────────────── + + +class TestNormalizeForZhipu: + """normalize_for_zhipu 共享清洗函数测试.""" + + def test_strips_cache_control_and_params(self): + body = { + "model": "claude-sonnet-4-20250514", + "messages": [], + "thinking": {"type": "enabled", "budget_tokens": 5000}, + "extended_thinking": {"type": "enabled"}, + "reasoning_effort": 
"high", + "system": [ + { + "type": "text", + "text": "sys", + "cache_control": {"type": "ephemeral"}, + }, + ], + "tools": [ + { + "name": "Bash", + "input_schema": {"type": "object"}, + "cache_control": {"type": "ephemeral"}, + }, + ], + } + result, adaptations = normalize_for_zhipu(body) + + assert "thinking" not in result + assert "extended_thinking" not in result + assert "reasoning_effort" not in result + assert "cache_control" not in result["system"][0] + assert "cache_control" not in result["tools"][0] + assert any("cache_control" in a for a in adaptations) + assert any("thinking" in a for a in adaptations) + assert any("reasoning_effort" in a for a in adaptations) + + def test_operates_in_place(self): + body = {"model": "x", "messages": []} + result, _ = normalize_for_zhipu(body) + assert result is body + + def test_idempotent(self): + body = { + "model": "x", + "messages": [], + "thinking": {"type": "enabled"}, + } + normalize_for_zhipu(body) + _, adaptations = normalize_for_zhipu(body) + assert adaptations == [] + + def test_no_deep_copy(self): + messages = [{"role": "user", "content": "hi"}] + body = {"model": "x", "messages": messages} + result, _ = normalize_for_zhipu(body) + assert result["messages"] is messages + + def test_preserves_supported_params(self): + body = { + "model": "x", + "messages": [{"role": "user", "content": "hello"}], + "max_tokens": 1024, + "temperature": 0.7, + "stream": True, + "metadata": {"user_id": "test"}, + } + result, adaptations = normalize_for_zhipu(body) + assert result["max_tokens"] == 1024 + assert result["temperature"] == 0.7 + assert result["stream"] is True + assert result["metadata"] == {"user_id": "test"} + assert adaptations == [] diff --git a/tests/test_vendors.py b/tests/test_vendors.py index f771e9b..bc72602 100644 --- a/tests/test_vendors.py +++ b/tests/test_vendors.py @@ -396,7 +396,7 @@ async def test_zhipu_prepare_request_preserves_metadata(): @pytest.mark.asyncio async def 
test_zhipu_prepare_request_preserves_thinking(): - """ZhipuVendor._prepare_request 应原样保留 thinking 字段(原生端点支持).""" + """ZhipuVendor._prepare_request 应原样保留 thinking.type=enabled(GLM 原生支持).""" mapper = ModelMapper([]) zhipu_vendor = ZhipuVendor(ZhipuConfig(api_key="sk-test"), mapper) body = { @@ -405,12 +405,35 @@ async def test_zhipu_prepare_request_preserves_thinking(): "thinking": {"type": "enabled", "budget_tokens": 10000}, } prepared_body, _ = await zhipu_vendor._prepare_request(body, {}) - # thinking 原样透传,不再剥离任何字段 + # thinking.type=enabled 原样透传(GLM 原生支持) assert prepared_body["thinking"] == {"type": "enabled", "budget_tokens": 10000} # 原始 body 不应被修改 assert body["thinking"]["budget_tokens"] == 10000 +@pytest.mark.asyncio +async def test_zhipu_prepare_request_converts_thinking_adaptive(): + """ZhipuVendor._prepare_request 应将 thinking.type=adaptive 转换为 enabled+budget. + + GLM 不支持 adaptive 类型,转换为已确认安全的 enabled + budget_tokens 格式, + 保留 thinking 能力不被阉割。 + """ + mapper = ModelMapper([]) + zhipu_vendor = ZhipuVendor(ZhipuConfig(api_key="sk-test"), mapper) + body = { + "model": "claude-opus-4-7", + "messages": [], + "thinking": {"type": "adaptive"}, + } + prepared_body, _ = await zhipu_vendor._prepare_request(body, {}) + + # adaptive 应被转换为 enabled + budget + assert prepared_body["thinking"]["type"] == "enabled" + assert prepared_body["thinking"]["budget_tokens"] == 16000 + # 原始 body 不应被修改 + assert body["thinking"] == {"type": "adaptive"} + + @pytest.mark.asyncio async def test_zhipu_prepare_request_preserves_anthropic_beta_header(): zhipu_vendor = ZhipuVendor(ZhipuConfig(api_key="sk-test"), ModelMapper([])) diff --git a/tests/test_zhipu.py b/tests/test_zhipu.py index 2eceb41..aa05b21 100644 --- a/tests/test_zhipu.py +++ b/tests/test_zhipu.py @@ -5,20 +5,23 @@ - 其余请求体/响应原样透传 - 401 错误归一化 - 能力声明全部为 NATIVE + - 429 Rate Limit 重试挽回 """ import json +from unittest.mock import AsyncMock, patch +import httpx import pytest from coding.proxy.compat.canonical import CompatibilityStatus 
from coding.proxy.config.schema import ModelMappingRule, ZhipuConfig from coding.proxy.routing.model_mapper import ModelMapper +from coding.proxy.vendors.native_anthropic import NativeAnthropicVendor from coding.proxy.vendors.zhipu import ZhipuVendor -@pytest.fixture -def zhipu_vendor(): +def _make_zhipu_vendor(api_key: str = "test-zhipu-key") -> ZhipuVendor: """创建使用默认配置的 ZhipuVendor 实例.""" mapper = ModelMapper( [ @@ -42,7 +45,13 @@ def zhipu_vendor(): ), ] ) - return ZhipuVendor(ZhipuConfig(api_key="test-zhipu-key"), mapper) + return ZhipuVendor(ZhipuConfig(api_key=api_key), mapper) + + +@pytest.fixture +def zhipu_vendor(): + """创建使用默认配置的 ZhipuVendor 实例.""" + return _make_zhipu_vendor() # ── 模型映射 ────────────────────────────────────────────── @@ -69,7 +78,7 @@ def test_unknown_model_falls_back_to_default(self, zhipu_vendor): class TestRequestPassthrough: - """验证 _prepare_request 仅修改 model 和 headers.""" + """验证 _prepare_request 的模型映射、headers 替换和兼容转换.""" @pytest.mark.asyncio async def test_body_passthrough_except_model(self, zhipu_vendor): @@ -94,24 +103,60 @@ async def test_body_passthrough_except_model(self, zhipu_vendor): # 仅 model 被映射 assert prepared_body["model"] == "glm-5.1" + # thinking.type=enabled 原样保留(GLM 原生支持) + assert prepared_body["thinking"] == {"type": "enabled", "budget_tokens": 5000} # 其余字段原样保留 assert prepared_body["max_tokens"] == 1024 assert prepared_body["temperature"] == 0.7 assert prepared_body["top_p"] == 0.9 assert prepared_body["stream"] is True - # thinking 不再被剥离 - assert prepared_body["thinking"] == {"type": "enabled", "budget_tokens": 5000} - # metadata 不再被剥离 assert prepared_body["metadata"] == {"user_id": "test-user"} - # system 不被删除 assert prepared_body["system"] == "You are a helpful assistant." 
- # tools 不被截断或过滤 assert len(prepared_body["tools"]) == 3 - # tool_choice 不被修改 assert prepared_body["tool_choice"] == {"type": "auto"} # 原始 body 未被修改(deep copy) assert body["model"] == "claude-sonnet-4-20250514" + @pytest.mark.asyncio + async def test_thinking_adaptive_converted_to_enabled(self, zhipu_vendor): + """thinking.type=adaptive 应被转换为 enabled+budget(GLM 不支持 adaptive).""" + body = { + "model": "claude-opus-4-7", + "messages": [], + "thinking": {"type": "adaptive"}, + } + prepared_body, _ = await zhipu_vendor._prepare_request(body, {}) + + assert prepared_body["thinking"]["type"] == "enabled" + assert prepared_body["thinking"]["budget_tokens"] == 16000 + # 原始 body 未被修改 + assert body["thinking"] == {"type": "adaptive"} + + @pytest.mark.asyncio + async def test_thinking_enabled_preserved_unchanged(self, zhipu_vendor): + """thinking.type=enabled 应原样保留(GLM 原生支持).""" + body = { + "model": "claude-sonnet-4-20250514", + "messages": [], + "thinking": {"type": "enabled", "budget_tokens": 8000}, + } + prepared_body, _ = await zhipu_vendor._prepare_request(body, {}) + + assert prepared_body["thinking"] == {"type": "enabled", "budget_tokens": 8000} + assert body["thinking"]["budget_tokens"] == 8000 + + @pytest.mark.asyncio + async def test_no_thinking_param_unchanged(self, zhipu_vendor): + """无 thinking 参数时不触发任何转换.""" + body = { + "model": "claude-sonnet-4-20250514", + "messages": [{"role": "user", "content": "hi"}], + } + prepared_body, _ = await zhipu_vendor._prepare_request(body, {}) + + assert "thinking" not in prepared_body + assert prepared_body["model"] == "glm-5.1" + @pytest.mark.asyncio async def test_headers_replaces_auth(self, zhipu_vendor): """验证 x-api-key 被正确设置,authorization 被剥离.""" @@ -292,3 +337,332 @@ def test_never_triggers_failover(self, zhipu_vendor): async def test_health_check_always_true(self, zhipu_vendor): result = await zhipu_vendor.check_health() assert result is True + + +# ── 429 Rate Limit 重试挽回 ───────────────────────────────── + + +def 
_make_429_response( + headers: dict[str, str] | None = None, +) -> httpx.Response: + """构造 429 HTTP 响应.""" + return httpx.Response( + status_code=429, + content=b'{"error":{"type":"rate_limit_error","message":"Too many requests"}}', + headers=headers or {}, + request=httpx.Request( + "POST", "https://open.bigmodel.cn/api/anthropic/v1/messages" + ), + ) + + +def _make_200_response() -> httpx.Response: + """构造 200 HTTP 响应.""" + body = json.dumps( + { + "id": "msg_test", + "type": "message", + "role": "assistant", + "content": [{"type": "text", "text": "hello"}], + "model": "glm-5.1", + "usage": {"input_tokens": 10, "output_tokens": 5}, + } + ).encode() + return httpx.Response( + status_code=200, + content=body, + headers={"content-type": "application/json"}, + request=httpx.Request( + "POST", "https://open.bigmodel.cn/api/anthropic/v1/messages" + ), + ) + + +class TestRateLimitRetry: + """429 Rate Limit 重试挽回机制.""" + + # ── 非流式 ───────────────────────────────────────────── + + @pytest.mark.asyncio + async def test_nonstream_429_retries_and_succeeds(self): + """429 两次后 200,重试成功.""" + vendor = _make_zhipu_vendor() + call_count = 0 + + async def mock_post(*args, **kwargs): + nonlocal call_count + call_count += 1 + if call_count <= 2: + return _make_429_response() + return _make_200_response() + + with patch.object(vendor, "_get_client") as mock_client: + client = AsyncMock() + client.post = mock_post + mock_client.return_value = client + + resp = await vendor.send_message( + {"model": "claude-sonnet-4-20250514", "messages": []}, + {}, + ) + + assert resp.status_code == 200 + assert call_count == 3 + + @pytest.mark.asyncio + async def test_nonstream_429_exhausted_retries(self): + """连续 5 次 429,耗尽重试后返回 429.""" + vendor = _make_zhipu_vendor() + call_count = 0 + + async def mock_post(*args, **kwargs): + nonlocal call_count + call_count += 1 + return _make_429_response() + + with patch.object(vendor, "_get_client") as mock_client: + client = AsyncMock() + client.post = 
mock_post + mock_client.return_value = client + + with patch("asyncio.sleep", new_callable=AsyncMock): + resp = await vendor.send_message( + {"model": "claude-sonnet-4-20250514", "messages": []}, + {}, + ) + + assert resp.status_code == 429 + assert call_count == 5 + + @pytest.mark.asyncio + async def test_nonstream_non_429_no_retry(self): + """500 不触发重试.""" + vendor = _make_zhipu_vendor() + call_count = 0 + + async def mock_post(*args, **kwargs): + nonlocal call_count + call_count += 1 + return httpx.Response( + status_code=500, + content=b'{"error":{"type":"api_error","message":"Internal error"}}', + request=httpx.Request("POST", "https://example.com"), + ) + + with patch.object(vendor, "_get_client") as mock_client: + client = AsyncMock() + client.post = mock_post + mock_client.return_value = client + + resp = await vendor.send_message( + {"model": "claude-sonnet-4-20250514", "messages": []}, + {}, + ) + + assert resp.status_code == 500 + assert call_count == 1 + + # ── 流式 ─────────────────────────────────────────────── + + @pytest.mark.asyncio + async def test_stream_429_retries_and_succeeds(self): + """流式 429 两次后成功.""" + call_count = 0 + + async def fake_stream(self, body, headers): + nonlocal call_count + call_count += 1 + if call_count <= 2: + resp = _make_429_response() + raise httpx.HTTPStatusError( + "429", + request=resp.request, + response=resp, + ) + yield b'data: {"type":"content_block_start"}\n\n' + yield b'data: {"type":"content_block_delta"}\n\n' + + vendor = _make_zhipu_vendor() + chunks = [] + with ( + patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream), + patch("asyncio.sleep", new_callable=AsyncMock), + ): + async for chunk in vendor.send_message_stream( + {"model": "claude-sonnet-4-20250514", "messages": []}, + {}, + ): + chunks.append(chunk) + + assert len(chunks) == 2 + assert call_count == 3 + + @pytest.mark.asyncio + async def test_stream_429_exhausted_retries_raises(self): + """流式连续 429,耗尽重试后 raise.""" + call_count = 0 
+
+        async def fake_stream(self, body, headers):
+            nonlocal call_count
+            call_count += 1
+            resp = _make_429_response()
+            raise httpx.HTTPStatusError(
+                "429",
+                request=resp.request,
+                response=resp,
+            )
+            yield  # unreachable; makes this an async generator (type only)
+
+        vendor = _make_zhipu_vendor()
+        with (
+            patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream),
+            patch("asyncio.sleep", new_callable=AsyncMock),
+            pytest.raises(httpx.HTTPStatusError) as exc_info,
+        ):
+            async for _ in vendor.send_message_stream(
+                {"model": "claude-sonnet-4-20250514", "messages": []},
+                {},
+            ):
+                pass
+
+        assert exc_info.value.response.status_code == 429
+        assert call_count == 5
+
+    @pytest.mark.asyncio
+    async def test_stream_500_no_retry_raises(self):
+        """Streaming: a 500 does not trigger a retry and is raised immediately."""
+        call_count = 0
+
+        async def fake_stream(self, body, headers):
+            nonlocal call_count
+            call_count += 1
+            resp = httpx.Response(
+                status_code=500,
+                content=b'{"error":{"type":"api_error"}}',
+                request=httpx.Request("POST", "https://example.com"),
+            )
+            raise httpx.HTTPStatusError(
+                "500",
+                request=resp.request,
+                response=resp,
+            )
+            yield  # makes this function an async generator
+
+        vendor = _make_zhipu_vendor()
+        with (
+            patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream),
+            pytest.raises(httpx.HTTPStatusError) as exc_info,
+        ):
+            async for _ in vendor.send_message_stream(
+                {"model": "claude-sonnet-4-20250514", "messages": []},
+                {},
+            ):
+                pass
+
+        assert exc_info.value.response.status_code == 500
+        assert call_count == 1
+
+    # ── retry-after header ─────────────────────────────────
+
+    @pytest.mark.asyncio
+    async def test_respects_retry_after_header(self):
+        """When the response carries retry-after, use the server-suggested delay."""
+        vendor = _make_zhipu_vendor()
+        call_count = 0
+        sleep_delays = []
+
+        async def mock_post(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            if call_count == 1:
+                return _make_429_response(headers={"retry-after": "2"})
+            return _make_200_response()
+
+        async def mock_sleep(delay):
+
sleep_delays.append(delay)
+
+        with (
+            patch.object(vendor, "_get_client") as mock_client,
+            patch("asyncio.sleep", side_effect=mock_sleep),
+        ):
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            resp = await vendor.send_message(
+                {"model": "claude-sonnet-4-20250514", "messages": []},
+                {},
+            )
+
+        assert resp.status_code == 200
+        assert len(sleep_delays) == 1
+        # retry-after=2 → 2 * 1.1 = 2.2s → 2200ms → sleep(2.2)
+        assert 2.0 <= sleep_delays[0] <= 2.2
+
+    # ── Backoff delay growth ───────────────────────────────
+
+    @pytest.mark.asyncio
+    async def test_backoff_delays_increase(self):
+        """Without retry-after, delays grow exponentially."""
+        vendor = _make_zhipu_vendor()
+        sleep_delays = []
+
+        async def mock_sleep(delay):
+            sleep_delays.append(delay)
+
+        # Disable jitter so the delays can be checked exactly
+        import dataclasses
+
+        original_jitter = vendor._rl_retry.jitter
+        vendor._rl_retry = dataclasses.replace(vendor._rl_retry, jitter=False)
+
+        call_count = 0
+
+        async def mock_post(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            if call_count <= 4:
+                return _make_429_response()
+            return _make_200_response()
+
+        try:
+            with (
+                patch.object(vendor, "_get_client") as mock_client,
+                patch("asyncio.sleep", side_effect=mock_sleep),
+            ):
+                client = AsyncMock()
+                client.post = mock_post
+                mock_client.return_value = client
+
+                resp = await vendor.send_message(
+                    {"model": "claude-sonnet-4-20250514", "messages": []},
+                    {},
+                )
+
+            assert resp.status_code == 200
+            assert len(sleep_delays) == 4
+            # initial=1000ms, multiplier=2.0
+            # attempt 0: 1000 * 2^0 = 1000ms → sleep(1.0)
+            # attempt 1: 1000 * 2^1 = 2000ms → sleep(2.0)
+            # attempt 2: 1000 * 2^2 = 4000ms → sleep(4.0)
+            # attempt 3: 1000 * 2^3 = 8000ms → sleep(8.0)
+            assert sleep_delays[0] == pytest.approx(1.0)
+            assert sleep_delays[1] == pytest.approx(2.0)
+            assert sleep_delays[2] == pytest.approx(4.0)
+            assert sleep_delays[3] == pytest.approx(8.0)
+        finally:
+            vendor._rl_retry = dataclasses.replace(
+                vendor._rl_retry,
jitter=original_jitter
+            )
+
+    # ── Missing API key ────────────────────────────────────
+
+    @pytest.mark.asyncio
+    async def test_missing_api_key_skips_retry(self):
+        """A missing API key fails fast with 401 and never enters the 429 retry path."""
+        vendor = _make_zhipu_vendor(api_key="")
+        resp = await vendor.send_message(
+            {"model": "claude-sonnet-4-20250514", "messages": []},
+            {},
+        )
+        assert resp.status_code == 401
diff --git a/tests/test_zhipu_concurrency.py b/tests/test_zhipu_concurrency.py
new file mode 100644
index 0000000..7566b24
--- /dev/null
+++ b/tests/test_zhipu_concurrency.py
@@ -0,0 +1,557 @@
+"""Dedicated tests for the Zhipu per-model concurrency limit.
+
+Verifies concurrency control once ``ModelConcurrencyLimiter`` is integrated with ``ZhipuVendor``:
+  - With the default ``concurrency.default=3``, one model runs at most 3 concurrent requests
+  - Requests over the limit queue in FIFO order and wake only after a slot is released
+  - Different models are independent and do not block each other
+  - The Semaphore is still released on error paths, avoiding leaks
+  - Streaming and non-streaming requests share the same semaphore
+  - Compatible with the 429 retry mechanism (the slot stays held during retries)
+  - ``concurrency=None`` disables the limit (backward compatible)
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+from unittest.mock import AsyncMock, patch
+
+import httpx
+import pytest
+
+from coding.proxy.config.schema import (
+    ModelMappingRule,
+    ZhipuConcurrencyConfig,
+    ZhipuConfig,
+)
+from coding.proxy.routing.model_mapper import ModelMapper
+from coding.proxy.vendors.concurrency import ModelConcurrencyLimiter
+from coding.proxy.vendors.native_anthropic import NativeAnthropicVendor
+from coding.proxy.vendors.zhipu import ZhipuVendor
+
+# ─── Test helpers ───────────────────────────────────────────
+
+
+def _make_mapper() -> ModelMapper:
+    """Build a ModelMapper with the standard three-model mapping."""
+    return ModelMapper(
+        [
+            ModelMappingRule(
+                pattern="claude-sonnet-.*",
+                target="glm-5v-turbo",
+                is_regex=True,
+                vendors=["zhipu"],
+            ),
+            ModelMappingRule(
+                pattern="claude-opus-.*",
+                target="glm-5.1",
+                is_regex=True,
+                vendors=["zhipu"],
+            ),
+            ModelMappingRule(
+                pattern="claude-haiku-.*",
+                target="glm-4.5-air",
+                is_regex=True,
+                vendors=["zhipu"],
+            ),
+        ]
+    )
+
+
+def _make_vendor(
+    concurrency: ZhipuConcurrencyConfig | None = None,
+    api_key: str =
"test-zhipu-key", +) -> ZhipuVendor: + """构造一个 ZhipuVendor,默认启用并发限制(default=3).""" + cfg_kwargs: dict = {"api_key": api_key} + if concurrency is not None: + cfg_kwargs["concurrency"] = concurrency + return ZhipuVendor(ZhipuConfig(**cfg_kwargs), _make_mapper()) + + +def _make_200_response() -> httpx.Response: + body = json.dumps( + { + "id": "msg_test", + "type": "message", + "role": "assistant", + "content": [{"type": "text", "text": "ok"}], + "model": "glm-5.1", + "usage": {"input_tokens": 1, "output_tokens": 1}, + } + ).encode() + return httpx.Response( + status_code=200, + content=body, + headers={"content-type": "application/json"}, + request=httpx.Request( + "POST", "https://open.bigmodel.cn/api/anthropic/v1/messages" + ), + ) + + +def _make_429_response() -> httpx.Response: + return httpx.Response( + status_code=429, + content=b'{"error":{"type":"rate_limit_error","message":"slow down"}}', + headers={}, + request=httpx.Request( + "POST", "https://open.bigmodel.cn/api/anthropic/v1/messages" + ), + ) + + +# ─── 配置层测试 ───────────────────────────────────────────── + + +class TestZhipuConcurrencyConfig: + """ZhipuConcurrencyConfig 配置模型行为.""" + + def test_defaults(self) -> None: + cfg = ZhipuConcurrencyConfig() + assert cfg.default == 3 + assert cfg.models == {} + + def test_get_limit_falls_back_to_default(self) -> None: + cfg = ZhipuConcurrencyConfig(default=5) + assert cfg.get_limit("glm-5.1") == 5 + assert cfg.get_limit("any-unknown-model") == 5 + + def test_get_limit_uses_per_model_override(self) -> None: + cfg = ZhipuConcurrencyConfig(default=3, models={"glm-5v-turbo": 1}) + assert cfg.get_limit("glm-5v-turbo") == 1 + assert cfg.get_limit("glm-5.1") == 3 # 未覆盖时回退 default + + def test_default_must_be_positive(self) -> None: + with pytest.raises(ValueError): + ZhipuConcurrencyConfig(default=0) + + def test_zhipu_config_default_concurrency(self) -> None: + cfg = ZhipuConfig() + assert cfg.concurrency is not None + assert cfg.concurrency.default == 3 + + +# ─── 
ModelConcurrencyLimiter unit tests ─────────────────────
+
+
+class TestModelConcurrencyLimiter:
+    """Basic behavior of ModelConcurrencyLimiter."""
+
+    @pytest.mark.asyncio
+    async def test_lazy_semaphore_creation(self) -> None:
+        limiter = ModelConcurrencyLimiter(ZhipuConcurrencyConfig(default=2))
+        slot_a = limiter._get_or_create_slot("model-a")
+        slot_b = limiter._get_or_create_slot("model-b")
+        # Different models get independent slots
+        assert slot_a is not slot_b
+        # The same model reuses its slot
+        assert limiter._get_or_create_slot("model-a") is slot_a
+
+    @pytest.mark.asyncio
+    async def test_acquire_blocks_when_full(self) -> None:
+        limiter = ModelConcurrencyLimiter(ZhipuConcurrencyConfig(default=2))
+
+        # Fill both slots
+        sem1 = await limiter.acquire("glm-5.1")
+        sem2 = await limiter.acquire("glm-5.1")
+        assert sem1 is sem2  # same semaphore
+
+        # The third acquire must block
+        task = asyncio.create_task(limiter.acquire("glm-5.1"))
+        await asyncio.sleep(0.05)
+        assert not task.done(), "the third request should be waiting in the queue"
+
+        # Releasing one slot wakes the waiter
+        sem1.release()
+        await asyncio.sleep(0.05)
+        assert task.done()
+        (await task).release()
+        sem2.release()
+
+    @pytest.mark.asyncio
+    async def test_per_model_independent(self) -> None:
+        limiter = ModelConcurrencyLimiter(
+            ZhipuConcurrencyConfig(default=1, models={"glm-5.1": 1})
+        )
+        # Fill glm-5.1
+        sem_51 = await limiter.acquire("glm-5.1")
+        # glm-5v-turbo can still be acquired immediately
+        sem_5v = await asyncio.wait_for(limiter.acquire("glm-5v-turbo"), timeout=0.5)
+        assert sem_51 is not sem_5v
+        sem_51.release()
+        sem_5v.release()
+
+    def test_diagnostics_snapshot(self) -> None:
+        limiter = ModelConcurrencyLimiter(ZhipuConcurrencyConfig(default=3))
+        # Trigger slot creation
+        limiter._get_or_create_slot("glm-5.1")
+        snap = limiter.get_diagnostics()
+        assert "glm-5.1" in snap
+        assert snap["glm-5.1"]["limit"] == 3
+        assert snap["glm-5.1"]["available"] == 3
+        assert snap["glm-5.1"]["in_use"] == 0
+
+
+# ─── ZhipuVendor integration tests: non-streaming ───────────
+
+
+class TestZhipuVendorNonStreamConcurrency:
+    """Concurrency limiting for non-streaming send_message."""
+
+
@pytest.mark.asyncio
+    async def test_limits_parallel_requests(self) -> None:
+        """With concurrency.default=2, only 2 of 3 concurrent requests run at once."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=2))
+        active = 0
+        peak = 0
+        gate = asyncio.Event()
+
+        async def mock_post(*_, **__) -> httpx.Response:
+            nonlocal active, peak
+            active += 1
+            peak = max(peak, active)
+            # Wait for the external release to keep the observation window open
+            await gate.wait()
+            active -= 1
+            return _make_200_response()
+
+        with patch.object(vendor, "_get_client") as mock_client:
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            tasks = [
+                asyncio.create_task(
+                    vendor.send_message(
+                        {"model": "claude-opus-4-6", "messages": []},
+                        {},
+                    )
+                )
+                for _ in range(3)
+            ]
+            # Wait until two requests are active
+            for _ in range(40):
+                if active >= 2:
+                    break
+                await asyncio.sleep(0.01)
+
+            assert active == 2, "exactly 2 requests should be running (the 3rd queues)"
+            gate.set()
+            results = await asyncio.gather(*tasks)
+            assert all(r.status_code == 200 for r in results)
+            assert peak == 2, "peak concurrency should not exceed 2"
+
+    @pytest.mark.asyncio
+    async def test_per_model_independent(self) -> None:
+        """Slots for different models do not affect each other."""
+        cfg = ZhipuConcurrencyConfig(
+            default=3,
+            models={"glm-5v-turbo": 1, "glm-5.1": 1},
+        )
+        vendor = _make_vendor(cfg)
+        gate = asyncio.Event()
+        seen_models: list[str] = []
+
+        async def mock_post(*_args, **kwargs) -> httpx.Response:
+            body = kwargs.get("json", {})
+            seen_models.append(body.get("model", ""))
+            await gate.wait()
+            return _make_200_response()
+
+        with patch.object(vendor, "_get_client") as mock_client:
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            # claude-opus → glm-5.1, claude-sonnet → glm-5v-turbo,
+            # two independent semaphores, so both should run at the same time
+            task_opus = asyncio.create_task(
+                vendor.send_message(
+                    {"model": "claude-opus-4-6", "messages": []},
+                    {},
+                )
+            )
+            task_sonnet = asyncio.create_task(
+                vendor.send_message(
+                    {"model": "claude-sonnet-4-6", "messages": []},
+                    {},
+                )
+            )
+            for _ in range(40):
+                if len(seen_models) >=
2:
+                    break
+                await asyncio.sleep(0.01)
+
+            assert len(seen_models) == 2, "the two models should run concurrently"
+            assert set(seen_models) == {"glm-5.1", "glm-5v-turbo"}
+            gate.set()
+            await asyncio.gather(task_opus, task_sonnet)
+
+    @pytest.mark.asyncio
+    async def test_semaphore_released_on_exception(self) -> None:
+        """The Semaphore is released even when upstream raises; later requests proceed."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=1))
+        call_count = 0
+
+        async def mock_post(*_, **__) -> httpx.Response:
+            nonlocal call_count
+            call_count += 1
+            if call_count == 1:
+                raise RuntimeError("upstream boom")
+            return _make_200_response()
+
+        with patch.object(vendor, "_get_client") as mock_client:
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            with pytest.raises(RuntimeError):
+                await vendor.send_message(
+                    {"model": "claude-opus-4-6", "messages": []},
+                    {},
+                )
+
+            # The slot should be free again; the second request completes normally
+            resp = await asyncio.wait_for(
+                vendor.send_message(
+                    {"model": "claude-opus-4-6", "messages": []},
+                    {},
+                ),
+                timeout=1.0,
+            )
+            assert resp.status_code == 200
+
+    @pytest.mark.asyncio
+    async def test_429_retry_holds_slot(self) -> None:
+        """The slot stays held during 429 retries and is released once retrying ends."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=1))
+        call_count = 0
+
+        async def mock_post(*_, **__) -> httpx.Response:
+            nonlocal call_count
+            call_count += 1
+            if call_count <= 2:
+                return _make_429_response()
+            return _make_200_response()
+
+        with (
+            patch.object(vendor, "_get_client") as mock_client,
+            patch("asyncio.sleep", new_callable=AsyncMock),
+        ):
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            resp = await vendor.send_message(
+                {"model": "claude-opus-4-6", "messages": []},
+                {},
+            )
+            assert resp.status_code == 200
+            assert call_count == 3  # two 429s + one success, all on the same slot
+
+    @pytest.mark.asyncio
+    async def test_no_concurrency_when_config_is_none(self) -> None:
+        """concurrency=None disables the limit; behavior matches the previous version."""
+        # Force a ZhipuConfig with concurrency=None (bypassing the default factory)
+        cfg =
ZhipuConfig(api_key="key")
+        cfg = cfg.model_copy(update={"concurrency": None})
+        vendor = ZhipuVendor(cfg, _make_mapper())
+        assert vendor._concurrency_limiter is None
+
+        gate = asyncio.Event()
+        active = 0
+        peak = 0
+
+        async def mock_post(*_, **__) -> httpx.Response:
+            nonlocal active, peak
+            active += 1
+            peak = max(peak, active)
+            await gate.wait()
+            active -= 1
+            return _make_200_response()
+
+        with patch.object(vendor, "_get_client") as mock_client:
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            tasks = [
+                asyncio.create_task(
+                    vendor.send_message(
+                        {"model": "claude-opus-4-6", "messages": []},
+                        {},
+                    )
+                )
+                for _ in range(5)
+            ]
+            for _ in range(40):
+                if active >= 5:
+                    break
+                await asyncio.sleep(0.01)
+
+            assert peak == 5, "with no concurrency limit all requests should run in parallel"
+            gate.set()
+            await asyncio.gather(*tasks)
+
+
+# ─── ZhipuVendor integration tests: streaming ───────────────
+
+
+class TestZhipuVendorStreamConcurrency:
+    """Concurrency limiting for streaming send_message_stream."""
+
+    @pytest.mark.asyncio
+    async def test_stream_limits_parallel_requests(self) -> None:
+        """Streaming requests honor the concurrency limit; excess requests queue."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=1))
+        active = 0
+        peak = 0
+        gate = asyncio.Event()
+
+        async def fake_stream(self, _body, _headers):  # noqa: ARG001
+            nonlocal active, peak
+            active += 1
+            peak = max(peak, active)
+            try:
+                await gate.wait()
+                yield b'data: {"type":"message_start"}\n\n'
+            finally:
+                active -= 1
+
+        async def consume(model: str) -> int:
+            chunks: list[bytes] = []
+            async for chunk in vendor.send_message_stream(
+                {"model": model, "messages": []}, {}
+            ):
+                chunks.append(chunk)
+            return len(chunks)
+
+        with patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream):
+            tasks = [asyncio.create_task(consume("claude-opus-4-6")) for _ in range(3)]
+            for _ in range(40):
+                if active >= 1:
+                    break
+                await asyncio.sleep(0.01)
+
+            assert active == 1, "with concurrency=1 only 1 streaming request may run at once"
+            gate.set()
+            results = await asyncio.gather(*tasks)
+            assert all(c >= 1 for c in results)
+            assert peak == 1
+
+    @pytest.mark.asyncio
+    async def test_stream_releases_slot_on_completion(self) -> None:
+        """The slot is released once the stream generator is exhausted normally."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=1))
+
+        async def fake_stream(self, _body, _headers):  # noqa: ARG001
+            yield b'data: {"type":"message_start"}\n\n'
+            yield b'data: {"type":"message_stop"}\n\n'
+
+        with patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream):
+            # Two consecutive streaming requests both complete (so the slot was released)
+            for _ in range(2):
+                chunks = []
+                async for chunk in vendor.send_message_stream(
+                    {"model": "claude-opus-4-6", "messages": []}, {}
+                ):
+                    chunks.append(chunk)
+                assert len(chunks) == 2
+
+        # Confirm the slot is fully available now
+        assert vendor._concurrency_limiter is not None
+        slot = vendor._concurrency_limiter._get_or_create_slot("glm-5.1")
+        assert slot.available == 1
+
+    @pytest.mark.asyncio
+    async def test_stream_releases_slot_on_error(self) -> None:
+        """The slot is released when a stream exits with an error; later requests proceed."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=1))
+        call_count = 0
+
+        async def fake_stream(self, _body, _headers):  # noqa: ARG001
+            nonlocal call_count
+            call_count += 1
+            if call_count == 1:
+                resp = httpx.Response(
+                    status_code=500,
+                    content=b'{"error":{"type":"api_error"}}',
+                    request=httpx.Request("POST", "https://example.com"),
+                )
+                raise httpx.HTTPStatusError("500", request=resp.request, response=resp)
+                yield b""  # makes the function an async generator (unreachable)
+            yield b'data: {"type":"message_start"}\n\n'
+
+        with patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream):
+            with pytest.raises(httpx.HTTPStatusError):
+                async for _ in vendor.send_message_stream(
+                    {"model": "claude-opus-4-6", "messages": []}, {}
+                ):
+                    pass
+
+            # The slot should be free again; the second request proceeds normally
+            chunks = []
+            async for chunk in vendor.send_message_stream(
+                {"model": "claude-opus-4-6", "messages": []}, {}
+            ):
+                chunks.append(chunk)
+            assert chunks == [b'data: {"type":"message_start"}\n\n']
+
+    @pytest.mark.asyncio
+    async def
test_stream_and_nonstream_share_semaphore(self) -> None:
+        """Streaming and non-streaming requests share one semaphore (keyed by mapped model)."""
+        vendor = _make_vendor(ZhipuConcurrencyConfig(default=1))
+        gate = asyncio.Event()
+        active = 0
+
+        async def fake_stream(self, _body, _headers):  # noqa: ARG001
+            nonlocal active
+            active += 1
+            try:
+                await gate.wait()
+                yield b'data: {"type":"message_start"}\n\n'
+            finally:
+                active -= 1
+
+        async def mock_post(*_, **__) -> httpx.Response:
+            nonlocal active
+            active += 1
+            active -= 1
+            return _make_200_response()
+
+        with (
+            patch.object(NativeAnthropicVendor, "send_message_stream", fake_stream),
+            patch.object(vendor, "_get_client") as mock_client,
+        ):
+            client = AsyncMock()
+            client.post = mock_post
+            mock_client.return_value = client
+
+            # Start the streaming request and wait for it to take the slot
+            async def consume_stream() -> None:
+                async for _ in vendor.send_message_stream(
+                    {"model": "claude-opus-4-6", "messages": []}, {}
+                ):
+                    pass
+
+            stream_task = asyncio.create_task(consume_stream())
+            for _ in range(40):
+                if active >= 1:
+                    break
+                await asyncio.sleep(0.01)
+            assert active == 1
+
+            # The non-streaming request should be blocked by the same semaphore
+            nonstream_task = asyncio.create_task(
+                vendor.send_message(
+                    {"model": "claude-opus-4-6", "messages": []},
+                    {},
+                )
+            )
+            await asyncio.sleep(0.05)
+            assert not nonstream_task.done(), "non-streaming request should wait for the stream's slot"
+
+            # After the release both can complete
+            gate.set()
+            await asyncio.gather(stream_task, nonstream_task)
diff --git a/uv.lock b/uv.lock
index 79995a3..d04ad46 100644
--- a/uv.lock
+++ b/uv.lock
@@ -74,7 +74,7 @@ wheels = [
 
 [[package]]
 name = "coding-proxy"
-version = "0.4.0"
+version = "0.5.0"
 source = { editable = "." }
 dependencies = [
     { name = "aiosqlite" },