fix(Session): eliminate semantically empty placeholder session titles: quality-gate SoT + single-use prompt + richer history + legacy backfill CLI #1027
Merged
ThreeFish-AI merged 1 commit on Jun 30, 2026
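… CLI;
Root cause: title generation was triggered correctly (both the reactive path and the inspector path were running), but when the history text was sparse the LLM treated the instruction itself as a description of the conversation and produced meta-descriptive titles such as 「首次标题」 ("first title"), 「标题 v2」 ("title v2"), and 「会话标题自动生成」 ("automatic session title generation"), which were written back to the DB. Existing bad titles stayed stuck because the inspector's refresh_delta=20 is never reachable for short sessions.

Generation quality (fix at the source):
- summarization.py adds a module-level pure function, is_semantically_vacant_title, as the single source of truth (SoT) for the title quality gate (blacklist regex + substantive remainder <3 after stripping meta vocabulary + instruction-echo detection), reused by the generation path and the backfill CLI; generate_title post-processing calls it and returns None on a hit, preferring no title over a meaningless one.
- The prompt is restructured to remove the double use of the instruction text: system_instruction carries all instructions (including few-shot examples and the ban on meta vocabulary), and contents only gets a minimal trigger sentence appended at the end.
- session_service._generate_title_for_session: history becomes "first user message + latest 6 messages", merged and deduplicated (so long sessions keep their topic-setting message); readable summaries of functionCall/functionResponse are included conservatively; it returns early when the effective text is <8 characters (to avoid forcing a hollow title out of an empty conversation).

Legacy cleanup (clearing stuck bad titles):
- New one-off backfill CLI cli_backfill_titles.py (negentropy backfill-session-titles): keyset-paginated scan of auto titles, vacancy check in Python via the SoT, clears title and its provenance fields, per-session advisory lock + UPDATE WHERE guard that re-checks title_source='auto', dry-run by default, never touches manual/legacy titles, DELETE strictly forbidden; the next inspector tick regenerates the title naturally.
- cli.py registers the backfill-session-titles subcommand.

Verification: test_summarization.py 38 cases (including 35 table-driven gate cases) + test_session_title.py / test_title_inspector.py / test_title_source.py / test_session_hard_delete.py, 71 cases in total, all green; CLI dry-run/apply verified end to end (dry-run hit 6 titles, including 「唯一标题」 ("unique title"), caught by generalization; --apply cleared them successfully).

🤖 Generated with [Claude Code](https://github.com/claude), [CodeX](https://openai.com), [Gemini](https://github.com/apps/gemini-code-assist)
Co-Authored-By: Aurelius Huang <threefish.ai@gmail.com>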
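The three checks named for the quality gate (blacklist regex, substantive remainder after stripping meta vocabulary, instruction-echo detection) can be illustrated with a minimal sketch. This is not the code in summarization.py: the function name is_vacant_title_sketch, both regex lists, and the choice to count the remainder in characters are assumptions made for illustration, built only from the bad titles quoted in this PR.

```python
import re
from typing import Optional

# Assumption: exact-match blacklist built from the bad titles quoted in this PR.
_BLACKLIST = re.compile(r"首次标题|标题\s*v\d+|会话标题自动生成", re.IGNORECASE)
# Assumption: a small meta vocabulary; the real list is not shown in the PR text.
_META_WORDS = re.compile(r"会话|对话|标题|自动|生成|首次|唯一|v\d+", re.IGNORECASE)
# Whitespace, punctuation and underscores carry no substance.
_NON_WORD = re.compile(r"[\W_]+")


def is_vacant_title_sketch(title: Optional[str], instruction: str = "") -> bool:
    """Return True when a candidate title carries no real conversation content."""
    if title is None:
        return True
    text = title.strip()
    if not text:
        return True
    # Check 1: the whole title is a known placeholder.
    if _BLACKLIST.fullmatch(text):
        return True
    # Check 2: the title merely repeats part of the prompt instruction.
    if instruction and text in instruction:
        return True
    # Check 3: fewer than 3 characters remain once meta words are stripped.
    residue = _NON_WORD.sub("", _META_WORDS.sub("", text))
    return len(residue) < 3


for candidate in ["首次标题", "标题 v2", "唯一标题", "Python 异步爬虫调试"]:
    print(candidate, is_vacant_title_sketch(candidate))
```

In this sketch 「唯一标题」 is not on the blacklist but is still rejected by the residue check, which is one plausible reading of the "caught by generalization" remark. A caller in the generation path would return None instead of writing the title whenever the function returns True.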
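Background
In the Session list on the Home / Studio pages, many session titles are semantically empty placeholders (for example 「首次标题」 "first title", 「标题 v2」 "title v2", and the truncated 「会话标题自动生…」; a screenshot shows 「首次标题」 repeated 6 times), so users cannot tell at a glance what a conversation is about.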
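Root cause (confirmed by a full-chain investigation)
This is not a case of scheduling failing to fire: both the reactive path (append_event → _schedule_title_generation) and the inspector path (session_title_inspect, enabled=True, 300s) are running. The real root cause is "triggered, but with poor generation quality" plus "existing bad titles cannot self-heal":
- generate_title puts the same instruction text into system_instruction and also appends it to the end of contents as a user message. When the history text is sparse, a weak model treats the instruction itself as a description of the conversation and outputs a meta-descriptive title. The prompt has no few-shot examples and no ban on meta vocabulary.
- … text part, so long sessions lose their first topic-setting message and tool-heavy sessions carry very little text.
- … once title_source=auto, the inspector's refresh condition max_seq - gen_seq >= 20 is never reachable for short sessions; combined with the already_titled result from _title_skip_reason, a bad title has no self-healing path.
Production code contains no placeholder literals and no mock leakage, and on failure it only increments a counter without writing a title, so these bad titles were in fact generated by the LLM and written back to the DB.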
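A minimal sketch of why the refresh condition never fires for short sessions. Only the condition max_seq - gen_seq >= 20 comes from the analysis above; the function name needs_refresh and the example sequence numbers are hypothetical.

```python
REFRESH_DELTA = 20  # threshold quoted in the root-cause analysis


def needs_refresh(max_seq: int, gen_seq: int, refresh_delta: int = REFRESH_DELTA) -> bool:
    """Return True when enough events have arrived since the title was generated."""
    return max_seq - gen_seq >= refresh_delta


# Hypothetical short session: title generated at event 2, conversation ends at event 9.
print(needs_refresh(max_seq=9, gen_seq=2))   # 7 new events, below the threshold
# Hypothetical long session: the same title becomes eligible only after 20 more events.
print(needs_refresh(max_seq=45, gen_seq=2))
```

A session that ends fewer than 20 events after its title was generated never satisfies the condition, so a bad auto title on such a session stays in place until something else clears it.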
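Changes (two closed loops: generation quality gate + legacy cleanup)
Generation quality (fix at the source)
- summarization.py: adds a module-level pure function, is_semantically_vacant_title, as the single source of truth (SoT) for the title quality gate (blacklist regex + substantive remainder <3 after stripping meta vocabulary + instruction-echo detection); generate_title post-processing calls it and returns None on a hit (preferring no title over a meaningless one); the prompt is restructured to remove the double use of the instruction text (the system instruction carries all instructions, including few-shot examples and the ban on meta vocabulary, and contents only gets a minimal trigger sentence at the end).
- session_service.py: the history used by _generate_title_for_session becomes "first user message + latest 6 messages", merged and deduplicated; readable summaries of functionCall/functionResponse are included conservatively; it returns early when the effective text is <8 characters.
Legacy cleanup (clearing stuck bad titles)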
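- cli_backfill_titles.py (new) + cli.py: one-off backfill CLI, negentropy backfill-session-titles. It performs a keyset-paginated scan of auto titles, checks vacancy in Python via the SoT, clears title and its provenance fields, takes a per-session advisory lock, and guards the UPDATE with a WHERE clause that re-checks title_source='auto'. It is dry-run by default, never touches manual/legacy titles, and strictly forbids DELETE; the next inspector tick regenerates the title naturally.
Verification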
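- test_summarization.py: 38 cases (including 35 table-driven gate cases, covering every bad-title literal in the screenshot plus regression-protection cases), all green.
- test_session_title.py (+2 new: first message preserved / low-content skip), test_title_inspector.py, test_title_source.py, test_session_hard_delete.py: 71 cases in total, all green.
- … --apply --limit 2 cleared the titles successfully and emitted the structured log backfill_title_cleared; the advisory lock and the guard raised no errors.
Deployment sequencing (ops side, not part of this PR)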
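- uv run negentropy backfill-session-titles (dry-run assessment → canary with --user/--limit → full run with --apply);
- … --apply goes live; otherwise the inspector will generate bad titles again after they are cleared.
Related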
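- docs/.agents/issue.md ISSUE-151.
- … the hard-coded defaults in model_resolver.py (blast radius too large); no change to the shape of Thread.metadata_; no change to the frontend fallback behavior (the Session <id> fallback is correct).
🤖 Generated with Claude Code, CodeX, Gemini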
Co-Authored-By: Aurelius Huang <threefish.ai@gmail.com>