Proj/byterover tool mode by danhdoan · Pull Request #663 · campfirein/byterover-cli

danhdoan · 2026-05-15T04:34:35Z

Summary

Lands the full byterover tool-mode project end-to-end:

HTML context-tree format. The 19-element <bv-*> vocabulary
(bv-topic, bv-task, bv-pattern, bv-rule, bv-diagram, …) is
now the on-disk format for curated topics. Parser, writer, validator
registry, element-axis index, and LLM-ready renderer all ship in
src/server/infra/render/. Markdown topics keep working via the
format detector + extension-aware reader.
Tool mode is the default. Both brv curate and brv query run
without any byterover LLM provider configured. BRV_*_TOOL_MODE
env-var gates have been removed (ENG-2815). The calling agent owns
synthesis; byterover retrieves, renders, validates, and writes.
MCP migrated to provider-free. brv-query swaps to
task:create type='query-tool-mode' and returns the structured
envelope (ENG-2836). brv-curate is replaced by
brv-curate-html — accepts pre-authored <bv-topic> HTML and routes
through the writer directly, no LLM (ENG-2837). After this lands,
MCP integrations (Claude Desktop / Cursor MCP / etc.) inherit the
no-provider property the CLI already has.
New brv read <path> command (ENG-2769) for single-topic render
- inspection.
Editorial topic viewer in webui (ENG-2810) for bv-* HTML
topics, hardened against untrusted HTML.
Telemetry producers (ENG-2741) attach token/latency/format
metadata to QueryLogEntry and CurateLogEntry via a shared
TaskUsageAggregator.

What's new

Curate

HTML writer with <bv-topic> schema validation and atomic
tmp+rename writes (ENG-2737/2738/2739).
CurateOrchestrator state machine: generate → validate → correct
→ write → retry loop (ENG-2766).
Prompt builder generated from ELEMENT_REGISTRY, with
prompt-injection-hardened delimiters (ENG-2767).
CLI surface uses in-process kickoffSession / continueSession
instead of the legacy daemon runAgentBody path (ENG-2765).
System-managed createdat / updatedat injected at write time —
the model is not allowed to author these.
confirmOverwrite guard returns existing content for the caller to
merge instead of clobbering silently.

Query

Tool-mode executor runs Tier 0/1 cache + Tier-2 retrieval without
the canRespondDirectly confidence threshold (ENG-2811/2812). The
threshold was a byterover-side judgment; calling agents now own the
"is this match good enough?" decision.
Tier 3/4 LLM-driven synthesis removed from the executor path.
HTML-routing in SearchKnowledgeService so HTML topics render
identically to MD in Tier 2.
New docs/curate-protocol.md documents the wire envelope.

MCP

brv-query migrated to the tool-mode envelope; renders matched
topics as markdown sections.
brv-curate replaced by brv-curate-html (breaking — see below).
Shared encoders: src/shared/transport/query-tool-mode-content.ts
and src/shared/transport/curate-html-content.ts.

SKILL.md

Tool-mode curate session protocol documented (ENG-2768).
Query tool-mode section added with skippedSharedCount semantics
(ENG-2814).
Deprecated LLM-provider references scrubbed (ENG-2822).

Breaking changes

MCP brv-curate tool input shape changes. The previous
{context, files, folder} natural-language inputs are gone. The new
tool (brv-curate-html / curate-html-direct task type) accepts
{html, confirmOverwrite?} — pre-authored <bv-topic> HTML from
the calling agent. MCP integrations relying on the natural-language
curate input must update. See src/server/templates/skill/SKILL.md
for the authoring guide.
BRV_CURATE_TOOL_MODE / BRV_QUERY_TOOL_MODE env vars removed.
No-op now — tool mode is the only path. Remove from any local
scripts.
useHtmlContextTree flag removed. HTML is the only curate
output format. Existing markdown topics still read fine via the
format detector; new writes are always HTML.

Tickets shipped

ENG-2737, ENG-2738, ENG-2739, ENG-2740, ENG-2741, ENG-2765, ENG-2766,
ENG-2767, ENG-2768, ENG-2769, ENG-2810, ENG-2811, ENG-2812, ENG-2814,
ENG-2815, ENG-2822, ENG-2836, ENG-2837.

Rollout

Existing .brv/context-tree/*.md topics keep working — read path
is format-aware.
New curate writes land as .html. A separate migration script
(Phat, parallel work) can re-author legacy MD topics into the
<bv-topic> HTML format and stash originals under
.brv/_archive/context-tree-md/. Not included in this PR.
Users following the "no provider needed" announcement and running
brv providers disconnect <all> will now find MCP brv-query and
brv-curate-html work end-to-end — same property as the CLI.

Test plan

npm run typecheck clean
npm run lint clean
npm test — full unit + integration suite (153 files changed,
~15k LOC added; new tests cover HTML reader/writer/elements,
curate-session, query-retrieval, MCP tool-mode envelopes,
telemetry aggregator, format detectors, prompt builder)
npm run build clean
Manual: 0-provider CLI curate + query round trip on a fresh
.brv/
Manual: 0-provider MCP brv-query + brv-curate-html from
Claude Desktop or Cursor MCP
Manual: HTML topic round-trip through webui editorial viewer
Verify migration script (Phat's parallel work) lands separately
and does not block this PR

…rser Lands the M1 keystone task: element vocabulary, per-element validators, data-driven registry, and HTML parser wrapper. No behavior change yet — no consumer wires the registry until T3 (curate path) and T4 (query path). Element vocabulary (5 elements): - bv-topic — root container; required `path`, optional importance/maturity/ recency/updatedat (lowercase per HTML5 normalization) - bv-rule — rule statements (severity, id) - bv-decision — decision records (id) - bv-bug — bug runbook entries (id, severity) - bv-fix — fix runbook entries (id) Architecture: - Data-driven `ELEMENT_REGISTRY` is the production-track guardrail. M2 vocabulary expansion is purely additive — new elements register via one entry + one schema/validator pair, no consumer code changes. - `makeAttributeValidator` factory eliminates per-element boilerplate; reduces M2's 12 additional element validators to one-line bindings. - HTML parser is parser-library-agnostic at the API boundary (`DocumentNode` / `ElementNode` / `TextNode`). parse5 used internally; consumers don't see parse5 types. - Attribute-case normalization is documented in `ElementNode` JSDoc: HTML5 lowercases attribute names at parse time, so schema keys must match parser output (lowercase), not source HTML (camelCase). Tests (106 new): - 5 per-element validator suites (61 cases total) — each ≥10 cases - Registry shape + wiring + metadata (14 cases) - HTML parser: basic parsing, malformed-input handling, walkElements, getInnerText, serializeHtml round-trip (22 cases) - End-to-end round-trip on a sample fixture covering all 5 elements (9 cases) Verification: - typecheck clean - lint 0 errors (242 warnings, all pre-existing baseline) - 7715 tests passing, 0 failing - build clean Adds parse5@^8.0.1 as a runtime dep — the W3C-spec HTML parser used by jsdom; types ship in-tree (no @types/parse5 needed). Justified vs hand- rolled parsing because robust HTML tag-soup handling is non-trivial and this is the standard library for the job.

- bv-topic schema: tighten recency regex to ^\d+(\.\d+)?$ (rejects '.', '..1', '1..2' which previously slipped past the regex), drop redundant Number.isFinite for style consistency with importance, accept timezone offsets in updatedat via {offset: true} - html-parser: replace `as unknown as Parse5DocumentFragment` cast with parse5 defaultTreeAdapter factories (createDocumentFragment, createElement, createTextNode, appendChild) — no more structural casts in the serialize path - html-parser: getInnerText now inserts a separator between sibling element children so compact emit (<p>foo.</p><p>bar.</p>) does not merge tokens for BM25 - html-parser: document <template> content limitation (parse5 places template children in .content, not childNodes) - schema docstrings + test names: replace 'warn-only behaviour' with 'parse-and-skip' (no warning is actually emitted in M1) - sample-topic round-trip: add explicit case-normalization regression test for updatedAt → updatedat

feat: [ENG-2737] HTML render layer foundation — element registry + parser

Rewrite the canonical curate tool description (loaded by ToolDescriptionLoader and presented to the agent on every curate operation) from the structured JSON-args shape (rawConcept/narrative/facts/etc.) to the M1 HTML-emission contract using the closed `<bv-*>` vocabulary defined by T1. Coverage: - All 5 elements (bv-topic, bv-rule, bv-decision, bv-bug, bv-fix) with full per-element attribute schemas: required/optional, value enums, format constraints (importance integer 0-100, recency numeric 0-1, ISO-8601 updatedat, severity enums per element, etc.). - Allowed-children semantics (any/inline/block) + pairing convention (bug+fix as siblings). - Path format (slash-separated snake_case) + domain guidelines. - Output contract: HTML only, exactly one bv-topic root, no preamble/fences/ commentary, lowercase attribute names per HTML5 spec, closed vocabulary, no invented attributes, no clarifying questions. - Three worked examples (bug+fix runbook, rule+decision pair, general topic) to anchor the model's element-pairing and id-naming conventions. Add a sanity test (test/unit/server/infra/render/curate-prompt.test.ts) that loads curate.txt and asserts every registered ELEMENT_NAME, every per-element enum value, and every output-contract directive is present. Guards against silent drift when M2 expands the vocabulary or refactors attribute schemas. Authoring fluency check (M1 T2 spike, separate harness in local-auto-test/curate-fluency/, not in this repo): 20/20 generated outputs validated against the M1 element registry on Sonnet 4.5. Gate cleared on first full run after one prompt-tuning iteration. Decision: proceed to T3. Full report: research repo at features/html-memory-conversion/milestones/01-experiment/fluency-report.md

Field-by-field review of curate.txt against the actual rendered .md context files (frontmatter + ## Reason + ## Raw Concept + ## Narrative + ## Facts) surfaced two issues with the initial T2 draft: 1. <bv-topic> carried runtime-signal attributes (importance, maturity, recency, updatedat) that were explicitly migrated to a sidecar store under research/features/runtime-signals/ (per-user, per-machine state that caused noisy `vc status` diffs and team-share conflicts). Drop them from the schema; replace with the actual frontmatter fields the markdown writer renders: title (required), summary, tags, keywords, related. 2. The 5-element vocabulary covered Rules + three M1 net-new elements (Decision/Bug/Fix) but left every other rendered MD section as free-form HTML — losing the structural mapping the writer needs. Expand M1 vocabulary to 16 elements (added 11), each mapping to a specific section in the rendered .md output: - bv-reason → ## Reason - bv-task, → ## Raw Concept (Task / Changes / Files / Flow) bv-changes, bv-files, bv-flow - bv-structure, → ## Narrative (Structure / Dependencies / bv-dependencies, Highlights / Examples / Diagrams); bv-highlights, bv-rule already covered Rules bv-examples, bv-diagram (with type + title attrs for verbatim diagram preservation) - bv-fact (subject/category/value attrs) → ## Facts list Each element follows the existing makeAttributeValidator pattern; the data-driven registry stays the single source of truth. Tests: - Update bv-topic test to drop runtime-signal cases and assert the new required-attributes (path + title); add a regression case asserting passthrough still tolerates legacy importance/maturity/ recency without enforcing them. - Add registry test asserting (a) bv-topic exposes the new optionalAttributes, (b) runtime signals are NOT in the registry metadata, (c) every registered validator accepts its own empty node + rejects mismatched-tag. - Consolidate the 9 attribute-free new elements into one shared text-only-elements.test.ts (same trivial schema; per-element duplication wouldn't add coverage). - Add dedicated bv-diagram.test.ts (type enum) and bv-fact.test.ts (category enum, subject/value pass-through). - Update sample-topic.html fixture to exercise every new element + the new frontmatter attributes; assert no runtime signals leak in. - Update curate-prompt.test.ts to assert the expanded vocabulary, category/type enums for bv-fact/bv-diagram, and that the prompt explicitly excludes runtime signals from bv-topic attributes. Re-run the M1 T2 fluency check on the same 20 fixtures with the new vocabulary: 20 / 20 valid (gate cleared); 100% cohort-appropriate placement (bug+fix → bv-bug+bv-fix; rule+decision → bv-rule+bv-decision; general → bv-reason); 122 bv-fact instances extracted across the run (~6 per fixture). Fluency report updated: research/features/html-memory-conversion/milestones/01-experiment/fluency-report.md Open contract violation persists: 18/20 outputs (90%) wrap in code fences despite explicit prompt instruction. Recommendation for T3 unchanged — strip in the response parser, don't iterate the prompt.

Second writer audit (markdown-writer.ts:270-354) found 3 rawConcept sub-fields without an HTML home: rawConcept.timestamp, rawConcept.author, rawConcept.patterns. Add the missing elements so the M1 vocabulary covers every section the writer renders. Vocabulary additions (16 → 19 elements): - <bv-timestamp> — text-only; renders as `**Timestamp:**` under `## Raw Concept`. Distinct from frontmatter createdAt/updatedAt (system-set) — this is the date the concept's data represents. - <bv-author> — text-only; renders as `**Author:**`. - <bv-pattern> — structured (flags + description as attrs, pattern as text content). Multiple siblings inside <bv-topic> collected into the `**Patterns:**` bullet list. Mirrors how <bv-fact> works. Sanitiser: - stripCodeFenceWrapper(html) added to html-parser.ts. Strips a single outer ``` ... ``` (any language tag) wrapper from the LLM response. Inner fences (e.g., <pre><code> blocks inside <bv-diagram>) survive. T3's curate executor calls this before parseHtml. Defensive sanitisation generalises better than chasing the model's quirk via prompt iteration — observed fence-wrap rate is 70% on Sonnet 4.5 even with explicit prompt instruction to the contrary. Tests: - bv-pattern dedicated test (bv-pattern.test.ts) for the structured attrs. - text-only-elements.test.ts extended for bv-timestamp + bv-author (same trivial schema as the other text-only elements). - html-parser.test.ts: 7 new cases for stripCodeFenceWrapper covering ```html, ```xml, no-language ```, leading/trailing whitespace, unwrapped passthrough, mismatched-fence passthrough, inner-fence preservation, and parse-after-strip end-to-end. - sample-topic-roundtrip.test.ts: renderable-MD coverage assertion extended to require all 15 dedicated body-section elements (was 12). - sample-topic.html fixture exercises the 3 new elements. Re-ran the M1 T2 fluency check on the same 20 fixtures with the 19-element vocabulary. Results: - 20 / 20 valid (gate cleared) - 100% cohort-appropriate placement (unchanged) - bv-timestamp 60% adoption (used where input has a clear date) - bv-author 10% (used only on inputs that name an owner) - bv-pattern 0% — model correctly did not over-emit; no fixture contained regex content. Restraint is good behaviour. - Cost: $0.61 (vs. $0.60 for the 16-element draft; negligible) Updated fluency-report.md in research repo with final coverage matrix and the 19-element → markdown-writer mapping for T3.

…Query/CurateLogEntry Add per-call token usage (input, output, cached, cache-creation), latency tiers (searchMs, llmMs, totalMs), and format mode (markdown vs html) to QueryLogEntry and CurateLogEntry. Token rollup uses a per-task TaskUsageAggregator subscribed to llmservice:usage events on the agent bus; LoggingContentGenerator emits the events on both streaming and non-streaming paths. AiSdkContentGenerator folds result.usage and providerMetadata.anthropic.cacheCreationInputTokens into a synthesized rawResponse so per-provider extraction lands consistently across Anthropic, OpenAI, and Google. Live-validated against 30 curates × 3 providers; all four token fields populate end-to-end with cacheCreationTokens correctly Anthropic-only.

[ENG-2738] HTML-emission contract in curate.txt + fluency check

- core/interfaces no longer reaches into infra: CurateUsageRecord moved to core/domain/entities/curate-log-entry, IUsageAggregator added in core/interfaces/telemetry/, TaskUsageAggregator now `implements` it. - Consolidate curate telemetry assembly inside CurateExecutor: runAgentBody invokes reportTelemetry on both happy and error paths; agent-process passes onTelemetry that forwards via task:curateResult. Removes the duplicated inline assembly in agent-process. - QueryExecutor.buildTiming prefers TaskUsageAggregator.getLlmMs() over the executeOnSession wall-clock so llmMs counts only LLM-call time, not tool-execution overhead. Falls back to wall-clock for tests. - CurateLogHandler error finalization merges telemetryFields(state.usage) so failed curates don't underreport cost when the executor's best-effort error-path reportTelemetry has populated state.usage. - OOD short-circuit in QueryExecutor routes through formatDetector.detect([]) instead of returning format: undefined, so future HTML-aware detectors don't get silently bypassed. - DRY: extract makeUsageListener helper in agent-process and reuse for both curate and query handlers. - Document that PROVIDER_TYPES iteration in LoggingContentGenerator only matches 'aiSdk' in production (all live traffic flows through AiSdkContentGenerator).

…ocation Re-review flagged the gap that error-path reportTelemetry being silent could mask a future regression where failed-curate telemetry disappears from disk. Pin both directions of the contract: - happy path: onTelemetry fires once before runAgentBody returns - error path: onTelemetry fires once before the throw propagates so the daemon can still emit task:curateResult ahead of task:error

feat: telemetry producer for QueryLogEntry + CurateLogEntry

…or wiring Wires the M1 HTML context-tree experiment into the curate flow. When the new BrvConfig.useHtmlContextTree flag is enabled, curate emits a <bv-topic> HTML document using T1's element registry (19 elements after T2's expansion) and atomically writes it to .brv/context-tree/<path>.html. When the flag is off (the default), the existing markdown curate path runs unchanged. Rolled-in T2 review fixes (PR #628 auto-review): - curate.txt: split the "Standard HTML inside <bv-*>" clause per allowed-children semantics. Inline-only subset (code/strong/em) for bv-rule, bv-task, bv-flow, bv-fact, bv-pattern, bv-timestamp, bv-author. Full block subset for bv-topic and the block-content elements. Removes the M1-passthrough/M2-strict drift the reviewer flagged. - curate.txt: prefix the examples block with a disclaimer that the fences are for prompt readability only and the agent's actual output must be bare HTML. Lower-cost mitigation for the model's stubborn fence-wrap habit (T2's stripCodeFenceWrapper still handles the remaining cases). - curate-prompt.test.ts: enum-value tests now anchor to the section after each element header instead of whole-file string match. Catches "severity moved from bv-bug to bv-decision" drift. - curate-prompt.test.ts: new test that parses every fenced example block from curate.txt with parseHtml and runs each <bv-*> element through its registered validator. Catches example typos (severity="hihg") AND vocabulary drift in one assertion. T3 proper: BrvConfig: - New optional boolean field useHtmlContextTree (default undefined == false). Plumbed through the JSON validator, withReviewDisabled, withVersion, and withSpace withers. withoutSpace already used spread, so it picks up the new field automatically. 8 new BrvConfig unit tests cover round-trip + wither preservation. format-detector (src/server/infra/render/format/format-detector.ts): - Two pure functions: getFormatForWrite(config) reads the flag, getFormatForRead(filePath) inspects the extension. Default markdown. 12 unit tests cover both branches plus edge cases (case-insensitive extensions, directory-named-foo.html with .md inside, etc.). html-writer (src/server/infra/render/writer/html-writer.ts): - Public surface: validateHtmlTopic(html) (pure validation, no I/O) and writeHtmlTopic({contextTreeRoot, rawHtml}) (validation + atomic write). Pipeline: 1. stripCodeFenceWrapper (T2-provided) on raw input. 2. parseHtml + walkElements; require exactly one <bv-topic> and a non-empty path attribute. 3. Run every typed bv-* element through its registered validator. Reject on any failure with structured errors. 4. Atomically write via DirectoryManager.writeFileAtomic to <contextTreeRoot>/<path>.html. - Path-traversal hardening: backslashes normalised, `..` segments rejected, leading slashes stripped, resolved path must stay under contextTreeRoot. 15 unit tests cover validity branches, fence variants, atomic-write success/failure, and traversal attempts. curate executor wiring: - CurateExecuteOptions gains useHtmlContextTree?: boolean. - runAgentBody branches the runtime prompt's final-step instruction: HTML mode tells the agent its FINAL RESPONSE is the bv-topic document and explicitly forbids tools.curate calls (which would write a sibling .md file). MD mode keeps the existing "verify via result.applied[].filePath" instruction unchanged. - After agent.executeOnSession returns, HTML mode routes through a new handleHtmlCurateResponse() that calls writeHtmlTopic and shapes a CurationStatus from the result (success/failed). MD mode continues to call parseCurationStatus. - 6 new unit tests cover: valid topic written, fence-wrapped response stripped, missing bv-topic → failed, invalid attribute → failed, flag=false uses MD path, flag=undefined uses MD path. agent-process plumbing: - The daemon's curate dispatcher reads cachedBrvConfig.useHtmlContextTree and forwards it on every CurateExecuteOptions call. Default false remains production-safe; the bench's Configuration B sets it to true via .brv/config.json. Tests: 7892 pass, 0 fail. Typecheck clean, lint clean, build clean. Out of scope (per the M1 plan): - Storage path migration to .brv/render/ — M2. - IR (ComponentIR) compilation — M2. - Multi-tier query / structural selectors — T4. - to-injection/to-evidence/to-summary/to-md adapters — M2. - Migration tool for existing .brv/context-tree/ — M2.

HTML is the curate format; there is no MD/HTML branching to control. Removing the flag simplifies the executor (one prompt, one response handler), the BrvConfig schema (no per-project toggle), and the format-detector (no write-side decision — only read-side dispatch remains for legacy MD files). Code changes: - BrvConfig: remove the optional `useHtmlContextTree` field, the JSON validator branch, all wither plumbing, and the dedicated test block. - ICurateExecutor: remove `useHtmlContextTree` from CurateExecuteOptions. - CurateExecutor: collapse the prompt branch — every curate runs the HTML-emission instruction. Always route the agent's response through `handleHtmlCurateResponse`. Remove the dead `parseCurationStatus` method (the JSON-status-block path is gone). - agent-process: drop the `useHtmlContextTree` plumbing on the curate dispatch site. - format-detector: drop `getFormatForWrite` (no decision to make); keep `getFormatForRead` so the query/search read path can still route legacy `.md` files transparently. Drop the BrvConfig import. Test changes: - BrvConfig: drop the 8-case `useHtmlContextTree` describe block. - format-detector: drop the 4 `getFormatForWrite` cases. - curate-executor-html-mode: drop the two MD-fallback cases (no flag to toggle); the file is now strictly HTML-emission tests. Comment cleanup (separate concern, same commit): Strip milestone scaffolding (M1, M2, T1–T7) from comments across the render layer. Comments now describe logic and decisions, not the project plan that scaffolded them. Specifically: - "M1 light validation; ADR-007 §13 strict validation is M2" → "Light validation; strict validation per ADR-007 §13 is future work" - "M1 element registry"/"M1 vocabulary" → "element registry"/"closed bv-* vocabulary" - "T3 wires the writer", "T4 owns the index", etc. → describe the consumer by role, not by milestone task - "M1 keeps the existing .brv/context-tree/ storage" → describe what the resolver encodes, not which milestone made the decision - Test descriptions: "(parse-and-skip — M1 light validation)" → "(parse-and-skip — light validation)" Pipeline: - 7878 tests pass (lost 14 with the dropped flag — 8 BrvConfig + 4 format-detector write + 2 executor MD-fallback). - Typecheck clean, lint clean, build clean.

End-to-end smoke testing surfaced a gap: the executor's runtime prompt told the agent to emit HTML as its final response, but the base system prompt (system-prompt.yml) had ~500 lines of detailed guidance for the old structured-args curate flow (UPSERT/ADD/UPDATE/MERGE/DELETE with rawConcept/narrative/facts content shape). The system prompt's specifics overrode the runtime instruction; the agent kept building JS data structures via code_exec and never emitted HTML, so the writer saw nothing to validate and no .html file landed. Rewrite the curate-related sections of system-prompt.yml to match the HTML-emission contract: - Update <TOOL_QUICK_REFERENCE>, <WORKFLOW_SUMMARY>, <PATH_TITLE_CLARIFICATION>, <PROHIBITED_BEHAVIORS>, <DIRECTORIES>, and the <SANDBOX_PROGRAMMING> curate example to instruct HTML emission. The agent's final response IS the bv-topic document; do NOT call tools.curate; do NOT wrap in code fences. - Replace the entire ## Curation Workflow section (~500 lines, lines 275–786) with a tighter HTML-focused workflow (~217 lines): - Keep the recon → mapExtract → dedup → groupBySubject preprocessing helpers (still useful for chunked/large inputs). - Document the bv-topic frontmatter (path/title required; summary, tags, keywords, related optional) and explicitly call out that importance/maturity/recency/updatedat are sidecar runtime signals, not authored. - Map MD sections to bv-* elements (Reason → bv-reason, Raw Concept sub-fields → bv-task/bv-changes/bv-files/bv-flow/bv-timestamp/ bv-author/bv-pattern, Narrative subsections → bv-structure/ bv-dependencies/bv-highlights/bv-rule/bv-examples/bv-diagram, Facts → bv-fact, runbooks → bv-decision/bv-bug/bv-fix). - Document inline-content vs block-content elements and the standard HTML allowed inside each. - Keep facts extraction guidance, content quality requirements, temporal preservation, and content preservation rules — translated to the bv-* element vocabulary. - Drop the operation-type table (UPSERT/ADD/UPDATE/MERGE/DELETE), Two-Part Context Model section, domainContext/topicContext/ subtopicContext (these belonged to the structured-args flow that no longer exists). - Update the query-side temporal-reasoning hint to reference bv-timestamp content + sidecar runtime signals instead of rawConcept/narrative timestamps. - Update the related-topics search hint to reference the `related` attribute on bv-topic alongside the legacy `relations` frontmatter on .md topics. Net change: 196 insertions, 494 deletions. The rewritten prompt is ~217 lines shorter and aligned with the actual emission contract in curate.txt. Verified end-to-end via local-auto-test/curate-html-e2e/ harness: - 3/3 fixtures (bug-fix, rule-decision, general-context) curate to registry-valid HTML topic files (Sonnet 4.5). - File paths land at <projectDir>/.brv/context-tree/<path>.html as expected. - Per-fixture latency: 19.3s, 30.5s, 27.3s. Pipeline: 7878 unit tests pass, lint clean, typecheck clean, build clean.

…e time The previous markdown writer rendered `createdAt` and `updatedAt` in the YAML frontmatter of every topic file. The HTML writer was dropping both — the LLM doesn't author them and the writer wasn't injecting them either, so HTML topics had no temporal metadata on disk. Inject both as `<bv-topic>` attributes during the writer's atomic write, after fence-stripping and registry validation: - `updatedat` is always set to the current ISO-8601 instant. - `createdat` is preserved from the existing file on disk if one exists at the resolved path; otherwise it is set to the current ISO instant. Re-writes leave `createdat` alone, advance `updatedat`. - Any value the LLM authored for these attributes is overridden — the agent is not allowed to choose its own timestamps. Implementation: surgical regex edit on the bv-topic opening tag (no parse → re-serialize round-trip) so the LLM's formatting (whitespace, attribute order, quoting style) survives intact. Two helpers added to html-writer.ts: - setBvTopicAttributes(html, attrs) — replace-or-insert each attr. - readExistingTopicAttribute(filePath, name) — surgical read of an attribute value from an existing on-disk topic without parsing the whole document. Tests: - Existing assertions about file content equaling VALID_TOPIC updated to match-with-injected-timestamps (body preserved verbatim, bv-topic opening tag now has createdat/updatedat). - Three new cases under "system-managed timestamps": 1. First write injects both timestamps within the test window. 2. Re-writes preserve createdat, advance updatedat. 3. LLM-authored values for either attribute are overridden by the system value. Verified end-to-end via local-auto-test/curate-html-e2e/ — all 3 fixtures produce HTML topic files with system-injected createdat and updatedat ISO-8601 timestamps on the bv-topic root, alongside the LLM's frontmatter (path/title/summary/tags/keywords/related). Pipeline: 18 html-writer unit tests pass (3 new), full suite green, typecheck clean, lint clean.

CI test failure (curate-executor-html-mode.test.ts:79) — the file-content assertion still expected the LLM's raw HTML; now matches the timestamp-injected file by checking the body is preserved verbatim and that createdat/updatedat are present. Same fix in the fence-stripping test variant. Review feedback addressed: 1. Telemetry format hardcoded to 'markdown' (curate-executor.ts:485). The flag is gone; HTML is the only curate format. Updated to emit 'html' and reworded the doc comment. Updated the regression test in curate-executor.test.ts:491 to assert 'html' accordingly. 2. validateHtmlTopic does not catch path-traversal (html-writer.ts). The doc comment promised "safe to use the topic" but a payload like `<bv-topic path="../../../etc/passwd">` returned `{ok: true}` — path-traversal was only caught later when topicPathToFilePath threw. That works end-to-end (executor catches and surfaces as failed status), but is a foot-gun for any future caller using validateHtmlTopic standalone (preview, dry-run). Move the path-segment safety check into validateHtmlTopic and surface as a new structured error kind 'unsafe-path'. topicPathToFilePath keeps path.resolve as defence-in-depth. Added two unit tests covering `..` segments and single-`.` segments; updated the existing executor-side test to expect a validation failure (with `unsafe-path` errors) instead of a thrown exception. 3. Attribute-validation errors swallowed the `field` discriminant (curate-executor.ts:344). The writer captures which attribute failed (`severity`, `id`, etc.) but the executor's verification.missing string only included the tag. So a malformed `severity="urgent"` was logged as `attribute-validation (bv-rule): …` rather than `attribute-validation (bv-rule.severity): …`. Updated the mapping to include the field for attribute-validation errors. Non-blocking but improves bench-log debuggability. Reviewed but obsolete (not actioned): - "format-detector.ts ships an unused getFormatForWrite" — removed in the flag-drop commit f701cd2. - "agent-process.ts double-coerces useHtmlContextTree" — also obsolete (flag dropped). Pipeline: 7883 unit tests pass, lint clean, typecheck clean, build clean. E2e smoke (curate-html-e2e/) green on the rebuilt CLI.

[ENG-2739] curate path — HTML writer + format detector + executor wiring

…ch routing Wires the search/query path to read HTML topic files transparently alongside markdown ones. Curate emits HTML; the legacy MD path stays in for `brv swarm` and existing trees that haven't migrated. The BM25 index is format-agnostic (HTML inner-text decodes to the same plain text the markdown tokenizer expects); the indexer dispatches on file extension at read time. New modules: - src/server/infra/render/reader/html-reader.ts — readHtmlTopicSync(html) returns {bodyText, elements, topicAttributes}. bodyText is the entity-decoded inner text, ready to feed BM25. elements is a flat, document-order list of every typed `<bv-*>` node with its attribute map. topicAttributes lifts the bv-topic frontmatter (path, title, summary, tags, keywords, related). Forgiving on malformed input (parse5 returns a best-effort tree). - src/server/infra/render/reader/element-axis-index.ts — in-memory ElementAxisIndex. Two query keys: findByTag(tag) → paths findByAttribute(tag, attr, value) → paths Reverse path→keys map keeps invalidation O(keys-touched) instead of full-scan. Built in lockstep with the BM25 index; same lifetime, same invalidation triggers. Today's consumer is the optional `elementHint` filter on SearchOptions; the structural-selector grammar will hook in here without further reshaping. Search service changes (search-knowledge-service.ts): - File discovery: glob extended from `**/*.md` to `**/*.{html,md}`. - Indexing: new readIndexableContent() helper dispatches on getFormatForRead(filePath). HTML files route through readHtmlTopicSync (the BM25 input is bodyText, not raw markup) and contribute their element list to the structural-axis index. MD files index their raw content as before. - IndexedDocument gains a `format: 'html' | 'markdown'` field, set from the file extension at index time and surfaced on every SearchKnowledgeResult entry. Telemetry/query-log consumers (T5-shipped QueryLogEntry.format) populate from this without format-detection at the call site. - CachedIndex gains `elementAxisIndex: ElementAxisIndex`, populated during buildFreshIndex from the per-document elements walk. The empty-tree fallback also constructs an empty index so callers can treat the field as non-nullable. - SearchOptions gains `elementHint?: ElementHint`. When present, the search composes a MiniSearch filter that intersects scope + element-axis matches before BM25 ranking. No caller passes the hint today; M2 selector grammar is the consumer. - INDEX_SCHEMA_VERSION bumped to 6 so previously-cached indexes built against MD-only schema rebuild on next query. - Title resolution: HTML files use the bv-topic title attribute; MD files retain the existing extractTitle frontmatter walk. - The structural-selector hint is also exposed on the public SearchKnowledgeOptions as `SearchKnowledgeElementHint` — wired through ToolsSDK so SDK consumers can pass it without going through the internal SearchOptions surface. Tests: - html-reader (9 cases): bodyText extraction, entity decoding, attribute lift, elements-list document order + de-duplication, unknown-bv-* skipping, missing bv-topic graceful behaviour, malformed input non-throw. - element-axis-index (10 cases): findByTag/findByAttribute, empty-array for unknown tags, case-sensitivity on values, single-file multiple- contribution dedup, remove() drops all keys + cleans empty key sets, remove() on unknown path no-op, clear(). - search-knowledge-service-html-routing (8 cases): HTML files indexed, format field populated for both HTML and MD, raw markup is NOT searchable (inner-text only), bv-topic title attribute lifts as doc title, elementHint pre-filter (no-match returns empty, tag-only restricts, tag.attr=value restricts further). Pipeline: 7910 unit tests pass (27 new), typecheck clean, lint clean (0 errors), build clean. Out of scope (per the M1 plan): - Structural selector grammar — M2 (this PR ships the substrate). - findReferences reverse-graph — M2. - Signal-precise cache invalidation — mtime-based today is sufficient. - BM25 ranking parity test (MD vs HTML same content) — M2 bench measures end-to-end; the routing tests above cover the determinism boundary.

- HTML BM25 input now concatenates `<bv-topic>` summary/tags/keywords/related into the indexable content. Without this, a term living only in those attributes ranked HTML topics far below the markdown equivalent (where YAML frontmatter is fed verbatim into BM25). - HTML title fallback uses `?.trim() || fallbackTitle` so a `title=""` or whitespace-only attribute doesn't leak into BM25's 3x-boosted title field. - `ElementHint` (service + tools-sdk) becomes a discriminated union so `attribute` and `value` must be supplied together at compile time. - `ElementAxisIndex` storage swaps the stringly-keyed `${tag}.${attr}=${value}` table for nested Maps — eliminates the `.`/`=`-in-attribute collision class without a delimiter discipline. - `html-reader` lifts attributes off the FIRST `<bv-topic>` encountered unconditionally rather than the first non-empty one (matches the contract on malformed input). - Glob discovery does two parallel passes (`**/*.html`, `**/*.md`) instead of relying on the underlying engine to expand `{html,md}`. Tests: +7 (BM25 attribute-payload parity, title-fallback, FS-backed reader round-trip, topic-attribute lift contract, attribute-name/value collision robustness x3). All 7917 pass; lint + typecheck clean.

[ENG-2740] query path — HTML reader + element-axis index + search routing

…sponse Tier 2 (`brv query`'s direct-search response — no LLM) reads the full file at `query-executor.ts:701` and ships the bytes back to the user through `formatDirectResponse`. For HTML topics that's raw `<bv-topic ...><bv-rule severity="must">...</bv-rule>...` markup. Two consequences observed during the M1 multi-tier audit: 1. The 5000-char content budget in `direct-search-responder.ts:11` burns on tags and attribute syntax before reaching meaningful text. Long topics get truncated mid-element. 2. Any LLM-shaped consumer downstream (Tier 4 agent re-reads, the bench harness, third-party integrations) re-parses the HTML. Wasted tokens; lossy semantics. Stripping every tag (bodyText only) loses the typed-element signals the format exists to carry — severity on rules, subject/value on facts, id on decisions. New `renderHtmlTopicForLlm(html)` walks the parsed tree and renders each `<bv-*>` element with a semantic prefix: - `<bv-rule severity="must" id="r-1">x</bv-rule>` → `- **Rule** [must] (r-1): x` - `<bv-fact subject="alg" value="RS256">…</bv-fact>` → `- **Fact** (subject=alg, value=RS256): …` Wired into `tryDirectSearchResponse`: when `result.format === 'html'` the full content goes through the renderer; markdown topics pass through verbatim. Renderer is forgiving on malformed input (mirrors the rest of the parse5-driven reader pipeline); on any throw we fall back to raw bytes so a single bad topic can't blank the response. Tests: +15 (12 renderer, 2 Tier 2 HTML branch, 1 Tier 2 markdown regression guard). 7931 pass; lint + typecheck + build clean.

fix: Tier 2 direct response renders HTML topics into structured markdown

Post-HTML-migration, every QueryLogEntry on disk reports `format: "markdown"` regardless of whether the recall touched HTML or markdown topics. Telemetry consumers (query-log summaries, analytics dashboards, the brv-bench harness) cannot distinguish HTML vs markdown traffic — every chart flat-lines as "all markdown" even on a fully migrated tree. Root cause: T5 (ENG-2741) introduced `MarkdownOnlyFormatDetector` as a deliberate stub with the explicit comment "swap when the format-detector task lands the real extension-aware detector." T3 (ENG-2739) shipped `format-detector.ts` for read-routing inside `search-knowledge-service` but did not swap the executor's binding. The stub stayed wired as the default in `QueryExecutor`. This change: - Adds `ExtensionAwareFormatDetector` — inspects each `matchedDoc.path` extension and returns `'html'` if any HTML doc was retrieved, `'markdown'` for legacy-only recalls, `undefined` for empty recalls (cache hit, OOD short-circuit, tier 4 LLM-only). - Single-`'html'` policy is deliberate: any HTML doc is the load-bearing signal for telemetry rollups in a partially-migrated tree. Reporting `'markdown'` for a mixed result would hide HTML traffic from cost / coverage dashboards. - Strips shared-source `[alias]:` prefixes before extension parsing so shared-source paths classify the same as local. - Defaults to `'markdown'` for extensionless paths (defensive). - Wires as the default `formatDetector` binding in `QueryExecutor`. - Keeps `MarkdownOnlyFormatDetector` for tests that pin pre-migration semantics; updates its docstring + comment to reflect the new role. Tests: +9 (extension-aware detector full coverage). 7940 pass; lint runs through the pre-existing submodule config issue, otherwise clean.

ByteRover-provider runs produce QueryLogEntry records with no `inputTokens` / `outputTokens` / `cachedInputTokens` / `cacheCreationTokens` even though Tier 3 calls take seconds of llmMs. Telemetry consumers cannot compute per-query cost. Root cause: T5's telemetry pipeline depends on `GenerateContentChunk.rawResponse` being set on the terminating chunk so `LoggingContentGenerator.generateContentStream()` can capture the last non-undefined `rawResponse` and feed it to `pickRawUsage()` (`.usage ?? .usageMetadata`). The AI-SDK content generator wires this on its `'finish-step'` case but `ByteRoverLlmHttpService.extractContentFromResponse()` (the simulated stream backing every `ByteRoverContentGenerator` call) yielded chunks without ever attaching the backend response. Result: `lastRawResponse` stays undefined, no `llmservice:usage` event fires, no tokens land on the log entry. Fix: attach the full `GenerateContentResponse` as `rawResponse` on the terminating chunk in both branches of `extractContentFromResponse` — empty-content and full-content paths. The `pickRawUsage` helper already handles the Gemini-shaped (`usageMetadata`) and Claude-shaped (`usage`) payloads the ByteRover backend returns through this code path, and the existing `extractUsage('google'|'anthropic')` mappers take it from there. Tests: +3 (terminating-chunk rawResponse coverage across content, empty-parts, and function-call termination cases). 7943 pass.

The companion prompt loaded into every curate-mode invocation (`curate-detail-preservation.yml`, auto-discovered by `companion-contributor.ts:discoverCompanionPrompts('curate')` against the `curate-*.yml` glob) still instructed the agent to "store in narrative.rules / narrative.highlights / narrative.diagrams / narrative.structure / narrative.dependencies / narrative.examples / rawConcept.flow / content.facts / snippets" — every one of which is a JSON-schema field on the pre-T3 curate-tool API that no longer exists. T3 (ENG-2739) dropped `useHtmlContextTree` and made HTML the sole emission format, but T2 (ENG-2738, the HTML-emission contract) only updated `curate.txt` and missed this companion file. Empirically observable on the proj/html-mem-conversion branch: a LoCoMo-bench curate against `conv_26/session_1` produced a `<bv-topic>` with one giant `<bv-rule>` element wrapping a markdown narrative — the agent saw "store in narrative.rules" and treated `<bv-rule>` as the closest match. Other sessions in the same run emitted clean typed-element shapes; the variance correlates with whether the legacy companion prompt dominated that turn's context. This change rewrites the file so every preservation rule maps to the correct `<bv-*>` element from the closed vocabulary (`bv-rule`, `bv-task`, `bv-fact`, `bv-decision`, `bv-bug`, `bv-fix`, `bv-flow`, `bv-diagram`, `bv-highlights`, `bv-structure`, `bv-dependencies`, `bv-examples`, `bv-pattern`). General preservation principles (completeness over conciseness, exact preservation, never summarize) are kept verbatim. Adds an explicit guard at the end of the prompt telling the agent NOT to use `narrative.*` / `rawConcept.*` / `content.*` JSON-schema field names so any residual model bias toward the pre-migration API surfaces is suppressed. Tests: no test fixture asserts on this file's contents (it's a runtime agent prompt, exercised via end-to-end curate runs). YAML still parses; full suite green at 7943.

Two items from the review pass on this branch: 1. `extractContentFromResponse` had three terminating-chunk paths; only two received `rawResponse` in the original fix. The third branch — `candidates` array missing or empty (line 193) — fires when Gemini blocks output via safety filter or Claude returns a refusal-shaped response. Both providers populate `usageMetadata` / `usage` on those responses, so dropping `rawResponse` here silently loses billable tokens on every safety-block / refusal turn. Trivial completion: forward `response` from this branch too, matching the other two. New test pins the contract. 2. JSDoc rewrite on `stripSharedAlias` to justify why the helper is retained. Today `path.extname` happens to handle `[alias]:rel/path.html` correctly (the colon lives in the dirname, not the basename), so the strip is defensive rather than load-bearing. The helper stays for two reasons noted in the docstring: keeping the detector independent of `getFormatForRead`'s internals (so a future switch to URL-style parsing doesn't silently break the alias case), and making the alias-stripping branch testable in isolation. Prevents a future reader from deleting it as apparent dead code. Tests: +1 (candidates-empty terminating-chunk rawResponse contract). 7944 pass.

fix: extension-aware format detector + byterover tokens + curate prompt

…aceholder First milestone of byterover-tool-mode: ByteRover stops being an agent for curate/query and becomes a tool the user's primary agent calls. The user's LLM (Claude Code, Cursor, etc.) supplies completions; byterover validates and persists. No provider config inside byterover. This commit ships the wire-protocol surface for the curate side. The orchestrator behind it is a placeholder — TKT 02 lands the real state machine; TKT 03 fills in real prompts; TKT 04 authors the SKILL.md that drives the loop from Claude Code. Wire contract (documented at docs/curate-protocol.md): Kickoff (env-gated): BRV_CURATE_TOOL_MODE=1 brv curate "<text>" --format json → {status: needs-llm-step, sessionId, step: generate-html, prompt} Continuation (--session implies tool mode): brv curate --session <id> --response "<text>" --format json → {status: done, filePath} | {status: failed, errors[]} Without the env var, brv curate runs today's legacy agent-driven path unchanged — fully additive. The env var is the temporary opt-in; TKT 02 replaces it with a BrvConfig field as part of the real orchestrator. Placeholder semantics: - Kickoff always returns one `needs-llm-step` with a stub prompt - Continuation always returns `done` on first valid response - State on disk at `.brv/sessions/curate-<id>/state.json` (TKT 02 moves into daemon task-session sandbox vars when search-service integration lands) - Empty response → `failed` with kind=empty-response (session kept live for retry); unknown sessionId → `failed` with kind=unknown-session (terminal) Tests: 12 placeholder unit tests covering kickoff, continuation success/failure paths, on-disk state lifecycle, and envelope-contract guarantees per the protocol doc. Manual round-trip smoke verified the CLI happy path end-to-end.

…-, doc Four items from the code review: - Helper API redesign: `assertNoRemovedFlags` (which threw a bare `Error`) is now `findRemovedFlagMessage` returning `string | undefined`. Caller drives the response — JSON envelope when `--format json` is set, `this.error(...)` otherwise. Yields the canonical oclif `› Error:` rendering instead of a raw stack-leaking throw. - JSON-mode envelope: both `brv curate --format json` and `brv query --format json` now emit a structured rejection envelope on a removed flag instead of dumping a plain-stderr line. Detected with a new `argvRequestsJsonFormat(argv)` helper that mirrors the recognised forms (`--format json`, `--format=json`). - `--` terminator: the scan now stops at `--`, matching POSIX convention. Same applies to the JSON-format detector. Lets users on permissive-parse `Query` pass a literal query that contains a hyphen-prefixed token without hitting a false-positive rejection (e.g. `brv query -- what does --timeout do`). - Stale SKILL.md `-f` / `--files` references: line 639 (the "File access" paragraph that described the removed --files flow) and lines 658–660 (error-table rows for `Maximum 5 files allowed` / `File type not supported`) replaced with a single row pointing agents at the new migration-error message shape. The other three doc files were already clean. Tests - 15 unit cases for the helper (was 8; +7 for `--` terminator, --flag=value, and argvRequestsJsonFormat). - Full suite: 8521 passing, 16 pending, 0 failing. - Typecheck clean. Lint 0 errors / 262 pre-existing warnings.

refactor: [ENG-2880] drop legacy flags from brv curate / brv query

github-actions · 2026-05-21T08:23:24Z

Claude finished @danhdoan's task in 3m 52s —— View job

Re-review of PR #663 after the ENG-2880 batch (PR #687)

Two PR-specific commits since 4e84552 (c3f17aa → c4c4a52, merged via #687), plus a main-sync merge (0b98569) that pulls in unrelated v3.14.x work. The ENG-2880 pair is exactly the carry-over UX nit that has been flagged across seven prior reviews on this PR — brv curate declared --detach/-f/-d/--timeout but never consumed them, plus the --timeout description disagreement between brv query and brv dream. Both are now resolved cleanly with a shared removed-flags.ts helper and a clear migration message instead of oclif's default Nonexistent flag rendering.

Score: 4.85 / 5 — net improvement from 4.75/5. The longest-standing live finding on this PR is now closed; only cosmetic / defer-defensible items remain. Ready to merge.

What changed since `4e84552`

Commit	Scope
`c3f17aa`	ENG-2880 — drops legacy `-d/-f/--detach/--timeout` from `brv curate` and `--timeout` from `brv query`. Adds shared `src/oclif/lib/removed-flags.ts` (`findRemovedFlagMessage` + `argvRequestsJsonFormat` + `CURATE_REMOVED_FLAGS` + `QUERY_REMOVED_FLAGS`). Updates `SKILL.md`, `brv-instructions.md`, `workflow.md`, `command-reference.md` to match.
`c4c4a52`	ENG-2880 review fix — JSON envelope on rejection (was a bare `throw`), `this.error(...)` instead of stack-leaking `throw new Error()`, `--` terminator honored, scrubs `SKILL.md` lines 639 + 658-660 of stale `-f` / `--files` references. Bumps the helper's test count from 8 → 15.

Strengths

Helper-shape is well-chosen. findRemovedFlagMessage returning string | undefined lets the caller drive the response — curate uses emitToolModeEnvelope (structured {errors: [{kind: 'removed-flag', message}], ok: false, status: 'failed'}), query uses writeJsonResponse with its existing {error, status} shape — and either falls back to this.error(message, {exit: 1}) for the human path. Much better than a bare throw new Error(...) which would have stack-leaked through oclif's parser into --format json callers' stdout.
-- terminator + --flag=value form. Line 40 stops scanning at the standard POSIX terminator, which fixes the brv query -- what does --timeout do false-positive on permissive-parse (relevant because Query is strict = false). Line 42 matches the = form via startsWith('${f}='), and the = separator naturally prevents --folder-extra prefix collisions.
Symmetric scan-then-parse. Both commands run the scan before await this.parse(...), so the migration message surfaces under strict and permissive parse modes alike. Curate (strict) wouldn't have parsed the flag anyway; query (permissive) would have silently swallowed it as positional — both now reject coherently.
Doc cleanup is complete. SKILL.md lines 639 + 658-660 (the "File access" paragraph and three error-table rows for Maximum 5 files allowed / File does not exist / File type not supported) are gone, replaced with a single migration-error row. brv-instructions.md / command-reference.md / workflow.md likewise scrubbed. The agent-side description no longer instructs agents to use removed flags.
Test design. 15 unit cases cover undefined on no-match, long-form match, short-alias match, --flag=value form, first-match short-circuit, ---terminator (both "stops scanning" and "still matches BEFORE the terminator"), and the new argvRequestsJsonFormat discriminator. The "matches BEFORE --" assertion is the load-bearing case — without it, a user typing brv curate --detach -- "ok" could slip a removed flag through.

Issues

1. nit (new, stale docstring) — `useHtmlContextTree` reference in `CurateLogEntry.format` doc

src/server/core/domain/entities/curate-log-entry.ts:57-61 still says:

Format mode of the curate output. `'html'` when `useHtmlContextTree` is
on, else `'markdown'`. Settled at task start, not derived from output.

But the PR description explicitly lists useHtmlContextTree as a removed flag — HTML is now the only curate output format. The 'markdown' literal is only retained for back-compat with log entries persisted before the format migration. Also a stray *. on line 60. One-block JSDoc rewrite (suggestion posted inline). Non-blocking.

2. observation (carry-over, non-blocking) — error messages still reference removed `brv providers connect/switch` commands

Confirmed still standing across:

src/oclif/lib/daemon-client.ts:36, 38, 41, 181-182, 274-277
src/oclif/commands/dream.ts:144
src/tui/utils/error-messages.ts:17, 18, 21
src/server/infra/daemon/task-validation.ts:26, 36

brv dream is still the live reachable path. Deferred across multiple reviews; the team's pattern has been post-merge polish. Now that ENG-2880 has knocked down the longest carry-over, this is the next-oldest live item. Fix this →

3. observation (carry-over, non-blocking) — `<existing-topic>` block still inlines system-managed timestamps verbatim

src/server/core/domain/render/curate-prompt-builder.ts:119-123 unchanged. OUTPUT_CONTRACT (line 168) tells the model not to emit createdat/updatedat, but the inlined existing-topic visibly shows them. With tool-mode-dream MERGE candidates now exercising the UPDATE → correction-loop path under meta.previousSummary, worth a one-regex strip pass before the prompt builder shape settles permanently.

4. observation (carry-over, non-blocking) — `topic-loader.ts` regex parser inconsistent with `index-generator.ts` parse5

src/server/infra/dream/tool-mode/topic-loader.ts:43, 110-155 still uses BV_TOPIC_RE + parseAttributes, while src/server/infra/context-tree/index-generator.ts:208-222 uses parse5-backed parseHtml / walkElements for the same task. Entity-decoded title attributes diverge (& in the regex path stays literal). Latent — topics are writer-produced today — but worth aligning.

5. observation (carry-over, non-blocking) — CLI `ConsoleLogger` emits chalk on stderr under `--format json`

src/oclif/lib/curate-session.ts:380 (sidecar-bump) and :423 (index regen) still construct new ConsoleLogger() unconditionally. dream.ts:33 handles this with format === 'json' ? new NoOpLogger() : new ConsoleLogger(). Cheap to plumb format through ContinueOptions.

Praise

The two-commit shape. c3f17aa does the surface removal + helper; c4c4a52 does the four polish items from the synchronous code review (JSON envelope, this.error, -- terminator, doc residue). The split keeps each diff focused and reviewable — no "address review feedback" mega-commit. Even the migration sentences read well at human scale: "Pack the folder content into the <bv-topic> HTML directly before calling brv curate" tells the agent what to do, not just what's removed.
Tests track the helper's invariants, not its implementation. Cases assert on -- short-circuit behavior, --flag=value matching, and first-match priority — the user-observable contract — rather than internal control flow. Future contributors can refactor the loop without breaking the test suite if the semantics hold.
Catch on the -- terminator. The original commit (c3f17aa) didn't include the -- terminator handling; it was added in the review-fix commit (c4c4a52) to handle brv query -- what does --timeout do. Permissive-parse on Query makes this a real ergonomic concern, and the fix is one line in the helper plus one line in the format detector.
Doc/code parity. When the agent-side description still mentioned -f / --files, an LLM would have happily emitted those flags into the next brv curate call and gotten the new migration error — a confusing loop. Scrubbing both the command and the SKILL.md it ships closes that gap end-to-end.

Test plan suggestions

Optional: integration test that brv curate "ctx" --detach --format json emits the structured rejection envelope on stdout and exits non-zero. Would also lock that success: false survives the writeJsonResponse wrapper.
Optional: brv query -- --timeout 600 (positional content containing a removed flag token) succeeds — the -- terminator path needs an end-to-end smoke test, not just the unit-level helper test.

Final disposition

LGTM. ENG-2880 is high-quality follow-up — clean helper shape, complete doc parity, 15-case test coverage with the load-bearing -- and --flag=value cases explicit. Carry-over items are all known and have consistently been polished post-merge.

Todo

Identify what changed since 4e84552
Spot-check the ENG-2880 surface removal + helper + tests
Verify outstanding carry-over items from prior reviews
Post inline comments + final summary with score
· Branch: proj/byterover-tool-mode

github-actions · 2026-05-21T08:26:18Z

+export function findRemovedFlagMessage(argv: string[], removed: RemovedFlag[]): string | undefined {
+  for (const token of argv) {
+    if (token === '--') return undefined
+    for (const {flags, migration} of removed) {
+      const matched = flags.find((f) => token === f || token.startsWith(`${f}=`))
+      if (matched) return formatRejection(matched, migration)
+    }
+  }
+
+  return undefined
+}


praise: Closes the longest-standing finding from this PR (carry-over across 7 prior reviews). The helper-shape is well-chosen:

Returns string | undefined so the caller owns the response shape (JSON envelope vs this.error(...)) — much better than the original throw new Error() shape, which would have stack-leaked through oclif and broken --format json parsers.

-- terminator is honored (line 40), which fixes the brv query -- what does --timeout do false-positive case for permissive-parse.

--flag=value form is matched correctly via startsWith('${f}='), and the = separator naturally prevents --folder-extra prefix collisions.

The error message names the literal token the user typed AND surfaces a migration sentence, not a generic "removed" line.

The 15-case test file (including the ---terminator cases at lines 45-51 and 71-73) locks the contract. Nice end-to-end execution on a finicky bit of CLI ergonomics.

github-actions · 2026-05-21T08:26:27Z

+  /**
+   * Format mode of the curate output. `'html'` when `useHtmlContextTree` is
+   * on, else `'markdown'`. Settled at task start, not derived from output.
+   *.
+   */
+  format?: 'html' | 'markdown'


nit (stale doc): The docstring says 'html' when useHtmlContextTree is on, else 'markdown', but the PR description explicitly removes the useHtmlContextTree flag — HTML is now the only curate output format. Two consequences:

The condition is no longer expressible: there's no flag to check. New writes are always 'html'.

The 'markdown' branch is only set for the back-compat read path (markdown topics still parse via MarkdownWriter), which is the reader concern — but this field tags curate outputs.

If format can still legitimately be 'markdown' (e.g. legacy operations replayed from older log entries), say so explicitly. Otherwise, simplify to 'html' literal and drop the optional. Either way, the useHtmlContextTree reference here will mislead the next contributor doing log-entry archaeology.

Suggested change

/**

* Format mode of the curate output. `'html'` when `useHtmlContextTree` is

* on, else `'markdown'`. Settled at task start, not derived from output.

*.

*/

format?: 'html' | 'markdown'

/**

* Format of the curate output. Tool-mode curate always writes HTML

* (`<bv-topic>`); the `'markdown'` literal is retained for back-compat

* with log entries persisted before the format migration. Settled at

* task start, not derived from output.

*/

format?: 'html' | 'markdown'

Also: stray *. on line 60 inside the docblock — minor JSDoc lint.

Tool-mode subcommands (`brv dream scan / finalize / undo / sessions / cancel`, shipped in ENG-2858) cover the same workflow deterministically — daemon enumerates candidates, agent decides via `brv curate`, daemon archives via `finalize`. No provider required. With tool-mode as the project default the legacy LLM-driven entry surface is orphaned, so remove it cleanly. `brv dream` (no subcommand) is now a thin topic root that prints a migration hint. `--force`, `--timeout`, `--undo` on the topic root are gone; users hit `brv dream undo` instead. The `buildUndoDeps` export stays here since the `undo` subcommand imports it. Removed - `src/oclif/commands/dream.ts` LLM-driven `run()` body (was a 200-line block dispatching dream tasks through the daemon). Topic root keeps the `buildUndoDeps` helper for the `undo` subcommand. - `case 'dream'` in `agent-process.ts` — daemon-side LLM dispatch. - `AgentIdleTimeoutPolicy.onAgentIdle` dream-eligibility branch and `preDispatchCheck` for `task.type === 'dream'` in `brv-server.ts`. The auto-dispatch had no provider-config guard, so on tool-mode projects it would silently try to dispatch and fail — removal closes that latent issue. Idle agents are now just cleaned up. - `src/server/infra/executor/dream-executor.ts` + its test. - `src/server/infra/dream/dream-trigger.ts` + its test. - `src/server/infra/dream/operations/{consolidate,synthesize,prune}.ts` + their tests. Tool-mode subcommands use `src/server/infra/dream/ tool-mode/*` instead. - `src/server/infra/dream/parse-dream-response.ts` + `dream-response-schemas.ts` + their tests — no consumer after the operations files are gone. - `src/oclif/lib/timeout-deprecation.ts` + its test — only consumer was the legacy dream command. Trimmed - `test/commands/dream.test.ts` rewritten as a 1-case smoke that asserts the new topic-root behaviour (prints the subcommand listing). - `test/unit/infra/runtime-signals/sidecar-failure-logging.test.ts` dropped the `dream operations` describe block (consolidate / prune sidecar warnings). Other describe blocks (curate-tool, archive service, manifest, search, commit-6 wiring) keep their coverage. Kept - `src/server/infra/dream/dream-undo.ts`, `dream-state-service.ts`, `dream-lock-service.ts`, `dream-log-store.ts` — LLM-agnostic services reused by the tool-mode pipeline. - `src/server/infra/dream/tool-mode/*.ts` — the subcommand pipeline. - `src/oclif/commands/dream/{scan,finalize,sessions,cancel,undo}.ts`. Verified - npm run typecheck: clean - npm run lint: 0 errors (256 pre-existing warnings) - npm test: 8337 passing, 16 pending, 0 failing (was 8521 — net -184 from the dead test deletions; matches the deleted file inventory). Migration - `brv dream` (no subcommand) now prints "Use a subcommand: brv dream {scan|finalize|undo|sessions|cancel}" and exits 0. - `brv dream --force` / `--timeout` / `--undo` removed. Use the subcommands. - Background dream is no longer daemon-scheduled; the agent decides when to run `brv dream scan` (e.g. post-curate burst, on session start). The v2 model is agent-driven cadence.

…lose schema/post-work straggler gates Two should-fix items + comment cleanup from the reviewer: 1. DREAM_REMOVED_FLAGS was exported and unit-tested but never consumed. `brv dream --timeout 30` printed the topic-root hint instead of the migration message that names --timeout. Wired `findRemovedFlagMessage(this.argv, DREAM_REMOVED_FLAGS)` into the topic root with the same JSON-envelope / this.error branching the curate and query commands use. 2. Dropped 'dream' from the transport schemas + post-work gate so any straggler `type: 'dream'` task that still hits the daemon (older client, stale queue, internal stray producer) fails at the schema boundary instead of silently completing with result = ''. - TaskTypeSchema + TaskExecuteSchema type enums: 'dream' removed. - TaskExecuteSchema.force field: removed (dream was the only consumer). - TaskExecuteSchema.trigger enum: 'agent-idle' removed (idle-trigger dispatch is gone). 'cli' and 'manual' remain. - TaskCreateRequestSchema.force field: removed. - agent-process.ts post-work gate: dropped `type === 'dream'`. - task-router.ts: dropped the orphan `data.force` plumbing. Cosmetic - task-router.test.ts: swapped the `type: 'dream'` fixtures (used to exercise the generic preDispatchCheck plumbing) to `type: 'curate'`. The tests stay — they cover the gate mechanism, not dream-specific logic. - review.ts description: dropped the stale `brv dream needsReview` bullet. Tool-mode dream-finalize writes `needsReview: false` unconditionally; the toggle has no dream-side effect to describe. - Cleared stale `dream-executor` comment refs in curate-session.ts, dream-state-service.ts, agent-process.ts, and two test files. - Topic-root exit code: kept exit 0 (matches oclif topic-root default) with a clarifying code comment noting the deliberate choice. Kept - `'agent-idle'` in dream-log-schema.ts — backwards-read compat for existing dream-log entries on disk. No new entries will carry that trigger; existing files still parse. Verified - npm run typecheck clean - npm run lint 0 errors (256 pre-existing warnings) - npm test: 8337 passing, 16 pending, 0 failing

Three FE-breaking output classes addressed by prompt updates across the 7 LLM-visible curate surfaces (SKILL.md, curate-prompt-builder, MCP tool description, registry, curate.txt, system-prompt.yml, curate-detail-preservation.yml): - <li> content must be plain text — no leading -, *, •, or 1./2. markers (the renderer adds them via CSS, so a literal prefix double-bullets) - <bv-diagram> body goes directly inside the element with HTML entities for <, >, & — no <![CDATA[…]]> wrap (HTML5 parses CDATA as a bogus comment and the first --> in mermaid closes it, corrupting the source) - `related` distinguishes files from folders by suffix: file targets end in .html, folder/domain targets stay bare. Replaces the prior "no .html extension" guidance in SKILL.md. Also adds a read-only related-ref warner that resolves each ref against .brv/context-tree/ and surfaces warnings for broken (neither file nor folder exists) and ambiguous (both exist) refs. Surfaced via a new optional `warnings` field on the curate done envelope (both CLI tool-mode and daemon curate-html-direct paths). The warner never mutates the attribute and never rejects writes — refs stay advisory so legitimate forward references (target curated later) still ship. Fourth issue from the ticket (<bv-index> description attribute) is deferred to a follow-up because index.html is server-generated and needs an end-to-end config + capture UX, not a prompt or content validator change.

…lope, race-guard stat calls Two real issues from review: - MCP curate path was dropping warnings. CurateHtmlDirectResult gained an optional `warnings?: readonly string[]` field and renderEnvelope now appends ⚠ lines under the ✓ confirmation when present, matching the CLI text-mode + JSON envelope behavior. - The warner used existsSync + statSync, which is race-prone — a target deleted between the two calls would throw and bubble out of writeHtmlTopic, turning a successful curate into a reported failure even though the topic is already on disk. Switched to a single swallow-on-error statSync helper (safeStatIsFile / safeStatIsDir). Both return false on any error, so a transient FS error surfaces the ref as "broken" rather than failing the post-write check. Added a test that chmods a parent dir to 0 to exercise the EACCES path. The reviewer's "lower priority" suggestion to also plumb warnings into the legacy curate-executor.ts in-daemon path is deferred — that surface is being phased out and the fix would expand CurationStatus shape. Other reviewer notes filed as follow-ups (not addressed here): - lstatSync vs statSync for symlink handling (today the comment overstates the symlink guarantee; the resolved-path check only validates the input string, not the link target) - case-insensitive FS edge case on macOS/Windows - skip ambiguity warning when ref carries an explicit .html suffix - .md legacy-extension strip for the dwindling MD topics

The previously-warned "both <ref>.html and <ref>/ exist" case is not actually a failure mode — the suffix the agent wrote IS the choice (bare → folder, .html → file per the convention) and the FE routes by suffix. The other on-disk shape is irrelevant. Removing the case and all references to it: - checkRef now reduces to a single broken check: warn only when neither <ref>.html nor <ref>/ exists. - HtmlWriteSuccess + CurateSessionEnvelope JSDocs dropped the "broken / ambiguous" mention. - Removed the ambiguous-ref test block from related-ref-warner.test.ts (and the both-exist no-warning block that I had repurposed mid-cycle). The remaining tests cover the actual behavior cleanly. - Updated the MCP renderer multi-warning test to use two broken-ref fixtures instead of broken + ambiguous. Also addressed two PR #690 review nits in the same pass: - Stale comment in related-ref-warner.ts:82-87 still referenced the old existsSync + statSync TOCTOU shape; rewrote to describe what the current safeStat helpers actually do. - Added MCP rendering tests for the new ⚠ branch (warnings present, warnings empty, warnings field absent — defends against pre-commit daemon payloads). Mirrors the CLI-side coverage added previously. Two review nits deferred and resolved as follow-ups: - chmodSync(…, 0) test fragility under root uid (Docker node:* images). - WebUI Tasks panel does not yet decode/render the new warnings field.

…nits The warner previously passed if EITHER `<ref>.html` or `<ref>/` existed. That left a symmetric false-negative class to the dropped ambiguous case: a `.html` ref pointing at a folder (or a bare ref pointing at a file) passed silently even though the FE routes by suffix and would show a dead pill. Tightened to probe only the form the agent's suffix picked: - `.html` suffix → check `<ref>.html` (file form) only - bare → check `<ref>/` (folder form) only A shape mismatch on disk is no longer ignored — it now surfaces as a broken-ref warning, with a coaching hint on the bare-form case ("add .html if you meant a file") since "agent forgot the .html suffix" is the typical mistake. Updated the JSDoc, tightened the broken-ref messages to name the missing shape ("no file at …" vs "no folder at …"), and replaced the two now-impossible "bare ref matches file" / "`.html` ref matches folder" no-warning tests with their inverses asserting the new warnings. Same commit also addresses two cosmetic review nits: - `renderEnvelope` docblock in brv-curate-tool.ts:266-275 updated to describe the head + ⚠ lines shape (was "a single confirmation line"). - Stale sequencing comment in html-writer.test.ts:392 ("before the disk write") corrected to "after the atomic write".

fix: add HTML output contract rules and related-ref warner

github-actions · 2026-05-21T10:29:14Z

Claude finished @danhdoan's task in 5m 40s —— View job

Re-review of PR #663 after the ENG-2883 batch (PR #690)

Four PR-specific commits since c4c4a52 (8d5ded0 → f6ec3c2 → ffe6856 → 5887e2a, merged via #690). The ENG-2883 batch tackles three FE-breaking LLM-output classes via prompt updates across seven LLM-visible surfaces, plus a read-only related-ref warner that surfaces broken related= attribute targets without rejecting writes. The "ship → review → tighten" arc across the four commits is unusually mature: PR #690's review caught a symmetric false-negative (probing both shapes silently accepted dead-pill scenarios), and 5887e2a flips the probe to "only the suffix the agent chose" with the corresponding inverse tests.

Score: 4.85 / 5 — held from prior review. Quality of the new code is high; remaining items are all known carry-overs that have been deferred across multiple reviews. Ready to merge.

What changed since `c4c4a52`

Commit	Scope
`8d5ded0`	ENG-2883 — adds 3 HTML output contract rules (`<li>` plain-text, `<bv-diagram>` body-direct + entity-escape, `related` file/folder suffix convention) across 7 LLM surfaces (`SKILL.md`, `curate-prompt-builder.ts`, `brv-curate-tool.ts`, `registry.ts`, `curate.txt`, `system-prompt.yml`, `curate-detail-preservation.yml`). Adds `related-ref-warner.ts` + wiring through `HtmlWriteSuccess.warnings`.
`f6ec3c2`	ENG-2883 review fix — surfaces `warnings` in the MCP envelope (`CurateHtmlDirectResult.warnings?`), `renderEnvelope` appends `⚠ <text>` lines, swaps `existsSync+statSync` for race-resilient `safeStatIsFile / safeStatIsDir` helpers (test exercises EACCES via `chmod 0` on the parent dir).
`ffe6856`	ENG-2883 review fix — drops the "ambiguous (both exist) refs" warning case (the suffix IS the choice; the other shape is irrelevant); rewrites stale TOCTOU comment; adds MCP rendering tests for the three wire shapes (`warnings: [...]`, `warnings: []`, `warnings` absent — defends against pre-PR daemon payloads).
`5887e2a`	ENG-2883 review fix — tightens to probe ONLY the chosen shape; replaces the now-impossible "bare matches file" / "`.html` matches folder" no-warn tests with their inverses (suffix-mismatch IS a warning now). Adds coaching hint on the bare-form case ("add `.html` if you meant a file") since "agent forgot the `.html` suffix" is the typical mistake.

Strengths

The four-commit arc is a textbook "ship → review → tighten" sequence. Each commit's body explains WHAT changed, WHY (with the specific failure mode it prevents), and what was DEFERRED (with reasoning). 5887e2a in particular catches a symmetric false-negative that the original commit had — probing both forms silently accepted dead-pill scenarios where the agent's suffix and the on-disk shape disagreed. Replacing the now-impossible no-warn tests with their inverses (the new behavior) is the right shape for a "we changed the contract" follow-up.
Race-resilience is documented in source. safeStatIsFile / safeStatIsDir (related-ref-warner.ts:103-117) swallow ALL errors as "not present" with an explicit comment at l.82-89 explaining why: the warner runs post-write and must never turn a successful curate into a reported failure. The chmod-0 test (related-ref-warner.test.ts:146-167) exercises the EACCES path concretely, not just the ENOENT happy path.
Test coverage spans all three layers. The warner unit tests (16 cases) cover happy paths, broken refs, suffix-mismatch, race-resilience, and path-traversal. The writer test (html-writer.test.ts:391-427) confirms warnings flow through HtmlWriteSuccess. The CLI side (curate-session.test.ts:168-198) and MCP side (brv-curate-tool.test.ts:434-525) both lock the wire shape (present, empty, absent — including the "older daemon payload" defensive case at line 491).
3 output-contract rules across 7 surfaces — and they actually agree. <li> plain-text rule appears verbatim in 7 places. <bv-diagram> no-CDATA + entity-escape rule appears in 7 places with the same --> example. The related file-vs-folder suffix convention is consistent across SKILL.md, the MCP description, the prompt builder, the system prompt, and curate.txt. The diagram preservation note in curate-detail-preservation.yml:47-57 even includes a worked mermaid example with corrected entities. Previously this kind of multi-surface change had drifted between SKILL.md and the prompt builder; this batch lands them in sync.
Honest deferral. The PR fix: add HTML output contract rules and related-ref warner #690 commit messages explicitly list what wasn't addressed and why: legacy curate-executor.ts warnings plumbing (the surface is being phased out, would expand CurationStatus), WebUI Tasks panel doesn't decode the new warnings field yet, symlink resolution (today's comment overstates the guarantee), case-insensitive FS edge case, chmod 0 test fragility under root uid (Docker node:* images). This is the right level of transparency — the operator running real-corpus tests in two weeks won't be confused about whether something is a known gap vs. a real bug.

Issues

1. suggestion — `.htm` suffix inconsistency with the file-reader

Posted inline at related-ref-warner.ts:52. wantsFile = stripped.endsWith('.html') is case-sensitive and only matches .html — but file-context-file-reader.ts:95-98 matches both .html AND .htm (the May-15 fix), and format-detector.ts:24 treats both as HTML. So a topic that exists on disk as foo/bar.htm (migrated, manual moves) is readable as a topic but unreachable from a related="@foo/bar.htm" ref — the warner probes for a folder and surfaces it as broken.

2. observation (UX, batch curate) — forward refs surface as warnings on every first-touch

Posted inline at related-ref-warner.ts:17-19. The header comment correctly says "a forward reference to a topic about to be curated is legit" — and the warner stays advisory rather than rejecting. But the ⚠ still SURFACES on every such ref. For tool-mode dream link/merge candidate flows (paired cross-refs across two topics by design), this is the dominant case. Worth a thought once you have telemetry on how often warnings fire on real-corpus dream runs.

3. observation (security, defer-defensible) — symlink defense-in-depth at `related-ref-warner.ts:74-80`

Posted inline. The comment says symlinks "could theoretically escape; reject any resolution that does" but path.resolve() operates on the string only — it does NOT follow symlinks. The follow-up lstatSync vs statSync swap is already on the list per f6ec3c2. The actual security impact is minimal here (the warner only returns booleans, no content read).

4. issue (carry-over) — error messages still reference removed `brv providers connect/switch` commands

Confirmed still standing across:

src/oclif/lib/daemon-client.ts:36, 38, 41, 181-182, 274-277
src/oclif/commands/dream.ts:144
src/tui/utils/error-messages.ts:17, 18, 21
src/server/infra/daemon/task-validation.ts:26, 36

brv dream is still the live reachable path. Now the longest-standing carry-over since ENG-2880 closed the curate-flags one. Fix this →

5. observation (carry-overs, non-blocking)

<existing-topic> block in curate-prompt-builder.ts:119-123 still inlines system-managed timestamps verbatim. Heavier traffic now with tool-mode-dream MERGE candidates exercising the UPDATE path under meta.previousSummary. One-regex strip pass before the prompt builder shape settles permanently.
topic-loader.ts regex parser still inconsistent with index-generator.ts parse5. Latent — topics are writer-produced today — but worth aligning.
CLI ConsoleLogger chalk-on-stderr under --format json (curate-session.ts:380, 423). dream.ts:33 solves this with format === 'json' ? new NoOpLogger() : new ConsoleLogger().

Praise

Posted inline at related-ref-warner.ts:86-96 — tightening to probe only the suffix the agent chose is the right call, and replacing the "ambiguous" warning with asymmetric-mismatch coverage is a clean net improvement.
Coaching hint design. The "bare ref + only file present" warning at l.96 includes add ".html" if you meant a file — that's the specific text an agent encountering the typical "I forgot the suffix" mistake needs to see. Cheap to write, high-leverage on the agent's correction loop. The asymmetry (only the bare-form gets the hint, not the file-form) reflects empirical understanding of which mistake is more common.
MCP envelope render-test triad. brv-curate-tool.test.ts:434-525 covers three wire shapes (warnings: [...], warnings: [], warnings absent) — the third case is the load-bearing one for backward compatibility with pre-PR daemon payloads. Most teams would have written just the happy path.
Diagram preservation prompt cross-cut. Updating curate-detail-preservation.yml:47-57 to use --> in the mermaid example (instead of leaving the prior --> arrows that would now trip the renderer) shows the kind of multi-file consistency discipline that prevents prompt/contract drift. The corrected example doubles as inline documentation of the rule.

Test plan suggestions

A test for the .htm ref case (issue 1) — either assert the current "broken" warning text mentions the suffix mismatch, or add .htm to the wantsFile set and assert clean resolution against a seeded foo/bar.htm file.
A test for the symlink-inside-contextTreeRoot case (issue 3) — even if you defer the lstatSync swap, a regression test that locks the current behavior (symlink-to-outside-folder probed via stat follows the link and returns isDirectory()=true → no warning) at least documents what we have today.

Final disposition

LGTM. ENG-2883 is high-quality follow-up work — multi-surface prompt-contract alignment that previously had drifted, a well-tested read-only warner with race-resilient stat probes, and a textbook iterative review-tightening sequence across four commits. The remaining carry-overs are all known, well-documented, and have consistently been polished post-merge.

Todo

Identify what changed since c4c4a52 (8d5ded0 → f6ec3c2 → ffe6856 → 5887e2a)
Spot-check the ENG-2883 warner + envelope + prompt-contract updates
Verify outstanding carry-over items from prior reviews
Post inline comments and final summary with score
· Branch: proj/byterover-tool-mode

github-actions · 2026-05-21T10:33:10Z

+  // target is reported as broken rather than thrown. Probe only the
+  // shape the agent chose: a `.html` ref against a folder (or a bare
+  // ref against a file) is a dead pill the FE cannot route, so it
+  // must surface even when the "other" shape happens to exist.
+  if (wantsFile) {
+    if (safeStatIsFile(candidate)) return undefined
+    return `related ref "${originalRef}" was not found — no file at "${relative}.html" under the context tree`
+  }
+
+  if (safeStatIsDir(candidate)) return undefined
+  return `related ref "${originalRef}" was not found — no folder at "${relative}/" under the context tree (bare refs target folders; add ".html" if you meant a file)`


praise: Tightening this to probe only the suffix the agent chose is the right call — and the commit message captures the rationale precisely: the suffix IS the decision, and probing both forms would silently accept a dead-pill (a .html ref + only-folder-on-disk would FE-404 either way). The pair with the .html "coaching hint" on bare-vs-file mismatch (l.96) is a nice ergonomic touch — that's the typical "agent forgot the .html" mistake.

The test in related-ref-warner.test.ts:124-142 ("warns when an explicit .html ref points at a folder instead of a file" / "warns when a bare ref points at a file instead of a folder") locks the contract from both directions. Replacing the previous "ambiguous" warning with this asymmetric-mismatch coverage is a clear net improvement.

The post-write race-resilient stat path (l.103-117) is also well-considered: a transient FS error surfacing the ref as "broken" rather than throwing out of writeHtmlTopic is the right priority order — the topic is already on disk, the warner's job is advisory only.

github-actions · 2026-05-21T10:33:23Z

+  const wantsFile = stripped.endsWith('.html')
+  const withoutExt = wantsFile ? stripped.slice(0, -'.html'.length) : stripped


suggestion (consistency with the file-reader): endsWith('.html') here is case-sensitive and only matches the .html suffix — but src/server/infra/context-tree/file-context-file-reader.ts:95-98 matches both .html AND .htm, and format-detector.ts:24 treats both as HTML. So a topic that exists on disk as foo/bar.htm (migrated content, older OS conventions, manual moves) is readable as a topic but unreachable from a related="@foo/bar.htm" ref — the warner here treats .htm as a bare ref, probes for a folder, and flags it as broken even though the file exists.

Three options ranked by surface area:

Cheapest: broaden the suffix test:

const lower = stripped.toLowerCase() const wantsFile = lower.endsWith('.html') || lower.endsWith('.htm') const withoutExt = wantsFile ? stripped.slice(0, -(lower.endsWith('.html') ? 5 : 4)) : stripped

This matches the file-reader's surface.

Cleaner: add a small parseRefSuffix(stripped) → {wantsFile, withoutExt} helper that's shared between the warner and any future ref consumer (FE router, BM25 indexer).

Or just document that .htm refs are explicitly unsupported and surface a dedicated unsupported-extension warning.

Latent — topics today are writer-produced as .html — but this is a small wedge between two sibling code paths that the file-reader already had aligned. Case-sensitivity is also listed in the deferred follow-ups (f6ec3c2 commit body), and a single .toLowerCase() on stripped closes that nit at the same time.

github-actions · 2026-05-21T10:33:40Z

+ * The warner never mutates the attribute and never rejects the write
+ * — refs are advisory metadata, and a "forward reference" to a topic
+ * about to be curated is legit.


observation (UX, batch curate): The header note says "a forward reference to a topic about to be curated is legit" — and the warner correctly stays advisory rather than rejecting. But the warning still SURFACES on every such ref, which means an agent doing batch curation (curate A → curate B → curate A again to add related="@b"`` is the natural pattern, but curate A with related="@B" first → curate B produces one ⚠ on A's write).

For tool-mode dream's link/merge candidate flows (which by design produce paired cross-refs across two topics) this is the dominant case. The agent gets a ⚠ on the first write, then no ⚠ on the second — but the user-facing log line will read like a real problem.

Two thoughts (defer-defensible — not blocking):

Two-pass curate in the orchestrator. Tool-mode dream finalize could write both halves of a link/merge pair, then re-check refs once both are on disk and drop warnings that resolved. Adds complexity; only pays for itself if forward-ref noise becomes an observed pain.

Cheap-now signal in the warning text. When the resolved ref's domain folder DOES exist but the target file/folder doesn't, hint at "forward reference?" alongside the existing "no file at …" / "no folder at …" message. Tells the agent (and reviewer) "this is the routine batch case", not "this is a typo".

Worth a thought once you have telemetry on how often warnings fire on real-corpus dream runs. The asymmetric-mismatch coaching at l.96 already shows good intuition for "shape of the typical mistake"; this is the same principle applied to the other common case.

github-actions · 2026-05-21T10:33:54Z

+  // Defence in depth: even after segment filtering, the resolved path
+  // must remain inside the context-tree root. A symlink in `contextTreeRoot`
+  // could theoretically escape; reject any resolution that does.
+  const rootResolved = path.resolve(contextTreeRoot)
+  if (!isInsideRoot(candidate, rootResolved)) {
+    return `related ref "${originalRef}" resolves outside the context-tree root and was not checked`
+  }


observation (security, defer-defensible): The comment says "A symlink in contextTreeRoot could theoretically escape; reject any resolution that does." But path.resolve() does NOT follow symlinks — it operates on the string only. If <contextTreeRoot>/security is a symlink pointing at /etc, then path.resolve(contextTreeRoot, 'security/passwd') still returns <contextTreeRoot>/security/passwd (string-wise), isInsideRoot returns true, and the subsequent safeStatIs{File,Dir} would follow the symlink and probe /etc/passwd.

The current commit message (f6ec3c2) already flags this as a follow-up: "lstatSync vs statSync for symlink handling (today the comment overstates the symlink guarantee; the resolved-path check only validates the input string, not the link target)". The actual security impact is minimal here: the warner only does stat().isFile() / isDirectory() and returns a boolean — no content is read, no path is logged in detail beyond the warning string the agent is going to see anyway. So an attacker who controls contextTreeRoot symlinks could only learn "is /etc/passwd a file? yes." through warning suppression — not much of an exfiltration vector.

Still worth either:

updating the comment to be honest about the guarantee (the input-string check, not the link target), or

swapping statSync for lstatSync so symlinks are reported as not-the-expected-shape (a symlink isFile() = false, isDirectory() = false, isSymbolicLink() = true — both probes would surface the ref as broken, which is the correct user-facing result).

Cheap to fix at the same time as the case-insensitive FS deferral. Non-blocking.

…efactor/ENG-2884

…nt path normalization) The dream-scan output emits candidate paths with .html (e.g. auth/jwt.html) and the documented dream→curate merge workflow tells the agent to write the survivor at that path. The writer previously appended .html unconditionally, producing auth/jwt.html.html on disk while reporting status: done and bypassing the path-exists guard — silent data corruption during merge: the survivor's merged content landed in a stale file while the agent archived the loser thinking the merge had succeeded. topicPathToFilePath now strips a trailing .html before appending. Both path="x/y" (bare convention) and path="x/y.html" (the form dream-scan emits) collapse to the same on-disk file at x/y.html and trigger the path-exists guard identically. 4 new tests in html-writer.test.ts cover: .html-suffixed path writes to single-extension file; path-exists fires across both forms in both orderings; --overwrite leaves only one file.

… capture pre-archive mtime/signals so dream undo is bit-exact Three coupled changes that touch the search-service contract and the tool-mode dream finalize/undo path. Search service contract: - ISearchKnowledgeService gains refreshIndex(): invalidates the cached MiniSearch index so the next search() call rebuilds from disk regardless of TTL. Required because the service's 5s TTL fast-path would otherwise serve a cached index that pre-dates just-loaded topics, surfacing zero candidates on a fresh scan. - SearchKnowledgeOptions gains combineWith?: 'AND' | 'auto' | 'OR'. Default 'auto' preserves the historical AND-first-with-OR-fallback behavior. Internal SearchOptions references the public option's type to keep the two declarations from drifting if a new value is added. - bm25-pair-discovery passes combineWith: 'OR'. Multi-word distinctive titles (e.g. "Redis Caching Layer") AND-combine to terms only the source itself contains; OR-mode lets cross-pairs surface based on any-term overlap. Dream finalize / undo signal+mtime restore: - finalizeDreamSession captures each archived file's mtimeMs (via stat) and current RuntimeSignals (via runtimeSignalStore.get) before the rename, and returns them on the result alongside previousTexts. - DreamLogSchema PRUNE op gains optional previousSignals + previousMtimes (backward-compat — older logs still undo via fallback). - agent-process dream-finalize branch writes the new metadata into each PRUNE op. - DreamUndoDeps gains optional runtimeSignalStore. undoPrune restores mtime via utimes() and signals via runtimeSignalStore.set() when the PRUNE op carries the new fields; falls back gracefully when not. - buildUndoDeps threads runtimeSignalStore through. scanDreamCandidates now invokes searchService.refreshIndex() after loadToolModeTopics and before the parallel candidate-generator block, so the first scan after seed produces full results. Tests cover: refreshIndex called pre-discovery; combineWith: 'OR' passed to search; mtime+signal capture on finalize; mtime+signal restoration on undo; backward-compat fallback path; mock updates for the extended ISearchKnowledgeService in unrelated test files.

Both subcommands are intentional v1 placeholders — the daemon doesn't persist session state, so dream sessions always returns [] and dream cancel is a no-op. The previous surface advertised them as if they were functional: descriptions said 'List active...' / 'Discard a...', JSON envelopes returned success-looking shapes ({sessions: [], status: ok} / {status: cancelled}), and machine-readable consumers that branched on the response had no way to know the daemon was never consulted. This is a cheap-honesty fix, not a real persistence implementation. Real persistence is a separate ticket. Changes: - DreamSessions.description prefixed with '[v1 stub]' and now names the stateless behavior inline. - DreamCancel.description prefixed with '[v1 stub]' and now names the no-op behavior inline. - Both JSON envelopes gain a 'note' field disclosing v1 status so machine consumers see the caveat at the same key they'd branch on. - Topic-root listing in brv dream's run() body flags both subcommands as '[v1 stub]' in the one-liner hint. 4 tests in test/commands/dream.test.ts cover: each static description contains 'v1 stub'; each --format json envelope's data.note discloses stateless / no-op semantics.

…, worked example A skill-discovery sub-agent test surfaced gaps in §7 (brv dream): prose described the surface but no sample JSON envelopes existed, prune-signal thresholds were left to the agent to guess, and there was no end-to-end walkthrough — so consumers had to grep the source to learn field shapes. Additions to §7: - Sample dream scan --format json envelope (sessionId + candidates with one example each of link / merge / prune / synthesize shapes, including signals + daysSinceModified for prune). - Sample dream finalize response (archived, skipped, logId, status). - Prune signal thresholds inline: importance < 35 → low-importance, draft >60d / validated >120d → stale-mtime, maturity 'core' never surfaced. - 'Worked example — prune the stalest topic (no curate detour)': three-block scan → finalize → undo walkthrough. - Path-attribute warning in the merge bullet so agents stripping the scan-emitted .html before <bv-topic path=> see the guidance. - Updated undo description to reflect the symmetric mtime + signal restoration shipped in this PR — was 'mtime resets to current time'; is 'content, original mtime, and sidecar runtime signals... all restored'. Companion change in prune-candidates.ts: sync-hint comment near the threshold constants pointing at SKILL.md §7, so future threshold changes don't silently drift the agent-facing docs.

…566ac) Reviewer caught a real race in the refreshIndex() shipped in f5566ac. The orphan-builder problem: T0: caller A invokes search() — acquireIndex starts buildA; state.buildingPromise = buildA. The IIFE body is still awaiting fileSystem.listDirectory and the eventual buildFreshIndex. T1: scanDreamCandidates calls refreshIndex(). Old code cleared state.cachedIndex AND state.buildingPromise immediately and returned. T2: caller C invokes search() — sees both fields cleared, kicks off a NEW build (buildB). state.buildingPromise = buildB. T3: buildA finally completes and writes state.cachedIndex = freshIndexA (line 1016 of acquireIndex). T4: buildB completes and writes state.cachedIndex = freshIndexB. If buildA happens to settle AFTER buildB (e.g. slower disk read, larger source set), the older freshIndexA wins — reintroducing the exact TTL-stale-data scenario Fix #2 was meant to prevent. Realistic on a daemon with concurrent brv search / brv query / dream-scan tasks. Fix per reviewer's accepted shape: refreshIndex() now AWAITS any in-flight buildingPromise before clearing cached state. By the time the await returns, the in-flight build has either published its result (which is then cleared) or rejected (swallowed — callers don't care about that build's outcome, they just want a clean slate). No orphan publisher can write back after invalidation because the publishing point is now in the past relative to the clear. New test file search-knowledge-service-refresh.test.ts covers: - the contract under test (refreshIndex MUST NOT resolve while a build is in flight) - idempotent behavior on cold state (no in-flight build) - graceful clear when the in-flight build rejects Not a follow-up sub-agent regression replay: the race manifests only under concurrent operations on a single SearchKnowledgeService instance, which the 12-scenario test matrix (per-scenario daemons, serial workflow) does not exercise. The unit test at the contract boundary is the right pin.

Five items from the latest codex review batch on PR #689. - agent-process.ts: drop the dead-code `?? ''` fallback on previousTexts. archived[] paths are invariant-guaranteed to have all three fields (text + mtime + signals) populated together, so the fallback was already unreachable — and if it ever fired, the resulting `{[path]: undefined}` would violate the new PRUNE schema's z.record(z.string(), z.number()) for previousMtimes. All three lookups now agree: trust the invariant uniformly. - search-knowledge-service.ts: extend refreshIndex docstring with a maintainer note explaining that the explicit `state.buildingPromise = undefined` after the await is defensive (acquireIndex's finally block already nulls it). Also documents the benign double-build inefficiency when a fresh acquireIndex() races into the gap between await-resolve and clear — not a correctness issue, just non-obvious. - SKILL.md §7: soften the merge bullet's path-attribute warning. The prior "drop the trailing .html — the writer always appends it" instruction was load-bearing before the path-doubling fix shipped; now that the writer normalizes idempotently (both bare and .html-suffixed forms resolve to the same on-disk file), the instruction reduces to agent-facing cognitive load. Updated to "either form works; the writer normalizes" so the agent doesn't have to track which form scan emitted. - dream.test.ts: add two regression tests for the DREAM_REMOVED_FLAGS wire-up in dream.ts's run() body. (1) text mode: --timeout 30 rejects with this.error throwing before this.log runs; (2) json mode: --timeout 30 --format json emits a JSON envelope with success:false / data.status:'error' / data.error containing '--timeout'. Pins the early-exit ordering so a future refactor that drops the findRemovedFlagMessage(this.argv, ...) call fails the suite instead of silently regressing to pre-removal state. - schemas.ts: hoist TaskTypeSchema declaration above TaskExecuteSchema (the schema previously sat ~100 lines below) and replace TaskExecuteSchema.type's inline z.enum(...) with a reference to TaskTypeSchema. The doc block at the original declaration site claimed it was the single source of truth, but the inline enum on TaskExecuteSchema.type silently violated that — the only outlier; TaskCreatedSchema.type (now line 605) and TaskCreateRequestSchema.type already referenced TaskTypeSchema. Hoist needed to avoid a Zod TDZ at module load. No behavioral change; schemas now share one literal enum source.

refactor: [ENG-2884] remove legacy LLM-driven brv dream path

github-actions · 2026-05-22T09:24:49Z

Claude finished @danhdoan's task in 7m 51s —— View job

Re-review of PR #663 after the ENG-2884 batch (PR #689)

Two PR-specific blocks since 5887e2a: the ENG-2884 refactor (13d83a6 → 7d77a8e → merge #689 → 7a397ba → f5566ac → 2e138a6 → 53004fd → 8c7c1cc → e3f96e8) which removes 4000+ LOC of the legacy LLM-driven brv dream pipeline and replaces the topic root with a thin migration-hint stub. Tool-mode subcommands (brv dream {scan,finalize,undo,sessions,cancel}) cover the same workflow deterministically. The batch also fixes a path-doubling silent-data-loss bug in the dream→curate merge handoff, expands previousMtimes / previousSignals capture so prune undo is bit-exact, and catches a real refresh-index race in PR #689 review (8c7c1cc).

Score: 4.85 / 5 — held from prior review. The refactor is unusually clean for its size (32 files deleted, net -5825 LOC) and the in-cycle bug catches show real review discipline.

What changed since `5887e2a`

Commit(s)	Scope
`13d83a6`	ENG-2884 refactor — removes `dream-executor.ts`, `dream-trigger.ts`, `consolidate/synthesize/prune` LLM ops, `dream-response-schemas.ts`, `parse-dream-response.ts`, `timeout-deprecation.ts`. `brv dream` is now a thin topic root printing a migration hint. Net `–4000` LOC, `–14` files.
`7d77a8e`	ENG-2884 PR #689 review — wires `DREAM_REMOVED_FLAGS` into the topic root (was exported but never consumed); drops `'dream'` + `force` + `'agent-idle'` from `TaskTypeSchema` / `TaskExecuteSchema` / `TaskCreateRequestSchema`. Closes the straggler-task gap (a stale `type:'dream'` now fails at the schema boundary instead of silently completing with empty result).
`7a397ba`	Strip trailing `.html` in `topicPathToFilePath` so `path="x/y"` and `path="x/y.html"` both resolve to the same on-disk file — fixes the merge-workflow silent-data-loss bug where the survivor landed in `x/y.html.html` while the agent archived the loser thinking the merge succeeded.
`f5566ac`	`searchService.refreshIndex()` + `SearchKnowledgeOptions.combineWith` + pre-archive `mtime`/`signals` capture on finalize. `DreamLogSchema.PRUNE` gains optional `previousSignals` / `previousMtimes`. `undoPrune` restores via `utimes()` + `runtimeSignalStore.set()`.
`2e138a6`	Discloses `[v1 stub]` status on `dream sessions` / `dream cancel` in descriptions + JSON envelopes (`data.note: 'v1: ...'`).
`53004fd`	SKILL.md §7 — sample envelopes, prune-signal thresholds, worked example. Sync-hint comment on the prune threshold constants pointing at SKILL.md.
`8c7c1cc`	Race-fix on `refreshIndex` from PR #689 reviewer catch — awaits in-flight build before clearing, so orphan builders can't write back after invalidation. Maintainer-note documents the benign double-build window. New `search-knowledge-service-refresh.test.ts` (3 cases).
`e3f96e8`	PR #689 codex review fixes — drop dead-code `?? ''` fallback on `previousTexts` (invariant-guaranteed populated together); hoist `TaskTypeSchema` above `TaskExecuteSchema` to make it the single source for both (was inline-duplicated on `TaskExecuteSchema.type`); SKILL.md path-attribute note softened (writer normalizes either form).

Strengths

Refactor scope is right-sized. Removed exactly the dead surface: LLM-driven entry, daemon-side dream-trigger, agent-idle dispatch branch, response parsers, timeout-deprecation helper (no consumer post-removal). Kept LLM-agnostic services (dream-undo, dream-state-service, dream-lock-service, dream-log-store) that the tool-mode pipeline reuses. The commit message's "Removed / Trimmed / Kept" tri-list is unusually clear.
8c7c1cc race fix — see inline comment at search-knowledge-service.ts:1123. The orphan-builder window was easy to miss and the unit-test contract (refresh must not resolve while a build is in flight) is the cheapest possible pin.
mtime + signal capture on finalize. The metadata that drove the prune decision (importance, maturity, recency, mtime) is now restorable, so a topic archived as low-importance or stale-mtime returns with the same prune-eligibility on the next scan. Without this, undo silently mutated observable state and the topic stopped re-surfacing — a quiet correctness regression that would only show up under repeated dream cycles. The three-case test triad in dream-undo.test.ts (capture-and-restore, restore-via-utimes, restore-via-signal-store, legacy-log-fallback) is the right shape.
Path-doubling fix is symmetric. Two reading orderings + confirmOverwrite round-trip = 4 new tests; not just the doubled-extension case in isolation. See inline comment at html-writer.ts:274 for one small follow-up (.htm extension symmetry).
Schema-source consolidation. TaskTypeSchema is now the single source for TaskExecuteSchema.type, TaskCreateRequestSchema.type, and TaskCreatedSchema.type — previously TaskExecuteSchema.type had an inline z.enum(...) that silently violated the "single source of truth" doc above the declaration. e3f96e8 hoisted the declaration to fix the Zod TDZ at module-load and removed the inline duplicate. Quiet drift that nobody flagged in three prior reviews.
Honest v1-stub disclosure. [v1 stub] prefix in command descriptions + data.note field in JSON envelopes for dream sessions / dream cancel. Machine consumers branching on sessions.length or status: 'cancelled' see the caveat at the same key shape they'd inspect.

Issues

1. observation (semantic deviation, defer-defensible) — `previousSignals` captured for never-tracked topics

Posted inline at dream-session.ts:221. runtimeSignalStore.get() returns validateOrDefault(...) — i.e. system defaults when the sidecar has no entry. On a never-tracked topic, undo writes those defaults via set(), creating a sidecar entry that didn't exist before the archive. Strictly speaking, that's not "undo to original state" — it's "undo to original-state + defaults". Defensible (a future access would mutate the defaults anyway), but worth one line in the docstring.

2. suggestion (consistency) — strip `.htm` too in `topicPathToFilePath`

Posted inline at html-writer.ts:274. Latent today (dream-scan only emits .html), but /\.html?$/i matches both .html and .htm at no extra cost, and the file-reader / format-detector already handle both — this is the only path-shape resolver that doesn't.

3. nit (CLI behavior asymmetry) — JSON-mode removed-flag rejection exits 0, text-mode exits 1

Posted inline at dream.ts:94. Same pattern as the curate/query precedent, so consistency with the rest of the CLI is preserved. But set -e shell wrappers see different exit codes for the same logical error depending on the --format flag. Either document, or set process.exitCode = 1 before the return.

4. issue (stale comment, post-ENG-2884) — `task-router.ts:138-139` still references the removed dream consumer

The preDispatchCheck doc says "Used for dream task type to enforce gates 1-3 (time, activity, queue) even on the CLI dispatch path — mirrors the idle-trigger pre-check pattern." But the 'dream' task type is gone from TaskTypeSchema, and a grep shows no caller passes a preDispatchCheck anymore (only the type plumbing remains). The new task-router.test.ts correctly swapped its fixtures to type: 'curate' — this comment is the last straggler.

Worth either updating to "Currently unused — kept for future per-type gating" or trimming the dream-specific paragraph. (Can't post inline since this exact line wasn't touched in the PR's diff.)

5. observation (carry-over) — `daemon-client.ts` / `task-validation.ts` still reference `brv providers connect/switch` / `/provider` REPL

Carry-over from prior reviews — but the reachable-impact analysis has shifted significantly since the prior review. With the LLM-driven dream path gone, the validateProviderForTask gate (agent-process.ts:513) is now skipped for all five user-facing tool-mode types (search, query-tool-mode, curate-html-direct, dream-scan, dream-finalize). The only paths that still hit provider validation are the legacy 'curate' / 'curate-folder' / 'query' task types — and the user-facing CLI commands (brv curate, brv query) have all been migrated to the new tool-mode types.

Real-world reachability is now near-zero unless a stale MCP client or an older CLI binary still produces legacy task types. Worth a one-pass scrub regardless, but the urgency from the May 19 review is gone.

6. observations (carry-overs, unchanged)

topic-loader.ts regex parser still inconsistent with index-generator.ts parse5.
<existing-topic> block in curate-prompt-builder.ts:119-123 still inlines system-managed timestamps.
CLI ConsoleLogger chalk-on-stderr under --format json (sidecar-bump + index-regen in curate-session.ts).

Praise

PR refactor: [ENG-2884] remove legacy LLM-driven brv dream path #689 review catch on the refresh-index race — saving a real correctness bug from landing. Pulling the cycle in three days (ship f5566ac → reviewer catches the orphan-builder window → ship 8c7c1cc with explicit unit-test contract) is the model "ship → review → tighten" arc.
Codex review-fix discipline in e3f96e8 — five items, each scoped to one file, each with a why-this-matters line in the commit message. The TaskTypeSchema hoist+dedupe fixes drift that three prior human review rounds missed because it lived behind a "single source of truth" doc comment that was technically lying.
Test count delta is honest. Net -184 tests is exactly what the deleted file inventory predicts — no orphan suites pointing at vanished modules, no zombie tests pinning long-dead behavior.
Migration text quality. dream cancel's text-mode output (l.41-46) tells the user that curate writes between scan and cancel are NOT rolled back by brv dream undo, and points to brv review reject as the right rescue. That's the specific rescue path a confused user needs — and most teams would have shipped a generic "session cancelled" line.
Sub-agent test design. search-knowledge-service-refresh.test.ts injects state.buildingPromise directly rather than going through a full IFileSystem mock, because the contract under test is exactly that interaction. Pragmatic, well-scoped, leaves a 28-line file that any future maintainer can read top-to-bottom.

Final disposition

LGTM. ENG-2884 is high-quality refactor work — the size doesn't represent risk because the removed surface was orphaned and the kept surface is fully covered. The mid-cycle catches on 8c7c1cc (race) and e3f96e8 (schema drift) are the kind that pay for themselves immediately. Remaining items are all defer-defensible or post-merge polish.

Todo

Identify what changed since 5887e2a
Spot-check the ENG-2884 refactor + ENG-2884 sub-fixes (13d83a6, 7d77a8e, 7a397ba, f5566ac, 2e138a6, 53004fd, 8c7c1cc, e3f96e8)
Verify outstanding carry-over items from prior reviews
Post inline comments + final summary with score
· Branch: proj/byterover-tool-mode

github-actions · 2026-05-22T09:30:39Z

+ * point that encodes that convention.
+ */
+function topicPathToFilePath(contextTreeRoot: string, topicPath: string): string {
+  const normalized = topicPath.replaceAll('\\', '/').replace(/^\/+/, '').replace(/\.html$/i, '')


praise: Idempotent .html stripping is the right shape for the dream→curate handoff. The doc above (l.260-271) names the failure mode it prevents (auth/jwt.html.html, silent path-exists bypass, archived loser + stale survivor), and the four new tests in html-writer.test.ts lock the contract from both orderings (bare-first-then-suffixed + suffixed-first-then-bare + confirmOverwrite round-trip). That's the right test triad for a "two input shapes must resolve to one file" invariant — it would have been easy to just test the doubled-extension case in isolation and miss the symmetry.

One micro-nit: the case-insensitive regex /\.html$/i correctly handles .HTML etc., but file-context-file-reader.ts:95-98 and format-detector.ts:24 both treat .htm as HTML too. If path="x/y.htm" ever shows up (manual migration, legacy import), this resolver would keep the .htm suffix, append .html, and produce x/y.htm.html — same doubled-extension problem this fix was designed to prevent, just one extension off.

Latent today (dream-scan emits .html only), but for full symmetry with the reader, consider:

const normalized = topicPath.replaceAll('\\', '/').replace(/^\/+/, '').replace(/\.html?$/i, '')

The s? matches .html and .htm identically. Same regex character cost.

github-actions · 2026-05-22T09:30:54Z

+   * may start a parallel build. That's a benign double-build
+   * inefficiency, not a correctness issue.
+   */
+  async refreshIndex(): Promise<void> {


praise (the race fix in 8c7c1cc): The original refreshIndex() in f5566ac had the orphan-builder race — clearing cachedIndex + buildingPromise while a prior acquireIndex() build was still in flight, then the orphan build resolved and wrote back to state.cachedIndex (line ~1016), defeating the invalidation. Awaiting the in-flight before clearing is the right shape: by the time the await returns, the in-flight has either published its result (which is then cleared) or rejected (which is swallowed — the caller doesn't care about the orphan's outcome). No publisher can write back after invalidation because the publishing point is now in the past relative to the clear.

The maintainer note about state.buildingPromise = undefined being defensive (l.1111-1121) is unusually thoughtful — it identifies the only post-fix race window (a fresh acquireIndex() that captures a new buildingPromise between await resolution and the explicit clear, where the clear then nulls the new pointer leaving a benign parallel double-build) and explicitly classifies it as performance-only, not correctness. That class of "we know about this edge case and it's intentional" comment is exactly what prevents the next contributor from "fixing" the non-bug.

The unit test in test/unit/agent/tools/search-knowledge-service-refresh.test.ts pins the contract (refresh-must-not-resolve-while-building) directly via state.buildingPromise injection — the cheapest test that locks the behavior at the seam. Cold-state idempotency and rejected-build cases round it out. Three cases, every one load-bearing.

github-actions · 2026-05-22T09:31:09Z

+        content = await readFile(source, 'utf8')
+        const stats = await stat(source)
+        mtimeMs = stats.mtimeMs
+        signals = await runtimeSignalStore.get(relPath)


observation (semantic deviation on never-tracked topics): runtimeSignalStore.get(relPath) never returns undefined — validateOrDefault ensures a default RuntimeSignals is returned when the sidecar has no entry for relPath (runtime-signal-store.ts:49-52). For a topic that was created without ever going through the sidecar-writing path (rare today, but possible: hand-authored files, restored from a non-tool-mode archive, future migration tools), the captured previousSignals[path] is the default object, not "no signals".

On undo, undoPrune then writes those default signals via runtimeSignalStore.set() — which creates a sidecar entry that didn't exist before the archive. Not strictly an "undo to original state" — the post-undo sidecar inventory now has one more entry than the pre-archive inventory. Defensible (the defaults are now explicit, and the next access bump will mutate them anyway), but worth noting in the docstring or capturing as a discriminator.

If you want strict bit-exactness on this edge case, two options:

Probe the sidecar's raw existence via keyStorage.get before capturing, and store undefined in previousSignals[path] when the sidecar entry was genuinely absent. Then undoPrune either calls set() or delete() accordingly.

Have runtimeSignalStore.getRaw(relPath): Promise<RuntimeSignals | undefined> so the dream-finalize path can distinguish "present-as-defaults" from "absent".

Non-blocking — the deviation only affects never-tracked topics, which are not the dream-scan target population. Worth one line in the previousSignals docstring at l.150-163: "For topics with no pre-archive sidecar entry, the captured snapshot is the system default — undo will create a sidecar where one didn't exist."

github-actions · 2026-05-22T09:31:23Z

+    const removed = findRemovedFlagMessage(this.argv, DREAM_REMOVED_FLAGS)
+    if (removed) {
+      if (argvRequestsJsonFormat(this.argv)) {
+        writeJsonResponse({command: 'dream', data: {error: removed, status: 'error'}, success: false})


nit (consistency): The text-mode branch uses this.error(removed, {exit: 1}) (l.98), which throws and sets exit 1. The JSON-mode branch uses writeJsonResponse(...) then return — implicitly exit 0 — but the envelope carries success: false / data.status: 'error'.

The discrepancy is intentional and matches the curate/query precedent (both do the same thing), but a CLI caller wrapping brv dream with set -e will get different shell-script behavior between brv dream --timeout 30 (fails) and brv dream --timeout 30 --format json (succeeds). Two clean options if scripts that branch on exit code matter:

Document the asymmetry in removed-flags.ts (it's an implicit contract today).

Set a non-zero exit after the JSON envelope: process.exitCode = 1; return. The envelope still flushes to stdout cleanly, and shell callers see consistent failure semantics.

Not blocking — the JSON envelope's success: false is the machine-readable truth, and set -e users typically inspect .success. But this is the same asymmetry that bit MCP integrations on the pre-ENG-2837 curate path; worth being explicit about it.

danhdoan and others added 30 commits May 10, 2026 12:25

Merge pull request #627 from campfirein/feat/ENG-2737

8b8b0b2

feat: [ENG-2737] HTML render layer foundation — element registry + parser

Merge pull request #628 from campfirein/feat/ENG-2738

3bdefe5

[ENG-2738] HTML-emission contract in curate.txt + fluency check

Merge branch 'proj/html-mem-conversion' into feat/ENG-2741

b58163e

Merge pull request #629 from campfirein/feat/ENG-2741

944813b

feat: telemetry producer for QueryLogEntry + CurateLogEntry

Merge pull request #630 from campfirein/feat/ENG-2739

33765da

[ENG-2739] curate path — HTML writer + format detector + executor wiring

Merge pull request #631 from campfirein/feat/ENG-2740

a86d210

[ENG-2740] query path — HTML reader + element-axis index + search routing

Merge branch 'proj/html-mem-conversion' into fix/tier2-html-render

8dd1c0a

Merge pull request #632 from campfirein/fix/tier2-html-render

2a1426d

fix: Tier 2 direct response renders HTML topics into structured markdown

Merge pull request #633 from campfirein/fix/html-conversion-followups

3816618

fix: extension-aware format detector + byterover tokens + curate prompt

cuongdo-byterover and others added 2 commits May 21, 2026 11:10

Merge pull request #687 from campfirein/refactor/ENG-2880

eb3add2

refactor: [ENG-2880] drop legacy flags from brv curate / brv query

github-actions Bot reviewed May 21, 2026

View reviewed changes

cuongdo-byterover and others added 7 commits May 21, 2026 15:48

Merge pull request #690 from campfirein/fix/ENG-2883

e02f000

fix: add HTML output contract rules and related-ref warner

github-actions Bot reviewed May 21, 2026

View reviewed changes

RyanNg1403 and others added 8 commits May 21, 2026 18:50

Merge remote-tracking branch 'origin/proj/byterover-tool-mode' into r…

c915ca2

…efactor/ENG-2884

Merge pull request #689 from campfirein/refactor/ENG-2884

8edf54a

refactor: [ENG-2884] remove legacy LLM-driven brv dream path

github-actions Bot reviewed May 22, 2026

View reviewed changes

		const wantsFile = stripped.endsWith('.html')
		const withoutExt = wantsFile ? stripped.slice(0, -'.html'.length) : stripped

Conversation

danhdoan commented May 15, 2026

Summary

What's new

Curate

Query

MCP

SKILL.md

Breaking changes

Tickets shipped

Rollout

Test plan

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review of PR #663 after the ENG-2880 batch (PR #687)

What changed since 4e84552

Strengths

Issues

1. nit (new, stale docstring) — useHtmlContextTree reference in CurateLogEntry.format doc

2. observation (carry-over, non-blocking) — error messages still reference removed brv providers connect/switch commands

3. observation (carry-over, non-blocking) — <existing-topic> block still inlines system-managed timestamps verbatim

4. observation (carry-over, non-blocking) — topic-loader.ts regex parser inconsistent with index-generator.ts parse5

5. observation (carry-over, non-blocking) — CLI ConsoleLogger emits chalk on stderr under --format json

Praise

Test plan suggestions

Final disposition

Todo

Uh oh!

github-actions Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review of PR #663 after the ENG-2883 batch (PR #690)

What changed since c4c4a52

Strengths

Issues

1. suggestion — .htm suffix inconsistency with the file-reader

2. observation (UX, batch curate) — forward refs surface as warnings on every first-touch

3. observation (security, defer-defensible) — symlink defense-in-depth at related-ref-warner.ts:74-80

4. issue (carry-over) — error messages still reference removed brv providers connect/switch commands

5. observation (carry-overs, non-blocking)

Praise

Test plan suggestions

Final disposition

Todo

Uh oh!

github-actions Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review of PR #663 after the ENG-2884 batch (PR #689)

What changed since 5887e2a

Strengths

Issues

1. observation (semantic deviation, defer-defensible) — previousSignals captured for never-tracked topics

2. suggestion (consistency) — strip .htm too in topicPathToFilePath

3. nit (CLI behavior asymmetry) — JSON-mode removed-flag rejection exits 0, text-mode exits 1

4. issue (stale comment, post-ENG-2884) — task-router.ts:138-139 still references the removed dream consumer

5. observation (carry-over) — daemon-client.ts / task-validation.ts still reference brv providers connect/switch / /provider REPL

6. observations (carry-overs, unchanged)

Praise

Final disposition

Todo

Uh oh!

github-actions Bot May 22, 2026

github-actions Bot commented May 21, 2026 •

edited

Loading

What changed since `4e84552`

1. nit (new, stale docstring) — `useHtmlContextTree` reference in `CurateLogEntry.format` doc

2. observation (carry-over, non-blocking) — error messages still reference removed `brv providers connect/switch` commands

3. observation (carry-over, non-blocking) — `<existing-topic>` block still inlines system-managed timestamps verbatim

4. observation (carry-over, non-blocking) — `topic-loader.ts` regex parser inconsistent with `index-generator.ts` parse5

5. observation (carry-over, non-blocking) — CLI `ConsoleLogger` emits chalk on stderr under `--format json`

github-actions Bot commented May 21, 2026 •

edited

Loading

What changed since `c4c4a52`

1. suggestion — `.htm` suffix inconsistency with the file-reader

3. observation (security, defer-defensible) — symlink defense-in-depth at `related-ref-warner.ts:74-80`

4. issue (carry-over) — error messages still reference removed `brv providers connect/switch` commands

github-actions Bot commented May 22, 2026 •

edited

Loading

What changed since `5887e2a`

1. observation (semantic deviation, defer-defensible) — `previousSignals` captured for never-tracked topics

2. suggestion (consistency) — strip `.htm` too in `topicPathToFilePath`

4. issue (stale comment, post-ENG-2884) — `task-router.ts:138-139` still references the removed dream consumer

5. observation (carry-over) — `daemon-client.ts` / `task-validation.ts` still reference `brv providers connect/switch` / `/provider` REPL