Skip to content

Proj/byterover tool mode#663

Open
danhdoan wants to merge 132 commits into
mainfrom
proj/byterover-tool-mode
Open

Proj/byterover tool mode#663
danhdoan wants to merge 132 commits into
mainfrom
proj/byterover-tool-mode

Conversation

@danhdoan
Copy link
Copy Markdown
Collaborator

Summary

Lands the full byterover tool-mode project end-to-end:

  • HTML context-tree format. The 19-element <bv-*> vocabulary
    (bv-topic, bv-task, bv-pattern, bv-rule, bv-diagram, …) is
    now the on-disk format for curated topics. Parser, writer, validator
    registry, element-axis index, and LLM-ready renderer all ship in
    src/server/infra/render/. Markdown topics keep working via the
    format detector + extension-aware reader.

  • Tool mode is the default. Both brv curate and brv query run
    without any byterover LLM provider configured. BRV_*_TOOL_MODE
    env-var gates have been removed (ENG-2815). The calling agent owns
    synthesis; byterover retrieves, renders, validates, and writes.

  • MCP migrated to provider-free. brv-query swaps to
    task:create type='query-tool-mode' and returns the structured
    envelope (ENG-2836). brv-curate is replaced by
    brv-curate-html — accepts pre-authored <bv-topic> HTML and routes
    through the writer directly, no LLM (ENG-2837). After this lands,
    MCP integrations (Claude Desktop / Cursor MCP / etc.) inherit the
    no-provider property the CLI already has.

  • New brv read <path> command (ENG-2769) for single-topic render

    • inspection.
  • Editorial topic viewer in webui (ENG-2810) for bv-* HTML
    topics, hardened against untrusted HTML.

  • Telemetry producers (ENG-2741) attach token/latency/format
    metadata to QueryLogEntry and CurateLogEntry via a shared
    TaskUsageAggregator.

What's new

Curate

  • HTML writer with <bv-topic> schema validation and atomic
    tmp+rename writes (ENG-2737/2738/2739).
  • CurateOrchestrator state machine: generate → validate → correct
    → write → retry loop (ENG-2766).
  • Prompt builder generated from ELEMENT_REGISTRY, with
    prompt-injection-hardened delimiters (ENG-2767).
  • CLI surface uses in-process kickoffSession / continueSession
    instead of the legacy daemon runAgentBody path (ENG-2765).
  • System-managed createdat / updatedat injected at write time —
    the model is not allowed to author these.
  • confirmOverwrite guard returns existing content for the caller to
    merge instead of clobbering silently.

Query

  • Tool-mode executor runs Tier 0/1 cache + Tier-2 retrieval without
    the canRespondDirectly confidence threshold (ENG-2811/2812). The
    threshold was a byterover-side judgment; calling agents now own the
    "is this match good enough?" decision.
  • Tier 3/4 LLM-driven synthesis removed from the executor path.
  • HTML-routing in SearchKnowledgeService so HTML topics render
    identically to MD in Tier 2.
  • New docs/curate-protocol.md documents the wire envelope.

MCP

  • brv-query migrated to the tool-mode envelope; renders matched
    topics as markdown sections.
  • brv-curate replaced by brv-curate-html (breaking — see below).
  • Shared encoders: src/shared/transport/query-tool-mode-content.ts
    and src/shared/transport/curate-html-content.ts.

SKILL.md

  • Tool-mode curate session protocol documented (ENG-2768).
  • Query tool-mode section added with skippedSharedCount semantics
    (ENG-2814).
  • Deprecated LLM-provider references scrubbed (ENG-2822).

Breaking changes

  • MCP brv-curate tool input shape changes. The previous
    {context, files, folder} natural-language inputs are gone. The new
    tool (brv-curate-html / curate-html-direct task type) accepts
    {html, confirmOverwrite?} — pre-authored <bv-topic> HTML from
    the calling agent. MCP integrations relying on the natural-language
    curate input must update. See src/server/templates/skill/SKILL.md
    for the authoring guide.

  • BRV_CURATE_TOOL_MODE / BRV_QUERY_TOOL_MODE env vars removed.
    No-op now — tool mode is the only path. Remove from any local
    scripts.

  • useHtmlContextTree flag removed. HTML is the only curate
    output format. Existing markdown topics still read fine via the
    format detector; new writes are always HTML.

Tickets shipped

ENG-2737, ENG-2738, ENG-2739, ENG-2740, ENG-2741, ENG-2765, ENG-2766,
ENG-2767, ENG-2768, ENG-2769, ENG-2810, ENG-2811, ENG-2812, ENG-2814,
ENG-2815, ENG-2822, ENG-2836, ENG-2837.

Rollout

  • Existing .brv/context-tree/*.md topics keep working — read path
    is format-aware.
  • New curate writes land as .html. A separate migration script
    (Phat, parallel work) can re-author legacy MD topics into the
    <bv-topic> HTML format and stash originals under
    .brv/_archive/context-tree-md/. Not included in this PR.
  • Users following the "no provider needed" announcement and running
    brv providers disconnect <all> will now find MCP brv-query and
    brv-curate-html work end-to-end — same property as the CLI.

Test plan

  • npm run typecheck clean
  • npm run lint clean
  • npm test — full unit + integration suite (153 files changed,
    ~15k LOC added; new tests cover HTML reader/writer/elements,
    curate-session, query-retrieval, MCP tool-mode envelopes,
    telemetry aggregator, format detectors, prompt builder)
  • npm run build clean
  • Manual: 0-provider CLI curate + query round trip on a fresh
    .brv/
  • Manual: 0-provider MCP brv-query + brv-curate-html from
    Claude Desktop or Cursor MCP
  • Manual: HTML topic round-trip through webui editorial viewer
  • Verify migration script (Phat's parallel work) lands separately
    and does not block this PR

danhdoan and others added 30 commits May 10, 2026 12:25
…rser

Lands the M1 keystone task: element vocabulary, per-element validators,
data-driven registry, and HTML parser wrapper. No behavior change yet —
no consumer wires the registry until T3 (curate path) and T4 (query path).

Element vocabulary (5 elements):
- bv-topic — root container; required `path`, optional importance/maturity/
  recency/updatedat (lowercase per HTML5 normalization)
- bv-rule — rule statements (severity, id)
- bv-decision — decision records (id)
- bv-bug — bug runbook entries (id, severity)
- bv-fix — fix runbook entries (id)

Architecture:
- Data-driven `ELEMENT_REGISTRY` is the production-track guardrail. M2
  vocabulary expansion is purely additive — new elements register via
  one entry + one schema/validator pair, no consumer code changes.
- `makeAttributeValidator` factory eliminates per-element boilerplate;
  reduces M2's 12 additional element validators to one-line bindings.
- HTML parser is parser-library-agnostic at the API boundary
  (`DocumentNode` / `ElementNode` / `TextNode`). parse5 used internally;
  consumers don't see parse5 types.
- Attribute-case normalization is documented in `ElementNode` JSDoc:
  HTML5 lowercases attribute names at parse time, so schema keys must
  match parser output (lowercase), not source HTML (camelCase).

Tests (106 new):
- 5 per-element validator suites (61 cases total) — each ≥10 cases
- Registry shape + wiring + metadata (14 cases)
- HTML parser: basic parsing, malformed-input handling, walkElements,
  getInnerText, serializeHtml round-trip (22 cases)
- End-to-end round-trip on a sample fixture covering all 5 elements
  (9 cases)

Verification:
- typecheck clean
- lint 0 errors (242 warnings, all pre-existing baseline)
- 7715 tests passing, 0 failing
- build clean

Adds parse5@^8.0.1 as a runtime dep — the W3C-spec HTML parser used by
jsdom; types ship in-tree (no @types/parse5 needed). Justified vs hand-
rolled parsing because robust HTML tag-soup handling is non-trivial and
this is the standard library for the job.
- bv-topic schema: tighten recency regex to ^\d+(\.\d+)?$ (rejects '.', '..1', '1..2' which previously slipped past the regex), drop redundant Number.isFinite for style consistency with importance, accept timezone offsets in updatedat via {offset: true}
- html-parser: replace `as unknown as Parse5DocumentFragment` cast with parse5 defaultTreeAdapter factories (createDocumentFragment, createElement, createTextNode, appendChild) — no more structural casts in the serialize path
- html-parser: getInnerText now inserts a separator between sibling element children so compact emit (<p>foo.</p><p>bar.</p>) does not merge tokens for BM25
- html-parser: document <template> content limitation (parse5 places template children in .content, not childNodes)
- schema docstrings + test names: replace 'warn-only behaviour' with 'parse-and-skip' (no warning is actually emitted in M1)
- sample-topic round-trip: add explicit case-normalization regression test for updatedAt → updatedat
feat: [ENG-2737] HTML render layer foundation — element registry + parser
Rewrite the canonical curate tool description (loaded by ToolDescriptionLoader
and presented to the agent on every curate operation) from the structured
JSON-args shape (rawConcept/narrative/facts/etc.) to the M1 HTML-emission
contract using the closed `<bv-*>` vocabulary defined by T1.

Coverage:
- All 5 elements (bv-topic, bv-rule, bv-decision, bv-bug, bv-fix) with full
  per-element attribute schemas: required/optional, value enums, format
  constraints (importance integer 0-100, recency numeric 0-1, ISO-8601
  updatedat, severity enums per element, etc.).
- Allowed-children semantics (any/inline/block) + pairing convention
  (bug+fix as siblings).
- Path format (slash-separated snake_case) + domain guidelines.
- Output contract: HTML only, exactly one bv-topic root, no preamble/fences/
  commentary, lowercase attribute names per HTML5 spec, closed vocabulary,
  no invented attributes, no clarifying questions.
- Three worked examples (bug+fix runbook, rule+decision pair, general topic)
  to anchor the model's element-pairing and id-naming conventions.

Add a sanity test (test/unit/server/infra/render/curate-prompt.test.ts) that
loads curate.txt and asserts every registered ELEMENT_NAME, every per-element
enum value, and every output-contract directive is present. Guards against
silent drift when M2 expands the vocabulary or refactors attribute schemas.

Authoring fluency check (M1 T2 spike, separate harness in
local-auto-test/curate-fluency/, not in this repo): 20/20 generated outputs
validated against the M1 element registry on Sonnet 4.5. Gate cleared on
first full run after one prompt-tuning iteration. Decision: proceed to T3.
Full report: research repo at features/html-memory-conversion/milestones/01-experiment/fluency-report.md
Field-by-field review of curate.txt against the actual rendered .md
context files (frontmatter + ## Reason + ## Raw Concept + ## Narrative
+ ## Facts) surfaced two issues with the initial T2 draft:

1. <bv-topic> carried runtime-signal attributes (importance, maturity,
   recency, updatedat) that were explicitly migrated to a sidecar store
   under research/features/runtime-signals/ (per-user, per-machine state
   that caused noisy `vc status` diffs and team-share conflicts). Drop
   them from the schema; replace with the actual frontmatter fields the
   markdown writer renders: title (required), summary, tags, keywords,
   related.

2. The 5-element vocabulary covered Rules + three M1 net-new elements
   (Decision/Bug/Fix) but left every other rendered MD section as
   free-form HTML — losing the structural mapping the writer needs.

Expand M1 vocabulary to 16 elements (added 11), each mapping to a
specific section in the rendered .md output:

  - bv-reason       → ## Reason
  - bv-task,        → ## Raw Concept (Task / Changes / Files / Flow)
    bv-changes,
    bv-files,
    bv-flow
  - bv-structure,   → ## Narrative (Structure / Dependencies /
    bv-dependencies,                Highlights / Examples / Diagrams);
    bv-highlights,                  bv-rule already covered Rules
    bv-examples,
    bv-diagram (with type + title attrs for verbatim diagram preservation)
  - bv-fact (subject/category/value attrs) → ## Facts list

Each element follows the existing makeAttributeValidator pattern; the
data-driven registry stays the single source of truth.

Tests:
- Update bv-topic test to drop runtime-signal cases and assert the
  new required-attributes (path + title); add a regression case
  asserting passthrough still tolerates legacy importance/maturity/
  recency without enforcing them.
- Add registry test asserting (a) bv-topic exposes the new
  optionalAttributes, (b) runtime signals are NOT in the registry
  metadata, (c) every registered validator accepts its own empty node
  + rejects mismatched-tag.
- Consolidate the 9 attribute-free new elements into one shared
  text-only-elements.test.ts (same trivial schema; per-element
  duplication wouldn't add coverage).
- Add dedicated bv-diagram.test.ts (type enum) and bv-fact.test.ts
  (category enum, subject/value pass-through).
- Update sample-topic.html fixture to exercise every new element +
  the new frontmatter attributes; assert no runtime signals leak in.
- Update curate-prompt.test.ts to assert the expanded vocabulary,
  category/type enums for bv-fact/bv-diagram, and that the prompt
  explicitly excludes runtime signals from bv-topic attributes.

Re-run the M1 T2 fluency check on the same 20 fixtures with the new
vocabulary: 20 / 20 valid (gate cleared); 100% cohort-appropriate
placement (bug+fix → bv-bug+bv-fix; rule+decision → bv-rule+bv-decision;
general → bv-reason); 122 bv-fact instances extracted across the run
(~6 per fixture). Fluency report updated:
research/features/html-memory-conversion/milestones/01-experiment/fluency-report.md

Open contract violation persists: 18/20 outputs (90%) wrap in code
fences despite explicit prompt instruction. Recommendation for T3
unchanged — strip in the response parser, don't iterate the prompt.
Second writer audit (markdown-writer.ts:270-354) found 3 rawConcept
sub-fields without an HTML home: rawConcept.timestamp, rawConcept.author,
rawConcept.patterns. Add the missing elements so the M1 vocabulary
covers every section the writer renders.

Vocabulary additions (16 → 19 elements):

- <bv-timestamp> — text-only; renders as `**Timestamp:**` under
  `## Raw Concept`. Distinct from frontmatter createdAt/updatedAt
  (system-set) — this is the date the concept's data represents.
- <bv-author> — text-only; renders as `**Author:**`.
- <bv-pattern> — structured (flags + description as attrs, pattern
  as text content). Multiple siblings inside <bv-topic> collected
  into the `**Patterns:**` bullet list. Mirrors how <bv-fact> works.

Sanitiser:

- stripCodeFenceWrapper(html) added to html-parser.ts. Strips a single
  outer ``` ... ``` (any language tag) wrapper from the LLM response.
  Inner fences (e.g., <pre><code> blocks inside <bv-diagram>) survive.
  T3's curate executor calls this before parseHtml. Defensive
  sanitisation generalises better than chasing the model's quirk via
  prompt iteration — observed fence-wrap rate is 70% on Sonnet 4.5
  even with explicit prompt instruction to the contrary.

Tests:

- bv-pattern dedicated test (bv-pattern.test.ts) for the structured
  attrs.
- text-only-elements.test.ts extended for bv-timestamp + bv-author
  (same trivial schema as the other text-only elements).
- html-parser.test.ts: 7 new cases for stripCodeFenceWrapper covering
  ```html, ```xml, no-language ```, leading/trailing whitespace,
  unwrapped passthrough, mismatched-fence passthrough, inner-fence
  preservation, and parse-after-strip end-to-end.
- sample-topic-roundtrip.test.ts: renderable-MD coverage assertion
  extended to require all 15 dedicated body-section elements (was 12).
- sample-topic.html fixture exercises the 3 new elements.

Re-ran the M1 T2 fluency check on the same 20 fixtures with the
19-element vocabulary. Results:
- 20 / 20 valid (gate cleared)
- 100% cohort-appropriate placement (unchanged)
- bv-timestamp 60% adoption (used where input has a clear date)
- bv-author 10% (used only on inputs that name an owner)
- bv-pattern 0% — model correctly did not over-emit; no fixture
  contained regex content. Restraint is good behaviour.
- Cost: $0.61 (vs. $0.60 for the 16-element draft; negligible)

Updated fluency-report.md in research repo with final coverage matrix
and the 19-element → markdown-writer mapping for T3.
…Query/CurateLogEntry

Add per-call token usage (input, output, cached, cache-creation), latency tiers
(searchMs, llmMs, totalMs), and format mode (markdown vs html) to QueryLogEntry
and CurateLogEntry. Token rollup uses a per-task TaskUsageAggregator subscribed
to llmservice:usage events on the agent bus; LoggingContentGenerator emits the
events on both streaming and non-streaming paths. AiSdkContentGenerator folds
result.usage and providerMetadata.anthropic.cacheCreationInputTokens into a
synthesized rawResponse so per-provider extraction lands consistently across
Anthropic, OpenAI, and Google.

Live-validated against 30 curates × 3 providers; all four token fields populate
end-to-end with cacheCreationTokens correctly Anthropic-only.
[ENG-2738] HTML-emission contract in curate.txt + fluency check
- core/interfaces no longer reaches into infra: CurateUsageRecord moved to
  core/domain/entities/curate-log-entry, IUsageAggregator added in
  core/interfaces/telemetry/, TaskUsageAggregator now `implements` it.
- Consolidate curate telemetry assembly inside CurateExecutor:
  runAgentBody invokes reportTelemetry on both happy and error paths;
  agent-process passes onTelemetry that forwards via task:curateResult.
  Removes the duplicated inline assembly in agent-process.
- QueryExecutor.buildTiming prefers TaskUsageAggregator.getLlmMs() over the
  executeOnSession wall-clock so llmMs counts only LLM-call time, not
  tool-execution overhead. Falls back to wall-clock for tests.
- CurateLogHandler error finalization merges telemetryFields(state.usage)
  so failed curates don't underreport cost when the executor's best-effort
  error-path reportTelemetry has populated state.usage.
- OOD short-circuit in QueryExecutor routes through formatDetector.detect([])
  instead of returning format: undefined, so future HTML-aware detectors
  don't get silently bypassed.
- DRY: extract makeUsageListener helper in agent-process and reuse for
  both curate and query handlers.
- Document that PROVIDER_TYPES iteration in LoggingContentGenerator only
  matches 'aiSdk' in production (all live traffic flows through AiSdkContentGenerator).
…ocation

Re-review flagged the gap that error-path reportTelemetry being silent could
mask a future regression where failed-curate telemetry disappears from disk.
Pin both directions of the contract:
- happy path: onTelemetry fires once before runAgentBody returns
- error path: onTelemetry fires once before the throw propagates so the
  daemon can still emit task:curateResult ahead of task:error
feat: telemetry producer for QueryLogEntry + CurateLogEntry
…or wiring

Wires the M1 HTML context-tree experiment into the curate flow. When the
new BrvConfig.useHtmlContextTree flag is enabled, curate emits a
<bv-topic> HTML document using T1's element registry (19 elements after
T2's expansion) and atomically writes it to .brv/context-tree/<path>.html.
When the flag is off (the default), the existing markdown curate path
runs unchanged.

Rolled-in T2 review fixes (PR #628 auto-review):
- curate.txt: split the "Standard HTML inside <bv-*>" clause per
  allowed-children semantics. Inline-only subset (code/strong/em) for
  bv-rule, bv-task, bv-flow, bv-fact, bv-pattern, bv-timestamp,
  bv-author. Full block subset for bv-topic and the block-content
  elements. Removes the M1-passthrough/M2-strict drift the reviewer
  flagged.
- curate.txt: prefix the examples block with a disclaimer that the
  fences are for prompt readability only and the agent's actual output
  must be bare HTML. Lower-cost mitigation for the model's stubborn
  fence-wrap habit (T2's stripCodeFenceWrapper still handles the
  remaining cases).
- curate-prompt.test.ts: enum-value tests now anchor to the section
  after each element header instead of whole-file string match.
  Catches "severity moved from bv-bug to bv-decision" drift.
- curate-prompt.test.ts: new test that parses every fenced example
  block from curate.txt with parseHtml and runs each <bv-*> element
  through its registered validator. Catches example typos
  (severity="hihg") AND vocabulary drift in one assertion.

T3 proper:

BrvConfig:
- New optional boolean field useHtmlContextTree (default undefined ==
  false). Plumbed through the JSON validator, withReviewDisabled,
  withVersion, and withSpace withers. withoutSpace already used
  spread, so it picks up the new field automatically. 8 new BrvConfig
  unit tests cover round-trip + wither preservation.

format-detector (src/server/infra/render/format/format-detector.ts):
- Two pure functions: getFormatForWrite(config) reads the flag,
  getFormatForRead(filePath) inspects the extension. Default markdown.
  12 unit tests cover both branches plus edge cases (case-insensitive
  extensions, directory-named-foo.html with .md inside, etc.).

html-writer (src/server/infra/render/writer/html-writer.ts):
- Public surface: validateHtmlTopic(html) (pure validation, no I/O)
  and writeHtmlTopic({contextTreeRoot, rawHtml}) (validation + atomic
  write). Pipeline:
    1. stripCodeFenceWrapper (T2-provided) on raw input.
    2. parseHtml + walkElements; require exactly one <bv-topic> and
       a non-empty path attribute.
    3. Run every typed bv-* element through its registered validator.
       Reject on any failure with structured errors.
    4. Atomically write via DirectoryManager.writeFileAtomic to
       <contextTreeRoot>/<path>.html.
- Path-traversal hardening: backslashes normalised, `..` segments
  rejected, leading slashes stripped, resolved path must stay under
  contextTreeRoot. 15 unit tests cover validity branches, fence
  variants, atomic-write success/failure, and traversal attempts.

curate executor wiring:
- CurateExecuteOptions gains useHtmlContextTree?: boolean.
- runAgentBody branches the runtime prompt's final-step instruction:
  HTML mode tells the agent its FINAL RESPONSE is the bv-topic
  document and explicitly forbids tools.curate calls (which would
  write a sibling .md file). MD mode keeps the existing "verify via
  result.applied[].filePath" instruction unchanged.
- After agent.executeOnSession returns, HTML mode routes through a
  new handleHtmlCurateResponse() that calls writeHtmlTopic and shapes
  a CurationStatus from the result (success/failed). MD mode
  continues to call parseCurationStatus.
- 6 new unit tests cover: valid topic written, fence-wrapped response
  stripped, missing bv-topic → failed, invalid attribute → failed,
  flag=false uses MD path, flag=undefined uses MD path.

agent-process plumbing:
- The daemon's curate dispatcher reads cachedBrvConfig.useHtmlContextTree
  and forwards it on every CurateExecuteOptions call. Default false
  remains production-safe; the bench's Configuration B sets it to true
  via .brv/config.json.

Tests: 7892 pass, 0 fail. Typecheck clean, lint clean, build clean.

Out of scope (per the M1 plan):
- Storage path migration to .brv/render/ — M2.
- IR (ComponentIR) compilation — M2.
- Multi-tier query / structural selectors — T4.
- to-injection/to-evidence/to-summary/to-md adapters — M2.
- Migration tool for existing .brv/context-tree/ — M2.
HTML is the curate format; there is no MD/HTML branching to control.
Removing the flag simplifies the executor (one prompt, one response
handler), the BrvConfig schema (no per-project toggle), and the
format-detector (no write-side decision — only read-side dispatch
remains for legacy MD files).

Code changes:

- BrvConfig: remove the optional `useHtmlContextTree` field, the JSON
  validator branch, all wither plumbing, and the dedicated test block.
- ICurateExecutor: remove `useHtmlContextTree` from CurateExecuteOptions.
- CurateExecutor: collapse the prompt branch — every curate runs the
  HTML-emission instruction. Always route the agent's response through
  `handleHtmlCurateResponse`. Remove the dead `parseCurationStatus`
  method (the JSON-status-block path is gone).
- agent-process: drop the `useHtmlContextTree` plumbing on the curate
  dispatch site.
- format-detector: drop `getFormatForWrite` (no decision to make);
  keep `getFormatForRead` so the query/search read path can still
  route legacy `.md` files transparently. Drop the BrvConfig import.

Test changes:

- BrvConfig: drop the 8-case `useHtmlContextTree` describe block.
- format-detector: drop the 4 `getFormatForWrite` cases.
- curate-executor-html-mode: drop the two MD-fallback cases (no flag
  to toggle); the file is now strictly HTML-emission tests.

Comment cleanup (separate concern, same commit):

Strip milestone scaffolding (M1, M2, T1–T7) from comments across the
render layer. Comments now describe logic and decisions, not the
project plan that scaffolded them. Specifically:

- "M1 light validation; ADR-007 §13 strict validation is M2" →
  "Light validation; strict validation per ADR-007 §13 is future work"
- "M1 element registry"/"M1 vocabulary" → "element registry"/"closed
  bv-* vocabulary"
- "T3 wires the writer", "T4 owns the index", etc. → describe the
  consumer by role, not by milestone task
- "M1 keeps the existing .brv/context-tree/ storage" → describe what
  the resolver encodes, not which milestone made the decision
- Test descriptions: "(parse-and-skip — M1 light validation)" →
  "(parse-and-skip — light validation)"

Pipeline:
- 7878 tests pass (lost 14 with the dropped flag — 8 BrvConfig + 4
  format-detector write + 2 executor MD-fallback).
- Typecheck clean, lint clean, build clean.
End-to-end smoke testing surfaced a gap: the executor's runtime prompt
told the agent to emit HTML as its final response, but the base system
prompt (system-prompt.yml) had ~500 lines of detailed guidance for the
old structured-args curate flow (UPSERT/ADD/UPDATE/MERGE/DELETE with
rawConcept/narrative/facts content shape). The system prompt's
specifics overrode the runtime instruction; the agent kept building
JS data structures via code_exec and never emitted HTML, so the writer
saw nothing to validate and no .html file landed.

Rewrite the curate-related sections of system-prompt.yml to match the
HTML-emission contract:

- Update <TOOL_QUICK_REFERENCE>, <WORKFLOW_SUMMARY>,
  <PATH_TITLE_CLARIFICATION>, <PROHIBITED_BEHAVIORS>, <DIRECTORIES>,
  and the <SANDBOX_PROGRAMMING> curate example to instruct HTML
  emission. The agent's final response IS the bv-topic document; do
  NOT call tools.curate; do NOT wrap in code fences.
- Replace the entire ## Curation Workflow section (~500 lines, lines
  275–786) with a tighter HTML-focused workflow (~217 lines):
  - Keep the recon → mapExtract → dedup → groupBySubject preprocessing
    helpers (still useful for chunked/large inputs).
  - Document the bv-topic frontmatter (path/title required; summary,
    tags, keywords, related optional) and explicitly call out that
    importance/maturity/recency/updatedat are sidecar runtime signals,
    not authored.
  - Map MD sections to bv-* elements (Reason → bv-reason, Raw Concept
    sub-fields → bv-task/bv-changes/bv-files/bv-flow/bv-timestamp/
    bv-author/bv-pattern, Narrative subsections → bv-structure/
    bv-dependencies/bv-highlights/bv-rule/bv-examples/bv-diagram,
    Facts → bv-fact, runbooks → bv-decision/bv-bug/bv-fix).
  - Document inline-content vs block-content elements and the standard
    HTML allowed inside each.
  - Keep facts extraction guidance, content quality requirements,
    temporal preservation, and content preservation rules — translated
    to the bv-* element vocabulary.
  - Drop the operation-type table (UPSERT/ADD/UPDATE/MERGE/DELETE),
    Two-Part Context Model section, domainContext/topicContext/
    subtopicContext (these belonged to the structured-args flow that
    no longer exists).
- Update the query-side temporal-reasoning hint to reference
  bv-timestamp content + sidecar runtime signals instead of
  rawConcept/narrative timestamps.
- Update the related-topics search hint to reference the `related`
  attribute on bv-topic alongside the legacy `relations` frontmatter
  on .md topics.

Net change: 196 insertions, 494 deletions. The rewritten prompt is
~217 lines shorter and aligned with the actual emission contract in
curate.txt.

Verified end-to-end via local-auto-test/curate-html-e2e/ harness:
- 3/3 fixtures (bug-fix, rule-decision, general-context) curate to
  registry-valid HTML topic files (Sonnet 4.5).
- File paths land at <projectDir>/.brv/context-tree/<path>.html as
  expected.
- Per-fixture latency: 19.3s, 30.5s, 27.3s.

Pipeline: 7878 unit tests pass, lint clean, typecheck clean, build
clean.
…e time

The previous markdown writer rendered `createdAt` and `updatedAt` in
the YAML frontmatter of every topic file. The HTML writer was dropping
both — the LLM doesn't author them and the writer wasn't injecting
them either, so HTML topics had no temporal metadata on disk.

Inject both as `<bv-topic>` attributes during the writer's atomic
write, after fence-stripping and registry validation:

- `updatedat` is always set to the current ISO-8601 instant.
- `createdat` is preserved from the existing file on disk if one
  exists at the resolved path; otherwise it is set to the current ISO
  instant. Re-writes leave `createdat` alone, advance `updatedat`.
- Any value the LLM authored for these attributes is overridden — the
  agent is not allowed to choose its own timestamps.

Implementation: surgical regex edit on the bv-topic opening tag (no
parse → re-serialize round-trip) so the LLM's formatting (whitespace,
attribute order, quoting style) survives intact. Two helpers added to
html-writer.ts:
- setBvTopicAttributes(html, attrs) — replace-or-insert each attr.
- readExistingTopicAttribute(filePath, name) — surgical read of an
  attribute value from an existing on-disk topic without parsing the
  whole document.

Tests:
- Existing assertions about file content equaling VALID_TOPIC updated
  to match-with-injected-timestamps (body preserved verbatim, bv-topic
  opening tag now has createdat/updatedat).
- Three new cases under "system-managed timestamps":
  1. First write injects both timestamps within the test window.
  2. Re-writes preserve createdat, advance updatedat.
  3. LLM-authored values for either attribute are overridden by the
     system value.

Verified end-to-end via local-auto-test/curate-html-e2e/ — all 3
fixtures produce HTML topic files with system-injected createdat and
updatedat ISO-8601 timestamps on the bv-topic root, alongside the
LLM's frontmatter (path/title/summary/tags/keywords/related).

Pipeline: 18 html-writer unit tests pass (3 new), full suite green,
typecheck clean, lint clean.
CI test failure (curate-executor-html-mode.test.ts:79) — the file-content
assertion still expected the LLM's raw HTML; now matches the
timestamp-injected file by checking the body is preserved verbatim and
that createdat/updatedat are present. Same fix in the fence-stripping
test variant.

Review feedback addressed:

1. Telemetry format hardcoded to 'markdown' (curate-executor.ts:485).
   The flag is gone; HTML is the only curate format. Updated to emit
   'html' and reworded the doc comment. Updated the regression test in
   curate-executor.test.ts:491 to assert 'html' accordingly.

2. validateHtmlTopic does not catch path-traversal (html-writer.ts).
   The doc comment promised "safe to use the topic" but a payload like
   `<bv-topic path="../../../etc/passwd">` returned `{ok: true}` —
   path-traversal was only caught later when topicPathToFilePath threw.
   That works end-to-end (executor catches and surfaces as failed
   status), but is a foot-gun for any future caller using
   validateHtmlTopic standalone (preview, dry-run).

   Move the path-segment safety check into validateHtmlTopic and
   surface as a new structured error kind 'unsafe-path'.
   topicPathToFilePath keeps path.resolve as defence-in-depth. Added
   two unit tests covering `..` segments and single-`.` segments;
   updated the existing executor-side test to expect a validation
   failure (with `unsafe-path` errors) instead of a thrown exception.

3. Attribute-validation errors swallowed the `field` discriminant
   (curate-executor.ts:344). The writer captures which attribute
   failed (`severity`, `id`, etc.) but the executor's
   verification.missing string only included the tag. So a malformed
   `severity="urgent"` was logged as `attribute-validation
   (bv-rule): …` rather than `attribute-validation (bv-rule.severity):
   …`. Updated the mapping to include the field for
   attribute-validation errors. Non-blocking but improves bench-log
   debuggability.

Reviewed but obsolete (not actioned):

- "format-detector.ts ships an unused getFormatForWrite" — removed in
  the flag-drop commit f701cd2.
- "agent-process.ts double-coerces useHtmlContextTree" — also obsolete
  (flag dropped).

Pipeline: 7883 unit tests pass, lint clean, typecheck clean, build
clean. E2e smoke (curate-html-e2e/) green on the rebuilt CLI.
[ENG-2739] curate path — HTML writer + format detector + executor wiring
…ch routing

Wires the search/query path to read HTML topic files transparently
alongside markdown ones. Curate emits HTML; the legacy MD path stays
in for `brv swarm` and existing trees that haven't migrated. The BM25
index is format-agnostic (HTML inner-text decodes to the same plain
text the markdown tokenizer expects); the indexer dispatches on file
extension at read time.

New modules:

- src/server/infra/render/reader/html-reader.ts — readHtmlTopicSync(html)
  returns {bodyText, elements, topicAttributes}. bodyText is the
  entity-decoded inner text, ready to feed BM25. elements is a flat,
  document-order list of every typed `<bv-*>` node with its attribute
  map. topicAttributes lifts the bv-topic frontmatter (path, title,
  summary, tags, keywords, related). Forgiving on malformed input
  (parse5 returns a best-effort tree).

- src/server/infra/render/reader/element-axis-index.ts — in-memory
  ElementAxisIndex. Two query keys:
    findByTag(tag) → paths
    findByAttribute(tag, attr, value) → paths
  Reverse path→keys map keeps invalidation O(keys-touched) instead of
  full-scan. Built in lockstep with the BM25 index; same lifetime,
  same invalidation triggers. Today's consumer is the optional
  `elementHint` filter on SearchOptions; the structural-selector
  grammar will hook in here without further reshaping.

Search service changes (search-knowledge-service.ts):

- File discovery: glob extended from `**/*.md` to `**/*.{html,md}`.
- Indexing: new readIndexableContent() helper dispatches on
  getFormatForRead(filePath). HTML files route through readHtmlTopicSync
  (the BM25 input is bodyText, not raw markup) and contribute their
  element list to the structural-axis index. MD files index their raw
  content as before.
- IndexedDocument gains a `format: 'html' | 'markdown'` field, set
  from the file extension at index time and surfaced on every
  SearchKnowledgeResult entry. Telemetry/query-log consumers
  (T5-shipped QueryLogEntry.format) populate from this without
  format-detection at the call site.
- CachedIndex gains `elementAxisIndex: ElementAxisIndex`, populated
  during buildFreshIndex from the per-document elements walk. The
  empty-tree fallback also constructs an empty index so callers can
  treat the field as non-nullable.
- SearchOptions gains `elementHint?: ElementHint`. When present, the
  search composes a MiniSearch filter that intersects scope +
  element-axis matches before BM25 ranking. No caller passes the hint
  today; M2 selector grammar is the consumer.
- INDEX_SCHEMA_VERSION bumped to 6 so previously-cached indexes built
  against MD-only schema rebuild on next query.
- Title resolution: HTML files use the bv-topic title attribute; MD
  files retain the existing extractTitle frontmatter walk.
- The structural-selector hint is also exposed on the public
  SearchKnowledgeOptions as `SearchKnowledgeElementHint` — wired
  through ToolsSDK so SDK consumers can pass it without going through
  the internal SearchOptions surface.

Tests:

- html-reader (9 cases): bodyText extraction, entity decoding,
  attribute lift, elements-list document order + de-duplication,
  unknown-bv-* skipping, missing bv-topic graceful behaviour,
  malformed input non-throw.
- element-axis-index (10 cases): findByTag/findByAttribute, empty-array
  for unknown tags, case-sensitivity on values, single-file multiple-
  contribution dedup, remove() drops all keys + cleans empty key sets,
  remove() on unknown path no-op, clear().
- search-knowledge-service-html-routing (8 cases): HTML files indexed,
  format field populated for both HTML and MD, raw markup is NOT
  searchable (inner-text only), bv-topic title attribute lifts as
  doc title, elementHint pre-filter (no-match returns empty,
  tag-only restricts, tag.attr=value restricts further).

Pipeline: 7910 unit tests pass (27 new), typecheck clean, lint clean
(0 errors), build clean.

Out of scope (per the M1 plan):
- Structural selector grammar — M2 (this PR ships the substrate).
- findReferences reverse-graph — M2.
- Signal-precise cache invalidation — mtime-based today is sufficient.
- BM25 ranking parity test (MD vs HTML same content) — M2 bench
  measures end-to-end; the routing tests above cover the
  determinism boundary.
- HTML BM25 input now concatenates `<bv-topic>` summary/tags/keywords/related
  into the indexable content. Without this, a term living only in those
  attributes ranked HTML topics far below the markdown equivalent (where
  YAML frontmatter is fed verbatim into BM25).
- HTML title fallback uses `?.trim() || fallbackTitle` so a `title=""` or
  whitespace-only attribute doesn't leak into BM25's 3x-boosted title field.
- `ElementHint` (service + tools-sdk) becomes a discriminated union so
  `attribute` and `value` must be supplied together at compile time.
- `ElementAxisIndex` storage swaps the stringly-keyed
  `${tag}.${attr}=${value}` table for nested Maps — eliminates the
  `.`/`=`-in-attribute collision class without a delimiter discipline.
- `html-reader` lifts attributes off the FIRST `<bv-topic>` encountered
  unconditionally rather than the first non-empty one (matches the
  contract on malformed input).
- Glob discovery does two parallel passes (`**/*.html`, `**/*.md`)
  instead of relying on the underlying engine to expand `{html,md}`.

Tests: +7 (BM25 attribute-payload parity, title-fallback, FS-backed
reader round-trip, topic-attribute lift contract, attribute-name/value
collision robustness x3). All 7917 pass; lint + typecheck clean.
[ENG-2740] query path — HTML reader + element-axis index + search routing
…sponse

Tier 2 (`brv query`'s direct-search response — no LLM) reads the full
file at `query-executor.ts:701` and ships the bytes back to the user
through `formatDirectResponse`. For HTML topics that's raw
`<bv-topic ...><bv-rule severity="must">...</bv-rule>...` markup.

Two consequences observed during the M1 multi-tier audit:

  1. The 5000-char content budget in `direct-search-responder.ts:11`
     burns on tags and attribute syntax before reaching meaningful
     text. Long topics get truncated mid-element.
  2. Any LLM-shaped consumer downstream (Tier 4 agent re-reads, the
     bench harness, third-party integrations) re-parses the HTML.
     Wasted tokens; lossy semantics.

Stripping every tag (bodyText only) loses the typed-element signals
the format exists to carry — severity on rules, subject/value on
facts, id on decisions. New `renderHtmlTopicForLlm(html)` walks the
parsed tree and renders each `<bv-*>` element with a semantic prefix:

  - `<bv-rule severity="must" id="r-1">x</bv-rule>` →
    `- **Rule** [must] (r-1): x`
  - `<bv-fact subject="alg" value="RS256">…</bv-fact>` →
    `- **Fact** (subject=alg, value=RS256): …`

Wired into `tryDirectSearchResponse`: when `result.format === 'html'`
the full content goes through the renderer; markdown topics pass
through verbatim. Renderer is forgiving on malformed input (mirrors
the rest of the parse5-driven reader pipeline); on any throw we fall
back to raw bytes so a single bad topic can't blank the response.

Tests: +15 (12 renderer, 2 Tier 2 HTML branch, 1 Tier 2 markdown
regression guard). 7931 pass; lint + typecheck + build clean.
fix: Tier 2 direct response renders HTML topics into structured markdown
Post-HTML-migration, every QueryLogEntry on disk reports
`format: "markdown"` regardless of whether the recall touched HTML or
markdown topics. Telemetry consumers (query-log summaries, analytics
dashboards, the brv-bench harness) cannot distinguish HTML vs markdown
traffic — every chart flat-lines as "all markdown" even on a fully
migrated tree.

Root cause: T5 (ENG-2741) introduced `MarkdownOnlyFormatDetector` as a
deliberate stub with the explicit comment "swap when the format-detector
task lands the real extension-aware detector." T3 (ENG-2739) shipped
`format-detector.ts` for read-routing inside `search-knowledge-service`
but did not swap the executor's binding. The stub stayed wired as the
default in `QueryExecutor`.

This change:

- Adds `ExtensionAwareFormatDetector` — inspects each
  `matchedDoc.path` extension and returns `'html'` if any HTML doc was
  retrieved, `'markdown'` for legacy-only recalls, `undefined` for
  empty recalls (cache hit, OOD short-circuit, tier 4 LLM-only).
- Single-`'html'` policy is deliberate: any HTML doc is the
  load-bearing signal for telemetry rollups in a partially-migrated
  tree. Reporting `'markdown'` for a mixed result would hide HTML
  traffic from cost / coverage dashboards.
- Strips shared-source `[alias]:` prefixes before extension parsing so
  shared-source paths classify the same as local.
- Defaults to `'markdown'` for extensionless paths (defensive).
- Wires as the default `formatDetector` binding in `QueryExecutor`.
- Keeps `MarkdownOnlyFormatDetector` for tests that pin pre-migration
  semantics; updates its docstring + comment to reflect the new role.

Tests: +9 (extension-aware detector full coverage). 7940 pass; lint
runs through the pre-existing submodule config issue, otherwise clean.
ByteRover-provider runs produce QueryLogEntry records with no
`inputTokens` / `outputTokens` / `cachedInputTokens` /
`cacheCreationTokens` even though Tier 3 calls take seconds of llmMs.
Telemetry consumers cannot compute per-query cost.

Root cause: T5's telemetry pipeline depends on
`GenerateContentChunk.rawResponse` being set on the terminating
chunk so `LoggingContentGenerator.generateContentStream()` can
capture the last non-undefined `rawResponse` and feed it to
`pickRawUsage()` (`.usage ?? .usageMetadata`). The AI-SDK content
generator wires this on its `'finish-step'` case but
`ByteRoverLlmHttpService.extractContentFromResponse()` (the simulated
stream backing every `ByteRoverContentGenerator` call) yielded chunks
without ever attaching the backend response. Result: `lastRawResponse`
stays undefined, no `llmservice:usage` event fires, no tokens land on
the log entry.

Fix: attach the full `GenerateContentResponse` as `rawResponse` on the
terminating chunk in both branches of `extractContentFromResponse` —
empty-content and full-content paths. The `pickRawUsage` helper
already handles the Gemini-shaped (`usageMetadata`) and Claude-shaped
(`usage`) payloads the ByteRover backend returns through this code
path, and the existing `extractUsage('google'|'anthropic')` mappers
take it from there.

Tests: +3 (terminating-chunk rawResponse coverage across content,
empty-parts, and function-call termination cases). 7943 pass.
The companion prompt loaded into every curate-mode invocation
(`curate-detail-preservation.yml`, auto-discovered by
`companion-contributor.ts:discoverCompanionPrompts('curate')` against
the `curate-*.yml` glob) still instructed the agent to "store in
narrative.rules / narrative.highlights / narrative.diagrams /
narrative.structure / narrative.dependencies / narrative.examples /
rawConcept.flow / content.facts / snippets" — every one of which is a
JSON-schema field on the pre-T3 curate-tool API that no longer exists.
T3 (ENG-2739) dropped `useHtmlContextTree` and made HTML the sole
emission format, but T2 (ENG-2738, the HTML-emission contract) only
updated `curate.txt` and missed this companion file.

Empirically observable on the proj/html-mem-conversion branch: a
LoCoMo-bench curate against `conv_26/session_1` produced a `<bv-topic>`
with one giant `<bv-rule>` element wrapping a markdown narrative —
the agent saw "store in narrative.rules" and treated `<bv-rule>` as
the closest match. Other sessions in the same run emitted clean
typed-element shapes; the variance correlates with whether the
legacy companion prompt dominated that turn's context.

This change rewrites the file so every preservation rule maps to the
correct `<bv-*>` element from the closed vocabulary
(`bv-rule`, `bv-task`, `bv-fact`, `bv-decision`, `bv-bug`, `bv-fix`,
`bv-flow`, `bv-diagram`, `bv-highlights`, `bv-structure`,
`bv-dependencies`, `bv-examples`, `bv-pattern`). General preservation
principles (completeness over conciseness, exact preservation, never
summarize) are kept verbatim. Adds an explicit guard at the end of
the prompt telling the agent NOT to use `narrative.*` / `rawConcept.*` /
`content.*` JSON-schema field names so any residual model bias toward
the pre-migration API surfaces is suppressed.

Tests: no test fixture asserts on this file's contents (it's a runtime
agent prompt, exercised via end-to-end curate runs). YAML still parses;
full suite green at 7943.
Two items from the review pass on this branch:

  1. `extractContentFromResponse` had three terminating-chunk paths;
     only two received `rawResponse` in the original fix. The third
     branch — `candidates` array missing or empty (line 193) — fires
     when Gemini blocks output via safety filter or Claude returns a
     refusal-shaped response. Both providers populate `usageMetadata`
     / `usage` on those responses, so dropping `rawResponse` here
     silently loses billable tokens on every safety-block /
     refusal turn. Trivial completion: forward `response` from this
     branch too, matching the other two. New test pins the contract.

  2. JSDoc rewrite on `stripSharedAlias` to justify why the helper is
     retained. Today `path.extname` happens to handle
     `[alias]:rel/path.html` correctly (the colon lives in the dirname,
     not the basename), so the strip is defensive rather than
     load-bearing. The helper stays for two reasons noted in the
     docstring: keeping the detector independent of
     `getFormatForRead`'s internals (so a future switch to URL-style
     parsing doesn't silently break the alias case), and making the
     alias-stripping branch testable in isolation. Prevents a future
     reader from deleting it as apparent dead code.

Tests: +1 (candidates-empty terminating-chunk rawResponse contract).
7944 pass.
fix: extension-aware format detector + byterover tokens + curate prompt
…aceholder

First milestone of byterover-tool-mode: ByteRover stops being an agent
for curate/query and becomes a tool the user's primary agent calls. The
user's LLM (Claude Code, Cursor, etc.) supplies completions; byterover
validates and persists. No provider config inside byterover.

This commit ships the wire-protocol surface for the curate side. The
orchestrator behind it is a placeholder — TKT 02 lands the real state
machine; TKT 03 fills in real prompts; TKT 04 authors the SKILL.md
that drives the loop from Claude Code.

Wire contract (documented at docs/curate-protocol.md):

  Kickoff (env-gated):
    BRV_CURATE_TOOL_MODE=1 brv curate "<text>" --format json
    → {status: needs-llm-step, sessionId, step: generate-html, prompt}

  Continuation (--session implies tool mode):
    brv curate --session <id> --response "<text>" --format json
    → {status: done, filePath} | {status: failed, errors[]}

Without the env var, brv curate runs today's legacy agent-driven path
unchanged — fully additive. The env var is the temporary opt-in; TKT 02
replaces it with a BrvConfig field as part of the real orchestrator.

Placeholder semantics:
  - Kickoff always returns one `needs-llm-step` with a stub prompt
  - Continuation always returns `done` on first valid response
  - State on disk at `.brv/sessions/curate-<id>/state.json` (TKT 02
    moves into daemon task-session sandbox vars when search-service
    integration lands)
  - Empty response → `failed` with kind=empty-response (session kept
    live for retry); unknown sessionId → `failed` with
    kind=unknown-session (terminal)

Tests: 12 placeholder unit tests covering kickoff, continuation
success/failure paths, on-disk state lifecycle, and envelope-contract
guarantees per the protocol doc. Manual round-trip smoke verified the
CLI happy path end-to-end.
cuongdo-byterover and others added 2 commits May 21, 2026 11:10
…-, doc

Four items from the code review:

- Helper API redesign: `assertNoRemovedFlags` (which threw a bare
  `Error`) is now `findRemovedFlagMessage` returning `string | undefined`.
  Caller drives the response — JSON envelope when `--format json` is
  set, `this.error(...)` otherwise. Yields the canonical oclif `›
  Error:` rendering instead of a raw stack-leaking throw.
- JSON-mode envelope: both `brv curate --format json` and
  `brv query --format json` now emit a structured rejection envelope
  on a removed flag instead of dumping a plain-stderr line. Detected
  with a new `argvRequestsJsonFormat(argv)` helper that mirrors the
  recognised forms (`--format json`, `--format=json`).
- `--` terminator: the scan now stops at `--`, matching POSIX
  convention. Same applies to the JSON-format detector. Lets users
  on permissive-parse `Query` pass a literal query that contains a
  hyphen-prefixed token without hitting a false-positive rejection
  (e.g. `brv query -- what does --timeout do`).
- Stale SKILL.md `-f` / `--files` references: line 639 (the "File
  access" paragraph that described the removed --files flow) and
  lines 658–660 (error-table rows for `Maximum 5 files allowed` /
  `File type not supported`) replaced with a single row pointing
  agents at the new migration-error message shape. The other three
  doc files were already clean.

Tests
- 15 unit cases for the helper (was 8; +7 for `--` terminator,
  --flag=value, and argvRequestsJsonFormat).
- Full suite: 8521 passing, 16 pending, 0 failing.
- Typecheck clean. Lint 0 errors / 262 pre-existing warnings.
refactor: [ENG-2880] drop legacy flags from brv curate / brv query
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Claude finished @danhdoan's task in 3m 52s —— View job


Re-review of PR #663 after the ENG-2880 batch (PR #687)

Two PR-specific commits since 4e84552 (c3f17aac4c4a52, merged via #687), plus a main-sync merge (0b98569) that pulls in unrelated v3.14.x work. The ENG-2880 pair is exactly the carry-over UX nit that has been flagged across seven prior reviews on this PR — brv curate declared --detach/-f/-d/--timeout but never consumed them, plus the --timeout description disagreement between brv query and brv dream. Both are now resolved cleanly with a shared removed-flags.ts helper and a clear migration message instead of oclif's default Nonexistent flag rendering.

Score: 4.85 / 5 — net improvement from 4.75/5. The longest-standing live finding on this PR is now closed; only cosmetic / defer-defensible items remain. Ready to merge.

What changed since 4e84552

Commit Scope
c3f17aa ENG-2880 — drops legacy -d/-f/--detach/--timeout from brv curate and --timeout from brv query. Adds shared src/oclif/lib/removed-flags.ts (findRemovedFlagMessage + argvRequestsJsonFormat + CURATE_REMOVED_FLAGS + QUERY_REMOVED_FLAGS). Updates SKILL.md, brv-instructions.md, workflow.md, command-reference.md to match.
c4c4a52 ENG-2880 review fix — JSON envelope on rejection (was a bare throw), this.error(...) instead of stack-leaking throw new Error(), -- terminator honored, scrubs SKILL.md lines 639 + 658-660 of stale -f / --files references. Bumps the helper's test count from 8 → 15.

Strengths

  • Helper-shape is well-chosen. findRemovedFlagMessage returning string | undefined lets the caller drive the response — curate uses emitToolModeEnvelope (structured {errors: [{kind: 'removed-flag', message}], ok: false, status: 'failed'}), query uses writeJsonResponse with its existing {error, status} shape — and either falls back to this.error(message, {exit: 1}) for the human path. Much better than a bare throw new Error(...) which would have stack-leaked through oclif's parser into --format json callers' stdout.
  • -- terminator + --flag=value form. Line 40 stops scanning at the standard POSIX terminator, which fixes the brv query -- what does --timeout do false-positive on permissive-parse (relevant because Query is strict = false). Line 42 matches the = form via startsWith('${f}='), and the = separator naturally prevents --folder-extra prefix collisions.
  • Symmetric scan-then-parse. Both commands run the scan before await this.parse(...), so the migration message surfaces under strict and permissive parse modes alike. Curate (strict) wouldn't have parsed the flag anyway; query (permissive) would have silently swallowed it as positional — both now reject coherently.
  • Doc cleanup is complete. SKILL.md lines 639 + 658-660 (the "File access" paragraph and three error-table rows for Maximum 5 files allowed / File does not exist / File type not supported) are gone, replaced with a single migration-error row. brv-instructions.md / command-reference.md / workflow.md likewise scrubbed. The agent-side description no longer instructs agents to use removed flags.
  • Test design. 15 unit cases cover undefined on no-match, long-form match, short-alias match, --flag=value form, first-match short-circuit, ---terminator (both "stops scanning" and "still matches BEFORE the terminator"), and the new argvRequestsJsonFormat discriminator. The "matches BEFORE --" assertion is the load-bearing case — without it, a user typing brv curate --detach -- "ok" could slip a removed flag through.

Issues

1. nit (new, stale docstring)useHtmlContextTree reference in CurateLogEntry.format doc

src/server/core/domain/entities/curate-log-entry.ts:57-61 still says:

Format mode of the curate output. `'html'` when `useHtmlContextTree` is
on, else `'markdown'`. Settled at task start, not derived from output.

But the PR description explicitly lists useHtmlContextTree as a removed flag — HTML is now the only curate output format. The 'markdown' literal is only retained for back-compat with log entries persisted before the format migration. Also a stray *. on line 60. One-block JSDoc rewrite (suggestion posted inline). Non-blocking.

2. observation (carry-over, non-blocking) — error messages still reference removed brv providers connect/switch commands

Confirmed still standing across:

  • src/oclif/lib/daemon-client.ts:36, 38, 41, 181-182, 274-277
  • src/oclif/commands/dream.ts:144
  • src/tui/utils/error-messages.ts:17, 18, 21
  • src/server/infra/daemon/task-validation.ts:26, 36

brv dream is still the live reachable path. Deferred across multiple reviews; the team's pattern has been post-merge polish. Now that ENG-2880 has knocked down the longest carry-over, this is the next-oldest live item. Fix this →

3. observation (carry-over, non-blocking)<existing-topic> block still inlines system-managed timestamps verbatim

src/server/core/domain/render/curate-prompt-builder.ts:119-123 unchanged. OUTPUT_CONTRACT (line 168) tells the model not to emit createdat/updatedat, but the inlined existing-topic visibly shows them. With tool-mode-dream MERGE candidates now exercising the UPDATE → correction-loop path under meta.previousSummary, worth a one-regex strip pass before the prompt builder shape settles permanently.

4. observation (carry-over, non-blocking)topic-loader.ts regex parser inconsistent with index-generator.ts parse5

src/server/infra/dream/tool-mode/topic-loader.ts:43, 110-155 still uses BV_TOPIC_RE + parseAttributes, while src/server/infra/context-tree/index-generator.ts:208-222 uses parse5-backed parseHtml / walkElements for the same task. Entity-decoded title attributes diverge (&amp; in the regex path stays literal). Latent — topics are writer-produced today — but worth aligning.

5. observation (carry-over, non-blocking) — CLI ConsoleLogger emits chalk on stderr under --format json

src/oclif/lib/curate-session.ts:380 (sidecar-bump) and :423 (index regen) still construct new ConsoleLogger() unconditionally. dream.ts:33 handles this with format === 'json' ? new NoOpLogger() : new ConsoleLogger(). Cheap to plumb format through ContinueOptions.

Praise

  • The two-commit shape. c3f17aa does the surface removal + helper; c4c4a52 does the four polish items from the synchronous code review (JSON envelope, this.error, -- terminator, doc residue). The split keeps each diff focused and reviewable — no "address review feedback" mega-commit. Even the migration sentences read well at human scale: "Pack the folder content into the <bv-topic> HTML directly before calling brv curate" tells the agent what to do, not just what's removed.
  • Tests track the helper's invariants, not its implementation. Cases assert on -- short-circuit behavior, --flag=value matching, and first-match priority — the user-observable contract — rather than internal control flow. Future contributors can refactor the loop without breaking the test suite if the semantics hold.
  • Catch on the -- terminator. The original commit (c3f17aa) didn't include the -- terminator handling; it was added in the review-fix commit (c4c4a52) to handle brv query -- what does --timeout do. Permissive-parse on Query makes this a real ergonomic concern, and the fix is one line in the helper plus one line in the format detector.
  • Doc/code parity. When the agent-side description still mentioned -f / --files, an LLM would have happily emitted those flags into the next brv curate call and gotten the new migration error — a confusing loop. Scrubbing both the command and the SKILL.md it ships closes that gap end-to-end.

Test plan suggestions

  • Optional: integration test that brv curate "ctx" --detach --format json emits the structured rejection envelope on stdout and exits non-zero. Would also lock that success: false survives the writeJsonResponse wrapper.
  • Optional: brv query -- --timeout 600 (positional content containing a removed flag token) succeeds — the -- terminator path needs an end-to-end smoke test, not just the unit-level helper test.

Final disposition

LGTM. ENG-2880 is high-quality follow-up — clean helper shape, complete doc parity, 15-case test coverage with the load-bearing -- and --flag=value cases explicit. Carry-over items are all known and have consistently been polished post-merge.

Todo

  • Identify what changed since 4e84552
  • Spot-check the ENG-2880 surface removal + helper + tests
  • Verify outstanding carry-over items from prior reviews
  • Post inline comments + final summary with score
    · Branch: proj/byterover-tool-mode

Comment on lines +38 to +48
export function findRemovedFlagMessage(argv: string[], removed: RemovedFlag[]): string | undefined {
for (const token of argv) {
if (token === '--') return undefined
for (const {flags, migration} of removed) {
const matched = flags.find((f) => token === f || token.startsWith(`${f}=`))
if (matched) return formatRejection(matched, migration)
}
}

return undefined
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Closes the longest-standing finding from this PR (carry-over across 7 prior reviews). The helper-shape is well-chosen:

  • Returns string | undefined so the caller owns the response shape (JSON envelope vs this.error(...)) — much better than the original throw new Error() shape, which would have stack-leaked through oclif and broken --format json parsers.
  • -- terminator is honored (line 40), which fixes the brv query -- what does --timeout do false-positive case for permissive-parse.
  • --flag=value form is matched correctly via startsWith('${f}='), and the = separator naturally prevents --folder-extra prefix collisions.
  • The error message names the literal token the user typed AND surfaces a migration sentence, not a generic "removed" line.

The 15-case test file (including the ---terminator cases at lines 45-51 and 71-73) locks the contract. Nice end-to-end execution on a finicky bit of CLI ergonomics.

Comment on lines +57 to +62
/**
* Format mode of the curate output. `'html'` when `useHtmlContextTree` is
* on, else `'markdown'`. Settled at task start, not derived from output.
*.
*/
format?: 'html' | 'markdown'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (stale doc): The docstring says 'html' when useHtmlContextTree is on, else 'markdown', but the PR description explicitly removes the useHtmlContextTree flag — HTML is now the only curate output format. Two consequences:

  1. The condition is no longer expressible: there's no flag to check. New writes are always 'html'.
  2. The 'markdown' branch is only set for the back-compat read path (markdown topics still parse via MarkdownWriter), which is the reader concern — but this field tags curate outputs.

If format can still legitimately be 'markdown' (e.g. legacy operations replayed from older log entries), say so explicitly. Otherwise, simplify to 'html' literal and drop the optional. Either way, the useHtmlContextTree reference here will mislead the next contributor doing log-entry archaeology.

Suggested change
/**
* Format mode of the curate output. `'html'` when `useHtmlContextTree` is
* on, else `'markdown'`. Settled at task start, not derived from output.
*.
*/
format?: 'html' | 'markdown'
/**
* Format of the curate output. Tool-mode curate always writes HTML
* (`<bv-topic>`); the `'markdown'` literal is retained for back-compat
* with log entries persisted before the format migration. Settled at
* task start, not derived from output.
*/
format?: 'html' | 'markdown'

Also: stray *. on line 60 inside the docblock — minor JSDoc lint.

cuongdo-byterover and others added 7 commits May 21, 2026 15:48
Tool-mode subcommands (`brv dream scan / finalize / undo / sessions /
cancel`, shipped in ENG-2858) cover the same workflow deterministically
— daemon enumerates candidates, agent decides via `brv curate`, daemon
archives via `finalize`. No provider required. With tool-mode as the
project default the legacy LLM-driven entry surface is orphaned, so
remove it cleanly.

`brv dream` (no subcommand) is now a thin topic root that prints a
migration hint. `--force`, `--timeout`, `--undo` on the topic root are
gone; users hit `brv dream undo` instead. The `buildUndoDeps` export
stays here since the `undo` subcommand imports it.

Removed
- `src/oclif/commands/dream.ts` LLM-driven `run()` body (was a 200-line
  block dispatching dream tasks through the daemon). Topic root keeps
  the `buildUndoDeps` helper for the `undo` subcommand.
- `case 'dream'` in `agent-process.ts` — daemon-side LLM dispatch.
- `AgentIdleTimeoutPolicy.onAgentIdle` dream-eligibility branch and
  `preDispatchCheck` for `task.type === 'dream'` in `brv-server.ts`.
  The auto-dispatch had no provider-config guard, so on tool-mode
  projects it would silently try to dispatch and fail — removal closes
  that latent issue. Idle agents are now just cleaned up.
- `src/server/infra/executor/dream-executor.ts` + its test.
- `src/server/infra/dream/dream-trigger.ts` + its test.
- `src/server/infra/dream/operations/{consolidate,synthesize,prune}.ts`
  + their tests. Tool-mode subcommands use `src/server/infra/dream/
  tool-mode/*` instead.
- `src/server/infra/dream/parse-dream-response.ts` +
  `dream-response-schemas.ts` + their tests — no consumer after the
  operations files are gone.
- `src/oclif/lib/timeout-deprecation.ts` + its test — only consumer
  was the legacy dream command.

Trimmed
- `test/commands/dream.test.ts` rewritten as a 1-case smoke that
  asserts the new topic-root behaviour (prints the subcommand listing).
- `test/unit/infra/runtime-signals/sidecar-failure-logging.test.ts`
  dropped the `dream operations` describe block (consolidate / prune
  sidecar warnings). Other describe blocks (curate-tool, archive
  service, manifest, search, commit-6 wiring) keep their coverage.

Kept
- `src/server/infra/dream/dream-undo.ts`, `dream-state-service.ts`,
  `dream-lock-service.ts`, `dream-log-store.ts` — LLM-agnostic services
  reused by the tool-mode pipeline.
- `src/server/infra/dream/tool-mode/*.ts` — the subcommand pipeline.
- `src/oclif/commands/dream/{scan,finalize,sessions,cancel,undo}.ts`.

Verified
- npm run typecheck: clean
- npm run lint: 0 errors (256 pre-existing warnings)
- npm test: 8337 passing, 16 pending, 0 failing (was 8521 — net -184
  from the dead test deletions; matches the deleted file inventory).

Migration
- `brv dream` (no subcommand) now prints "Use a subcommand: brv dream
  {scan|finalize|undo|sessions|cancel}" and exits 0.
- `brv dream --force` / `--timeout` / `--undo` removed. Use the
  subcommands.
- Background dream is no longer daemon-scheduled; the agent decides
  when to run `brv dream scan` (e.g. post-curate burst, on session
  start). The v2 model is agent-driven cadence.
…lose schema/post-work straggler gates

Two should-fix items + comment cleanup from the reviewer:

1. DREAM_REMOVED_FLAGS was exported and unit-tested but never consumed.
   `brv dream --timeout 30` printed the topic-root hint instead of the
   migration message that names --timeout. Wired
   `findRemovedFlagMessage(this.argv, DREAM_REMOVED_FLAGS)` into the
   topic root with the same JSON-envelope / this.error branching the
   curate and query commands use.

2. Dropped 'dream' from the transport schemas + post-work gate so any
   straggler `type: 'dream'` task that still hits the daemon (older
   client, stale queue, internal stray producer) fails at the schema
   boundary instead of silently completing with result = ''.

   - TaskTypeSchema + TaskExecuteSchema type enums: 'dream' removed.
   - TaskExecuteSchema.force field: removed (dream was the only consumer).
   - TaskExecuteSchema.trigger enum: 'agent-idle' removed (idle-trigger
     dispatch is gone). 'cli' and 'manual' remain.
   - TaskCreateRequestSchema.force field: removed.
   - agent-process.ts post-work gate: dropped `type === 'dream'`.
   - task-router.ts: dropped the orphan `data.force` plumbing.

Cosmetic
- task-router.test.ts: swapped the `type: 'dream'` fixtures (used to
  exercise the generic preDispatchCheck plumbing) to `type: 'curate'`.
  The tests stay — they cover the gate mechanism, not dream-specific
  logic.
- review.ts description: dropped the stale `brv dream needsReview`
  bullet. Tool-mode dream-finalize writes `needsReview: false`
  unconditionally; the toggle has no dream-side effect to describe.
- Cleared stale `dream-executor` comment refs in curate-session.ts,
  dream-state-service.ts, agent-process.ts, and two test files.
- Topic-root exit code: kept exit 0 (matches oclif topic-root default)
  with a clarifying code comment noting the deliberate choice.

Kept
- `'agent-idle'` in dream-log-schema.ts — backwards-read compat for
  existing dream-log entries on disk. No new entries will carry that
  trigger; existing files still parse.

Verified
- npm run typecheck clean
- npm run lint 0 errors (256 pre-existing warnings)
- npm test: 8337 passing, 16 pending, 0 failing
Three FE-breaking output classes addressed by prompt updates across the
7 LLM-visible curate surfaces (SKILL.md, curate-prompt-builder, MCP tool
description, registry, curate.txt, system-prompt.yml,
curate-detail-preservation.yml):

- <li> content must be plain text — no leading -, *, •, or 1./2. markers
  (the renderer adds them via CSS, so a literal prefix double-bullets)
- <bv-diagram> body goes directly inside the element with HTML entities
  for <, >, & — no <![CDATA[…]]> wrap (HTML5 parses CDATA as a bogus
  comment and the first --> in mermaid closes it, corrupting the source)
- `related` distinguishes files from folders by suffix: file targets end
  in .html, folder/domain targets stay bare. Replaces the prior
  "no .html extension" guidance in SKILL.md.

Also adds a read-only related-ref warner that resolves each ref against
.brv/context-tree/ and surfaces warnings for broken (neither file nor
folder exists) and ambiguous (both exist) refs. Surfaced via a new
optional `warnings` field on the curate done envelope (both CLI tool-mode
and daemon curate-html-direct paths). The warner never mutates the
attribute and never rejects writes — refs stay advisory so legitimate
forward references (target curated later) still ship.

Fourth issue from the ticket (<bv-index> description attribute) is
deferred to a follow-up because index.html is server-generated and
needs an end-to-end config + capture UX, not a prompt or content
validator change.
…lope, race-guard stat calls

Two real issues from review:

- MCP curate path was dropping warnings. CurateHtmlDirectResult gained
  an optional `warnings?: readonly string[]` field and renderEnvelope
  now appends ⚠ lines under the ✓ confirmation when present, matching
  the CLI text-mode + JSON envelope behavior.
- The warner used existsSync + statSync, which is race-prone — a target
  deleted between the two calls would throw and bubble out of
  writeHtmlTopic, turning a successful curate into a reported failure
  even though the topic is already on disk. Switched to a single
  swallow-on-error statSync helper (safeStatIsFile / safeStatIsDir).
  Both return false on any error, so a transient FS error surfaces the
  ref as "broken" rather than failing the post-write check. Added a
  test that chmods a parent dir to 0 to exercise the EACCES path.

The reviewer's "lower priority" suggestion to also plumb warnings into
the legacy curate-executor.ts in-daemon path is deferred — that surface
is being phased out and the fix would expand CurationStatus shape.

Other reviewer notes filed as follow-ups (not addressed here):
- lstatSync vs statSync for symlink handling (today the comment overstates
  the symlink guarantee; the resolved-path check only validates the input
  string, not the link target)
- case-insensitive FS edge case on macOS/Windows
- skip ambiguity warning when ref carries an explicit .html suffix
- .md legacy-extension strip for the dwindling MD topics
The previously-warned "both <ref>.html and <ref>/ exist" case is not
actually a failure mode — the suffix the agent wrote IS the choice
(bare → folder, .html → file per the convention) and the FE routes by
suffix. The other on-disk shape is irrelevant. Removing the case and
all references to it:

- checkRef now reduces to a single broken check: warn only when neither
  <ref>.html nor <ref>/ exists.
- HtmlWriteSuccess + CurateSessionEnvelope JSDocs dropped the "broken /
  ambiguous" mention.
- Removed the ambiguous-ref test block from related-ref-warner.test.ts
  (and the both-exist no-warning block that I had repurposed mid-cycle).
  The remaining tests cover the actual behavior cleanly.
- Updated the MCP renderer multi-warning test to use two broken-ref
  fixtures instead of broken + ambiguous.

Also addressed two PR #690 review nits in the same pass:
- Stale comment in related-ref-warner.ts:82-87 still referenced the old
  existsSync + statSync TOCTOU shape; rewrote to describe what the
  current safeStat helpers actually do.
- Added MCP rendering tests for the new ⚠ branch (warnings present,
  warnings empty, warnings field absent — defends against pre-commit
  daemon payloads). Mirrors the CLI-side coverage added previously.

Two review nits deferred and resolved as follow-ups:
- chmodSync(…, 0) test fragility under root uid (Docker node:* images).
- WebUI Tasks panel does not yet decode/render the new warnings field.
…nits

The warner previously passed if EITHER `<ref>.html` or `<ref>/` existed.
That left a symmetric false-negative class to the dropped ambiguous
case: a `.html` ref pointing at a folder (or a bare ref pointing at a
file) passed silently even though the FE routes by suffix and would
show a dead pill.

Tightened to probe only the form the agent's suffix picked:
- `.html` suffix → check `<ref>.html` (file form) only
- bare → check `<ref>/` (folder form) only

A shape mismatch on disk is no longer ignored — it now surfaces as a
broken-ref warning, with a coaching hint on the bare-form case ("add
.html if you meant a file") since "agent forgot the .html suffix" is
the typical mistake. Updated the JSDoc, tightened the broken-ref
messages to name the missing shape ("no file at …" vs "no folder at
…"), and replaced the two now-impossible "bare ref matches file" /
"`.html` ref matches folder" no-warning tests with their inverses
asserting the new warnings.

Same commit also addresses two cosmetic review nits:
- `renderEnvelope` docblock in brv-curate-tool.ts:266-275 updated to
  describe the head + ⚠ lines shape (was "a single confirmation line").
- Stale sequencing comment in html-writer.test.ts:392 ("before the
  disk write") corrected to "after the atomic write".
fix: add HTML output contract rules and related-ref warner
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Claude finished @danhdoan's task in 5m 40s —— View job


Re-review of PR #663 after the ENG-2883 batch (PR #690)

Four PR-specific commits since c4c4a52 (8d5ded0f6ec3c2ffe68565887e2a, merged via #690). The ENG-2883 batch tackles three FE-breaking LLM-output classes via prompt updates across seven LLM-visible surfaces, plus a read-only related-ref warner that surfaces broken related= attribute targets without rejecting writes. The "ship → review → tighten" arc across the four commits is unusually mature: PR #690's review caught a symmetric false-negative (probing both shapes silently accepted dead-pill scenarios), and 5887e2a flips the probe to "only the suffix the agent chose" with the corresponding inverse tests.

Score: 4.85 / 5 — held from prior review. Quality of the new code is high; remaining items are all known carry-overs that have been deferred across multiple reviews. Ready to merge.

What changed since c4c4a52

Commit Scope
8d5ded0 ENG-2883 — adds 3 HTML output contract rules (<li> plain-text, <bv-diagram> body-direct + entity-escape, related file/folder suffix convention) across 7 LLM surfaces (SKILL.md, curate-prompt-builder.ts, brv-curate-tool.ts, registry.ts, curate.txt, system-prompt.yml, curate-detail-preservation.yml). Adds related-ref-warner.ts + wiring through HtmlWriteSuccess.warnings.
f6ec3c2 ENG-2883 review fix — surfaces warnings in the MCP envelope (CurateHtmlDirectResult.warnings?), renderEnvelope appends ⚠ <text> lines, swaps existsSync+statSync for race-resilient safeStatIsFile / safeStatIsDir helpers (test exercises EACCES via chmod 0 on the parent dir).
ffe6856 ENG-2883 review fix — drops the "ambiguous (both exist) refs" warning case (the suffix IS the choice; the other shape is irrelevant); rewrites stale TOCTOU comment; adds MCP rendering tests for the three wire shapes (warnings: [...], warnings: [], warnings absent — defends against pre-PR daemon payloads).
5887e2a ENG-2883 review fix — tightens to probe ONLY the chosen shape; replaces the now-impossible "bare matches file" / ".html matches folder" no-warn tests with their inverses (suffix-mismatch IS a warning now). Adds coaching hint on the bare-form case ("add .html if you meant a file") since "agent forgot the .html suffix" is the typical mistake.

Strengths

  • The four-commit arc is a textbook "ship → review → tighten" sequence. Each commit's body explains WHAT changed, WHY (with the specific failure mode it prevents), and what was DEFERRED (with reasoning). 5887e2a in particular catches a symmetric false-negative that the original commit had — probing both forms silently accepted dead-pill scenarios where the agent's suffix and the on-disk shape disagreed. Replacing the now-impossible no-warn tests with their inverses (the new behavior) is the right shape for a "we changed the contract" follow-up.
  • Race-resilience is documented in source. safeStatIsFile / safeStatIsDir (related-ref-warner.ts:103-117) swallow ALL errors as "not present" with an explicit comment at l.82-89 explaining why: the warner runs post-write and must never turn a successful curate into a reported failure. The chmod-0 test (related-ref-warner.test.ts:146-167) exercises the EACCES path concretely, not just the ENOENT happy path.
  • Test coverage spans all three layers. The warner unit tests (16 cases) cover happy paths, broken refs, suffix-mismatch, race-resilience, and path-traversal. The writer test (html-writer.test.ts:391-427) confirms warnings flow through HtmlWriteSuccess. The CLI side (curate-session.test.ts:168-198) and MCP side (brv-curate-tool.test.ts:434-525) both lock the wire shape (present, empty, absent — including the "older daemon payload" defensive case at line 491).
  • 3 output-contract rules across 7 surfaces — and they actually agree. <li> plain-text rule appears verbatim in 7 places. <bv-diagram> no-CDATA + entity-escape rule appears in 7 places with the same --&gt; example. The related file-vs-folder suffix convention is consistent across SKILL.md, the MCP description, the prompt builder, the system prompt, and curate.txt. The diagram preservation note in curate-detail-preservation.yml:47-57 even includes a worked mermaid example with corrected entities. Previously this kind of multi-surface change had drifted between SKILL.md and the prompt builder; this batch lands them in sync.
  • Honest deferral. The PR fix: add HTML output contract rules and related-ref warner #690 commit messages explicitly list what wasn't addressed and why: legacy curate-executor.ts warnings plumbing (the surface is being phased out, would expand CurationStatus), WebUI Tasks panel doesn't decode the new warnings field yet, symlink resolution (today's comment overstates the guarantee), case-insensitive FS edge case, chmod 0 test fragility under root uid (Docker node:* images). This is the right level of transparency — the operator running real-corpus tests in two weeks won't be confused about whether something is a known gap vs. a real bug.

Issues

1. suggestion.htm suffix inconsistency with the file-reader

Posted inline at related-ref-warner.ts:52. wantsFile = stripped.endsWith('.html') is case-sensitive and only matches .html — but file-context-file-reader.ts:95-98 matches both .html AND .htm (the May-15 fix), and format-detector.ts:24 treats both as HTML. So a topic that exists on disk as foo/bar.htm (migrated, manual moves) is readable as a topic but unreachable from a related="@foo/bar.htm" ref — the warner probes for a folder and surfaces it as broken.

2. observation (UX, batch curate) — forward refs surface as warnings on every first-touch

Posted inline at related-ref-warner.ts:17-19. The header comment correctly says "a forward reference to a topic about to be curated is legit" — and the warner stays advisory rather than rejecting. But the ⚠ still SURFACES on every such ref. For tool-mode dream link/merge candidate flows (paired cross-refs across two topics by design), this is the dominant case. Worth a thought once you have telemetry on how often warnings fire on real-corpus dream runs.

3. observation (security, defer-defensible) — symlink defense-in-depth at related-ref-warner.ts:74-80

Posted inline. The comment says symlinks "could theoretically escape; reject any resolution that does" but path.resolve() operates on the string only — it does NOT follow symlinks. The follow-up lstatSync vs statSync swap is already on the list per f6ec3c2. The actual security impact is minimal here (the warner only returns booleans, no content read).

4. issue (carry-over) — error messages still reference removed brv providers connect/switch commands

Confirmed still standing across:

  • src/oclif/lib/daemon-client.ts:36, 38, 41, 181-182, 274-277
  • src/oclif/commands/dream.ts:144
  • src/tui/utils/error-messages.ts:17, 18, 21
  • src/server/infra/daemon/task-validation.ts:26, 36

brv dream is still the live reachable path. Now the longest-standing carry-over since ENG-2880 closed the curate-flags one. Fix this →

5. observation (carry-overs, non-blocking)
  • <existing-topic> block in curate-prompt-builder.ts:119-123 still inlines system-managed timestamps verbatim. Heavier traffic now with tool-mode-dream MERGE candidates exercising the UPDATE path under meta.previousSummary. One-regex strip pass before the prompt builder shape settles permanently.
  • topic-loader.ts regex parser still inconsistent with index-generator.ts parse5. Latent — topics are writer-produced today — but worth aligning.
  • CLI ConsoleLogger chalk-on-stderr under --format json (curate-session.ts:380, 423). dream.ts:33 solves this with format === 'json' ? new NoOpLogger() : new ConsoleLogger().

Praise

  • Posted inline at related-ref-warner.ts:86-96 — tightening to probe only the suffix the agent chose is the right call, and replacing the "ambiguous" warning with asymmetric-mismatch coverage is a clean net improvement.
  • Coaching hint design. The "bare ref + only file present" warning at l.96 includes add ".html" if you meant a file — that's the specific text an agent encountering the typical "I forgot the suffix" mistake needs to see. Cheap to write, high-leverage on the agent's correction loop. The asymmetry (only the bare-form gets the hint, not the file-form) reflects empirical understanding of which mistake is more common.
  • MCP envelope render-test triad. brv-curate-tool.test.ts:434-525 covers three wire shapes (warnings: [...], warnings: [], warnings absent) — the third case is the load-bearing one for backward compatibility with pre-PR daemon payloads. Most teams would have written just the happy path.
  • Diagram preservation prompt cross-cut. Updating curate-detail-preservation.yml:47-57 to use --&gt; in the mermaid example (instead of leaving the prior --> arrows that would now trip the renderer) shows the kind of multi-file consistency discipline that prevents prompt/contract drift. The corrected example doubles as inline documentation of the rule.

Test plan suggestions

  • A test for the .htm ref case (issue 1) — either assert the current "broken" warning text mentions the suffix mismatch, or add .htm to the wantsFile set and assert clean resolution against a seeded foo/bar.htm file.
  • A test for the symlink-inside-contextTreeRoot case (issue 3) — even if you defer the lstatSync swap, a regression test that locks the current behavior (symlink-to-outside-folder probed via stat follows the link and returns isDirectory()=true → no warning) at least documents what we have today.

Final disposition

LGTM. ENG-2883 is high-quality follow-up work — multi-surface prompt-contract alignment that previously had drifted, a well-tested read-only warner with race-resilient stat probes, and a textbook iterative review-tightening sequence across four commits. The remaining carry-overs are all known, well-documented, and have consistently been polished post-merge.

Todo

  • Identify what changed since c4c4a52 (8d5ded0f6ec3c2ffe68565887e2a)
  • Spot-check the ENG-2883 warner + envelope + prompt-contract updates
  • Verify outstanding carry-over items from prior reviews
  • Post inline comments and final summary with score
    · Branch: proj/byterover-tool-mode

Comment on lines +86 to +96
// target is reported as broken rather than thrown. Probe only the
// shape the agent chose: a `.html` ref against a folder (or a bare
// ref against a file) is a dead pill the FE cannot route, so it
// must surface even when the "other" shape happens to exist.
if (wantsFile) {
if (safeStatIsFile(candidate)) return undefined
return `related ref "${originalRef}" was not found — no file at "${relative}.html" under the context tree`
}

if (safeStatIsDir(candidate)) return undefined
return `related ref "${originalRef}" was not found — no folder at "${relative}/" under the context tree (bare refs target folders; add ".html" if you meant a file)`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Tightening this to probe only the suffix the agent chose is the right call — and the commit message captures the rationale precisely: the suffix IS the decision, and probing both forms would silently accept a dead-pill (a .html ref + only-folder-on-disk would FE-404 either way). The pair with the .html "coaching hint" on bare-vs-file mismatch (l.96) is a nice ergonomic touch — that's the typical "agent forgot the .html" mistake.

The test in related-ref-warner.test.ts:124-142 ("warns when an explicit .html ref points at a folder instead of a file" / "warns when a bare ref points at a file instead of a folder") locks the contract from both directions. Replacing the previous "ambiguous" warning with this asymmetric-mismatch coverage is a clear net improvement.

The post-write race-resilient stat path (l.103-117) is also well-considered: a transient FS error surfacing the ref as "broken" rather than throwing out of writeHtmlTopic is the right priority order — the topic is already on disk, the warner's job is advisory only.

Comment on lines +52 to +53
const wantsFile = stripped.endsWith('.html')
const withoutExt = wantsFile ? stripped.slice(0, -'.html'.length) : stripped
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (consistency with the file-reader): endsWith('.html') here is case-sensitive and only matches the .html suffix — but src/server/infra/context-tree/file-context-file-reader.ts:95-98 matches both .html AND .htm, and format-detector.ts:24 treats both as HTML. So a topic that exists on disk as foo/bar.htm (migrated content, older OS conventions, manual moves) is readable as a topic but unreachable from a related="@foo/bar.htm" ref — the warner here treats .htm as a bare ref, probes for a folder, and flags it as broken even though the file exists.

Three options ranked by surface area:

  1. Cheapest: broaden the suffix test:

    const lower = stripped.toLowerCase()
    const wantsFile = lower.endsWith('.html') || lower.endsWith('.htm')
    const withoutExt = wantsFile ? stripped.slice(0, -(lower.endsWith('.html') ? 5 : 4)) : stripped

    This matches the file-reader's surface.

  2. Cleaner: add a small parseRefSuffix(stripped) → {wantsFile, withoutExt} helper that's shared between the warner and any future ref consumer (FE router, BM25 indexer).

  3. Or just document that .htm refs are explicitly unsupported and surface a dedicated unsupported-extension warning.

Latent — topics today are writer-produced as .html — but this is a small wedge between two sibling code paths that the file-reader already had aligned. Case-sensitivity is also listed in the deferred follow-ups (f6ec3c2 commit body), and a single .toLowerCase() on stripped closes that nit at the same time.

Comment on lines +17 to +19
* The warner never mutates the attribute and never rejects the write
* — refs are advisory metadata, and a "forward reference" to a topic
* about to be curated is legit.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation (UX, batch curate): The header note says "a forward reference to a topic about to be curated is legit" — and the warner correctly stays advisory rather than rejecting. But the warning still SURFACES on every such ref, which means an agent doing batch curation (curate A → curate B → curate A again to add related="@b"`` is the natural pattern, but curate A with related="@B" first → curate B produces one ⚠ on A's write).

For tool-mode dream's link/merge candidate flows (which by design produce paired cross-refs across two topics) this is the dominant case. The agent gets a ⚠ on the first write, then no ⚠ on the second — but the user-facing log line will read like a real problem.

Two thoughts (defer-defensible — not blocking):

  1. Two-pass curate in the orchestrator. Tool-mode dream finalize could write both halves of a link/merge pair, then re-check refs once both are on disk and drop warnings that resolved. Adds complexity; only pays for itself if forward-ref noise becomes an observed pain.
  2. Cheap-now signal in the warning text. When the resolved ref's domain folder DOES exist but the target file/folder doesn't, hint at "forward reference?" alongside the existing "no file at …" / "no folder at …" message. Tells the agent (and reviewer) "this is the routine batch case", not "this is a typo".

Worth a thought once you have telemetry on how often warnings fire on real-corpus dream runs. The asymmetric-mismatch coaching at l.96 already shows good intuition for "shape of the typical mistake"; this is the same principle applied to the other common case.

Comment on lines +74 to +80
// Defence in depth: even after segment filtering, the resolved path
// must remain inside the context-tree root. A symlink in `contextTreeRoot`
// could theoretically escape; reject any resolution that does.
const rootResolved = path.resolve(contextTreeRoot)
if (!isInsideRoot(candidate, rootResolved)) {
return `related ref "${originalRef}" resolves outside the context-tree root and was not checked`
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation (security, defer-defensible): The comment says "A symlink in contextTreeRoot could theoretically escape; reject any resolution that does." But path.resolve() does NOT follow symlinks — it operates on the string only. If <contextTreeRoot>/security is a symlink pointing at /etc, then path.resolve(contextTreeRoot, 'security/passwd') still returns <contextTreeRoot>/security/passwd (string-wise), isInsideRoot returns true, and the subsequent safeStatIs{File,Dir} would follow the symlink and probe /etc/passwd.

The current commit message (f6ec3c2) already flags this as a follow-up: "lstatSync vs statSync for symlink handling (today the comment overstates the symlink guarantee; the resolved-path check only validates the input string, not the link target)". The actual security impact is minimal here: the warner only does stat().isFile() / isDirectory() and returns a boolean — no content is read, no path is logged in detail beyond the warning string the agent is going to see anyway. So an attacker who controls contextTreeRoot symlinks could only learn "is /etc/passwd a file? yes." through warning suppression — not much of an exfiltration vector.

Still worth either:

  • updating the comment to be honest about the guarantee (the input-string check, not the link target), or
  • swapping statSync for lstatSync so symlinks are reported as not-the-expected-shape (a symlink isFile() = false, isDirectory() = false, isSymbolicLink() = true — both probes would surface the ref as broken, which is the correct user-facing result).

Cheap to fix at the same time as the case-insensitive FS deferral. Non-blocking.

RyanNg1403 and others added 8 commits May 21, 2026 18:50
…nt path normalization)

The dream-scan output emits candidate paths with .html (e.g.
auth/jwt.html) and the documented dream→curate merge workflow tells
the agent to write the survivor at that path. The writer previously
appended .html unconditionally, producing auth/jwt.html.html on disk
while reporting status: done and bypassing the path-exists guard —
silent data corruption during merge: the survivor's merged content
landed in a stale file while the agent archived the loser thinking
the merge had succeeded.

topicPathToFilePath now strips a trailing .html before appending.
Both path="x/y" (bare convention) and path="x/y.html" (the form
dream-scan emits) collapse to the same on-disk file at x/y.html and
trigger the path-exists guard identically.

4 new tests in html-writer.test.ts cover: .html-suffixed path writes
to single-extension file; path-exists fires across both forms in
both orderings; --overwrite leaves only one file.
… capture pre-archive mtime/signals so dream undo is bit-exact

Three coupled changes that touch the search-service contract and the
tool-mode dream finalize/undo path.

Search service contract:
- ISearchKnowledgeService gains refreshIndex(): invalidates the cached
  MiniSearch index so the next search() call rebuilds from disk
  regardless of TTL. Required because the service's 5s TTL fast-path
  would otherwise serve a cached index that pre-dates just-loaded
  topics, surfacing zero candidates on a fresh scan.
- SearchKnowledgeOptions gains combineWith?: 'AND' | 'auto' | 'OR'.
  Default 'auto' preserves the historical AND-first-with-OR-fallback
  behavior. Internal SearchOptions references the public option's type
  to keep the two declarations from drifting if a new value is added.
- bm25-pair-discovery passes combineWith: 'OR'. Multi-word distinctive
  titles (e.g. "Redis Caching Layer") AND-combine to terms only the
  source itself contains; OR-mode lets cross-pairs surface based on
  any-term overlap.

Dream finalize / undo signal+mtime restore:
- finalizeDreamSession captures each archived file's mtimeMs (via stat)
  and current RuntimeSignals (via runtimeSignalStore.get) before the
  rename, and returns them on the result alongside previousTexts.
- DreamLogSchema PRUNE op gains optional previousSignals +
  previousMtimes (backward-compat — older logs still undo via fallback).
- agent-process dream-finalize branch writes the new metadata into
  each PRUNE op.
- DreamUndoDeps gains optional runtimeSignalStore. undoPrune restores
  mtime via utimes() and signals via runtimeSignalStore.set() when
  the PRUNE op carries the new fields; falls back gracefully when not.
- buildUndoDeps threads runtimeSignalStore through.

scanDreamCandidates now invokes searchService.refreshIndex() after
loadToolModeTopics and before the parallel candidate-generator block,
so the first scan after seed produces full results.

Tests cover: refreshIndex called pre-discovery; combineWith: 'OR'
passed to search; mtime+signal capture on finalize; mtime+signal
restoration on undo; backward-compat fallback path; mock updates for
the extended ISearchKnowledgeService in unrelated test files.
Both subcommands are intentional v1 placeholders — the daemon doesn't
persist session state, so dream sessions always returns [] and dream
cancel is a no-op. The previous surface advertised them as if they
were functional: descriptions said 'List active...' / 'Discard a...',
JSON envelopes returned success-looking shapes ({sessions: [], status:
ok} / {status: cancelled}), and machine-readable consumers that
branched on the response had no way to know the daemon was never
consulted.

This is a cheap-honesty fix, not a real persistence implementation.
Real persistence is a separate ticket.

Changes:
- DreamSessions.description prefixed with '[v1 stub]' and now names
  the stateless behavior inline.
- DreamCancel.description prefixed with '[v1 stub]' and now names the
  no-op behavior inline.
- Both JSON envelopes gain a 'note' field disclosing v1 status so
  machine consumers see the caveat at the same key they'd branch on.
- Topic-root listing in brv dream's run() body flags both subcommands
  as '[v1 stub]' in the one-liner hint.

4 tests in test/commands/dream.test.ts cover: each static description
contains 'v1 stub'; each --format json envelope's data.note discloses
stateless / no-op semantics.
…, worked example

A skill-discovery sub-agent test surfaced gaps in §7 (brv dream):
prose described the surface but no sample JSON envelopes existed,
prune-signal thresholds were left to the agent to guess, and there
was no end-to-end walkthrough — so consumers had to grep the source
to learn field shapes.

Additions to §7:
- Sample dream scan --format json envelope (sessionId + candidates
  with one example each of link / merge / prune / synthesize shapes,
  including signals + daysSinceModified for prune).
- Sample dream finalize response (archived, skipped, logId, status).
- Prune signal thresholds inline: importance < 35 → low-importance,
  draft >60d / validated >120d → stale-mtime, maturity 'core' never
  surfaced.
- 'Worked example — prune the stalest topic (no curate detour)':
  three-block scan → finalize → undo walkthrough.
- Path-attribute warning in the merge bullet so agents stripping the
  scan-emitted .html before <bv-topic path=> see the guidance.
- Updated undo description to reflect the symmetric mtime + signal
  restoration shipped in this PR — was 'mtime resets to current time';
  is 'content, original mtime, and sidecar runtime signals... all
  restored'.

Companion change in prune-candidates.ts: sync-hint comment near the
threshold constants pointing at SKILL.md §7, so future threshold
changes don't silently drift the agent-facing docs.
…566ac)

Reviewer caught a real race in the refreshIndex() shipped in f5566ac.
The orphan-builder problem:

  T0: caller A invokes search() — acquireIndex starts buildA;
      state.buildingPromise = buildA. The IIFE body is still awaiting
      fileSystem.listDirectory and the eventual buildFreshIndex.
  T1: scanDreamCandidates calls refreshIndex(). Old code cleared
      state.cachedIndex AND state.buildingPromise immediately and
      returned.
  T2: caller C invokes search() — sees both fields cleared, kicks off
      a NEW build (buildB). state.buildingPromise = buildB.
  T3: buildA finally completes and writes state.cachedIndex =
      freshIndexA (line 1016 of acquireIndex).
  T4: buildB completes and writes state.cachedIndex = freshIndexB.

If buildA happens to settle AFTER buildB (e.g. slower disk read,
larger source set), the older freshIndexA wins — reintroducing the
exact TTL-stale-data scenario Fix #2 was meant to prevent. Realistic
on a daemon with concurrent brv search / brv query / dream-scan
tasks.

Fix per reviewer's accepted shape: refreshIndex() now AWAITS any
in-flight buildingPromise before clearing cached state. By the time
the await returns, the in-flight build has either published its
result (which is then cleared) or rejected (swallowed — callers don't
care about that build's outcome, they just want a clean slate). No
orphan publisher can write back after invalidation because the
publishing point is now in the past relative to the clear.

New test file search-knowledge-service-refresh.test.ts covers:
- the contract under test (refreshIndex MUST NOT resolve while a
  build is in flight)
- idempotent behavior on cold state (no in-flight build)
- graceful clear when the in-flight build rejects

Not a follow-up sub-agent regression replay: the race manifests only
under concurrent operations on a single SearchKnowledgeService
instance, which the 12-scenario test matrix (per-scenario daemons,
serial workflow) does not exercise. The unit test at the contract
boundary is the right pin.
Five items from the latest codex review batch on PR #689.

- agent-process.ts: drop the dead-code `?? ''` fallback on
  previousTexts. archived[] paths are invariant-guaranteed to have
  all three fields (text + mtime + signals) populated together, so
  the fallback was already unreachable — and if it ever fired, the
  resulting `{[path]: undefined}` would violate the new PRUNE
  schema's z.record(z.string(), z.number()) for previousMtimes. All
  three lookups now agree: trust the invariant uniformly.

- search-knowledge-service.ts: extend refreshIndex docstring with a
  maintainer note explaining that the explicit `state.buildingPromise
  = undefined` after the await is defensive (acquireIndex's finally
  block already nulls it). Also documents the benign double-build
  inefficiency when a fresh acquireIndex() races into the gap between
  await-resolve and clear — not a correctness issue, just non-obvious.

- SKILL.md §7: soften the merge bullet's path-attribute warning. The
  prior "drop the trailing .html — the writer always appends it"
  instruction was load-bearing before the path-doubling fix shipped;
  now that the writer normalizes idempotently (both bare and
  .html-suffixed forms resolve to the same on-disk file), the
  instruction reduces to agent-facing cognitive load. Updated to
  "either form works; the writer normalizes" so the agent doesn't
  have to track which form scan emitted.

- dream.test.ts: add two regression tests for the DREAM_REMOVED_FLAGS
  wire-up in dream.ts's run() body. (1) text mode: --timeout 30
  rejects with this.error throwing before this.log runs; (2) json
  mode: --timeout 30 --format json emits a JSON envelope with
  success:false / data.status:'error' / data.error containing
  '--timeout'. Pins the early-exit ordering so a future refactor
  that drops the findRemovedFlagMessage(this.argv, ...) call fails
  the suite instead of silently regressing to pre-removal state.

- schemas.ts: hoist TaskTypeSchema declaration above
  TaskExecuteSchema (the schema previously sat ~100 lines below) and
  replace TaskExecuteSchema.type's inline z.enum(...) with a
  reference to TaskTypeSchema. The doc block at the original
  declaration site claimed it was the single source of truth, but
  the inline enum on TaskExecuteSchema.type silently violated that
  — the only outlier; TaskCreatedSchema.type (now line 605) and
  TaskCreateRequestSchema.type already referenced TaskTypeSchema.
  Hoist needed to avoid a Zod TDZ at module load. No behavioral
  change; schemas now share one literal enum source.
refactor: [ENG-2884] remove legacy LLM-driven brv dream path
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 22, 2026

Claude finished @danhdoan's task in 7m 51s —— View job


Re-review of PR #663 after the ENG-2884 batch (PR #689)

Two PR-specific blocks since 5887e2a: the ENG-2884 refactor (13d83a67d77a8e → merge #6897a397baf5566ac2e138a653004fd8c7c1cce3f96e8) which removes 4000+ LOC of the legacy LLM-driven brv dream pipeline and replaces the topic root with a thin migration-hint stub. Tool-mode subcommands (brv dream {scan,finalize,undo,sessions,cancel}) cover the same workflow deterministically. The batch also fixes a path-doubling silent-data-loss bug in the dream→curate merge handoff, expands previousMtimes / previousSignals capture so prune undo is bit-exact, and catches a real refresh-index race in PR #689 review (8c7c1cc).

Score: 4.85 / 5 — held from prior review. The refactor is unusually clean for its size (32 files deleted, net -5825 LOC) and the in-cycle bug catches show real review discipline.

What changed since 5887e2a

Commit(s) Scope
13d83a6 ENG-2884 refactor — removes dream-executor.ts, dream-trigger.ts, consolidate/synthesize/prune LLM ops, dream-response-schemas.ts, parse-dream-response.ts, timeout-deprecation.ts. brv dream is now a thin topic root printing a migration hint. Net –4000 LOC, –14 files.
7d77a8e ENG-2884 PR #689 review — wires DREAM_REMOVED_FLAGS into the topic root (was exported but never consumed); drops 'dream' + force + 'agent-idle' from TaskTypeSchema / TaskExecuteSchema / TaskCreateRequestSchema. Closes the straggler-task gap (a stale type:'dream' now fails at the schema boundary instead of silently completing with empty result).
7a397ba Strip trailing .html in topicPathToFilePath so path="x/y" and path="x/y.html" both resolve to the same on-disk file — fixes the merge-workflow silent-data-loss bug where the survivor landed in x/y.html.html while the agent archived the loser thinking the merge succeeded.
f5566ac searchService.refreshIndex() + SearchKnowledgeOptions.combineWith + pre-archive mtime/signals capture on finalize. DreamLogSchema.PRUNE gains optional previousSignals / previousMtimes. undoPrune restores via utimes() + runtimeSignalStore.set().
2e138a6 Discloses [v1 stub] status on dream sessions / dream cancel in descriptions + JSON envelopes (data.note: 'v1: ...').
53004fd SKILL.md §7 — sample envelopes, prune-signal thresholds, worked example. Sync-hint comment on the prune threshold constants pointing at SKILL.md.
8c7c1cc Race-fix on refreshIndex from PR #689 reviewer catch — awaits in-flight build before clearing, so orphan builders can't write back after invalidation. Maintainer-note documents the benign double-build window. New search-knowledge-service-refresh.test.ts (3 cases).
e3f96e8 PR #689 codex review fixes — drop dead-code ?? '' fallback on previousTexts (invariant-guaranteed populated together); hoist TaskTypeSchema above TaskExecuteSchema to make it the single source for both (was inline-duplicated on TaskExecuteSchema.type); SKILL.md path-attribute note softened (writer normalizes either form).

Strengths

  • Refactor scope is right-sized. Removed exactly the dead surface: LLM-driven entry, daemon-side dream-trigger, agent-idle dispatch branch, response parsers, timeout-deprecation helper (no consumer post-removal). Kept LLM-agnostic services (dream-undo, dream-state-service, dream-lock-service, dream-log-store) that the tool-mode pipeline reuses. The commit message's "Removed / Trimmed / Kept" tri-list is unusually clear.
  • 8c7c1cc race fix — see inline comment at search-knowledge-service.ts:1123. The orphan-builder window was easy to miss and the unit-test contract (refresh must not resolve while a build is in flight) is the cheapest possible pin.
  • mtime + signal capture on finalize. The metadata that drove the prune decision (importance, maturity, recency, mtime) is now restorable, so a topic archived as low-importance or stale-mtime returns with the same prune-eligibility on the next scan. Without this, undo silently mutated observable state and the topic stopped re-surfacing — a quiet correctness regression that would only show up under repeated dream cycles. The three-case test triad in dream-undo.test.ts (capture-and-restore, restore-via-utimes, restore-via-signal-store, legacy-log-fallback) is the right shape.
  • Path-doubling fix is symmetric. Two reading orderings + confirmOverwrite round-trip = 4 new tests; not just the doubled-extension case in isolation. See inline comment at html-writer.ts:274 for one small follow-up (.htm extension symmetry).
  • Schema-source consolidation. TaskTypeSchema is now the single source for TaskExecuteSchema.type, TaskCreateRequestSchema.type, and TaskCreatedSchema.type — previously TaskExecuteSchema.type had an inline z.enum(...) that silently violated the "single source of truth" doc above the declaration. e3f96e8 hoisted the declaration to fix the Zod TDZ at module-load and removed the inline duplicate. Quiet drift that nobody flagged in three prior reviews.
  • Honest v1-stub disclosure. [v1 stub] prefix in command descriptions + data.note field in JSON envelopes for dream sessions / dream cancel. Machine consumers branching on sessions.length or status: 'cancelled' see the caveat at the same key shape they'd inspect.

Issues

1. observation (semantic deviation, defer-defensible)previousSignals captured for never-tracked topics

Posted inline at dream-session.ts:221. runtimeSignalStore.get() returns validateOrDefault(...) — i.e. system defaults when the sidecar has no entry. On a never-tracked topic, undo writes those defaults via set(), creating a sidecar entry that didn't exist before the archive. Strictly speaking, that's not "undo to original state" — it's "undo to original-state + defaults". Defensible (a future access would mutate the defaults anyway), but worth one line in the docstring.

2. suggestion (consistency) — strip .htm too in topicPathToFilePath

Posted inline at html-writer.ts:274. Latent today (dream-scan only emits .html), but /\.html?$/i matches both .html and .htm at no extra cost, and the file-reader / format-detector already handle both — this is the only path-shape resolver that doesn't.

3. nit (CLI behavior asymmetry) — JSON-mode removed-flag rejection exits 0, text-mode exits 1

Posted inline at dream.ts:94. Same pattern as the curate/query precedent, so consistency with the rest of the CLI is preserved. But set -e shell wrappers see different exit codes for the same logical error depending on the --format flag. Either document, or set process.exitCode = 1 before the return.

4. issue (stale comment, post-ENG-2884)task-router.ts:138-139 still references the removed dream consumer

The preDispatchCheck doc says "Used for dream task type to enforce gates 1-3 (time, activity, queue) even on the CLI dispatch path — mirrors the idle-trigger pre-check pattern." But the 'dream' task type is gone from TaskTypeSchema, and a grep shows no caller passes a preDispatchCheck anymore (only the type plumbing remains). The new task-router.test.ts correctly swapped its fixtures to type: 'curate' — this comment is the last straggler.

Worth either updating to "Currently unused — kept for future per-type gating" or trimming the dream-specific paragraph. (Can't post inline since this exact line wasn't touched in the PR's diff.)

5. observation (carry-over)daemon-client.ts / task-validation.ts still reference brv providers connect/switch / /provider REPL

Carry-over from prior reviews — but the reachable-impact analysis has shifted significantly since the prior review. With the LLM-driven dream path gone, the validateProviderForTask gate (agent-process.ts:513) is now skipped for all five user-facing tool-mode types (search, query-tool-mode, curate-html-direct, dream-scan, dream-finalize). The only paths that still hit provider validation are the legacy 'curate' / 'curate-folder' / 'query' task types — and the user-facing CLI commands (brv curate, brv query) have all been migrated to the new tool-mode types.

Real-world reachability is now near-zero unless a stale MCP client or an older CLI binary still produces legacy task types. Worth a one-pass scrub regardless, but the urgency from the May 19 review is gone.

6. observations (carry-overs, unchanged)
  • topic-loader.ts regex parser still inconsistent with index-generator.ts parse5.
  • <existing-topic> block in curate-prompt-builder.ts:119-123 still inlines system-managed timestamps.
  • CLI ConsoleLogger chalk-on-stderr under --format json (sidecar-bump + index-regen in curate-session.ts).

Praise

  • PR refactor: [ENG-2884] remove legacy LLM-driven brv dream path #689 review catch on the refresh-index race — saving a real correctness bug from landing. Pulling the cycle in three days (ship f5566ac → reviewer catches the orphan-builder window → ship 8c7c1cc with explicit unit-test contract) is the model "ship → review → tighten" arc.
  • Codex review-fix discipline in e3f96e8 — five items, each scoped to one file, each with a why-this-matters line in the commit message. The TaskTypeSchema hoist+dedupe fixes drift that three prior human review rounds missed because it lived behind a "single source of truth" doc comment that was technically lying.
  • Test count delta is honest. Net -184 tests is exactly what the deleted file inventory predicts — no orphan suites pointing at vanished modules, no zombie tests pinning long-dead behavior.
  • Migration text quality. dream cancel's text-mode output (l.41-46) tells the user that curate writes between scan and cancel are NOT rolled back by brv dream undo, and points to brv review reject as the right rescue. That's the specific rescue path a confused user needs — and most teams would have shipped a generic "session cancelled" line.
  • Sub-agent test design. search-knowledge-service-refresh.test.ts injects state.buildingPromise directly rather than going through a full IFileSystem mock, because the contract under test is exactly that interaction. Pragmatic, well-scoped, leaves a 28-line file that any future maintainer can read top-to-bottom.

Final disposition

LGTM. ENG-2884 is high-quality refactor work — the size doesn't represent risk because the removed surface was orphaned and the kept surface is fully covered. The mid-cycle catches on 8c7c1cc (race) and e3f96e8 (schema drift) are the kind that pay for themselves immediately. Remaining items are all defer-defensible or post-merge polish.

Todo

  • Identify what changed since 5887e2a
  • Spot-check the ENG-2884 refactor + ENG-2884 sub-fixes (13d83a6, 7d77a8e, 7a397ba, f5566ac, 2e138a6, 53004fd, 8c7c1cc, e3f96e8)
  • Verify outstanding carry-over items from prior reviews
  • Post inline comments + final summary with score
    · Branch: proj/byterover-tool-mode

* point that encodes that convention.
*/
function topicPathToFilePath(contextTreeRoot: string, topicPath: string): string {
const normalized = topicPath.replaceAll('\\', '/').replace(/^\/+/, '').replace(/\.html$/i, '')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Idempotent .html stripping is the right shape for the dream→curate handoff. The doc above (l.260-271) names the failure mode it prevents (auth/jwt.html.html, silent path-exists bypass, archived loser + stale survivor), and the four new tests in html-writer.test.ts lock the contract from both orderings (bare-first-then-suffixed + suffixed-first-then-bare + confirmOverwrite round-trip). That's the right test triad for a "two input shapes must resolve to one file" invariant — it would have been easy to just test the doubled-extension case in isolation and miss the symmetry.

One micro-nit: the case-insensitive regex /\.html$/i correctly handles .HTML etc., but file-context-file-reader.ts:95-98 and format-detector.ts:24 both treat .htm as HTML too. If path="x/y.htm" ever shows up (manual migration, legacy import), this resolver would keep the .htm suffix, append .html, and produce x/y.htm.html — same doubled-extension problem this fix was designed to prevent, just one extension off.

Latent today (dream-scan emits .html only), but for full symmetry with the reader, consider:

const normalized = topicPath.replaceAll('\\', '/').replace(/^\/+/, '').replace(/\.html?$/i, '')

The s? matches .html and .htm identically. Same regex character cost.

* may start a parallel build. That's a benign double-build
* inefficiency, not a correctness issue.
*/
async refreshIndex(): Promise<void> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise (the race fix in 8c7c1cc): The original refreshIndex() in f5566ac had the orphan-builder race — clearing cachedIndex + buildingPromise while a prior acquireIndex() build was still in flight, then the orphan build resolved and wrote back to state.cachedIndex (line ~1016), defeating the invalidation. Awaiting the in-flight before clearing is the right shape: by the time the await returns, the in-flight has either published its result (which is then cleared) or rejected (which is swallowed — the caller doesn't care about the orphan's outcome). No publisher can write back after invalidation because the publishing point is now in the past relative to the clear.

The maintainer note about state.buildingPromise = undefined being defensive (l.1111-1121) is unusually thoughtful — it identifies the only post-fix race window (a fresh acquireIndex() that captures a new buildingPromise between await resolution and the explicit clear, where the clear then nulls the new pointer leaving a benign parallel double-build) and explicitly classifies it as performance-only, not correctness. That class of "we know about this edge case and it's intentional" comment is exactly what prevents the next contributor from "fixing" the non-bug.

The unit test in test/unit/agent/tools/search-knowledge-service-refresh.test.ts pins the contract (refresh-must-not-resolve-while-building) directly via state.buildingPromise injection — the cheapest test that locks the behavior at the seam. Cold-state idempotency and rejected-build cases round it out. Three cases, every one load-bearing.

content = await readFile(source, 'utf8')
const stats = await stat(source)
mtimeMs = stats.mtimeMs
signals = await runtimeSignalStore.get(relPath)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation (semantic deviation on never-tracked topics): runtimeSignalStore.get(relPath) never returns undefinedvalidateOrDefault ensures a default RuntimeSignals is returned when the sidecar has no entry for relPath (runtime-signal-store.ts:49-52). For a topic that was created without ever going through the sidecar-writing path (rare today, but possible: hand-authored files, restored from a non-tool-mode archive, future migration tools), the captured previousSignals[path] is the default object, not "no signals".

On undo, undoPrune then writes those default signals via runtimeSignalStore.set() — which creates a sidecar entry that didn't exist before the archive. Not strictly an "undo to original state" — the post-undo sidecar inventory now has one more entry than the pre-archive inventory. Defensible (the defaults are now explicit, and the next access bump will mutate them anyway), but worth noting in the docstring or capturing as a discriminator.

If you want strict bit-exactness on this edge case, two options:

  1. Probe the sidecar's raw existence via keyStorage.get before capturing, and store undefined in previousSignals[path] when the sidecar entry was genuinely absent. Then undoPrune either calls set() or delete() accordingly.
  2. Have runtimeSignalStore.getRaw(relPath): Promise<RuntimeSignals | undefined> so the dream-finalize path can distinguish "present-as-defaults" from "absent".

Non-blocking — the deviation only affects never-tracked topics, which are not the dream-scan target population. Worth one line in the previousSignals docstring at l.150-163: "For topics with no pre-archive sidecar entry, the captured snapshot is the system default — undo will create a sidecar where one didn't exist."

const removed = findRemovedFlagMessage(this.argv, DREAM_REMOVED_FLAGS)
if (removed) {
if (argvRequestsJsonFormat(this.argv)) {
writeJsonResponse({command: 'dream', data: {error: removed, status: 'error'}, success: false})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (consistency): The text-mode branch uses this.error(removed, {exit: 1}) (l.98), which throws and sets exit 1. The JSON-mode branch uses writeJsonResponse(...) then return — implicitly exit 0 — but the envelope carries success: false / data.status: 'error'.

The discrepancy is intentional and matches the curate/query precedent (both do the same thing), but a CLI caller wrapping brv dream with set -e will get different shell-script behavior between brv dream --timeout 30 (fails) and brv dream --timeout 30 --format json (succeeds). Two clean options if scripts that branch on exit code matter:

  • Document the asymmetry in removed-flags.ts (it's an implicit contract today).
  • Set a non-zero exit after the JSON envelope: process.exitCode = 1; return. The envelope still flushes to stdout cleanly, and shell callers see consistent failure semantics.

Not blocking — the JSON envelope's success: false is the machine-readable truth, and set -e users typically inspect .success. But this is the same asymmetry that bit MCP integrations on the pre-ENG-2837 curate path; worth being explicit about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants