Skip to content

fix: [ENG-2676] strip derived-artifact paths from related: frontmatter#618

Open
cuongdo-byterover wants to merge 3 commits intomainfrom
fix/ENG-2676
Open

fix: [ENG-2676] strip derived-artifact paths from related: frontmatter#618
cuongdo-byterover wants to merge 3 commits intomainfrom
fix/ENG-2676

Conversation

@cuongdo-byterover
Copy link
Copy Markdown
Collaborator

*.abstract.md and *.overview.md (and other derived artifacts) are excluded from the push payload via isExcludedFromSync(). When they landed in a doc's related: field, the parent doc shipped to remote with dangling references to siblings that were never sent.

Two write paths could produce them; both now filter:

  • consolidate.ts addRelatedLinks (dream's CROSS_REFERENCE) filters incoming AND pre-existing entries, so each touch opportunistically cleans up legacy dangling refs.
  • curate-tool.ts filterValidFiles drops excluded paths from relations before MarkdownWriter writes them — covers both ADD and UPDATE since both call sites share the helper.

Pre-existing dangling refs in untouched docs are out of scope; a separate migration can scrub them.

Summary

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Type of change

  • Bug fix
  • New feature
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

  • Closes #
  • Related #

Root cause (bug fixes only, otherwise write N/A)

  • Root cause:
  • Why this was not caught earlier:

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s):
  • Key scenario(s) covered:

User-visible changes

List user-visible changes (including defaults, config, or CLI output).
If none, write None.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording

Checklist

  • Tests added or updated and passing (npm test)
  • Lint passes (npm run lint)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (if applicable)
  • No breaking changes (or clearly documented above)
  • Branch is up to date with main

Risks and mitigations

List real risks for this PR. If none, write None.

  • Risk:
    • Mitigation:

`*.abstract.md` and `*.overview.md` (and other derived artifacts) are
excluded from the push payload via isExcludedFromSync(). When they
landed in a doc's `related:` field, the parent doc shipped to remote
with dangling references to siblings that were never sent.

Two write paths could produce them; both now filter:

- consolidate.ts `addRelatedLinks` (dream's CROSS_REFERENCE) filters
  incoming AND pre-existing entries, so each touch opportunistically
  cleans up legacy dangling refs.
- curate-tool.ts `filterValidFiles` drops excluded paths from
  `relations` before MarkdownWriter writes them — covers both ADD and
  UPDATE since both call sites share the helper.

Pre-existing dangling refs in untouched docs are out of scope; a
separate migration can scrub them.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Claude finished @cuongdo-byterover's task in 3m 45s —— View job


PR Review — fix: [ENG-2676] strip derived-artifact paths from related: frontmatter

Score: 3/5 — solid, well-scoped fix on the dream/CROSS_REFERENCE side, with one meaningful gap on the curate side that contradicts the PR description.

  • Read changed files (curate-tool.ts, consolidate.ts, both tests)
  • Cross-checked derived-artifact.ts, conflict-resolver.ts, conflict-detector.ts, and the surrounding curate UPDATE flow
  • Verified bug fix and identified edge cases
  • Posted inline comments + this summary

What's good

  • isExcludedFromSync is the right predicate. It's already the canonical "won't be pushed" gate (sync/snapshot/merge), so reusing it in the writers keeps the invariant consistent across the system instead of inventing a new filter.
  • consolidate.ts cleans both incoming AND pre-existing entries. The opportunistic-cleanup-on-touch behavior matches the Why: in the PR body and is captured by the new test (auth/legacy.abstract.md assertion at consolidate.test.ts:332).
  • Comments explain the why ("they'd be dangling refs on remote"), not the what. Future readers won't have to re-derive the motivation.

What's missing or worth tightening

  1. (blocking) curate UPDATE leaks derived paths back via structural-loss merge. Inline comment on curate-tool.ts:996. Summary: filterValidFiles cleans proposed.relations, but resolveStructuralLoss at line 1021 union-merges existingParsed.relations (unfiltered) back in. If the existing file had related: ['a.md', 'a.abstract.md'], the final write still contains a.abstract.md — and lostRelations falsely fires, also bumping impact to 'high' (false-positive review trigger). The PR description explicitly claims UPDATE is covered ("covers both ADD and UPDATE since both call sites share the helper") — it isn't, in the legacy-data case. Fix is a one-line filter on existingParsed.relations before structural-loss detection (or push the filter into conflict-resolver.ts).
  2. (should-fix) test gap. Only the ADD path is exercised. Please add a test that calls UPDATE against a pre-seeded file whose related: contains a derived-artifact path, and assert it is stripped. Without that, the regression that fix (1) addresses can silently come back.
  3. (consider) executeCrossReference doesn't filter target files. Inline comment on consolidate.ts:626-628. The LLM may include .abstract.md/.overview.md in action.files, in which case the loop calls addRelatedLinks on a derived artifact — i.e., writes frontmatter to a file owned by the abstract pipeline. The new filter only cleans the relatedPaths payload; targets are still vulnerable. Defensive-only — feel free to defer.
  4. (nit) empty-result write. consolidate.ts:681-682 will still write related: [] frontmatter to a previously-clean file when both incoming and (post-filter) existing are empty. Cosmetic.

Out of scope (correctly noted in PR body)

Pre-existing dangling refs in untouched docs — agreed, separate migration. The current PR's opportunistic cleanup on dream's CROSS_REFERENCE is a nice middle ground.


· branch fix/ENG-2676

Comment thread src/server/infra/dream/operations/consolidate.ts Outdated
… UPDATE

PR #618 review flagged that the curate UPDATE path could still leak
derived-artifact entries into `related:` because `existingParsed.relations`
flowed unfiltered into both `detectStructuralLoss` (false-positive
`lostRelations`, bumping impact to high) and `resolveStructuralLoss`
(union-merging the legacy bad data back into the final write).

Filter `existingParsed.relations` through isExcludedFromSync immediately
after parse so all downstream conflict logic sees a clean view. Adds an
UPDATE-path test that pre-seeds a stale `.abstract.md` entry and asserts
it is stripped.

Also fixes the reviewer nit on consolidate.ts:681 — early-return when
`incoming` is empty and there's no existing frontmatter, so we don't
wrap an otherwise-clean file in `---\nrelated: []\n---\n`.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Claude finished @cuongdo-byterover's task in 3m 8s —— View job


PR Review — fix: [ENG-2676] strip derived-artifact paths from related: frontmatter

Score: 4/5 — solid follow-up from 3/5. The blocking UPDATE structural-loss leak is properly fixed; new test locks in the regression-prevention. Remaining items are non-blocking refinements.

  • Read changed files (curate-tool.ts, consolidate.ts, both tests)
  • Verify the structural-loss leak fix on UPDATE path
  • Verify test coverage for UPDATE path with pre-seeded derived-artifact relations
  • Check executeCrossReference target filtering
  • Check empty-result write nit
  • Post inline comments + summary

What's good

  • Blocking UPDATE leak from prior review is fixed at curate-tool.ts:1000-1003. Filtering existingParsed.relations before detectStructuralLoss (and the union-merging resolveStructuralLoss at conflict-resolver.ts:130) is the correct point of intervention. As a bonus, this also eliminates false-positive lostRelations counts in conflict-detector.ts:75 that would have erroneously elevated impact to 'high' and triggered review.
  • Mutation safety verified. MarkdownWriter.parseContent returns a fresh ContextData object each call, so mutating existingParsed.relations in place doesn't leak state to other callers.
  • Test coverage closes the gap. The new curate-tool.test.ts:638-688 test exercises the exact UPDATE-with-legacy-data scenario flagged previously — pre-seeded .abstract.md in related: is stripped, while a legitimate sibling (sunrise_cafe_menu.md) survives the structural-loss merge.
  • Empty-result write nit partially addressed. The early return at consolidate.ts:681 correctly avoids wrapping a previously-clean file in ---\nrelated: []\n---\n.

What's worth tightening (all non-blocking)

  1. executeCrossReference target files still unfiltered (consolidate.ts:626-628). addRelatedLinks filters payload + existing entries, but if the LLM ever puts a .abstract.md / .overview.md in action.files, we'd write related: frontmatter into a derived artifact. Inline suggestion includes a one-line fix. Defensive only.
  2. Empty-result frontmatter wrap, partial coverage (consolidate.ts:664-672). The incoming.length === 0 guard at line 681 handles the no-frontmatter case, but a file that has frontmatter without a related key still gets parsed.related = [] written if both arrays end up empty. Cosmetic.
  3. Test could lock in the impact-elevation invariant (curate-tool.test.ts:638-688). The new test asserts file content; consider adding an assertion on result.applied[0].impact to also pin down the secondary effect — that the fix prevents false-positive 'high' elevation from lostRelations.
  4. Tighten the existing-relations guard (curate-tool.ts:1000). if (existingParsed?.relations) matches empty arrays too — ?.relations?.length skips the no-op filter. Pure cosmetic.

Out of scope (correctly noted in PR body)

Pre-existing dangling refs in untouched docs — agreed, separate migration. The opportunistic-cleanup-on-touch behavior is the right middle ground.

Verification not done

  • npm test / npm run lint / npm run typecheck not run — dependency install was unavailable in this environment. CI on the PR is the source of truth.

· branch fix/ENG-2676

Comment on lines +1000 to +1003
if (existingParsed?.relations) {
// Drop legacy dangling refs before conflict-detection; otherwise resolver unions them back.
existingParsed.relations = existingParsed.relations.filter((r) => !isExcludedFromSync(r))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: This addresses the blocking issue from the prior review cleanly. Filtering existingParsed.relations here (before detectStructuralLoss at line 1023 and resolveStructuralLoss at line 1026, which union-merges existing.relations with proposed.relations per conflict-resolver.ts:130) ensures legacy dangling refs get stripped on UPDATE rather than re-injected via structural-loss recovery. As a bonus, this also prevents false-positive lostRelations bumps in conflict-detector.ts:75 that would have elevated impact and unnecessarily triggered review.

Mutating existingParsed.relations in place is safe — MarkdownWriter.parseContent (line 875) returns a fresh ContextData object, and existingParsed is local to this function.

nitpick (non-blocking): if (existingParsed?.relations) matches both empty and populated arrays. Could tighten to if (existingParsed?.relations?.length) to skip the no-op filter when the array is already empty. Pure cosmetic.

Comment on lines 664 to 672
try {
const parsed = yamlLoad(yamlBlock) as null | Record<string, unknown>
if (parsed && typeof parsed === 'object') {
const existing = Array.isArray(parsed.related) ? (parsed.related as string[]) : []
parsed.related = [...new Set([...existing, ...relatedPaths])]
const existing = (Array.isArray(parsed.related) ? (parsed.related as string[]) : [])
.filter((p) => !isExcludedFromSync(p))
parsed.related = [...new Set([...existing, ...incoming])]
const newYaml = yamlDump(parsed, {flowLevel: 1, lineWidth: -1, sortKeys: false}).trimEnd()
await atomicWrite(filePath, `---\n${newYaml}\n---\n${body}`)
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (non-blocking): This handles the no-frontmatter empty-result case nicely (early return at line 681), but if the file has frontmatter without a related key and incoming ends up empty after filtering, we still set parsed.related = [] and rewrite the file — adding an empty related: [] field that wasn't there before. Edge case, cosmetic.

Could short-circuit when nothing meaningful changed:

const existingArr = Array.isArray(parsed.related) ? (parsed.related as string[]) : []
const existing = existingArr.filter((p) => !isExcludedFromSync(p))
if (existing.length === existingArr.length && incoming.length === 0) return
parsed.related = [...new Set([...existing, ...incoming])]

Feel free to defer.

Comment on lines +638 to +688
})

it('strips legacy derived-artifact entries from existing related: on UPDATE (conflict-resolver path)', async () => {
// Pre-seed a file whose related: already contains a stale .abstract.md
// entry (legacy data from before the fix). UPDATE must not union it
// back through resolveStructuralLoss.
const tool = createCurateTool()
const targetDir = join(basePath, 'operations/cafe')
await fs.mkdir(targetDir, {recursive: true})
const seedPath = join(targetDir, 'menu_notes.md')
const seed = [
'---',
'title: Menu Notes',
'summary: original notes',
'tags: []',
'related: [operations/cafe/sunrise_cafe_menu.md, operations/cafe/sunrise_cafe_menu.abstract.md]',
'keywords: []',
"createdAt: '2026-04-01T00:00:00.000Z'",
"updatedAt: '2026-04-10T00:00:00.000Z'",
'---',
'## Reason\nseed',
'## Raw Concept\n**Task:** seed',
].join('\n')
await fs.writeFile(seedPath, seed, 'utf8')

const result = (await tool.execute({
basePath,
operations: [
{
confidence: 'high',
content: {
keywords: [],
relations: ['operations/cafe/ingredient_sourcing.md'],
snippets: ['updated notes'],
tags: [],
},
impact: 'low',
path: 'operations/cafe',
reason: 'testing UPDATE relations filter',
title: 'Menu Notes',
type: 'UPDATE',
},
],
})) as CurateOutput

expect(result.applied[0].status).to.equal('success')
const written = await fs.readFile(seedPath, 'utf8')
expect(written).to.include('operations/cafe/sunrise_cafe_menu.md')
expect(written).to.include('operations/cafe/ingredient_sourcing.md')
expect(written).to.not.include('sunrise_cafe_menu.abstract.md')
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: The UPDATE test (strips legacy derived-artifact entries from existing related: on UPDATE (conflict-resolver path)) is exactly what was missing — pre-seeds a file with a stale .abstract.md reference, runs UPDATE with a fresh proposal, and asserts the legacy entry is stripped while a real existing relation (sunrise_cafe_menu.md) survives the structural-loss merge. This is the regression-prevention coverage that matters; without it, fix could silently regress.

suggestion (non-blocking): Consider adding one more assertion against result.applied[0].impact (or whichever field surfaces structural-loss elevation) to lock in the secondary effect: that filtering the existing relations also avoids a false-positive lostRelations count, so impact stays at 'low' rather than getting bumped to 'high'. That's the silent failure mode this fix also closes — currently the test only verifies file content, not the review/impact behavior.


async function addRelatedLinks(filePath: string, relatedPaths: string[]): Promise<void> {
// Skip paths that won't be pushed — they'd be dangling refs on remote.
const incoming = relatedPaths.filter((p) => !isExcludedFromSync(p))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (defensive, non-blocking): The new filter cleans the payload (paths written into the target's related: frontmatter) and the existing entries in the target's frontmatter, but the target file itself (the filePath argument) is whatever the LLM put in action.files at consolidate.ts:628. If the LLM ever includes a .abstract.md / .overview.md / archive stub there, we'd write related: frontmatter into a derived artifact (owned by the abstract pipeline, excluded from sync — so the writes get overwritten on the next dream run anyway, but it's wasted I/O and confusing for anyone reading those files locally).

A one-liner inside the executeCrossReference map would harden this:

action.files.map((file) => {
  if (isExcludedFromSync(file)) return Promise.resolve()
  const otherFiles = action.files.filter((f) => f !== file)
  return addRelatedLinks(join(contextTreeDir, file), otherFiles)
}),

Defer if you prefer to keep this PR narrowly scoped.

Round 2 of review-driven cleanup. None blocking, all small.

- Filter executeCrossReference target files: skip derived-artifact paths
  in `action.files` so addRelatedLinks never writes `related:` onto a
  `.abstract.md` / `.overview.md` if the LLM ever proposes one.
- addRelatedLinks: don't introduce `related: []` into a file whose
  frontmatter never had the key, when the merged set is empty.
- Tighten existingParsed.relations guard from truthy to length-aware so
  the no-op filter is actually skipped on empty arrays.
- UPDATE relations test: assert `impact === 'low'` to lock in the
  no-false-positive structural-loss-elevation invariant. Also retains
  the legitimate sibling in proposed so the only "lost" entry would be
  the filtered derived one — the assertion now genuinely exercises the
  filter's effect on conflict-detector counting.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Claude finished @cuongdo-byterover's task in 2m 32s —— View job


PR Review — fix: [ENG-2676] strip derived-artifact paths from related: frontmatter

Score: 5/5 — Final round addresses all 4 non-blocking items from the prior review cleanly. Up from 4/5 → ship-ready.


What's good in this round

  1. Item Feat/init #1executeCrossReference target filter (consolidate.ts:626-632). eligibleFiles is computed once before the Promise.all, and both file (write target) and otherFiles (peer list) derive from it. That means a single filter pass guarantees consistency: derived artifacts are excluded from being written to AND from appearing in others' related:. Cleaner than my suggested per-iteration check.

  2. Item Feat/refactor structure #2 — empty-result frontmatter guard (consolidate.ts:669-674). hadRelated = Array.isArray(parsed.related) is captured before the merge so the early-return if (!hadRelated && merged.length === 0) return correctly distinguishes "file already had related:, we cleaned it to empty" (write) from "file never had it, would be net-new noise" (skip). The existing incoming.length === 0 guard at line 687 handles the no-frontmatter case. Both edges of my comment now covered.

  3. Item add the app cli #3 — impact-elevation lock-in (curate-tool.test.ts:687). The test was tightened correctly: by adding sunrise_cafe_menu.md back to the proposed relations, the only would-be-lost item is the filtered legacy_recipe.abstract.md. So expect(result.applied[0].impact).to.equal('low') now genuinely exercises the secondary effect — without the fix, countLostItems in conflict-detector.ts:75 would return 1, hasLoss flips true, and deriveImpactFromLoss returns 'high'. The inline comment at lines 663-665 spells out this counterfactual clearly. Excellent regression-prevention coverage.

  4. Item feat: update readme and contribuing guide #4 — length-aware guard (curate-tool.ts:1000). ?.length skips the no-op filter pass on empty arrays. Pure cosmetic but tidy.

Minor observations (defer-able, won't block)

  1. CROSS_REFERENCE test doesn't exercise the target filter against a real on-disk derived artifact. consolidate.test.ts:308-333 only seeds jwt.md and oauth.md — the .abstract.md / .overview.md paths in action.files don't exist on disk. So even without the new eligibleFiles filter, addRelatedLinks would fail at readFile and silently early-return (consolidate.ts:651-652). The fix is correct as a defensive measure, but the test only proves payload filtering — not the target filter. A stronger test would seed auth/intro.abstract.md with frontmatter and assert it's byte-identical after consolidate. Pure defensive testing.

  2. previousTexts still includes derived artifacts (consolidate.ts:607-613). The loop iterates raw action.files, so if an .abstract.md is in there with content in fileContents, it gets into the operation result's previousTexts. Same for reviewBackupStore.save() at line 617-621 and inputFiles: action.files at line 636. Harmless (derived artifacts aren't re-synced and don't drive review UI behavior), but it's a dangling spot of inconsistency with the new filter. Cosmetic — feel free to defer.

Verification not done

  • npm test / npm run lint / npm run typecheck — dependency install not available in this environment. Trust CI on the PR.

Summary

This is the right shape of follow-up: surgical, scoped, and each change traceable to a specific review item. The blocking issue from review #1 stays fixed (verified existingParsed.relations filter still in place at line 1000-1003 with the tighter guard), and the new test now locks in both the file-content invariant and the impact-elevation invariant. Ship it.
· branch fix/ENG-2676

Copy link
Copy Markdown
Collaborator

@RyanNg1403 RyanNg1403 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E2E verified on a paid-tier project: brv curate writes clean related: despite *.abstract.md / *.overview.md siblings on disk, and brv dream CROSS_REFERENCE strips both incoming AND pre-existing derived-artifact entries on every touched doc — fix works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants