Add TraceDecay registry, LCM, and transcript audit improvements #142

Draft
ScriptedAlchemy wants to merge 6 commits into master from codex/tracedecay-introspection-improvements

ScriptedAlchemy (Owner) commented Jun 29, 2026

Summary

  • Add a first-class tracedecay projects CLI for global registry inspection:
    • tracedecay projects list
    • tracedecay projects search <query>
    • tracedecay projects context <selector>
    • text and --json output
  • Fix tracedecay_lcm_status for provider: "all" so it aggregates data across providers instead of treating "all" as a literal provider name.
  • Extend all-provider handling through LCM payload/GC reference helpers so status stays consistent across raw messages, payloads, DAG, lifecycle, redaction, and GC metadata.
  • Add catch_up: false support to tracedecay_message_search for strictly read-only audits of already-ingested transcript messages.
  • Make skill-writer automation search all LCM providers by default so Codex-origin evidence is not missed by the scheduled self-improvement path.
  • Fix tracedecay_lcm_grep handling of queries containing a # separator, such as issue#123, so Cursor transcript searches no longer hit SQLite FTS syntax errors.
  • Enrich tracedecay_context with a bounded set of project memory matches, controlled by include_memory, memory_limit, and memory_min_trust, so fact-store context is visible in the normal code-context path; these matches are untracked, so they do not inflate explicit retrieval/recall counters.
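
A minimal sketch of the provider: "all" semantics described above, assuming per-provider counters are already loaded into a map keyed by provider name. LcmCounts, its fields, and lcm_status are hypothetical names for illustration, not TraceDecay's actual types; the point is that "all" sums each provider exactly once rather than looking up a provider literally named "all".

```rust
use std::collections::BTreeMap;

/// Hypothetical per-provider LCM counters; field names are illustrative only.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct LcmCounts {
    pub raw_messages: u64,
    pub payloads: u64,
    pub summary_nodes: u64,
}

impl LcmCounts {
    /// Accumulate another provider's counts into this total.
    fn add(&mut self, other: &LcmCounts) {
        self.raw_messages += other.raw_messages;
        self.payloads += other.payloads;
        self.summary_nodes += other.summary_nodes;
    }
}

/// Resolve a provider selector. "all" sums every provider once; any other
/// value is a direct lookup, and unknown providers report zero counts.
pub fn lcm_status(per_provider: &BTreeMap<String, LcmCounts>, provider: &str) -> LcmCounts {
    if provider == "all" {
        let mut total = LcmCounts::default();
        for counts in per_provider.values() {
            total.add(counts);
        }
        total
    } else {
        per_provider.get(provider).copied().unwrap_or_default()
    }
}
```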

Audit basis

This branch came out of a 30-agent read-only audit of local Codex/Cursor/Hermes transcripts, TraceDecay project stores, LCM state, hook analytics, daemon logs, response handles, and global registry state.

Key observed problems:

  • provider="all" LCM status returned zero even when Codex/Cursor provider-specific counts were populated.
  • MCP had project registry tools, but the CLI lacked a direct project registry surface, forcing tracedecay tool ... or manual store inspection.
  • Active project identity has drifted: the global registry resolves proj_b4a8bbe4953823c4 to /home/zack/projects/tracedecay, while store metadata still references stale TokenSave roots.
  • Registry/storage hygiene is poor: thousands of stale /tmp aliases/project configs and orphan profile-sharded stores remain.
  • Usage analytics are not normalized: analytics_events, savings_ledger, and turns are empty in the active store, so adherence must be inferred from transcripts.
  • Hook/hint telemetry is incomplete for Codex and can lose JSONL records under concurrent writes.
  • Cursor has raw LCM data but zero summary nodes in the active store; Cursor logs also show literal ${workspaceFolder} startup failures and an FTS # query failure.
  • Daemon runtime/status output is too thin for self-diagnosis; scheduler skip noise and high memory/CPU usage need summarized status surfaces.
  • Response handles are mostly fixed locally, but expired handles remain in inactive/unregistered stores and selector mismatch can make retrieval fragile.
  • New follow-up from user: ignored dependency roots such as node_modules should stay ignored by default, but imported dependency symbols/types should be discoverable through lazy/on-demand dependency indexing instead of unignoring whole dependency trees.
  • Fact-store usage is materially underrepresented in transcripts compared with code-context/search calls, and the active store has very few durable facts relative to the number of sessions.
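
The Cursor # failure noted above is an FTS5 query-syntax problem: outside a double-quoted string, # is not a valid bareword character, so a query like issue#123 is rejected by the parser. A minimal sketch of one common mitigation, assuming the text is passed as an FTS5 MATCH expression; quote_fts5_terms is a hypothetical helper and not necessarily how this PR implements its tracedecay_lcm_grep fix.

```rust
/// Turn free-form user text into a safe FTS5 MATCH expression by quoting
/// each whitespace-separated term as an FTS5 string. Characters such as
/// `#`, `-`, or `:` are then treated as text instead of query syntax.
pub fn quote_fts5_terms(input: &str) -> Option<String> {
    let terms: Vec<String> = input
        .split_whitespace()
        // FTS5 escapes an embedded double quote by doubling it.
        .map(|term| format!("\"{}\"", term.replace('"', "\"\"")))
        .collect();
    if terms.is_empty() {
        // An empty MATCH expression is itself a syntax error; let the
        // caller decide what an empty search means.
        None
    } else {
        // Whitespace-separated strings are combined with an implicit AND.
        Some(terms.join(" "))
    }
}
```

Quoting loses FTS5 operators (OR, NEAR, prefix *), so a caller that wants to support advanced syntax would need to try the raw query first and only fall back to the quoted form on a syntax error.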

Tests

  • cargo fmt --check
  • cargo check --workspace
  • cargo test --test cli_help_test -- --nocapture
  • cargo test --test cli_non_interactive_test projects_ -- --nocapture
  • cargo test --test mcp_handler_test lcm_status_all_provider_aggregates_provider_counts -- --nocapture
  • cargo test --test mcp_handler_test lcm_status_all_provider_counts_payload_health_once -- --nocapture
  • cargo test --test mcp_handler_test lcm_status_ -- --nocapture --test-threads=1
  • cargo test --test mcp_handler_test message_search_catches_up_provider_transcripts_before_querying -- --nocapture
  • cargo test --test mcp_handler_test message_search_can_skip_catch_up_for_read_only_audits -- --nocapture
  • cargo test --test automation_skill_writer_runner_test -- --nocapture
  • cargo test --test session_lcm_query_test grep_like_fallback_handles_hash_separator_queries -- --nocapture
  • cargo test --test session_lcm_query_test grep_like_fallback_recalls_infix -- --nocapture
  • cargo test --test mcp_handler_test context_ -- --nocapture
  • cargo test --test mcp_handler_test context_memory -- --nocapture
  • cargo test --test memory_test search_facts_bump_access_count_only_for_returned_results -- --nocapture
  • cargo test --workspace --no-run
  • tracedecay sync

Note: running cargo test --test mcp_handler_test lcm_status_ -- --nocapture exposed a pre-existing fixture collision in lcm_status_reports_lifecycle_fields_and_resolved_storage_scope when tests run in parallel; the same filtered LCM status suite passes with --test-threads=1.

Follow-up work

  • Add registry/doctor checks for active root vs global registry vs config.json vs store_manifest.json identity drift.
  • Add storage hygiene reporting/GC for stale /tmp projects, orphan store dirs, branch DB retention, and response handles.
  • Populate normalized analytics for TraceDecay tool use, shell fallback, skill reads, hint shown, hint outcome, and transport failure reason.
  • Add durable hint/skill analytics so analytics_events.skill_name and analytics_events.hint_category are populated instead of requiring transcript inference.
  • Add per-project memory curation automation that mines repeated high-confidence decisions/preferences from sessions, dedupes them, and stages fact-store updates.
  • Continue improving transcript-audit safety by adding explicit project scoping defaults and normalized tool metadata.
  • Improve Cursor integration: avoid literal ${workspaceFolder} failures, add Cursor summary health checks, and sanitize LCM FTS queries containing # (or fall back to a non-FTS search).
  • Expand daemon/status tooling with runtime metrics, scheduler summaries, latest errors, and backlog/storage sizing.
  • Design lazy ignored-dependency indexing so project scans can keep node_modules/gitignored roots out of the primary graph while resolving imported symbols on demand.
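
The first follow-up item amounts to comparing one project's recorded root across several sources. A minimal sketch, assuming each source (global registry, config.json, store_manifest.json) has already been read into an optional root path; IdentitySources, identity_drift, and same_root are hypothetical names, and a real doctor check might also compare project IDs, not just roots.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical view of where one project's identity is recorded.
pub struct IdentitySources {
    pub active_root: PathBuf,
    pub registry_root: Option<PathBuf>,
    pub config_root: Option<PathBuf>,
    pub manifest_root: Option<PathBuf>,
}

/// Return one finding per source whose recorded root disagrees with the
/// active root; sources with no recorded root are reported as well.
pub fn identity_drift(src: &IdentitySources) -> Vec<String> {
    let checks: [(&str, &Option<PathBuf>); 3] = [
        ("global registry", &src.registry_root),
        ("config.json", &src.config_root),
        ("store_manifest.json", &src.manifest_root),
    ];
    let mut findings = Vec::new();
    for (label, recorded) in checks {
        match recorded {
            None => findings.push(format!("{label}: no root recorded")),
            Some(root) if !same_root(root, &src.active_root) => findings.push(format!(
                "{label}: {} != active root {}",
                root.display(),
                src.active_root.display()
            )),
            Some(_) => {}
        }
    }
    findings
}

/// Compare canonicalized paths when both exist, falling back to a literal
/// comparison for stale roots that are no longer on disk.
fn same_root(a: &Path, b: &Path) -> bool {
    match (a.canonicalize(), b.canonicalize()) {
        (Ok(x), Ok(y)) => x == y,
        _ => a == b,
    }
}
```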

changeset-bot (bot) commented Jun 29, 2026

⚠️ No Changeset found

Latest commit: 7c5a0ad

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


ScriptedAlchemy (Owner, Author) commented

Follow-up design: lazy indexing for ignored dependency symbols

User goal: keep expensive roots such as node_modules and normal .gitignore matches ignored by default, but still allow TraceDecay to answer questions about imported dependency types/symbols when project code references them.

Current relevant code/docs:

  • Exclusion/gitignore policy lives in src/config.rs (is_excluded, is_excluded_dir, is_included_dir) and defaults already exclude nested node_modules.
  • Full indexing scans only accepted project files in src/tracedecay.rs::index_all_with_progress_verbose via scan_files, then extracts nodes and resolves unresolved refs globally.
  • Reference resolution currently resolves only against nodes already in the graph (docs/INDEX-DESIGN.md says unresolved refs become edges after matching against indexed nodes).
  • User docs currently say missing symbols may be caused by skipped .gitignore files, but the only practical workaround is to un-ignore and index those paths wholesale.

Approaches considered:

  1. Sidecar dependency symbol cache, populated lazily. Recommended.

    • Keep ignored roots out of the primary project scan.
    • When unresolved refs/imports point at ignored dependency roots, resolve the package path using language/package-manager rules and index only the needed file(s) or package type entrypoints into a separate dependency graph scope/cache.
    • Add metadata to distinguish project vs dependency nodes, dependency package/version/root, and cache freshness.
    • Query tools can opt into dependency expansion when project graph has unresolved refs.
    • Best tradeoff: preserves fast default indexing while making dependency symbols discoverable on demand.
  2. Manifest/type-entrypoint indexing during sync.

    • For JS/TS, read package manifests and index only exported type entrypoints (types, exports, .d.ts) for imported packages.
    • Faster than full lazy source indexing, but language-specific and incomplete for packages without type entrypoints or for deep imports.
  3. User-managed include overlays.

    • Add commands/config to include selected dependency subtrees, e.g. one package under node_modules.
    • Simple, but pushes discovery burden to the user and risks repeated broad indexing mistakes.
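
For the recommended sidecar approach, the metadata listed under option 1 (project vs dependency scope, package/version/root, cache freshness) could be modeled as a scope tag on each node. A minimal sketch under that assumption; NodeScope, DependencyOrigin, and is_fresh are hypothetical names, and the content hash is assumed to be computed elsewhere over the indexed entrypoint files.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical scope tag attached to every graph node.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeScope {
    Project,
    Dependency(DependencyOrigin),
}

/// Where a lazily indexed dependency node came from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DependencyOrigin {
    pub package: String,
    pub version: String,
    pub root: String,
    /// Hash of the indexed entrypoint files at index time.
    pub content_hash: u64,
    pub indexed_at: SystemTime,
}

impl DependencyOrigin {
    /// A sidecar entry is reusable only if the installed version and file
    /// contents are unchanged and the TTL has not elapsed. A clock that
    /// moved backwards is treated as stale.
    pub fn is_fresh(&self, installed_version: &str, current_hash: u64, ttl: Duration, now: SystemTime) -> bool {
        let within_ttl = now
            .duration_since(self.indexed_at)
            .map(|age| age <= ttl)
            .unwrap_or(false);
        self.version == installed_version && self.content_hash == current_hash && within_ttl
    }
}
```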

Recommended first implementation slice:

  • Add an ignored_dependency_candidates table or metadata file recording imports/unresolved refs whose likely resolution target was skipped by ignore rules.
  • For TypeScript/JavaScript first, resolve package/deep import paths under node_modules without walking the whole tree.
  • Add tracedecay dependency index <package-or-import> or an MCP tool that lazily indexes only selected dependency entrypoint files into a sidecar scope.
  • Update symbol search/context tools to say when a missing symbol may exist behind an ignored dependency and suggest/trigger lazy indexing.
  • Add guardrails: max files/bytes per lazy index, package-version cache key, TTL/hash invalidation, and explicit user/agent-visible scope (dependency, not project-owned code).
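
Resolving a package or deep import under node_modules does not require walking the tree: the specifier can be split into a package name and subpath, then probed at node_modules in each ancestor directory, a simplified form of Node's node_modules lookup. A minimal sketch; split_specifier and find_package_root are hypothetical helpers, and the sketch ignores package.json exports/types fields, symlinked workspaces, and other package-manager layouts.

```rust
use std::path::{Path, PathBuf};

/// Split a bare import specifier into (package name, optional subpath).
/// Scoped names such as "@scope/pkg" span two segments. Relative and
/// absolute specifiers return None because they are not dependency imports.
pub fn split_specifier(spec: &str) -> Option<(String, Option<String>)> {
    if spec.is_empty() || spec.starts_with('.') || spec.starts_with('/') {
        return None;
    }
    let parts: Vec<&str> = spec.split('/').collect();
    let name_len = if spec.starts_with('@') { 2 } else { 1 };
    if parts.len() < name_len || parts[..name_len].iter().any(|p| p.is_empty()) {
        return None;
    }
    let name = parts[..name_len].join("/");
    let rest = parts[name_len..].join("/");
    let subpath = if rest.is_empty() { None } else { Some(rest) };
    Some((name, subpath))
}

/// Probe `<dir>/node_modules/<name>` for the importing file's directory and
/// each ancestor, returning the nearest match. This checks one path per
/// ancestor instead of walking any node_modules tree.
pub fn find_package_root(importer: &Path, name: &str) -> Option<PathBuf> {
    importer
        .ancestors()
        .skip(1) // skip the file itself; start at its directory
        .map(|dir| dir.join("node_modules").join(name))
        .find(|candidate| candidate.is_dir())
}
```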

Likely target areas:

  • src/config.rs: classify ignored paths vs dependency roots rather than only excluded/not excluded.
  • src/tracedecay.rs: scan/index orchestration and separate dependency scope wiring.
  • reference resolver path around unresolved refs: emit dependency-resolution candidates instead of silently leaving refs unresolved.
  • JS/TS extractor/import parsing tests: start with import type { Foo } from "pkg" and deep import cases.
  • CLI/MCP definitions: lazy dependency index/status/query affordance.
  • Tests: default node_modules remains excluded from full sync; importing a dependency records a lazy candidate; explicit lazy index adds only selected dependency symbols; search/context can find the lazily indexed type without indexing all of node_modules.
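
The src/config.rs target area suggests a three-way classification instead of a boolean excluded flag. A minimal sketch, assuming the caller already knows whether a relative path matches .gitignore; PathClass and classify are hypothetical names, and only node_modules is recognized as a dependency root here.

```rust
use std::path::{Component, Path};

/// Hypothetical three-way classification replacing a plain excluded flag.
#[derive(Debug, PartialEq, Eq)]
pub enum PathClass {
    Included,
    Ignored,
    /// Excluded from full sync but eligible for lazy dependency indexing;
    /// `package_dir` runs up to and including the package folder.
    DependencyRoot { package_dir: String },
}

pub fn classify(rel_path: &Path, is_gitignored: bool) -> PathClass {
    let parts: Vec<String> = rel_path
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => Some(s.to_string_lossy().into_owned()),
            _ => None,
        })
        .collect();
    // Use the innermost node_modules so nested installs map to their own package.
    if let Some(idx) = parts.iter().rposition(|p| p == "node_modules") {
        // A scoped package spans two segments after node_modules.
        let name_len = match parts.get(idx + 1) {
            Some(first) if first.starts_with('@') => 2,
            Some(_) => 1,
            None => 0,
        };
        if name_len > 0 && parts.len() > idx + name_len {
            let package_dir = parts[..=idx + name_len].join("/");
            return PathClass::DependencyRoot { package_dir };
        }
        // node_modules itself, or an incomplete scope directory.
        return PathClass::Ignored;
    }
    if is_gitignored { PathClass::Ignored } else { PathClass::Included }
}
```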

Open design question:

  • Whether lazy indexing should be automatic on first missing-symbol query or require an explicit command/tool call. My recommendation is explicit for the first implementation, then add safe automatic mode once size limits and cache invalidation are proven.
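
If the first implementation is explicit-only, the admission check can still be written so that an automatic mode drops in later behind the same max files/bytes guardrails. A minimal sketch; LazyIndexMode, LazyIndexBudget, and admit are hypothetical names, and any budget values a caller passes are placeholders rather than tuned limits.

```rust
/// Hypothetical lazy-index modes; Explicit is the recommended first step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LazyIndexMode {
    Explicit,
    Automatic,
}

/// Per-request size limits for a lazy dependency index.
#[derive(Debug, Clone, Copy)]
pub struct LazyIndexBudget {
    pub max_files: usize,
    pub max_bytes: u64,
}

/// Decide whether a lazy index request may run. In Explicit mode only
/// explicitly requested work is admitted, and every admitted request must
/// fit the budget so no call can index a whole dependency tree.
pub fn admit(
    mode: LazyIndexMode,
    explicitly_requested: bool,
    file_sizes: &[u64],
    budget: LazyIndexBudget,
) -> Result<(), String> {
    if mode == LazyIndexMode::Explicit && !explicitly_requested {
        return Err("lazy indexing requires an explicit command or tool call".to_string());
    }
    if file_sizes.len() > budget.max_files {
        return Err(format!("{} files exceeds limit of {}", file_sizes.len(), budget.max_files));
    }
    let total: u64 = file_sizes.iter().sum();
    if total > budget.max_bytes {
        return Err(format!("{total} bytes exceeds limit of {}", budget.max_bytes));
    }
    Ok(())
}
```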

ScriptedAlchemy force-pushed the codex/tracedecay-introspection-improvements branch from f7390f1 to 476377c June 29, 2026 06:55
ScriptedAlchemy force-pushed the codex/tracedecay-introspection-improvements branch from 476377c to 1d61cdb June 29, 2026 06:59
ScriptedAlchemy changed the title from "Add project registry CLI and aggregate LCM status" to "Add TraceDecay registry, LCM, and transcript audit improvements" Jun 29, 2026