
feat(agents): context compaction (budget notices, mid-turn floor, background compaction)#4605

Open
kevin-dp wants to merge 30 commits into
main from
feat/context-compaction


@kevin-dp

Contributor

Adds context compaction to the agents runtime, which previously only truncated at the window limit. Modelled on OpenAI Codex's summarization but adapted to our event-sourced timeline: a compaction checkpoint is a durable context_inserted row placed at a stored watermark, so history reconstruction folds everything up to the watermark into a summary.

What's included

  • Token-usage gauge — persist cache-inclusive context_input_tokens + context_window per step; ContextUsageIndicator shows "X% used" in the composer footer.
  • Budget notices — inject a <token_budget> message into the model context at 25 / 50 / 75% usage (synthesized at the model-call seam, not persisted).
  • Tool-output truncation — cap any single oversized tool_result with a placeholder.
  • Mid-turn synchronous floor — at the 90% hard ceiling, compact before the model step via the adapter's transformContext hook (so a single tool-heavy turn can't blow the window).
  • Background (turn-end) compaction — non-blocking: at 85% a detached summarize runs off the critical path; its checkpoint is applied at the next turn's start, or immediately if it finishes while idle. Each generation uses a watermark-unique checkpoint id so a new run can't supersede a prior completed one, and summarize calls are bounded by a hard timeout.
  • UI — a "Compacting…" indicator (distinct blocking vs. background styling) and a collapsible "Context compacted" entry in the conversation timeline.

Design notes

  • Reconstruction places the summary at the stored watermark, so a prompt+answer that arrived while a background summary ran (physically after the checkpoint, logically after the watermark) are kept verbatim after the summary.
  • Only complete checkpoints act as a reconstruction watermark; running/failed are UI-only and never hide history (crash-safe).
  • Thresholds are env-tunable: ELECTRIC_AGENTS_COMPACT_CEILING, ELECTRIC_AGENTS_COMPACT_BG_CEILING, ELECTRIC_AGENTS_COMPACT_MIN_TOKENS.

Testing

  • Full runtime suite green (1020 passing), including dedicated compaction, reconstruction, mid-turn, and background unit tests.
  • Verified live in the desktop app end-to-end (compaction applies, context shrinks, indicator clears).

Note

PR #4596 (context-usage ring gauge + breakdown) is stacked on this branch; re-target it to main after this merges.

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

Electric Agents Desktop Builds

Build artifacts for commit 28ce457.

Platform | Status | Artifact
macOS Apple Silicon | Passed | DMG
macOS Intel | Passed | DMG
Windows x64 | Passed | Installer
Linux x64 | Passed | AppImage / deb

Workflow run

@netlify

netlify Bot commented Jun 17, 2026


Deploy Preview for electric-next ready!

Name | Link
🔨 Latest commit | 4d680ea
🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/6a326faf70e5790008d42977
😎 Deploy Preview | https://deploy-preview-4605--electric-next.netlify.app

@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 77.99401% with 147 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.89%. Comparing base (8f4368d) to head (28ce457).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
packages/agents-runtime/src/process-wake.ts | 45.65% | 50 Missing ⚠️
...s-server-ui/src/components/CompactionIndicator.tsx | 0.00% | 27 Missing ⚠️
...server-ui/src/components/ContextUsageIndicator.tsx | 0.00% | 26 Missing ⚠️
...agents-server-ui/src/components/EntityTimeline.tsx | 0.00% | 14 Missing ⚠️
packages/agents-runtime/src/context-factory.ts | 93.33% | 10 Missing ⚠️
packages/agents-server-ui/src/lib/compaction.ts | 0.00% | 9 Missing ⚠️
...es/agents-server-ui/src/hooks/useEntityTimeline.ts | 0.00% | 4 Missing ⚠️
...s/agents-server-ui/src/components/MessageInput.tsx | 0.00% | 3 Missing ⚠️
packages/agents-runtime/src/pi-adapter.ts | 96.36% | 2 Missing ⚠️
packages/agents-runtime/src/timeline-context.ts | 96.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4605      +/-   ##
==========================================
+ Coverage   57.56%   57.89%   +0.32%     
==========================================
  Files         342      350       +8     
  Lines       39993    40644     +651     
  Branches    11633    11828     +195     
==========================================
+ Hits        23023    23529     +506     
- Misses      16933    17078     +145     
  Partials       37       37              
Flag | Coverage Δ
packages/agents | 72.75% <ø> (ø)
packages/agents-mobile | 80.67% <ø> (ø)
packages/agents-runtime | 83.20% <88.94%> (+0.24%) ⬆️
packages/agents-server | 75.32% <ø> (-0.22%) ⬇️
packages/agents-server-ui | 7.51% <6.74%> (-0.01%) ⬇️
packages/electric-ax | 47.62% <ø> (ø)
typescript | 57.89% <77.99%> (+0.32%) ⬆️
unit-tests | 57.89% <77.99%> (+0.32%) ⬆️

Flags with carried forward coverage won't be shown.


@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

Electric Agents Mobile Build

Local mobile checks ran for commit 28ce457.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

@kevin-dp

Contributor Author

🤖 Automated review — context compaction

Generated by a review agent. Severity-ranked; cite file:line.

Findings

[high] mid-turn checkpoint over-covers and drops the kept tail on the next reconstruction (context-factory.ts: mid-turn writeCheckpoint; timeline-context.ts: compactionWatermarkOf)
The mid-turn compactor keeps messages.slice(coveredCount) (the last COMPACT_KEEP_TAIL=6 messages) verbatim for the current model call, but the persisted complete checkpoint is written with the fixed id compaction and no attrs.watermark. So compactionWatermarkOf falls back to the checkpoint row's own stream order, which is the latest order at write time. On the next turn, reconstruction drops every timeline item with at <= checkpoint.at — including the 6 tail messages, which are neither in the summary (it only covered 0..coveredCount) nor kept → lost context. Background compaction avoids this by storing watermark: head; the mid-turn path should likewise persist the boundary watermark it actually summarized. No test reconstructs across a mid-turn compaction.

[low] an orphaned running checkpoint pins the UI indicator forever (CompactionIndicator.tsx)
The indicator picks the globally newest kind:"compaction" row and spins if it is running. Reconstruction correctly ignores orphaned running rows, but the UI has no equivalent guard: if a process crashes after writing a background running row and before its terminal complete/failed, the spinner stays forever (no later row supersedes it, since each generation has a unique id). Cosmetic.

[low] stale "95%" comments (compaction.ts, token-accountant.ts JSDoc, context-factory.ts)
Several comments still say 95% / "critical at 95%" but CONTEXT_USAGE_HARD_CEILING = 0.9. Stale only.

[nit] — pi-adapter.ts pendingRequestMessageCount = messages.length records the pre-compaction length while the request actually sent is the shorter compacted list. Because anchorTokens is re-anchored to real cache-inclusive usage each step-end, the drift is bounded and only affects when the mid-turn trigger fires — not correctness.
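To make the [high] finding concrete, here is a minimal sketch of watermark-based reconstruction. It assumes a simplified timeline where each item carries a numeric stream position `at` and a checkpoint may carry an optional `watermark`. `TimelineItem`, `Checkpoint` and `reconstruct` are hypothetical names, and the body of `compactionWatermarkOf` only models the fallback described in the finding, not the real implementation.

```typescript
// Hypothetical, simplified shapes; the real timeline rows carry more fields.
interface TimelineItem {
  at: number; // stream position
  text: string;
}

interface Checkpoint {
  at: number; // stream position of the checkpoint row itself
  status: "running" | "complete" | "failed";
  summary: string;
  watermark?: number; // boundary the summary actually covers, if stored
}

// Models the fallback described in the finding: with no stored watermark,
// the checkpoint row's own (latest) position becomes the boundary.
function compactionWatermarkOf(cp: Checkpoint): number {
  return cp.watermark ?? cp.at;
}

// Only complete checkpoints act as a watermark. Items at or before it are
// replaced by the summary; items after it are kept verbatim.
function reconstruct(items: TimelineItem[], checkpoints: Checkpoint[]): string[] {
  const complete = checkpoints.filter((c) => c.status === "complete");
  if (complete.length === 0) return items.map((i) => i.text);
  const latest = complete.reduce((a, b) => (b.at > a.at ? b : a));
  const watermark = compactionWatermarkOf(latest);
  const kept = items.filter((i) => i.at > watermark).map((i) => i.text);
  return [latest.summary, ...kept];
}

// Ten messages: the summary covered m1..m4 and m5..m10 were the verbatim tail.
const items: TimelineItem[] = Array.from({ length: 10 }, (_, k) => ({
  at: k + 1,
  text: `m${k + 1}`,
}));
const noWatermark: Checkpoint = { at: 11, status: "complete", summary: "S(m1..m4)" };
const boundaryWatermark: Checkpoint = { ...noWatermark, watermark: 4 };

console.log(reconstruct(items, [noWatermark]).length); // 1: the tail m5..m10 is lost
console.log(reconstruct(items, [boundaryWatermark]).length); // 7: summary + m5..m10
```

The fix adopted later in this thread takes the other route: the mid-turn summary covers everything and the watermark is the timeline head, so summary and watermark agree by construction.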

Verified correct

  • Tool-pair invariant in reconstruction (each run emits its tool_call+tool_result together under one at, so a watermark can't split a within-run pair).
  • Background generation ids (compaction-bg-<watermark>) correctly prevent a new running from superseding a prior complete.
  • running/crashed checkpoints never hide history.
  • Idle-apply path: no double-apply (pendingBackgroundCompaction nulled after applying; pre-handler and idle apply are mutually exclusive); the settle-just-before-idle race is handled; the onSettled slot-identity guard prevents stale callbacks.
  • Summarize timeout: call.catch(()=>{}) attached synchronously before the race (no unhandled rejection), timer cleared in finally. Tested for stall/abort/empty-summary.
  • Schema fields additive/optional; reconstruction tolerates missing fields.
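The summarize-timeout pattern verified above (a hard race timer, a no-op `.catch` attached before racing, `clearTimeout` in `finally`) can be sketched generically as follows. `withHardTimeout` is a hypothetical helper name; per the background-compaction commit, the real compaction-summarize.ts also forwards `timeoutMs` and an `AbortSignal` to the provider, which this sketch omits.

```typescript
// Hypothetical helper: races `call` against a hard deadline. Whichever settles
// first wins, and the loser can never surface as an unhandled rejection.
async function withHardTimeout<T>(call: Promise<T>, timeoutMs: number): Promise<T> {
  // Attach a no-op handler synchronously, before racing, so a late rejection
  // of `call` is always handled, whichever promise wins.
  call.catch(() => {});
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`summarize timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([call, deadline]);
  } finally {
    // Always clear the timer so a fast call doesn't keep the event loop alive.
    clearTimeout(timer);
  }
}

// A stalled stream never settles; the deadline turns it into an ordinary failure.
const stalled = new Promise<string>(() => {});
withHardTimeout(stalled, 50).catch((e: Error) => console.log(e.message));
```

A timeout surfaces as a normal rejection, so callers can treat it like any other summarize failure (in this PR, a failed checkpoint retried at the next turn end).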

Test gaps

  • No test reconstructs across a mid-turn checkpoint (exactly the [high] path).
  • No orphaned-running UI indicator test.
  • Idle-apply loop and "settle just before idle" race not unit-tested at the process-wake level.

Overall

The architecture is sound and unusually well-commented; the background lifecycle, supersession, watermark ordering, and timeout race are carefully designed and well-tested. The one finding I'd block on is the mid-turn over-coverage ([high]): unlike background compaction, the synchronous floor doesn't persist a watermark, so its checkpoint silently drops the ~6 verbatim tail messages on the following turn's reconstruction — a real (if bounded) context-loss bug on the 90% path. I'd want that fixed (persist the boundary watermark) plus a reconstruction test before merge; everything else is low/nit/cosmetic.

kevin-dp added a commit that referenced this pull request Jun 17, 2026
…4605]

The mid-turn (sync floor) checkpoint was persisted with no attrs.watermark, so
reconstruction fell back to the checkpoint row's own (latest) stream position
and, on the next turn, dropped every item before it — including the verbatim
tail the mid-turn summary deliberately excluded (keepTail). Those recent
messages were in neither the summary nor the kept set, so they were silently
lost on the 90% path.

Fix: the mid-turn compactor now folds the WHOLE context into the summary
(Codex-style — no verbatim pre-compaction tail), and the checkpoint is stamped
with watermark = current timeline head. Summary and watermark now agree, so
reconstruction folds exactly what was summarized and keeps everything appended
afterward (the model's post-compaction output + the next prompt). This also
removes the keepTail + tool-pair orphan-guard complexity.

Recent context is still shown after a compaction via the sticky view (summary +
messages appended since), so within-turn coherence is preserved.

Also drop the stale "95%" hard-ceiling comments (it has been 90% since the
ceiling was lowered to match Codex).

Tests: rewrote compaction-midturn for the summarize-everything behavior; added a
reconstruction test asserting post-compaction messages survive a mid-turn
checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kevin-dp

Contributor Author

Addressed the review (8ce00aa):

[high] mid-turn tail-loss — fixed. The mid-turn checkpoint was persisted with no attrs.watermark, so reconstruction fell back to the checkpoint row's own (latest) position and dropped the verbatim tail the summary had excluded. Fix: the mid-turn compactor now folds the whole context into the summary (Codex-style, no verbatim pre-compaction tail) and the checkpoint is stamped with watermark = current timeline head. Summary and watermark now agree, so reconstruction folds exactly what was summarized and keeps everything appended afterward. This also removed the keepTail + tool-pair orphan-guard machinery. Added the missing reconstruction test (timeline-compaction.test.ts → "mid-turn checkpoint keeps messages produced after compaction") and rewrote compaction-midturn.test.ts for the new behavior. Within-turn recency is preserved via the sticky view (summary + messages appended since).

[low] stale "95%" comments — fixed (hard ceiling is 90%).

[low] orphaned running UI spinner & [nit] pendingRequestMessageCount — acknowledged, left as-is: both are cosmetic/bounded and not context-affecting; happy to follow up separately if wanted.

Full runtime suite green (tsc clean).

kevin-dp added a commit that referenced this pull request Jun 17, 2026
…#4605]

A summarize is bounded by a ~120s hard timeout after which a terminal
(complete/failed) checkpoint is always written, so a `running` checkpoint that
lingers well past that is orphaned — its process crashed before writing the
terminal row, and (with watermark-unique background ids) nothing supersedes it,
pinning the spinner forever. Treat a `running` checkpoint older than 150s as
stale and stop showing it, with a self-clearing timer so the spinner disappears
even when no further events arrive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kevin-dp added a commit that referenced this pull request Jun 17, 2026
…4605]

`pendingRequestMessageCount` records the incoming (uncompacted) message count,
not the compacted list the adapter may return. That's intentional: pi-agent
passes transformContext the full conversation each step, so the count indexes
that original array — the next step's trailing slice then measures exactly the
messages appended since. Add a comment so it doesn't read as a bug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
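The anchoring this commit relies on (real cache-inclusive usage from the last step, plus an estimate for only the messages appended since) can be sketched as below. `AgentMsg`, `ContextSignal`, `estimateTokens` and `contextTokensNow` are hypothetical names, and the 4-characters-per-token estimate is an assumption for illustration; the adapter's actual estimator is not shown in this thread.

```typescript
interface AgentMsg {
  role: "user" | "assistant" | "toolResult";
  text: string;
}

// Assumed heuristic: roughly 4 characters per token.
function estimateTokens(messages: AgentMsg[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.text.length / 4), 0);
}

// Hypothetical shape of the adapter's per-step state.
interface ContextSignal {
  anchorTokens: number; // real cache-inclusive usage reported at the last step end
  pendingRequestMessageCount: number; // length of the FULL (uncompacted) array at that step
}

// pi-agent passes transformContext the full conversation each step, so slicing
// that array at the recorded count yields exactly the messages appended since.
function contextTokensNow(full: AgentMsg[], signal: ContextSignal): number {
  const appended = full.slice(signal.pendingRequestMessageCount);
  return signal.anchorTokens + estimateTokens(appended);
}
```

Recording the compacted length instead would make the slice start too early and re-count messages already included in `anchorTokens`, which is why the incoming count is the right index.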
@kevin-dp

Contributor Author

Circled back and addressed the remaining two as well, each in its own commit:

  • [low] orphaned running spinner (c313578): a summarize always writes a terminal row within its ~120s timeout, so a running checkpoint older than 150s is treated as orphaned (crashed) and no longer rendered, with a self-clearing timer so it disappears even with no further events.
  • [nit] pendingRequestMessageCount (0bc94ad): added a comment clarifying that it intentionally records the incoming (uncompacted) count, since pi-agent passes transformContext the full conversation each step (so the count indexes that array and the next step's trailing slice is exact).

@claude

claude Bot commented Jun 17, 2026


Claude Code Review

Summary

Adds context compaction to the agents runtime: a context-window usage gauge, model-facing budget notices, oversized tool-output truncation, a synchronous mid-turn compaction floor at the 90% hard ceiling, and non-blocking background (turn-end) compaction — all built on the event-sourced timeline as durable context_inserted checkpoints placed at a stored watermark. Since iteration 6 the only change is a one-line test-fixture fix (commit 28ce45798) that restores the agents-runtime typecheck after the rebase. I re-verified it and the PR remains ready to merge.

What's Working Well

(Unchanged since iteration 6 — re-verified intact.)

  • Crash-safe reconstruction (timeline-context.ts): only complete checkpoints act as a watermark; running/failed rows are UI-only and never hide history.
  • Mid-turn summary/watermark agreement (compaction-midturn.ts + context-factory.ts): the mid-turn compactor folds the whole context (no verbatim tail) and the checkpoint is stamped with watermark = timeline head, snapshotted before the summarize await, so an event materializing during a slow summarize cannot push the head past what the summary covered.
  • Watermark-unique background ids (compaction-bg-<watermark>) so a new running generation cannot supersede a prior complete one.
  • Bounded summarize (compaction-summarize.ts): signal/timeoutMs plus a hard race timer, losing promise rejection swallowed synchronously, clearTimeout in finally.
  • Single source of truth for usage (token-accountant.ts) shared by the UI gauge and runtime triggers.

Review of Changes Since Iteration 6

  • test(agents): add missing doUnobserve to a context config fixture (28ce45798): the runAgent budget-notice fixture (context-factory.test.ts:601) predated main's now-required doUnobserve field on HandlerContextConfig (context-factory.ts:169); the rebase auto-merged without flagging it, breaking the agents-runtime typecheck. The one-line addition brings this fixture in line with the five other HandlerContextConfig fixtures in the same file (lines 100/193/342/470) and the shared helper (context-test-helpers.ts:345), all of which already supply doUnobserve. Correct, test-only, no runtime impact.

Issues Found

Critical (Must Fix): None.

Important (Should Fix): None.

Suggestions (Nice to Have):

  • Stale PR-description env var (still open). The "Design notes" section still lists ELECTRIC_AGENTS_COMPACT_MIN_TOKENS as an env-tunable threshold, but the mid-turn min-tokens floor was dropped in 86d5e6689 and the var is no longer referenced anywhere under packages/agents-runtime/src/. The changeset correctly omits it (only ELECTRIC_AGENTS_COMPACT_CEILING and ELECTRIC_AGENTS_COMPACT_BG_CEILING). Worth trimming the description so the only public-facing doc of the knobs matches the code. Cosmetic; no code impact.

Issue Conformance

No linked issue — per project convention a warning, substantially mitigated by an unusually detailed PR description and changeset that enumerate scope. Implementation matches the described scope (modulo the stale MIN_TOKENS mention above); no scope creep observed.

Previous Review Status

Incremental review (iteration 7). All items raised across iterations 1–6 remain resolved. The sole change since iteration 6 is the test-fixture typecheck fix described above; the compaction logic is unchanged. Codecov reports all tests successful. The one cosmetic doc nit (stale MIN_TOKENS in the PR description) is still open but is non-blocking.


Review iteration: 7 | 2026-06-18

kevin-dp added a commit that referenced this pull request Jun 17, 2026
#4605]

The mid-turn checkpoint computed its watermark (timeline head) when the
`complete` row was written — after the summarize await. Any event that
materialized into the StreamDB during that (slow) await would bump the head
past what the summary actually covered, so reconstruction could drop an
un-summarized item next turn (item.at <= watermark). Narrow (mid-turn blocks
the agent, and pending inbox rows don't materialize) but timing-dependent.

Snapshot the head when the `running` row is written instead — before the
await — so coverage and watermark derive from the same instant, matching how
background compaction already captures its head up front.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
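The ordering fix can be sketched as follows, assuming injected `headOf`, `summarize` and `writeCheckpoint` functions (all hypothetical; the real code lives in context-factory.ts). The point is only that the watermark is read once, before the `await`, and reused for the terminal row; the shared id "compaction" matches the mid-turn id mentioned in the review.

```typescript
type Status = "running" | "complete" | "failed";

// Hypothetical row shape; real checkpoints are context_inserted rows with attrs.
interface CheckpointRow {
  id: string;
  status: Status;
  watermark: number;
  summary?: string;
}

// Hypothetical dependencies, injected so the sketch is self-contained.
interface Deps {
  headOf: () => number; // current timeline head (stream position)
  summarize: () => Promise<string>;
  writeCheckpoint: (row: CheckpointRow) => void; // upsert by id
}

async function compactMidTurn(deps: Deps): Promise<CheckpointRow> {
  // Snapshot the head BEFORE the slow await, so the summary's coverage and the
  // stored watermark derive from the same instant.
  const watermark = deps.headOf();
  deps.writeCheckpoint({ id: "compaction", status: "running", watermark });
  try {
    const summary = await deps.summarize();
    const row: CheckpointRow = { id: "compaction", status: "complete", watermark, summary };
    deps.writeCheckpoint(row);
    return row;
  } catch {
    // A failed checkpoint is a UI-only marker and never hides history.
    const row: CheckpointRow = { id: "compaction", status: "failed", watermark };
    deps.writeCheckpoint(row);
    return row;
  }
}
```

Reading `headOf()` after the `await` instead would let any event that materialized during the summarize raise the watermark past what the summary covered.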
kevin-dp added a commit that referenced this pull request Jun 17, 2026
…eview #4605]

The orphan-clearing logic added to CompactionIndicator had no test, and a
regression where rows stop carrying `timestamp` would silently revert to the
lingering-spinner bug (NaN → not orphaned → spinner stays). This package has no
React-render harness, so extract the decision into a pure
`isRunningCheckpointOrphaned(timestamp, now)` helper (with STALE_RUNNING_MS) in
lib/ and unit-test it: fresh → shown, just under the deadline → shown, at/past
the deadline → hidden, and missing/unparseable timestamp → shown (documented).
The component now imports the helper, so the tested logic is exactly what runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
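Based on the behaviour this commit lists, the helper might look like the sketch below. The name `isRunningCheckpointOrphaned(timestamp, now)` and the constant `STALE_RUNNING_MS` come from the commit; the argument types (an ISO-8601 string or undefined, and epoch milliseconds) and the body are assumptions.

```typescript
// 150 s: comfortably past the ~120 s summarize timeout, after which a terminal
// (complete/failed) row is always written.
const STALE_RUNNING_MS = 150_000;

// Sketch (assumed types): true when a `running` checkpoint is old enough that
// its process must have crashed before writing a terminal row. A missing or
// unparseable timestamp yields false, so the spinner stays visible rather
// than being hidden on a NaN comparison.
function isRunningCheckpointOrphaned(timestamp: string | undefined, now: number): boolean {
  if (timestamp === undefined) return false;
  const startedAt = Date.parse(timestamp);
  if (Number.isNaN(startedAt)) return false;
  return now - startedAt >= STALE_RUNNING_MS;
}
```

For the self-clearing timer, a component could schedule a single re-render at `startedAt + STALE_RUNNING_MS`, so the spinner disappears even when no further events arrive.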
@kevin-dp

kevin-dp commented Jun 17, 2026

Contributor Author

Thanks for the careful review. Went through each point:

Addressed (one commit each):

  • Suggestion 1: mid-turn watermark captured after the await (b130e42a0). Agreed, real (if narrow) timing-dependence. The watermark is now snapshotted when the running row is written, before the summarize await, so an event materializing into the StreamDB during the slow summarize can't bump the head past what the summary covered. Mirrors how background compaction snapshots its head up front.

  • Important: untested orphan-clearing logic (445c7552e). Agreed. This package has no React-render harness (no jsdom/testing-library; passWithNoTests), so rather than pull that in I extracted the decision into a pure isRunningCheckpointOrphaned(timestamp, now) helper (+ STALE_RUNNING_MS) and unit-tested it: fresh → shown, just under deadline → shown, at/past deadline → hidden, and missing/unparseable timestamp → shown (the exact NaN path you flagged, now locked in). The component imports the helper, so the tested logic is what runs.

Declined (with rationale):

  • Suggestion 2: reuse selectLatestContextUsage in ContextUsageIndicator. The usage math is already shared (the component calls computeContextUsage); the only duplication is the "latest step by _seq" loop, and the component needs the full latest row (its model_id, plus the context_breakdown on the stacked #4596, "feat(agents): context-usage ring gauge + composition breakdown"), so a single pass returning the row is cleaner here than selectLatestContextUsage plus a second lookup. Happy to revisit if you'd prefer a shared row-returning selectLatestContextStep helper.

  • Suggestion 3: as any on the query collection. It's an established pattern across the existing UI query sources (e.g. lib/compaction.ts); properly typing q.from(db.collections.x) is a worthwhile but separate cross-cutting effort rather than something to one-off here.

Full runtime suite green; UI typechecks + the new helper test pass.

kevin-dp and others added 15 commits June 18, 2026 12:10
Foundation for context compaction: measure and surface how full the
model's context window is, with no behavior change yet.

- pi-adapter: capture cache-INCLUSIVE prompt size (input + cacheRead +
  cacheWrite) and the model context window per step. The existing
  input_tokens deliberately excludes cache reads for budget accounting,
  but cached tokens still occupy the window, so a fullness gauge needs
  the inclusive total.
- persist context_input_tokens + context_window on the step row
  (optional/additive) via the outbound bridge.
- token-accountant: single source of truth for usage ratio, severity
  level, and the compaction thresholds (85% background / 95% ceiling).
- UI: ContextUsageIndicator renders "NN% used" in the composer footer
  from the same helper, coloured at the 85/95 thresholds.

Observational only — nothing compacts yet; this validates the token
accounting before later phases act on it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
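A minimal sketch of the accounting this commit describes, assuming a provider usage object with `input`, `cacheRead` and `cacheWrite` fields. `CONTEXT_USAGE_HARD_CEILING` is the runtime's constant (shown at its later value of 0.9; this commit originally used 0.95); `CONTEXT_USAGE_BG_CEILING`, `contextInputTokens`, `contextUsageLevel` and the level names are hypothetical.

```typescript
// Assumed usage shape, mirroring the fields named in the commit.
interface StepUsage {
  input: number; // uncached prompt tokens
  cacheRead: number;
  cacheWrite: number;
}

const CONTEXT_USAGE_BG_CEILING = 0.85; // hypothetical name for the background threshold
const CONTEXT_USAGE_HARD_CEILING = 0.9; // later lowered from 0.95 to match Codex

// Cached tokens still occupy the window, so fullness needs the inclusive total,
// unlike budget accounting, which deliberately excludes cache reads.
function contextInputTokens(u: StepUsage): number {
  return u.input + u.cacheRead + u.cacheWrite;
}

type UsageLevel = "normal" | "elevated" | "critical"; // hypothetical level names

function contextUsageLevel(contextTokens: number, contextWindow: number): UsageLevel {
  if (contextWindow <= 0) return "normal"; // unknown window: don't alarm
  const ratio = contextTokens / contextWindow;
  if (ratio >= CONTEXT_USAGE_HARD_CEILING) return "critical";
  if (ratio >= CONTEXT_USAGE_BG_CEILING) return "elevated";
  return "normal";
}
```

With an uncached `input` of 150, the gauge counts 1350 tokens in the follow-up test below, so the split between cache reads and writes (1000/200 in the check here) is illustrative.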
…ndow in pi-adapter

The bridge persistence and the usage-ratio helper were covered, but the
adapter's cache-INCLUSIVE total (input + cacheRead + cacheWrite) — the
accuracy premise of the context gauge — was not. Assert it equals 1350
where the uncached input_tokens is 150, and that context_window is emitted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Once context usage reaches 25%, inject a <token_budget> notice into the
model's messages stating remaining tokens + percent, so the model can
pace itself. Recomputed each call from the latest step's persisted
usage, so it is always current.

- token-accountant: selectLatestContextUsage (latest step with usage),
  shouldSurfaceContextBudget (gate at 25%), formatContextBudgetNotice,
  and withContextBudgetNotice (inject just before the final message).
- context-factory/runAgent: synthesize and inject before the model call.

Synthesized (not persisted) on purpose: a self-superseding context row
would leave load_context_history tombstones, which are misleading for an
ephemeral budget hint. Synthesis stays deterministic (pure function of
persisted steps) so replay/fork reproduce it, with no row churn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
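The three helpers named above can be sketched like this. The 25% gate, the "remaining tokens + percent" content, and injection just before the final message come from this commit and the test that follows; the `ModelMessage` shape, the tag layout, and rounding to thousands are assumptions for illustration.

```typescript
// Assumed message shape; the runtime's real model messages differ.
interface ModelMessage {
  role: "user" | "assistant" | "toolResult";
  content: string;
}

const BUDGET_NOTICE_MIN_RATIO = 0.25;

function shouldSurfaceContextBudget(used: number, contextWindow: number): boolean {
  return contextWindow > 0 && used / contextWindow >= BUDGET_NOTICE_MIN_RATIO;
}

// Assumed wording: "<N>k tokens (<P>%) remaining" inside a <token_budget> tag.
function formatContextBudgetNotice(used: number, contextWindow: number): string {
  const remaining = Math.max(0, contextWindow - used);
  const pct = Math.round((remaining / contextWindow) * 100);
  const thousands = Math.round(remaining / 1000);
  return `<token_budget>${thousands}k tokens (${pct}%) remaining</token_budget>`;
}

// Synthesized per model call and never persisted: a pure function of the
// latest step's usage, inserted just before the final message.
function withContextBudgetNotice(
  messages: ModelMessage[],
  used: number,
  contextWindow: number,
): ModelMessage[] {
  if (messages.length === 0 || !shouldSurfaceContextBudget(used, contextWindow)) return messages;
  const notice: ModelMessage = { role: "user", content: formatContextBudgetNotice(used, contextWindow) };
  return [...messages.slice(0, -1), notice, messages[messages.length - 1]];
}
```

Because the notice is derived only from persisted step usage, replaying or forking a conversation regenerates the same notice without writing extra rows.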
Drive a real agent run with a capturing streamFn and assert the
<token_budget> notice reaches the model's context, gated on a seeded
step's usage: present at 80% usage (with correct "20k tokens (20%)
remaining" wording), absent at 10%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se 2)

Cap any single tool_result at ~10k tokens, replacing the body with a
visible "[Output truncated: ...]" placeholder before the model call, so
one giant output can't fill the context window on its own. Preserves
toolCallId/isError so tool-call pairing stays valid. Mirrors Codex's
per-message truncation. Truncation is always explicit, never silent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Foundation for context compaction: a compaction checkpoint is a
context_inserted row tagged attrs.kind="compaction". timelineMessages
now treats the newest such checkpoint's order as a watermark — items
before it are dropped (summarized away) and the checkpoint renders the
summary in their place. No checkpoint -> watermark is -Infinity -> a
strict no-op, so this is inert until the summarizer (next step) writes
one.

Adds compaction.ts with the checkpoint constants and the Codex
summarization prompt + summary prefix (reused verbatim) for the
summarization step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
summarizeMessages sends the full history + Codex's summarization prompt
to the conversation's own model (a cheap small-window model would
overflow a near-full context) and prefixes the result with Codex's
summary preamble. The model call is injected via a `complete` seam
(defaults to pi-ai completeSimple) so it is unit-testable without a
network call or API key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires the compaction engine into the model-call path. Before a turn, if
the last step left context at/over 95% of the window (and the history is
actually large), summarize it, persist a compaction checkpoint
(context_inserted kind=compaction), and send only the summary this turn
— the current ask still arrives via runInput, and future turns
reconstruct from the checkpoint watermark. Failure degrades gracefully
(logs and proceeds uncompacted).

- AgentConfig.summarizeComplete: model-call seam for the summarizer
  (defaults to the conversation model; injected by tests).
- guard on estimated history size avoids re-compacting an already
  compacted (small) history while the last step's usage reads stale-high.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CONTEXT_USAGE_HARD_CEILING 0.95 -> 0.90, matching Codex's auto-compaction
threshold. Drives the synchronous compaction trigger and the UI gauge's
"critical" colour. Background start stays at 85%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drives the real runtime end-to-end and makes a real Anthropic
summarization call, asserting the returned summary carries the Codex
prefix and retains key conversation facts. Skipped unless
RUN_LIVE_COMPACTION=1 + LIVE_ANTHROPIC_API_KEY are set, so it never runs
(or costs) in CI. Verified passing with claude-haiku-4-5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Its purpose (confirming a real model summarizes correctly through the
compaction path) is done. Ongoing coverage lives in the deterministic
stubbed tests (compaction-trigger, compaction-summarize,
timeline-compaction); a live test pinning a model id + needing a paid
key would only rot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e/failed)

Foundation for surfacing synchronous compaction in the UI. The trigger
now writes a `running` checkpoint before summarizing and updates it to
`complete` (with the summary) or `failed` after. Only a `complete`
checkpoint acts as the reconstruction watermark, so an in-flight or
crashed compaction never hides history; running/failed checkpoints are
UI-only markers, skipped from the model context.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Shows a spinner + "Compacting context…" in the composer footer while a
synchronous compaction is in flight, reading the latest compaction
checkpoint row and showing it while attrs.status is "running" (clears on
complete). Tells the user why the turn paused and that their next prompt
is being queued. Mirrors the ContextUsageIndicator pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ELECTRIC_AGENTS_COMPACT_CEILING (0..1, default 0.9) and
ELECTRIC_AGENTS_COMPACT_MIN_TOKENS (default contextWindow/2) let the
synchronous compaction path be exercised without filling a real window.
RFC §12 tunables; default behavior unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
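Reading the ceiling tunables might look like the sketch below. The variable names and the 0.9 default come from this commit, and the 0.85 background default from the background-compaction commit; the helper `readRatioEnv` and its fall-back-on-invalid behaviour are assumptions. ELECTRIC_AGENTS_COMPACT_MIN_TOKENS is left out because a later commit drops it.

```typescript
// Hypothetical helper: parses a ratio in [0, 1]; a missing, blank,
// non-numeric, or out-of-range value falls back to the default.
function readRatioEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) && value >= 0 && value <= 1 ? value : fallback;
}

// In the runtime this would read process.env; a literal keeps the sketch standalone.
const exampleEnv: Record<string, string | undefined> = {
  ELECTRIC_AGENTS_COMPACT_CEILING: "0.3", // low ceiling to exercise compaction in testing
};
const compactCeiling = readRatioEnv(exampleEnv, "ELECTRIC_AGENTS_COMPACT_CEILING", 0.9);
const compactBgCeiling = readRatioEnv(exampleEnv, "ELECTRIC_AGENTS_COMPACT_BG_CEILING", 0.85);
console.log(compactCeiling, compactBgCeiling);
```

Falling back on invalid input keeps a typo in a tuning variable from disabling compaction or triggering it on every step.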
Surfaces a completed compaction checkpoint as a collapsed, expandable
"Context compacted" marker in the message history, at the point
compaction happened. Adds a compaction custom timeline source
(mirroring the comments source) reading compaction_summary
context_inserted rows, a compaction row kind across the timeline
dispatch, and a CompactionTimelineRow card (InlineEventCard, expand to
view the summary). Running/failed checkpoints are filtered out — those
are shown by the live composer indicator instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kevin-dp and others added 14 commits June 18, 2026 12:11
Compaction now runs before EVERY model step (not just between turns), so
a single turn that balloons across many tool calls can no longer exhaust
the context window before the turn ends. Wired through pi-agent-core's
transformContext hook.

- compaction-midturn.ts: createMidTurnCompactor — folds older messages
  into a summary, returns [summary, ...recent tail], caches the summary
  for the rest of the turn (re-summarizing chained off the prior summary
  only if the tail grows back over the ceiling).
- pi-adapter: Codex-style token signal — anchor on the last step's REAL
  cache-inclusive usage + estimate only the trailing items appended
  since, vs the model's real context window (not an estimate of the whole
  history). transformContext + initialContextTokens options added.
- context-factory: build the compactor and pass it to the adapter;
  removed the per-turn pre-sampling compaction block it replaces.

Reuses the summarizer, checkpoint lifecycle (running/complete/failed),
and UI. summarizeAgentMessages summarizes the AgentMessage[] the hook
receives without re-converting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The keepTail boundary could split a tool_call/tool_result pair — folding
the assistant tool_use into the summary while keeping the matching
tool_result in the tail. Anthropic rejects the orphaned tool_result
(400 invalid_request_error: "tool_result must have a corresponding
tool_use block in the previous message"). Advance the fold boundary past
any leading tool-result messages so the kept tail starts on a fresh
user/assistant turn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
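The boundary adjustment can be sketched as follows, assuming messages tagged with a `role` where tool results use "toolResult" (an assumed shape). `advanceFoldBoundary` is a hypothetical name. A later commit in this PR removed the verbatim tail, and this guard with it, by folding the whole context into the summary.

```typescript
type Role = "user" | "assistant" | "toolResult";

// Assumed message shape for illustration.
interface Msg {
  role: Role;
  text: string;
}

// Hypothetical helper. `foldEnd` is the index of the first message kept
// verbatim. If the kept tail would start with tool results, their tool_use
// lives in the folded part and the provider would reject them as orphans,
// so fold those results too.
function advanceFoldBoundary(messages: Msg[], foldEnd: number): number {
  let i = foldEnd;
  while (i < messages.length && messages[i].role === "toolResult") i++;
  return i;
}

const history: Msg[] = [
  { role: "user", text: "run the tests" },
  { role: "assistant", text: "tool_use: run(...)" },
  { role: "toolResult", text: "10 passed" },
  { role: "toolResult", text: "lint ok" },
  { role: "assistant", text: "All green." },
  { role: "user", text: "ship it" },
];
console.log(advanceFoldBoundary(history, 2)); // 4: the kept tail starts at "All green."
```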
Non-blocking compaction so the user almost never waits for the 90% sync
floor. Self-contained so it can be reverted as one commit.

- Trigger (process-wake): after a turn whose usage ≥ 85% (env
  ELECTRIC_AGENTS_COMPACT_BG_CEILING), kick off a DETACHED summarization.
  Its checkpoint is applied at the NEXT turn's start, OR — if the summarize
  finishes while the entity is idle — immediately by waking the idle loop
  and writing the checkpoint WITHOUT running the agent (so the indicator
  never lingers past completion). The slow summarize never blocks; a fast
  follow-up prompt just runs un-compacted.
- context-factory: exposes maybeStartBackgroundCompaction,
  writeBackgroundCheckpoint and failBackgroundCheckpoint on the handler-context
  result. Snapshots the timeline head as the watermark, summarizes the whole
  reconstructed history, and writes a background-flavored running→complete
  checkpoint.
- Unique checkpoint id per generation (compaction-bg-<watermark>): context
  supersession keys on id alone, so with one shared id the NEXT background's
  `running` row silently superseded the PREVIOUS `complete` one, erasing
  the watermark and undoing every compaction (the context never shrank and the
  indicator stayed stuck on "running"). The running/complete/failed rows of one
  generation share the id, so the next generation can't clobber it. Mid-turn
  sync keeps the shared id on purpose (its re-summarization chain wants
  supersession).
- Reconstruction (timeline-context): checkpoints carry a stored
  attrs.watermark and the summary is rendered AT that watermark, so a
  prompt+answer that arrived while a background summary ran (physically
  after the checkpoint, logically after the watermark) are kept verbatim
  AFTER the summary (RFC §8.5). Sync checkpoints, which have no stored
  watermark, fall back to the row's own stream position.
- UI: CompactionIndicator shows a subtle "Compacting in background…" for
  background checkpoints, distinct from the blocking sync indicator.
- Summarize hardening (compaction-summarize): bound every summarize call
  with a 120s deadline. The anthropic provider only enforces a timeout or
  abort when the caller passes timeoutMs/signal, and it never retries, so a
  stalled stream hung forever. Forward timeoutMs plus an AbortSignal the
  provider honours AND race a hard timer; a timeout becomes an ordinary
  failure that is retried at the next turn end.
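The summarize-hardening bullet combines two mechanisms: passing a deadline and an AbortSignal down to the provider, and racing a local timer in case the provider ignores both. A hedged sketch, where `SummarizeFn`, `SummarizeTimeoutError` and `summarizeWithDeadline` are hypothetical names and the provider call is a stub:

```typescript
// Hypothetical provider-call shape; only the timeoutMs/signal forwarding
// comes from the commit description.
type SummarizeFn = (opts: { timeoutMs: number; signal: AbortSignal }) => Promise<string>;

class SummarizeTimeoutError extends Error {
  constructor(ms: number) {
    super(`summarize timed out after ${ms}ms`);
    this.name = "SummarizeTimeoutError";
  }
}

// Forward the deadline and an AbortSignal to the provider, and also race a
// hard timer so a provider that ignores both still cannot hang the caller.
async function summarizeWithDeadline(summarize: SummarizeFn, timeoutMs = 120_000): Promise<string> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // let a cooperative provider stop streaming
      reject(new SummarizeTimeoutError(timeoutMs));
    }, timeoutMs);
  });
  try {
    // Promise.race also handles a late rejection from the losing promise.
    return await Promise.race([summarize({ timeoutMs, signal: controller.signal }), deadline]);
  } finally {
    clearTimeout(timer); // on success the deadline never fires
  }
}
```

A rejection from `summarizeWithDeadline` would then be recorded as an ordinary `failed` checkpoint, leaving the next turn end to retry.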

The mid-turn 90% sync floor stays as the safety net for single runaway
turns. To drop the feature: revert this commit.
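The two triggers live at different seams: the 90% floor is checked before each model step, the 85% background trigger once a turn ends. A minimal sketch of those checks with hypothetical names; the PR reads the background ceiling from ELECTRIC_AGENTS_COMPACT_BG_CEILING, which is modelled here as a plain parameter.

```typescript
// Hypothetical ceilings; 0.85 and 0.9 are the values described in this PR.
interface Ceilings {
  background: number; // turn-end, non-blocking
  sync: number; // mid-turn, blocking floor
}

const DEFAULT_CEILINGS: Ceilings = { background: 0.85, sync: 0.9 };

// Checked before each model step (e.g. inside a transformContext-style hook):
// at or above the floor, compact synchronously before sampling.
function needsSyncCompaction(usedTokens: number, contextWindow: number, c: Ceilings = DEFAULT_CEILINGS): boolean {
  return usedTokens / contextWindow >= c.sync;
}

// Checked once a turn ends: at or above the background ceiling, start a
// detached summarize whose checkpoint is applied later.
function shouldStartBackgroundCompaction(
  usedTokens: number,
  contextWindow: number,
  c: Ceilings = DEFAULT_CEILINGS,
): boolean {
  return usedTokens / contextWindow >= c.background;
}
```

Keeping the background ceiling below the floor means the blocking path only fires when usage reaches 90% before a background checkpoint has been applied, for example within a single tool-heavy turn.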

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `minTokens` guard (default 2000, env ELECTRIC_AGENTS_COMPACT_MIN_TOKENS)
never changed the outcome at any realistic ceiling — 90% of a real context
window is always far above 2000 tokens, so the floor only ever mattered when
testing with an artificially low ceiling. Codex has no equivalent floor (it
triggers on a single token threshold). Remove the knob, its env override, and
the now-unused positiveFromEnv helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…4605]

The mid-turn (sync floor) checkpoint was persisted with no attrs.watermark, so
reconstruction fell back to the checkpoint row's own (latest) stream position
and, on the next turn, dropped every item before it — including the verbatim
tail the mid-turn summary deliberately excluded (keepTail). Those recent
messages were in neither the summary nor the kept set, so they were silently
lost on the 90% path.

Fix: the mid-turn compactor now folds the WHOLE context into the summary
(Codex-style — no verbatim pre-compaction tail), and the checkpoint is stamped
with watermark = current timeline head. Summary and watermark now agree, so
reconstruction folds exactly what was summarized and keeps everything appended
afterward (the model's post-compaction output + the next prompt). This also
removes the keepTail + tool-pair orphan-guard complexity.
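A hedged sketch of watermark-based reconstruction with id-keyed supersession, using hypothetical simplified shapes (`Item`, `CheckpointRow`) rather than the real timeline schema. Choosing the complete checkpoint with the furthest watermark is an assumption; rows without a watermark fall back to their own position, as the mid-turn checkpoint did before this commit.

```typescript
// Hypothetical, simplified shapes; the real rows are event-sourced timeline entries.
interface Item {
  at: number; // stream position
  text: string;
}

interface CheckpointRow {
  id: string; // supersession key: a later row with the same id replaces an earlier one
  at: number; // stream position of the checkpoint row itself
  status: "running" | "complete" | "failed";
  summary?: string;
  watermark?: number; // last stream position the summary covers
}

// Apply supersession: for each id, only the latest row (by position) is in effect.
function effectiveCheckpoints(rows: CheckpointRow[]): CheckpointRow[] {
  const byId = new Map<string, CheckpointRow>();
  for (const row of [...rows].sort((a, b) => a.at - b.at)) byId.set(row.id, row);
  return Array.from(byId.values());
}

// Without a stored watermark, fall back to the row's own position.
const cutOf = (row: CheckpointRow): number => row.watermark ?? row.at;

// Render the summary AT its watermark: items at or before it are folded,
// items after it (even if they arrived before the checkpoint row) stay verbatim.
function reconstruct(items: Item[], rows: CheckpointRow[]): string[] {
  const complete = effectiveCheckpoints(rows).filter((r) => r.status === "complete");
  if (complete.length === 0) return items.map((i) => i.text);
  // Assumption: the complete checkpoint with the furthest watermark wins.
  const latest = complete.reduce((best, r) => (cutOf(r) > cutOf(best) ? r : best));
  const watermark = cutOf(latest);
  const tail = items.filter((i) => i.at > watermark).map((i) => i.text);
  return [`[summary] ${latest.summary ?? ""}`, ...tail];
}
```

With unique per-generation ids, a new `running` row leaves the previous `complete` row in effect; reusing one id would let the `running` row supersede it, and the history would be rebuilt uncompacted.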

Recent context is still shown after a compaction via the sticky view (summary +
messages appended since), so within-turn coherence is preserved.

Also drop the stale "95%" hard-ceiling comments (it has been 90% since the
ceiling was lowered to match Codex).

Tests: rewrote compaction-midturn for the summarize-everything behavior; added a
reconstruction test asserting post-compaction messages survive a mid-turn
checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…#4605]

A summarize is bounded by a ~120s hard timeout, after which a terminal
(complete/failed) checkpoint is always written, so a `running` checkpoint that
lingers well past that is orphaned — its process crashed before writing the
terminal row, and (with watermark-unique background ids) nothing supersedes it,
pinning the spinner forever. Treat a `running` checkpoint older than 150s as
stale and stop showing it, with a self-clearing timer so the spinner disappears
even when no further events arrive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…4605]

`pendingRequestMessageCount` records the incoming (uncompacted) message count,
not the compacted list the adapter may return. That's intentional: pi-agent
passes transformContext the full conversation each step, so the count indexes
that original array — the next step's trailing slice then measures exactly the
messages appended since. Add a comment so it doesn't read as a bug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#4605]

The mid-turn checkpoint computed its watermark (timeline head) when the
`complete` row was written — after the summarize await. Any event that
materialized into the StreamDB during that (slow) await would bump the head
past what the summary actually covered, so reconstruction could drop an
un-summarized item next turn (item.at <= watermark). The window is narrow
(mid-turn blocks the agent, and pending inbox rows don't materialize) but
timing-dependent.

Snapshot the head when the `running` row is written instead — before the
await — so coverage and watermark derive from the same instant, matching how
background compaction already captures its head up front.
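The ordering fix can be illustrated with a toy timeline; `Row`, `Timeline`, `append` and `midTurnCompact` are hypothetical, and the real checkpoint writer is not shown. The watermark is read before the `running` row and the await, so anything that materializes during the summarize lands after it and is kept verbatim:

```typescript
// Hypothetical toy timeline: `head` is the latest materialized stream position.
interface Row {
  at: number;
  kind: string;
  watermark?: number;
  summary?: string;
}

interface Timeline {
  head: number;
  rows: Row[];
}

function append(t: Timeline, row: Omit<Row, "at">): void {
  t.head += 1;
  t.rows.push({ ...row, at: t.head });
}

// Snapshot the head when the `running` row is written, before the slow
// summarize await. Reading it only when writing `complete` would also cover
// rows that materialized during the await but were never summarized.
async function midTurnCompact(t: Timeline, summarize: () => Promise<string>): Promise<number> {
  const watermark = t.head;
  append(t, { kind: "checkpoint:running", watermark });
  const summary = await summarize(); // other rows may materialize meanwhile
  append(t, { kind: "checkpoint:complete", watermark, summary });
  return watermark;
}
```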

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…eview #4605]

The orphan-clearing logic added to CompactionIndicator had no test, and a
regression where rows stop carrying `timestamp` would silently revert to the
lingering-spinner bug (NaN → not orphaned → spinner stays). This package has no
React-render harness, so extract the decision into a pure
`isRunningCheckpointOrphaned(timestamp, now)` helper (with STALE_RUNNING_MS) in
lib/ and unit-test it: fresh → shown, just under the deadline → shown, at/past
the deadline → hidden, and missing/unparseable timestamp → shown (documented).
The component now imports the helper, so the tested logic is exactly what runs.
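A sketch of what such a helper could look like. `isRunningCheckpointOrphaned` and `STALE_RUNNING_MS` are named in the commit; their exact signatures here (an ISO-8601 string timestamp, `now` in epoch milliseconds) are assumptions, and `msUntilOrphaned`, which a component could use to schedule the self-clearing timer, is hypothetical.

```typescript
// 150s: the ~120s summarize deadline plus slack.
export const STALE_RUNNING_MS = 150_000;

// True when a `running` checkpoint is old enough that its process must have
// died before writing a terminal row. A missing or unparseable timestamp is
// treated as NOT orphaned, so the indicator keeps showing (documented behavior).
export function isRunningCheckpointOrphaned(timestamp: string | undefined, now: number): boolean {
  if (timestamp === undefined) return false;
  const started = Date.parse(timestamp);
  if (Number.isNaN(started)) return false;
  return now - started >= STALE_RUNNING_MS;
}

// Hypothetical helper: delay until the checkpoint becomes orphaned, so a timer
// can hide the spinner even if no further events arrive.
export function msUntilOrphaned(timestamp: string | undefined, now: number): number | undefined {
  if (timestamp === undefined) return undefined;
  const started = Date.parse(timestamp);
  if (Number.isNaN(started)) return undefined;
  return Math.max(0, started + STALE_RUNNING_MS - now);
}
```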

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The RFC isn't checked in, so comments can't reference it; phase numbers are
internal sequencing, not something the code should narrate. Remove the dangling
RFC/§/phase references and tighten the surrounding comments to be brief and
only where they clarify something non-obvious.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the `for (;;)` + breaks with a `do…while (appliedBackgroundDuringIdle)`
so the loop's actual condition (re-idle while background compactions keep
settling) is explicit, and trim the over-long `pendingBackgroundCompaction`
comment.
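A minimal sketch of that loop shape, with hypothetical `waitForWake` and `applyBackgroundCheckpoint` callbacks standing in for the runtime's idle wait and checkpoint writer:

```typescript
// Hypothetical wake reasons for an idle entity.
type WakeReason = "message" | "background-compaction-settled";

// Stay idle while background compactions keep settling: each one writes its
// checkpoint WITHOUT running the agent, then the loop re-idles. A real
// message ends the loop so the caller can run the turn.
async function idleUntilMessage(
  waitForWake: () => Promise<WakeReason>,
  applyBackgroundCheckpoint: () => Promise<void>,
): Promise<void> {
  let appliedBackgroundDuringIdle: boolean;
  do {
    appliedBackgroundDuringIdle = false;
    const reason = await waitForWake();
    if (reason === "background-compaction-settled") {
      await applyBackgroundCheckpoint();
      appliedBackgroundDuringIdle = true;
    }
  } while (appliedBackgroundDuringIdle);
}
```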

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Tighten the over-long comments in the compaction indicator helper + component
and the reconstruction watermark block, dropping a redundant inline restatement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

kevin-dp force-pushed the feat/context-compaction branch from df21a6e to 295d9ab on June 18, 2026 10:11

The runAgent budget-notice fixture predated main's required doUnobserve
field on HandlerContextConfig; the rebase auto-merged without flagging it,
breaking the agents-runtime typecheck.