feat(agents): context compaction (budget notices, mid-turn floor, background compaction) by kevin-dp · Pull Request #4605 · electric-sql/electric

kevin-dp · 2026-06-17T08:13:06Z

Adds context compaction to the agents runtime, which previously only truncated at the window limit. Modelled on OpenAI Codex's summarization but adapted to our event-sourced timeline: a compaction checkpoint is a durable context_inserted row placed at a stored watermark, so history reconstruction folds everything up to the watermark into a summary.

What's included

Token-usage gauge — persist cache-inclusive context_input_tokens + context_window per step; ContextUsageIndicator shows "X% used" in the composer footer.
Budget notices — inject a <token_budget> message into the model context at 25 / 50 / 75% usage (synthesized at the model-call seam, not persisted).
Tool-output truncation — cap any single oversized tool_result with a placeholder.
Mid-turn synchronous floor — at the 90% hard ceiling, compact before the model step via the adapter's transformContext hook (so a single tool-heavy turn can't blow the window).
Background (turn-end) compaction — non-blocking: at 85% a detached summarize runs off the critical path; its checkpoint is applied at the next turn's start, or immediately if it finishes while idle. Each generation uses a watermark-unique checkpoint id so a new run can't supersede a prior completed one, and summarize calls are bounded by a hard timeout.
UI — a "Compacting…" indicator (distinct blocking vs. background styling) and a collapsible "Context compacted" entry in the conversation timeline.
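For readers skimming the list above, here is a minimal TypeScript sketch of how the stated thresholds relate to each other. The names `usageRatio` and `compactionActionFor` and the action labels are hypothetical and not taken from the PR; only the 25%, 85% and 90% values come from the description. The sketch also ignores when each check runs (model-call seam, turn end, before a model step).

```typescript
// Hypothetical sketch, not the PR's code: maps a context-usage ratio to the
// behaviour described in the list above.
type CompactionAction =
  | "none"
  | "budget-notice"
  | "background-compaction"
  | "sync-compaction";

const FIRST_NOTICE_THRESHOLD = 0.25; // budget notices start here (25 / 50 / 75%)
const BACKGROUND_CEILING = 0.85; // turn-end, non-blocking
const HARD_CEILING = 0.9; // mid-turn, blocking

// Cache-inclusive prompt tokens divided by the model's context window.
function usageRatio(contextInputTokens: number, contextWindow: number): number {
  if (contextWindow <= 0) return 0;
  return contextInputTokens / contextWindow;
}

function compactionActionFor(ratio: number): CompactionAction {
  if (ratio >= HARD_CEILING) return "sync-compaction";
  if (ratio >= BACKGROUND_CEILING) return "background-compaction";
  if (ratio >= FIRST_NOTICE_THRESHOLD) return "budget-notice";
  return "none";
}
```

The checks run from the highest threshold down so that a 92% reading maps to the blocking path rather than to a notice.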

Design notes

Reconstruction places the summary at the stored watermark, so a prompt+answer that arrived while a background summary ran (physically after the checkpoint, logically after the watermark) are kept verbatim after the summary.
Only complete checkpoints act as a reconstruction watermark; running/failed are UI-only and never hide history (crash-safe).
Thresholds are env-tunable: ELECTRIC_AGENTS_COMPACT_CEILING, ELECTRIC_AGENTS_COMPACT_BG_CEILING, ELECTRIC_AGENTS_COMPACT_MIN_TOKENS.
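The first two design notes can be illustrated with a small sketch of watermark-based reconstruction. The row shapes and the `reconstruct` function are hypothetical simplifications, and picking the complete checkpoint with the highest watermark is an assumption; the real timeline rows and selection logic live in timeline-context.ts.

```typescript
// Hypothetical shapes; real timeline rows carry more fields.
interface TimelineItem {
  at: number; // stream order
  text: string;
}

interface Checkpoint {
  status: "running" | "complete" | "failed";
  watermark: number; // stored boundary the summary covers
  summary?: string;
}

function reconstruct(items: TimelineItem[], checkpoints: Checkpoint[]): string[] {
  // Only complete checkpoints may hide history; running/failed are UI-only.
  const complete = checkpoints.filter((c) => c.status === "complete");
  if (complete.length === 0) return items.map((i) => i.text);

  // Assumption: the complete checkpoint with the highest watermark wins.
  const latest = complete.reduce((a, b) => (b.watermark > a.watermark ? b : a));

  // Items at or before the watermark are folded into the summary;
  // everything logically after it is kept verbatim.
  const kept = items.filter((i) => i.at > latest.watermark).map((i) => i.text);
  return [latest.summary ?? "", ...kept];
}
```

With items at orders 1 to 4 and a complete checkpoint at watermark 2, the result is the summary followed by items 3 and 4, which matches the "kept verbatim after the summary" behaviour described above.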

Testing

Full runtime suite green (1020 passing), including dedicated compaction, reconstruction, mid-turn, and background unit tests.
Verified live in the desktop app end-to-end (compaction applies, context shrinks, indicator clears).

Note

PR #4596 (context-usage ring gauge + breakdown) is stacked on this branch; re-target it to main after this merges.

🤖 Generated with Claude Code

github-actions · 2026-06-17T08:13:56Z

Electric Agents Desktop Builds

Build artifacts for commit 28ce457.

Platform	Status	Artifact
macOS Apple Silicon	Passed	DMG
macOS Intel	Passed	DMG
Windows x64	Passed	Installer
Linux x64	Passed	AppImage / deb

Workflow run

netlify · 2026-06-17T08:15:02Z

✅ Deploy Preview for electric-next ready!

Name	Link
🔨 Latest commit	`4d680ea`
🔍 Latest deploy log	https://app.netlify.com/projects/electric-next/deploys/6a326faf70e5790008d42977
😎 Deploy Preview	https://deploy-preview-4605--electric-next.netlify.app

codecov · 2026-06-17T08:15:16Z

Codecov Report

❌ Patch coverage is 77.99401% with 147 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.89%. Comparing base (8f4368d) to head (28ce457).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
packages/agents-runtime/src/process-wake.ts	45.65%	50 Missing ⚠️
...s-server-ui/src/components/CompactionIndicator.tsx	0.00%	27 Missing ⚠️
...server-ui/src/components/ContextUsageIndicator.tsx	0.00%	26 Missing ⚠️
...agents-server-ui/src/components/EntityTimeline.tsx	0.00%	14 Missing ⚠️
packages/agents-runtime/src/context-factory.ts	93.33%	10 Missing ⚠️
packages/agents-server-ui/src/lib/compaction.ts	0.00%	9 Missing ⚠️
...es/agents-server-ui/src/hooks/useEntityTimeline.ts	0.00%	4 Missing ⚠️
...s/agents-server-ui/src/components/MessageInput.tsx	0.00%	3 Missing ⚠️
packages/agents-runtime/src/pi-adapter.ts	96.36%	2 Missing ⚠️
packages/agents-runtime/src/timeline-context.ts	96.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4605      +/-   ##
==========================================
+ Coverage   57.56%   57.89%   +0.32%     
==========================================
  Files         342      350       +8     
  Lines       39993    40644     +651     
  Branches    11633    11828     +195     
==========================================
+ Hits        23023    23529     +506     
- Misses      16933    17078     +145     
  Partials       37       37

Flag	Coverage Δ
packages/agents	`72.75% <ø> (ø)`
packages/agents-mobile	`80.67% <ø> (ø)`
packages/agents-runtime	`83.20% <88.94%> (+0.24%)`	⬆️
packages/agents-server	`75.32% <ø> (-0.22%)`	⬇️
packages/agents-server-ui	`7.51% <6.74%> (-0.01%)`	⬇️
packages/electric-ax	`47.62% <ø> (ø)`
typescript	`57.89% <77.99%> (+0.32%)`	⬆️
unit-tests	`57.89% <77.99%> (+0.32%)`	⬆️


github-actions · 2026-06-17T08:17:34Z

Electric Agents Mobile Build

Local mobile checks ran for commit 28ce457.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

kevin-dp · 2026-06-17T08:53:28Z

🤖 Automated review — context compaction

Generated by a review agent. Severity-ranked; cite file:line.

Findings

[high] — mid-turn checkpoint over-covers and drops the kept tail on the next reconstruction — context-factory.ts (mid-turn writeCheckpoint) + timeline-context.ts compactionWatermarkOf
The mid-turn compactor keeps messages.slice(coveredCount) (the last COMPACT_KEEP_TAIL=6 messages) verbatim for the current model call, but the persisted complete checkpoint is written with the fixed id compaction and no attrs.watermark. So compactionWatermarkOf falls back to the checkpoint row's own stream order, which is the latest order at write time. On the next turn, reconstruction drops every timeline item with at <= checkpoint.at — including the 6 tail messages, which are neither in the summary (it only covered 0..coveredCount) nor kept → lost context. Background compaction avoids this by storing watermark: head; the mid-turn path should likewise persist the boundary watermark it actually summarized. No test reconstructs across a mid-turn compaction.

[low] — an orphaned running checkpoint pins the UI indicator forever — CompactionIndicator.tsx
The indicator picks the globally-newest kind:"compaction" row and spins if it is running. Reconstruction correctly ignores orphaned running rows, but the UI has no equivalent guard: if a process crashes after writing a background running row and before its terminal complete/failed, the spinner stays forever (no later row supersedes it, since each generation has a unique id). Cosmetic.

[low] — stale "95%" comments — compaction.ts, token-accountant.ts JSDoc, context-factory.ts
Several comments still say 95% / "critical at 95%" but CONTEXT_USAGE_HARD_CEILING = 0.9. Stale only.

[nit] — pi-adapter.ts pendingRequestMessageCount = messages.length records the pre-compaction length while the request actually sent is the shorter compacted list. Because anchorTokens is re-anchored to real cache-inclusive usage each step-end, the drift is bounded and only affects when the mid-turn trigger fires — not correctness.

Verified correct

Tool-pair invariant in reconstruction (each run emits its tool_call+tool_result together under one at, so a watermark can't split a within-run pair).
Background generation ids (compaction-bg-<watermark>) correctly prevent a new running from superseding a prior complete.
running/crashed checkpoints never hide history.
Idle-apply path: no double-apply (pendingBackgroundCompaction nulled after applying; pre-handler and idle apply are mutually exclusive); the settle-just-before-idle race is handled; the onSettled slot-identity guard prevents stale callbacks.
Summarize timeout: call.catch(()=>{}) attached synchronously before the race (no unhandled rejection), timer cleared in finally. Tested for stall/abort/empty-summary.
Schema fields additive/optional; reconstruction tolerates missing fields.
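The summarize-timeout pattern noted as verified above (swallow handler attached synchronously before the race, timer cleared in finally) can be sketched as follows. `withHardTimeout` is a hypothetical helper name; the actual code in compaction-summarize.ts also forwards timeoutMs and an AbortSignal to the provider, which this sketch omits.

```typescript
// Hypothetical helper, not the PR's code: bounds a promise with a hard timer.
async function withHardTimeout<T>(call: Promise<T>, timeoutMs: number): Promise<T> {
  // Attach a no-op handler synchronously, before racing, so that a rejection
  // arriving after the timer has already won is never reported as unhandled.
  call.catch(() => {});

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });

  try {
    return await Promise.race([call, deadline]);
  } finally {
    // Always clear the timer so a fast call does not leave it pending.
    clearTimeout(timer);
  }
}
```

A timeout surfaces as an ordinary rejection, so the caller can treat it like any other summarize failure.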

Test gaps

No test reconstructs across a mid-turn checkpoint (exactly the [high] path).
No orphaned-running UI indicator test.
Idle-apply loop and "settle just before idle" race not unit-tested at the process-wake level.

Overall

The architecture is sound and unusually well-commented; the background lifecycle, supersession, watermark ordering, and timeout race are carefully designed and well-tested. The one finding I'd block on is the mid-turn over-coverage ([high]): unlike background compaction, the synchronous floor doesn't persist a watermark, so its checkpoint silently drops the ~6 verbatim tail messages on the following turn's reconstruction — a real (if bounded) context-loss bug on the 90% path. I'd want that fixed (persist the boundary watermark) plus a reconstruction test before merge; everything else is low/nit/cosmetic.

…4605] The mid-turn (sync floor) checkpoint was persisted with no attrs.watermark, so reconstruction fell back to the checkpoint row's own (latest) stream position and, on the next turn, dropped every item before it — including the verbatim tail the mid-turn summary deliberately excluded (keepTail). Those recent messages were in neither the summary nor the kept set, so they were silently lost on the 90% path. Fix: the mid-turn compactor now folds the WHOLE context into the summary (Codex-style — no verbatim pre-compaction tail), and the checkpoint is stamped with watermark = current timeline head. Summary and watermark now agree, so reconstruction folds exactly what was summarized and keeps everything appended afterward (the model's post-compaction output + the next prompt). This also removes the keepTail + tool-pair orphan-guard complexity. Recent context is still shown after a compaction via the sticky view (summary + messages appended since), so within-turn coherence is preserved. Also drop the stale "95%" hard-ceiling comments (it has been 90% since the ceiling was lowered to match Codex). Tests: rewrote compaction-midturn for the summarize-everything behavior; added a reconstruction test asserting post-compaction messages survive a mid-turn checkpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

kevin-dp · 2026-06-17T09:13:29Z

Addressed the review (8ce00aa):

[high] mid-turn tail-loss — fixed. The mid-turn checkpoint was persisted with no attrs.watermark, so reconstruction fell back to the checkpoint row's own (latest) position and dropped the verbatim tail the summary had excluded. Fix: the mid-turn compactor now folds the whole context into the summary (Codex-style, no verbatim pre-compaction tail) and the checkpoint is stamped with watermark = current timeline head. Summary and watermark now agree, so reconstruction folds exactly what was summarized and keeps everything appended afterward. This also removed the keepTail + tool-pair orphan-guard machinery. Added the missing reconstruction test (timeline-compaction.test.ts → "mid-turn checkpoint keeps messages produced after compaction") and rewrote compaction-midturn.test.ts for the new behavior. Within-turn recency is preserved via the sticky view (summary + messages appended since).

[low] stale "95%" comments — fixed (hard ceiling is 90%).

[low] orphaned running UI spinner & [nit] pendingRequestMessageCount — acknowledged, left as-is: both are cosmetic/bounded and not context-affecting; happy to follow up separately if wanted.

Full runtime suite green (tsc clean).

…#4605] A summarize is bounded by a ~120s hard timeout after which a terminal (complete/failed) checkpoint is always written, so a `running` checkpoint that lingers well past that is orphaned — its process crashed before writing the terminal row, and (with watermark-unique background ids) nothing supersedes it, pinning the spinner forever. Treat a `running` checkpoint older than 150s as stale and stop showing it, with a self-clearing timer so the spinner disappears even when no further events arrive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…4605] `pendingRequestMessageCount` records the incoming (uncompacted) message count, not the compacted list the adapter may return. That's intentional: pi-agent passes transformContext the full conversation each step, so the count indexes that original array — the next step's trailing slice then measures exactly the messages appended since. Add a comment so it doesn't read as a bug. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

kevin-dp · 2026-06-17T09:20:46Z

Circled back and addressed the remaining two as well, each in its own commit:

[low] orphaned running spinner — c313578: a summarize always writes a terminal row within its ~120s timeout, so a running checkpoint older than 150s is treated as orphaned (crashed) and no longer rendered, with a self-clearing timer so it disappears even with no further events.
[nit] pendingRequestMessageCount — 0bc94ad: added a comment clarifying it intentionally records the incoming (uncompacted) count, since pi-agent passes transformContext the full conversation each step (so the count indexes that array and the next step's trailing slice is exact).

claude · 2026-06-17T09:28:28Z

Claude Code Review

Summary

Adds context compaction to the agents runtime: a context-window usage gauge, model-facing budget notices, oversized tool-output truncation, a synchronous mid-turn compaction floor at the 90% hard ceiling, and non-blocking background (turn-end) compaction — all built on the event-sourced timeline as durable context_inserted checkpoints placed at a stored watermark. Since iteration 6 the only change is a one-line test-fixture fix (commit 28ce45798) that restores the agents-runtime typecheck after the rebase. I re-verified it and the PR remains ready to merge.

What's Working Well

(Unchanged since iteration 6 — re-verified intact.)

Crash-safe reconstruction (timeline-context.ts): only complete checkpoints act as a watermark; running/failed rows are UI-only and never hide history.
Mid-turn summary/watermark agreement (compaction-midturn.ts + context-factory.ts): the mid-turn compactor folds the whole context (no verbatim tail) and the checkpoint is stamped with watermark = timeline head, snapshotted before the summarize await, so an event materializing during a slow summarize cannot push the head past what the summary covered.
Watermark-unique background ids (compaction-bg-<watermark>) so a new running generation cannot supersede a prior complete one.
Bounded summarize (compaction-summarize.ts): signal/timeoutMs plus a hard race timer, losing promise rejection swallowed synchronously, clearTimeout in finally.
Single source of truth for usage (token-accountant.ts) shared by the UI gauge and runtime triggers.

Review of Changes Since Iteration 6

test(agents): add missing doUnobserve to a context config fixture (28ce45798): the runAgent budget-notice fixture (context-factory.test.ts:601) predated main's now-required doUnobserve field on HandlerContextConfig (context-factory.ts:169); the rebase auto-merged without flagging it, breaking the agents-runtime typecheck. The one-line addition brings this fixture in line with the five other HandlerContextConfig fixtures in the same file (lines 100/193/342/470) and the shared helper (context-test-helpers.ts:345), all of which already supply doUnobserve. Correct, test-only, no runtime impact.

Issues Found

Critical (Must Fix): None.

Important (Should Fix): None.

Suggestions (Nice to Have):

Stale PR-description env var (still open). The "Design notes" section still lists ELECTRIC_AGENTS_COMPACT_MIN_TOKENS as an env-tunable threshold, but the mid-turn min-tokens floor was dropped in 86d5e6689 and the var is no longer referenced anywhere under packages/agents-runtime/src/. The changeset correctly omits it (only ELECTRIC_AGENTS_COMPACT_CEILING and ELECTRIC_AGENTS_COMPACT_BG_CEILING). Worth trimming the description so the only public-facing doc of the knobs matches the code. Cosmetic; no code impact.

Issue Conformance

No linked issue — per project convention a warning, substantially mitigated by an unusually detailed PR description and changeset that enumerate scope. Implementation matches the described scope (modulo the stale MIN_TOKENS mention above); no scope creep observed.

Previous Review Status

Incremental review (iteration 7). All items raised across iterations 1–6 remain resolved. The sole change since iteration 6 is the test-fixture typecheck fix described above; the compaction logic is unchanged. Codecov reports all tests successful. The one cosmetic doc nit (stale MIN_TOKENS in the PR description) is still open but is non-blocking.

Review iteration: 7 | 2026-06-18

#4605] The mid-turn checkpoint computed its watermark (timeline head) when the `complete` row was written — after the summarize await. Any event that materialized into the StreamDB during that (slow) await would bump the head past what the summary actually covered, so reconstruction could drop an un-summarized item next turn (item.at <= watermark). Narrow (mid-turn blocks the agent, and pending inbox rows don't materialize) but timing-dependent. Snapshot the head when the `running` row is written instead — before the await — so coverage and watermark derive from the same instant, matching how background compaction already captures its head up front. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…eview #4605] The orphan-clearing logic added to CompactionIndicator had no test, and a regression where rows stop carrying `timestamp` would silently revert to the lingering-spinner bug (NaN → not orphaned → spinner stays). This package has no React-render harness, so extract the decision into a pure `isRunningCheckpointOrphaned(timestamp, now)` helper (with STALE_RUNNING_MS) in lib/ and unit-test it: fresh → shown, just under the deadline → shown, at/past the deadline → hidden, and missing/unparseable timestamp → shown (documented). The component now imports the helper, so the tested logic is exactly what runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

kevin-dp · 2026-06-17T09:41:40Z

Thanks for the careful review. Went through each point:

Addressed (one commit each):

Suggestion 1 — mid-turn watermark captured after the await → b130e42a0. Agreed, real (if narrow) timing-dependence. The watermark is now snapshotted when the running row is written — before the summarize await — so an event materializing into the StreamDB during the slow summarize can't bump the head past what the summary covered. Mirrors how background compaction snapshots its head up front.
Important — untested orphan-clearing logic → 445c7552e. Agreed. This package has no React-render harness (no jsdom/testing-library; passWithNoTests), so rather than pull that in I extracted the decision into a pure isRunningCheckpointOrphaned(timestamp, now) helper (+ STALE_RUNNING_MS) and unit-tested it: fresh → shown, just under deadline → shown, at/past deadline → hidden, and missing/unparseable timestamp → shown (the exact NaN path you flagged, now locked in). The component imports the helper, so the tested logic is what runs.
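For reference, a minimal sketch of what such a pure helper could look like. The name `isRunningCheckpointOrphaned`, the `(timestamp, now)` parameters, `STALE_RUNNING_MS` and the 150s value come from the comments in this thread; the body, and the assumption that `timestamp` is an ISO-8601 string and `now` is epoch milliseconds, are illustrative only.

```typescript
// 150s, per the commit message ("older than 150s").
const STALE_RUNNING_MS = 150_000;

// Assumption: `timestamp` is an ISO-8601 string (possibly missing) and
// `now` is epoch milliseconds. The real helper's types may differ.
function isRunningCheckpointOrphaned(
  timestamp: string | undefined,
  now: number,
): boolean {
  const startedAt = timestamp === undefined ? NaN : Date.parse(timestamp);
  // Missing or unparseable timestamp: treat as not orphaned, so the
  // spinner stays shown (the documented NaN path).
  if (Number.isNaN(startedAt)) return false;
  // At or past the deadline: orphaned, so the spinner is hidden.
  return now - startedAt >= STALE_RUNNING_MS;
}
```

Keeping the decision free of React state is what makes the four cases listed above checkable without a render harness.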

Declined (with rationale):

Suggestion 2: reuse selectLatestContextUsage in ContextUsageIndicator. The usage math is already shared (the component calls computeContextUsage); the only duplication is the "latest step by _seq" loop, and the component needs the full latest row (its model_id, and on the stacked PR #4596, "feat(agents): context-usage ring gauge + composition breakdown", the context_breakdown too), so a single pass returning the row is cleaner here than selectLatestContextUsage + a second lookup. Happy to revisit if you'd prefer a shared selectLatestContextStep row-returning helper.
Suggestion 3 — as any on the query collection. It's an established pattern across the existing UI query sources (e.g. lib/compaction.ts); properly typing q.from(db.collections.x) is a worthwhile but separate cross-cutting effort rather than something to one-off here.

Full runtime suite green; UI typechecks + the new helper test pass.

Foundation for context compaction: measure and surface how full the model's context window is, with no behavior change yet.

- pi-adapter: capture cache-INCLUSIVE prompt size (input + cacheRead + cacheWrite) and the model context window per step. The existing input_tokens deliberately excludes cache reads for budget accounting, but cached tokens still occupy the window, so a fullness gauge needs the inclusive total.
- persist context_input_tokens + context_window on the step row (optional/additive) via the outbound bridge.
- token-accountant: single source of truth for usage ratio, severity level, and the compaction thresholds (85% background / 95% ceiling).
- UI: ContextUsageIndicator renders "NN% used" in the composer footer from the same helper, coloured at the 85/95 thresholds.

Observational only: nothing compacts yet; this validates the token accounting before later phases act on it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ndow in pi-adapter The bridge persistence and the usage-ratio helper were covered, but the adapter's cache-INCLUSIVE total (input + cacheRead + cacheWrite) — the accuracy premise of the context gauge — was not. Assert it equals 1350 where the uncached input_tokens is 150, and that context_window is emitted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
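The arithmetic behind that assertion, as a minimal sketch. The `StepUsage` shape and `contextInputTokens` function are hypothetical; the commit message only states that the inclusive total is 1350 when the uncached input is 150, so the 1000/200 split between cache reads and writes used in the usage note below is an illustrative assumption.

```typescript
// Hypothetical usage shape; the adapter's real type may differ.
interface StepUsage {
  input: number; // uncached prompt tokens
  cacheRead: number; // tokens served from the prompt cache
  cacheWrite: number; // tokens written to the prompt cache
}

// Cached tokens still occupy the window, so the fullness gauge sums all three.
function contextInputTokens(usage: StepUsage): number {
  return usage.input + usage.cacheRead + usage.cacheWrite;
}
```

For example, `contextInputTokens({ input: 150, cacheRead: 1000, cacheWrite: 200 })` gives 1350, whereas budget accounting on `input` alone would see only 150.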

Once context usage reaches 25%, inject a <token_budget> notice into the model's messages stating remaining tokens + percent, so the model can pace itself. Recomputed each call from the latest step's persisted usage, so it is always current. - token-accountant: selectLatestContextUsage (latest step with usage), shouldSurfaceContextBudget (gate at 25%), formatContextBudgetNotice, and withContextBudgetNotice (inject just before the final message). - context-factory/runAgent: synthesize and inject before the model call. Synthesized (not persisted) on purpose: a self-superseding context row would leave load_context_history tombstones, which are misleading for an ephemeral budget hint. Synthesis stays deterministic (pure function of persisted steps) so replay/fork reproduce it, with no row churn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drive a real agent run with a capturing streamFn and assert the <token_budget> notice reaches the model's context, gated on a seeded step's usage: present at 80% usage (with correct "20k tokens (20%) remaining" wording), absent at 10%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
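A rough sketch of the notice synthesis described in the two commits above. The function names match those mentioned in the commit message, but the bodies, the `Msg` shape, the closing `</token_budget>` tag and the rounding are assumptions; only the 25% gate, the insertion point (just before the final message) and the "20k tokens (20%) remaining" wording come from the source.

```typescript
// Hypothetical message shape for illustration.
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Gate: surface the notice once usage reaches 25% of the window.
function shouldSurfaceContextBudget(used: number, contextWindow: number): boolean {
  return contextWindow > 0 && used / contextWindow >= 0.25;
}

// Assumption: remaining tokens are rounded to the nearest thousand.
function formatContextBudgetNotice(used: number, contextWindow: number): string {
  const remaining = Math.max(0, contextWindow - used);
  const percent = Math.round((remaining / contextWindow) * 100);
  const thousands = Math.round(remaining / 1000);
  return `<token_budget>${thousands}k tokens (${percent}%) remaining</token_budget>`;
}

// Pure function of its inputs: returns a new array with the notice
// injected just before the final message, or the input unchanged.
function withContextBudgetNotice(messages: Msg[], used: number, contextWindow: number): Msg[] {
  if (messages.length === 0 || !shouldSurfaceContextBudget(used, contextWindow)) {
    return messages;
  }
  const notice: Msg = {
    role: "user",
    content: formatContextBudgetNotice(used, contextWindow),
  };
  return [...messages.slice(0, -1), notice, ...messages.slice(-1)];
}
```

Because nothing is persisted, replaying the same steps reproduces the same notice.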

…se 2) Cap any single tool_result at ~10k tokens, replacing the body with a visible "[Output truncated: ...]" placeholder before the model call, so one giant output can't fill the context window on its own. Preserves toolCallId/isError so tool-call pairing stays valid. Mirrors Codex's per-message truncation. Truncation is always explicit, never silent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
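The per-result cap can be sketched like this. The `ToolResult` shape, the `truncateToolResult` name, the characters-divided-by-four token estimate and the exact placeholder wording after "[Output truncated:" are all assumptions for illustration; the source only states the ~10k-token cap, the visible placeholder prefix, and that toolCallId/isError are preserved.

```typescript
// Hypothetical result shape for illustration.
interface ToolResult {
  toolCallId: string;
  isError: boolean;
  content: string;
}

const MAX_TOOL_RESULT_TOKENS = 10_000;

// Crude heuristic (assumption): roughly four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function truncateToolResult(result: ToolResult): ToolResult {
  const tokens = estimateTokens(result.content);
  if (tokens <= MAX_TOOL_RESULT_TOKENS) return result;
  return {
    ...result, // keeps toolCallId and isError so tool-call pairing stays valid
    content: `[Output truncated: ~${tokens} tokens exceeded the ${MAX_TOOL_RESULT_TOKENS}-token cap]`,
  };
}
```

Replacing the body rather than dropping the message keeps the truncation visible to the model.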

Foundation for context compaction: a compaction checkpoint is a context_inserted row tagged attrs.kind="compaction". timelineMessages now treats the newest such checkpoint's order as a watermark — items before it are dropped (summarized away) and the checkpoint renders the summary in their place. No checkpoint -> watermark is -Infinity -> a strict no-op, so this is inert until the summarizer (next step) writes one. Adds compaction.ts with the checkpoint constants and the Codex summarization prompt + summary prefix (reused verbatim) for the summarization step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

summarizeMessages sends the full history + Codex's summarization prompt to the conversation's own model (a cheap small-window model would overflow a near-full context) and prefixes the result with Codex's summary preamble. The model call is injected via a `complete` seam (defaults to pi-ai completeSimple) so it is unit-testable without a network call or API key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wires the compaction engine into the model-call path. Before a turn, if the last step left context at/over 95% of the window (and the history is actually large), summarize it, persist a compaction checkpoint (context_inserted kind=compaction), and send only the summary this turn — the current ask still arrives via runInput, and future turns reconstruct from the checkpoint watermark. Failure degrades gracefully (logs and proceeds uncompacted). - AgentConfig.summarizeComplete: model-call seam for the summarizer (defaults to the conversation model; injected by tests). - guard on estimated history size avoids re-compacting an already compacted (small) history while the last step's usage reads stale-high. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CONTEXT_USAGE_HARD_CEILING 0.95 -> 0.90, matching Codex's auto-compaction threshold. Drives the synchronous compaction trigger and the UI gauge's "critical" colour. Background start stays at 85%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drives the real runtime end-to-end and makes a real Anthropic summarization call, asserting the returned summary carries the Codex prefix and retains key conversation facts. Skipped unless RUN_LIVE_COMPACTION=1 + LIVE_ANTHROPIC_API_KEY are set, so it never runs (or costs) in CI. Verified passing with claude-haiku-4-5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Its purpose (confirming a real model summarizes correctly through the compaction path) is done. Ongoing coverage lives in the deterministic stubbed tests (compaction-trigger, compaction-summarize, timeline-compaction); a live test pinning a model id + needing a paid key would only rot. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e/failed) Foundation for surfacing synchronous compaction in the UI. The trigger now writes a `running` checkpoint before summarizing and updates it to `complete` (with the summary) or `failed` after. Only a `complete` checkpoint acts as the reconstruction watermark, so an in-flight or crashed compaction never hides history; running/failed checkpoints are UI-only markers, skipped from the model context. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Shows a spinner + "Compacting context…" in the composer footer while a synchronous compaction is in flight, reading the latest compaction checkpoint row and showing it while attrs.status is "running" (clears on complete). Tells the user why the turn paused and that their next prompt is being queued. Mirrors the ContextUsageIndicator pattern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ELECTRIC_AGENTS_COMPACT_CEILING (0..1, default 0.9) and ELECTRIC_AGENTS_COMPACT_MIN_TOKENS (default contextWindow/2) let the synchronous compaction path be exercised without filling a real window. RFC §12 tunables; default behavior unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
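A minimal sketch of parsing such a ratio override. `ratioFromEnv` is a hypothetical helper, and rejecting values outside (0, 1] in favour of the default is an assumption about the validation; the source only states the 0..1 range and the 0.9 default.

```typescript
// Hypothetical helper: parse a 0..1 ratio from an env-style string,
// falling back to the default for missing or invalid values.
function ratioFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  // Assumption: values outside (0, 1] are ignored rather than clamped.
  if (!Number.isFinite(parsed) || parsed <= 0 || parsed > 1) return fallback;
  return parsed;
}
```

In a Node process it would be called as `ratioFromEnv(process.env.ELECTRIC_AGENTS_COMPACT_CEILING, 0.9)`, so an unset variable leaves the default behaviour unchanged.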

Surfaces a completed compaction checkpoint as a collapsed, expandable "Context compacted" marker in the message history, at the point compaction happened. Adds a compaction custom timeline source (mirroring the comments source) reading compaction_summary context_inserted rows, a compaction row kind across the timeline dispatch, and a CompactionTimelineRow card (InlineEventCard, expand to view the summary). Running/failed checkpoints are filtered out — those are shown by the live composer indicator instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Compaction now runs before EVERY model step (not just between turns), so a single turn that balloons across many tool calls can no longer exhaust the context window before the turn ends. Wired through pi-agent-core's transformContext hook.

- compaction-midturn.ts: createMidTurnCompactor folds older messages into a summary, returns [summary, ...recent tail], caches the summary for the rest of the turn (re-summarizing chained off the prior summary only if the tail grows back over the ceiling).
- pi-adapter: Codex-style token signal: anchor on the last step's REAL cache-inclusive usage + estimate only the trailing items appended since, vs the model's real context window (not an estimate of the whole history). transformContext + initialContextTokens options added.
- context-factory: build the compactor and pass it to the adapter; removed the per-turn pre-sampling compaction block it replaces.

Reuses the summarizer, checkpoint lifecycle (running/complete/failed), and UI. summarizeAgentMessages summarizes the AgentMessage[] the hook receives without re-converting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
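The token signal described for pi-adapter can be sketched as below. The function name, parameter names and the four-characters-per-token estimate are hypothetical; the idea taken from the commit message is to anchor on the last step's real usage and estimate only what was appended since.

```typescript
// Crude heuristic (assumption): roughly four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical sketch of the mid-turn trigger.
// anchorTokens:  real cache-inclusive usage reported by the last step
// anchoredCount: how many messages that usage already accounts for
function shouldCompactMidTurn(
  anchorTokens: number,
  messages: string[],
  anchoredCount: number,
  contextWindow: number,
  ceiling = 0.9,
): boolean {
  // Estimate only the trailing messages appended since the anchor.
  const trailing = messages.slice(anchoredCount);
  const trailingTokens = trailing.reduce((sum, m) => sum + estimateTokens(m), 0);
  return (anchorTokens + trailingTokens) / contextWindow >= ceiling;
}
```

Anchoring on real usage keeps estimation error confined to the newest messages instead of the whole history.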

The keepTail boundary could split a tool_call/tool_result pair — folding the assistant tool_use into the summary while keeping the matching tool_result in the tail. Anthropic rejects the orphaned tool_result (400 invalid_request_error: "tool_result must have a corresponding tool_use block in the previous message"). Advance the fold boundary past any leading tool-result messages so the kept tail starts on a fresh user/assistant turn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
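A sketch of the boundary adjustment this commit describes. The role names, the `Msg` shape and `safeFoldBoundary` are hypothetical. Note that a later commit in this PR removed the keepTail logic altogether in favour of summarizing the whole context, so this illustrates the intermediate fix only.

```typescript
// Hypothetical message shape; only the role matters for the boundary.
type Role = "user" | "assistant" | "toolResult";
interface Msg {
  role: Role;
}

// Returns the index where the kept tail starts. Messages before it are
// folded into the summary; messages from it onward are kept verbatim.
function safeFoldBoundary(messages: Msg[], keepTail: number): number {
  let boundary = Math.max(0, messages.length - keepTail);
  // Advance past leading tool results so the tail never begins with a
  // tool_result whose matching tool_use was folded into the summary.
  while (boundary < messages.length && messages[boundary]?.role === "toolResult") {
    boundary++;
  }
  return boundary;
}
```

If every candidate tail message is a tool result, the boundary reaches the end of the array and the tail is empty.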

Non-blocking compaction so the user almost never waits for the 90% sync floor. Self-contained so it can be reverted as one commit.

- Trigger (process-wake): after a turn whose usage ≥ 85% (env ELECTRIC_AGENTS_COMPACT_BG_CEILING), kick off a DETACHED summarization. Its checkpoint is applied at the NEXT turn's start, OR, if the summarize finishes while the entity is idle, immediately by waking the idle loop and writing the checkpoint WITHOUT running the agent (so the indicator never lingers past completion). The slow summarize never blocks; a fast follow-up prompt just runs un-compacted.
- context-factory: maybeStartBackgroundCompaction / writeBackgroundCheckpoint / failBackgroundCheckpoint on the handler-context result. Snapshots the timeline head as the watermark; summarizes the whole reconstructed history; writes a background-flavored running→complete checkpoint.
- Unique checkpoint id per generation (compaction-bg-<watermark>): context supersession keys on id alone, so with one shared id the NEXT background's `running` row silently superseded the PREVIOUS `complete` one, erasing the watermark and undoing every compaction (context never shrank, the indicator stuck on "running"). running→complete→failed of one generation share the id; the next generation can't clobber it. Mid-turn sync keeps the shared id on purpose (its re-summarization chain wants supersession).
- Reconstruction (timeline-context): checkpoints carry a stored attrs.watermark and the summary is rendered AT that watermark, so a prompt+answer that arrived while a background summary ran (physically after the checkpoint, logically after the watermark) are kept verbatim AFTER the summary (RFC §8.5). Falls back to the row's order for sync.
- UI: CompactionIndicator shows a subtle "Compacting in background…" for background checkpoints, distinct from the blocking sync indicator.
- Summarize hardening (compaction-summarize): bound every summarize call with a 120s deadline. The anthropic provider only enforces a timeout/abort when the caller passes timeoutMs/signal and never retries, so a stalled stream hung forever. Forward timeoutMs + an AbortSignal the provider honours AND race a hard timer; a timeout becomes an ordinary failure retried next turn-end.

The mid-turn 90% sync floor stays as the safety net for single runaway turns. To drop the feature: revert this commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
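The supersession problem that motivated the per-generation ids can be reproduced in a few lines. The `CheckpointRow` shape and `applySupersession` are hypothetical stand-ins for the real context supersession; only the `compaction-bg-<watermark>` id format and the "keys on id alone" rule come from the commit message.

```typescript
// Hypothetical row shape for illustration.
interface CheckpointRow {
  id: string;
  status: "running" | "complete" | "failed";
  watermark: number;
}

// Supersession keyed on id alone: a later row with the same id
// replaces the earlier one.
function applySupersession(rows: CheckpointRow[]): CheckpointRow[] {
  const latestById = new Map<string, CheckpointRow>();
  for (const row of rows) latestById.set(row.id, row);
  return Array.from(latestById.values());
}

// One id per background generation, derived from its watermark.
const backgroundCheckpointId = (watermark: number): string =>
  `compaction-bg-${watermark}`;
```

With a single shared id, a new `running` row replaces the previous `complete` row and the watermark disappears; with `backgroundCheckpointId(10)` and `backgroundCheckpointId(20)` both rows survive, so the earlier compaction keeps applying while the next one runs.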

The `minTokens` guard (default 2000, env ELECTRIC_AGENTS_COMPACT_MIN_TOKENS) never changed the outcome at any realistic ceiling — 90% of a real context window is always far above 2000 tokens, so the floor only ever mattered when testing with an artificially low ceiling. Codex has no equivalent floor (it triggers on a single token threshold). Remove the knob, its env override, and the now-unused positiveFromEnv helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…4605] The mid-turn (sync floor) checkpoint was persisted with no attrs.watermark, so reconstruction fell back to the checkpoint row's own (latest) stream position and, on the next turn, dropped every item before it, including the verbatim tail the mid-turn summary deliberately excluded (keepTail). Those recent messages were in neither the summary nor the kept set, so they were silently lost on the 90% path.

Fix: the mid-turn compactor now folds the WHOLE context into the summary (Codex-style, with no verbatim pre-compaction tail), and the checkpoint is stamped with watermark = current timeline head. Summary and watermark now agree, so reconstruction folds exactly what was summarized and keeps everything appended afterward (the model's post-compaction output + the next prompt). This also removes the keepTail + tool-pair orphan-guard complexity. Recent context is still shown after a compaction via the sticky view (summary + messages appended since), so within-turn coherence is preserved.

Also drop the stale "95%" hard-ceiling comments (it has been 90% since the ceiling was lowered to match Codex).

Tests: rewrote compaction-midturn for the summarize-everything behavior; added a reconstruction test asserting post-compaction messages survive a mid-turn checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
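
To make the watermark rule concrete, here is a minimal sketch of reconstruction under the behaviour described above: everything at or before the watermark is folded into the summary, everything after it is kept verbatim, and only a `complete` checkpoint hides history. The types and the `reconstruct` function are hypothetical simplifications, not the runtime's actual timeline-context code.

```typescript
// Hypothetical, simplified shapes; the real timeline rows carry more fields.
type TimelineItem = { at: number; text: string }
type Checkpoint = {
  status: "running" | "complete" | "failed"
  watermark: number
  summary: string
}

function reconstruct(items: TimelineItem[], checkpoint?: Checkpoint): string[] {
  // Only a complete checkpoint acts as a watermark; running/failed never hide history.
  if (!checkpoint || checkpoint.status !== "complete") {
    return items.map((item) => item.text)
  }
  // Items with at <= watermark are covered by the summary; later ones survive verbatim.
  const kept = items.filter((item) => item.at > checkpoint.watermark)
  return [checkpoint.summary, ...kept.map((item) => item.text)]
}
```

With items at positions 1 to 4 and a complete checkpoint at watermark 2, the result is the summary followed by items 3 and 4, which is the "post-compaction messages survive" case the new test asserts.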

…#4605] A summarize is bounded by a ~120s hard timeout after which a terminal (complete/failed) checkpoint is always written, so a `running` checkpoint that lingers well past that is orphaned — its process crashed before writing the terminal row, and (with watermark-unique background ids) nothing supersedes it, pinning the spinner forever. Treat a `running` checkpoint older than 150s as stale and stop showing it, with a self-clearing timer so the spinner disappears even when no further events arrive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
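
The interaction between id-keyed supersession and watermark-unique ids can be illustrated with a small sketch. It assumes supersession means "the last row written for an id wins"; `latestById` and the row shape are hypothetical, while the `compaction-bg-<watermark>` id format is taken from the commit messages.

```typescript
// Hypothetical, simplified checkpoint row.
type CheckpointRow = {
  id: string
  status: "running" | "complete" | "failed"
  watermark: number
}

// Supersession keyed on id alone: the last row written for an id wins.
function latestById(rows: CheckpointRow[]): Map<string, CheckpointRow> {
  const latest = new Map<string, CheckpointRow>()
  for (const row of rows) latest.set(row.id, row)
  return latest
}

// One id per background generation, derived from its watermark.
const backgroundId = (watermark: number): string => `compaction-bg-${watermark}`
```

With a shared id, a later `running` row replaces an earlier `complete` one. With unique ids both rows survive, which also means an orphaned `running` row is never replaced, hence the staleness cutoff in the UI.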

…4605] `pendingRequestMessageCount` records the incoming (uncompacted) message count, not the compacted list the adapter may return. That's intentional: pi-agent passes transformContext the full conversation each step, so the count indexes that original array — the next step's trailing slice then measures exactly the messages appended since. Add a comment so it doesn't read as a bug. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
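
A small sketch of why the count tracks the incoming array rather than the compacted one. It assumes, as the commit message states, that the hook is handed the full conversation on every step; `makeStepTracker` and its return shape are hypothetical and only illustrate the indexing.

```typescript
type Message = { role: string; content: string }

// Hypothetical tracker around a transformContext-style hook.
function makeStepTracker(compact: (messages: Message[]) => Message[]) {
  let pendingRequestMessageCount = 0

  return function onStep(messages: Message[]) {
    // Trailing slice: exactly the messages appended since the previous step.
    const appended = messages.slice(pendingRequestMessageCount)
    // Record the INCOMING length, because the next step receives the full
    // conversation again, not the compacted list returned below.
    pendingRequestMessageCount = messages.length
    return { appended, context: compact(messages) }
  }
}
```

If the count were taken from the compacted list instead, the next step's slice would start too early and re-count messages that had already been seen.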

#4605] The mid-turn checkpoint computed its watermark (timeline head) when the `complete` row was written — after the summarize await. Any event that materialized into the StreamDB during that (slow) await would bump the head past what the summary actually covered, so reconstruction could drop an un-summarized item next turn (item.at <= watermark). Narrow (mid-turn blocks the agent, and pending inbox rows don't materialize) but timing-dependent. Snapshot the head when the `running` row is written instead — before the await — so coverage and watermark derive from the same instant, matching how background compaction already captures its head up front. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
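
The ordering fix can be shown in a few lines. This is a sketch under simplifying assumptions: `Timeline`, `MidTurnCheckpoint`, and `compactMidTurn` are hypothetical stand-ins for the StreamDB head and the checkpoint rows, and the write of the `running` row is reduced to the moment the head is read.

```typescript
// Hypothetical, simplified shapes; the real StreamDB and checkpoint rows differ.
type Timeline = { head: number }
type MidTurnCheckpoint = { status: "complete"; watermark: number; summary: string }

async function compactMidTurn(
  timeline: Timeline,
  summarize: () => Promise<string>
): Promise<MidTurnCheckpoint> {
  // Snapshot the head where the `running` row would be written, before the await,
  // so the watermark matches exactly what the summary covers.
  const watermark = timeline.head
  const summary = await summarize() // slow; the head may advance in the meantime
  return { status: "complete", watermark, summary }
}
```

Reading `timeline.head` after the await instead would let an event that materialized during the summarize push the watermark past the summarized range.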

…eview #4605] The orphan-clearing logic added to CompactionIndicator had no test, and a regression where rows stop carrying `timestamp` would silently revert to the lingering-spinner bug (NaN → not orphaned → spinner stays). This package has no React-render harness, so extract the decision into a pure `isRunningCheckpointOrphaned(timestamp, now)` helper (with STALE_RUNNING_MS) in lib/ and unit-test it: fresh → shown, just under the deadline → shown, at/past the deadline → hidden, and missing/unparseable timestamp → shown (documented). The component now imports the helper, so the tested logic is exactly what runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
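
A pure helper with the behaviour listed above might look like this. The names `isRunningCheckpointOrphaned` and `STALE_RUNNING_MS` come from the commit message and the 150s cutoff from the earlier commit; the body, the ISO-string timestamp type, and the millisecond `now` are assumptions for illustration.

```typescript
const STALE_RUNNING_MS = 150_000 // 150s, comfortably past the ~120s summarize timeout

// Assumes `timestamp` is a date string and `now` is epoch milliseconds.
function isRunningCheckpointOrphaned(timestamp: string | undefined, now: number): boolean {
  if (timestamp === undefined) return false // missing: keep showing the spinner
  const startedAt = Date.parse(timestamp)
  if (Number.isNaN(startedAt)) return false // unparseable: keep showing the spinner
  return now - startedAt >= STALE_RUNNING_MS // at or past the deadline: hide it
}
```

Keeping the decision free of React and of `Date.now()` is what makes the four cases (fresh, just under, at/past, missing or unparseable) testable without a render harness.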

The RFC isn't checked in, so comments can't reference it; phase numbers are internal sequencing, not something the code should narrate. Remove the dangling RFC/§/phase references and tighten the surrounding comments to be brief and only where they clarify something non-obvious. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the `for (;;)` + breaks with a `do…while (appliedBackgroundDuringIdle)` so the loop's actual condition (re-idle while background compactions keep settling) is explicit, and trim the over-long `pendingBackgroundCompaction` comment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
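
The loop shape described above, reduced to its control flow. This is a hypothetical synchronous sketch (the real idle loop is presumably asynchronous and does more work per pass); `idleUntilSettled` and `idleOnce` are illustrative names, and only `appliedBackgroundDuringIdle` comes from the commit message.

```typescript
// Re-idle for as long as a background compaction settled during the last idle pass.
// `idleOnce` returns true when it applied a background checkpoint while idle.
function idleUntilSettled(idleOnce: () => boolean): number {
  let passes = 0
  let appliedBackgroundDuringIdle: boolean
  do {
    appliedBackgroundDuringIdle = idleOnce()
    passes++
  } while (appliedBackgroundDuringIdle)
  return passes
}
```

Compared with `for (;;)` plus `break`, the exit condition is stated once, at the bottom of the loop, next to the variable that drives it.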

Tighten the over-long comments in the compaction indicator helper + component and the reconstruction watermark block, dropping a redundant inline restatement. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The runAgent budget-notice fixture predated main's required doUnobserve field on HandlerContextConfig; the rebase auto-merged without flagging it, breaking the agents-runtime typecheck.

kevin-dp added the claude label Jun 17, 2026

kevin-dp and others added 15 commits June 18, 2026 12:10

kevin-dp and others added 14 commits June 18, 2026 12:11

chore: changeset for context compaction

5a1ea25

chore: downgrade context compaction changeset to patch

26a1901

docs(agents): trim more excessive compaction comments

295d9ab

kevin-dp force-pushed the feat/context-compaction branch from df21a6e to 295d9ab Compare June 18, 2026 10:11

test(agents): add missing doUnobserve to a context config fixture

28ce457

feat(agents): context compaction (budget notices, mid-turn floor, background compaction) #4605

kevin-dp wants to merge 30 commits into main from feat/context-compaction

kevin-dp commented Jun 17, 2026

What's included

Design notes

Testing

Note

github-actions Bot commented Jun 17, 2026 • edited

Electric Agents Desktop Builds

netlify Bot commented Jun 17, 2026 • edited

✅ Deploy Preview for electric-next ready!

codecov Bot commented Jun 17, 2026 • edited

Codecov Report

github-actions Bot commented Jun 17, 2026 • edited

Electric Agents Mobile Build

kevin-dp commented Jun 17, 2026

🤖 Automated review: context compaction

Findings

Verified correct

Test gaps

Overall

kevin-dp commented Jun 17, 2026

kevin-dp commented Jun 17, 2026

claude Bot commented Jun 17, 2026 • edited

Claude Code Review

kevin-dp commented Jun 17, 2026 • edited
