feat(agents): context-usage ring gauge + composition breakdown #4596
kevin-dp wants to merge 13 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## feat/context-compaction #4596 +/- ##
===========================================================
+ Coverage 57.89% 57.90% +0.01%
===========================================================
Files 350 352 +2
Lines 40644 40719 +75
Branches 11828 11842 +14
===========================================================
+ Hits 23529 23578 +49
- Misses 17078 17104 +26
Partials 37 37
Electric Agents Mobile Build
Local mobile checks ran for commit The EAS Android preview build was skipped because the
d64248e to 81e1cf4
🤖 Automated review: context-usage ring gauge + composition breakdown
Generated by a review agent. Severity-ranked; cite
Findings
- [medium]
- [low] legend percents may not sum to 100%
- [low] duplicated clamp logic
- [nit] schema/persistence verified clean
- [nit] SVG ring + accessibility verified clean
Verified correct
Test gaps
Verification
Overall
Mergeable after addressing the tools-estimate issue. The math, clamping, SVG geometry, schema additivity, and parse tolerance are all correct and well-tested; the React/accessibility wiring is solid. The biggest real risk is the [medium]:
fa79f63 to e21fadd
…view #4596] `approxTokens(opts.tools)` hit approxTokens' array branch, which charges a flat ~64 per non-text block — so the "Tools" segment was ~64×toolCount regardless of how large each tool's name/description/parameter schema actually is, and the shortfall silently inflated the "Messages" remainder. Serialize the tool array first so the estimate reflects what really occupies the prompt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
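The fix described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `ToolDefinition`, `approxTokensFromText`, and `estimateToolTokens` are hypothetical names, and the char/4 heuristic is taken from the PR description rather than from the real `approxTokens` implementation.

```typescript
// Hypothetical shape of a tool definition; the real type in the repo may differ.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Assumed heuristic from the PR text: roughly one token per four characters.
function approxTokensFromText(text: string): number {
  return Math.ceil(text.length / 4);
}

// Serialize the whole array first, so each tool's name, description and
// parameter schema contribute to the estimate instead of a flat per-item cost.
function estimateToolTokens(tools: readonly ToolDefinition[]): number {
  if (tools.length === 0) return 0;
  return approxTokensFromText(JSON.stringify(tools));
}

const exampleTools: ToolDefinition[] = [
  {
    name: "read_file",
    description: "Read a file from disk.",
    parameters: { type: "object", properties: { path: { type: "string" } } },
  },
];
const exampleEstimate = estimateToolTokens(exampleTools);
```

With this shape, a tool with a long description or a large schema yields a proportionally larger estimate, which is the property the flat per-block charge lacked.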
…review #4596] The popover header recomputed `min(1, usedTokens/contextWindow)` locally — exactly what `computeContextUsage` already stored as `usage.ratio` (and what the trigger gauge uses). Use it directly so the two can't drift. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s [review #4596] A segment with a small but non-zero share rendered a coloured swatch + bar sliver labelled "0%", which reads as a contradiction. Label any non-zero segment that rounds to 0% as "<1%" instead. (Per-segment rounding can still make the four rows not sum to exactly 100% — acceptable for a composition popover; exact apportionment isn't worth the complexity here.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
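A small sketch of the labelling rule this commit describes, assuming a segment's share is passed as a ratio between 0 and 1. The function name `formatSegmentPercent` is hypothetical and not taken from the PR.

```typescript
// Format a segment's share of the window as a percent label.
// A non-zero share that rounds down to 0 is shown as "<1%" rather than "0%".
function formatSegmentPercent(ratio: number): string {
  if (!Number.isFinite(ratio) || ratio <= 0) return "0%";
  const rounded = Math.round(ratio * 100);
  return rounded === 0 ? "<1%" : `${rounded}%`;
}
```

Because each segment is still rounded independently, the labels may not add up to exactly 100%, as the commit message notes.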
…[review #4596] The breakdown clamps `used` to the window so the segments still sum to the window and `free` can't go negative when a step reports usage above the window. That path was exercised but untested; lock it in. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
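The clamping behaviour can be illustrated with a sketch like the one below. It is an assumption-laden stand-in for `computeContextBreakdown`: the names `BreakdownEstimate`, `Breakdown`, and `computeBreakdownSketch` are hypothetical, and clamping the system/tools estimates to the used total is an extra safeguard assumed here, not something the PR text confirms.

```typescript
interface BreakdownEstimate {
  system: number;
  tools: number;
}

interface Breakdown {
  system: number;
  tools: number;
  messages: number;
  free: number;
}

function computeBreakdownSketch(
  usedTokens: number,
  contextWindow: number,
  estimate: BreakdownEstimate,
): Breakdown {
  // Clamp `used` to the window so `free` can never go negative.
  const used = Math.min(Math.max(usedTokens, 0), contextWindow);
  // Assumed safeguard: approximate parts may not exceed what was actually used.
  const system = Math.min(Math.max(estimate.system, 0), used);
  const tools = Math.min(Math.max(estimate.tools, 0), used - system);
  // Messages are the remainder, so the four segments always sum to the window.
  const messages = used - system - tools;
  const free = contextWindow - used;
  return { system, tools, messages, free };
}

// A step reporting 250k tokens against a 200k window still sums to 200k.
const overWindow = computeBreakdownSketch(250_000, 200_000, { system: 5_000, tools: 10_000 });
```

In the over-window case above, `free` is 0 and the remaining three segments account for the full window.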
Thanks — addressed the actionable findings, one commit each:
The two [nit]s (schema additivity, SVG/accessibility) were verified-clean, so no changes there. Full runtime suite green; UI typecheck + tests pass.
Claude Code Review
Summary
Adds a circular ring gauge plus a hover composition-breakdown popover to the agents-server-ui context-usage indicator, backed by a new additive
What's Working Well
Issues Found
Critical (Must Fix): None.
Important (Should Fix): None.
Suggestions (Nice to Have):
Issue Conformance
No linked issue, a soft warning per convention, but the PR description is clear for a self-contained UI enhancement. Changeset (
Previous Review Status
All actionable findings from prior iterations remain resolved (tools estimate,
Review iteration: 7 | 2026-06-18
Claude Code Review
Summary
Adds a circular SVG ring gauge to the composer-footer context-usage indicator plus a hover popover that breaks the prompt down into system / tools / messages / free, modelled on Claude Code's
What's Working Well
Issues Found
Critical (Must Fix): None.
Important (Should Fix): None.
Suggestions (Nice to Have):
Issue Conformance
No linked issue (per the review context); not unusual for an internal UI follow-up, but a tracking issue or reference would help. The PR description is unusually thorough and accurately matches the implementation, including the messages-as-remainder rationale and the base-branch note (targets
Note on prior automated review
The author's own automated review comment (2026-06-17) flagged a
Review iteration: 1 | 2026-06-17
…bel [review #4596] Switching the indicator to a HoverCard moved the token/window/model detail into hover-only popover content, so the trigger's aria-label dropped to just the percent — a small accessibility regression for keyboard/screen-reader users, who could previously read those numbers from the old Tooltip's label. Restore the `<used> / <window> tokens · <model>` summary to the trigger's own aria-label. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
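One way the restored label could be assembled is sketched below. The function name `buildUsageAriaLabel`, the "Context usage" prefix, and the `en-US` number formatting are assumptions for illustration; only the `<used> / <window> tokens · <model>` summary shape comes from the commit message.

```typescript
// Build a trigger label that carries the same detail as the hover content,
// so keyboard and screen-reader users do not depend on the popover.
function buildUsageAriaLabel(usedTokens: number, contextWindow: number, model: string): string {
  const ratio = contextWindow > 0 ? Math.min(1, usedTokens / contextWindow) : 0;
  const percent = Math.round(ratio * 100);
  const fmt = (n: number): string => n.toLocaleString("en-US");
  return `Context usage ${percent}%: ${fmt(usedTokens)} / ${fmt(contextWindow)} tokens · ${model}`;
}

const exampleLabel = buildUsageAriaLabel(50_000, 200_000, "example-model");
```

The resulting string would be passed to the trigger element's `aria-label` attribute.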
…review #4596] The stacked bar already skips zero-ratio segments, but the legend rendered all four rows unconditionally — so a step with no persisted breakdown (older events) showed noisy "System prompt — 0 — 0%" / "Tools — 0 — 0%" rows. Gate the legend rows on `tokens > 0` to match the bar. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
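The gating described here amounts to a simple filter. A minimal sketch, assuming legend rows carry a label and a token count (`LegendRow` and `visibleLegendRows` are hypothetical names):

```typescript
interface LegendRow {
  label: string;
  tokens: number;
}

// Keep only rows with a non-zero token count, matching how the stacked bar
// skips zero-ratio segments.
function visibleLegendRows(rows: readonly LegendRow[]): LegendRow[] {
  return rows.filter((row) => row.tokens > 0);
}

// A step with no persisted breakdown: system and tools are both 0.
const legacyRows: LegendRow[] = [
  { label: "System prompt", tokens: 0 },
  { label: "Tools", tokens: 0 },
  { label: "Messages", tokens: 42_000 },
  { label: "Free", tokens: 158_000 },
];
const shownRows = visibleLegendRows(legacyRows);
```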
Thanks for the two reviews — both flagged no critical/important issues. Addressed the two actionable suggestions, one commit each:
The "legend percents needn't sum to 100%" note is cosmetic and already mitigated by the
UI typecheck + tests pass (111).
3caa029 to 9e02ec4
samwillis
left a comment
Reviewed the current head against feat/context-compaction and the previous review thread. The earlier findings are addressed (tools estimate serialization, ratio reuse, <1% tiny segments, over-window clamp test, aria-label detail, and empty legend buckets), and I didn’t find any blocking regressions. CI is green. Approving.
df21a6e to 295d9ab
5fb6a8f to 87bfdb9
…icator Replace the flat dot before the "X%" label with a small SVG donut whose arc fills proportionally to the context-window usage. Track + progress arc both use currentColor, so the existing normal/warning/critical level colour tints the ring without extra wiring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
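The proportional arc is commonly drawn with `stroke-dasharray` and `stroke-dashoffset` on an SVG circle. The sketch below shows that geometry under assumed defaults; `ringGeometry`, the 14px size, and the 2px stroke are illustrative choices, not values from the PR.

```typescript
interface RingGeometry {
  radius: number;
  circumference: number;
  dashArray: string;
  dashOffset: number;
}

// Compute the stroke geometry for a progress ring.
// The progress circle uses the full circumference as its dash length and is
// offset by the unfilled share, so ratio 0 hides the arc and ratio 1 closes it.
function ringGeometry(ratio: number, size = 14, strokeWidth = 2): RingGeometry {
  const clamped = Math.min(1, Math.max(0, ratio));
  const radius = (size - strokeWidth) / 2;
  const circumference = 2 * Math.PI * radius;
  return {
    radius,
    circumference,
    dashArray: `${circumference}`,
    dashOffset: circumference * (1 - clamped),
  };
}
```

Both the track and the progress circle can then set `stroke="currentColor"`, with the track at reduced opacity, so the level colour applies without extra wiring.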
…ges / free)
Hovering the usage indicator now reveals a per-part composition of the prompt,
modelled on Claude Code's `/context`: a stacked bar + legend showing how much
of the window the system prompt, tool definitions, conversation messages, and
free space each occupy.
The runtime persists an approximate decomposition of the stable request parts:
pi-adapter estimates `{ system, tools }` token cost (char/4 via approxTokens)
once per call and writes it to the step as `context_breakdown` alongside the
cache-inclusive `context_input_tokens`. The UI derives the "messages" bucket as
the real total minus those estimates, so the segments always sum to the gauge
even though the part figures are approximate. New shared helpers
`computeContextBreakdown` / `parseContextBreakdown` in token-accountant keep the
math testable and out of the component.
- entity-schema: additive `context_breakdown` string column on steps.
- outbound-bridge / pi-adapter: compute + persist the estimate.
- token-accountant: breakdown helpers + types, exported from the client entry.
- UI: HoverCard popover (ContextUsageDetails) with a composition bar + legend.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
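Since `context_breakdown` is persisted as a string column, the reader side needs to tolerate missing or malformed values from older steps. A minimal sketch of such a tolerant parser follows; it assumes the column holds a JSON object with numeric `system` and `tools` fields, which the PR text implies but does not spell out, and `parseBreakdownSketch` is a hypothetical stand-in for `parseContextBreakdown`.

```typescript
interface BreakdownEstimate {
  system: number;
  tools: number;
}

// Return the parsed estimate, or null for anything missing or malformed,
// so callers can fall back to showing only the real total.
function parseBreakdownSketch(raw: string | null | undefined): BreakdownEstimate | null {
  if (!raw) return null;
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const { system, tools } = value as Record<string, unknown>;
    if (typeof system !== "number" || typeof tools !== "number") return null;
    if (!Number.isFinite(system) || !Number.isFinite(tools)) return null;
    if (system < 0 || tools < 0) return null;
    return { system, tools };
  } catch {
    // Invalid JSON is treated the same as an absent breakdown.
    return null;
  }
}
```

Returning `null` rather than throwing keeps older events, which have no breakdown, on the same code path as corrupt values.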
Tighten the `context_breakdown` column comment and the tokenBreakdown intro to be brief and to the point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Trim the computeContextBreakdown JSDoc, the ContextUsageDetails component doc, and an over-long test comment to be brief and to the point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
87bfdb9 to ac19f7d
Improves the composer-footer context-usage indicator (built on the Phase 0 gauge from the base branch). Two commits:
1. Circular ring gauge
Replaces the flat dot before the `X%` label with a small SVG donut whose arc fills proportionally to context-window usage. Track + progress arc both use `currentColor`, so the existing normal/warning/critical level colour tints it for free.
2. Composition breakdown popover
Hovering the indicator reveals a per-part breakdown of the prompt: a stacked bar + legend showing how much of the window the system prompt, tool definitions, conversation messages, and free space each occupy (modelled on Claude Code's `/context`).
How the data works:
- The runtime (`pi-adapter`) estimates the stable request parts (`{ system, tools }`, char/4 via `approxTokens`) once per model call and persists them on the step as a new additive `context_breakdown` column, next to the real cache-inclusive `context_input_tokens`.
- The UI derives Messages as `real total − system − tools`, and Free as `window − total`, so the segments always sum to the gauge even though the system/tools figures are approximations (the panel notes this).
- `computeContextBreakdown` / `parseContextBreakdown` keep the math in `token-accountant` and out of the component.
Files
- `entity-schema`: additive `context_breakdown` string column on steps.
- `outbound-bridge` / `pi-adapter`: compute + persist the estimate.
- `token-accountant`: breakdown helpers + types, exported from the client entry (+ 4 unit tests).
- UI: `ContextUsageRing`, `ContextUsageDetails` (HoverCard popover), wired into `ContextUsageIndicator`.
Testing
- `tsc` clean.
Note on base
Targets `feat/context-compaction` because it extends the Phase 0 gauge that only exists there. Re-target to `main` once the compaction PR merges.
🤖 Generated with Claude Code