Fix/agent loop ux#14

Merged
DoyleDev merged 6 commits into main from fix/agent-loop-ux
May 21, 2026

@DoyleDev
Collaborator

No description provided.

DoyleDev added 6 commits May 20, 2026 16:42
Anthropic via Databricks Gateway rejects requests where a tool_result
block has no matching tool_use in a prior assistant message:

  API 400: messages.0.content.0: unexpected tool_use_id found in
  tool_result blocks: toolu_bdrk_…. Each tool_result block must have
  a corresponding tool_use block in the previous message.

trimHistory could leave orphans in two ways:
1. The 50-message slice cut between an assistant-with-tool_calls and
   its tool result message.
2. The char-budget loop shifted off the assistant message but kept
   subsequent tool results.

Now: after slicing/trimming, walk the kept array tracking
tool_calls[].id from assistant messages. Any role:"tool" message
whose tool_call_id isn't in that set is dropped. Non-tool messages
pass through unchanged.
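
The orphan filter described above can be sketched roughly as follows. This is a minimal illustration, not the actual trimHistory code: the `ChatMessage` shape and the `dropOrphanToolResults` name are assumptions modeled on OpenAI-style chat messages.

```typescript
// Hypothetical message shape (assumption), modeled on OpenAI-style chat messages.
interface ToolCall {
  id: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// Walk the kept array in order, remembering every tool call id announced by
// an assistant message. A role:"tool" message is kept only if its
// tool_call_id has already been seen; everything else passes through.
function dropOrphanToolResults(kept: ChatMessage[]): ChatMessage[] {
  const seenIds = new Set<string>();
  const result: ChatMessage[] = [];
  for (const msg of kept) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) {
        seenIds.add(call.id);
      }
    }
    if (msg.role === "tool") {
      const id = msg.tool_call_id;
      if (id === undefined || !seenIds.has(id)) {
        continue; // orphaned tool result: its assistant message was trimmed away
      }
    }
    result.push(msg);
  }
  return result;
}
```

Running the filter after both the slice and the char-budget loop means it does not matter which of the two trimming steps produced the orphan.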

Co-authored-by: Isaac
Previously the brick indicator was removed when the first content chunk
arrived — the streaming text itself was supposed to be the "still
working" cue. But Opus 4.7 often pauses mid-stream between paragraphs
and the typewriter catches up, leaving static text on screen with no
indication that more is coming. Users assume Mason has frozen.

Don't removeThinking on first chunk anymore. After creating the
streaming bubble, move the thinking div to *after* it so DOM order is
text-then-bricks. The brick stays visible through the entire stream.
addMessageEl (when "Calling tool:" lands) removes the thinking div if
the response had tool_calls, and send()'s finally removes it when
chatLoop exits for a text-only response.

Co-authored-by: Isaac
Tool schemas are the heaviest static portion of every turn. With ~80
tools at ~200 tokens each, Mason was re-sending ~16K tokens of static
tool definitions on every single turn — roughly $0.24 per turn at
Opus 4.7 input pricing, $12 over a 50-turn agentic session, just for
tool overhead.
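
The arithmetic behind those figures, as a small sketch. The $15 per million input tokens rate is an assumption inferred from the numbers above ($0.24 for ~16K tokens), not a quoted price, and `toolOverheadUsd` is a hypothetical helper.

```typescript
// Figures from the commit message.
const TOOL_COUNT = 80;
const TOKENS_PER_TOOL = 200;

// Assumption: input price implied by "$0.24 per ~16K tokens".
const INPUT_USD_PER_MILLION_TOKENS = 15;

// Uncached cost of re-sending the tool schemas for a given number of turns.
function toolOverheadUsd(turns: number): number {
  const tokensPerTurn = TOOL_COUNT * TOKENS_PER_TOOL; // 16,000 tokens
  return (tokensPerTurn / 1e6) * INPUT_USD_PER_MILLION_TOKENS * turns;
}
```

Under that assumption, `toolOverheadUsd(1)` is about 0.24 and `toolOverheadUsd(50)` is about 12, which matches the per-turn and per-session figures quoted above.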

Anthropic supports prompt caching via cache_control: {type: "ephemeral"}.
Marked content is cached for 5 minutes by default, and each cache hit
refreshes that window; subsequent turns within the window read at ~10%
of input cost.

Mason now applies cache_control to:
- The last tool definition (Anthropic caches everything up to and
  including the marked element, so a single breakpoint covers the
  entire tools array)
- The last system message (covers skills manifest + user system
  prompt + tool-aware nudge — everything stable before chat history)
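
A rough sketch of placing the two breakpoints. The request shape here is an assumption: an OpenAI-style chat completions body where cache_control sits on the tool object and on a text content block. The `isClaudeModel` check and `applyCacheBreakpoints` name are hypothetical, and the real gateway may expect a different placement.

```typescript
interface CacheControl {
  type: "ephemeral";
}

interface CacheableTool {
  type: "function";
  function: { name: string; description?: string; parameters?: unknown };
  cache_control?: CacheControl;
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface CacheableMessage {
  role: string;
  content: string | TextBlock[];
}

interface CacheableBody {
  model: string;
  messages: CacheableMessage[];
  tools?: CacheableTool[];
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Assumption: Claude endpoints can be recognised by name.
function isClaudeModel(model: string): boolean {
  return model.toLowerCase().includes("claude");
}

// Returns a copy of the body with cache_control on the last tool and on the
// last block of the last system message. Non-Claude bodies are returned as-is.
function applyCacheBreakpoints(body: CacheableBody): CacheableBody {
  if (!isClaudeModel(body.model)) return body;

  const tools = body.tools?.map((tool, i, all) =>
    i === all.length - 1 ? { ...tool, cache_control: EPHEMERAL } : tool
  );

  let lastSystem = -1;
  for (let i = 0; i < body.messages.length; i++) {
    if (body.messages[i]?.role === "system") lastSystem = i;
  }

  const messages = body.messages.map((msg, i) => {
    if (i !== lastSystem) return msg;
    // A plain string has nowhere to carry cache_control, so wrap it in a block.
    const source: TextBlock[] =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content;
    const blocks = source.map((block, j) =>
      j === source.length - 1 ? { ...block, cache_control: EPHEMERAL } : block
    );
    return { ...msg, content: blocks };
  });

  return { ...body, messages, tools };
}
```

Only two breakpoints are needed because each one covers everything before it in the prompt, which is why the marks go on the last tool and the last system message.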

No-op for non-Claude models. OpenAI already caches prompt prefixes
longer than 1024 tokens automatically; Gemini uses a separate
cachedContents API that doesn't fit our shape; Meta/Qwen have no
standard caching.

Also logs data.usage.cache_creation_input_tokens and
cache_read_input_tokens from non-streaming responses so we can
verify caching is engaging in practice (and how much it saves).

Co-authored-by: Isaac
The chat loop silently exited after 10 tool-call rounds. Polling
patterns (waiting on a job run with manage_job_runs status checks)
or multi-step builds (creating tables + seeding + verifying) blow
past 10 easily, so users saw the brick disappear and the idle state
return with no explanation.

- Raise ITERATION_BUDGET to 40. That covers realistic polling plus
  several follow-ups in the same turn.
- Track iterationsUsed and, after the while loop exits without a
  clean text-branch return, surface an in-chat error explaining
  what happened and how to recover ("send 'continue' or break the
  task into smaller steps"). The message differs depending on
  whether we exit via the type-mismatch break or the budget
  exhaustion path.
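
The control flow in the two bullets above might look roughly like this. It is a simplified, synchronous sketch: the `StepResult` and `LoopOutcome` types, the `runChatLoop` name, and the exact error wording are assumptions, and the "unexpected" kind stands in for whatever the type-mismatch break checks in the real loop.

```typescript
const ITERATION_BUDGET = 40;

// Hypothetical result of one model round (assumption).
type StepResult =
  | { kind: "text"; text: string }
  | { kind: "tool_calls" }
  | { kind: "unexpected" };

type LoopOutcome =
  | { ok: true; text: string; iterationsUsed: number }
  | { ok: false; error: string; iterationsUsed: number };

function runChatLoop(
  step: () => StepResult,
  budget: number = ITERATION_BUDGET
): LoopOutcome {
  let iterationsUsed = 0;
  let brokeOnMismatch = false;

  while (iterationsUsed < budget) {
    iterationsUsed++;
    const result = step();
    if (result.kind === "text") {
      // Clean text-branch return: the turn is finished.
      return { ok: true, text: result.text, iterationsUsed };
    }
    if (result.kind === "unexpected") {
      brokeOnMismatch = true;
      break;
    }
    // kind === "tool_calls": the tools would run here, then the loop continues.
  }

  // Reaching this point means no text answer was produced, so say why.
  const error = brokeOnMismatch
    ? `Stopped after ${iterationsUsed} rounds: the model returned a response in an unexpected shape. Send 'continue' to retry.`
    : `Stopped after ${iterationsUsed} tool-call rounds (budget ${budget}). Send 'continue' or break the task into smaller steps.`;
  return { ok: false, error, iterationsUsed };
}
```

Returning the failure as a value rather than exiting silently gives the caller something to render in the chat.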

Co-authored-by: Isaac
The cache-usage log only fired on non-streamed responses, but every
chat with tools now streams (see PR #9). Without seeing the usage
breakdown there was no way to verify the Anthropic prompt caching
actually engaged.

- Set stream_options.include_usage = true on the chat completions
  body when shouldStream is set. The upstream then emits one final
  SSE chunk with chunk.usage populated and empty choices.
- Parse chunk.usage in the SSE loop and log:
    [CHAT] Usage (streamed) — input: X, output: Y,
                              cache_created: A, cache_read: B
- cache_created > 0 means the cache was warmed this turn.
  cache_read > 0 means a prior cache was reused — that's the savings
  signal.
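
A sketch of the request-side change and the log line. The `StreamBody` and `StreamedUsage` shapes and both helper names are assumptions; in particular, the input/output field names the gateway actually reports may differ from the `prompt_tokens` / `completion_tokens` pair assumed here.

```typescript
interface StreamBody {
  model: string;
  messages: unknown[];
  stream?: boolean;
  stream_options?: { include_usage: boolean };
}

// Ask the upstream to include usage in the stream, but only when streaming.
function withStreamUsage(body: StreamBody, shouldStream: boolean): StreamBody {
  if (!shouldStream) return body;
  return { ...body, stream: true, stream_options: { include_usage: true } };
}

// Assumed usage fields; the cache counters use the names from the commit text.
interface StreamedUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Approximates the log line above; missing counters are reported as 0.
function formatUsage(usage: StreamedUsage): string {
  return (
    `[CHAT] Usage (streamed) - input: ${usage.prompt_tokens ?? 0}, ` +
    `output: ${usage.completion_tokens ?? 0}, ` +
    `cache_created: ${usage.cache_creation_input_tokens ?? 0}, ` +
    `cache_read: ${usage.cache_read_input_tokens ?? 0}`
  );
}
```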

Now users can verify caching by watching the renderer console (or
the packaged-build log) after a few turns. If cache_read climbs
turn-over-turn while staying close to the tool-schema size, caching
is doing its job.

Co-authored-by: Isaac
The Databricks Gateway sends chunk.usage in every streaming SSE event
with running totals — Mason was logging the line ~38 times per turn,
each with output_tokens still climbing. Capture the latest usage and
log once after the stream loop completes so the console shows one
clear line per turn.
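
The "capture the latest, log once" pattern can be sketched like this. The SSE parsing is deliberately simplified to one `data:` payload per line, and the `RunningUsage` shape, `parseSseData`, and `logFinalUsage` are hypothetical names, not the actual Mason code.

```typescript
// Assumed usage fields carried on each chunk (running totals).
interface RunningUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface StreamChunk {
  choices?: unknown[];
  usage?: RunningUsage;
}

// Parse one SSE line; returns null for non-data lines, [DONE], or bad JSON.
function parseSseData(line: string): StreamChunk | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;
  try {
    return JSON.parse(payload) as StreamChunk;
  } catch {
    return null;
  }
}

// Keep overwriting the captured usage while the stream runs, then log a
// single line after the loop instead of one line per chunk.
function logFinalUsage(
  lines: string[],
  log: (message: string) => void
): RunningUsage | undefined {
  let latest: RunningUsage | undefined;
  for (const line of lines) {
    const chunk = parseSseData(line);
    if (chunk?.usage) latest = chunk.usage;
  }
  if (latest) {
    log(
      `[CHAT] Usage (streamed) - input: ${latest.prompt_tokens ?? 0}, ` +
        `output: ${latest.completion_tokens ?? 0}, ` +
        `cache_created: ${latest.cache_creation_input_tokens ?? 0}, ` +
        `cache_read: ${latest.cache_read_input_tokens ?? 0}`
    );
  }
  return latest;
}
```

Because the totals are cumulative, the last usage object seen is the only one worth reporting.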

Co-authored-by: Isaac
@DoyleDev DoyleDev merged commit 3b808c3 into main May 21, 2026
0 of 3 checks passed
@DoyleDev DoyleDev mentioned this pull request May 21, 2026