Fix/agent loop ux#14
Merged
Anthropic via Databricks Gateway rejects requests where a tool_result
block has no matching tool_use in a prior assistant message:

  API 400: messages.0.content.0: unexpected tool_use_id found in tool_result blocks: toolu_bdrk_…. Each tool_result block must have a corresponding tool_use block in the previous message.

trimHistory could leave orphans in two ways:
1. The 50-message slice cut between an assistant-with-tool_calls and
   its tool result message.
2. The char-budget loop shifted off the assistant message but kept
   subsequent tool results.

Now: after slicing/trimming, walk the kept array tracking
tool_calls[].id from assistant messages. Any role:"tool" message whose
tool_call_id isn't in that set is dropped. Non-tool messages pass
through unchanged.

Co-authored-by: Isaac
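The orphan filter described above can be sketched roughly as follows. This is a minimal illustration, not the actual trimHistory code: the HistoryMessage shape and the function name dropOrphanToolResults are assumptions modeled on the field names in the commit message (tool_calls[].id, tool_call_id, role:"tool").

```typescript
// Hypothetical message shape, modeled on the fields named in the commit text.
interface HistoryToolCall {
  id: string;
}

interface HistoryMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: HistoryToolCall[]; // on assistant messages that call tools
  tool_call_id?: string;          // on role:"tool" messages
}

// Walk the already-trimmed array once. Collect ids announced by assistant
// messages, and drop any tool message whose id has not been seen so far.
function dropOrphanToolResults(kept: HistoryMessage[]): HistoryMessage[] {
  const seenIds = new Set<string>();
  const result: HistoryMessage[] = [];

  for (const msg of kept) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) {
        seenIds.add(call.id);
      }
    }
    if (msg.role === "tool") {
      const id = msg.tool_call_id;
      if (id === undefined || !seenIds.has(id)) {
        continue; // orphaned tool result: its assistant message was trimmed
      }
    }
    result.push(msg); // non-tool messages pass through unchanged
  }
  return result;
}
```

Because the set is filled while walking forward, a tool result is kept only if its assistant message survives earlier in the same array, which covers both trimming paths listed above.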
Previously the brick indicator was removed when the first content chunk
arrived; the streaming text itself was supposed to be the "still
working" cue. But Opus 4.7 often pauses mid-stream between paragraphs
and the typewriter catches up, leaving static text on screen with no
indication that more is coming. Users assume Mason has frozen.

Don't removeThinking on first chunk anymore. After creating the
streaming bubble, move the thinking div to *after* it so DOM order is
text-then-bricks. The brick stays visible through the entire stream.
addMessageEl (when "Calling tool:" lands) removes the thinking div if
the response had tool_calls, and send()'s finally removes it when
chatLoop exits for a text-only response.

Co-authored-by: Isaac
Tool schemas are the heaviest static portion of every turn. With ~80
tools at ~200 tokens each, Mason was re-sending ~16K tokens of static
tool definitions on every single turn — roughly $0.24 per turn at
Opus 4.7 input pricing, $12 over a 50-turn agentic session, just for
tool overhead.
Anthropic supports prompt caching via cache_control: {type: "ephemeral"}.
Marked content is cached for 5 minutes; subsequent turns within the
window read at ~10% of input cost.
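The figures above can be reproduced with a few lines of arithmetic. The per-million-token price is an assumption inferred from the numbers in this message ($0.24 for ~16K tokens), not taken from a price sheet, and the sketch ignores any extra charge for writing the cache.

```typescript
// Inputs quoted in the commit message.
const TOOL_COUNT = 80;
const TOKENS_PER_TOOL = 200;
const TURNS_PER_SESSION = 50;
const CACHE_READ_FRACTION = 0.1; // "~10% of input cost"

// Assumption: implied by $0.24 / 16K tokens, in USD per million input tokens.
const INPUT_USD_PER_MTOK = 15;

const toolTokensPerTurn = TOOL_COUNT * TOKENS_PER_TOOL;
const uncachedPerTurn = (toolTokensPerTurn / 1e6) * INPUT_USD_PER_MTOK;
const uncachedPerSession = uncachedPerTurn * TURNS_PER_SESSION;
const cachedReadPerTurn = uncachedPerTurn * CACHE_READ_FRACTION;

console.log(`tool tokens per turn: ${toolTokensPerTurn}`);
console.log(`uncached per turn:    $${uncachedPerTurn.toFixed(2)}`);
console.log(`uncached per session: $${uncachedPerSession.toFixed(2)}`);
console.log(`cache read per turn:  $${cachedReadPerTurn.toFixed(3)}`);
```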
Mason now applies cache_control to:
- The last tool definition (Anthropic caches everything up to and
including the marked element, so a single breakpoint covers the
entire tools array)
- The last system message (covers skills manifest + user system
prompt + tool-aware nudge — everything stable before chat history)
No-op for non-Claude models. OpenAI already caches prompt prefixes
longer than 1024 tokens automatically; Gemini uses a separate
cachedContents API that doesn't fit our shape; Meta/Qwen have no
standard caching.
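A rough sketch of placing the two breakpoints is shown below. Everything except the cache_control: {type: "ephemeral"} marker is an assumption made for illustration: the ToolDef and PromptMessage shapes, the isClaudeModel check, and in particular where the gateway expects the marker to sit (here, on the tool object and on the last text block of the system message).

```typescript
type CacheControl = { type: "ephemeral" };

// Hypothetical request shapes for the sketch.
interface ToolDef {
  type: "function";
  function: { name: string; description?: string; parameters?: object };
  cache_control?: CacheControl;
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface PromptMessage {
  role: string;
  content: string | TextBlock[];
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Hypothetical model check; the real detection logic is not shown in the PR text.
function isClaudeModel(model: string): boolean {
  return model.toLowerCase().includes("claude");
}

function applyCacheBreakpoints(
  model: string,
  tools: ToolDef[],
  messages: PromptMessage[],
): { tools: ToolDef[]; messages: PromptMessage[] } {
  if (!isClaudeModel(model)) return { tools, messages }; // no-op for other models

  // Breakpoint 1: mark only the last tool definition.
  const lastTool = tools.length - 1;
  const markedTools = tools.map((t, i) =>
    i === lastTool ? { ...t, cache_control: EPHEMERAL } : t,
  );

  // Breakpoint 2: mark the last block of the last system message.
  const lastSystem = messages.map((m) => m.role).lastIndexOf("system");
  const markedMessages = messages.map((m, i): PromptMessage => {
    if (i !== lastSystem) return m;
    const blocks: TextBlock[] =
      typeof m.content === "string" ? [{ type: "text", text: m.content }] : m.content;
    const lastBlock = blocks.length - 1;
    return {
      ...m,
      content: blocks.map((b, j) =>
        j === lastBlock ? { ...b, cache_control: EPHEMERAL } : b,
      ),
    };
  });

  return { tools: markedTools, messages: markedMessages };
}
```

The function copies rather than mutates its inputs, so the same tools array can be reused across turns and models without carrying a stale marker.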
Also logs data.usage.cache_creation_input_tokens and
cache_read_input_tokens from non-streaming responses so we can
verify caching is engaging in practice (and how much it saves).
Co-authored-by: Isaac
The chat loop silently exited after 10 tool-call rounds. Polling
patterns (waiting on a job run with manage_job_runs status checks)
or multi-step builds (creating tables + seeding + verifying) blow
past 10 easily — users saw the brick disappear and idle return with
no explanation.
- Raise ITERATION_BUDGET to 40. That covers realistic polling plus
several follow-ups in the same turn.
- Track iterationsUsed and, after the while loop exits without a
clean text-branch return, surface an in-chat error explaining
what happened and how to recover ("send 'continue' or break the
task into smaller steps"). Different message when we exit via
the type-mismatch break vs the budget exhaustion path.
Co-authored-by: Isaac
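The budget handling above might look roughly like this. It is a simplified, synchronous sketch: the real loop would await network and tool calls, and the TurnResult and LoopOutcome types, the runTurn callback, and the exact message wording are all assumptions for illustration. Only ITERATION_BUDGET = 40, iterationsUsed, and the two distinct exit messages come from the commit text.

```typescript
const ITERATION_BUDGET = 40;

// Hypothetical result of one model round.
type TurnResult =
  | { kind: "text"; text: string } // clean text-branch return
  | { kind: "tool_calls" }         // tools requested; loop again
  | { kind: "unexpected" };        // stands in for the type-mismatch case

type LoopOutcome =
  | { ok: true; text: string; iterationsUsed: number }
  | { ok: false; reason: "budget" | "type_mismatch"; message: string; iterationsUsed: number };

function chatLoop(runTurn: () => TurnResult): LoopOutcome {
  let iterationsUsed = 0;
  let typeMismatch = false;

  while (iterationsUsed < ITERATION_BUDGET) {
    iterationsUsed++;
    const result = runTurn();
    if (result.kind === "text") {
      return { ok: true, text: result.text, iterationsUsed };
    }
    if (result.kind === "unexpected") {
      typeMismatch = true;
      break;
    }
    // kind === "tool_calls": the real loop would execute the tools here.
  }

  // Reaching this point means no clean text return happened, so say why.
  if (typeMismatch) {
    return {
      ok: false,
      reason: "type_mismatch",
      message: "The model returned a response that could not be handled. Try sending the request again.",
      iterationsUsed,
    };
  }
  return {
    ok: false,
    reason: "budget",
    message: `Stopped after ${iterationsUsed} tool-call rounds without a final answer. Send 'continue' or break the task into smaller steps.`,
    iterationsUsed,
  };
}
```

Returning a tagged outcome instead of silently falling out of the loop lets the caller render the message in the chat rather than leaving the user at an idle prompt.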
The cache-usage log only fired on non-streamed responses, but every
chat with tools now streams (see PR #9). Without seeing the usage
breakdown there was no way to verify the Anthropic prompt caching
actually engaged.

- Set stream_options.include_usage = true on the chat completions body
  when shouldStream is set. The upstream then emits one final SSE chunk
  with chunk.usage populated and empty choices.
- Parse chunk.usage in the SSE loop and log:
  [CHAT] Usage (streamed) — input: X, output: Y, cache_created: A, cache_read: B
- cache_created > 0 means the cache was warmed this turn.
  cache_read > 0 means a prior cache was reused; that's the savings
  signal.

Now users can verify caching by watching the renderer console (or the
packaged-build log) after a few turns. If cache_read climbs
turn-over-turn while staying close to the tool-schema size, caching is
doing its job.

Co-authored-by: Isaac
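The request flag and the usage parsing described above could be sketched as follows. The helper names (withUsageReporting, parseSseLine, formatUsageFields) and the StreamUsage shape are hypothetical. The sketch also accepts either input_tokens/output_tokens or prompt_tokens/completion_tokens, since which pair the gateway reports is an assumption here, and it returns only the field portion of the log line, leaving the prefix to the caller.

```typescript
// Hypothetical usage shape; field names beyond the two cache_* counters are assumptions.
interface StreamUsage {
  input_tokens?: number;
  prompt_tokens?: number;
  output_tokens?: number;
  completion_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface StreamChunk {
  choices?: unknown[];
  usage?: StreamUsage | null;
}

// Ask the upstream to include usage in the stream when streaming is on.
function withUsageReporting(
  body: Record<string, unknown>,
  shouldStream: boolean,
): Record<string, unknown> {
  if (!shouldStream) return body;
  return { ...body, stream: true, stream_options: { include_usage: true } };
}

// Parse one SSE line; returns undefined for non-data lines and the [DONE] marker.
function parseSseLine(line: string): StreamChunk | undefined {
  if (!line.startsWith("data:")) return undefined;
  const payload = line.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return undefined;
  return JSON.parse(payload) as StreamChunk;
}

// Build the "input: X, output: Y, cache_created: A, cache_read: B" portion.
function formatUsageFields(usage: StreamUsage): string {
  const input = usage.input_tokens ?? usage.prompt_tokens ?? 0;
  const output = usage.output_tokens ?? usage.completion_tokens ?? 0;
  const created = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  return `input: ${input}, output: ${output}, cache_created: ${created}, cache_read: ${read}`;
}
```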
The Databricks Gateway sends chunk.usage in every streaming SSE event
with running totals, so Mason was logging the line ~38 times per turn,
each with output_tokens still climbing. Capture the latest usage and
log once after the stream loop completes so the console shows one clear
line per turn.

Co-authored-by: Isaac
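The capture-then-log-once pattern can be illustrated with a small sketch. It iterates over an in-memory array for simplicity, whereas the real code reads an SSE stream; the UsageChunk shape, the delta field, the consumeStream name, and the log line format are assumptions.

```typescript
// Hypothetical chunk shape: optional text delta plus running usage totals.
interface TurnUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface UsageChunk {
  delta?: string;
  usage?: TurnUsage;
}

function consumeStream(chunks: UsageChunk[], log: (line: string) => void): string {
  let text = "";
  let latestUsage: TurnUsage | undefined;

  for (const chunk of chunks) {
    if (chunk.delta) text += chunk.delta;
    // Running totals arrive on every event: remember the newest, do not log yet.
    if (chunk.usage) latestUsage = chunk.usage;
  }

  // One line per turn, emitted only after the stream loop has finished.
  if (latestUsage) {
    log(
      `usage: input=${latestUsage.input_tokens ?? 0}, ` +
        `output=${latestUsage.output_tokens ?? 0}, ` +
        `cache_created=${latestUsage.cache_creation_input_tokens ?? 0}, ` +
        `cache_read=${latestUsage.cache_read_input_tokens ?? 0}`,
    );
  }
  return text;
}
```

Because the totals are cumulative, the last usage object seen is the complete picture for the turn, so nothing is lost by discarding the earlier ones.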