Fix/agent loop ux by DoyleDev · Pull Request #14 · databricks-solutions/mason

DoyleDev · 2026-05-21T19:05:07Z

No description provided.

Anthropic via Databricks Gateway rejects requests where a tool_result block has no matching tool_use in a prior assistant message:

    API 400: messages.0.content.0: unexpected tool_use_id found in tool_result blocks: toolu_bdrk_…. Each tool_result block must have a corresponding tool_use block in the previous message.

trimHistory could leave orphans in two ways:

1. The 50-message slice cut between an assistant-with-tool_calls and its tool result message.
2. The char-budget loop shifted off the assistant message but kept subsequent tool results.

Now: after slicing/trimming, walk the kept array tracking tool_calls[].id from assistant messages. Any role:"tool" message whose tool_call_id isn't in that set is dropped. Non-tool messages pass through unchanged.

Co-authored-by: Isaac
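The orphan-dropping pass described above could look roughly like the sketch below. The `ChatMessage` shape and the function name `dropOrphanToolResults` are assumptions made for illustration; the real trimHistory code is not shown in this PR page.

```typescript
// Hypothetical message shape, loosely modelled on OpenAI-style chat messages.
interface ToolCall {
  id: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[]; // set on assistant messages that request tools
  tool_call_id?: string; // set on role:"tool" messages
}

// Walk the already-trimmed history in order. Remember every tool call id
// announced by a kept assistant message, and drop any tool message whose
// tool_call_id has not been seen.
function dropOrphanToolResults(kept: ChatMessage[]): ChatMessage[] {
  const seenIds = new Set<string>();
  const result: ChatMessage[] = [];
  for (const msg of kept) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) {
        seenIds.add(call.id);
      }
    }
    if (msg.role === "tool") {
      const id = msg.tool_call_id;
      if (id === undefined || !seenIds.has(id)) {
        continue; // orphaned tool result: its assistant message was trimmed
      }
    }
    result.push(msg); // non-tool messages pass through unchanged
  }
  return result;
}
```

Because ids are only collected from messages that appear earlier in the kept array, a tool result that precedes its assistant message is also treated as an orphan.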

Previously the brick indicator was removed when the first content chunk arrived, on the theory that the streaming text itself would be the "still working" cue. But Opus 4.7 often pauses mid-stream between paragraphs; the typewriter catches up, leaving static text on screen with no indication that more is coming. Users assume Mason has frozen.

Don't removeThinking on first chunk anymore. After creating the streaming bubble, move the thinking div to *after* it so DOM order is text-then-bricks. The brick stays visible through the entire stream.

The thinking div is now removed in one of two places: addMessageEl removes it (when "Calling tool:" lands) if the response had tool_calls, and send()'s finally block removes it when chatLoop exits for a text-only response.

Co-authored-by: Isaac

Tool schemas are the heaviest static portion of every turn. With ~80 tools at ~200 tokens each, Mason was re-sending ~16K tokens of static tool definitions on every single turn: roughly $0.24 per turn at Opus 4.7 input pricing, or $12 over a 50-turn agentic session, just for tool overhead.

Anthropic supports prompt caching via cache_control: {type: "ephemeral"}. Marked content is cached for 5 minutes; subsequent turns within the window read at ~10% of input cost.

Mason now applies cache_control to:

- The last tool definition (Anthropic caches everything up to and including the marked element, so a single breakpoint covers the entire tools array)
- The last system message (covers skills manifest + user system prompt + tool-aware nudge, i.e. everything stable before chat history)

No-op for non-Claude models. OpenAI already caches prompt prefixes longer than 1024 tokens automatically; Gemini uses a separate cachedContents API that doesn't fit our shape; Meta/Qwen have no standard caching.

Also logs data.usage.cache_creation_input_tokens and cache_read_input_tokens from non-streaming responses so we can verify caching is engaging in practice (and how much it saves).

Co-authored-by: Isaac
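As a rough illustration of the two breakpoints described above, the sketch below marks the last tool definition and the last system message. The `Tool` and `Message` shapes, the function name `withPromptCaching`, and the substring check used to detect a Claude model are all assumptions; the payload shape Mason actually sends through the gateway is not shown here.

```typescript
type CacheControl = { type: "ephemeral" };

// Hypothetical, simplified shapes for the request payload.
interface Tool {
  name: string;
  description: string;
  cache_control?: CacheControl;
}

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  cache_control?: CacheControl;
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Returns copies of tools and messages with one cache breakpoint each.
// Inputs are not mutated. Non-Claude models get the inputs back untouched.
function withPromptCaching(
  model: string,
  tools: Tool[],
  messages: Message[],
): { tools: Tool[]; messages: Message[] } {
  // Assumption: a model name containing "claude" identifies an Anthropic model.
  if (!model.toLowerCase().includes("claude")) {
    return { tools, messages };
  }
  const lastTool = tools.length - 1;
  const lastSystem = messages.map((m) => m.role).lastIndexOf("system");
  return {
    tools: tools.map((t, i) =>
      i === lastTool ? { ...t, cache_control: EPHEMERAL } : t,
    ),
    messages: messages.map((m, i) =>
      i === lastSystem ? { ...m, cache_control: EPHEMERAL } : m,
    ),
  };
}
```

Marking only the final element of each group keeps the number of breakpoints at two, regardless of how many tools or system messages are present.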

The chat loop silently exited after 10 tool-call rounds. Polling patterns (waiting on a job run with manage_job_runs status checks) or multi-step builds (creating tables + seeding + verifying) blow past 10 easily. Users saw the brick disappear and idle return with no explanation.

- Raise ITERATION_BUDGET to 40. That covers realistic polling plus several follow-ups in the same turn.
- Track iterationsUsed and, after the while loop exits without a clean text-branch return, surface an in-chat error explaining what happened and how to recover ("send 'continue' or break the task into smaller steps"). The message differs depending on whether we exit via the type-mismatch break or the budget exhaustion path.

Co-authored-by: Isaac
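The control flow described above might be sketched as follows. This is a synchronous simplification under stated assumptions: the `StepResult` and `LoopOutcome` types, the `runChatLoop` name, and the exact error wording are hypothetical; only ITERATION_BUDGET, iterationsUsed, and the quoted recovery hint come from the commit message.

```typescript
const ITERATION_BUDGET = 40;

// Hypothetical result of one model round-trip.
type StepResult =
  | { kind: "text"; text: string } // final answer, loop should return
  | { kind: "tool_calls" } // tools were requested, go around again
  | { kind: "unexpected" }; // response type the loop does not understand

type LoopOutcome =
  | { status: "done"; text: string; iterationsUsed: number }
  | { status: "error"; message: string; iterationsUsed: number };

function runChatLoop(step: (iteration: number) => StepResult): LoopOutcome {
  let iterationsUsed = 0;
  let brokeOnMismatch = false;

  while (iterationsUsed < ITERATION_BUDGET) {
    const result = step(iterationsUsed);
    iterationsUsed++;
    if (result.kind === "text") {
      return { status: "done", text: result.text, iterationsUsed };
    }
    if (result.kind === "unexpected") {
      brokeOnMismatch = true;
      break;
    }
    // kind === "tool_calls": tool execution is omitted in this sketch.
  }

  // Reaching this point means no clean text return happened, so say why.
  const message = brokeOnMismatch
    ? "Stopped: the model returned an unexpected response type. " +
      "Send 'continue' or break the task into smaller steps."
    : `Stopped after ${iterationsUsed} tool-call rounds. ` +
      "Send 'continue' or break the task into smaller steps.";
  return { status: "error", message, iterationsUsed };
}
```

Returning an explicit outcome from both exit paths gives the caller something to render in the chat, which is the behaviour the commit is after.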

The cache-usage log only fired on non-streamed responses, but every chat with tools now streams (see PR #9). Without the usage breakdown there was no way to verify that Anthropic prompt caching actually engaged.

- Set stream_options.include_usage = true on the chat completions body when shouldStream is set. The upstream then emits one final SSE chunk with chunk.usage populated and empty choices.
- Parse chunk.usage in the SSE loop and log: [CHAT] Usage (streamed) — input: X, output: Y, cache_created: A, cache_read: B
- cache_created > 0 means the cache was warmed this turn. cache_read > 0 means a prior cache was reused, which is the savings signal.

Now users can verify caching by watching the renderer console (or the packaged-build log) after a few turns. If cache_read climbs turn-over-turn while staying close to the tool-schema size, caching is doing its job.

Co-authored-by: Isaac
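A minimal sketch of the two pieces described above, the request-body flag and the log line, is given below. The `ChatBody` and `Usage` shapes and both function names are assumptions. In particular, the prompt_tokens and completion_tokens field names are assumed from the OpenAI-style response format; the commit message only names the two cache fields.

```typescript
// Hypothetical, trimmed-down chat completions request body.
interface ChatBody {
  model: string;
  messages: unknown[];
  stream?: boolean;
  stream_options?: { include_usage: boolean };
}

// Ask the upstream for usage in the stream only when streaming is on.
function buildChatBody(
  model: string,
  messages: unknown[],
  shouldStream: boolean,
): ChatBody {
  const body: ChatBody = { model, messages };
  if (shouldStream) {
    body.stream = true;
    body.stream_options = { include_usage: true };
  }
  return body;
}

// Assumed usage shape; missing fields are treated as zero when logging.
interface Usage {
  prompt_tokens?: number;
  completion_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Formats the log line quoted in the commit message.
function formatUsage(usage: Usage): string {
  const input = usage.prompt_tokens ?? 0;
  const output = usage.completion_tokens ?? 0;
  const created = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  return `[CHAT] Usage (streamed) — input: ${input}, output: ${output}, cache_created: ${created}, cache_read: ${read}`;
}
```

A caller would pass the result of `formatUsage` to `console.log` once a usage object has been seen in the stream.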

The Databricks Gateway sends chunk.usage in every streaming SSE event with running totals, so Mason was logging the line ~38 times per turn, each with output_tokens still climbing. Capture the latest usage and log once after the stream loop completes so the console shows one clear line per turn.

Co-authored-by: Isaac
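The capture-then-log-once idea can be sketched as a pass over already-split SSE lines. The `Usage` shape, the `lastUsageFromSse` name, and the assumption that each event arrives as a single `data:` line with a JSON payload are illustrative only; real stream handling would also need to deal with chunk boundaries.

```typescript
// Assumed usage shape; only the fields needed for the example are listed.
interface Usage {
  completion_tokens?: number;
  cache_read_input_tokens?: number;
}

// Scan SSE lines and keep only the most recent usage object. Nothing is
// logged inside the loop, so the caller can log a single line afterwards.
function lastUsageFromSse(lines: string[]): Usage | undefined {
  let latest: Usage | undefined;
  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const chunk = JSON.parse(payload) as { usage?: Usage | null };
    if (chunk.usage) {
      latest = chunk.usage; // running totals: later values supersede earlier ones
    }
  }
  return latest;
}
```

Since the gateway reports running totals, the last usage object seen is the one that reflects the whole turn.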

DoyleDev added 6 commits May 20, 2026 16:42

DoyleDev merged commit 3b808c3 into main May 21, 2026
0 of 3 checks passed

DoyleDev mentioned this pull request May 21, 2026

chore(release): 1.4.2 #15

Merged

DoyleDev merged 6 commits into main from fix/agent-loop-ux
