Skip to content

[BUG] Ollama streaming adapter drops tool_calls emitted before the done chunk #1922

@djmcgreal-cc

Description

@djmcgreal-cc

📋 Prerequisites

  • Searched existing issues
  • Reproducible

🐛 Bug Description

The KAgentOllamaLlm streaming path in kagent-adk/src/kagent/adk/models/_ollama.py only reads tool_calls from the chunk where chunk.done == True. However, Ollama's /api/chat streaming protocol emits tool_calls in an earlier chunk and then sends a separate final chunk with done=True, tool_calls=None, content="". As a result, when an Agent has spec.declarative.stream: true (the default), every tool call the model makes is silently discarded. The agent yields an LlmResponse with empty content.parts: [], no event is enqueued, and the A2A request hangs in a dequeue_event poll loop until the client times out.

🔄 Steps to Reproduce

  1. Apply a ModelConfig pointing at any Ollama-hosted model with native tool calling (llama3.2:3b, qwen2.5:3b, etc.).
  2. Apply a declarative Agent with stream: true and at least one MCP tool (e.g. the default my-first-k8s-agent with k8s_get_resources).
  3. Send a prompt that should trigger a tool call ("any exciting events in my cluster recently?").
  4. Observe: no reply is ever returned to the UI; Phoenix shows an LlmResponse with parts: [] despite non-zero eval_count.

🔬 Direct evidence

Streaming Ollama with the same tool, hitting the upstream directly:

$ curl -s POST /api/chat -d '{"model":"llama3.2:3b","stream":true,"tools":[...],"messages":[...]}'
done=False content=''  tool_calls=[{'function': {'name': 'k8s_get_resources', 'arguments': {'resource_type': 'events'}}}]
done=True  content=''  tool_calls=None

The tool call arrives in the non-final chunk.

🩹 Code location

kagent-adk/src/kagent/adk/models/_ollama.py (streaming branch in generate_content_async):

```python
async for chunk in response:
if chunk.message.content:
aggregated_text += chunk.message.content
yield LlmResponse(..., partial=True, ...)
if chunk.done:
final_parts = []
if aggregated_text:
final_parts.append(types.Part.from_text(text=aggregated_text))
for tc in chunk.message.tool_calls or []: # ← only the done chunk
...
```

Should accumulate `tool_calls` across all chunks:

```python
aggregated_tool_calls: list = []
async for chunk in response:
if chunk.message.content:
...
if chunk.message.tool_calls:
aggregated_tool_calls.extend(chunk.message.tool_calls)
if chunk.done:
...
for tc in aggregated_tool_calls:
...
```

The non-streaming branch in the same function handles this correctly — it's only the streaming path that's broken.

🩺 Workaround

Set `spec.declarative.stream: false` on the Agent CR. The non-streaming path correctly emits `function_call` parts.

💻 Environment

  • Chart: `kagent-0.9.4`
  • App image: `cr.kagent.dev/kagent-dev/kagent/app:0.9.4`
  • `kagent-adk`: 0.3.0
  • Ollama backend: tested against `llama3.2:3b` and `gemma3n:e4b` aliases; reproducible with any tool-capable Ollama model
  • Kubernetes: kind in devcontainer

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions