📋 Prerequisites
🐛 Bug Description
The KAgentOllamaLlm streaming path in kagent-adk/src/kagent/adk/models/_ollama.py only reads tool_calls from the chunk where chunk.done == True. However, Ollama's /api/chat streaming protocol emits tool_calls in an earlier chunk and then sends a separate final chunk with done=True, tool_calls=None, content="". As a result, when an Agent has spec.declarative.stream: true (the default), every tool call the model makes is silently discarded. The agent yields an LlmResponse with empty content.parts: [], no event is enqueued, and the A2A request hangs in a dequeue_event poll loop until the client times out.
🔄 Steps to Reproduce
- Apply a
ModelConfig pointing at any Ollama-hosted model with native tool calling (llama3.2:3b, qwen2.5:3b, etc.).
- Apply a declarative
Agent with stream: true and at least one MCP tool (e.g. the default my-first-k8s-agent with k8s_get_resources).
- Send a prompt that should trigger a tool call ("any exciting events in my cluster recently?").
- Observe: no reply is ever returned to the UI; Phoenix shows an
LlmResponse with parts: [] despite non-zero eval_count.
🔬 Direct evidence
Streaming Ollama with the same tool, hitting the upstream directly:
$ curl -s POST /api/chat -d '{"model":"llama3.2:3b","stream":true,"tools":[...],"messages":[...]}'
done=False content='' tool_calls=[{'function': {'name': 'k8s_get_resources', 'arguments': {'resource_type': 'events'}}}]
done=True content='' tool_calls=None
The tool call arrives in the non-final chunk.
🩹 Code location
kagent-adk/src/kagent/adk/models/_ollama.py (streaming branch in generate_content_async):
```python
async for chunk in response:
if chunk.message.content:
aggregated_text += chunk.message.content
yield LlmResponse(..., partial=True, ...)
if chunk.done:
final_parts = []
if aggregated_text:
final_parts.append(types.Part.from_text(text=aggregated_text))
for tc in chunk.message.tool_calls or []: # ← only the done chunk
...
```
Should accumulate `tool_calls` across all chunks:
```python
aggregated_tool_calls: list = []
async for chunk in response:
if chunk.message.content:
...
if chunk.message.tool_calls:
aggregated_tool_calls.extend(chunk.message.tool_calls)
if chunk.done:
...
for tc in aggregated_tool_calls:
...
```
The non-streaming branch in the same function handles this correctly — it's only the streaming path that's broken.
🩺 Workaround
Set `spec.declarative.stream: false` on the Agent CR. The non-streaming path correctly emits `function_call` parts.
💻 Environment
- Chart: `kagent-0.9.4`
- App image: `cr.kagent.dev/kagent-dev/kagent/app:0.9.4`
- `kagent-adk`: 0.3.0
- Ollama backend: tested against `llama3.2:3b` and `gemma3n:e4b` aliases; reproducible with any tool-capable Ollama model
- Kubernetes: kind in devcontainer
📋 Prerequisites
🐛 Bug Description
The
KAgentOllamaLlmstreaming path inkagent-adk/src/kagent/adk/models/_ollama.pyonly readstool_callsfrom the chunk wherechunk.done == True. However, Ollama's/api/chatstreaming protocol emitstool_callsin an earlier chunk and then sends a separate final chunk withdone=True, tool_calls=None, content="". As a result, when an Agent hasspec.declarative.stream: true(the default), every tool call the model makes is silently discarded. The agent yields anLlmResponsewith emptycontent.parts: [], no event is enqueued, and the A2A request hangs in adequeue_eventpoll loop until the client times out.🔄 Steps to Reproduce
ModelConfigpointing at any Ollama-hosted model with native tool calling (llama3.2:3b,qwen2.5:3b, etc.).Agentwithstream: trueand at least one MCP tool (e.g. the defaultmy-first-k8s-agentwithk8s_get_resources).LlmResponsewithparts: []despite non-zeroeval_count.🔬 Direct evidence
Streaming Ollama with the same tool, hitting the upstream directly:
The tool call arrives in the non-final chunk.
🩹 Code location
kagent-adk/src/kagent/adk/models/_ollama.py(streaming branch ingenerate_content_async):```python
async for chunk in response:
if chunk.message.content:
aggregated_text += chunk.message.content
yield LlmResponse(..., partial=True, ...)
if chunk.done:
final_parts = []
if aggregated_text:
final_parts.append(types.Part.from_text(text=aggregated_text))
for tc in chunk.message.tool_calls or []: # ← only the done chunk
...
```
Should accumulate `tool_calls` across all chunks:
```python
aggregated_tool_calls: list = []
async for chunk in response:
if chunk.message.content:
...
if chunk.message.tool_calls:
aggregated_tool_calls.extend(chunk.message.tool_calls)
if chunk.done:
...
for tc in aggregated_tool_calls:
...
```
The non-streaming branch in the same function handles this correctly — it's only the streaming path that's broken.
🩺 Workaround
Set `spec.declarative.stream: false` on the Agent CR. The non-streaming path correctly emits `function_call` parts.
💻 Environment