diff --git a/docs/_data/nav.yml b/docs/_data/nav.yml
index 4390e1b21..3c5b094ee 100644
--- a/docs/_data/nav.yml
+++ b/docs/_data/nav.yml
@@ -94,6 +94,8 @@
url: /features/acp/
- title: API Server
url: /features/api-server/
+ - title: Chat Server
+ url: /features/chat-server/
- title: Evaluation
url: /features/evaluation/
- title: RAG
diff --git a/docs/configuration/models/index.md b/docs/configuration/models/index.md
index a86b915e7..77669060a 100644
--- a/docs/configuration/models/index.md
+++ b/docs/configuration/models/index.md
@@ -40,7 +40,7 @@ models:
| Property | Type | Required | Description |
| --------------------- | ---------- | -------- | ------------------------------------------------------------------------------------- |
| `provider` | string | ✓ | Provider: `openai`, `anthropic`, `google`, `amazon-bedrock`, `dmr`, `mistral`, `xai`, `nebius`, `minimax`, `requesty`, `azure`, `ollama`, `github-copilot`, or any [named provider]({{ '/providers/custom/' | relative_url }}). |
-| `model` | string | ✓ | Model name (e.g., `gpt-4o`, `claude-sonnet-4-0`, `gemini-2.5-flash`) |
+| `model` | string | ✓ | Model name (e.g., `gpt-4o`, `claude-sonnet-4-5`, `gemini-2.5-flash`) |
| `temperature` | float | ✗ | Sampling randomness. Range is provider-dependent — typically `0.0–2.0` (Anthropic caps at `1.0`). `0.0` is deterministic. |
| `max_tokens` | int | ✗ | Maximum response length in tokens |
| `top_p` | float | ✗ | Nucleus sampling threshold (`0.0–1.0`) |
@@ -232,7 +232,7 @@ models:
# Anthropic
claude:
provider: anthropic
- model: claude-sonnet-4-0
+ model: claude-sonnet-4-5
max_tokens: 64000
# Google Gemini
diff --git a/docs/configuration/structured-output/index.md b/docs/configuration/structured-output/index.md
index f3208bd5d..24fc500c8 100644
--- a/docs/configuration/structured-output/index.md
+++ b/docs/configuration/structured-output/index.md
@@ -192,7 +192,7 @@ agents:
```yaml
agents:
classifier:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Classify support tickets
instruction: |
Classify the support ticket into the appropriate category
diff --git a/docs/configuration/tools/index.md b/docs/configuration/tools/index.md
index 261ce26b2..c1990bfea 100644
--- a/docs/configuration/tools/index.md
+++ b/docs/configuration/tools/index.md
@@ -395,7 +395,7 @@ toolsets:
```yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Full-featured developer assistant
instruction: You are an expert developer.
toolsets:
diff --git a/docs/features/api-server/index.md b/docs/features/api-server/index.md
index be3e54fdb..aead43afb 100644
--- a/docs/features/api-server/index.md
+++ b/docs/features/api-server/index.md
@@ -190,6 +190,6 @@ Toggle auto-approve with `POST /api/sessions/:id/tools/toggle` for automated wor
diff --git a/docs/features/chat-server/index.md b/docs/features/chat-server/index.md
new file mode 100644
index 000000000..6cb1fa06d
--- /dev/null
+++ b/docs/features/chat-server/index.md
@@ -0,0 +1,230 @@
+---
+title: "Chat Server"
+description: "Expose your agents through an OpenAI-compatible Chat Completions API so any tool that already speaks OpenAI can drive a docker-agent agent."
+permalink: /features/chat-server/
+---
+
+# Chat Server
+
+_Expose your agents through an OpenAI-compatible Chat Completions API so any tool that already speaks OpenAI can drive a docker-agent agent._
+
+## Overview
+
+The `docker agent serve chat` command starts an HTTP server that exposes one or
+more agents through an **OpenAI-compatible Chat Completions API** at
+`/v1/chat/completions` and `/v1/models`. Any client that already speaks the
+OpenAI protocol — for example
+[Open WebUI](https://github.com/open-webui/open-webui), `curl`, the OpenAI
+Python SDK, or LangChain — can drive a docker-agent agent without any custom
+integration.
+
+```bash
+# Single agent — exposed as the model `root`
+$ docker agent serve chat agent.yaml
+
+# Multi-agent config — every agent in the team becomes a model
+$ docker agent serve chat ./team.yaml
+
+# Pick a specific agent from a multi-agent config
+$ docker agent serve chat ./team.yaml --agent reviewer
+
+# Run an agent straight from the registry
+$ docker agent serve chat agentcatalog/pirate --listen 127.0.0.1:9090
+
+# Require a Bearer token, sourced from an env var
+$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
+```
+
+<details>
+<summary>💡 When to use chat server vs. API server</summary>
+
+Use the chat server when you want to plug docker-agent into existing
+OpenAI-compatible tooling (chat UIs, IDE integrations, OpenAI SDK clients).
+Use the API server when you want full control over sessions, agent
+execution, tool-call confirmations, and streamed runtime events.
+
+</details>
+
+## Endpoints
+
+The OpenAI-compatible endpoints live under the `/v1` prefix to match the
+OpenAI API surface. The OpenAPI specification is served at the top level so it
+can be discovered without authentication.
+
+| Method | Path | Description |
+| ------ | ---------------------- | ---------------------------------------------------------------------- |
+| `GET` | `/v1/models` | List the agents that this server exposes as models |
+| `POST` | `/v1/chat/completions` | Send messages and receive a completion (regular or streaming) |
+| `GET` | `/openapi.json` | OpenAPI specification for the chat server |
+
+The model identifier in `POST /v1/chat/completions` is the **agent name**.
+For a single-agent config that's typically `root`; for a multi-agent config,
+each named agent becomes its own selectable model.
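+
+For example, a config like the following (agent names are illustrative)
+would expose two models, `root` and `reviewer`, in `GET /v1/models`:
+
+```yaml
+agents:
+  root:
+    model: openai/gpt-4o
+    description: Main coordinator
+    sub_agents: [reviewer]
+  reviewer:
+    model: anthropic/claude-sonnet-4-5
+    description: Code review specialist
+```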
+
+## Quick Start
+
+```bash
+# 1. Start the server
+$ docker agent serve chat agent.yaml
+Listening on 127.0.0.1:8083
+OpenAI-compatible chat completions endpoint: http://127.0.0.1:8083/v1/chat/completions
+
+# 2. List exposed agents (models)
+$ curl http://127.0.0.1:8083/v1/models
+{"object":"list","data":[{"id":"root","object":"model","owned_by":"docker-agent"}]}
+
+# 3. Send a chat request
+$ curl http://127.0.0.1:8083/v1/chat/completions \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "model": "root",
+ "messages": [{"role": "user", "content": "Hello!"}]
+ }'
+```
+
+### Streaming
+
+Set `"stream": true` in the request body to receive a Server-Sent Events
+(SSE) stream of OpenAI-format `chat.completion.chunk` deltas:
+
+```bash
+$ curl -N http://127.0.0.1:8083/v1/chat/completions \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "model": "root",
+ "stream": true,
+ "messages": [{"role": "user", "content": "Stream a poem"}]
+ }'
+```
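+
+Each event is a `data:` line carrying an OpenAI-format
+`chat.completion.chunk`, and the stream ends with `data: [DONE]`.
+Abbreviated, the output looks roughly like:
+
+```text
+data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Once"}}]}
+
+data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" upon"}}]}
+
+data: [DONE]
+```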
+
+### Drive it from the OpenAI Python SDK
+
+Because the wire format is OpenAI-compatible, point any OpenAI client at the
+chat server's `base_url` and use the agent name as the model:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+ base_url="http://127.0.0.1:8083/v1",
+ api_key="not-needed-when-no-api-key-flag", # required by the SDK, ignored if no auth
+)
+
+resp = client.chat.completions.create(
+ model="root",
+ messages=[{"role": "user", "content": "Hello!"}],
+)
+print(resp.choices[0].message.content)
+```
+
+## Server-side Conversation Caching
+
+By default the server is **stateless**: every request must contain the full
+message history, exactly like OpenAI's API. Enable server-side caching by
+setting `--conversations-max` to a positive value, then send a stable
+`X-Conversation-Id` header on each request:
+
+```bash
+$ docker agent serve chat agent.yaml --conversations-max 100 --conversation-ttl 30m
+```
+
+```bash
+$ curl http://127.0.0.1:8083/v1/chat/completions \
+ -H 'Content-Type: application/json' \
+ -H 'X-Conversation-Id: my-thread-1' \
+ -d '{
+ "model": "root",
+ "messages": [{"role": "user", "content": "Remember my name is Alice"}]
+ }'
+
+$ curl http://127.0.0.1:8083/v1/chat/completions \
+ -H 'Content-Type: application/json' \
+ -H 'X-Conversation-Id: my-thread-1' \
+ -d '{
+ "model": "root",
+ "messages": [{"role": "user", "content": "What is my name?"}]
+ }'
+```
+
+Cached conversations are evicted after `--conversation-ttl` of inactivity, or
+when the cache hits `--conversations-max` items (oldest entries are evicted
+first).
+
+## Authentication
+
+The chat server has **no authentication by default**. To require a Bearer
+token, pass `--api-key` (literal value) or `--api-key-env` (name of an
+environment variable that holds the value):
+
+```bash
+$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
+```
+
+Clients must then send an `Authorization: Bearer <token>` header on every
+request to `/v1/*`. Both `/v1/models` and `/v1/chat/completions` are
+protected once a key is set.
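+
+For example, with the token stored in `CHAT_BEARER_TOKEN`:
+
+```bash
+$ curl http://127.0.0.1:8083/v1/models \
+  -H "Authorization: Bearer $CHAT_BEARER_TOKEN"
+```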
+
+<details>
+<summary>⚠️ Public exposure</summary>
+
+The default listen address is `127.0.0.1:8083`. If you bind to a
+non-loopback address, always set `--api-key` or `--api-key-env` — there is
+no other authentication layer.
+
+</details>
+
+## CORS
+
+CORS is **disabled by default**. To allow a browser-based client to call the
+server, set `--cors-origin` to the exact origin (scheme + host + port) that
+should be allowed:
+
+```bash
+$ docker agent serve chat agent.yaml --cors-origin https://my-ui.example.com
+```
+
+## CLI Flags
+
+```bash
+docker agent serve chat <config>|<reference> [flags]
+```
+
+| Flag | Default | Description |
+| ----------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------- |
+| `-a, --agent <name>`           | (all agents)       | Name of the agent to expose. If omitted, every agent in the config is exposed as a separate model. |
+| `-l, --listen <addr>`          | `127.0.0.1:8083`   | Address to listen on. |
+| `--cors-origin <origin>`       | (none)             | Allowed CORS origin (e.g. `https://example.com`). Empty disables CORS. |
+| `--api-key <key>`              | (none)             | Required Bearer token clients must present (`Authorization: Bearer <key>`). Empty disables auth. |
+| `--api-key-env <name>`         | (none)             | Read the API key from this environment variable instead of the command line. |
+| `--max-request-size <bytes>`   | `1048576` (1 MiB)  | Maximum request body size. |
+| `--request-timeout <duration>` | `5m`               | Per-request timeout (covers model + tool calls + streaming). |
+| `--conversations-max <n>`      | `0`                | Cache up to N conversations server-side, keyed by `X-Conversation-Id`. `0` disables — clients must resend history. |
+| `--conversation-ttl <duration>`| `30m`              | Idle TTL after which a cached conversation is evicted. |
+| `--max-idle-runtimes <n>`      | `4`                | Maximum number of idle runtimes pooled per agent. `0` disables pooling. |
+
+All [runtime configuration flags]({{ '/features/cli/#runtime-configuration-flags' | relative_url }})
+(`--working-dir`, `--env-from-file`, `--models-gateway`, `--hook-*`, …) are
+also accepted.
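+
+For example, to serve an agent against a specific project directory with
+extra environment variables loaded from a file:
+
+```bash
+$ docker agent serve chat agent.yaml \
+    --working-dir ./my-project \
+    --env-from-file ./secrets.env
+```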
+
+## Open WebUI Integration
+
+Open WebUI can talk to any OpenAI-compatible endpoint. To plug docker-agent
+in:
+
+1. Start the chat server, optionally with auth:
+
+ ```bash
+ $ docker agent serve chat agent.yaml \
+ --listen 127.0.0.1:8083 \
+ --cors-origin http://localhost:3000 \
+ --api-key-env OPEN_WEBUI_TOKEN
+ ```
+
+2. In Open WebUI, add an OpenAI-compatible connection:
+
+ - **API Base URL:** `http://127.0.0.1:8083/v1`
+ - **API Key:** the value of `OPEN_WEBUI_TOKEN`
+
+3. Each agent in your config appears as a selectable model.
+
+<details>
+<summary>ℹ️ See also</summary>
+
+For the docker-agent–native HTTP API (sessions, tool-call confirmation,
+runtime events), see the [API Server]({{ '/features/api-server/' | relative_url }}).
+For full CLI flag documentation, see the
+[CLI Reference]({{ '/features/cli/' | relative_url }}).
+
+</details>
diff --git a/docs/features/cli/index.md b/docs/features/cli/index.md
index 3417edd29..d8b451114 100644
--- a/docs/features/cli/index.md
+++ b/docs/features/cli/index.md
@@ -65,8 +65,8 @@ $ docker agent run [config] [message...] [flags]
$ docker agent run agent.yaml
$ docker agent run agent.yaml "Fix the bug in auth.go"
$ docker agent run agent.yaml -a developer --yolo
-$ docker agent run agent.yaml --model anthropic/claude-sonnet-4-0
-$ docker agent run agent.yaml --model "dev=openai/gpt-4o,reviewer=anthropic/claude-sonnet-4-0"
+$ docker agent run agent.yaml --model anthropic/claude-sonnet-4-5
+$ docker agent run agent.yaml --model "dev=openai/gpt-4o,reviewer=anthropic/claude-sonnet-4-5"
$ docker agent run agent.yaml --session -1 # resume last session
$ docker agent run agent.yaml --prompt-file ./context.md # include file as context
@@ -265,6 +265,8 @@ $ curl http://127.0.0.1:8083/v1/chat/completions \
-d '{"model": "root", "messages": [{"role": "user", "content": "hello"}]}'
```
+See [Chat Server]({{ '/features/chat-server/' | relative_url }}) for the full feature reference.
+
### `docker agent share push` / `docker agent share pull`
Share agents via OCI registries.
@@ -344,7 +346,7 @@ $ docker agent alias add other ociReference
# Add an alias with runtime options
$ docker agent alias add yolo-coder agentcatalog/coder --yolo
$ docker agent alias add fast-coder agentcatalog/coder --model openai/gpt-4o-mini
-$ docker agent alias add turbo agentcatalog/coder --yolo --model anthropic/claude-sonnet-4-0
+$ docker agent alias add turbo agentcatalog/coder --yolo --model anthropic/claude-sonnet-4-5
# Use an alias
$ docker agent run pirate
@@ -364,7 +366,7 @@ $ docker agent alias ls
Registered aliases (3):
fast-coder → agentcatalog/coder [model=openai/gpt-4o-mini]
- turbo → agentcatalog/coder [yolo, model=anthropic/claude-sonnet-4-0]
+ turbo → agentcatalog/coder [yolo, model=anthropic/claude-sonnet-4-5]
yolo-coder → agentcatalog/coder [yolo]
Run an alias with: docker agent run <alias>
diff --git a/docs/features/mcp-mode/index.md b/docs/features/mcp-mode/index.md
index 9440df9c3..5b62742cf 100644
--- a/docs/features/mcp-mode/index.md
+++ b/docs/features/mcp-mode/index.md
@@ -108,14 +108,14 @@ When you expose a multi-agent configuration via MCP, each agent becomes a separa
```yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Main coordinator
sub_agents: [designer, engineer]
designer:
model: openai/gpt-5-mini
description: UI/UX design specialist
engineer:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Software engineer
```
diff --git a/docs/features/remote-mcp/index.md b/docs/features/remote-mcp/index.md
index 82eb8be24..9177c4a21 100644
--- a/docs/features/remote-mcp/index.md
+++ b/docs/features/remote-mcp/index.md
@@ -188,7 +188,7 @@ Combine multiple remote MCP servers in a single agent:
```yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
instruction: |
You help manage projects and deployments.
toolsets:
diff --git a/docs/guides/go-sdk/index.md b/docs/guides/go-sdk/index.md
index 72afb3e12..108aad4fe 100644
--- a/docs/guides/go-sdk/index.md
+++ b/docs/guides/go-sdk/index.md
@@ -294,7 +294,7 @@ openaiClient, _ := openai.NewClient(ctx, &latest.ModelConfig{
// Anthropic
anthropicClient, _ := anthropic.NewClient(ctx, &latest.ModelConfig{
Provider: "anthropic",
- Model: "claude-sonnet-4-0",
+ Model: "claude-sonnet-4-5",
}, env)
// Google Gemini
diff --git a/docs/guides/tips/index.md b/docs/guides/tips/index.md
index 89681a744..0bc50d379 100644
--- a/docs/guides/tips/index.md
+++ b/docs/guides/tips/index.md
@@ -130,7 +130,7 @@ Always set `max_iterations` for agents with powerful tools to prevent infinite l
```yaml
agents:
developer:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Development assistant
instruction: You are a developer.
max_iterations: 30 # Reasonable limit for development tasks
@@ -150,7 +150,7 @@ Configure fallback models for resilience against provider outages or rate limits
```yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Reliable assistant
instruction: You are a helpful assistant.
fallback:
@@ -210,7 +210,7 @@ For defense in depth, use both permissions and [sandbox mode]({{ '/configuration
```yaml
agents:
secure_dev:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Secure development assistant
instruction: You are a secure coding assistant.
toolsets:
@@ -300,7 +300,7 @@ The root agent uses descriptions to decide which sub-agent to delegate to:
```yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Technical lead
instruction: Delegate to specialists based on the task.
sub_agents: [frontend, backend, devops]
@@ -371,7 +371,7 @@ Set your preferred default model in `~/.config/cagent/config.yaml`:
```yaml
settings:
- default_model: anthropic/claude-sonnet-4-0
+ default_model: anthropic/claude-sonnet-4-5
```
This model is used when you run `docker agent run` without a config file.
@@ -411,7 +411,7 @@ With a simple reviewer agent:
# reviewer.yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: PR reviewer
instruction: |
Review pull requests for code quality, bugs, and security issues.
diff --git a/docs/index.md b/docs/index.md
index 2d96855dd..6c47f39eb 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -60,7 +60,7 @@ Create a file called `agent.yaml`:
```yaml
agents:
root:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: A helpful coding assistant
instruction: |
You are an expert developer. Help users write clean,
diff --git a/docs/providers/bedrock/index.md b/docs/providers/bedrock/index.md
index ba7e7c8ca..a2703b469 100644
--- a/docs/providers/bedrock/index.md
+++ b/docs/providers/bedrock/index.md
@@ -64,7 +64,7 @@ models:
models:
bedrock:
provider: amazon-bedrock
- model: anthropic.claude-3-sonnet-20240229-v1:0
+ model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
provider_opts:
role_arn: "arn:aws:iam::123456789012:role/BedrockAccessRole"
external_id: "my-external-id"
diff --git a/docs/providers/custom/index.md b/docs/providers/custom/index.md
index 44350e3ca..14829e96a 100644
--- a/docs/providers/custom/index.md
+++ b/docs/providers/custom/index.md
@@ -181,7 +181,7 @@ providers:
agents:
root:
- model: router/anthropic/claude-sonnet-4-0
+ model: router/anthropic/claude-sonnet-4-5
```
### Azure OpenAI
diff --git a/docs/providers/openai/index.md b/docs/providers/openai/index.md
index c4c655885..965893e1a 100644
--- a/docs/providers/openai/index.md
+++ b/docs/providers/openai/index.md
@@ -45,7 +45,7 @@ models:
| `gpt-4o` | Multimodal, balanced performance |
| `gpt-4o-mini` | Cheapest, fast for simple tasks |
-Find more model names at [modelname.ai](https://modelname.ai/).
+Find more model names at [modelnames.ai](https://modelnames.ai/) or in the [official OpenAI docs](https://platform.openai.com/docs/models).
## Thinking Budget
diff --git a/docs/tools/lsp/index.md b/docs/tools/lsp/index.md
index 88ac3c6b6..61882e18e 100644
--- a/docs/tools/lsp/index.md
+++ b/docs/tools/lsp/index.md
@@ -45,7 +45,7 @@ The LSP toolset provides these tools to the agent:
```yaml
agents:
developer:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Code developer with LSP support
instruction: You are a software developer.
toolsets:
@@ -136,7 +136,7 @@ You can configure multiple LSP servers for different file types:
```yaml
agents:
polyglot:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Multi-language developer
instruction: You are a full-stack developer.
toolsets:
diff --git a/docs/tools/transfer-task/index.md b/docs/tools/transfer-task/index.md
index c48eda6f4..806d09cdd 100644
--- a/docs/tools/transfer-task/index.md
+++ b/docs/tools/transfer-task/index.md
@@ -27,7 +27,7 @@ agents:
sub_agents: [developer, researcher]
developer:
- model: anthropic/claude-sonnet-4-0
+ model: anthropic/claude-sonnet-4-5
description: Expert software developer
instruction: Write clean, production-ready code.
toolsets: