intelowlproject · berardifra · Jun 20, 2026
diff --git a/docs/IntelOwl/advanced_configuration.md b/docs/IntelOwl/advanced_configuration.md
@@ -280,6 +280,28 @@ FLOWER_PWD
 
 or change the `.htpasswd` file that is created in the `docker` directory in the `intelowl_flower` container.
 
+## Chatbot
+
+The optional LLM chatbot (enabled with the `--ollama` flag, see
+[installation](./installation.md#chatbot-ollama)) is configured through the following variables, set
+like every other secret in `docker/env_file_app`. All have sensible defaults; override them only if
+needed.
+
+| Variable | Default | Purpose |
+|---|---|---|
+| `OLLAMA_BASE_URL` | `http://ollama:11434` | URL of the Ollama runtime. |
+| `OLLAMA_MODEL` | `qwen2.5:3b` | Model the agent uses; must support Ollama tool calling (see [Fine-tuning & Prompting](./chatbot_tuning.md)). |
+| `CHATBOT_MESSAGE_RETENTION_DAYS` | `90` | Conversations idle for this many days are pruned by a daily task. |
+| `CHATBOT_RATE_LIMIT` | `5` | Max messages a user may send per window (REST and WebSocket share the bucket). |
+| `CHATBOT_RATE_LIMIT_WINDOW` | `60` | Rate-limit window, in seconds. |
+| `CHATBOT_PENDING_ACTION_TTL` | `600` | Lifetime, in seconds, of a previewed-analysis confirmation before it expires. |
+
+**CPU / GPU.** The chatbot runs on CPU by default and the `qwen2.5:3b` default is sized for that. GPU
+passthrough is not yet supported (tracked in
+[issue #3717](https://github.com/intelowlproject/IntelOwl/issues/3717)). For changing the model, the
+context window, or packaging a custom model, see the
+[Fine-tuning & Prompting](./chatbot_tuning.md) guide.
+
 ## Manual Usage
 
 The `./start` script essentially acts as a wrapper over Docker Compose, performing additional checks.
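
To illustrate how the `CHATBOT_RATE_LIMIT` and `CHATBOT_RATE_LIMIT_WINDOW` rows in the Chatbot table of `advanced_configuration.md` interact, here is a minimal sketch of a per-user fixed-window counter. It is an assumption made for illustration: the diff does not show which algorithm IntelOwl uses, and `FixedWindowLimiter` is a hypothetical name. Only the two default values come from the table.

```python
import time
from collections import defaultdict

# Defaults copied from the Chatbot table.
CHATBOT_RATE_LIMIT = 5          # max messages per window
CHATBOT_RATE_LIMIT_WINDOW = 60  # window length, in seconds


class FixedWindowLimiter:
    """Hypothetical per-user limiter; not IntelOwl's actual implementation."""

    def __init__(self, limit=CHATBOT_RATE_LIMIT, window=CHATBOT_RATE_LIMIT_WINDOW,
                 clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # user id -> (window index, messages counted in that window)
        self._buckets = defaultdict(lambda: (0, 0))

    def allow(self, user_id):
        """Return True and count the message if the user is under the limit."""
        current = int(self.clock() // self.window)
        index, count = self._buckets[user_id]
        if index != current:
            index, count = current, 0  # a new window started: reset the counter
        if count >= self.limit:
            self._buckets[user_id] = (index, count)
            return False
        self._buckets[user_id] = (index, count + 1)
        return True
```

Under this reading, "REST and WebSocket share the bucket" would mean both entry points call `allow()` on the same limiter with the same user id, so a message sent through either transport counts against the same budget.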

diff --git a/docs/IntelOwl/chatbot.md b/docs/IntelOwl/chatbot.md
@@ -0,0 +1,101 @@
+# Chatbot
+
+The IntelOwl chatbot is a locally hosted LLM assistant that answers natural-language questions over
+your threat-intelligence data. It runs entirely on your own deployment (Ollama) and **never sends
+data to external APIs** — your jobs, observables and reports never leave the instance.
+
+The chatbot is an optional component. If your deployment was not started with the Ollama service,
+the chat button does not connect; see the
+[deployment guide](./installation.md#chatbot-ollama) to enable it.
+
+## What it can do
+
+Ask in plain language and the assistant answers by calling read-only IntelOwl tools on your behalf:
+
+- search your jobs and show a job's details;
+- summarize a job or an investigation;
+- show an investigation's job tree;
+- show a job's aggregated data model;
+- list the analyzers available on the instance;
+- recommend a playbook for an observable.
+
+Everything it returns is **scoped to what you can already see** in the UI (your own data, plus what
+your organization and TLP visibility allow). The chatbot cannot reveal anything you could not reach
+through the normal interface.
+
+## Opening the chat
+
+Click the chat-bubble icon in the top navigation bar to open the chat drawer. The drawer overlays
+the current page and the rest of the app stays usable while it is open; it also stays available as
+you navigate between pages.
+
+![chat drawer](./static/chatbot/drawer.png)
+
+A small status badge shows whether the assistant is connected and ready.
+
+## Asking a question
+
+Type a question and press send. The answer streams in token by token. When a question needs data,
+the assistant calls one of the tools above and shows the formatted result (tables, lists and links
+are rendered).
+
+Example questions:
+
+- "Show me my most recent jobs."
+- "Summarize job #1234."
+- "Which analyzers can run on a domain?"
+- "Which playbook should I use for an IP address?"
+
+![a chat turn](./static/chatbot/turn.png)
+
+## Quick actions
+
+Below the input you'll find one-click quick actions. They are **context-aware**: on a job page they
+offer job actions ("Summarize this job", "Which plugins ran?", "Show job details", "Evaluate
+results"), on an investigation page they offer investigation actions, and elsewhere they offer
+general ones ("Show my recent jobs", "List my investigations"). Clicking a chip sends the
+corresponding question for you.
+
+![quick actions](./static/chatbot/quick_actions.png)
+
+## Working with the page you're on
+
+The chatbot knows which page you are viewing. If you are on a job or investigation page, you can
+refer to "this job" / "this investigation" and the assistant resolves it from the current page —
+no need to copy the ID.
+
+## Conversations
+
+Each chat is saved as a conversation so you can come back to it. From the drawer you can:
+
+- start a **new chat**;
+- open the **conversation list** to switch between past conversations;
+- review a conversation's **history** (older messages load automatically);
+- **delete** a conversation you no longer need.
+
+Old conversations are pruned automatically after a retention period configured by the operator (see
+the [advanced configuration](./advanced_configuration.md#chatbot)).
+
+## Launching an analysis safely
+
+The chatbot can suggest running a new analysis on an observable, but it **cannot start one by
+itself**. When you ask it to analyze something, it shows a **preview** of what would run and a
+Confirm / Cancel card. The analysis starts only when **you** click **Confirm** (Cancel discards it).
+The same TLP and visibility rules as the normal analysis flow apply.
+
+![confirm an analysis](./static/chatbot/confirm.png)
+
+This is a deliberate safety guardrail: even if the model misbehaves, it has no path to launch an
+analysis — and therefore cannot send an observable to external analyzers — without an explicit click
+from you.
+
+## Limits and availability
+
+- **Rate limit.** To protect the instance there is a per-user limit on how many messages you can
+  send per window (5 per 60 seconds by default); if you hit it, wait a moment and try again.
+- **Model availability.** If the chatbot worker or the Ollama service is not running, the drawer
+  shows an "unavailable" state instead of the connected badge and messages are not answered; ask
+  your operator to enable or restart the Ollama service.
+- **Long conversations.** A conversation is kept in full, but a very long one can exceed the model's
+  context window (`num_ctx`); when that happens Ollama drops the oldest tokens, so the assistant may
+  lose the earliest messages. There is no automatic summarization of the conversation.
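
The preview-then-confirm guardrail described under "Launching an analysis safely" in `chatbot.md` can be pictured with the sketch below. It is hypothetical throughout: `PendingActions` and its methods are invented for illustration, and the diff does not show how IntelOwl stores previews. Only the 600-second default of `CHATBOT_PENDING_ACTION_TTL` is taken from the configuration table.

```python
import time
import uuid

CHATBOT_PENDING_ACTION_TTL = 600  # seconds, default from the configuration table


class PendingActions:
    """Hypothetical in-memory store; IntelOwl's real storage is not shown here."""

    def __init__(self, ttl=CHATBOT_PENDING_ACTION_TTL, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._pending = {}  # token -> (user id, preview payload, expiry time)

    def preview(self, user_id, payload):
        """Record what would run and return a token for the Confirm / Cancel card."""
        token = uuid.uuid4().hex
        self._pending[token] = (user_id, payload, self.clock() + self.ttl)
        return token

    def confirm(self, user_id, token):
        """Return the payload to launch, or None if unknown, expired or not the owner."""
        entry = self._pending.get(token)
        if entry is None:
            return None
        owner, payload, expires_at = entry
        if owner != user_id:
            return None  # leave another user's pending action untouched
        del self._pending[token]  # a token can be used at most once
        if self.clock() >= expires_at:
            return None
        return payload

    def cancel(self, user_id, token):
        """Discard the preview without launching anything."""
        entry = self._pending.get(token)
        if entry is not None and entry[0] == user_id:
            del self._pending[token]
```

In this sketch the model-facing tool would only ever call `preview()`; `confirm()` is reachable solely from the user's click, which is what keeps the model from launching an analysis on its own.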
diff --git a/docs/IntelOwl/chatbot_tuning.md b/docs/IntelOwl/chatbot_tuning.md
@@ -0,0 +1,126 @@
+# Chatbot: Fine-tuning & Prompting
+
+This guide covers choosing and customizing the language model behind the chatbot, tuning the prompt,
+and packaging a custom model. It is aimed at operators who want to change the default model or
+improve answer quality. For the environment variables referenced here see the
+[chatbot configuration](./advanced_configuration.md#chatbot).
+
+## How the model is wired
+
+The chatbot uses [Ollama](https://ollama.com/) as a local LLM runtime and LangChain's native
+**tool-calling** agent. When it builds the agent, the backend creates a `ChatOllama` client
+(`api_app/chatbot_manager/agent/agent.py`) pointed at `OLLAMA_BASE_URL` (default
+`http://ollama:11434`) with `temperature=0` and a fixed context window, binds the chatbot tools to
+it through the tool-calling API, and runs a tool-call → observation loop until the model replies with
+plain text.
+
+Two consequences matter for tuning:
+
+- The model **must support Ollama tool calling**. A model that cannot emit tool calls will not work.
+- The backend sets the prompt and the key inference parameters itself (see below), so they are the
+  levers you tune — not a model's baked-in defaults.
+
+## Choosing a model
+
+The default is **`qwen2.5:3b`**. It is chosen on purpose: it is the smallest model that reliably
+picks the right tool and answers from the tool output with **usable latency on a CPU-only
+deployment** (for comparison, a 7B model such as `mistral` was markedly slower on CPU, taking
+minutes per agent round and often hitting the turn timeout). On stronger hardware you can switch
+to any larger tool-capable Ollama model for better answer quality.
+
+Requirements for a replacement model:
+
+- it supports tool calling in Ollama;
+- the Ollama server is recent enough to **stream while tools are bound** — IntelOwl pins the Ollama
+  image to `ollama/ollama:0.30.7` for this reason (versions older than 0.8.0 cannot stream with
+  tools); keep this in mind if you run Ollama yourself.
+
+## Pointing to a different model
+
+Set the `OLLAMA_MODEL` secret to the model tag you want (it is pulled automatically on first start).
+Keep the **three** places that reference the default in sync if you change the baked-in default
+rather than just overriding the secret:
+
+- `intel_owl/settings/chatbot.py` — `OLLAMA_MODEL` default;
+- `docker/env_file_app_template` — `OLLAMA_MODEL`;
+- `docker/entrypoints/ollama.sh` — `DEFAULT_MODEL` (the entrypoint that pulls the model).
+
+For a normal deployment you only set the `OLLAMA_MODEL` secret; the entrypoint pulls it on startup.
+
+## The context window (`num_ctx`)
+
+The backend requests an **8192-token** context window (`_NUM_CTX` in `agent.py`). This is
+deliberate: Ollama's default of 2048 tokens silently truncates the prompt (the system prompt plus
+the tool schemas already total roughly 2.2k tokens), which drops tool definitions and wrecks tool
+selection. 8192 fits the prompt, the conversation history and the tool observations comfortably and
+keeps the prompt prefix stable across iterations (so follow-up rounds hit Ollama's KV cache).
+
+If you move to a larger model with a bigger context window and want longer conversations, raise
+`_NUM_CTX`; the cost is more memory and slower evaluation.
+
+## The system prompt
+
+The assistant's instructions live in plain text at
+`api_app/chatbot_manager/agent/system_prompt.txt`. This — not a model's built-in system message — is
+what shapes the assistant's behavior, because the backend sends it as the agent's system prompt on
+every turn. The file is organized in sections:
+
+- `[Role]` — who the assistant is and its answer style (concise, data-driven, cite the tools used);
+- `[Tools — when to use each]` — one line per tool telling the model when to call it;
+- `[Rules]` — hard constraints (only the current user's data; call the right tool instead of
+  guessing; `analyze_observable` only previews and never claims it launched anything);
+- `[Response style]` — formatting expectations.
+
+To tune behavior, edit this file. Practical prompting tips for small local models:
+
+- Keep tool descriptions short and action-oriented ("Use for …"); they compete for context space.
+- State hard guarantees in `[Rules]` (data scoping, the preview-only analysis guardrail) — small
+  models follow short imperative rules better than long prose.
+- When you add a new tool, add a matching one-line entry under `[Tools — when to use each]` so the
+  model knows when to reach for it (see [adding a chatbot tool](./contribute.md#how-to-add-a-chatbot-tool)).
+
+## Building a custom model with a Modelfile
+
+Use an Ollama [Modelfile](https://docs.ollama.com/modelfile) to **package the weights** you want to
+run — for example a specific quantization, or a fine-tuned model imported from a local GGUF file:
+
+```dockerfile
+# Modelfile
+FROM qwen2.5:3b-instruct-q4_K_M
+# or import your own weights:
+# FROM ./my-finetuned-model.gguf
+```
+
+Build and register it, then point the chatbot at it:
+
+```bash
+ollama create intelowl-llm -f Modelfile
+# then set the secret in docker/env_file_app:
+OLLAMA_MODEL=intelowl-llm
+```
+
+Important: the backend sets `num_ctx`, `temperature` and the system prompt explicitly on every call,
+so a Modelfile's `SYSTEM` and its `PARAMETER num_ctx` / `PARAMETER temperature` are overridden for the
+chatbot (other `PARAMETER` directives the backend does not set still apply). Use the Modelfile to
+choose *which weights* run; use `system_prompt.txt` (and `_NUM_CTX` in
+`agent.py`) to change *how the assistant behaves*.
+
+## Validating a model before rollout
+
+After switching or building a model, confirm it actually tool-calls before relying on it:
+
+1. Bring up the stack with the Ollama service and wait for the model to finish pulling.
+2. Open the chat and send a question that must use a tool, e.g. **"Show my recent jobs"** or
+   **"Summarize job #&lt;id&gt;"**.
+3. Verify the assistant calls a tool (a tool/status indicator appears) and answers from real data,
+   rather than replying generically. The same check works through the REST endpoint
+   `POST /api/chatbot/sessions/message`.
+
+If the model answers without ever calling a tool, it is not tool-calling reliably — pick a different
+model or a less aggressively quantized variant.
+
+## Out of scope
+
+Actual model training (LoRA/PEFT fine-tuning, dataset preparation, GGUF conversion of trained
+adapters) is outside the scope of this guide. This page covers selecting, configuring, prompting and
+packaging existing tool-capable models.
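
The tool-call to observation loop described under "How the model is wired" in `chatbot_tuning.md` can be sketched without any LLM dependency. This is a simplified stand-in and not LangChain's or IntelOwl's code: `ModelReply`, `run_agent`, `fake_model` and the `search_jobs` stub are hypothetical names, and the scripted model only imitates a tool-capable model.

```python
from dataclasses import dataclass, field


@dataclass
class ModelReply:
    """Stand-in for a chat model response: plain text or requested tool calls."""
    text: str = ""
    tool_calls: list = field(default_factory=list)  # items: (tool name, kwargs dict)


def run_agent(model, tools, question, max_rounds=5):
    """Loop tool call -> observation until the model answers in plain text."""
    messages = [("user", question)]
    for _ in range(max_rounds):
        reply = model(messages)
        if not reply.tool_calls:
            return reply.text  # plain text ends the loop
        for name, kwargs in reply.tool_calls:
            observation = tools[name](**kwargs)  # tools return a JSON string
            messages.append(("tool", observation))
    return "Stopped: too many tool rounds."


def fake_model(messages):
    """Scripted model: asks for a tool once, then answers from the observation."""
    observations = [content for role, content in messages if role == "tool"]
    if not observations:
        return ModelReply(tool_calls=[("search_jobs", {"limit": 2})])
    return ModelReply(text=f"Tool returned: {observations[-1]}")


def search_jobs(limit=10):
    # Hypothetical tool body; a real tool would query IntelOwl scoped to the user.
    return '{"errors": [], "results": []}'
```

A model that cannot emit tool calls corresponds to a `model` callable that always returns plain text: the loop exits on the first round with a generic answer, which is the symptom the validation steps look for.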
diff --git a/docs/IntelOwl/contribute.md b/docs/IntelOwl/contribute.md
@@ -576,6 +576,101 @@ We are setting the field `evaluation` depending on some logic that we constructe
 If the IP address has been reported by some AbuseIPDB users but, at the same time, is whitelisted by AbuseIPDB, then we set its `evaluation` to `trusted`. On the contrary, if it's not whitelisted, we set it as `malicious`.
 
 
+## How to add a chatbot tool
+
+The optional [chatbot](./chatbot.md) is a LangChain tool-calling agent. Its capabilities are plain
+Python "tools": each wraps an IntelOwl query and is exposed to the model. Adding a capability means
+adding a tool. The agent lives in `api_app/chatbot_manager/agent/`; the tools live in
+`api_app/chatbot_manager/agent/tools/` — **one file per tool**.
+
+### 1. Write the tool
+
+Create `api_app/chatbot_manager/agent/tools/<your_tool>.py`. A tool is a factory that **closes over
+the requesting `user`** and returns a LangChain `@tool`-decorated function. Closing over the user is
+what enforces multi-tenancy: every queryset is scoped to that user, and the model can never widen it.
+
+```python
+from langchain_core.tools import tool
+
+from api_app.chatbot_manager.agent.tools._common import clamp_limit
+from api_app.chatbot_manager.serializers.my_tool import MyToolResultSerializer
+
+
+def make_my_tool(user):
+    @tool("my_tool")
+    def my_tool(query: str = "", limit: int = 10) -> str:
+        """One-line description the model reads to decide when to call this tool.
+
+        Args:
+            query: what to search for.
+            limit: maximum number of results (default 10, max 50).
+        """
+        from api_app.models import Job  # heavy/circular imports stay function-local
+
+        errors = []
+        limit = clamp_limit(limit, errors)
+        # Scope to the user: visible_for_user matches the REST viewsets / UI.
+        qs = Job.objects.visible_for_user(user).filter(analyzable__name__icontains=query)[:limit]
+        return MyToolResultSerializer({"errors": errors, "results": qs}).to_json()
+
+    return my_tool
+```
+
+Conventions to follow (the maintainers enforce them):
+
+- **Scope every query to `user`** with `visible_for_user(user)` (or the appropriate owner/org
+  filter). Treat all arguments as **untrusted** — they come from the LLM: validate them against the
+  enums in `api_app/choices.py` and clamp limits with `clamp_limit` (`agent/tools/_common.py`).
+- **Return a JSON string** with the same `{"errors": [...], "<payload>": ...}` envelope via a DRF
+  serializer's `.to_json()` — never hand-build a dict. LangChain feeds the returned string back to
+  the model as the tool observation.
+- Use named constants and top-level imports (keep only heavy/circular imports function-local, as the
+  existing tools do).
+
+### 2. Add the result serializer
+
+Add `api_app/chatbot_manager/serializers/<your_tool>.py` producing that envelope (build on the
+shared base in `serializers/base.py`, like the other tools). One serializer module per tool keeps
+parallel PRs from colliding on a shared file.
+
+### 3. Register the tool
+
+Add it to `build_tools()` in `api_app/chatbot_manager/agent/tools/__init__.py`:
+
+```python
+from .my_tool import make_my_tool
+
+
+def build_tools(user) -> list:
+    return [
+        # ... existing tools ...
+        make_my_tool(user),
+    ]
+```
+
+### 4. Tell the model when to use it
+
+Add a one-line entry under `[Tools — when to use each]` in
+`api_app/chatbot_manager/agent/system_prompt.txt`. The agent binds the tools through
+`create_tool_calling_agent`; that line is how the model learns when to reach for yours. See the
+[Fine-tuning & Prompting](./chatbot_tuning.md) guide for the prompt structure.
+
+### 5. Test it
+
+Add a per-tool test under `tests/api_app/chatbot_manager/tools/test_<your_tool>.py`. **Mock Ollama
+and any HTTP** — tests must never hit a real model or network. Cover the scoping (a second user must
+not see the first user's data) and the error/empty branches. For a tool that reads the database, also
+keep its **query count invariant to result size** with a query-count guard (an `assertNumQueries` /
+`CaptureQueriesContext` test that stays constant as the result set grows), so a future un-prefetched
+relation cannot introduce an N+1.
+
+Run the chatbot tests (rebuild the test image first if dependencies changed):
+
+```bash
+./start test build && ./start test up
+docker exec intelowl_uwsgi python manage.py test tests.api_app.chatbot_manager --keepdb
+```
+
 ## How to modify a plugin
 
 If the changes that you have to make should stay local, you can just change the configuration inside the `Django admin` page.
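
The "treat all arguments as untrusted" convention in the chatbot-tool section of `contribute.md` relies on `clamp_limit` from `agent/tools/_common.py`, whose body is not part of this diff. The sketch below shows what such a helper might look like. It is a hypothetical re-implementation named `clamp_limit_sketch`, using the default of 10 and the maximum of 50 from the example docstring. The real helper may behave differently.

```python
DEFAULT_LIMIT = 10  # values taken from the example tool's docstring
MAX_LIMIT = 50


def clamp_limit_sketch(limit, errors, default=DEFAULT_LIMIT, maximum=MAX_LIMIT):
    """Return a safe integer limit, appending a note to `errors` when it was adjusted.

    Hypothetical stand-in for `clamp_limit`; not the actual IntelOwl helper.
    """
    try:
        value = int(limit)
    except (TypeError, ValueError, OverflowError):
        # The LLM may pass strings, None or other junk: fall back to the default.
        errors.append(f"invalid limit {limit!r}; using {default}")
        return default
    if value < 1:
        errors.append(f"limit {value} is below 1; using {default}")
        return default
    if value > maximum:
        errors.append(f"limit {value} exceeds {maximum}; using {maximum}")
        return maximum
    return value
```

Collecting adjustments in `errors` instead of raising lets the tool still return a result while telling the model, through the envelope's `errors` field, that its argument was changed.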

diff --git a/docs/IntelOwl/installation.md b/docs/IntelOwl/installation.md
@@ -269,6 +269,32 @@ docker compose --project-directory docker -f docker/default.yml -f docker/postgr
 ```
 </div>
 
+### Chatbot (Ollama)
+
+IntelOwl ships an optional, locally hosted LLM chatbot (see the [Chatbot](./chatbot.md) user guide).
+It is disabled by default and enabled with the `--ollama` flag, which adds the
+`docker/ollama.override.yml` compose file. That file starts two extra containers:
+
+- **`ollama`** — the local LLM runtime (image `ollama/ollama:0.30.7`), reachable in-cluster at
+  `http://ollama:11434`; no data ever leaves the deployment.
+- **`celery_worker_chatbot`** — a dedicated Celery worker for the chatbot queue, so chatbot tasks
+  stay isolated from the main analyzer/connector workers.
+
+```bash
+./start prod up --ollama
+```
+
+On first start the Ollama entrypoint **pulls the configured model** (`OLLAMA_MODEL`, default
+`qwen2.5:3b`); the first pull downloads a few GB and can take several minutes — the chatbot reports
+itself unavailable until it completes.
+
+**Hardware.** The default `qwen2.5:3b` is chosen to run on **CPU** with usable latency, so no GPU is
+required. Ensure the host has enough free RAM for the model (a few GB for the 3B default; more for
+larger models). **GPU passthrough is not yet supported** out of the box (tracked in
+[issue #3717](https://github.com/intelowlproject/IntelOwl/issues/3717)). For model selection,
+context window and packaging see the [Fine-tuning & Prompting](./chatbot_tuning.md) guide; for the
+chatbot environment variables see the [advanced configuration](./advanced_configuration.md#chatbot).
+
 ### Stop
 
 To stop the application you have to:
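
Because the chatbot reports itself unavailable until the first model pull completes (see "Chatbot (Ollama)" in `installation.md`), an operator may want to check whether the model is already present. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical, and the URL and model are the documented defaults, so the request only works from a container on the same Docker network.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://ollama:11434"  # default from the configuration table
OLLAMA_MODEL = "qwen2.5:3b"              # default model tag


def model_is_pulled(tags_payload, model=OLLAMA_MODEL):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model in names


def fetch_tags(base_url=OLLAMA_BASE_URL, timeout=5):
    """Fetch the list of locally available models from the Ollama HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)
```

Calling `model_is_pulled(fetch_tags())` from inside the stack returns `False` while the pull is still in progress and `True` once the configured tag is listed.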

diff --git a/docs/IntelOwl/static/chatbot/confirm.png b/docs/IntelOwl/static/chatbot/confirm.png
diff --git a/docs/IntelOwl/static/chatbot/drawer.png b/docs/IntelOwl/static/chatbot/drawer.png
diff --git a/docs/IntelOwl/static/chatbot/quick_actions.png b/docs/IntelOwl/static/chatbot/quick_actions.png
diff --git a/docs/IntelOwl/static/chatbot/turn.png b/docs/IntelOwl/static/chatbot/turn.png