Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions docs/IntelOwl/advanced_configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -280,6 +280,28 @@ FLOWER_PWD

or change the `.htpasswd` file that is created in the `docker` directory in the `intelowl_flower` container.

## Chatbot

The optional LLM chatbot (enabled with the `--ollama` flag, see
[installation](./installation.md#chatbot-ollama)) is configured through the following variables, set
like every other secret in `docker/env_file_app`. All have sensible defaults; override them only if
needed.

| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://ollama:11434` | URL of the Ollama runtime. |
| `OLLAMA_MODEL` | `qwen2.5:3b` | Model the agent uses; must support Ollama tool calling (see [Fine-tuning & Prompting](./chatbot_tuning.md)). |
| `CHATBOT_MESSAGE_RETENTION_DAYS` | `90` | Conversations idle for this many days are pruned by a daily task. |
| `CHATBOT_RATE_LIMIT` | `5` | Max messages a user may send per window (REST and WebSocket share the bucket). |
| `CHATBOT_RATE_LIMIT_WINDOW` | `60` | Rate-limit window, in seconds. |
| `CHATBOT_PENDING_ACTION_TTL` | `600` | Lifetime, in seconds, of a previewed-analysis confirmation before it expires. |

**CPU / GPU.** The chatbot runs on CPU by default and the `qwen2.5:3b` default is sized for that. GPU
passthrough is not yet supported (tracked in
[issue #3717](https://github.com/intelowlproject/IntelOwl/issues/3717)). For changing the model, the
context window, or packaging a custom model, see the
[Fine-tuning & Prompting](./chatbot_tuning.md) guide.

## Manual Usage

The `./start` script essentially acts as a wrapper over Docker Compose, performing additional checks.
Expand Down
101 changes: 101 additions & 0 deletions docs/IntelOwl/chatbot.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
# Chatbot

The IntelOwl chatbot is a locally-hosted LLM assistant that answers natural-language questions over
your threat-intelligence data. It runs entirely on your own deployment (Ollama) and **never sends
data to external APIs** — your jobs, observables and reports never leave the instance.

The chatbot is an optional component. If your deployment was not started with the Ollama service,
the chat button does not connect; see the
[deployment guide](./installation.md#chatbot-ollama) to enable it.

## What it can do

Ask in plain language and the assistant answers by calling read-only IntelOwl tools on your behalf:

- search your jobs and show a job's details;
- summarize a job or an investigation;
- show an investigation's job tree;
- show a job's aggregated data model;
- list the analyzers available on the instance;
- recommend a playbook for an observable.

Everything it returns is **scoped to what you can already see** in the UI (your own data, plus what
your organization and TLP visibility allow). The chatbot cannot reveal anything you could not reach
through the normal interface.

## Opening the chat

Click the chat-bubble icon in the top navigation bar to open the chat drawer. The drawer overlays
the current page and the rest of the app stays usable while it is open; it also stays available as
you navigate between pages.

![chat drawer](./static/chatbot/drawer.png)

A small status badge shows whether the assistant is connected and ready.

## Asking a question

Type a question and press send. The answer streams in token by token. When a question needs data,
the assistant calls one of the tools above and shows the result formatted (tables, lists and links
are rendered).

Example questions:

- "Show me my most recent jobs."
- "Summarize job #1234."
- "Which analyzers can run on a domain?"
- "Which playbook should I use for an IP address?"

![a chat turn](./static/chatbot/turn.png)

## Quick actions

Below the input you'll find one-click quick actions. They are **context-aware**: on a job page they
offer job actions ("Summarize this job", "Which plugins ran?", "Show job details", "Evaluate
results"), on an investigation page they offer investigation actions, and elsewhere they offer
general ones ("Show my recent jobs", "List my investigations"). Clicking a chip sends the
corresponding question for you.

![quick actions](./static/chatbot/quick_actions.png)

## Working with the page you're on

The chatbot knows which page you are viewing. If you are on a job or investigation page, you can
refer to "this job" / "this investigation" and the assistant resolves it from the current page —
no need to copy the ID.

## Conversations

Each chat is saved as a conversation so you can come back to it. From the drawer you can:

- start a **new chat**;
- open the **conversation list** to switch between past conversations;
- review a conversation's **history** (older messages load automatically);
- **delete** a conversation you no longer need.

Old conversations are pruned automatically after a retention period configured by the operator (see
the [advanced configuration](./advanced_configuration.md#chatbot)).

## Launching an analysis safely

The chatbot can suggest running a new analysis on an observable, but it **cannot start one by
itself**. When you ask it to analyze something, it shows a **preview** of what would run and a
Confirm / Cancel card. The analysis starts only when **you** click **Confirm** (Cancel discards it).
The same TLP and visibility rules as the normal analysis flow apply.

![confirm an analysis](./static/chatbot/confirm.png)

This is a deliberate safety guardrail: even if the model misbehaves, it has no path to launch an
analysis — and therefore cannot send an observable to external analyzers — without an explicit click
from you.

## Limits and availability

- **Rate limit.** To protect the instance there is a per-user limit on how many messages you can
send per minute; if you hit it, wait a moment and try again.
- **Model availability.** If the chatbot worker or the Ollama service is not running, the drawer
shows an "unavailable" state instead of the connected badge and turns are not served — ask your
operator to enable/restart the Ollama service.
- **Long conversations.** A conversation is kept in full, but a very long one can exceed the model's
context window (`num_ctx`); when that happens Ollama drops the oldest tokens, so the assistant may
lose the earliest messages. There is no automatic summarization of the conversation.
126 changes: 126 additions & 0 deletions docs/IntelOwl/chatbot_tuning.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# Chatbot: Fine-tuning & Prompting

This guide covers choosing and customizing the language model behind the chatbot, tuning the prompt,
and packaging a custom model. It is aimed at operators who want to change the default model or
improve answer quality. For the environment variables referenced here see the
[chatbot configuration](./advanced_configuration.md#chatbot).

## How the model is wired

The chatbot uses [Ollama](https://ollama.com/) as a local LLM runtime and LangChain's native
**tool-calling** agent. At build time the backend creates a `ChatOllama` client
(`api_app/chatbot_manager/agent/agent.py`) pointed at `OLLAMA_BASE_URL` (default
`http://ollama:11434`) with `temperature=0` and a fixed context window, binds the chatbot tools to
it through the tool-calling API, and runs a tool-call → observation loop until the model replies with
plain text.

Two consequences matter for tuning:

- The model **must support Ollama tool calling**. A model that cannot emit tool calls will not work.
- The backend sets the prompt and the key inference parameters itself (see below), so they are the
levers you tune — not a model's baked-in defaults.

## Choosing a model

The default is **`qwen2.5:3b`**. It is chosen on purpose: it is the smallest model that reliably
picks the right tool and answers from the tool output with **usable latency on a CPU-only deploy**
(for comparison, a 7B model such as `mistral` was markedly slower on CPU — minutes per agent round,
often hitting the turn timeout). On stronger
hardware you can switch to any larger tool-capable Ollama model for better answer quality.

Requirements for a replacement model:

- it supports tool calling in Ollama;
- the Ollama server is recent enough to **stream while tools are bound** — IntelOwl pins the Ollama
image to `ollama/ollama:0.30.7` for this reason (versions older than 0.8.0 cannot stream with
tools); keep this in mind if you run Ollama yourself.

## Pointing to a different model

Set the `OLLAMA_MODEL` secret to the model tag you want (it is pulled automatically on first start).
Keep the **three** places that reference the default in sync if you change the baked-in default
rather than just overriding the secret:

- `intel_owl/settings/chatbot.py` — `OLLAMA_MODEL` default;
- `docker/env_file_app_template` — `OLLAMA_MODEL`;
- `docker/entrypoints/ollama.sh` — `DEFAULT_MODEL` (the entrypoint that pulls the model).

For a normal deployment you only set the `OLLAMA_MODEL` secret; the entrypoint pulls it on startup.

## The context window (`num_ctx`)

The backend requests an **8192-token** context window (`_NUM_CTX` in `agent.py`). This is
deliberate: Ollama's default of 2048 tokens silently truncates the prompt (the system prompt plus
the tool schemas already approach ~2.2k tokens), which drops tool definitions and wrecks tool
selection. 8192 fits the prompt, the conversation history and the tool observations comfortably and
keeps the prompt prefix stable across iterations (so follow-up rounds hit Ollama's KV cache).

If you move to a larger model with a bigger context window and longer conversations, raising
`_NUM_CTX` is the knob — at the cost of more memory and slower evaluation.

## The system prompt

The assistant's instructions live in plain text at
`api_app/chatbot_manager/agent/system_prompt.txt`. This — not a model's built-in system message — is
what shapes the assistant's behavior, because the backend sends it as the agent's system prompt on
every turn. The file is organized in sections:

- `[Role]` — who the assistant is and its answer style (concise, data-driven, cite the tools used);
- `[Tools — when to use each]` — one line per tool telling the model when to call it;
- `[Rules]` — hard constraints (only the current user's data; call the right tool instead of
guessing; `analyze_observable` only previews and never claims it launched anything);
- `[Response style]` — formatting expectations.

To tune behavior, edit this file. Practical prompting tips for small local models:

- Keep tool descriptions short and action-oriented ("Use for …"); they compete for context space.
- State hard guarantees in `[Rules]` (data scoping, the preview-only analysis guardrail) — small
models follow short imperative rules better than long prose.
- When you add a new tool, add a matching one-line entry under `[Tools — when to use each]` so the
model knows when to reach for it (see [adding a chatbot tool](./contribute.md)).

## Building a custom model with a Modelfile

Use an Ollama [Modelfile](https://docs.ollama.com/modelfile) to **package the weights** you want to
run — for example a specific quantization, or a fine-tuned model imported from a local GGUF file:

```dockerfile
# Modelfile
FROM qwen2.5:3b-instruct-q4_K_M
# or import your own weights:
# FROM ./my-finetuned-model.gguf
```

Build and register it, then point the chatbot at it:

```bash
ollama create intelowl-llm -f Modelfile
# then set the secret:
OLLAMA_MODEL=intelowl-llm
```

Important: the backend sets `num_ctx`, `temperature` and the system prompt explicitly on every call,
so a Modelfile's `SYSTEM` and its `PARAMETER num_ctx` / `PARAMETER temperature` are overridden for the
chatbot (other `PARAMETER` directives the backend does not set still apply). Use the Modelfile to
choose *which weights* run; use `system_prompt.txt` (and `_NUM_CTX` in
`agent.py`) to change *how the assistant behaves*.

## Validating a model before rollout

After switching or building a model, confirm it actually tool-calls before relying on it:

1. Bring up the stack with the Ollama service and wait for the model to finish pulling.
2. Open the chat and send a question that must use a tool, e.g. **"Show my recent jobs"** or
**"Summarize job #<id>"**.
3. Verify the assistant calls a tool (a tool/status indicator appears) and answers from real data,
rather than replying generically. The same check works through the REST endpoint
`POST /api/chatbot/sessions/message`.

If the model answers without ever calling a tool, it is not tool-calling reliably — pick a different
model or a less aggressively quantized variant.

## Out of scope

Actual model training (LoRA/PEFT fine-tuning, dataset preparation, GGUF conversion of trained
adapters) is outside the scope of this guide. This page covers selecting, configuring, prompting and
packaging existing tool-capable models.
95 changes: 95 additions & 0 deletions docs/IntelOwl/contribute.md
Original file line number Diff line number Diff line change
Expand Up @@ -576,6 +576,101 @@ We are setting the field `evaluation` depending on some logic that we constructe
If the IP address has been reported by some AbuseIPDB users but, at the same time, is whitelisted by AbuseIPDB, then we set its `evaluation` to `trusted`. On the contrary, if it's not whitelisted, we set it as `malicious`.


## How to add a chatbot tool

The optional [chatbot](./chatbot.md) is a LangChain tool-calling agent. Its capabilities are plain
Python "tools": each wraps an IntelOwl query and is exposed to the model. Adding a capability means
adding a tool. The agent lives in `api_app/chatbot_manager/agent/`; the tools live in
`api_app/chatbot_manager/agent/tools/` — **one file per tool**.

### 1. Write the tool

Create `api_app/chatbot_manager/agent/tools/<your_tool>.py`. A tool is a factory that **closes over
the requesting `user`** and returns a LangChain `@tool`-decorated function. Closing over the user is
what enforces multi-tenancy: every queryset is scoped to that user, and the model can never widen it.

```python
from langchain_core.tools import tool

from api_app.chatbot_manager.agent.tools._common import clamp_limit
from api_app.chatbot_manager.serializers.my_tool import MyToolResultSerializer


def make_my_tool(user):
@tool("my_tool")
def my_tool(query: str = "", limit: int = 10) -> str:
"""One-line description the model reads to decide when to call this tool.

Args:
query: what to search for.
limit: maximum number of results (default 10, max 50).
"""
from api_app.models import Job # heavy/circular imports stay function-local

errors = []
limit = clamp_limit(limit, errors)
# Scope to the user: visible_for_user matches the REST viewsets / UI.
qs = Job.objects.visible_for_user(user).filter(analyzable__name__icontains=query)[:limit]
return MyToolResultSerializer({"errors": errors, "results": qs}).to_json()

return my_tool
```

Conventions to follow (the maintainers enforce them):

- **Scope every query to `user`** with `visible_for_user(user)` (or the appropriate owner/org
filter). Treat all arguments as **untrusted** — they come from the LLM: validate them against the
enums in `api_app/choices.py` and clamp limits with `clamp_limit` (`agent/tools/_common.py`).
- **Return a JSON string** with the same `{"errors": [...], "<payload>": ...}` envelope via a DRF
serializer's `.to_json()` — never hand-build a dict. LangChain feeds the returned string back to
the model as the tool observation.
- Use named constants and top-level imports (keep only heavy/circular imports function-local, as the
existing tools do).

### 2. Add the result serializer

Add `api_app/chatbot_manager/serializers/<your_tool>.py` producing that envelope (build on the
shared base in `serializers/base.py`, like the other tools). One serializer module per tool keeps
parallel PRs from colliding on a shared file.

### 3. Register the tool

Add it to `build_tools()` in `api_app/chatbot_manager/agent/tools/__init__.py`:

```python
from .my_tool import make_my_tool


def build_tools(user) -> list:
return [
# ... existing tools ...
make_my_tool(user),
]
```

### 4. Tell the model when to use it

Add a one-line entry under `[Tools — when to use each]` in
`api_app/chatbot_manager/agent/system_prompt.txt`. The agent binds the tools through
`create_tool_calling_agent`; that line is how the model learns when to reach for yours. See the
[Fine-tuning & Prompting](./chatbot_tuning.md) guide for the prompt structure.

### 5. Test it

Add a per-tool test under `tests/api_app/chatbot_manager/tools/test_<your_tool>.py`. **Mock Ollama
and any HTTP** — tests must never hit a real model or network. Cover the scoping (a second user must
not see the first user's data) and the error/empty branches. For a tool that reads the database, also
keep its **query count invariant to result size** with a query-count guard (an `assertNumQueries` /
`CaptureQueriesContext` test that stays constant as the result set grows), so a future un-prefetched
relation cannot introduce an N+1.

Run the chatbot tests (rebuild the test image first if dependencies changed):

```bash
./start test build && ./start test up
docker exec intelowl_uwsgi python manage.py test tests.api_app.chatbot_manager --keepdb
```

## How to modify a plugin

If the changes that you have to make should stay local, you can just change the configuration inside the `Django admin` page.
Expand Down
26 changes: 26 additions & 0 deletions docs/IntelOwl/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -269,6 +269,32 @@ docker compose --project-directory docker -f docker/default.yml -f docker/postgr
```
</div>

### Chatbot (Ollama)

IntelOwl ships an optional, locally-hosted LLM chatbot (see the [Chatbot](./chatbot.md) user guide).
It is disabled by default and enabled with the `--ollama` flag, which adds the
`docker/ollama.override.yml` compose file. That file starts two extra containers:

- **`ollama`** — the local LLM runtime (image `ollama/ollama:0.30.7`), reachable in-cluster at
`http://ollama:11434`; no data ever leaves the deployment.
- **`celery_worker_chatbot`** — a dedicated Celery worker for the chatbot queue, so chatbot tasks
stay isolated from the main analyzer/connector workers.

```bash
./start prod up --ollama
```

On first start the Ollama entrypoint **pulls the configured model** (`OLLAMA_MODEL`, default
`qwen2.5:3b`); the first pull downloads a few GB and can take several minutes — the chatbot reports
itself unavailable until it completes.

**Hardware.** The default `qwen2.5:3b` is chosen to run on **CPU** with usable latency, so no GPU is
required. Ensure the host has enough free RAM for the model (a few GB for the 3B default; more for
larger models). **GPU passthrough is not yet supported** out of the box (tracked in
[issue #3717](https://github.com/intelowlproject/IntelOwl/issues/3717)). For model selection,
context window and packaging see the [Fine-tuning & Prompting](./chatbot_tuning.md) guide; for the
chatbot environment variables see the [advanced configuration](./advanced_configuration.md#chatbot).

### Stop

To stop the application you have to:
Expand Down
Binary file added docs/IntelOwl/static/chatbot/confirm.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/IntelOwl/static/chatbot/drawer.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/IntelOwl/static/chatbot/quick_actions.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/IntelOwl/static/chatbot/turn.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading