
Commit b4be070

KI7MT and claude committed
Add llm-stack documentation — Docker Compose reference stack
Local LLM + 42 ham radio MCP tools in a browser. Includes GPU compatibility table (Turing through Blackwell), tool configuration guide, validated performance numbers (138 tok/s on RTX 5080), and Blackwell open-dkms driver requirements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e8c372b commit b4be070

4 files changed

Lines changed: 284 additions & 1 deletion


docs/index.md

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@ Ask your AI assistant to look up a callsign, check your LoTW confirmations, find
 
 ---
 
-## 12 Packages
+## 13 Packages
 
 ### Foundation
 
@@ -39,6 +39,7 @@ Ask your AI assistant to look up a callsign, check your LoTW confirmations, find
 | Package | What It Does |
 |---------|--------------|
 | [qsp-mcp](servers/qsp-mcp.md) | QSP — relay MCP tools to any local LLM (llama.cpp, Ollama, vLLM, SGLang) |
+| [llm-stack](servers/llm-stack.md) | Docker Compose — Open WebUI + llama.cpp + MCP tools in a browser |
 
 ---

docs/servers/index.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ QSO-Graph provides 12 MCP packages covering amateur radio logging, confirmations
 | Package | Purpose | Auth Pattern |
 |---------|---------|-------------|
 | [qsp-mcp](qsp-mcp.md) | QSP — relay MCP tools to any local LLM endpoint | None (local) |
+| [llm-stack](llm-stack.md) | Docker Compose — Open WebUI + llama.cpp + MCP tools in a browser | None (local) |
 
 ---

docs/servers/llm-stack.md

Lines changed: 280 additions & 0 deletions
@@ -0,0 +1,280 @@
# llm-stack

**Local LLM + 41 ham radio MCP tools in a browser. No cloud, no API keys, no subscriptions.**

A Docker Compose reference stack that wires together Open WebUI, llama.cpp (GPU-accelerated), and six qso-graph MCP servers (five core, plus the optional IONIS analytics server). Clone, configure, launch — ask your local LLM about propagation conditions, POTA spots, WSPR data, and more.

```bash
git clone https://github.com/qso-graph/llm-stack.git
```

[GitHub](https://github.com/qso-graph/llm-stack)

---

## What It Does

llm-stack bundles three services into a single `docker compose up -d`:

1. **llm-engine** — llama.cpp with CUDA GPU acceleration, serving a quantized LLM
2. **open-webui** — browser chat interface with tool-calling support
3. **mcp-tools** — six qso-graph MCP servers exposed as OpenAPI endpoints via [mcpo](https://github.com/open-webui/mcpo); see the invocation sketch below the diagram

```
┌─────────────────────────────────────────────┐
│ Docker: ai-net network                      │
│                                             │
│  ┌──────────┐      ┌───────────┐            │
│  │llm-engine│◄─────│ open-webui│ :3000      │
│  │  :8000   │      │ (browser) │            │
│  │  (GPU)   │      └─────┬─────┘            │
│  └──────────┘            │ OpenAPI calls    │
│                          ▼                  │
│  ┌──────────────────────────────────────┐   │
│  │ mcp-tools container                  │   │
│  │                                      │   │
│  │  mcpo :8001 → solar-mcp  (6 tools)   │   │
│  │  mcpo :8002 → pota-mcp   (6 tools)   │   │
│  │  mcpo :8003 → wspr-mcp   (8 tools)   │   │
│  │  mcpo :8004 → sota-mcp   (4 tools)   │   │
│  │  mcpo :8005 → iota-mcp   (6 tools)   │   │
│  │  mcpo :8006 → ionis-mcp  (11 tools)  │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
```
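
Under the hood, mcpo wraps each stdio MCP server in a small HTTP service that speaks OpenAPI. As a rough sketch, the mcp-tools container runs one mcpo instance per server, along these lines (this assumes the qso-graph servers are published on PyPI and runnable via `uvx`; the actual container entrypoint may differ):

```bash
# One mcpo proxy per MCP server, each on its own port
uvx mcpo --port 8001 -- uvx solar-mcp
uvx mcpo --port 8002 -- uvx pota-mcp
```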

---

## Quick Start

```bash
# 1. Clone and configure
git clone https://github.com/qso-graph/llm-stack.git
cd llm-stack
cp .env.example .env    # Defaults work for 16 GB VRAM

# 2. Download the LLM model (~5.5 GB)
./scripts/download-model.sh

# 3. Launch
docker compose up -d

# 4. Open browser
# http://localhost:3000
```

Create an account on first visit (local only, not shared anywhere).
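
Before opening the browser, you can wait for the engine to finish loading the model. A minimal check, assuming port 8000 is published to the host as in the Port Map below (llama.cpp's server returns HTTP 200 on `/health` once the model is loaded):

```bash
# Poll llama.cpp until the model is loaded and the server reports healthy
until curl -sf http://localhost:8000/health >/dev/null; do
  echo "waiting for llm-engine..."
  sleep 5
done
echo "ready: open http://localhost:3000"
```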

---

## Requirements

- **NVIDIA GPU** with 8+ GB VRAM (16 GB recommended)
- **Docker** with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- ~8 GB disk for the default model + ~2 GB for container images

---

## GPU Compatibility

The default Docker image (`ghcr.io/ggml-org/llama.cpp:server-cuda`) supports Turing through Ada Lovelace GPUs. Blackwell GPUs need a local build.

| Architecture | GPUs | SM | Default Image | Notes |
|--------------|------|:--:|:-------------:|-------|
| Turing | RTX 2060–2080, T4 | 75 | Yes | |
| Ampere | RTX 3060–3090, A100 | 80/86 | Yes | |
| Ada Lovelace | RTX 4060–4090, L40 | 89 | Yes | |
| Blackwell | RTX 5070–5090, B200 | 100/120 | **No** | Use `llm-engine/Dockerfile` |

### Blackwell Build (RTX 5070/5080/5090)

If you have a Blackwell GPU, build the engine locally:

```bash
docker build -t ghcr.io/ggml-org/llama.cpp:server-cuda \
  -f llm-engine/Dockerfile llm-engine/
docker compose up -d
```

This compiles llama.cpp with SM 120 CUDA support. The build takes 10–20 minutes, depending on CPU cores.

!!! warning "Blackwell NVIDIA Driver"
    RTX 5080/5090 GPUs **require the open kernel modules**. On RHEL/Rocky Linux:

    ```bash
    sudo dnf module enable nvidia-driver:open-dkms
    sudo dnf install kmod-nvidia-open-dkms
    ```

    The standard `nvidia-driver:latest-dkms` will **not work** — the GPU will appear in `lspci` but `nvidia-smi` will show "No devices found."
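
After installing the open modules and rebooting, a quick sanity check (a sketch; the `compute_cap` query needs a reasonably recent driver, and the exact version-string wording varies by release):

```bash
# Blackwell consumer cards should report compute capability 12.0
nvidia-smi --query-gpu=name,compute_cap --format=csv

# The loaded kernel module should identify itself as the Open variant
cat /proc/driver/nvidia/version
```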

---

## GPU Sizing

| GPU VRAM | Model | Context | VRAM Used | Notes |
|----------|-------|---------|-----------|-------|
| 8 GB | Qwen2.5-3B Q5_K_M | 8K | ~3 GB | Basic tool calling, limited reasoning |
| 16 GB | Qwen2.5-7B Q5_K_M (default) | 16K | ~6.4 GB | Good tool calling, tested on RTX 5080 |
| 24 GB | Qwen2.5-14B Q5_K_M | 16K | ~12 GB | Better reasoning, fewer prompting issues |
| 48+ GB | Qwen2.5-32B Q5_K_M | 32K | ~24 GB | Best quality, set `LLM_CTX_SIZE=32768` |

To use a different model, download the GGUF file into `models/` and update `LLM_MODEL` in `.env`.
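
For example, stepping up to the 14B model on a 24 GB card might look like this (a sketch: the Hugging Face repo and file names below are illustrative, so verify them against the actual GGUF listing first):

```bash
# Download a GGUF into models/ (repo and file names are examples only)
huggingface-cli download bartowski/Qwen2.5-14B-Instruct-GGUF \
  Qwen2.5-14B-Instruct-Q5_K_M.gguf --local-dir models/

# Point .env at the new file, e.g. LLM_MODEL=Qwen2.5-14B-Instruct-Q5_K_M.gguf,
# then restart the engine:
docker compose up -d llm-engine
```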

---

## Configuring Tools in Open WebUI

After launching, register the MCP tool servers:

1. **Admin Panel → Settings → Tools** (or Connections → Tool Servers)
2. Add each server as type **OpenAPI** (NOT "MCP Streamable HTTP"):

    | Name | URL | Tools |
    |------|-----|-------|
    | Solar MCP | `http://mcp-tools:8001` | 6 — conditions, alerts, forecast, X-ray, solar wind, band outlook |
    | POTA MCP | `http://mcp-tools:8002` | 6 — spots, park info, stats, scheduled activations |
    | WSPR MCP | `http://mcp-tools:8003` | 8 — spots, band activity, propagation, grid activity, SNR trends |
    | SOTA MCP | `http://mcp-tools:8004` | 4 — spots, alerts, summit info, nearby summits |
    | IOTA MCP | `http://mcp-tools:8005` | 6 — island lookup, search, DXCC mapping, nearby groups |
    | IONIS MCP | `http://mcp-tools:8006` | 11 — propagation analytics (requires datasets) |

3. **Enable tools per chat** — click the wrench icon in the chat input area
4. **Model settings** — in Advanced Params, set Function Calling to **Native**

!!! note "OpenAPI, not MCP"
    Use the **OpenAPI** connection type, not "MCP Streamable HTTP." Open WebUI's native MCP support is broken as of v0.7.2; the mcpo proxy handles the translation.

---

## Available Tools

### Solar Weather (6 tools)
Live space weather from NOAA SWPC — solar flux, Kp index, X-ray flux, solar wind, alerts, and HF band outlook.

### POTA (6 tools)
Parks on the Air — live activator spots, park info, activator/hunter stats, scheduled activations, parks by location.

### WSPR (8 tools)
Weak Signal Propagation Reporter — live spots, band activity, top beacons, top spotters, path propagation, grid activity, longest paths, SNR trends.

### SOTA (4 tools)
Summits on the Air — live spots, activation alerts, summit info, nearby summits.

### IOTA (6 tools)
Islands on the Air — group lookup, island search, DXCC mapping, nearby groups, programme statistics.

### IONIS (11 tools, optional)
Propagation analytics from 175M+ signatures — band openings, path analysis, solar correlation, dark hour analysis, current conditions. Requires [IONIS datasets](https://sourceforge.net/projects/ionis-ai/files/v1.0/) (~15 GB).

---

## IONIS Datasets (Optional)

To enable the 11 IONIS propagation analytics tools:

1. Download the datasets from [SourceForge](https://sourceforge.net/projects/ionis-ai/files/v1.0/) (~15 GB)
2. Set `IONIS_DATA_DIR` in `.env` to the download directory
3. Launch with the IONIS override:

    ```bash
    docker compose -f docker-compose.yaml -f docker-compose.ionis.yaml up -d
    ```

Without the IONIS datasets, the other 30 tools still work.

---

## Cloudflare Tunnel (Optional)

To expose your instance publicly:

1. Create a tunnel at [Cloudflare Zero Trust](https://one.dash.cloudflare.com/)
2. Set `CLOUDFLARE_TUNNEL_TOKEN` in `.env`
3. Launch with the tunnel profile:

    ```bash
    docker compose --profile tunnel up -d
    ```

---

## Example Queries

Once tools are enabled, ask questions like:

- *"What are current solar conditions?"*
- *"Show me live POTA activations in a table"*
- *"What WSPR propagation is there on 20m right now?"*
- *"Find SOTA summits near Denver"*
- *"Look up IOTA group OC-001"*

!!! tip "Smaller models need guidance"
    7B models sometimes answer from training data instead of calling tools. Prefix your question with the tool name (*"Use solar-mcp — what are current conditions?"*) or add a system prompt instructing the model to always use tools for real-time data; see the example below.
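
One possible system prompt (the wording is illustrative; paste it into the model's System Prompt field in Open WebUI):

```
You have live amateur radio data tools (solar, POTA, WSPR, SOTA, IOTA).
For any question about current conditions, spots, or propagation, call
the relevant tool first and answer only from its output, never from memory.
```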

---

## Updating

```bash
# Pull latest MCP server versions from PyPI
docker compose build --no-cache mcp-tools
docker compose up -d mcp-tools
```
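
To update the prebuilt container images themselves, a sketch (this applies to services that pull images; a locally built Blackwell engine must be rebuilt from `llm-engine/Dockerfile` instead, since pulling would overwrite its tag):

```bash
# Fetch newer Open WebUI / llama.cpp images and recreate the containers
docker compose pull
docker compose up -d
```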

---

## Troubleshooting

**GPU not detected in container:**
Verify the NVIDIA Container Toolkit is installed and configured:

```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi
```

**Tools not calling:**
Enable tools via the wrench icon in the chat input, and set Function Calling to Native in the model's Advanced Params.

**Connection refused on tool servers:**
Verify mcp-tools is on the same Docker network (`docker network inspect llm-stack_ai-net`) and that each proxy port answers; see the check below.
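
A quick way to check each proxy (a sketch; mcpo serves an OpenAPI schema at the root of every port, and this assumes ports 8001–8006 are published to the host; otherwise run it from a container on `ai-net` against `mcp-tools`):

```bash
# Each mcpo port should return its OpenAPI schema
for port in 8001 8002 8003 8004 8005 8006; do
  printf "port %s: " "$port"
  curl -sf -o /dev/null "http://localhost:$port/openapi.json" && echo OK || echo FAIL
done
```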

**Out of VRAM:**
Reduce `LLM_CTX_SIZE` in `.env` (try 8192) or use a smaller quantization (Q4_K_M).

**Blackwell GPU — "No devices found":**
Switch to open kernel modules. See the [Blackwell Build](#blackwell-build-rtx-507050805090) section.

---

## Port Map

| Port | Service | Purpose |
|------|---------|---------|
| 3000 | Open WebUI | Browser chat UI |
| 8000 | llm-engine | LLM inference API (GPU) |
| 8001–8006 | mcpo | MCP tool servers (OpenAPI proxy) |

---

## Performance (Tested)

Validated on EPYC 7302P + RTX 5080 (16 GB VRAM), Rocky Linux 9.7:

| Metric | Value |
|--------|-------|
| Model | Qwen2.5-7B-Instruct Q5_K_M |
| VRAM used | 6.4 GB / 16.3 GB (39%) |
| Prompt throughput | ~1,033 tokens/sec |
| Generation speed | ~138 tokens/sec |
| MCP tool latency | <1 sec (solar, POTA, WSPR) |

---

## Dependencies

- [llama.cpp](https://github.com/ggml-org/llama.cpp) — LLM inference engine (CUDA)
- [Open WebUI](https://github.com/open-webui/open-webui) — browser chat interface
- [mcpo](https://github.com/open-webui/mcpo) — MCP-to-OpenAPI proxy
- [qso-graph MCP servers](https://github.com/qso-graph) — ham radio tool ecosystem
mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -83,6 +83,7 @@ nav:
     - N1MM Logger+: servers/n1mm-mcp.md
   - Infrastructure:
     - QSP (Tool Relay): servers/qsp-mcp.md
+    - LLM Stack: servers/llm-stack.md
   - Architecture: architecture.md
   - Testing: testing.md
   - Demo: https://qso-graph-demo.vercel.app/
