# llm-stack

**Local LLM + 41 ham radio MCP tools in a browser. No cloud, no API keys, no subscriptions.**

A Docker Compose reference stack that wires together Open WebUI, llama.cpp (GPU-accelerated), and 6 qso-graph MCP servers. Clone, configure, launch — then ask your local LLM about propagation conditions, POTA spots, WSPR data, and more.

```bash
git clone https://github.com/qso-graph/llm-stack.git
```

[GitHub](https://github.com/qso-graph/llm-stack)

---

## What It Does

llm-stack bundles three services into a single `docker compose up -d`:

1. **llm-engine** — llama.cpp with CUDA GPU acceleration, serving a quantized LLM
2. **open-webui** — browser chat interface with tool-calling support
3. **mcp-tools** — 6 qso-graph MCP servers exposed as OpenAPI endpoints via [mcpo](https://github.com/open-webui/mcpo)

```
┌─────────────────────────────────────────────┐
│           Docker: ai-net network            │
│                                             │
│  ┌──────────┐       ┌───────────┐           │
│  │llm-engine│◄──────│ open-webui│  :3000    │
│  │  :8000   │       │ (browser) │           │
│  │  (GPU)   │       └─────┬─────┘           │
│  └──────────┘             │ OpenAPI calls   │
│                           ▼                 │
│  ┌──────────────────────────────────────┐   │
│  │         mcp-tools container          │   │
│  │                                      │   │
│  │  mcpo :8001 → solar-mcp  (6 tools)   │   │
│  │  mcpo :8002 → pota-mcp   (6 tools)   │   │
│  │  mcpo :8003 → wspr-mcp   (8 tools)   │   │
│  │  mcpo :8004 → sota-mcp   (4 tools)   │   │
│  │  mcpo :8005 → iota-mcp   (6 tools)   │   │
│  │  mcpo :8006 → ionis-mcp  (11 tools)  │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
```

---

## Quick Start

```bash
# 1. Clone and configure
git clone https://github.com/qso-graph/llm-stack.git
cd llm-stack
cp .env.example .env  # Defaults work for 16 GB VRAM

# 2. Download the LLM model (~5.5 GB)
./scripts/download-model.sh

# 3. Launch
docker compose up -d

# 4. Open browser
# http://localhost:3000
```

Create an account on first visit; it is stored locally and never sent anywhere.
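
To confirm the stack came up cleanly (service names are from this stack's compose file):

```bash
# All three services should show "running"
docker compose ps

# Watch llama.cpp load the model onto the GPU
docker compose logs -f llm-engine
```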

---

## Requirements

- **NVIDIA GPU** with 8+ GB VRAM (16 GB recommended)
- **Docker** with [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- ~8 GB disk for the default model + ~2 GB for container images

---

## GPU Compatibility

The default Docker image (`ghcr.io/ggml-org/llama.cpp:server-cuda`) supports Turing through Ada Lovelace GPUs. Blackwell GPUs need a local build.

| Architecture | GPUs | SM | Default Image | Notes |
|-------------|------|:--:|:-------------:|-------|
| Turing | RTX 2060–2080, T4 | 75 | Yes | |
| Ampere | RTX 3060–3090, A100 | 80/86 | Yes | |
| Ada Lovelace | RTX 4060–4090, L40 | 89 | Yes | |
| Blackwell | RTX 5070–5090, B200 | 100/120 | **No** | Use `llm-engine/Dockerfile` |

### Blackwell Build (RTX 5070/5080/5090)

If you have a Blackwell GPU, build the engine locally:

```bash
# Tagging with the default image name means docker-compose.yaml needs no changes
docker build -t ghcr.io/ggml-org/llama.cpp:server-cuda \
  -f llm-engine/Dockerfile llm-engine/
docker compose up -d
```

This compiles llama.cpp with SM 120 CUDA support. The build takes 10–20 minutes depending on CPU core count.

!!! warning "Blackwell NVIDIA Driver"
    RTX 5080/5090 GPUs **require the open kernel modules**. On RHEL/Rocky Linux:

    ```bash
    sudo dnf module enable nvidia-driver:open-dkms
    sudo dnf install kmod-nvidia-open-dkms
    ```

    The standard `nvidia-driver:latest-dkms` will **not work** — the GPU will appear in `lspci` but `nvidia-smi` will show "No devices found."
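
One way to check which driver flavor is actually loaded is the module license string, which differs between the two (the open modules carry a dual MIT/GPL license):

```bash
modinfo -F license nvidia   # open kernel modules report "Dual MIT/GPL"
nvidia-smi                  # should now list the RTX 50-series card
```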

---

## GPU Sizing

| GPU VRAM | Model | Context | VRAM Used | Notes |
|----------|-------|---------|-----------|-------|
| 8 GB | Qwen2.5-3B Q5_K_M | 8K | ~3 GB | Basic tool calling, limited reasoning |
| 16 GB | Qwen2.5-7B Q5_K_M (default) | 16K | ~6.4 GB | Good tool calling, tested on RTX 5080 |
| 24 GB | Qwen2.5-14B Q5_K_M | 16K | ~12 GB | Better reasoning, fewer prompting issues |
| 48+ GB | Qwen2.5-32B Q5_K_M | 32K | ~24 GB | Best quality, set `LLM_CTX_SIZE=32768` |

To use a different model, download the GGUF file into `models/` and update `LLM_MODEL` in `.env`.
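
A minimal sketch of the swap, assuming you have already downloaded a GGUF from Hugging Face or elsewhere (the filename and `sed` pattern here are illustrative; check `.env.example` for the exact variable format):

```bash
# Place the quantization where the engine looks for models
mv ~/Downloads/Qwen2.5-14B-Instruct-Q5_K_M.gguf models/

# Point the engine at the new file
sed -i 's/^LLM_MODEL=.*/LLM_MODEL=Qwen2.5-14B-Instruct-Q5_K_M.gguf/' .env

# Recreate only the engine container to pick up the change
docker compose up -d llm-engine
```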

---

## Configuring Tools in Open WebUI

After launching, register the MCP tool servers:

1. **Admin Panel → Settings → Tools** (or Connections → Tool Servers)
2. Add each server as type **OpenAPI** (NOT "MCP Streamable HTTP"):

| Name | URL | Tools |
|------|-----|-------|
| Solar MCP | `http://mcp-tools:8001` | 6 — conditions, alerts, forecast, X-ray, solar wind, band outlook |
| POTA MCP | `http://mcp-tools:8002` | 6 — spots, park info, stats, scheduled activations |
| WSPR MCP | `http://mcp-tools:8003` | 8 — spots, band activity, propagation, grid activity, SNR trends |
| SOTA MCP | `http://mcp-tools:8004` | 4 — spots, alerts, summit info, nearby summits |
| IOTA MCP | `http://mcp-tools:8005` | 6 — island lookup, search, DXCC mapping, nearby groups |
| IONIS MCP | `http://mcp-tools:8006` | 11 — propagation analytics (requires datasets) |

3. **Enable tools per chat** — click the wrench icon in the chat input area
4. **Model settings** — in Advanced Params, set Function Calling to **Native**

!!! note "OpenAPI, not MCP"
    Use the **OpenAPI** connection type, not "MCP Streamable HTTP." Open WebUI's native MCP support is broken as of v0.7.2; the mcpo proxy handles the translation.
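
You can sanity-check each server from the host before registering it. mcpo is FastAPI-based, so every port serves an OpenAPI schema and interactive docs (this assumes ports 8001–8006 are published to the host, as the Port Map below indicates):

```bash
# Should return HTTP 200 for a valid tool server
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8001/openapi.json

# Interactive docs for each server live at /docs, e.g. http://localhost:8001/docs
```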

---

## Available Tools

### Solar Weather (6 tools)
Live space weather from NOAA SWPC — solar flux, Kp index, X-ray flux, solar wind, alerts, and HF band outlook.

### POTA (6 tools)
Parks on the Air — live activator spots, park info, activator/hunter stats, scheduled activations, parks by location.

### WSPR (8 tools)
Weak Signal Propagation Reporter — live spots, band activity, top beacons, top spotters, path propagation, grid activity, longest paths, SNR trends.

### SOTA (4 tools)
Summits on the Air — live spots, activation alerts, summit info, nearby summits.

### IOTA (6 tools)
Islands on the Air — group lookup, island search, DXCC mapping, nearby groups, programme statistics.

### IONIS (11 tools, optional)
Propagation analytics from 175M+ signatures — band openings, path analysis, solar correlation, dark hour analysis, current conditions. Requires [IONIS datasets](https://sourceforge.net/projects/ionis-ai/files/v1.0/) (~15 GB).

---

## IONIS Datasets (Optional)

To enable the 11 IONIS propagation analytics tools:

1. Download datasets from [SourceForge](https://sourceforge.net/projects/ionis-ai/files/v1.0/) (~15 GB)
2. Set `IONIS_DATA_DIR` in `.env` to the download directory
3. Launch with the IONIS override:

```bash
docker compose -f docker-compose.yaml -f docker-compose.ionis.yaml up -d
```

Without IONIS datasets, the other 30 tools still work.
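
The `IONIS_DATA_DIR` entry from step 2 is just an absolute path (the directory shown here is illustrative):

```bash
# .env
IONIS_DATA_DIR=/data/ionis
```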

---

## Cloudflare Tunnel (Optional)

To expose your instance publicly:

1. Create a tunnel at [Cloudflare Zero Trust](https://one.dash.cloudflare.com/)
2. Set `CLOUDFLARE_TUNNEL_TOKEN` in `.env`
3. Launch with the tunnel profile:

```bash
docker compose --profile tunnel up -d
```

---

## Example Queries

Once tools are enabled, ask questions like:

- *"What are current solar conditions?"*
- *"Show me live POTA activations in a table"*
- *"What WSPR propagation is there on 20m right now?"*
- *"Find SOTA summits near Denver"*
- *"Look up IOTA group OC-001"*

!!! tip "Smaller models need guidance"
    7B models sometimes answer from training data instead of calling tools. Prefix your question with the tool name (*"Use solar-mcp — what are current conditions?"*), or add a system prompt instructing the model to always use tools for real-time data (one example follows).
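
A possible system prompt for the latter (wording is illustrative; set it in the model's System Prompt field):

```
You have access to live ham radio data tools (solar, POTA, WSPR, SOTA,
IOTA, IONIS). For any question about current conditions, spots, or
propagation, call the relevant tool instead of answering from memory.
```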

---

## Updating

```bash
# Pull latest MCP server versions from PyPI
docker compose build --no-cache mcp-tools
docker compose up -d mcp-tools
```
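
To update the upstream images (Open WebUI, llama.cpp) as well:

```bash
# If you built the Blackwell image locally, re-run that build instead of pulling llm-engine
docker compose pull
docker compose up -d
```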

---

## Troubleshooting

**GPU not detected in container:**
Verify the NVIDIA Container Toolkit is installed and configured:
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi
```

**Tools not being called:**
Enable tools via the wrench icon in the chat input, and set Function Calling to Native in the model's Advanced Params.

**Connection refused on tool servers:**
Verify mcp-tools is on the same Docker network: `docker network inspect llm-stack_ai-net`
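
A quick reachability probe from inside that network, using a throwaway curl container so nothing needs to be installed in the stack's own images:

```bash
docker run --rm --network llm-stack_ai-net curlimages/curl \
  -s -o /dev/null -w '%{http_code}\n' http://mcp-tools:8001/openapi.json
# 200 means open-webui should be able to reach the tool server too
```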

**Out of VRAM:**
Reduce `LLM_CTX_SIZE` in `.env` (try 8192) or use a smaller quantization (Q4_K_M).
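
For example:

```bash
sed -i 's/^LLM_CTX_SIZE=.*/LLM_CTX_SIZE=8192/' .env
docker compose up -d llm-engine   # recreate the engine with the smaller context
```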

**Blackwell GPU — "No devices found":**
Switch to the open kernel modules. See the [Blackwell Build](#blackwell-build-rtx-507050805090) section.

---

## Port Map

| Port | Service | Purpose |
|------|---------|---------|
| 3000 | Open WebUI | Browser chat UI |
| 8000 | llm-engine | LLM inference API (GPU) |
| 8001–8006 | mcpo | MCP tool servers (OpenAPI proxy) |
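
A quick host-side smoke test of the published ports (the llama.cpp server exposes a `/health` endpoint; the other two should answer HTTP 200):

```bash
curl -s http://localhost:8000/health                                  # llm-engine
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000        # open-webui
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8001/docs   # solar-mcp via mcpo
```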

---

## Performance (Tested)

Validated on EPYC 7302P + RTX 5080 (16 GB VRAM), Rocky Linux 9.7:

| Metric | Value |
|--------|-------|
| Model | Qwen2.5-7B-Instruct Q5_K_M |
| VRAM used | 6.4 GB / 16.3 GB (39%) |
| Prompt throughput | ~1,033 tokens/sec |
| Generation speed | ~138 tokens/sec |
| MCP tool latency | <1 sec (solar, POTA, WSPR) |

---

## Dependencies

- [llama.cpp](https://github.com/ggml-org/llama.cpp) — LLM inference engine (CUDA)
- [Open WebUI](https://github.com/open-webui/open-webui) — browser chat interface
- [mcpo](https://github.com/open-webui/mcpo) — MCP-to-OpenAPI proxy
- [qso-graph MCP servers](https://github.com/qso-graph) — ham radio tool ecosystem