An offline Retrieval-Augmented Generation (RAG) app for developer documentation. It runs locally with Foundry Local, SQLite, and JavaScript, so it can answer questions from your own docs without cloud APIs or outbound network calls.
- Answers questions from local markdown documents
- Retrieves the most relevant chunks before generation
- Streams responses in the browser
- Lets you upload new `.md` or `.txt` files at runtime
- Works fully offline after the model is available locally

- `npm run ingest` reads the markdown files in `docs/`
- Each document is split into chunks and stored in `data/rag.db`
- When a question is asked, the app searches for matching chunks
- The local model uses that context to generate the answer
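
The steps above can be sketched in a few lines of plain JavaScript. This is an illustrative sketch only, not the project's actual code: `chunkText`, `retrieve`, and `buildPrompt` are hypothetical names, the chunk size and overlap are made-up defaults, and term-frequency cosine similarity is assumed as the matching method. The real pipeline stores chunks in SQLite and may score them differently.

```javascript
// Hypothetical helper: split text into fixed-size character chunks that overlap.
function chunkText(text, size = 200, overlap = 50) {
  if (overlap >= size) throw new RangeError("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Count lowercase alphanumeric terms in a piece of text.
function termFrequencies(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two term-frequency maps (0 when either is empty).
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, count] of a) dot += count * (b.get(term) ?? 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((sum, c) => sum + c * c, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical helper: rank chunks against the question and keep the best topK.
function retrieve(question, chunks, topK = 3) {
  const questionTerms = termFrequencies(question);
  return chunks
    .map((text) => ({ text, score: cosineSimilarity(questionTerms, termFrequencies(text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical helper: place the retrieved chunks ahead of the question.
function buildPrompt(question, retrieved) {
  const context = retrieved.map((r, i) => `[${i + 1}] ${r.text}`).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}
```

Overlapping chunks reduce the chance that a sentence split across a chunk boundary is lost to retrieval, at the cost of storing some text twice.
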
- Node.js 20 or newer
- Foundry Local
- The `phi-3.5-mini` model, which is downloaded on first run if needed
- First run may take several minutes while the model downloads and loads

git clone https://github.com/leestott/local-rag.git
cd local-rag
npm install
npm run ingest
npm start

Open http://127.0.0.1:3000 in your browser.

If the app starts slowly the first time, that is usually the model download. Later runs are much faster because the model is cached locally.
local-rag/
├── docs/ # Source documents used by the RAG pipeline
├── public/ # Single-file web UI
├── src/ # Server, ingestion, vector store, and chat engine
├── data/ # SQLite database created at runtime
├── test/ # Node.js test files
└── package.json

| Script | Command | Description |
|---|---|---|
| Ingest | `npm run ingest` | Chunk and index all docs into SQLite |
| Start | `npm start` | Start the server |
| Dev | `npm run dev` | Start with file watching |
| Test | `npm test` | Run the test suite |

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/chat` | Non-streaming chat response |
| POST | `/api/chat/stream` | Streaming chat response via SSE |
| POST | `/api/upload` | Upload and index a document |
| GET | `/api/docs` | List indexed documents |
| GET | `/api/health` | Check model and database status |

MIT




