A research-oriented Retrieval-Augmented Generation (RAG) system comparing three retrieval approaches: Vector (FAISS + sentence-transformers), Keyword (BM25), and Hybrid (FAISS + BM25 + Reciprocal Rank Fusion).
User → Frontend (React/TypeScript/Vite)
↓
Backend (FastAPI)
↓
┌────────┬──────────┬────────┐
│ Vector │ Keyword  │ Hybrid │
│(FAISS) │  (BM25)  │ (RRF)  │
└────────┴──────────┴────────┘
↓
LLM Generation (Groq)
↓
Answer + Citations
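The Hybrid branch in the diagram combines the FAISS and BM25 rankings with Reciprocal Rank Fusion. As a rough illustration of the idea, and not the code in hybrid_retrieval/, the sketch below scores each chunk as the sum of 1 / (k + rank) over the rankings it appears in. The function name, the chunk IDs, and the constant k = 60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


if __name__ == "__main__":
    vector_hits = ["c3", "c1", "c7"]   # hypothetical FAISS ranking
    keyword_hits = ["c1", "c9", "c3"]  # hypothetical BM25 ranking
    for chunk_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
        print(f"{chunk_id}: {score:.5f}")
```

Because RRF uses only rank positions, the raw FAISS distances and BM25 scores never need to be normalized onto a common scale, which is a frequent reason for choosing it in hybrid setups.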
capstone_RAG/
├── frontend/ # React + TypeScript UI (Patricia)
├── backend/ # FastAPI orchestration + ingestion (Khalid)
├── vector_retrieval/ # FAISS semantic retrieval (Collins)
├── keyword_retrieval/ # BM25 keyword retrieval (Olivier)
├── hybrid_retrieval/ # FAISS + BM25 + RRF hybrid (Nathan)
└── shared_data/ # Schemas, contracts, evaluation
    ├── schemas/ # JSON schemas (chunk, request, response, answer)
    └── api_contracts/ # Integration contracts
- Python 3.11+
- Node.js 18+
- PostgreSQL running locally
- Groq API key (free at https://console.groq.com)
Backend — create backend/.env:
POSTGRE_URL= postgresql://<user:password>@localhost:5432/<your_local_rag_dbname>
JWT_SECRET= <run: openssl rand -hex 32>
JWT_ALGORITHM= HS256
JWT_EXPIRE_MINUTES= 60
STORAGE_PATH= ./storage/documents
CHROMA_PERSIST_DIR= ./chroma_storage
LLM_BACKEND= groq
GROQ_API_KEY= your_groq_api_key_here
RAG_PROJECT_ROOT= /absolute/path/to/capstone_RAG
Frontend (frontend/.env, already in repo):
VITE_API_URL=http://localhost:8000
# Backend
cd backend
pip install -r requirements.txt
# Frontend
cd frontend
npm install  # first-time frontend setup only
# Retrieval modules (install each)
cd vector_retrieval && pip install -r requirements.txt && cd ..
cd keyword_retrieval && pip install -r requirements.txt && cd ..
cd hybrid_retrieval && pip install -r requirements.txt && cd ..
Open three terminals:
Terminal 1 (backend):
cd backend
uvicorn app.main:app --reload --port 8000
Terminal 2 (frontend):
cd frontend
npm run dev
Terminal 3 (optional): watch logs
- Open http://localhost:5173
- Log in with admin@admin.com / admin1234
- Click the Controls button (hamburger) to open the upload panel
- Upload a PDF, DOCX, TXT, or MD file
- Select a retrieval method in the chat input: Vector, Keyword, or Hybrid
- Ask a question
All three retrieval modules implement the same interface. See shared_data/api_contracts/.
Retrieval request:
{ "query": "...", "top_k": 5, "method": "vector|keyword|hybrid" }
Retrieval response:
{ "query": "...", "method": "...", "results": [...], "latency_ms": 42.9 }
Each module exposes:
- ingest(file_paths, chunk_size, chunk_overlap) → dict
- retrieve(query, top_k) → dict
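To make the shared contract concrete, here is a minimal sketch of how the ingest and retrieve signatures above could be expressed as a Python Protocol, with a toy implementation that returns the documented response keys. The names RetrievalModule and ToyKeywordRetriever, the character-based chunking, the shape of the ingest return value, and the fields inside each result are all assumptions for illustration; the authoritative definitions are the ones in shared_data/api_contracts/.

```python
import time
from typing import Protocol


class RetrievalModule(Protocol):
    """Hypothetical name for the interface shared by the three modules."""

    def ingest(self, file_paths: list[str], chunk_size: int, chunk_overlap: int) -> dict: ...

    def retrieve(self, query: str, top_k: int) -> dict: ...


class ToyKeywordRetriever:
    """Illustrative stand-in that counts shared words; not the project's BM25 module."""

    method = "keyword"

    def __init__(self) -> None:
        self.chunks: list[dict] = []

    def ingest(self, file_paths, chunk_size, chunk_overlap):
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
        step = chunk_size - chunk_overlap  # consecutive chunks share chunk_overlap characters
        for path in file_paths:
            with open(path, encoding="utf-8") as handle:
                text = handle.read()
            for start in range(0, len(text), step):
                self.chunks.append({"source": path, "text": text[start:start + chunk_size]})
        # Assumed summary shape; the real contract may report different fields.
        return {"files": len(file_paths), "chunks": len(self.chunks)}

    def retrieve(self, query, top_k):
        started = time.perf_counter()
        terms = set(query.lower().split())
        scored = []
        for chunk in self.chunks:
            overlap = len(terms & set(chunk["text"].lower().split()))
            if overlap:
                scored.append({**chunk, "score": overlap})
        scored.sort(key=lambda result: result["score"], reverse=True)
        # Top-level keys mirror the retrieval response shown above.
        return {
            "query": query,
            "method": self.method,
            "results": scored[:top_k],
            "latency_ms": (time.perf_counter() - started) * 1000,
        }
```

Typing the backend against a Protocol like this would let it call any of the three modules interchangeably, since a class only needs matching method signatures and does not have to inherit from a common base.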
| Name | Role | Module |
|---|---|---|
| Patricia | Frontend | frontend/ |
| Khalid | Backend | backend/ |
| Collins | Vector Retrieval | vector_retrieval/ |
| Olivier | Keyword Retrieval | keyword_retrieval/ |
| Nathan | Hybrid Retrieval | hybrid_retrieval/ |
Supervisor: Dr. Fuat Uyguroğlu — ENGI401, Cyprus International University