Query PDFs · Spreadsheets · Images, all in one intelligent pipeline
PolyRAG is a modular, multi-agent Retrieval-Augmented Generation (RAG) system that answers natural language queries by retrieving information from heterogeneous data sources, including PDF documents, Word files, Excel spreadsheets, CSVs, and images.
At its core, a Coordinator Agent receives the user query, determines which data modality is most relevant, and routes the query to the appropriate specialized sub-agent. Retrieved context is then passed to an Aggregator that synthesizes a grounded, context-aware response using either Groq (LLaMA) or Google Gemini.
The system includes a full evaluation framework with retrieval, generation, and system-level metrics, giving it the rigor of a research-grade implementation.
Live Demo: polyrag.streamlit.app
- Coordinator Agent: classifies the query and routes it to the right sub-agent
- Document Agent: handles PDFs, `.txt`, and `.docx` files using PyMuPDF and python-docx
- Excel Agent: processes `.xlsx` and `.csv` files with pandas and openpyxl
- Image Agent: extracts text from images via Tesseract OCR and handles visual queries using Groq's vision model (LLaMA 4 Scout)
- Aggregator: synthesizes context from one or multiple agents into a final answer
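
The routing step described above can be pictured with a small sketch. It is only an illustration under stated assumptions: `route_query`, `EXTENSION_TO_AGENT`, and `KEYWORD_HINTS` are hypothetical names, the image extensions and keyword lists are invented for the example, and the actual `coordinator.py` may classify queries differently (for instance with an LLM).

```python
import re
from pathlib import Path

# Hypothetical mapping from file extension to agent label. The document and
# spreadsheet extensions come from the list above; the image ones are assumed.
EXTENSION_TO_AGENT = {
    ".pdf": "document", ".txt": "document", ".docx": "document",
    ".xlsx": "excel", ".csv": "excel",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
}

# Illustrative keyword hints; a real coordinator may use an LLM instead.
KEYWORD_HINTS = {
    "excel": {"spreadsheet", "sheet", "column", "row", "csv"},
    "image": {"image", "picture", "photo", "screenshot"},
}


def route_query(query: str, uploaded_files: list[str]) -> list[str]:
    """Return the agent labels to consult for this query."""
    # Only agents that actually have ingested data are candidates.
    available = set()
    for name in uploaded_files:
        agent = EXTENSION_TO_AGENT.get(Path(name).suffix.lower())
        if agent:
            available.add(agent)

    # Match whole words so that "row" does not fire on "browser".
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    hinted = [agent for agent, hints in KEYWORD_HINTS.items()
              if agent in available and tokens & hints]

    # No keyword match: fall back to every agent that has data.
    return hinted or sorted(available)
```

Returning a list rather than a single label leaves room for the Aggregator to combine context from several agents.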
- Semantic chunking with configurable `CHUNK_SIZE` and `CHUNK_OVERLAP`
- Dense embeddings via `all-MiniLM-L6-v2` (Sentence Transformers)
- Vector storage and similarity search with ChromaDB, using separate collections per modality
- Top-K context retrieval with distance scoring
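
To make the chunking and Top-K bullets concrete, here is a dependency-free sketch. It assumes `CHUNK_SIZE` and `CHUNK_OVERLAP` are measured in characters and that ranking uses cosine distance; neither is confirmed by this README, and ChromaDB collections can be configured with other distance functions. `chunk_text`, `cosine_distance`, and `top_k` are hypothetical helper names, not functions from the repository.

```python
import math

CHUNK_SIZE = 500       # value from config.py; unit (characters) is assumed
CHUNK_OVERLAP = 50
TOP_K_RESULTS = 4


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into windows of `size` that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + size] for start in range(0, last_start, step)]


def cosine_distance(a, b):
    """1 - cosine similarity; zero-length vectors count as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def top_k(query_vec, chunk_vecs, k=TOP_K_RESULTS):
    """Return (chunk_index, distance) pairs for the k closest chunks."""
    scored = sorted((cosine_distance(query_vec, vec), idx)
                    for idx, vec in enumerate(chunk_vecs))
    return [(idx, dist) for dist, idx in scored[:k]]
```

In the real pipeline the vectors would come from the embedding model and the search would be delegated to the vector store; the brute-force loop here only shows what "Top-K with distance scoring" means.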
- Groq (LLaMA 3.3 70B): default fast inference for text queries
- Groq Vision (LLaMA 4 Scout): multimodal understanding for image inputs
- Google Gemini 2.0 Flash: fallback and alternative LLM
- Ollama: local model support for offline/private deployments
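
The "default plus fallback" arrangement in the list above could be wired roughly as follows. This is a sketch, not the project's code: `generate_with_fallback` is a hypothetical helper, and each provider is represented by a plain callable so that no specific SDK signature is assumed.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, answer).

    `providers` is an ordered list such as [("groq", call_groq),
    ("gemini", call_gemini)], where the callables are placeholders for
    whatever client wrappers the application defines.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # network errors, rate limits, missing keys
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the answer makes it easy to log which model actually served a request.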
- Persistent conversational context across turns (up to 10 messages)
- Enables follow-up questions and coherent multi-turn dialogue
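
A bounded conversation buffer like the one described above can be sketched with `collections.deque`. `ConversationMemory` is a hypothetical class name, and the message shape (`role` / `content` dictionaries) is an assumption; the actual `core/memory.py` may differ.

```python
from collections import deque

MEMORY_MAX_MESSAGES = 10  # value from config.py


class ConversationMemory:
    """Keeps only the most recent messages of a conversation."""

    def __init__(self, max_messages: int = MEMORY_MAX_MESSAGES):
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def history(self) -> list[dict]:
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

Because the cap is counted in messages, ten entries correspond to five user/assistant exchanges.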
Built-in benchmarking with `evaluate.py` runs a test bench of Q&A pairs through the full pipeline and reports:
| Metric | Description |
|---|---|
| Context Precision | Fraction of retrieved chunks from the expected source |
| Answer Similarity | Cosine similarity between generated and expected answer embeddings |
| Faithfulness | Whether the answer is grounded in retrieved documents |
| Routing Accuracy | Whether the coordinator selected the correct agent |
| Retrieval Latency | Time to retrieve relevant chunks (ms) |
| Generation Latency | LLM response time (ms) |
| Time to First Token (TTFT) | Streaming responsiveness (ms) |
| Tokens/sec | Approximate generation throughput |
| End-to-End Latency | Total retrieval + generation time (ms) |
Evaluated across 10 test cases (1 run each): routing accuracy 1.0 · answer similarity 0.883 · avg E2E latency 0.992 s (see the full evaluation report).
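
Three of the metrics in the table are simple enough to sketch directly from their descriptions. The function names and input shapes below are assumptions made for illustration; `evaluate.py` may compute them differently.

```python
def context_precision(retrieved_sources: list[str], expected_source: str) -> float:
    """Fraction of retrieved chunks that come from the expected source."""
    if not retrieved_sources:
        return 0.0
    hits = sum(1 for source in retrieved_sources if source == expected_source)
    return hits / len(retrieved_sources)


def routing_accuracy(predicted_agents: list[str], expected_agents: list[str]) -> float:
    """Share of test cases where the coordinator picked the expected agent."""
    if len(predicted_agents) != len(expected_agents):
        raise ValueError("one prediction is required per test case")
    if not expected_agents:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted_agents, expected_agents))
    return correct / len(expected_agents)


def end_to_end_latency_ms(retrieval_ms: float, generation_ms: float) -> float:
    """Total latency as defined in the table: retrieval plus generation."""
    return retrieval_ms + generation_ms
```

For example, four retrieved chunks of which three come from the expected file give a context precision of 0.75.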
| Layer | Technology |
|---|---|
| Language | Python 3.10+ |
| LLM Orchestration | LangChain 0.2 |
| LLMs | Groq (LLaMA 3.3 70B, LLaMA 4 Scout), Google Gemini 2.0 Flash, Ollama |
| Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Vector Store | ChromaDB |
| Document Parsing | PyMuPDF (PDF), python-docx (DOCX) |
| Spreadsheet Parsing | pandas, openpyxl |
| Image / OCR | Pillow, Tesseract OCR (pytesseract) |
| Frontend / UI | Streamlit |
| Config & Secrets | python-dotenv |

    PolyRAG/
    ├── agents/
    │   ├── coordinator.py       # Query classification & agent routing
    │   ├── document_agent.py    # PDF, TXT, DOCX ingestion & retrieval
    │   ├── excel_agent.py       # XLSX, CSV ingestion & retrieval
    │   ├── image_agent.py       # Image OCR & vision-based retrieval
    │   └── aggregator.py        # Multi-source synthesis & LLM generation
    ├── core/
    │   ├── vector_store.py      # ChromaDB client & collection management
    │   ├── embeddings.py        # Sentence Transformer embedding wrapper
    │   └── memory.py            # Conversational memory buffer
    ├── data/
    │   └── eval_samples/        # Sample files for evaluation test cases
    ├── results/                 # Evaluation reports (JSON + Markdown)
    ├── app.py                   # Streamlit application entry point
    ├── config.py                # Centralized config (models, paths, chunking)
    ├── evaluate.py              # Evaluation benchmarking engine
    ├── test_bench.json          # Q&A test cases for evaluation
    ├── requirements.txt
    └── packages.txt             # System-level dependencies (Tesseract)

- Python 3.10+
- Tesseract OCR installed on your system
- API keys for Groq and/or Google Gemini
Install Tesseract:

    # Ubuntu/Debian
    sudo apt-get install tesseract-ocr
    # macOS
    brew install tesseract
    # Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki

Clone the repository and install the Python dependencies:

    git clone https://github.com/sidharth-vijayan/PolyRAG.git
    cd PolyRAG
    pip install -r requirements.txt

Create a .env file in the project root:

    GROQ_API_KEY=your_groq_api_key
    GEMINI_API_KEY=your_gemini_api_key

On Streamlit Cloud, add these as Secrets in the dashboard instead.

Start the app:

    streamlit run app.py

Open http://localhost:8501 in your browser.
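
Since the prerequisites ask for Groq and/or Gemini keys, a quick check of which providers are configured can save a confusing first run. The sketch below uses only the standard library; `available_providers` is a hypothetical helper, and it assumes the variables are already in the environment (for example after python-dotenv has loaded the `.env` file).

```python
import os


def available_providers(env=None) -> list[str]:
    """Return the provider labels whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    key_names = {"groq": "GROQ_API_KEY", "gemini": "GEMINI_API_KEY"}
    return [name for name, var in key_names.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = available_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

Passing a dictionary instead of relying on `os.environ` keeps the function easy to test.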
Run the full evaluation bench against the provided test cases:

    python evaluate.py --bench test_bench.json --output results/ --runs 3

Arguments:

| Flag | Default | Description |
|---|---|---|
| `--bench` | `test_bench.json` | Path to the Q&A test bench JSON |
| `--output` | `results/` | Directory to save reports |
| `--runs` | `3` | Number of runs per test case (for mean ± std) |

Reports are saved as results/eval_report.json and results/eval_report.md.
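
For readers extending the benchmark, the documented flags and the "mean ± std" aggregation map naturally onto `argparse` and `statistics`. This is a sketch that mirrors the table above, not the contents of `evaluate.py`; `build_parser` and `summarize` are hypothetical names.

```python
import argparse
import statistics


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with the flags and defaults listed above."""
    parser = argparse.ArgumentParser(description="Run the evaluation bench")
    parser.add_argument("--bench", default="test_bench.json",
                        help="Path to the Q&A test bench JSON")
    parser.add_argument("--output", default="results/",
                        help="Directory to save reports")
    parser.add_argument("--runs", type=int, default=3,
                        help="Number of runs per test case")
    return parser


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) over repeated runs."""
    mean = statistics.fmean(values)
    # A single run has no spread to estimate, so report 0.0 instead of failing.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std
```

With `--runs 1` the standard deviation is reported as 0.0, which is worth keeping in mind when reading single-run results.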
All key settings are in config.py:

    # LLM Models
    GROQ_MODEL = "llama-3.3-70b-versatile"
    GROQ_VISION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"
    GEMINI_MODEL = "gemini-2.0-flash"
    # Embeddings
    EMBEDDING_MODEL = "all-MiniLM-L6-v2"
    # Chunking
    CHUNK_SIZE = 500
    CHUNK_OVERLAP = 50
    # Retrieval
    TOP_K_RESULTS = 4
    # Memory
    MEMORY_MAX_MESSAGES = 10

Request flow:

             User Query
                  │
                  ▼
       ┌─────────────────────┐
       │  Coordinator Agent  │  ← Classifies query → selects agent(s)
       └─────────────────────┘
          │       │       │
          ▼       ▼       ▼
       Document Excel   Image
        Agent   Agent   Agent
          │       │       │
          └───────┼───────┘
                  ▼
        ChromaDB Vector Store
    (Semantic similarity search)
                  │
                  ▼
           ┌─────────────┐
           │  Aggregator │  ← Synthesizes context → calls LLM
           └─────────────┘
                  │
                  ▼
       Final Answer (streamed)

Contributions are welcome! To get started:
- Fork the repository
- Create a branch: `git checkout -b feature/your-feature-name`
- Commit your changes: `git commit -m 'feat: add your feature'`
- Push to the branch: `git push origin feature/your-feature-name`
- Open a Pull Request
Please follow Conventional Commits for commit messages.
This project is licensed under the MIT License.
Sidharth Vijayan
B.Tech CSE (AI & DS) | MIT World Peace University
- GitHub: @sidharth-vijayan
- Live Demo: polyrag.streamlit.app
If you found this project useful, consider giving it a star!