🧠 PolyRAG

Multi-Agent Multimodal Retrieval-Augmented Generation System

Query PDFs · Spreadsheets · Images, all in one intelligent pipeline

Python · LangChain · ChromaDB · Streamlit · Groq · Gemini



📌 Overview

PolyRAG is a modular, multi-agent Retrieval-Augmented Generation (RAG) system that answers natural language queries by retrieving information from heterogeneous data sources, including PDF documents, Word files, Excel spreadsheets, CSVs, and images.

At its core, a Coordinator Agent receives the user query, determines which data modality is most relevant, and routes the query to the appropriate specialized sub-agent. Retrieved context is then passed to an Aggregator that synthesizes a grounded, context-aware response using either Groq (LLaMA) or Google Gemini.

The system includes a full evaluation framework that reports retrieval, generation, and system-level metrics, so results can be benchmarked and compared across runs.

🔗 Live Demo: polyrag.streamlit.app


✨ Features

🤖 Multi-Agent Architecture

  • Coordinator Agent: classifies the query and routes it to the right sub-agent
  • Document Agent: handles PDFs, .txt, and .docx files using PyMuPDF and python-docx
  • Excel Agent: processes .xlsx and .csv files with pandas and openpyxl
  • Image Agent: extracts text from images via Tesseract OCR and handles visual queries using Groq's vision model (LLaMA 4 Scout)
  • Aggregator: synthesizes context from one or multiple agents into a final answer
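The routing step described in the list above can be illustrated with a small rule-based stand-in. This is only a sketch: the README does not say how the Coordinator Agent classifies queries (it may well use an LLM), so the function name `route_query`, the keyword hints, and the image extensions below are all hypothetical.

```python
from pathlib import Path

# Assumed mapping from file extension to agent name. The document and
# spreadsheet extensions come from the feature list; the image
# extensions are a guess.
EXTENSION_TO_AGENT = {
    ".pdf": "document", ".txt": "document", ".docx": "document",
    ".xlsx": "excel", ".csv": "excel",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
}

# Hypothetical keyword hints used in place of a real classifier.
KEYWORD_HINTS = {
    "excel": ("spreadsheet", "column", "row", "sheet", "csv"),
    "image": ("image", "picture", "photo", "screenshot", "diagram"),
}


def route_query(query: str, uploaded_files: list[str]) -> list[str]:
    """Return the names of the agents that should handle the query."""
    # Only agents with at least one matching uploaded file are candidates.
    available = {
        EXTENSION_TO_AGENT[Path(name).suffix.lower()]
        for name in uploaded_files
        if Path(name).suffix.lower() in EXTENSION_TO_AGENT
    }
    text = query.lower()
    hinted = [
        agent
        for agent, words in KEYWORD_HINTS.items()
        if agent in available and any(word in text for word in words)
    ]
    if hinted:
        return hinted
    # No keyword matched: prefer the document agent when it is available,
    # otherwise hand the query to every available agent.
    if "document" in available:
        return ["document"]
    return sorted(available)
```

Returning a list rather than a single name leaves room for the multi-agent case, where the Aggregator combines context from more than one source.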

🔍 Advanced Retrieval Pipeline

  • Semantic chunking with configurable CHUNK_SIZE and CHUNK_OVERLAP
  • Dense embeddings via all-MiniLM-L6-v2 (Sentence Transformers)
  • Vector storage and similarity search with ChromaDB, with separate collections per modality
  • Top-K context retrieval with distance scoring
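To make the role of CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a minimal sketch of overlapping chunking. It uses a plain character window, which is an assumption: the project's actual splitter and its units (characters or tokens) are not documented here, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so no
        # trailing chunk is made up purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 1,200-character input yields three chunks starting at offsets 0, 450, and 900. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.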

🧠 Multi-LLM Support

  • Groq (LLaMA 3.3 70B) β€” default fast inference for text queries
  • Groq Vision (LLaMA 4 Scout) β€” multimodal understanding for image inputs
  • Google Gemini 2.0 Flash β€” fallback and alternative LLM
  • Ollama β€” local model support for offline/private deployments
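The default-plus-fallback arrangement in the list above could be wired roughly as follows. This is a sketch under assumptions: `generate_with_fallback` is a hypothetical helper, and each provider is represented by a plain callable instead of a real Groq, Gemini, or Ollama client.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, answer)
    from the first one that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Passing the providers as an ordered sequence keeps the priority (for example Groq first, Gemini second) in configuration and out of the control flow.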

💾 Conversation Memory

  • Persistent conversational context across turns (up to 10 messages)
  • Enables follow-up questions and coherent multi-turn dialogue
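A bounded message buffer like the one described above can be sketched with `collections.deque`. The class name `ConversationMemory` and its methods are hypothetical; only the 10-message cap comes from this README.

```python
from collections import deque


class ConversationMemory:
    """Keep only the most recent max_messages chat messages."""

    def __init__(self, max_messages: int = 10):
        # A deque with maxlen drops the oldest entry automatically.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def as_prompt(self) -> str:
        """Render the history as plain text to prepend to the next query."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in self._messages)

    def __len__(self) -> int:
        return len(self._messages)
```

Because eviction is handled by the deque, a follow-up question always sees the latest turns without any manual trimming.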

🧪 Evaluation Framework

Built-in benchmarking with evaluate.py, which runs a test bench of Q&A pairs through the full pipeline and reports:

| Metric | Description |
| --- | --- |
| Context Precision | Fraction of retrieved chunks from the expected source |
| Answer Similarity | Cosine similarity between generated and expected answer embeddings |
| Faithfulness | Whether the answer is grounded in retrieved documents |
| Routing Accuracy | Whether the coordinator selected the correct agent |
| Retrieval Latency | Time to retrieve relevant chunks (ms) |
| Generation Latency | LLM response time (ms) |
| Time to First Token (TTFT) | Streaming responsiveness (ms) |
| Tokens/sec | Approximate generation throughput |
| End-to-End Latency | Total retrieval + generation time (ms) |
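Two of the metrics above follow directly from their descriptions and can be sketched in a few lines. The function names are hypothetical and the code in evaluate.py may differ; plain Python lists stand in for the embedding vectors.

```python
import math


def context_precision(retrieved_sources: list[str], expected_source: str) -> float:
    """Fraction of retrieved chunks whose source matches the expected one."""
    if not retrieved_sources:
        return 0.0
    hits = sum(1 for source in retrieved_sources if source == expected_source)
    return hits / len(retrieved_sources)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

For example, three of four retrieved chunks coming from the expected file gives a context precision of 0.75.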

Evaluated across 10 test cases (1 run each): routing accuracy 1.0 · answer similarity 0.883 · avg E2E latency 0.992 s

📊 View Full Evaluation Report →


🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Language | Python 3.10+ |
| LLM Orchestration | LangChain 0.2 |
| LLMs | Groq (LLaMA 3.3 70B, LLaMA 4 Scout), Google Gemini 2.0 Flash, Ollama |
| Embeddings | Sentence Transformers (all-MiniLM-L6-v2) |
| Vector Store | ChromaDB |
| Document Parsing | PyMuPDF (PDF), python-docx (DOCX) |
| Spreadsheet Parsing | pandas, openpyxl |
| Image / OCR | Pillow, Tesseract OCR (pytesseract) |
| Frontend / UI | Streamlit |
| Config & Secrets | python-dotenv |

📁 Project Structure

PolyRAG/
├── agents/
│   ├── coordinator.py       # Query classification & agent routing
│   ├── document_agent.py    # PDF, TXT, DOCX ingestion & retrieval
│   ├── excel_agent.py       # XLSX, CSV ingestion & retrieval
│   ├── image_agent.py       # Image OCR & vision-based retrieval
│   └── aggregator.py        # Multi-source synthesis & LLM generation
├── core/
│   ├── vector_store.py      # ChromaDB client & collection management
│   ├── embeddings.py        # Sentence Transformer embedding wrapper
│   └── memory.py            # Conversational memory buffer
├── data/
│   └── eval_samples/        # Sample files for evaluation test cases
├── results/                 # Evaluation reports (JSON + Markdown)
├── app.py                   # Streamlit application entry point
├── config.py                # Centralized config (models, paths, chunking)
├── evaluate.py              # Evaluation benchmarking engine
├── test_bench.json          # Q&A test cases for evaluation
├── requirements.txt
└── packages.txt             # System-level dependencies (Tesseract)

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Tesseract OCR installed on your system
  • API keys for Groq and/or Google Gemini

Install Tesseract:

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki

1. Clone the Repository

git clone https://github.com/sidharth-vijayan/PolyRAG.git
cd PolyRAG

2. Install Python Dependencies

pip install -r requirements.txt

3. Set Up Environment Variables

Create a .env file in the project root:

GROQ_API_KEY=your_groq_api_key
GEMINI_API_KEY=your_gemini_api_key

On Streamlit Cloud, add these as Secrets in the dashboard instead.
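Since either key is enough to run the app, a startup check can report which providers are usable. The sketch below is an assumption about how such a check might look and is not taken from the project: `configured_providers` is a hypothetical helper that reads only `os.environ`, so a `.env` file would first need to be loaded (for instance with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional

# Environment variable names from the .env example above, mapped to
# illustrative provider labels.
KEY_TO_PROVIDER = {"GROQ_API_KEY": "groq", "GEMINI_API_KEY": "gemini"}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        provider
        for key, provider in KEY_TO_PROVIDER.items()
        if env.get(key, "").strip()
    ]
```

Accepting an optional mapping makes the check easy to exercise without touching the real environment. It cannot tell a placeholder such as your_groq_api_key from a valid key.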

4. Run the Application

streamlit run app.py

Open http://localhost:8501 in your browser.


🧪 Running Evaluations

Run the full evaluation bench against the provided test cases:

python evaluate.py --bench test_bench.json --output results/ --runs 3

Arguments:

| Flag | Default | Description |
| --- | --- | --- |
| --bench | test_bench.json | Path to the Q&A test bench JSON |
| --output | results/ | Directory to save reports |
| --runs | 3 | Number of runs per test case (for mean ± std) |

Reports are saved as results/eval_report.json and results/eval_report.md.
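The flags and defaults above map naturally onto `argparse`, and the per-metric mean ± std onto the `statistics` module. This is a sketch of one plausible shape and not the contents of evaluate.py: `build_parser` and `summarize` are hypothetical names, and the choice of sample standard deviation is an assumption.

```python
import argparse
import statistics


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Run the evaluation bench.")
    parser.add_argument("--bench", default="test_bench.json",
                        help="Path to the Q&A test bench JSON")
    parser.add_argument("--output", default="results/",
                        help="Directory to save reports")
    parser.add_argument("--runs", type=int, default=3,
                        help="Number of runs per test case")
    return parser


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for one metric.

    A single run has no spread, so its deviation is reported as 0.0.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std
```

With --runs 1 every deviation is 0.0, so several runs are needed before the ± figure says anything about variability.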


⚙️ Configuration

All key settings are in config.py:

# LLM Models
GROQ_MODEL = "llama-3.3-70b-versatile"
GROQ_VISION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"
GEMINI_MODEL = "gemini-2.0-flash"

# Embeddings
EMBEDDING_MODEL = "all-MiniLM-L6-v2"

# Chunking
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

# Retrieval
TOP_K_RESULTS = 4

# Memory
MEMORY_MAX_MESSAGES = 10
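To show what TOP_K_RESULTS controls, here is an in-memory stand-in for the vector store lookup. It is a sketch only: in the project ChromaDB performs the search, and the squared Euclidean distance used here is an assumption about the metric. `top_k` and `squared_distance` are hypothetical names.

```python
TOP_K_RESULTS = 4  # mirrors the value in config.py


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(
    query_vec: list[float],
    items: list[tuple[str, list[float]]],
    k: int = TOP_K_RESULTS,
) -> list[tuple[str, float]]:
    """Return the k (chunk_text, distance) pairs closest to the query,
    nearest first."""
    scored = [(text, squared_distance(query_vec, vec)) for text, vec in items]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]
```

Raising TOP_K_RESULTS passes more context to the LLM at the cost of a longer prompt, while lowering it risks leaving out the chunk that holds the answer.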

🗺️ How It Works

User Query
    │
    ▼
┌─────────────────────┐
│   Coordinator Agent │  ← Classifies query → selects agent(s)
└─────────────────────┘
    │         │         │
    ▼         ▼         ▼
Document    Excel     Image
 Agent      Agent     Agent
    │         │         │
    └────┬────┘─────────┘
         ▼
  ChromaDB Vector Store
  (Semantic similarity search)
         │
         ▼
  ┌─────────────┐
  │  Aggregator │  ← Synthesizes context → calls LLM
  └─────────────┘
         │
         ▼
  Final Answer (streamed)
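The flow in the diagram above can be expressed as one function that composes the stages. Everything here is a hypothetical sketch: `answer_query` and its parameters are illustrative, and the router, retrievers, and streaming LLM are passed in as plain callables so the example runs without any external service.

```python
from typing import Callable, Iterable, Iterator, Mapping


def answer_query(
    query: str,
    route: Callable[[str], list[str]],
    retrievers: Mapping[str, Callable[[str], list[str]]],
    generate_stream: Callable[[str], Iterable[str]],
) -> Iterator[str]:
    """Route the query, gather context from each selected agent, then
    stream the generated answer piece by piece."""
    context: list[str] = []
    for agent in route(query):                 # Coordinator: pick agent(s)
        context.extend(retrievers[agent](query))  # Agents: retrieve chunks

    # Aggregator: combine the retrieved context with the question.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    yield from generate_stream(prompt)         # Final answer, streamed
```

Because the function is a generator, a caller can display each piece as soon as it arrives, which is the behaviour the Time to First Token metric measures.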

🀝 Contributing

Contributions are welcome! To get started:

  1. Fork the repository
  2. Create a branch: git checkout -b feature/your-feature-name
  3. Commit your changes: git commit -m 'feat: add your feature'
  4. Push to the branch: git push origin feature/your-feature-name
  5. Open a Pull Request

Please follow Conventional Commits for commit messages.


📄 License

This project is licensed under the MIT License.


👤 Author

Sidharth Vijayan
B.Tech CSE (AI & DS) | MIT World Peace University


⭐ If you found this project useful, consider giving it a star!
