This repository provides a framework for benchmarking different embedding models in a Retrieval-Augmented Generation (RAG) system using ChromaDB and Maximal Marginal Relevance (MMR).
The system benchmarks the following open-source embedding models:
- all-MiniLM-L6-v2 - A lightweight sentence transformer model
- all-mpnet-base-v2 - A more powerful sentence transformer model
- E5-large - A state-of-the-art embedding model from Microsoft
- Instructor-Large - An instruction-tuned embedding model
- Contriever - Facebook's retrieval-focused embedding model
embedding-model-rag-poc/
├── src/
│ ├── embedding_models/ # Embedding model implementations
│ ├── vector_db/ # ChromaDB integration with MMR
│ ├── data/ # Dataset handling
│ ├── evaluation/ # Benchmarking tools
│ └── rag/ # RAG system implementation
├── run_benchmark.py # Main script to run benchmarks
├── create_rag_app.py # Create a RAG application with the best model
├── interactive_rag_demo.py # Interactive demo for the RAG system
└── requirements.txt # Project dependencies
- Clone the repository:
    git clone https://github.com/yourusername/embedding-model-rag-poc.git
    cd embedding-model-rag-poc
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables (for LLM API keys):
    # Create a .env file with your API keys
    echo "OPENAI_API_KEY=your_openai_key" > .env

To run the benchmark with default settings:
    python run_benchmark.py --create-sample

This will:
- Create a sample dataset
- Initialize each embedding model
- Create vector stores in ChromaDB
- Run retrieval tests with different MMR settings
- Generate benchmark results
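
To make the "different MMR settings" step concrete, here is a minimal pure-Python sketch of Maximal Marginal Relevance selection. It is an illustration under stated assumptions, not the code in `src/vector_db/`: the function names `cosine` and `mmr_select` are hypothetical, and the sketch assumes the common convention where a lambda of 1.0 means pure relevance and lower values favor diversity. The repository's own convention may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lam=0.5):
    """Return the indices of up to k documents chosen greedily by MMR.

    Assumed convention: lam=1.0 ranks by relevance only; smaller values
    penalize documents similar to ones that were already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    remaining = list(range(len(doc_vecs)))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.11], [0.6, -0.8]]
relevance_only = mmr_select(query, docs, k=2, lam=1.0)
balanced = mmr_select(query, docs, k=2, lam=0.5)
```

With this toy data, the relevance-only run keeps both near-duplicates, while the balanced run swaps the second one for the dissimilar document.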
After benchmarking, create a RAG application using the best model:
    python create_rag_app.py --from-benchmark ./benchmark_results --documents-dir ./your_documents

Test your RAG system interactively:
    python interactive_rag_demo.py

Benchmark options:
- --dataset-name: Use a Hugging Face dataset
- --dataset-path: Use a local dataset
- --models: Specify which models to benchmark
- --k-values: List of k values to test for retrieval
- --lambda-values: List of lambda values to test for MMR

RAG application options:
- --embedding-model: Manually specify an embedding model
- --llm-provider: Choose LLM provider (openai, anthropic, etc.)
- --llm-model: Specify LLM model name
- --mmr-lambda: Set MMR diversity parameter

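
The `--models`, `--k-values`, and `--lambda-values` options suggest that the benchmark sweeps over combinations of settings. The sketch below shows one way such a grid could be built. It assumes a full Cartesian product, which this README does not confirm, and the helper name `build_grid` and the dictionary keys are hypothetical.

```python
from itertools import product


def build_grid(models, k_values, lambda_values):
    """Return one configuration per (model, k, lambda) combination.

    Assumption: every combination is tested; the real benchmark may
    prune or order configurations differently.
    """
    return [
        {"model": model, "k": k, "mmr_lambda": lam}
        for model, k, lam in product(models, k_values, lambda_values)
    ]


grid = build_grid(
    models=["all-MiniLM-L6-v2", "all-mpnet-base-v2"],
    k_values=[3, 5],
    lambda_values=[0.3, 0.7],
)
```

Two models, two k values, and two lambda values give eight configurations, so the cost of a sweep grows multiplicatively with each list.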
The benchmark evaluates models on:
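- NDCG@k: Normalized Discounted Cumulative Gain, a ranking-quality score that rewards placing relevant documents near the top of the first k results
- Precision@k: Fraction of the top k retrieved documents that are relevant
- Recall@k: Fraction of all relevant documents that appear in the top k results
- Diversity: How dissimilar the retrieved documents are from one another (for MMR)
- Retrieval Time: Time taken to retrieve documents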
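
As a reference for the ranking metrics above, here is a minimal sketch of their standard definitions with binary relevance (a document is either relevant or not). It is not taken from `src/evaluation/`: the function names are hypothetical, and the repository may use graded relevance or a different normalization.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k retrieved IDs that are relevant.

    Divides by k even when fewer than k documents were retrieved.
    """
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Toy example: two of the three relevant documents are in the top 3.
retrieved = ["d1", "d5", "d2", "d9"]
relevant = {"d1", "d2", "d3"}
p3 = precision_at_k(retrieved, relevant, 3)
r3 = recall_at_k(retrieved, relevant, 3)
n3 = ndcg_at_k(retrieved, relevant, 3)
```

In the toy example, precision and recall both equal 2/3 at k=3, while NDCG is slightly higher because one of the hits sits at rank 1.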