Embedding Model RAG Benchmark

This repository provides a framework for benchmarking embedding models in a Retrieval-Augmented Generation (RAG) system. It uses ChromaDB as the vector store and Maximal Marginal Relevance (MMR) to balance relevance against diversity in the retrieved results.

Models Being Benchmarked

The system benchmarks the following open-source embedding models:

all-MiniLM-L6-v2 - A lightweight sentence transformer model
all-mpnet-base-v2 - A larger, higher-quality (but slower) sentence transformer model
E5-large - A general-purpose text embedding model from Microsoft
Instructor-Large - An instruction-tuned embedding model that pairs each input with a task instruction
Contriever - Facebook's retrieval-focused embedding model
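Some of these models expect their inputs in a specific format. The E5 model card asks for "query: " and "passage: " prefixes, and Instructor encodes [instruction, text] pairs. The sketch below shows one way a wrapper could handle this. The helper `format_inputs` and the instruction strings are hypothetical and not taken from this repository.

```python
from typing import List, Union

# Hypothetical instruction strings for Instructor; tune them for your domain.
INSTRUCTOR_QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents:"
INSTRUCTOR_DOC_INSTRUCTION = "Represent the document for retrieval:"


def format_inputs(model_name: str, texts: List[str], is_query: bool) -> List[Union[str, List[str]]]:
    """Format raw texts the way each model family expects (hypothetical helper)."""
    name = model_name.lower()
    if "e5-" in name:
        # E5 models are trained with "query: " / "passage: " prefixes.
        prefix = "query: " if is_query else "passage: "
        return [prefix + t for t in texts]
    if "instructor" in name:
        # Instructor takes [instruction, text] pairs instead of plain strings.
        instruction = INSTRUCTOR_QUERY_INSTRUCTION if is_query else INSTRUCTOR_DOC_INSTRUCTION
        return [[instruction, t] for t in texts]
    # Sentence-transformers models and Contriever take plain text.
    return list(texts)
```

The formatted list would then be passed to the model's own encode call. Keeping query and document formatting separate matters because E5 and Instructor embed the two sides asymmetrically.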

Project Structure

embedding-model-rag-poc/
├── src/
│   ├── embedding_models/  # Embedding model implementations
│   ├── vector_db/         # ChromaDB integration with MMR
│   ├── data/              # Dataset handling
│   ├── evaluation/        # Benchmarking tools
│   └── rag/               # RAG system implementation
├── run_benchmark.py       # Main script to run benchmarks
├── create_rag_app.py      # Create a RAG application with the best model
├── interactive_rag_demo.py # Interactive demo for the RAG system
└── requirements.txt       # Project dependencies
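The modules in src/embedding_models/ need a shared interface so that the benchmark can swap models freely. The sketch below assumes an abstract base class with separate document and query methods. The names `EmbeddingModel`, `embed_documents`, and `embed_query` are illustrative, and the repository's actual classes may differ. `CharCountModel` is a toy stand-in used only to exercise the interface.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class EmbeddingModel(ABC):
    """Hypothetical common interface for wrappers in src/embedding_models/."""

    name: str = "base"

    @abstractmethod
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Return one vector per document."""

    @abstractmethod
    def embed_query(self, text: str) -> List[float]:
        """Return a single query vector."""


class CharCountModel(EmbeddingModel):
    """Toy model: L2-normalised vowel counts. Not a real embedding model."""

    name = "char-count-toy"

    def _embed(self, text: str) -> List[float]:
        counts = [float(text.lower().count(c)) for c in "aeiou"]
        norm = math.sqrt(sum(x * x for x in counts)) or 1.0  # avoid divide-by-zero
        return [x / norm for x in counts]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)
```

Splitting `embed_documents` from `embed_query` leaves room for models such as E5 that format queries and passages differently.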

Getting Started

Installation

Clone the repository:

git clone https://github.com/yourusername/embedding-model-rag-poc.git
cd embedding-model-rag-poc

Install dependencies:

pip install -r requirements.txt

Set up environment variables (for LLM API keys):

# Create a .env file with your API keys
echo "OPENAI_API_KEY=your_openai_key" > .env
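In Python, a .env file is commonly loaded with python-dotenv's `load_dotenv()`. This README does not say whether the repository uses that package. The stdlib-only sketch below performs the same basic job. `load_env_file` is a hypothetical helper, and it does not handle every .env syntax that python-dotenv supports.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal .env reader: copy KEY=VALUE lines into os.environ.

    Existing environment variables are not overwritten. Blank lines,
    comment lines (#) and lines without '=' are skipped; surrounding
    quotes around values are stripped.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key.strip(), value)
```

After calling `load_env_file()`, the key is available as `os.environ["OPENAI_API_KEY"]`.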

Running the Benchmark

To run the benchmark with default settings on a generated sample dataset:

python run_benchmark.py --create-sample

This will:

Create a sample dataset
Initialize each embedding model
Create vector stores in ChromaDB
Run retrieval tests with different MMR settings
Generate benchmark results
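The last two steps amount to a grid sweep over models, k values, and MMR lambda values, with scores and latency averaged over the test queries. The sketch below assumes the embedding and indexing steps have already run. `retrieve` and `score` are hypothetical callables standing in for the repository's ChromaDB retrieval and its metrics, and the structure of src/evaluation/ may differ.

```python
import itertools
import time
from typing import Callable, Dict, List, Sequence, Set, Tuple

# retrieve(model_name, query, k, lam) -> ranked doc ids (hypothetical signature)
Retriever = Callable[[str, str, int, float], List[str]]
# score(retrieved_ids, relevant_ids, k) -> float, e.g. precision@k
Scorer = Callable[[List[str], Set[str], int], float]


def run_grid(
    model_names: Sequence[str],
    queries: Dict[str, Set[str]],  # query text -> ids of relevant docs
    k_values: Sequence[int],
    lambda_values: Sequence[float],
    retrieve: Retriever,
    score: Scorer,
) -> Dict[Tuple[str, int, float], Dict[str, float]]:
    """Average a score and retrieval latency over all queries per setting."""
    results = {}
    for model, k, lam in itertools.product(model_names, k_values, lambda_values):
        total_score, total_time = 0.0, 0.0
        for query, relevant in queries.items():
            start = time.perf_counter()
            retrieved = retrieve(model, query, k, lam)
            total_time += time.perf_counter() - start  # time retrieval only
            total_score += score(retrieved, relevant, k)
        n = len(queries)
        results[(model, k, lam)] = {
            "score": total_score / n,
            "avg_retrieval_s": total_time / n,
        }
    return results
```

Keying results by `(model, k, lambda)` makes it straightforward to compare models at a fixed setting or to find the best setting for each model.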

Creating a RAG Application

After benchmarking, create a RAG application using the best-performing model from the benchmark results:

python create_rag_app.py --from-benchmark ./benchmark_results --documents-dir ./your_documents
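Choosing "the best model" comes down to taking the argmax of one metric across models. This README does not document the layout of ./benchmark_results. The sketch below therefore assumes a JSON file that maps each model name to a dict of metric values. `pick_best_model`, `load_results`, and the `"ndcg@5"` key are all hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def load_results(path: str) -> Dict[str, Dict[str, float]]:
    # Assumed layout: {"model-name": {"ndcg@5": <float>, ...}, ...}
    return json.loads(Path(path).read_text())


def pick_best_model(results: Dict[str, Dict[str, float]], metric: str = "ndcg@5") -> str:
    """Return the model with the highest value for `metric`."""
    scored = {name: m[metric] for name, m in results.items() if metric in m}
    if not scored:
        raise ValueError(f"no model reports metric {metric!r}")
    return max(scored, key=scored.get)
```

Which metric decides the choice is a design decision. NDCG rewards placing relevant documents near the top, while recall rewards finding them at all.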

Interactive Demo

Test your RAG system interactively:

python interactive_rag_demo.py
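An interactive RAG demo typically loops over user questions, retrieves context for each one, places that context in a prompt, and sends the prompt to the LLM. The sketch below shows this loop with the retriever and the LLM call injected as plain functions, so it runs without any API key. `build_prompt`, `repl`, and the prompt wording are hypothetical and not taken from interactive_rag_demo.py.

```python
import sys
from typing import Callable, Iterable, List, TextIO


def build_prompt(question: str, contexts: List[str]) -> str:
    """Number the retrieved passages and wrap them in a grounded-QA prompt."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )


def repl(
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    lines: Iterable[str] = sys.stdin,
    out: TextIO = sys.stdout,
) -> None:
    """Answer one question per input line until EOF or 'exit'/'quit'."""
    for line in lines:
        question = line.strip()
        if question.lower() in {"exit", "quit"}:
            break
        if not question:
            continue  # ignore empty input
        prompt = build_prompt(question, retrieve(question))
        print(generate(prompt), file=out)
```

Because `lines` and `out` are parameters, the same loop can read from a terminal or be driven from a script or test.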

Customization Options

Benchmark Options

--dataset-name: Load a dataset from the Hugging Face Hub
--dataset-path: Use a local dataset
--models: Specify which models to benchmark
--k-values: List of k values (number of documents retrieved per query) to test
--lambda-values: List of lambda values to test for MMR
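One plausible way to parse these options is with argparse, using `nargs="+"` so that each list flag accepts several space-separated values. This is a sketch under assumptions. The README does not show the real argument syntax or defaults, so the defaults below are placeholders, and the mutually exclusive dataset group is a guess.

```python
import argparse
from typing import List, Optional


def parse_benchmark_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Benchmark embedding models for RAG (sketch)")
    # Assumption: a run uses either a Hugging Face dataset or a local one, not both.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--dataset-name", help="Hugging Face dataset name")
    source.add_argument("--dataset-path", help="Path to a local dataset")
    # Placeholder defaults, not the repository's actual defaults.
    parser.add_argument("--models", nargs="+", default=["all-MiniLM-L6-v2"])
    parser.add_argument("--k-values", nargs="+", type=int, default=[1, 3, 5, 10])
    parser.add_argument("--lambda-values", nargs="+", type=float, default=[0.5])
    parser.add_argument("--create-sample", action="store_true")
    return parser.parse_args(argv)
```

argparse converts the hyphenated flags into attributes such as `args.k_values` and `args.lambda_values`.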

RAG Application Options

--embedding-model: Manually specify an embedding model
--llm-provider: Choose LLM provider (openai, anthropic, etc.)
--llm-model: Specify the LLM model name
--mmr-lambda: Set the MMR lambda, which balances relevance to the query against diversity among results
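At each step, MMR greedily selects the candidate that maximises lambda * sim(doc, query) - (1 - lambda) * max sim(doc, already selected). Under this standard formulation, lambda = 1 reduces to plain similarity ranking, and lower values favour diversity. The pure-Python sketch below implements that rule with cosine similarity. The README does not say whether this repository uses the same lambda convention. LangChain's `max_marginal_relevance_search` (with `k`, `fetch_k`, and `lambda_mult`) does follow it, applying MMR as a re-ranking over a larger fetched candidate set.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int, lam: float = 0.5) -> List[int]:
    """Return indices of up to k docs chosen greedily by Maximal Marginal Relevance."""
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In the example below, document 1 is a near-duplicate of document 0. With lambda = 1.0 both are returned, while with a low lambda the second slot goes to the more distinct document 2.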

Evaluation Metrics

The benchmark evaluates models on:

NDCG@k: Normalized Discounted Cumulative Gain
Precision@k: Precision at k retrieved documents
Recall@k: Recall at k retrieved documents
Diversity: How dissimilar the retrieved documents are from one another (relevant when evaluating MMR)
Retrieval Time: Time taken to retrieve documents
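The ranking metrics above have standard binary-relevance definitions, sketched below. The README does not say whether the benchmark uses graded relevance, and it does not define how Diversity is computed. Mean pairwise cosine distance between retrieved embeddings is used here as one common choice, which is an assumption rather than the repository's formula.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of the top-k slots filled with relevant docs.
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of all relevant docs that appear in the top k.
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)


def ndcg_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    # Binary gains; 0-based position i is discounted by log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(retrieved[:k]) if d in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def diversity(embeddings: List[Sequence[float]]) -> float:
    """Mean pairwise cosine distance (1 - cosine similarity); one possible definition."""
    def cos(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cos(a, b) for a, b in pairs) / len(pairs)
```

NDCG is normalised by the ideal ordering, so 1.0 means every relevant document that could fit in the top k was ranked first.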

License

MIT License
