This project implements a Retrieval-Augmented Generation (RAG) Scientific Chatbot that provides insightful and contextually relevant responses to queries based on scientific documents. The chatbot leverages a language model to retrieve and synthesize information from a pre-constructed vector database of scientific knowledge.
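The retrieve-then-synthesize flow described above can be illustrated with a minimal sketch. It is not the project's implementation: a bag-of-words vector and a brute-force cosine search stand in for the neural embeddings and FAISS index, and `DOCUMENTS`, `embed`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the pre-constructed vector database.
DOCUMENTS = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "CRISPR is a gene editing technique based on a bacterial immune system.",
    "Black holes are regions of spacetime where gravity prevents light from escaping.",
]


def embed(text):
    """Toy embedding: word counts (a real system would use a neural encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into a prompt for a language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


question = "How do plants use light energy?"
print(build_prompt(question, retrieve(question, DOCUMENTS, k=1)))
```

In a setup like the one this project describes, the prompt would then be passed to the language model, and the similarity search would be delegated to the FAISS index rather than computed in a Python loop.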
- Retrieval-Augmented Generation (RAG): Combines information from a set of relevant documents to answer questions.
- Experiment Tracking: Follow ongoing experiments and log updates.
- Frontend: User interface for interaction.
- Backend: Handles data retrieval and processing using RAG architecture.
- Database: Stores scientific documents and experiment data.
- Deployment: The Streamlit app is deployed on an Amazon EC2 instance for scalable access.
- Python 3.7 or higher
- Streamlit (for the frontend)
- Transformers (for the language model)
- FAISS (for efficient similarity search)
- A GPU-based Amazon EC2 instance to deploy the app and interact with the chatbot
- NVIDIA drivers installed on the instance
- Clone the repository:
git clone https://github.com/your_username/rag-scientific-chatbot.git
cd rag-scientific-chatbot
- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the required packages:
pip install -r requirements.txt
- Set up the environment variables if necessary (specify any required environment variables).
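After installation, a short script can confirm that the interpreter and packages listed above are in place. This is a sketch rather than part of the repository: `check_environment` is a hypothetical helper, and the module names assume the usual import names (`faiss` is the import name for both the `faiss-cpu` and `faiss-gpu` distributions).

```python
import importlib.util
import sys

# Import names assumed for the packages listed in the requirements.
REQUIRED_MODULES = ["streamlit", "transformers", "faiss"]


def check_environment(min_python=(3, 7), modules=REQUIRED_MODULES):
    """Report whether the Python version and each required module are available."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'OK' if ok else 'MISSING'}")
```

Running it inside the activated virtual environment shows which entries, if any, still need to be installed.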
- Find the EC2 instance's public DNS or IP address and verify that the instance type is g4dn.xlarge.
- Connect to the EC2 instance via SSH (the default user is ec2-user on Amazon Linux AMIs and ubuntu on Ubuntu AMIs):
ssh -i /path/to/your-key.pem ec2-user@your-public-dns
- Accept the security warning by typing yes when SSH asks whether to continue connecting.
- If SSH rejects the key pair file because its permissions are too open, restrict them as follows:
chmod 400 /path/to/your-key.pem
- Install the CUDA Toolkit to use GPU resources with PyTorch:
nvcc --version # Check if nvcc is installed
sudo apt install nvidia-cuda-toolkit # Install CUDA Toolkit
- After cloning the project repository inside the EC2 instance and installing the required packages, run the Streamlit app file:
streamlit run app.py
Then access the app through the ngrok tunnel links, or open it using the EC2 instance's public IP address.
- Interact with the chatbot through the provided interface.
- The retrieved documents:
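For the CUDA step in the list above, it can help to confirm from Python that PyTorch actually sees the GPU before starting the app. The following is a minimal sketch, assuming PyTorch is among the installed requirements; `describe_gpu` is a hypothetical helper, not a function from this repository.

```python
def describe_gpu():
    """Return a short status string describing GPU visibility from PyTorch."""
    try:
        import torch  # Assumed to be installed via requirements.txt
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    # Device index 0 is the first (and on a single-GPU instance, the only) GPU.
    return f"CUDA device available: {torch.cuda.get_device_name(0)}"


print(describe_gpu())
```

On a g4dn.xlarge instance with working drivers, this would be expected to report the instance's NVIDIA T4 GPU. If it reports that no CUDA device is visible, the driver and toolkit installation are the first things to recheck.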
If you'd like to contribute to this project, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/YourFeature`)
- Commit your Changes (`git commit -m 'Add some feature'`)
- Push to the Branch (`git push origin feature/YourFeature`)
- Open a Pull Request
For further information or questions, feel free to reach out:
- Email: najmaelboutaheri@gmail.com




