CurioVeda - RAG-Based Chatbot for Efficient Analysis of Articles, News, Reports

Overview

CurioVeda is my final-year major project: a chatbot that answers user queries based on articles, reports, and news scraped from user-provided URLs. Multilingual support and graphical analysis of tabular data are both currently in progress.

The primary objective of this project is to speed up article analysis by extracting key insights quickly and accurately. It combines web scraping, natural language processing (NLP), and data visualization.

Key Features

Efficient Web Scraping:
- Scrapes content from user-provided URLs, including JavaScript-heavy websites, using Selenium.
- Extracts text from images in articles using OCR with Tesseract.
Data Preprocessing:
- Applies basic preprocessing to the scraped data before analysis.
- Recursive text splitting with LangChain for manageable chunk sizes.
Generative AI for Querying:
- Uses Google Generative AI embeddings to convert text into fixed-length vectors.
- Stores processed content in a FAISS Vector Store for efficient similarity searches.
- Employs a LLaMA text generation model to produce context-relevant answers from the similarity search results.
Multilingual Support (in progress):
- Provides responses in multiple languages, enabling accessibility for diverse users.
Graphical Analysis (in progress):
- Analyzes tabular data and generates graphical visualizations to present insights in an intuitive format.
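
To illustrate the recursive text splitting mentioned under Data Preprocessing, here is a minimal, dependency-free sketch of the idea. It is not the project's code and not LangChain's implementation; the function name `recursive_split`, the separator order, and the default chunk size are assumptions chosen for illustration. The splitter tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Simplified illustration: separators are dropped at chunk boundaries,
    and there is no overlap between neighbouring chunks.
    """
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left to try: hard-cut at the size limit.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    # Drop whitespace-only chunks and trim the rest.
    return [c.strip() for c in chunks if c.strip()]
```

Calling `recursive_split(article_text, chunk_size=500)` on a scraped article (where `article_text` is a hypothetical variable holding the cleaned text) returns a list of strings, each short enough to embed on its own.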

Thought Process Behind the Project

This project was designed with the following considerations:

Enable efficient and comprehensive data extraction from user-provided URLs, including:
- Dynamic content (JavaScript-heavy websites).
- Embedded image-based text using OCR techniques.
Enhance the quality of responses by preprocessing the scraped text and leveraging LangChain for effective text chunking.
Utilize powerful AI models (e.g., Google Generative AI embeddings and LLaMA) for robust similarity search and context-aware responses.
Empower users with multilingual interactions and graphical insights for tabular data, making the chatbot a versatile tool for analysis.
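
The retrieval step described above (embed the chunks, find the ones most similar to the query, then hand them to the language model as context) can be sketched as follows. This is a toy stand-in under stated assumptions: a word-count vector replaces the Google Generative AI embeddings, a sorted list replaces the FAISS index, and the prompt wording and the function names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. The actual call to the LLaMA model is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for a vector store)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the generation model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

With real embeddings the structure stays the same: only `embed` and the ranking step change, which is why the vector store and the embedding model can be swapped independently of the prompt-building code.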

Tech Stack

Languages and Frameworks:

Python
Streamlit (for hosting and front-end interface)

Libraries:

Web Scraping: Selenium, BeautifulSoup
OCR: Tesseract
NLP: LangChain, LLaMA, FAISS Vector Store, Google Generative AI
Data Visualization: Matplotlib, Seaborn, Plotly

Tools:

GitHub (Version Control)
Streamlit Cloud (Hosting)

Setup Instructions

Clone the Repository:

git clone https://github.com/SMPY2002/CurioVeda---Powered-by-AI.git
cd CurioVeda---Powered-by-AI

Install Dependencies:
```
pip install -r requirements.txt
```
Run the Application:
```
streamlit run app.py
```
Preview
Usage:
- Input the URLs containing articles/reports/news.
- Query the chatbot in your preferred language.
- View graphical insights for any tabular data provided.
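
Since the first step is pasting URLs, a small validation pass before scraping can save wasted requests. The sketch below is an assumption about how such a check might look, not the app's actual input handling; `clean_urls` is a hypothetical helper that keeps only well-formed `http`/`https` URLs from newline-separated input and drops duplicates.

```python
from urllib.parse import urlparse


def clean_urls(raw_input):
    """Return de-duplicated http(s) URLs from newline-separated text, in input order."""
    seen, urls = set(), []
    for line in raw_input.splitlines():
        candidate = line.strip()
        if not candidate:
            continue
        parsed = urlparse(candidate)
        # Keep only absolute web URLs: an http(s) scheme plus a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```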

Future Scope

Add support for more advanced AI models and embedding techniques.
Expand multilingual capabilities to include more languages.
Integrate real-time streaming data analysis.
Enhance graphical analysis features to include predictive insights.
Optimize the backend for faster query responses and lower resource usage.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute this project as per the license terms.

Contact

For any queries or suggestions, please reach out via:

Email: smpy1405@gmail.com
LinkedIn: Shivam Pandey
