willyoung21/Happiness-Countries

Happiness Prediction Project

This project focuses on predicting the happiness score of different countries using Machine Learning techniques, exploratory data analysis (EDA), feature selection, real-time data streaming with Kafka, and storage in PostgreSQL.


Table of Contents

  1. Project Description
  2. Prerequisites
  3. Project Structure
  4. Create a Virtual Environment
  5. Environment Setup
  6. Data Preparation
  7. Model Training
  8. Kafka Setup with Docker
  9. Running the Kafka Producer and Consumer
  10. Evidence of Results
  11. Conclusion

1. Project Description

This project predicts the happiness index of different countries from features such as GDP per capita, healthy life expectancy, and personal freedom. To do this:

  • We process happiness data from different years.
  • We apply feature selection techniques.
  • We train a regression model.
  • We use Apache Kafka to transmit the predictions in real time.
  • We store the results in PostgreSQL.

2. Prerequisites

  • Python 3.8+
  • Docker Desktop for container deployment.
  • Kafka and PostgreSQL.
  • Jupyter Notebook for EDA and model training.
  • Install the following Python packages:
pip install pandas scikit-learn sqlalchemy kafka-python python-dotenv

All dependencies are listed in the requirements.txt file.

3. Project Structure

The structure of this project is as follows:

📁 data/

Contains the data files at different stages of processing.

  • raw/: Original data for each year.
  • processed/: Processed data.
  • clean/: Cleaned data ready for analysis.

🤖 models/

Contains the files related to the trained prediction model.

  • final_happiness_model.pkl: Trained prediction model that predicts the happiness score.

💻 src/

Contains the source code for data processing and streaming.

  • kafka_producer.py: Kafka producer to send predictions through Kafka.
  • kafka_consumer.py: Kafka consumer to store predictions in PostgreSQL.

📓 notebooks/

Contains the Jupyter notebooks used for model analysis and training.

  • eda.ipynb: Exploratory analysis of data from 2015 to 2019.
  • model_training.ipynb: Training of the regression model to predict the happiness score.

🔧 Configuration files and dependencies

  • README.md: This file contains the project documentation.
  • docker-compose.yml: Configuration of the services for the Kafka broker and ZooKeeper.
  • requirements.txt: Dependencies required to run the project (includes libraries such as pandas, scikit-learn, kafka-python, among others).

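4. Create a Virtual Environment

python -m venv venv

Activate it with (Git Bash on Windows):

source venv/Scripts/activate

On Linux or macOS, the activation script is at venv/bin/activate instead.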

5. Environment Setup

Create a .env file in the project root for the PostgreSQL database credentials:

DB_HOST=localhost
DB_NAME=Happiness
DB_USER=postgres
DB_PASSWORD=root
DATABASE_URL=postgresql://user:password@localhost/database_name
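As a minimal sketch of how these variables could be turned into a connection string, the helper below builds a PostgreSQL URL from the DB_* entries. The function names and the rule that DATABASE_URL takes precedence when it is set are assumptions made for illustration, not necessarily how the project's scripts read the file.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=os.environ):
    """Assemble a PostgreSQL URL from the variables defined in .env."""
    # Assumption: a ready-made DATABASE_URL wins over the individual parts.
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])  # escapes characters such as '@'
    host = env.get("DB_HOST", "localhost")
    return f"postgresql://{user}:{password}@{host}/{env['DB_NAME']}"


def load_url_from_dotenv():
    """Read .env with python-dotenv, then build the URL (hypothetical helper)."""
    from dotenv import load_dotenv  # imported lazily; requires python-dotenv

    load_dotenv()  # copies the entries of .env into os.environ
    return build_database_url()
```

Calling load_url_from_dotenv() from the project root returns a string that can be passed to SQLAlchemy's create_engine.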

6. Data Preparation

The original data is in data/raw. We standardize country names and remove rows with null values.
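The cleaning step can be sketched roughly as follows. The column names, the alias table, and the toy values are assumptions made for illustration; the real mapping depends on the names found in the raw files.

```python
import pandas as pd

# Illustrative aliases only; the actual list comes from inspecting data/raw.
COUNTRY_ALIASES = {
    "Trinidad & Tobago": "Trinidad and Tobago",
    "Northern Cyprus": "North Cyprus",
}


def clean_happiness_frame(df, country_col="Country"):
    """Standardize country names and drop rows that contain null values."""
    cleaned = df.copy()
    # Trim stray whitespace, then map known variants to one spelling.
    cleaned[country_col] = cleaned[country_col].str.strip().replace(COUNTRY_ALIASES)
    return cleaned.dropna().reset_index(drop=True)


# Toy input (placeholder scores, not real happiness data).
raw = pd.DataFrame({
    "Country": [" Trinidad & Tobago", "Northern Cyprus", "Norway"],
    "Score": [1.0, 2.0, None],
})
clean = clean_happiness_frame(raw)
print(clean)
```

Working on a copy keeps the raw frame untouched, so the same notebook cell can be re-run safely.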

We open the following notebooks in order and run the cells to perform the cleaning process and save the clean CSV files:

notebooks/EDA_2015.ipynb
notebooks/EDA_2016.ipynb
notebooks/EDA_2017.ipynb
notebooks/EDA_2018.ipynb
notebooks/EDA_2019.ipynb

After running them, you should have the clean files in the data/clean folder. Next, we run the following notebook, which adds the region column to the 2017, 2018, and 2019 datasets so that all years can later be concatenated:

notebooks/merge.ipynb
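One way to add the missing region column is to look each country up in a year that already has it. The sketch below assumes the 2015 data serves as that reference and that the columns are called Country and Region; both are assumptions, and the frames are toy examples.

```python
import pandas as pd


def add_region(df, reference, country_col="Country", region_col="Region"):
    """Attach a region to each row by looking the country up in a reference frame."""
    lookup = reference.drop_duplicates(country_col).set_index(country_col)[region_col]
    out = df.copy()
    out[region_col] = out[country_col].map(lookup)  # unknown countries become NaN
    return out


# Toy frames standing in for the cleaned yearly files.
ref_2015 = pd.DataFrame({
    "Country": ["Norway", "Chile"],
    "Region": ["Western Europe", "Latin America and Caribbean"],
})
df_2017 = pd.DataFrame({"Country": ["Chile", "Norway", "Atlantis"]})

merged = add_region(df_2017, ref_2015)

# With the same columns in every year, the frames can be stacked.
combined = pd.concat(
    [ref_2015.assign(Year=2015), merged.assign(Year=2017)],
    ignore_index=True,
)
print(combined)
```

Countries that do not appear in the reference year end up with a null region, so they are worth checking by hand before concatenating.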

7. Model Training

Open the notebook notebooks/model_training.ipynb and run the cells to:

  • Perform feature selection.
  • Train a regression model to predict the happiness score.
  • Evaluate the model and check that it reaches a satisfactory R² (at least 0.80).
  • Save the trained model to models/final_happiness_model.pkl.
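The steps above can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the notebook's actual code: the choice of SelectKBest with f_regression, the LinearRegression estimator, pickle as the serializer, and the save_if_good helper are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the merged dataset: 6 features, 3 of which drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 5.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.2, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Selection and regression share one pipeline, so the same columns are kept at predict time.
model = make_pipeline(SelectKBest(f_regression, k=3), LinearRegression())
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))
print(f"R² on the held-out split: {r2:.3f}")


def save_if_good(model, r2, path, threshold=0.80):
    """Pickle the model only when it meets the R² threshold stated in this README."""
    if r2 < threshold:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return True


# A temporary directory keeps the demo self-contained; the project saves under models/.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_if_good(model, r2, Path(tmp) / "models" / "final_happiness_model.pkl")
```

Evaluating on a held-out split rather than on the training data gives a more honest R² before the model is saved.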

8. Kafka Setup with Docker

Running the Container

Open Docker Desktop, then run the following command from a Git Bash terminal in the project root:

docker-compose up -d

This starts the containers. We can verify that they are running with:

docker ps

Now we will run the following command to create a topic called happiness_predictions:

docker exec -it happiness-countries-kafka-1 kafka-topics.sh \
--create \
--topic happiness_predictions \
--bootstrap-server localhost:9092 \
--partitions 1 \
--replication-factor 1

Now we make sure that the topic was created with this command:

docker exec -it happiness-countries-kafka-1 kafka-topics.sh \
--list \
--bootstrap-server localhost:9092

9. Running the Kafka Producer and Consumer

We run the producer script to send predictions to the happiness_predictions topic:

python src/kafka_producer.py
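As a rough sketch of what a producer along these lines might do, the code below serializes each prediction as JSON and sends it with kafka-python. The message layout (features and predicted_score keys) and both function names are hypothetical; the actual src/kafka_producer.py may be organized differently.

```python
import json


def encode_prediction(features, predicted_score):
    """Serialize one prediction as UTF-8 JSON, the payload sent to Kafka."""
    message = {"features": features, "predicted_score": round(float(predicted_score), 4)}
    return json.dumps(message, sort_keys=True).encode("utf-8")


def send_predictions(rows, topic="happiness_predictions", servers="localhost:9092"):
    """Send (features, score) pairs to the topic; needs kafka-python and a running broker."""
    from kafka import KafkaProducer  # imported lazily so the encoder works without Kafka

    producer = KafkaProducer(bootstrap_servers=servers)
    for features, score in rows:
        producer.send(topic, value=encode_prediction(features, score))
    producer.flush()  # block until every buffered message has been sent
    producer.close()
```

Calling flush() before closing matters because send() is asynchronous: without it, a short-lived script could exit while messages are still buffered.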

In another terminal, we run the consumer to read the messages and save them in PostgreSQL:

python src/kafka_consumer.py
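A consumer of this kind can be sketched as below. It assumes each message is a UTF-8 JSON object with hypothetical features and predicted_score keys, and that the happiness_predictions table has matching features and predicted_score columns; the real schema and script may differ.

```python
import json


def to_row(raw):
    """Convert a Kafka message value (UTF-8 JSON bytes) into insert parameters."""
    message = json.loads(raw.decode("utf-8"))
    return {
        "features": json.dumps(message["features"], sort_keys=True),
        "predicted_score": float(message["predicted_score"]),
    }


def consume_into_postgres(database_url, topic="happiness_predictions",
                          servers="localhost:9092"):
    """Read messages from the topic and insert each one into PostgreSQL."""
    # Lazy imports: requires kafka-python, SQLAlchemy, and a PostgreSQL driver.
    from kafka import KafkaConsumer
    from sqlalchemy import create_engine, text

    engine = create_engine(database_url)
    consumer = KafkaConsumer(topic, bootstrap_servers=servers,
                             auto_offset_reset="earliest")
    # Column names are assumptions; adjust them to the actual table definition.
    insert = text(
        "INSERT INTO happiness_predictions (features, predicted_score) "
        "VALUES (:features, :predicted_score)"
    )
    for record in consumer:  # blocks and yields messages as they arrive
        with engine.begin() as conn:  # one transaction per message
            conn.execute(insert, to_row(record.value))
```

Setting auto_offset_reset to "earliest" lets a freshly started consumer pick up messages that were produced before it connected.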

10. Evidence of Results

Verify the predictions in PostgreSQL:

SELECT * FROM happiness_predictions;

11. Conclusion

This project provides a complete solution to predict and store happiness scores at a global level, integrating Machine Learning and real-time streaming systems. The architecture built is scalable and allows continuous analysis based on updated happiness data.

Authors

William Alejandro Botero Florez

This README.md has detailed and well-structured instructions that make it easy to navigate and execute the project, covering everything from prerequisites to final implementation.
