justmetro/README.md

👋 Pedro Rocha de Oliveira

Statistics Student | Data Analysis | Predictive Modeling | Generative AI & LLMs

I'm a Statistics undergraduate student at ENCE/IBGE, focused on Data Analysis, Exploratory Data Analysis, Predictive Modeling, Data Visualization, Machine Learning, and Generative AI applications.

I build projects using Python, R, SQL, FastAPI, Streamlit, LangChain, ChromaDB, LLMs, RAG pipelines, automated tests, and CI/CD workflows, combining statistical thinking with practical software development.


🚀 Featured Projects

REST API for income prediction and statistical analysis using socioeconomic data, machine learning concepts, automated validation, tests, and continuous integration.

Main features:

  • API built with FastAPI
  • Predictive modeling workflow with Python
  • Input validation with schemas
  • Automated tests with Pytest
  • CI/CD pipeline with GitHub Actions
  • Organized backend structure with routes, schemas, security and prediction logic
  • Practical project focused on data, APIs and model deployment concepts
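
As a rough illustration of the "input validation with schemas" idea, here is a minimal sketch that uses only the standard library rather than the FastAPI schemas the project itself relies on. The field names (`age`, `years_of_schooling`, `weekly_hours`), the accepted ranges, and the `parse_request` helper are all hypothetical and not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomeRequest:
    """Hypothetical request payload; field names and ranges are illustrative."""

    age: int
    years_of_schooling: int
    weekly_hours: float

    def __post_init__(self) -> None:
        # Reject values outside plausible ranges before they reach the model.
        if not 14 <= self.age <= 110:
            raise ValueError("age must be between 14 and 110")
        if not 0 <= self.years_of_schooling <= 30:
            raise ValueError("years_of_schooling must be between 0 and 30")
        if not 0 < self.weekly_hours <= 120:
            raise ValueError("weekly_hours must be in the interval (0, 120]")


def parse_request(payload: dict) -> IncomeRequest:
    """Build a validated request from a raw dict (hypothetical helper)."""
    try:
        return IncomeRequest(**payload)
    except TypeError as exc:
        # Missing keys, unknown keys, or non-numeric values end up here.
        raise ValueError(f"invalid payload: {exc}") from exc
```

Calling `parse_request({"age": 30, "years_of_schooling": 12, "weekly_hours": 40.0})` returns a frozen `IncomeRequest`, while a payload with a missing field or an out-of-range value raises `ValueError`, which an API layer could translate into a client error response.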

Semantic search application for public IBGE datasets using RAG, Gemini API, ChromaDB, Streamlit, local embeddings, metadata filtering, query expansion, and public deployment.

Main features:

  • Semantic search over IBGE XLS/CSV tables
  • Local embeddings and vector storage with ChromaDB
  • Gemini API integration with fallback demo mode
  • Dark mode interface with Streamlit
  • Query expansion to improve retrieval
  • Metadata filtering for indicators and coefficients of variation
  • Retriever evaluation with Precision@k, Recall proxy, MRR and NDCG
  • Automated tests with Pytest
  • CI/CD with GitHub Actions
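
The retriever metrics named above can be sketched in a few lines of plain Python. This is a minimal version that assumes binary relevance (a table is either relevant to the query or not) and hypothetical table ids. It is not the project's own evaluation code, and the "Recall proxy" is left out because its definition is not given here.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant id, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR over (ranked, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # Ideal ranking places every relevant id first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query result: table ids in retrieved order, plus the relevant set.
ranked = ["t3", "t1", "t7", "t2"]
relevant = {"t1", "t2"}
scores = {
    "precision@2": precision_at_k(ranked, relevant, 2),
    "rr": reciprocal_rank(ranked, relevant),
    "ndcg@4": ndcg_at_k(ranked, relevant, 4),
}
```

Precision@k rewards relevant hits anywhere in the top k, while MRR and NDCG also account for how early those hits appear, which is why the three are usually reported together.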

📚 Academic & Statistical Projects

📊 Technical Report: Drug Experimentation - ENCE/IBGE

Statistical analysis project focused on social data, correlations, data visualization, and reproducible technical documentation using R, R Markdown, and Quarto.


๐Ÿ›๏ธ Academic Projects โ€” Statistics in Public Policy

Academic studies applying statistics to public policy problems, social indicators, data visualization, and IBGE microdata analysis.


๐Ÿ Python Scripts and Automations

Personal scripts and small applications built with Python, Pandas, NumPy, Matplotlib, Object-Oriented Programming, and automation workflows.


🧠 Tech Stack

Data Analysis & Statistics

Python R SQL Pandas NumPy Matplotlib scikit-learn

APIs, Machine Learning & Back-End

FastAPI Pytest Machine Learning Predictive Modeling JavaScript

Generative AI & LLMs

LangChain RAG ChromaDB Gemini API Prompt Engineering

Tools & Development

Git GitHub GitHub Actions Streamlit Excel


🎯 Currently Improving

  • Advanced statistical modeling
  • Machine learning with Python and R
  • API development and backend architecture
  • RAG pipelines and LLM applications
  • Data engineering fundamentals
  • Testing, documentation and CI/CD practices

📚 Education

B.Sc. in Statistics
ENCE/IBGE - Escola Nacional de Ciências Estatísticas
2023 – 2028 (expected)

Production Engineering
UNESA
2022 – 2023


📜 Courses & Certificates

  • Back-End Programmer - SENAI
  • Object-Oriented Programming with Python - ENEP
  • Introduction to Data Science - ENEP
  • Concepts and Applications of the Demographic Census in Public Policy - IBGE
  • Plain Language Fundamentals - ENEP

๐ŸŒ Contact Me


Turning data into insight, and insight into useful solutions.

Pinned

  1. ibge-rag-chatbot (Public)

    RAG application built with Python, Streamlit, ChromaDB, and local embeddings for exploring Brazilian social indicators from public IBGE tables.

    Python

  2. rendimento-predictor-api (Public)

    API + frontend for hourly income prediction with FastAPI, Streamlit, and XGBoost trained on real microdata from the 2023 PNAD Contínua.

    Python