ps: this is v2. I decided to better document the old process and add new perspectives, hence the new repository. I'm always using my GitHub as a personal document for advancing technology in the data universe.
In this repository, I documented an analytical engineering framework designed for public sector financial data. The architecture uses SQL and dbt (Data Build Tool) to transform raw financial records into a dimensional model. Working with data engineering, I developed and applied this methodology to monitor public spending, detect financial anomalies, and optimize processes in the Budget Planning Management of the State Government of São Paulo.
During the execution of this project, the main processing was performed in an Oracle database environment. I used advanced SQL filtering techniques to process the data and subsequently translated the selected datasets into XML format, necessary for creating data spreadsheets and performing the final extractions.
The project follows modern analytical engineering practices, using a multi-layered approach to ensure data reliability and financial compliance:
- Staging Layer: Extracts raw data from the data warehouse, converting data types and standardizing column naming conventions.
- Marts Layer: Aggregates financial transactions into fact tables, creating the semantic layer consumed by subsequent Business Intelligence platforms.
- Automated Testing: Implements primary key uniqueness and non-nullability constraints to ensure absolute fidelity of public metrics.
- XML Data Extraction: Utilizes Oracle XML DB functionalities to generate well-formed XML structures directly from relational tables for later consumption.
sp-gov-budget-analytics/
├── models/
│ ├── staging/
│ │ ├── schema.yml
│ │ └── stg_budget_transactions.sql
│ └── marts/
│ └── fct_budget_execution.sql
├── scripts/
│ ├── xml_extraction.sql
│ └── advanced_anomaly_detection.sql
├── src/
│ └── extract_metrics_to_excel.py
├── dbt_project.yml
├── requirements.txt
├── .gitignore
└── README.md
- Python 3.9+
- dbt 1.5+
- Oracle Database 19c+
- Oracle Python client (oracledb)
- Clone the repository:
git clone https://github.com/bellDataSc/sp-gov-budget-analytics.git
cd sp-gov-budget-analytics- Install dependencies:
pip install -r requirements.txt- Configure Oracle connection in
~/.dbt/profiles.yml:
sp_gov_oracle:
target: dev
outputs:
dev:
type: oracle
user: [your_user]
pass: [your_password]
database: [your_database]
schema: analytics
threads: 4- Set environment variables:
export ORACLE_USER=your_user
export ORACLE_PASSWORD=your_password
export ORACLE_DSN=your_dsn_string- Execute dbt models:
dbt run
dbt test- Extract metrics to Excel:
python src/extract_metrics_to_excel.pyOutput files will be generated in the output/ directory.
stg_budget_transactions: Staging layer with data type conversions and standardizationfct_budget_execution: Fact table aggregating monthly budget execution metrics
xml_extraction.sql: Native Oracle XML generation from dimensional tablesadvanced_anomaly_detection.sql: Window functions for anomaly detection with XML output
extract_metrics_to_excel.py: Modular extraction and Excel export functionality
This project demonstrates advanced SQL capabilities:
- Window functions (moving averages, row ranking)
- Common Table Expressions (CTEs) for complex aggregations
- Native Oracle XML functions (XMLELEMENT, XMLFOREST, XMLAGG)
- Dynamic SQL generation with dbms_xmlgen
- Anomaly detection using statistical methods
All metrics are tested for:
- Primary key uniqueness
- Non-null constraints on critical dimensions
- Referential integrity across staging and marts layers
- Data freshness and completeness
Made with coffee and data by Isabel Cruz
Talk: From Oracle to Insight: How I Built an AI-Augmented Budget Intelligence Pipeline for the São Paulo State Government
This repository includes a fully runnable demo built for AI Tinkerers SP + Banco BMG (March 26, 2026). The demo shows how to layer Gemini LLM on top of a real government data pipeline — no fine-tuning, pure context engineering.
| Beat | What happens |
|---|---|
| Beat 1 | Raw Oracle/SIGEO data structure — the original problem |
| Beat 2 | dbt transformation output (fct_budget_execution) — anomaly flags, moving averages |
| Beat 3 | Gemini 1.5 Flash generates economic narratives and anomaly alerts in Portuguese |
Prerequisites:
pip install -r requirements.txt
cp .env.example .env
# Add your GEMINI_API_KEY to .env
# Get a free key at: https://aistudio.google.com/app/apikeyFull demo (interactive, 3 beats):
python demo/run_demo.pySingle beat:
python demo/run_demo.py --beat 1 # Raw data structure
python demo/run_demo.py --beat 2 # dbt output
python demo/run_demo.py --beat 3 # Gemini analysisAnomaly alerts only:
python demo/run_demo.py --anomaly-onlyThe reusable pattern (works with any relational database):
Oracle/SIGEO -> dbt (transform) -> Python dict -> Gemini prompt -> Economic narrative
src/llm_narrative.py— core LLM layer withgenerate_narrative()andbatch_analyze()demo/sample_data.py— anonymized budget records mirroring Oracle schemademo/run_demo.py— interactive demo runner
Key insight: structured context + clear analyst role = domain-specific LLM analysis without fine-tuning.
Made with coffee and data by Isabel Cruz