Terra Rara

AI-powered copper futures price prediction platform combining XGBoost ML, TFT-ASRO deep learning, LLM-based sentiment analysis, and cross-asset market data.

Overview

Terra Rara predicts COMEX copper futures (HG=F) returns using two parallel models: an XGBoost regression model for next-day point estimates, and a Temporal Fusion Transformer with Adaptive Sharpe Ratio Optimization (TFT-ASRO) for multi-day probabilistic forecasts. The system ingests daily news via Google News RSS, scores article sentiment using an LLM (Arcee Trinity Large Preview via OpenRouter), extracts 768-dimensional FinBERT embeddings reduced to 32 dimensions via PCA, and combines these with 250+ technical features computed from 17 correlated assets. LME warehouse stock data and futures curve signals provide physical-market context. A scheduled pipeline runs daily to refresh sentiment and generate AI-driven market commentary.

XGBoost target: Next-day simple return: (close[t+1] / close[t]) - 1

TFT-ASRO target: 5-day daily return forecasts with 7-quantile probabilistic output (q0.02 through q0.98)

Target users: Traders, analysts, and developers building commodity forecasting tools.

Non-goals: This system does not provide trading signals or financial advice. Predictions are for informational purposes only.

Features

Predict next-day copper futures returns using XGBoost regression trained on 250+ features
Forecast 5-day returns with probabilistic confidence bands using TFT-ASRO deep learning
Score news sentiment using LLM (Arcee Trinity Large Preview) with FinBERT fallback when API is unavailable
Extract FinBERT CLS embeddings (768-dim → 32-dim PCA) for deep semantic encoding
Track 17 correlated assets via configurable symbol sets (active, champion, challenger)
Aggregate daily sentiment using time-weighted exponential decay
Compute advanced sentiment features: momentum, surprise (Z-score), volume-weighted scores
Ingest LME warehouse stock data and futures curve signals for physical-market context
Generate AI-powered market commentary with stance classification (BULLISH/NEUTRAL/BEARISH)
Display real-time prices for dashboard symbols via yfinance
Visualize historical price, sentiment, and TFT forecast bands over 180 days
Optimize TFT hyperparameters via Optuna TPE search with variance ratio penalty
Trigger manual pipeline execution via authenticated API endpoint
Monitor pipeline health via pipeline_run_metrics table

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│  FRONTEND (Vercel)                                                      │
│  React 18 + TypeScript + Vite + TailwindCSS                             │
│  ├── TradingView widget (lazy loaded)                                   │
│  ├── Price & Sentiment chart with TFT forecast bands (Recharts)         │
│  ├── Deep Learning Forecast card (T+1 return, risk, weekly trend)       │
│  ├── Deep Learning Metrics card (DA, Sharpe, variance ratio)            │
│  ├── XGBoost Forecast card with AI Commentary                           │
│  ├── XGBoost feature importance display                                 │
│  └── Market grid showing 14 dashboard symbols                           │
└─────────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  BACKEND (HuggingFace Spaces - Docker)                                  │
│  FastAPI + Python 3.11 + Uvicorn + APScheduler                          │
│                                                                         │
│  REST API                                                               │
│  ├── GET /api/analysis          → XGBoost prediction + metrics          │
│  ├── GET /api/analysis/tft/{s}  → TFT-ASRO quantile forecast (5m TTL)  │
│  ├── GET /api/history           → Historical price & sentiment (180d)   │
│  ├── GET /api/market-prices     → Real-time quotes (14 symbols)         │
│  ├── GET /api/commentary        → AI market analysis                    │
│  ├── GET /api/health            → Health check                          │
│  └── POST /api/pipeline/trigger → Manual pipeline execution             │
│                                                                         │
│  ML PIPELINE (Daily @ 02:00 Istanbul)                                   │
│  ├── 16 strategic queries → Google News RSS                             │
│  ├── LLM sentiment scoring with FinBERT fallback                        │
│  ├── FinBERT CLS embedding extraction + PCA (768→32 dim)               │
│  ├── 250+ feature engineering across 17 training symbols                │
│  ├── XGBoost training with early stopping (when train_model=True)       │
│  ├── TFT-ASRO inference (quantile forecasts from HF Hub checkpoint)    │
│  ├── AI commentary generation                                           │
│  └── Pipeline metrics saved to database                                 │
│                                                                         │
│  DEEP LEARNING (Separate Training)                                      │
│  ├── Feature store: price + sentiment + embeddings + LME + calendar    │
│  ├── TFT-ASRO model (pytorch-forecasting + Lightning)                  │
│  ├── ASRO loss: Sharpe + quantile calibration + volatility matching    │
│  ├── Optuna hyperparameter search with variance ratio penalty          │
│  └── Checkpoint persistence via HuggingFace Hub                        │
└─────────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  DATA LAYER                                                             │
│  ├── Supabase PostgreSQL                                                │
│  │   ├── news_raw / news_processed  (reproducible news pipeline)        │
│  │   ├── news_sentiments_v2         (commodity-aware LLM scores)        │
│  │   ├── news_embeddings            (FinBERT CLS + PCA vectors)        │
│  │   ├── daily_sentiments_v2        (aggregated index)                  │
│  │   ├── price_bars                 (OHLCV data)                        │
│  │   ├── lme_warehouse_data         (LME copper stock levels)           │
│  │   ├── analysis_snapshots         (cached predictions)                │
│  │   ├── ai_commentaries            (market commentary)                 │
│  │   ├── model_metadata             (XGBoost artifacts)                 │
│  │   ├── tft_model_metadata         (TFT-ASRO artifacts)               │
│  │   └── pipeline_run_metrics       (monitoring)                        │
│  ├── HuggingFace Hub (TFT checkpoint + PCA model persistence)           │
│  ├── yfinance (OHLCV price data)                                        │
│  ├── Nasdaq Data Link (LME warehouse stocks, optional)                  │
│  ├── TwelveData (backup live copper price)                              │
│  └── Google News RSS (news source)                                      │
└─────────────────────────────────────────────────────────────────────────┘

Symbol Sets

The system separates symbol discovery, testing, and production into distinct phases.

How Symbols Are Found

The Screener module identifies symbols correlated with COMEX Copper (HG=F):

Universe Builder: Loads ~3000 candidate tickers from seed files (ETFs, miners, macro indices)
Data Probe: Fetches 2 years of weekly returns for each candidate via yfinance
Correlation Screening: Computes Pearson and partial correlations with HG=F
Lead-Lag Analysis: Tests if symbol leads or lags copper by 0-4 weeks
Stability Check: Validates correlation holds in both in-sample and out-of-sample periods

Output: A ranked list of symbols with stable, significant correlations.

Champion / Challenger Testing

Symbol sets are managed via JSON files in backend/config/symbol_sets/:

File	Purpose
`active.json`	Production set - Currently used for training. This is the "winner".
`champion.json`	Previous best - Backup of last known good set before changes.
`challenger.json`	Candidate set - New symbols being tested. Not used in production.

Flow:

Run screener → produces candidate symbols
Add candidates to challenger.json
Train model with SYMBOL_SET=challenger, compare MAE/RMSE to active
If challenger outperforms → promote to active.json, demote old active to champion.json
If not → discard or iterate

This allows risk-free experimentation without breaking production.

Current Active Set (17 symbols)

Used for XGBoost feature engineering. Loaded from backend/config/symbol_sets/active.json.

Category	Symbols	Rationale
Target	HG=F	COMEX Copper Futures - prediction target
Macro	DX-Y.NYB, CL=F	USD strength inversely correlated; oil = industrial demand proxy
Precious	GC=F, SI=F, PL=F	Safe-haven flows; silver/platinum share industrial use
ETFs	FXI, COPX	China demand (FXI); copper miner basket (COPX)
Majors	BHP, FCX, SCCO, RIO, TECK	Large-cap producers with copper exposure
Regional	IVN.TO, LUN.TO, FM.TO, 2899.HK	Mid-cap miners; Zijin (2899) = China stockpiling signal

Dashboard Symbols (14) - Separate List

Used for real-time price display only. Configured via YFINANCE_SYMBOLS env var. Does not affect model training.

Category	Symbols
Target	HG=F
Macro	DX-Y.NYB, CL=F
ETFs	FXI, COPX, COPJ
Miners	BHP, FCX, SCCO, RIO, TECK, IVN.TO, LUN.TO, 2899.HK

Switching Symbol Sets

# Default: uses active.json
SYMBOL_SET=active

# Test challenger set without changing production
SYMBOL_SET=challenger python -m app.ai_engine --train-only

# Compare metrics, then promote if better
cp config/symbol_sets/active.json config/symbol_sets/champion.json
cp config/symbol_sets/challenger.json config/symbol_sets/active.json

Audit Trail

Each model training run records:

symbol_set_name: Which set was used (active/champion/challenger)
training_symbols: Full list of symbols
training_symbols_hash: SHA256 hash for reproducibility

Model Details

Target Definition

The model predicts next-day simple return, not price:

target = (close[t+1] / close[t]) - 1

This is recorded in model metadata as:

target_type: "simple_return"
target_shift_days: 1
target_definition: "simple_return(close,1).shift(-1)"
baseline_price_source: "yfinance_close"

Price Calculation

predicted_price = baseline_price × (1 + predicted_return)

Where baseline_price is the latest HG=F close from the database.

XGBoost Parameters

Parameter	Value	Purpose
`max_depth`	4	Shallow trees, prevents overfitting
`learning_rate`	0.05	Slow learning, better generalization
`subsample`	0.8	80% of data per tree
`colsample_bytree`	0.6	60% of features per tree
`reg_alpha` (L1)	0.5	Sparsity regularization
`reg_lambda` (L2)	2.0	Weight decay
`min_child_weight`	5	Minimum samples per leaf

These settings are conservative - the model avoids large predictions.

Feature Count

Approximately 250 features are generated for XGBoost:

Technical indicators per symbol (RSI, SMA, volatility, returns)
Lag features (1-5 day lags)
Cross-asset correlations
Sentiment features (index, news count)

TFT-ASRO Deep Learning

The TFT-ASRO (Temporal Fusion Transformer with Adaptive Sharpe Ratio Optimization) operates as a parallel model producing 5-day probabilistic forecasts. It runs independently from the XGBoost pipeline and uses a richer feature set.

Model Architecture

Parameter	Value	Notes
`hidden_size`	32	Reduced from 64 to prevent VSN overfitting on ~313 samples
`attention_head_size`	2	Fewer heads for single-series dataset
`dropout`	0.3	Heavy regularization for small sample size
`hidden_continuous_size`	16	Paired reduction with hidden_size
`learning_rate`	3e-4	Conservative to avoid overshooting narrow loss landscape
`max_encoder_length`	60	60 trading days of lookback
`max_prediction_length`	5	5-day forecast horizon
`gradient_clip_val`	1.0	Relaxed from 0.5; tanh-based Sharpe gradients are bounded
`batch_size`	16	~19 batches/epoch for stable gradient estimates

Quantile Output

The model produces 7 quantiles instead of a single point estimate:

q0.02, q0.10, q0.25, q0.50 (median), q0.75, q0.90, q0.98

This directly models 96% and 80% confidence intervals and captures tail risk.

ASRO Loss Function

The custom loss combines three objectives:

L = w_quantile × (quantile_loss + λ_vol × vol_calibration)
  + w_sharpe × (-tanh_sharpe)
  + amplitude_loss

Component	Weight	Purpose
Quantile calibration	λ_quantile = 0.4	Keeps TFT probabilistic
Sharpe component	w_sharpe = 0.6	Drives directional learning
Volatility calibration	λ_vol = 0.35	Q90-Q10 spread tracks 2× actual σ

The normalized sum-to-1 formulation prevents either component from silently dominating.

TFT Feature Categories

The TFT ingests ~340 features across three categories:

Time-varying unknown (observed in past, unknown in future):

Copper prices, returns, technical indicators (from all 17 training symbols)
FinBERT PCA embedding vectors (32 dimensions)
Sentiment momentum (5/10/30-day SMA/EMA), surprise Z-score, volume-weighted sentiment
Event-type intensity (supply_disruption, inventory_draw, etc.)
LME warehouse stock changes, depletion rate, cancelled warrant ratio
Futures curve spread, contango/backwardation flags

Time-varying known (known in future):

Day of week (one-hot), month (sinusoidal encoding)
US/China holiday calendars

Static:

Group identifier (single series: "copper")

Training and Optimization

TFT training is separate from the daily pipeline. It runs on-demand or via scheduled retraining:

cd backend

# Train TFT-ASRO model
python -m deep_learning.training.trainer --symbol HG=F

# Run Optuna hyperparameter search (50 trials)
python -m deep_learning.training.hyperopt --n-trials 50

# Backfill FinBERT embeddings for existing articles
python -m deep_learning.data.embeddings --backfill --days 180

Optuna search uses a composite objective with variance ratio penalty: trials producing flat predictions (VR < 0.5) or overconfident predictions (VR > 1.5) are penalized.

Financial Metrics

Metric	Description
Sharpe Ratio	Annualized risk-adjusted return of long/short strategy
Sortino Ratio	Sharpe variant penalizing only downside volatility
Directional Accuracy	Fraction of days with correct sign prediction
Tail Capture Rate	DA on days where\|return\| > 1.5%
Variance Ratio	pred_std / actual_std (healthy range: 0.5–1.5)
PI80/PI96 Coverage	Empirical coverage of 80% and 96% prediction intervals

Model Persistence

TFT artifacts are stored on HuggingFace Hub to survive ephemeral container restarts:

Artifact	Description
`best_tft_asro.ckpt`	Lightning checkpoint
`pca_finbert.joblib`	Fitted IncrementalPCA model
`optuna_results.json`	Best hyperparameters from search

Directory Structure

terra-rara/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI app, endpoints, scheduler
│   │   ├── ai_engine.py      # XGBoost training, LLM sentiment
│   │   ├── inference.py      # XGBoost live prediction
│   │   ├── features.py       # Technical indicator computation
│   │   ├── data_manager.py   # News ingestion, price fetching
│   │   ├── commentary.py     # AI commentary generation
│   │   ├── scheduler.py      # APScheduler daily pipeline
│   │   ├── models.py         # SQLAlchemy ORM models
│   │   ├── db.py             # Database connection
│   │   └── settings.py       # Pydantic settings
│   ├── deep_learning/        # TFT-ASRO deep learning module
│   │   ├── config.py         # All TFT hyperparameters and training config
│   │   ├── data/
│   │   │   ├── embeddings.py         # FinBERT CLS extraction + PCA
│   │   │   ├── sentiment_features.py # Momentum, surprise, volume-weighted
│   │   │   ├── lme_warehouse.py      # LME warehouse stock data + features
│   │   │   ├── futures_curve.py      # Contango/backwardation signals
│   │   │   ├── feature_store.py      # Centralized data fusion for TFT
│   │   │   └── dataset.py            # TimeSeriesDataSet builder
│   │   ├── models/
│   │   │   ├── tft_copper.py         # TFT model + ASROPFLoss
│   │   │   ├── losses.py             # ASRO + CombinedQuantileLoss
│   │   │   └── hub.py                # HuggingFace Hub upload/download
│   │   ├── training/
│   │   │   ├── trainer.py            # PyTorch Lightning training loop
│   │   │   ├── hyperopt.py           # Optuna TPE hyperparameter search
│   │   │   └── metrics.py            # Sharpe, Sortino, DA, tail capture
│   │   └── inference/
│   │       └── predictor.py          # TFT live prediction pipeline
│   ├── config/
│   │   └── symbol_sets/      # Training symbol configurations
│   │       ├── active.json   # Current training symbols
│   │       ├── champion.json # Best performing set
│   │       └── challenger.json
│   ├── pipelines/            # Faz 2 reproducible data processing
│   │   ├── ingestion/        # RSS/API → news_raw
│   │   ├── processing/       # news_raw → news_processed
│   │   └── cutoff.py         # Market cut-off calculation
│   ├── screener/             # Universe Builder + Feature Screener
│   │   ├── core/             # Config, fingerprint, cache
│   │   ├── contracts/        # Pydantic output models
│   │   ├── universe_builder/ # Seed loading, probing, categorization
│   │   └── feature_screener/ # Correlation analysis
│   ├── worker/               # arq + Redis async task worker
│   │   └── tasks.py          # Pipeline orchestration tasks
│   ├── backtest/             # Walk-forward backtesting
│   │   └── runner.py         # XGBoost + TFT backtest runner
│   ├── tests/                # pytest tests
│   │   ├── deep_learning/    # TFT-ASRO unit tests (7 files)
│   │   └── screener/         # Screener unit tests
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Main dashboard
│   │   ├── api.ts            # API client (XGBoost + TFT)
│   │   ├── types.ts          # TypeScript interfaces
│   │   └── components/       # React components
│   ├── index.html
│   ├── vite.config.ts
│   └── package.json
├── data/
│   └── models/               # Local model artifacts (gitignored)
├── .github/
│   └── workflows/
│       ├── hf-sync.yml       # GitHub → HuggingFace sync
│       └── screener-schedule.yml
├── docker-compose.yml
├── env.example
└── README.md

Screener Module

The screener module identifies symbols with stable correlations to COMEX Copper (HG=F) for feature engineering. It provides audit-first, reproducible analysis with full data lineage.

CLI Usage

cd backend

# Build universe from seed sources
python -m screener universe_builder --config config/screener_config.yaml

# Run correlation screening
python -m screener feature_screener --universe artifacts/universes/latest/universe.json

Dual Fingerprint System

The screener uses two fingerprints to ensure both reproducibility and auditability:

content_fingerprint: Deterministic hash of analysis content only. Same inputs + same config = same hash. Excludes timestamps and run IDs.
output_fingerprint: Hash of full output envelope including metadata. Changes with each run.

Key Features

Feature	Description
Weekly via 1d + resample	Downloads daily data, resamples to W-FRI for consistency
Pairwise dropna	Correlation uses per-pair intersection, not global dropna
Frozen lead-lag	Best lag discovered in IS, frozen for OOS evaluation
Partial correlation	Residual correlation with ^GSPC/UUP controls
Multi-source provenance	Each ticker tracks all sources it appeared in
Collision-proof cache	FetchParams fingerprint prevents cache key collisions

Getting Started

Prerequisites

Python 3.11+
Node.js 18+
PostgreSQL database (or Supabase account)
OpenRouter API key (free tier available at openrouter.ai)

Quickstart

# Clone repository
git clone https://github.com/ifieryarrows/terra-rara.git
cd terra-rara

# Start backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp ../env.example .env
# Edit .env with your credentials
uvicorn app.main:app --reload --port 8000

# In another terminal: start frontend
cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173, backend at http://localhost:8000.

Installation

Backend

cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Frontend

cd frontend
npm install

Docker (Alternative)

docker-compose up --build

Configuration

Copy env.example to backend/.env and configure:

Variable	Required	Default	Description
`DATABASE_URL`	Yes	-	PostgreSQL connection string
`OPENROUTER_API_KEY`	Yes	-	OpenRouter API key for LLM
`PIPELINE_TRIGGER_SECRET`	Yes	-	Secret token for POST /api/pipeline/trigger (32+ random chars)
`SYMBOL_SET`	No	`active`	Which symbol set to use (active/champion/challenger)
`OPENROUTER_MODEL_SCORING`	No	`arcee-ai/trinity-large-preview:free`	Primary model for sentiment scoring
`OPENROUTER_MODEL_COMMENTARY`	No	`arcee-ai/trinity-large-preview:free`	Primary model for commentary generation
`OPENROUTER_RPM`	No	`18`	Soft throttle target for OpenRouter calls
`OPENROUTER_MAX_RETRIES`	No	`3`	Max retry attempts for 429/5xx OpenRouter errors
`MAX_LLM_ARTICLES_PER_RUN`	No	`200`	Per-run LLM scoring budget before FinBERT overflow
`OPENROUTER_FALLBACK_MODELS`	No	empty	Optional comma-separated fallback model list
`OPENROUTER_MODEL`	No	-	Deprecated fallback model env (backward compatibility)
`LLM_SENTIMENT_MODEL`	No	-	Deprecated fallback scoring model env
`TWELVEDATA_API_KEY`	No	-	Backup live price source
`SCHEDULER_ENABLED`	No	`false`	Local-only scheduler flag (production uses external trigger)
`SCHEDULE_TIME`	No	`02:00`	Daily pipeline time (HH:MM)
`TZ`	No	`Europe/Istanbul`	Scheduler timezone
`YFINANCE_SYMBOLS`	No	(14 symbols)	Dashboard symbols (comma-separated)
`NEWS_LOOKBACK_DAYS`	No	`30`	Days of news to fetch
`SENTIMENT_DECAY_HALF_LIFE`	No	`7.0`	Sentiment decay half-life (days)
`HF_TOKEN`	No	-	HuggingFace token for TFT model upload/download
`NASDAQ_DATA_LINK_API_KEY`	No	-	Nasdaq Data Link API key for LME warehouse data
`MODEL_DIR`	No	`/data/models`	Base directory for model artifacts (XGBoost + TFT)
`HF_HUB_DISABLE_PROGRESS_BARS`	No	`1`	Disable Hugging Face progress bars
`TRANSFORMERS_VERBOSITY`	No	`error`	Reduce Transformers log noise
`TRANSFORMERS_NO_ADVISORY_WARNINGS`	No	`1`	Disable advisory warnings from Transformers

The env.example file includes PIPELINE_TRIGGER_SECRET= with no value. Generate a random secret before deploying.

Usage

Dashboard

Access the live dashboard at terra-rara.vercel.app:

Deep Learning Forecast displays the TFT-ASRO T+1 predicted return, direction (BULLISH/BEARISH/NEUTRAL), risk level, and 5-day weekly trend
Deep Learning Metrics shows directional accuracy, Sharpe ratio, variance ratio, and tail capture rate
XGBoost Forecast Card displays the predicted next-day copper price, predicted return percentage, and AI stance
Price Forecast Chart shows 180 days of historical copper prices with T+1 TFT forecast dot and Q10-Q90 confidence band
Market Drivers lists the top XGBoost feature importances
Market Grid shows real-time prices for all 14 dashboard symbols

API Endpoints

Interactive API documentation: ifieryarrows-copper-mind.hf.space/api/docs

# Get current XGBoost prediction
curl https://ifieryarrows-copper-mind.hf.space/api/analysis

# Get TFT-ASRO deep learning forecast
curl https://ifieryarrows-copper-mind.hf.space/api/analysis/tft/HG=F

# Get AI commentary
curl https://ifieryarrows-copper-mind.hf.space/api/commentary

# Get historical data
curl https://ifieryarrows-copper-mind.hf.space/api/history

# Trigger pipeline manually (requires authentication)
curl -X POST "https://ifieryarrows-copper-mind.hf.space/api/pipeline/trigger?fetch_data=true&train_model=true" \
  -H "Authorization: Bearer YOUR_PIPELINE_TRIGGER_SECRET"

API Reference

GET /api/analysis

Returns current prediction with model metrics.

{
  "symbol": "HG=F",
  "current_price": 4.2500,
  "baseline_price": 4.2350,
  "baseline_price_date": "2026-01-25",
  "predicted_return": 0.001500,
  "predicted_return_pct": 0.15,
  "predicted_price": 4.2414,
  "target_type": "simple_return",
  "price_source": "yfinance_db_close",
  "confidence_lower": 4.1800,
  "confidence_upper": 4.3000,
  "sentiment_index": 0.227,
  "sentiment_label": "Bullish",
  "top_influencers": [
    {"feature": "HG=F_vol_10", "importance": 0.1808, "description": "10-day volatility"},
    {"feature": "FXI_lag_ret1_2", "importance": 0.1019, "description": "China ETF 2-day lagged return"}
  ],
  "data_quality": {
    "coverage_pct": 98.5,
    "missing_features": []
  },
  "training_symbols_hash": "sha256:7b7dd017b79da296",
  "generated_at": "2026-01-17T09:00:00Z"
}

GET /api/commentary

Returns AI-generated market analysis.

{
  "symbol": "HG=F",
  "commentary": "Copper futures show bullish momentum...",
  "ai_stance": "BULLISH",
  "generated_at": "2026-01-17T09:15:00Z"
}

GET /api/market-prices

Returns real-time quotes for all dashboard symbols.

{
  "symbols": {
    "HG=F": {"price": 4.25, "change": 1.23},
    "BHP": {"price": 45.67, "change": -0.45}
  },
  "updated_at": "2026-01-17T15:30:00Z"
}

GET /api/history

Returns 180 days of historical price and sentiment data.

GET /api/health

Returns health status including database connectivity.

GET /api/analysis/tft/

Returns TFT-ASRO deep learning forecast with quantile predictions. Results are cached for 5 minutes.

{
  "symbol": "HG=F",
  "direction": "BULLISH",
  "risk_level": "MEDIUM",
  "weekly_trend": "BULLISH",
  "prediction": {
    "predicted_return_median": 0.0035,
    "predicted_return_q10": -0.0080,
    "predicted_return_q90": 0.0150,
    "predicted_price_median": 4.2650,
    "volatility_estimate": 0.0115,
    "daily_forecasts": [
      {"day": 1, "return_median": 0.0035, "return_q10": -0.0080, "return_q90": 0.0150}
    ]
  },
  "model_metadata": {
    "trained_at": "2026-03-10T02:30:00Z",
    "metrics": {
      "directional_accuracy": 0.58,
      "sharpe_ratio": 1.12,
      "variance_ratio": 0.85,
      "tail_capture_rate": 0.62
    }
  }
}

POST /api/pipeline/trigger (Privileged)

Manually triggers the ML pipeline. This is a privileged endpoint that consumes significant resources (LLM API calls, database writes, model training).

Authentication requirement: This endpoint requires a valid Authorization: Bearer <PIPELINE_TRIGGER_SECRET> header. Requests without a valid token receive 401 Unauthorized.

Expected responses:

Status	Condition
200	Pipeline triggered successfully
401 Unauthorized	Missing or invalid `Authorization` header
409 Conflict	Pipeline already running

Parameters:

Parameter	Type	Default	Description
`fetch_data`	boolean	`true`	Fetch new news and prices
`train_model`	boolean	`true`	Retrain XGBoost model

Daily vs Manual Pipeline:

Daily scheduler runs with train_model=False (uses existing model, refreshes sentiment only)
Manual trigger defaults to train_model=True (retrains model)

Development

Running Tests

cd backend

# All tests
pytest tests/ -v

# Deep learning tests only
pytest tests/deep_learning/ -v

# Screener tests only
pytest tests/screener/ -v

Linting

# Frontend
cd frontend
npm run lint

# Backend (if ruff is installed)
cd backend
ruff check .

Daily Pipeline

The pipeline is triggered daily by external scheduler automation (GitHub Actions cron). Local scheduler mode (SCHEDULER_ENABLED=true) is for development only:

Fetch news from 16 strategic Google News RSS queries
Process news through reproducible pipeline (news_raw → news_processed)
Fetch price data for 17 training symbols via yfinance
Score sentiment via LLM in batches of 20 articles
Compute FinBERT CLS embeddings and store PCA-reduced vectors
Aggregate daily sentiment with time-weighted decay (half-life: 7 days)
Generate XGBoost prediction using existing model (no retraining by default)
Generate TFT-ASRO inference from HF Hub checkpoint (quantile forecasts)
Generate AI commentary via OpenRouter
Cache prediction snapshots
Save pipeline metrics to pipeline_run_metrics table

TFT model training is separate from the daily pipeline and runs on-demand.

Pipeline Monitoring

Each pipeline run records metrics to the database:

SELECT run_id, run_started_at, duration_seconds, 
       symbols_failed, status, symbol_set_name,
       tft_trained, tft_sharpe, tft_directional_accuracy
FROM pipeline_run_metrics 
ORDER BY run_started_at DESC 
LIMIT 10;

Tracked metrics:

duration_seconds: Total pipeline runtime
symbols_requested / symbols_fetched_ok / symbols_failed: Data fetch stats
news_raw_inserted / news_raw_duplicates: News ingestion stats
news_processed_inserted / articles_scored_v2: Processing stats
tft_embeddings_computed: FinBERT embeddings generated
tft_trained / tft_val_loss / tft_sharpe / tft_directional_accuracy: TFT metrics
snapshot_generated / commentary_generated / tft_snapshot_generated: Output flags
status: success/failed
error_message: Error details if failed

Troubleshooting

Database connection refused

Symptom: psycopg2.OperationalError: could not connect to server

Cause: PostgreSQL is not running or DATABASE_URL is incorrect.

Fix: Verify PostgreSQL is running. Check DATABASE_URL format: postgresql://user:password@host:port/database. For Supabase, use the connection pooler URL.

LLM sentiment scoring fails, falling back to FinBERT

Symptom: Logs show LLM sentiment failed, using FinBERT fallback

Cause: OpenRouter API key is missing, invalid, or rate-limited.

Fix: Verify OPENROUTER_API_KEY is set correctly in .env. Check OpenRouter dashboard for rate limit status. The system will continue using FinBERT as a fallback.

Frontend cannot connect to backend

Symptom: Network errors in browser console, data not loading.

Cause: Backend is not running or CORS is blocking requests.

Fix: Ensure backend is running at the expected URL. For local development, verify http://localhost:8000 is accessible. Check that the frontend API base URL matches the backend.

Pipeline runs but predictions are stale

Symptom: generated_at timestamp in /api/analysis does not update.

Cause: Pipeline failed silently or scheduler is disabled.

Fix: Check backend logs for pipeline errors and queue health. Manually trigger pipeline via POST /api/pipeline/trigger to test execution end-to-end.

Invalid or missing target_type error

Symptom: /api/analysis returns null, logs show Invalid or missing target_type in model metadata

Cause: Model was trained before target_type field was added.

Fix: Retrain the model via POST /api/pipeline/trigger?train_model=true. New models include target_type: "simple_return" in metadata.

Feature mismatch error during prediction

Symptom: ValueError: feature_names mismatch in logs

Cause: Inference is using different symbols than training.

Fix: Ensure both training and inference use the same symbol set. The system automatically aligns features via reindex(columns=expected, fill_value=0).

TFT model unavailable or returns null

Symptom: /api/analysis/tft/HG=F returns null or {"status": "unavailable"}

Cause: No trained TFT checkpoint exists locally or on HuggingFace Hub.

Fix: Train the TFT model via python -m deep_learning.training.trainer --symbol HG=F. Ensure HF_TOKEN is set if using HF Hub persistence. The predictor will automatically download the checkpoint from Hub on first inference if available.

TFT predictions are flat or near-zero

Symptom: All quantile predictions cluster around zero with minimal spread.

Cause: Variance ratio collapse (VR < 0.5). The model learned to minimize loss by predicting near-constant values.

Fix: Run Optuna hyperparameter search with python -m deep_learning.training.hyperopt --n-trials 50. The search space includes variance ratio penalty to prevent this failure mode. Check that lambda_vol >= 0.25 and learning_rate <= 1e-3.

Contributing

Fork the repository
Create a feature branch: git checkout -b feature/your-feature
Make changes and add tests if applicable
Run linting and tests locally before committing
Commit with descriptive messages: git commit -m "Add feature X"
Push to your fork: git push origin feature/your-feature
Open a Pull Request with a clear description of changes

Issues and feature requests are welcome via GitHub Issues.

Security

Credential Masking

The system masks sensitive credentials in logs:

Database passwords are replaced with ***:*** in connection URLs
API keys in httpx request logs are suppressed
Error messages containing credentials are sanitized

Privileged Endpoint: POST /api/pipeline/trigger

This endpoint triggers the full ML pipeline, which fetches news, calls the LLM API for sentiment scoring, computes FinBERT embeddings, retrains the XGBoost model, runs TFT-ASRO inference, and generates AI commentary. Unauthenticated access to this endpoint creates the following risks:

Request flooding: Repeated triggers can degrade service availability.
Quota and cost burn: Each pipeline run consumes OpenRouter API quota. Uncontrolled access can exhaust free-tier limits or incur costs.
Resource exhaustion: Model training and batch LLM calls are CPU and memory intensive.

Authentication: This endpoint requires a valid Authorization: Bearer <PIPELINE_TRIGGER_SECRET> header. Requests without a valid token receive 401 Unauthorized.

Configuration

Set PIPELINE_TRIGGER_SECRET in your .env file. Use a random string of 32 or more characters. Store this as a secret in your deployment platform.
Rotate the secret immediately if it is ever exposed in logs, commits, or third-party systems.

Abuse Prevention (Recommended Controls)

Even with authentication, apply additional safeguards:

Rate limiting: Limit to 5-10 requests per minute per IP or per token. Use a reverse proxy (nginx, Caddy) or middleware.
Idempotency lock: The endpoint already checks if the pipeline is running and returns 409 Conflict. Ensure this lock file mechanism is reliable across restarts.
Network restriction: If the endpoint is only used by internal automation (CI/CD, cron jobs), restrict access via IP allowlist, VPN, or deploy the backend on an internal network.
Monitoring and alerting: Log all calls to this endpoint. Alert on spikes in 401 responses, sudden trigger volume, or upstream API quota warnings.

Secret Management

Do not commit secrets to version control.
Use your deployment platform's secret management (HuggingFace Spaces secrets, Vercel environment variables, etc.).
The env.example file should include PIPELINE_TRIGGER_SECRET= with no value as a placeholder.

License

MIT License - see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 300 Commits
.github/workflows		.github/workflows
backend		backend
data/models		data/models
docs		docs
frontend		frontend
notebooks		notebooks
scripts		scripts
supabase/migrations		supabase/migrations
.gitignore		.gitignore
Bakır Fiyat Tahmini_ XGBoost ve Alternatifler.md		Bakır Fiyat Tahmini_ XGBoost ve Alternatifler.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
env.example		env.example
package-lock.json		package-lock.json

Folders and files

Latest commit

History

Repository files navigation

Terra Rara

Table of Contents

Overview

Features

Architecture

Symbol Sets

How Symbols Are Found

Champion / Challenger Testing

Current Active Set (17 symbols)

Dashboard Symbols (14) - Separate List

Switching Symbol Sets

Audit Trail

Model Details

Target Definition

Price Calculation

XGBoost Parameters

Feature Count

TFT-ASRO Deep Learning

Model Architecture

Quantile Output

ASRO Loss Function

TFT Feature Categories

Training and Optimization

Financial Metrics

Model Persistence

Directory Structure

Screener Module

CLI Usage

Dual Fingerprint System

Key Features

Getting Started

Prerequisites

Quickstart

Installation

Backend

Frontend

Docker (Alternative)

Configuration

Usage

Dashboard

API Endpoints

API Reference

GET /api/analysis

GET /api/commentary

GET /api/market-prices

GET /api/history

GET /api/health

GET /api/analysis/tft/

POST /api/pipeline/trigger (Privileged)

Development

Running Tests

Linting

Daily Pipeline

Pipeline Monitoring

Troubleshooting

Database connection refused

LLM sentiment scoring fails, falling back to FinBERT

Frontend cannot connect to backend

Pipeline runs but predictions are stale

Invalid or missing target_type error

Feature mismatch error during prediction

TFT model unavailable or returns null

TFT predictions are flat or near-zero

Contributing

Security

Credential Masking

Privileged Endpoint: POST /api/pipeline/trigger

Configuration

Abuse Prevention (Recommended Controls)

Secret Management

License

About

Resources

License

Uh oh!

Stars

Packages