Multi-frequency (5m / 15m / 1h / 1d) regime-detection experiment on the same event window.
Structure aligned with Regime Labels Are Not Representation-Invariant:
- `run.py`: main entry (ensures 5m data via IB if missing; then runs the multi-frequency pipeline). Accepts optional CLI overrides.
- `config.yaml`: `raw_dir`, `outputs_dir`, `assets`, `episode`, and `ib` (host, port, dates, future_expiry).
- `data/`: raw 5m CSVs (`<symbol>_5m.csv`).
- `outputs/`: generated results tables and plots.
- `src/`: code: `data/data_ib.py` (IB download), `workflows/pipeline.py` (analysis), `commands/cli_main.py` (CLI dispatcher).
- `paper/`: LaTeX manuscript sources (`main.tex`, `refs.bib`); build with `paper/build_paper.bat`.
pip install -r requirements.txt
python run.py

- Default: ensure 5m data (download via IB if any asset is missing), then run the pipeline and write outputs under `outputs/`.
- `--download`: only download 5m data via IB and exit (useful when the downloaded data will feed several later steps: pipeline, report, etc.).
./reproduce_paper.sh   # regenerate every paper Table / Figure under outputs/

`reproduce_paper.sh` calls each top-level CLI command in order; the Experiments section below documents what every sub-experiment does, the method, the headline result, and the output files. Sub-experiments of the extended sweep are also individually CLI-registered so reviewers can re-run a single one (e.g. `python run.py extended_bootstrap`).
The notebooks/ directory is the exploratory view only. Each notebook reads from outputs/*.csv already produced by the CLI and renders figures / sanity checks for that single experiment. They do not regenerate the underlying CSVs and are not part of the reproduction pipeline. For paper-grade replication always run python run.py … (or ./reproduce_paper.sh); use the notebooks only to inspect intermediate artefacts after a CLI run.
The pipeline is layered. Tier 1 produces the headline numbers; Tier 2 supplies DGP baselines and parameter-sensitivity sweeps that contextualise them; Tier 3 is post-processing paper artefacts (referee-response material lives here too); Tier 4 is dev helpers not in the reproduce chain. The extended orchestrator (python run.py extended) runs every Tier 2/3 sub-experiment in sequence; each is also independently CLI-registered so reviewers can re-run a single one. The 2022 OOS episode shares the same commands; outputs land in outputs_2022/ when data_2022/ is present.
Headline result the layers support: empirical mean off-diagonal cross-frequency ARI is 0.11–0.39 across the current 2026 panel (0.13–0.15 in 2022 OOS), with a structural break at the 15m→1h boundary; the calibrated K=2 Markov-switching baseline is 0.113 (IQR 0.099–0.125). All five 2026 assets exceed the intraday-only calibrated upper bound (0.195 IQR [0.177, 0.213]) by +0.03 (USD/JPY) to +0.24 (CL).
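The "mean off-diagonal ARI" quoted above is the average of every entry of the per-asset ARI matrix except the diagonal. A minimal sketch, assuming the matrix is held as a plain list of lists; the function name and the example values are hypothetical and are not taken from the repository or its outputs:

```python
def mean_off_diagonal(matrix):
    """Average all entries of a square matrix except those on the diagonal."""
    n = len(matrix)
    if n < 2 or any(len(row) != n for row in matrix):
        raise ValueError("expected a square matrix with at least two rows")
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Made-up 4x4 ARI matrix, ordered 5m / 15m / 1h / 1d (illustrative values only).
example_ari = [
    [1.00, 0.60, 0.20, 0.10],
    [0.60, 1.00, 0.25, 0.15],
    [0.20, 0.25, 1.00, 0.30],
    [0.10, 0.15, 0.30, 1.00],
]
example_mean = mean_off_diagonal(example_ari)
```

Because the ARI matrix is symmetric, averaging the six upper-triangle entries gives the same number.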
| Command | Purpose | Method | Headline result | Outputs |
|---|---|---|---|---|
| `pipeline` | Produce the cross-frequency ARI/AMI/VI tables, GMM diagnostics, regime timelines, expanding-window estimates, block-permutation p-values, calendar-window robustness, HMM parallel, time-of-day adjustment, and rolling-7d trace for one episode. | Independent two-component GMMs on log(rolling RV) at 5m / 15m / 1h / 1d; align via forward-fill; ARI on every off-diagonal pair; default block size 50 5m bars for the permutation test. | 4×4 ARI matrices per asset; mean off-diag 0.11–0.39 (2026, five assets), 0.13–0.15 (2022 OOS); 15m→1h drop is the largest single-step fall in four of five 2026 assets; block-perm p < 0.002 at every block size in [25, 250]. Paper: Tables 1–2, Fig. 1, Tables A.1–A.5, A.8–A.10, A.13, A.14. | `outputs/<asset>_*.csv` (≈25 files/asset), `<asset>_timeline.png`, `pipeline_summary.csv`, `run.log` |
For the 2022 OOS replica run: python run.py pipeline --episode 2022_ukraine --raw-dir data_2022 --outputs-dir outputs_2022.
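The alignment and scoring step in the Method column (forward-fill the coarser labels onto the finer grid, then compute ARI) can be sketched in pure Python. This is an illustration under stated assumptions, not the code in `workflows/pipeline.py`: the function names are hypothetical, and each coarse label is assumed to apply to the fine bars that fall inside its bar.

```python
from collections import Counter
from math import comb


def forward_fill(coarse_labels, bars_per_coarse, n_fine):
    """Repeat each coarse-bar label over the fine bars it spans (assumed convention)."""
    filled = []
    for label in coarse_labels:
        filled.extend([label] * bars_per_coarse)
    return filled[:n_fine]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same observations."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two observations")
    # Pair counts within contingency cells and within each marginal cluster.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put everything in one cluster).
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy labels: six 5m bars against two 15m bars (three 5m bars each).
fine_5m = [0, 0, 1, 1, 0, 0]
coarse_15m = [0, 1]
aligned_15m = forward_fill(coarse_15m, bars_per_coarse=3, n_fine=len(fine_5m))
ari_5m_15m = adjusted_rand_index(fine_5m, aligned_15m)
```

ARI is invariant to relabeling, so it does not matter which GMM component is called "crisis" at each frequency. Values near zero (or below) indicate agreement no better than chance.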
| Command | Purpose | Method | Headline result | Outputs |
|---|---|---|---|---|
| `extended_simulation` | Establish the K=2 calibrated reference ARI under a known regime-switching DGP. | Two-state Markov-Gaussian DGP at calibrated and three sensitivity persistence settings (P=0.005 / 0.003 / 0.001, durations ≈3.3 / 5.5 / 17 h); 200 reps × 100k 1-min bars; aggregate to 5m / 15m / 1h / 1d; fit GMMs; compute null (permuted) and alternative ARI. | Calibrated baseline all-4 ARI: 0.113 (IQR [0.099, 0.125]); intraday-only baseline 0.195 (IQR [0.177, 0.213]); the empirical range is 0.11–0.39 against a calibrated mean of 0.113. Paper: §app:rss_sim. | `outputs/simulation_rss.csv`, `simulation_detection_rate.csv` |
| `extended_garch_ms` | Rule out i.i.d. Gaussian assumption as the load-bearing baseline driver. | Replace within-regime innovations with GARCH(1,1) (α=0.05, β=0.90); same Markov chain, same 200×100k spec. | Baseline tracks Gaussian within 0.02 (all-4) / 0.04 (intraday); volatility clustering is not load-bearing. Paper: §app:garchms_sim. | `outputs/simulation_rss_garchms.csv` |
| `extended_k3_baseline` | Test whether K=3 collapses the empirical-vs-baseline gap. | Refit K=3 GMM on the same 1-min paths, binarise via argmax-of-means; 500 reps; same persistence sweep. | K=3 baseline 0.25–0.33; empirical K=3 panel mean 0.21 still undershoots by 0.04–0.12 (vs K=2 gap 0.14–0.25); CL 2026 exceeds the K=3 upper bound (third state absorbs supply-shock bursts). Paper: Suppl Table tab:k3_baseline, §app:k3_baseline. | `outputs/simulation_rss_k3.csv` |
| `extended_asym_baseline` | Address simulated R2 M2: symmetric P₀₁=P₁₀ fixes the stationary crisis share at 50%, but empirical 1h crisis shares span 16.3%–70.0%. | Per-asset DGP with τ=P₀₁+P₁₀ fixed and stationary share π matched to the empirical 1h share; 200 reps × 20k bars. | Per-asset baseline 0.111–0.134 (within or just above the symmetric ML-fit IQR). Per-asset 4-frequency gap: SPY +0.07, QQQ +0.04, USD/JPY -0.03, CL +0.28, GLD +0.04. Cross-frequency ARI is governed by 1/τ, not by π; relaxing symmetry does not absorb the empirical-vs-calibrated shortfall. Paper: Suppl Table tab:asym_baseline, §app:asym_baseline. | `outputs/simulation_rss_asym_persistence_1h_anchor.csv`, `outputs/simulation_rss_asym_persistence_1d_anchor.csv` |
| `extended_kxw_sweep` | Bound the headline ARI under parameter perturbations of K and rolling-volatility window length. | Wraps `pipeline.run_robustness` with K ∈ {2,3} × window-scale ∈ {0.5×, 1×, 2×} for the five-asset 2026 panel and three-asset 2022 OOS panel. | 15m→1h structural break preserved at every cell; ARI envelope and worst-case bounds reported. Paper: Suppl Tables A.6 tab:robustness, A.7 tab:robustness_extremes, A.11 tab:window_sweep. | `outputs/robustness_summary.csv`, `robustness_ranges.csv`, `robustness_report.md` (and `outputs_2022/` mirrors) |
| `extended_block_sweep_gld` | Fill the GLD row of the block-size sweep table (other four rows ship with the main pipeline at default block size 50). | Block-permutation p-value on GLD aligned-regime labels at block sizes {25, 50, 100, 250} 5m bars. | Closes the 4×4 sweep grid that the pipeline starts. Paper: Suppl Table A.13 tab:block_sweep (GLD row). | `outputs/gld_block_sweep.csv` |
| `extended_block_sweep_assets` | Fill the SPY/QQQ/CL/USD-JPY rows of the block-size sweep table at all four block sizes. | Block-permutation p-value on each asset's aligned-regime labels at block sizes {25, 50, 100, 250} 5m bars; seed=42, n_perm=500. | Completes the 5-asset × 4-block-size grid alongside `extended_block_sweep_gld`. Paper: Suppl Table A.13 tab:block_sweep (SPY/QQQ/CL/USD-JPY rows). | `outputs/block_sweep_assets.csv` |
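The two-state chain behind the simulation rows can be parameterised from the quantities the table quotes. The sketch below is a hedged reading, not the repository's DGP code: it assumes P is a per-1-minute-bar switching probability, that τ is the sum of the calm→crisis and crisis→calm probabilities, and that π is the stationary crisis share. All function names are hypothetical.

```python
import random


def transition_probs(tau, crisis_share):
    """Split tau into two switching probabilities whose stationary crisis share is crisis_share."""
    p_calm_to_crisis = crisis_share * tau
    p_crisis_to_calm = (1.0 - crisis_share) * tau
    return p_calm_to_crisis, p_crisis_to_calm


def stationary_crisis_share(p_calm_to_crisis, p_crisis_to_calm):
    """Long-run fraction of bars spent in the crisis state."""
    return p_calm_to_crisis / (p_calm_to_crisis + p_crisis_to_calm)


def expected_duration_hours(p_exit, bar_minutes=1):
    """Mean regime length: geometric waiting time of 1 / p_exit bars, converted to hours."""
    return bar_minutes / p_exit / 60.0


def simulate_states(n_bars, p_calm_to_crisis, p_crisis_to_calm, seed=0):
    """Simulate a 0/1 (calm/crisis) state path, starting in calm."""
    rng = random.Random(seed)
    state, path = 0, []
    for _ in range(n_bars):
        path.append(state)
        p_switch = p_calm_to_crisis if state == 0 else p_crisis_to_calm
        if rng.random() < p_switch:
            state = 1 - state
    return path


# Symmetric case: equal switching probabilities give a 50% crisis share.
sym_up, sym_down = transition_probs(tau=0.01, crisis_share=0.5)
# Asymmetric case: same tau, crisis share matched to 16.3% (the low end quoted above).
asym_up, asym_down = transition_probs(tau=0.01, crisis_share=0.163)
durations_h = [expected_duration_hours(p) for p in (0.005, 0.003, 0.001)]
sample_path = simulate_states(1000, asym_up, asym_down, seed=42)
```

Under these assumptions the three persistence settings give mean regime lengths of about 3.3, 5.6 and 16.7 hours, in line with the ≈3.3 / 5.5 / 17 h quoted in the table.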
Tier 3 commands consume main-pipeline outputs (or Tier-2 simulation outputs). Round 1 referee responses live here next to their peers; grouping by function avoids splitting related commands across per-round buckets.
| Command | Purpose | Method | Headline result | Outputs |
|---|---|---|---|---|
| `extended_majority_vote` | Symmetric counterpart to the forward-fill baseline (does the empirical ARI level depend on aggregation direction?). | Fine→coarse majority-vote upward aggregation; ARI against the coarser-grid GMM labels. | Combined-episode range 0.09–0.42, comparable to forward-fill (0.11–0.39); aggregation direction does not change the empirical level. Paper: Suppl Table A.15 tab:majority_vote. | `outputs/<asset>_majority_vote_cross_freq_ari.csv`, `majority_vote_summary.csv` |
| `extended_bootstrap` | Quantify sampling variability of the mean off-diag ARI; test whether a specific calm/stress 5-day window is unusual. | 1,000 random 5-day windows drawn uniformly from the calendar; GMM fitted once on full sample. | Across 16 calm/stress vs bootstrap comparisons (8 asset×episode), p ∈ [0.08, 0.99], median p = 0.59; only SPY 2026 stress drops below 0.10. Calm and stress windows are not statistically unusual draws. Paper: Suppl Table A.16 tab:bootstrap_5d. | `outputs/bootstrap_five_day_windows.csv` |
| `extended_hypothesis_tests` | Test whether the cross-resolution ARI level differs by frequency-pair category. | KW + Mann–Whitney U on pooled off-diag ARI grouped into adjacent intraday / non-adjacent intraday / intraday-daily. | Significant difference across categories driven by the 15m→1h break; consistent with the structural-pair claim. Paper: Suppl §S.3 prose. | `outputs/hypothesis_tests.csv` |
| `extended_calm_subsample` | Referee R1 Q7: do results hold under "normal" market conditions? | Subset to days with daily RV ≤ in-sample median and outside the peak-stress window; recompute ARI with full-sample GMM boundaries fixed. | Calm-only ARI: 0.05–0.11 (2026, five assets), 0.05–0.12 (2022 OOS, 3 assets); cross-resolution shape (5m–15m high, 15m→1h drop, 1h–1d low) preserved. The 15m→1h break is not a stress artefact. Paper: Suppl Table A.17 tab:calm_day_subsample, §3.4. | `outputs/calm_day_subsample_ari.csv` |
| `extended_var_uplift` | Referee R1 Q9: does the dissonance signal carry economic value? | Per (asset, regime) cell 99% empirical-VaR from 5m log-returns under the 1h and 1d classifiers; conservative-resolution rule posts max(\|VaR_1h\|, \|VaR_1d\|). | 2026 always-conservative uplift over 1d baseline: 4.2% (QQQ) – 21.0% (CL); 2022 OOS: 10.0%–12.7%. Disagreement-day-only ratio gives per-asset risk-budget tightening factors (e.g. SPY 1.33× → tighten by 0.75). Paper: Suppl Table A.18 tab:var_uplift, §4. | `outputs/var_uplift_by_resolution.csv` |
| `extended_disagree_config` | Caption-level precision footnote for the VaR-uplift table. | Per (asset, episode), count disagreement days falling into each of the two configurations: 1h-crisis/1d-calm vs 1h-calm/1d-crisis. | Quantifies whether the "single-configuration concentration" framing is exact or approximate. Paper: Suppl Table tab:var_uplift notes. | `outputs/disagree_config_breakdown.csv` |
| `stress_vs_calm` | Referee R1 Q7 follow-up: paired test of stress-window vs calm-window ARI. | Consume bootstrap CSVs from both episodes; pool 7 asset×episode pairs; paired t / Wilcoxon / sign tests on the difference. | Mean diff +0.014 (σ 0.060); paired t p = 0.54; Wilcoxon p = 0.58; sign p = 1.00; 4/7 pairs stress > calm. Cannot reject equality. Paper: Suppl Table A.19 tab:stress_vs_calm. | `outputs/stress_vs_calm_test.csv`, `stress_vs_calm_test_summary.csv`, `stress_vs_calm_test.txt` |
| `cross_asset` | Cross-asset rolling-ARI resonance figure for the supplement. | Per-day mean off-diag ARI across the current 2026 panel; rolling 7-day trace. | Shows synchronised dissonance episodes across the 2026 panel. Paper: Supplementary figure. | `outputs/cross_asset_resonance.png` |
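The fine→coarse majority-vote aggregation used by `extended_majority_vote` runs in the opposite direction to forward-fill: each coarse bar takes the most common label among the fine bars it contains. A minimal sketch, assuming fixed-size blocks and an arbitrary tie-break toward the smallest label; the function name and the tie rule are assumptions, not the repository's implementation:

```python
from collections import Counter


def majority_vote(fine_labels, bars_per_coarse):
    """Collapse fine-grid labels into one label per coarse bar by majority vote."""
    if bars_per_coarse < 1:
        raise ValueError("bars_per_coarse must be positive")
    coarse = []
    for start in range(0, len(fine_labels), bars_per_coarse):
        block = fine_labels[start:start + bars_per_coarse]
        counts = Counter(block)
        top = max(counts.values())
        # Tie-break toward the smallest label (assumed convention for this sketch).
        coarse.append(min(label for label, count in counts.items() if count == top))
    return coarse


# Nine 5m labels aggregated to three 15m labels (three 5m bars per 15m bar).
labels_5m = [0, 0, 1, 1, 1, 0, 1, 0, 1]
labels_15m_voted = majority_vote(labels_5m, bars_per_coarse=3)
```

The voted labels can then be scored against the coarser-grid GMM labels with ARI, exactly as in the forward-fill case.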
| Command | Purpose | Outputs |
|---|---|---|
| `summarize` | Console pivot of mean off-diag ARI by (asset, model, window) for quick eyeballing during development. | stdout only |
Event window: the outbreak window is kept to roughly 5–10 trading days to limit 5m data volume. The default in `config.yaml` spans about two calendar weeks (e.g. 2026-01-06 to 2026-01-20); T0 can be fixed from volatility or news.
Override dates/symbols/port (with or without --download):
python run.py --download --start 2026-01-06 --end 2026-01-20 --symbols SPY USDJPY CL
python run.py --download --port 7497 --raw-dir data

IB prerequisite: Gateway (or TWS) running; API connected, Historical Data Farm ON. Default: 127.0.0.1:4002 (Gateway paper). TWS paper = 7497.
Output: raw_dir/<symbol>_5m.csv with Date (America/New_York), Open, High, Low, Close, Volume.
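For readers who want to see how the override flags in the commands above fit together, here is a hypothetical `argparse` sketch. It is not the code in `commands/cli_main.py`; it only mirrors flags that appear in this README (`--download`, `--start`, `--end`, `--symbols`, `--port`, `--raw-dir`, `--outputs-dir`, `--episode`) plus an optional positional command, and it assumes unset flags fall back to `config.yaml`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="run.py")
    # Optional sub-experiment name, e.g. "pipeline" or "extended_bootstrap".
    parser.add_argument("command", nargs="?", default=None)
    parser.add_argument("--download", action="store_true",
                        help="only download 5m data via IB and exit")
    parser.add_argument("--start", help="start date, YYYY-MM-DD")
    parser.add_argument("--end", help="end date, YYYY-MM-DD")
    parser.add_argument("--symbols", nargs="+", help="asset symbols")
    parser.add_argument("--port", type=int, help="IB API port")
    parser.add_argument("--raw-dir", help="directory holding raw 5m CSVs")
    parser.add_argument("--outputs-dir", help="directory for results")
    parser.add_argument("--episode", help="episode key")
    return parser


# Parse the two example invocations from this README (None means "use config.yaml").
download_args = build_parser().parse_args(
    ["--download", "--start", "2026-01-06", "--end", "2026-01-20",
     "--symbols", "SPY", "USDJPY", "CL"]
)
oos_args = build_parser().parse_args(
    ["pipeline", "--episode", "2022_ukraine",
     "--raw-dir", "data_2022", "--outputs-dir", "outputs_2022"]
)
```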
- `raw_dir`, `outputs_dir`: same as Representation-Invariant.
- `assets`: list of symbols (SPY, USDJPY, CL).
- `episode`: explicit event/calm window set used by analysis (`2026_iran` or `2022_ukraine`).
- `ib`: `host`, `port`, `client_id`, `start_date`, `end_date`, `future_expiry_by_symbol` (e.g. `CL: "CONTFUT"` for IB continuous front-month futures).
- CL (WTI): Recommended setting is `ib.future_expiry_by_symbol.CL: "CONTFUT"` so IB returns continuous front-month futures rather than one fixed far-month contract.
- Timezone: All bars are written in America/New_York.
- Bar count per day: SPY ≈78, CL ≈276, USDJPY 288; do not use a global fixed bar count when building rolling features.
- Paper figure sync: regenerate plots into `outputs/` and copy `outputs/SPY_timeline.png`, `outputs/QQQ_timeline.png`, `outputs/CL_timeline.png`, `outputs/USDJPY_timeline.png`, and `outputs/GLD_timeline.png` to the corresponding `paper/<asset>_timeline.png` paths before building the manuscript.
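The bar-count note above implies that a rolling window expressed in days must be converted to bars per asset. A minimal sketch of that conversion, using the approximate 5m bar counts from the note; the function names and the rolling realized-variance helper are illustrative assumptions, not the pipeline's feature code:

```python
# Approximate 5m bars per trading day, taken from the note above.
BARS_PER_DAY_5M = {"SPY": 78, "CL": 276, "USDJPY": 288}


def window_in_bars(symbol, days, bars_per_day=BARS_PER_DAY_5M):
    """Convert a window length in trading days to a per-asset number of 5m bars."""
    if symbol not in bars_per_day:
        raise KeyError(f"no bar count configured for {symbol!r}")
    return max(1, round(days * bars_per_day[symbol]))


def rolling_realized_variance(log_returns, window):
    """Sum of squared log-returns over each full trailing window of `window` bars."""
    return [
        sum(r * r for r in log_returns[end - window:end])
        for end in range(window, len(log_returns) + 1)
    ]


# A one-day window covers very different bar counts across assets.
one_day_windows = {symbol: window_in_bars(symbol, 1) for symbol in BARS_PER_DAY_5M}
toy_rv = rolling_realized_variance([0.1, -0.1, 0.2], window=2)
```

Using a single fixed count such as 78 for every asset would make a "one-day" window cover only about a quarter of a USDJPY session, which is why the per-asset lookup matters.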
MIT. See LICENSE.
If you use this repository, please cite the associated paper and/or this codebase. See CITATION.cff.