feat(kalshi): add ohlcv_hourly model#9551

Merged
jeff-dude merged 45 commits into main from feat/kalshi-ohlcv-hourly
Apr 17, 2026

Conversation

@los-xyz
Contributor

@los-xyz los-xyz commented Apr 10, 2026

Summary

Adds kalshi.ohlcv_hourly — hourly OHLCV candles for Kalshi prediction markets. Depends on #9549 (market_details + market_trades).

What it does

  • Sparse OHLCV aggregation from kalshi_market_trades with deterministic ordering (created_time, trade_id)
  • Bounded hour spine per market via utils.hours, forward-fill via ASOF LEFT JOIN
  • Resolution correction: close price set to 1.0/0.0 for settled markets past expiration
  • is_forward_filled flag, VWAP null on no-trade hours
  • Category extracted from product_metadata JSON
  • Full history (2021+)
  • Output schema aligned with polymarket_polygon.ohlcv_hourly for cross-venue consistency (BlackRock trial)
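
The aggregation and forward-fill steps listed above can be sketched in plain Python. This is only an illustration of the logic, not the model's SQL: the function names are hypothetical, and it assumes that a forward-filled hour carries the previous close in all four price fields with zero volume, which the PR description does not spell out.

```python
from datetime import datetime, timedelta


def floor_hour(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def sparse_ohlcv(trades):
    """Aggregate (created_time, trade_id, price, volume) tuples into hourly candles.

    Only hours with at least one trade get a candle (the "sparse" step).
    """
    candles = {}
    # Deterministic ordering: ties on created_time are broken by trade_id.
    for ts, _trade_id, price, volume in sorted(trades, key=lambda t: (t[0], t[1])):
        hour = floor_hour(ts)
        c = candles.get(hour)
        if c is None:
            candles[hour] = {
                "open": price, "high": price, "low": price, "close": price,
                "volume": volume, "notional": price * volume,
            }
        else:
            c["high"] = max(c["high"], price)
            c["low"] = min(c["low"], price)
            c["close"] = price
            c["volume"] += volume
            c["notional"] += price * volume
    for c in candles.values():
        notional = c.pop("notional")
        c["vwap"] = notional / c["volume"] if c["volume"] else None
    return candles


def forward_fill(candles, end_hour):
    """Expand sparse candles into one row per hour, from the first trade hour to end_hour."""
    if not candles:
        return []
    rows = []
    hour = min(candles)
    last_close = None
    while hour <= end_hour:
        c = candles.get(hour)
        if c is not None:
            last_close = c["close"]
            rows.append({"hour": hour, **c, "is_forward_filled": False})
        else:
            # Assumption: a no-trade hour repeats the last close; VWAP stays null.
            rows.append({
                "hour": hour, "open": last_close, "high": last_close,
                "low": last_close, "close": last_close,
                "volume": 0, "vwap": None, "is_forward_filled": True,
            })
        hour += timedelta(hours=1)
    return rows
```

Sorting on the pair (created_time, trade_id) is what makes open and close reproducible when two trades share a timestamp.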

Kalshi vs Polymarket OHLCV differences

  • Kalshi has no condition_id/token_outcome — uses ticker as market ID
  • outcome is always 'Yes' (each Kalshi ticker is a single market; No price = 1 - Yes price)
  • Ordered by created_time + trade_id (off-chain, no block_time/evt_index)
  • Resolution via result field + expiration_time (vs Polymarket's market_outcome + market_end_time)
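
The single-outcome convention above means a No price never needs its own row. A minimal sketch of the relationship, using a hypothetical helper and `Decimal` to avoid floating-point noise:

```python
from decimal import Decimal


def no_price(yes_price: Decimal) -> Decimal:
    """Derive the No price from the Yes price (No = 1 - Yes)."""
    if not (Decimal("0") <= yes_price <= Decimal("1")):
        raise ValueError("yes_price must be between 0 and 1 dollars")
    return Decimal("1") - yes_price
```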

Test plan

  • dbt compile passes
  • CI builds and tests pass
  • Unique on (hour, market_id, outcome)
  • Spot-check OHLCV against raw trades for sample markets
  • Verify forward-fill and resolution correction

🤖 Generated with Claude Code

los-xyz and others added 6 commits April 10, 2026 11:32
Add two new Kalshi prediction market spells built from API bronze tables:

- kalshi.market_details: market reference table joining markets_0003
  with event metadata from market_details_0003. Filtered to markets
  with >= 100 contracts traded (6.5M of 39.8M markets, 99.7% of volume).
  Drops 12 universally null/constant columns (55 → 43).

- kalshi.market_trades: trade-level view enriched with market metadata
  via inner join to market_details, filtering out dust market trades.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename sources from _0003 to _raw (market_trades_raw, markets_raw, market_details_raw)
- Update contributor from dpettas to allelosi

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hourly OHLCV candles for Kalshi prediction markets:
- Built on yes_price_dollars from kalshi_market_trades
- Forward-fills no-trade hours via ASOF join on utils.hours spine
- Resolution-corrects close prices for settled markets (1.0/0.0)
- Extracts category from product_metadata JSON
- Full history coverage (2021+)
- Aligned output schema with polymarket_polygon.ohlcv_hourly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the labels WIP (work in progress) and dbt: daily (covers the Daily dbt subproject) on Apr 10, 2026
los-xyz and others added 5 commits April 10, 2026 13:45
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in details

- market_trades: add amount_usd (yes_price_dollars * count_fp) and _updated_at
- market_details: extract category from product_metadata JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@los-xyz los-xyz requested review from jeff-dude April 10, 2026 12:21
@jeff-dude
Member

Pushed some changes to get the *ohlcv_hourly table to match the Polymarket one, assuming our goal for now is to stay in sync across the two main providers. It is a static table of 2025 data only, to give a sample prior to the full release later. Also removed the data explorer display for now.

We are also blocked here until we finalize the other PR, which builds the upstream Kalshi tables. They also exist in this PR, but not with the latest changes.

@los-xyz los-xyz marked this pull request as ready for review April 12, 2026 13:21
@cursor

cursor Bot commented Apr 12, 2026

PR Summary

Medium Risk
Changes Polymarket’s OHLCV model materialization and incremental logic, which can affect rebuild behavior and historical candle outputs (especially around window boundaries and settlement correction).

Overview
Adds a new incremental model kalshi.ohlcv_hourly that builds hourly OHLCV candles from kalshi_market_trades, forward-fills no-trade hours via an hour spine + ASOF join, and overwrites all OHLC fields to terminal 1.0/0.0 after market expiration when the market is settled; includes market metadata fields, is_forward_filled, _updated_at, and uniqueness tests on (block_month, hour, market_id, outcome).

Refactors polymarket_polygon.ohlcv_hourly from a static table into an incremental merge partitioned by block_month, adds a prior-window sparse anchor to keep forward-fill correct across incremental boundaries, applies settlement correction to all OHLC fields (not just close), and updates schema docs/tests accordingly (including block_month and _updated_at).

Reviewed by Cursor Bugbot for commit 72b2402.

@github-actions github-actions Bot added the label ready-for-review (this PR development is complete, please review) and removed the label WIP (work in progress) on Apr 12, 2026

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Resolution correction makes close exceed high/low bounds
    • Applied resolution correction to open, high, and low in addition to close to maintain the OHLCV invariant for post-expiration settled markets.

Or push these changes by commenting:

@cursor push 4216e8e531
Preview (4216e8e531)
diff --git a/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql b/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql
--- a/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql
+++ b/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql
@@ -116,9 +116,6 @@
 		f.ticker,
 		f.market_name,
 		f.event_ticker,
-		f.open,
-		f.high,
-		f.low,
 		case
 			when m.expiration_time is not null
 				and f.hour > m.expiration_time
@@ -127,6 +124,42 @@
 				case
 					when m.result = 'yes' then 1.0
 					when m.result = 'no' then 0.0
+					else f.open
+				end
+			else f.open
+		end as open,
+		case
+			when m.expiration_time is not null
+				and f.hour > m.expiration_time
+				and m.result in ('yes', 'no')
+			then
+				case
+					when m.result = 'yes' then 1.0
+					when m.result = 'no' then 0.0
+					else f.high
+				end
+			else f.high
+		end as high,
+		case
+			when m.expiration_time is not null
+				and f.hour > m.expiration_time
+				and m.result in ('yes', 'no')
+			then
+				case
+					when m.result = 'yes' then 1.0
+					when m.result = 'no' then 0.0
+					else f.low
+				end
+			else f.low
+		end as low,
+		case
+			when m.expiration_time is not null
+				and f.hour > m.expiration_time
+				and m.result in ('yes', 'no')
+			then
+				case
+					when m.result = 'yes' then 1.0
+					when m.result = 'no' then 0.0
 					else f.close
 				end
 			else f.close

Comment thread dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql Outdated
Comment thread dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql Outdated
los-xyz and others added 2 commits April 12, 2026 18:07
Adds QUALIFY ROW_NUMBER() to keep only the latest row per event_ticker
from market_details_raw, preventing potential duplicate ticker rows if
the raw source ever contains multiple snapshots per event.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
los-xyz and others added 7 commits April 12, 2026 21:03
QUALIFY is not supported in Trino/DuneSQL. Rewrote event_details
deduplication as a subquery with ROW_NUMBER() + WHERE rn = 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
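
The ROW_NUMBER() + WHERE rn = 1 pattern described above keeps one row per key. A rough Python equivalent of that dedupe is sketched here; the `snapshot_ts` ordering column is hypothetical, since the commit message does not say which column decides which row is "latest".

```python
def latest_per_key(rows, key, order_by):
    """Keep only the row with the greatest order_by value for each key.

    Mirrors: row_number() over (partition by key order by order_by desc) = 1.
    """
    best = {}
    for row in rows:
        k = row[key]
        if k not in best or row[order_by] > best[k][order_by]:
            best[k] = row
    return list(best.values())


# Hypothetical snapshots of event metadata; column names are illustrative only.
snapshots = [
    {"event_ticker": "EVT-A", "snapshot_ts": 1, "title": "old"},
    {"event_ticker": "EVT-A", "snapshot_ts": 2, "title": "new"},
    {"event_ticker": "EVT-B", "snapshot_ts": 1, "title": "only"},
]
deduped = latest_per_key(snapshots, key="event_ticker", order_by="snapshot_ts")
```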
- Replace JSON extraction `try(json_extract_scalar(product_metadata, '$.category'))`
  with the native `category` column now available in market_details_raw
  (100% populated, 19 distinct values).

- Add `mve_collection_ticker` from markets_raw to the gold layer. 81% of
  Kalshi markets are multivariate events (MVE); this column links sub-event
  markets to their parent MVE collection (e.g., KXMVECBCHAMPIONSHIP-R),
  enabling downstream grouping and filtering by collection.

- Update _schema.yml descriptions accordingly.
- Point all sources from *_raw to *_0004 tables
- Remove 2025 date constraint from OHLCV (full history)
- Cap hour spine at current hour (matches polymarket behavior)
- Use native category column from market_details_0004 instead of
  json_extract_scalar on product_metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jeff-dude and others added 11 commits April 17, 2026 09:26
- kalshi_market_details: scope event_details dedupe via inner join to markets CTE
  (avoids full market_details_raw scan on incremental runs); switch to explicit
  column projection; rename watermark_ts -> source_updated_at to avoid confusion
  with pipeline-oriented _updated_at; add _updated_at = now() for operational
  freshness; drop post_hook.
- kalshi_market_trades: explicit column projection; leading commas; consistent
  inner-join pre-scoping; update reference to source_updated_at; add
  merge_skip_unchanged = true to skip no-op dimension refreshes (matches
  polymarket analog); drop post_hook.
- _schema.yml: rename watermark_ts column, add _updated_at to market_details,
  refresh descriptions.

Made-with: Cursor
…atic tag

- Drop kalshi_market_details.sql and kalshi_market_trades.sql — those
  land in PR #9549 instead; this PR depends on #9549 merging first.
- Revert sources/kalshi/_sources.yml (raw source declarations come from #9549).
- Trim _schema.yml to only the kalshi_ohlcv_hourly entry.
- Remove `static` tag from kalshi_ohlcv_hourly (config + schema) so the
  model refreshes with new data.
- Also remove `static` tag from polymarket_polygon.ohlcv_hourly for
  consistency.
Follow-up to 2e6d394 which landed only a partial snapshot. This commit
completes the review:

- kalshi_market_details: drop ci-stamp and post_hook; switch event_details
  pre-filter from `in (select ...)` to inner join (consistent with trades);
  add `now() as _updated_at` for pipeline-time freshness.
- kalshi_market_trades: drop ci-stamp and post_hook; reformat config block
  to single-line style; add merge_skip_unchanged = true; explicit column
  projection; leading commas throughout; newline after SQL keywords.
- _schema.yml: document _updated_at column on kalshi_market_details.

Made-with: Cursor
…s' into feat/kalshi-ohlcv-hourly

Made-with: Cursor

# Conflicts:
#	dbt_subprojects/daily_spellbook/models/_projects/kalshi/_schema.yml
…rd-fill anchor

Preserves main-branch logic while switching materialization from table to
incremental merge. Adds a prior_sparse CTE that re-reads real-trade aggregates
from {{ this }} outside the incremental window, unioned with new sparse
aggregates so market_bounds and the asof forward-fill stay correct across the
window boundary. Adds block_month partition, _updated_at, and destination
incremental_predicate for target pruning.

Made-with: Cursor
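
Why the prior-window anchor matters can be shown with a small Python sketch. It is not the model's SQL: `fill_window` and `anchor_close` are hypothetical names, and the sketch tracks only the close price. Without the last real-trade close from before the incremental window, the hours that precede the first in-window trade have nothing to forward-fill from.

```python
from datetime import datetime, timedelta


def fill_window(window_closes, window_start, window_end, anchor_close=None):
    """Forward-fill closes for every hour in [window_start, window_end].

    window_closes: {hour: close} for hours with real trades inside the window.
    anchor_close: last real-trade close before the window, if one is known.
    Returns (hour, close, is_forward_filled) tuples.
    """
    rows = []
    last = anchor_close
    hour = window_start
    while hour <= window_end:
        if hour in window_closes:
            last = window_closes[hour]
            rows.append((hour, last, False))
        elif last is not None:
            rows.append((hour, last, True))
        # With no anchor and no trade yet, the hour cannot be filled and is skipped.
        hour += timedelta(hours=1)
    return rows
```

Called without `anchor_close`, a window whose first trade lands at hour 2 yields no rows for hours 0 and 1; passing the prior close restores them as forward-filled rows.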
@jeff-dude
Member

@cursor review

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Resolution correction applied to all OHLCV fields inconsistently
    • Removed resolution correction from open, high, and low fields in Kalshi model to match Polymarket's behavior of only correcting the close price.

Or push these changes by commenting:

@cursor push ac418747a5
Preview (ac418747a5)
diff --git a/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql b/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql
--- a/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql
+++ b/dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql
@@ -147,32 +147,14 @@
 		f.market_id,
 		f.outcome,
 		f.market_name,
+		f.open,
+		f.high,
+		f.low,
 		case
 			when m.expiration_time is not null
 				and f.hour > m.expiration_time
 				and m.result in ('yes', 'no')
 			then case when m.result = 'yes' then 1.0 else 0.0 end
-			else f.open
-		end as open,
-		case
-			when m.expiration_time is not null
-				and f.hour > m.expiration_time
-				and m.result in ('yes', 'no')
-			then case when m.result = 'yes' then 1.0 else 0.0 end
-			else f.high
-		end as high,
-		case
-			when m.expiration_time is not null
-				and f.hour > m.expiration_time
-				and m.result in ('yes', 'no')
-			then case when m.result = 'yes' then 1.0 else 0.0 end
-			else f.low
-		end as low,
-		case
-			when m.expiration_time is not null
-				and f.hour > m.expiration_time
-				and m.result in ('yes', 'no')
-			then case when m.result = 'yes' then 1.0 else 0.0 end
 			else f.close
 		end as close,
 		f.vwap,

Comment thread dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_ohlcv_hourly.sql Outdated
Made-with: Cursor

# Conflicts:
#	dbt_subprojects/daily_spellbook/models/_projects/kalshi/_schema.yml
#	dbt_subprojects/daily_spellbook/models/_projects/kalshi/kalshi_market_trades.sql
…eserve invariant

Polymarket previously corrected only close, which violated low <= {open, close}
<= high on post-expiration forward-filled hours (e.g. open=0.73, close=1.00).
Widen correction to open/high/low so every candle holds the OHLCV invariant,
aligning Polymarket's behavior with Kalshi. Factor the settlement value into a
single settled_price column consumed via coalesce; Kalshi refactored to the
same shape for structural symmetry (output unchanged).

Made-with: Cursor
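
The settled_price-plus-coalesce shape described in this commit can be sketched in Python as follows. The function names are hypothetical and the candle is a plain dict; the conditions mirror the CASE expression shown in the diffs earlier in this thread.

```python
from datetime import datetime


def settled_price(result, hour, expiration_time):
    """Return 1.0 or 0.0 for a settled market past expiration, else None."""
    if (
        expiration_time is not None
        and hour > expiration_time
        and result in ("yes", "no")
    ):
        return 1.0 if result == "yes" else 0.0
    return None


def apply_settlement(candle, settled):
    """Equivalent of coalesce(settled_price, field) for each OHLC field."""
    fixed = dict(candle)
    for field in ("open", "high", "low", "close"):
        # Compare against None explicitly: a settled price of 0.0 is falsy.
        fixed[field] = settled if settled is not None else candle[field]
    return fixed


def holds_invariant(candle):
    """Check low <= {open, close} <= high."""
    return (
        candle["low"] <= candle["open"] <= candle["high"]
        and candle["low"] <= candle["close"] <= candle["high"]
    )
```

Correcting only close on a forward-filled candle at 0.73 gives close=1.00 with high=0.73, which fails `holds_invariant`; correcting all four fields keeps it true.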
@jeff-dude
Member

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 72b2402.

…ion fires

The prior `try_cast(substring(market_end_time from 1 for 19) as timestamp)`
silently returned null for values like `2025-12-31T00:00:00Z` (Trino's
timestamp cast does not accept the `T` separator), so `market_end_time_ts`
was always null and the resolution CASE never applied. Switch to
`try(from_iso8601_timestamp(market_end_time))` so ~8.9M post-expiration
rows on resolved markets are correctly forced to 0.0/1.0 across all four
OHLC fields.

Made-with: Cursor
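
The failure mode in this commit, a tolerant cast that silently yields null for every row, has a close analogue in Python. The sketch below is only an analogy and says nothing about Trino's own parser: a strict space-separated format rejects the `T` separator, and wrapping it in a catch-all hides the problem, while an ISO 8601 aware format parses the same value. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_space_separated(value):
    """Analogue of try_cast on the first 19 characters: None on any mismatch."""
    try:
        return datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        return None


def parse_iso8601(value):
    """Analogue of try(from_iso8601_timestamp(...)): accepts the T separator and Z suffix."""
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    except (ValueError, TypeError):
        return None


raw = "2025-12-31T00:00:00Z"
silently_null = parse_space_separated(raw)  # None: the T separator does not match
parsed = parse_iso8601(raw)                 # timezone-aware UTC datetime
```

Because the tolerant wrapper never raises, a check on the share of null results would be needed to catch this kind of bug early.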
@jeff-dude jeff-dude merged commit ff4d78b into main Apr 17, 2026
3 checks passed
@jeff-dude jeff-dude deleted the feat/kalshi-ohlcv-hourly branch April 17, 2026 20:21
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 17, 2026

Labels

  • dbt: daily (covers the Daily dbt subproject)
  • ready-for-review (this PR development is complete, please review)

2 participants