
ohlcv_1d, ohlcv_1h, ohlcv_5m

Per-token OHLCV bars at three frequencies. Bars are emitted only for (token, time-bucket) pairs that had at least one trade — there is no forward-filling, no resolution rows, no synthetic flat bars.

| Frequency | Layout | Open interest? |
|---|---|---|
| 1d (daily) | ohlcv_1d.parquet | ✅ yes |
| 1h (hourly) | ohlcv_1h/year=YYYY/month=MM/day=DD/data.parquet | ❌ no (no 1h position cache exists) |
| 5m (5-minute) | ohlcv_5m/year=YYYY/month=MM/day=DD/data.parquet | ❌ no (no 5m position cache exists) |

Load

import polars as pl

# Daily: single file
daily = pl.read_parquet("ohlcv_1d.parquet")

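# Sub-daily: one file per day in a Hive-style year=/month=/day= layout.
# hive_partitioning=False keeps the year/month/day path keys out of the schema.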
hourly = pl.scan_parquet("ohlcv_1h/**/*.parquet", hive_partitioning=False)
five_min = pl.scan_parquet("ohlcv_5m/**/*.parquet", hive_partitioning=False)

Schema

All three frequencies share the same core columns:

| Column | Type | Description |
|---|---|---|
| prediction_id | str | Conditional token identifier (matches predictions.prediction_id) |
| timestamp | datetime[ns, UTC] | Start of the bar, truncated to the bar's frequency (daily bars start at midnight UTC) |
| market_id | str | Parent market (matches markets.market_id, cast as string) |
| outcome | str | Outcome label |
| open | float64 | First trade price in the bar |
| high | float64 | Highest trade price in the bar |
| low | float64 | Lowest trade price in the bar |
| close | float64 | Last trade price in the bar |
| volume | float64 | Sum of trade quantities in the bar |
| trade_count | int64 | Number of trades in the bar |

Only ohlcv_1d adds:

| Column | Type | Description |
|---|---|---|
| open_interest | float64? | Sum of strictly-positive user positions on this token at the close of the day (null if no positions reconstructed) |

Why no open interest at sub-daily frequencies?

Open interest is derived from per-token position snapshots and requires a position cache at the same frequency as the bars. Today only a daily position cache is built; computing 1h or 5m position panels would significantly enlarge the underlying caches. If you need sub-daily OI, you can reconstruct it from trades (taker_bought is the side flag) plus the daily OI as a level anchor.

Volume notes

For binary markets, the "Yes" and "No" tokens each have their own row (with different prediction_id values). The sum of volume across all outcomes of a market in the same bar is the market's total contract volume. For dollar volume, multiply each token's volume by that token's price before summing, since the outcomes trade at different prices.

import polars as pl

bars = pl.scan_parquet("ohlcv_1h/**/*.parquet", hive_partitioning=False)

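# Approximate dollar volume per market-hour: each token's volume times the
# midpoint of its open and close, summed across the market's tokens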
dollar_volume = (
    bars
    .with_columns(
        (pl.col("volume") * (pl.col("open") + pl.col("close")) / 2)
        .alias("dollar_volume_approx")
    )
    .group_by(["market_id", "timestamp"])
    .agg(pl.col("dollar_volume_approx").sum())
    .collect()
)