ohlcv_1d, ohlcv_1h, ohlcv_5m
Per-token OHLCV bars at three frequencies. Bars are emitted only for (token, time-bucket) pairs that had at least one trade — there is no forward-filling, no resolution rows, no synthetic flat bars.
| Frequency | Layout | Open interest? |
|---|---|---|
| 1d (daily) | ohlcv_1d.parquet | ✅ yes |
| 1h (hourly) | ohlcv_1h/year=YYYY/month=MM/day=DD/data.parquet | ❌ no (no 1h position cache exists) |
| 5m (5-minute) | ohlcv_5m/year=YYYY/month=MM/day=DD/data.parquet | ❌ no (no 5m position cache exists) |
Load
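```python
import polars as pl

# Daily: single file
daily = pl.read_parquet("ohlcv_1d.parquet")

# Sub-daily: one file per day under year=/month=/day= directories.
# scan_parquet returns a LazyFrame; hive_partitioning=False keeps the
# year/month/day path keys from being added as extra columns.
hourly = pl.scan_parquet("ohlcv_1h/**/*.parquet", hive_partitioning=False)
five_min = pl.scan_parquet("ohlcv_5m/**/*.parquet", hive_partitioning=False)
```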
Schema
All three frequencies share the same core columns:
| Column | Type | Description |
|---|---|---|
| prediction_id | str | Conditional token identifier (matches predictions.prediction_id) |
| timestamp | datetime[ns, UTC] | Start of the bar, truncated to the bar's frequency (daily bars start at midnight UTC) |
| market_id | str | Parent market (matches markets.market_id, cast as string) |
| outcome | str | Outcome label |
| open | float64 | First trade price in the bar |
| high | float64 | Highest trade price in the bar |
| low | float64 | Lowest trade price in the bar |
| close | float64 | Last trade price in the bar |
| volume | float64 | Sum of trade quantities in the bar |
| trade_count | int64 | Number of trades in the bar |
ohlcv_1d only adds:
| Column | Type | Description |
|---|---|---|
| open_interest | float64? | Sum of strictly-positive user positions on this token at the close of the day (null if no positions reconstructed) |
Why no open interest at sub-daily frequencies?
Open interest is derived from per-token position snapshots and requires a
position cache at the same frequency as the bars. Today only a daily
position cache is built; computing 1h or 5m position panels would
significantly enlarge the underlying caches. If you need sub-daily OI, you
can reconstruct it from trades (taker_bought is the side flag) plus
the daily OI as a level anchor.
Volume notes
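For binary markets, the "Yes" and "No" tokens each have their own row (different
prediction_id). The sum of volume across all outcomes of a market in
the same bar is the market's total contract volume. For dollar volume,
multiply each token's volume by a price for that token and bar before
summing; the example below uses the midpoint of open and close as an
approximation.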
```python
import polars as pl

bars = pl.scan_parquet("ohlcv_1h/**/*.parquet", hive_partitioning=False)

# Approximate dollar volume per market-hour: price each token's volume at the
# midpoint of its open and close, then sum across the market's tokens.
dollar_volume = (
    bars
    .with_columns(
        (pl.col("volume") * (pl.col("open") + pl.col("close")) / 2)
        .alias("dollar_volume_approx")
    )
    .group_by(["market_id", "timestamp"])
    .agg(pl.col("dollar_volume_approx").sum())
    .collect()
)
```