trades
All reconciled end-user trades. End-user maker and taker addresses are
recovered from the CTF Exchange `OrderFilled` event stream. The addresses
in the raw events are typically smart-contract operators, so reconciliation
attributes each fill to the actual wallet that initiated it.
Layout
Hive-partitioned by date, with one file per day coalesced from the underlying hourly cache. Each day's file sits at a path of the form `trades/year=YYYY/month=MM/day=DD/data.parquet`.
Load
```python
import polars as pl

# Scan the whole sample lazily
trades = pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)

# Or pull one specific day
day = pl.read_parquet("trades/year=2024/month=11/day=05/data.parquet")
```
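To load a handful of days without globbing the whole tree, the per-day paths can be built directly from dates. The following is a minimal sketch, not part of the dataset tooling: `day_path` and `day_paths` are hypothetical helpers, and the zero-padded `month=MM` / `day=DD` naming is an assumption inferred from the single example path above.

```python
from datetime import date, timedelta


def day_path(day: date, root: str = "trades") -> str:
    """Build the path of one daily file (hypothetical helper).

    Assumes zero-padded month and day, as in
    trades/year=2024/month=11/day=05/data.parquet.
    """
    return f"{root}/year={day:%Y}/month={day:%m}/day={day:%d}/data.parquet"


def day_paths(start: date, end: date, root: str = "trades") -> list[str]:
    """Build the paths for every day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [day_path(start + timedelta(days=i), root) for i in range(n_days)]


# Three consecutive days around the example date
paths = day_paths(date(2024, 11, 4), date(2024, 11, 6))
```

The resulting list can be passed to `pl.scan_parquet` or `pl.read_parquet`, both of which accept a list of paths. Days with no file on disk would need to be filtered out first.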
Schema
| Column | Type | Description |
|---|---|---|
| `trade_id` | `str` | Transaction hash on Polygon |
| `timestamp` | `datetime[ns, UTC]` | Block timestamp |
| `market_id` | `str` | Market identifier (matches `markets.market_id`, cast as string) |
| `event_id` | `str` | Parent event identifier (matches `events.event_id`) |
| `prediction_id` | `str` | The conditional token traded (matches `predictions.prediction_id`) |
| `outcome` | `str` | Outcome label (e.g., "Yes", "Trump", team name, …) |
| `winner` | `bool` | Whether this outcome resolved as the winner (null until resolution) |
| `category` | `str` | Canonical category, inherited from the parent event |
| `category_original` | `str` | Original Polymarket platform tag before mapping |
| `price` | `float64` | Trade price in USDC per conditional token share |
| `quantity` | `float64` | Trade quantity in conditional token shares |
| `maker_address` | `str` | Reconciled end-user maker wallet |
| `taker_address` | `str` | Reconciled end-user taker wallet |
| `taker_bought` | `bool` | True if taker bought conditional tokens, False if sold |
Daily volumes
The full sample contains tens of millions of trades. Scanning lazily and projecting only the columns you need avoids materializing the whole dataset:
```python
import polars as pl

# Daily volume per category, 2024-2026
volume = (
    pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)
    .with_columns(pl.col("timestamp").dt.date().alias("date"))
    .group_by(["date", "category"])
    .agg((pl.col("price") * pl.col("quantity")).sum().alias("volume_usd"))
    .collect()
)
```