
trades

All reconciled end-user trades. End-user maker and taker addresses are recovered from the CTF Exchange OrderFilled event stream — raw event addresses are typically smart-contract operators, and reconciliation attributes each fill to the actual wallet that initiated it.

Layout

Hive-partitioned by date — one file per day, coalesced from the underlying hourly cache:

trades/year=YYYY/month=MM/day=DD/data.parquet
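Because each day maps to exactly one file, the path for any date can be built directly from the template above. The following is a minimal sketch using only the standard library. The helper names `partition_path` and `partition_paths` are hypothetical and not part of the dataset's tooling. Zero-padded month and day values are assumed from the `MM`/`DD` placeholders.

```python
from datetime import date, timedelta


def partition_path(day: date, root: str = "trades") -> str:
    """Return the daily file path for `day` (hypothetical helper)."""
    # Assumes zero-padded month/day, as the MM/DD template suggests.
    return (
        f"{root}/year={day.year:04d}/month={day.month:02d}"
        f"/day={day.day:02d}/data.parquet"
    )


def partition_paths(start: date, end: date, root: str = "trades") -> list[str]:
    """Return one path per day from `start` to `end`, inclusive."""
    n_days = (end - start).days + 1
    return [partition_path(start + timedelta(days=i), root) for i in range(n_days)]


print(partition_path(date(2024, 11, 5)))
# trades/year=2024/month=11/day=05/data.parquet
```

Enumerating paths this way restricts a read to a date window without listing the whole directory tree. Not every generated path is guaranteed to exist in the sample, so check for missing days before reading.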

Load

import polars as pl

# Scan the whole sample lazily
trades = pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)

# Or pull one specific day
day = pl.read_parquet("trades/year=2024/month=11/day=05/data.parquet")

# Alternatively, load through the Hugging Face datasets library
from datasets import load_dataset

trades = load_dataset("vgregoire/polymarket-users", "trades")

Schema

| Column | Type | Description |
| --- | --- | --- |
| trade_id | str | Transaction hash on Polygon |
| timestamp | datetime[ns, UTC] | Block timestamp |
| market_id | str | Market identifier (matches markets.market_id, cast as string) |
| event_id | str | Parent event identifier (matches events.event_id) |
| prediction_id | str | The conditional token traded (matches predictions.prediction_id) |
| outcome | str | Outcome label (e.g., "Yes", "Trump", team name, …) |
| winner | bool | Whether this outcome resolved as the winner (null until resolution) |
| category | str | Canonical category, inherited from the parent event |
| category_original | str | Original Polymarket platform tag before mapping |
| price | float64 | Trade price in USDC per conditional token share |
| quantity | float64 | Trade quantity in conditional token shares |
| maker_address | str | Reconciled end-user maker wallet |
| taker_address | str | Reconciled end-user taker wallet |
| taker_bought | bool | True if taker bought conditional tokens, False if sold |
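To illustrate how `price`, `quantity`, and `taker_bought` combine, the sketch below computes a trade's USDC notional and the taker's signed share quantity. It is plain Python over made-up rows, not code from the dataset. The `Trade` class and both helper functions are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    """Subset of the schema needed for this example (hypothetical class)."""
    price: float        # USDC per conditional token share
    quantity: float     # conditional token shares
    taker_bought: bool  # True if the taker bought, False if sold


def notional_usdc(trade: Trade) -> float:
    """USDC value exchanged in the trade: price times quantity."""
    return trade.price * trade.quantity


def signed_taker_quantity(trade: Trade) -> float:
    """Shares from the taker's side: positive for a buy, negative for a sell."""
    return trade.quantity if trade.taker_bought else -trade.quantity


# Made-up rows for illustration only; not taken from the dataset.
sample = [Trade(0.5, 100.0, True), Trade(0.25, 40.0, False)]

total_notional = sum(notional_usdc(t) for t in sample)
net_taker_shares = sum(signed_taker_quantity(t) for t in sample)
```

Summing the signed quantity over a market gives net taker buying pressure in shares, while summing the notional gives traded volume in USDC.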

Daily volumes

The full sample contains tens of millions of trades. Scanning lazily and projecting only the columns you need avoids materializing the whole dataset:

import polars as pl

# Daily volume per category, 2024-2026
volume = (
    pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)
    .with_columns(pl.col("timestamp").dt.date().alias("date"))
    .group_by(["date", "category"])
    .agg((pl.col("price") * pl.col("quantity")).sum().alias("volume_usd"))
    .collect()
)