Polymarket Users Dataset

`trades`

All reconciled end-user trades. End-user maker and taker addresses are recovered from the CTF Exchange OrderFilled event stream — raw event addresses are typically smart-contract operators, and reconciliation attributes each fill to the actual wallet that initiated it.

Layout

Hive-partitioned by date — one file per day, coalesced from the underlying hourly cache:

trades/year=YYYY/month=MM/day=DD/data.parquet
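Because each day maps to exactly one file, a date range can be turned into an explicit list of paths. The helper below is a minimal sketch of that mapping: `partition_paths` is a hypothetical name, not part of the dataset's tooling, and it assumes zero-padded `month` and `day` values as in the layout pattern above.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def partition_paths(start: date, end: date, root: str = "trades") -> list[str]:
    """Return one data.parquet path per calendar day in [start, end], inclusive.

    Hypothetical helper: it only formats paths and does not check that the
    files exist.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    paths = []
    for offset in range((end - start).days + 1):
        d = start + timedelta(days=offset)
        path = (
            PurePosixPath(root)
            / f"year={d.year:04d}"
            / f"month={d.month:02d}"
            / f"day={d.day:02d}"
            / "data.parquet"
        )
        paths.append(str(path))
    return paths
```

Recent Polars versions accept such a list directly in `pl.scan_parquet`, which restricts a scan to the listed days without touching the rest of the tree. If some days might have no file, check for existence before scanning.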

Load

import polars as pl

# Scan the whole sample lazily
trades = pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)

# Or pull one specific day
day = pl.read_parquet("trades/year=2024/month=11/day=05/data.parquet")

# Or load the `trades` configuration with the Hugging Face `datasets` library
from datasets import load_dataset
trades = load_dataset("vgregoire/polymarket-users", "trades")

Schema

Column	Type	Description
`trade_id`	`str`	Transaction hash on Polygon
`timestamp`	`datetime[ns, UTC]`	Block timestamp
`market_id`	`str`	Market identifier (matches `markets.market_id`, cast as string)
`event_id`	`str`	Parent event identifier (matches `events.event_id`)
`prediction_id`	`str`	The conditional token traded (matches `predictions.prediction_id`)
`outcome`	`str`	Outcome label (e.g., `"Yes"`, `"Trump"`, team name, …)
`winner`	`bool`	Whether this outcome resolved as the winner (null until resolution)
`category`	`str`	Canonical category, inherited from the parent event
`category_original`	`str`	Original Polymarket platform tag before mapping
`price`	`float64`	Trade price in USDC per conditional token share
`quantity`	`float64`	Trade quantity in conditional token shares
`maker_address`	`str`	Reconciled end-user maker wallet
`taker_address`	`str`	Reconciled end-user taker wallet
`taker_bought`	`bool`	`True` if taker bought conditional tokens, `False` if sold

Daily volumes

The full sample contains tens of millions of trades. Scanning lazily and projecting only the columns you need avoids materializing the whole dataset:

import polars as pl

# Daily USD volume per category over the whole sample (2024-2026)
volume = (
    pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)
    .with_columns(pl.col("timestamp").dt.date().alias("date"))
    .group_by(["date", "category"])
    .agg((pl.col("price") * pl.col("quantity")).sum().alias("volume_usd"))
    .collect()
)