Datasets
The Polymarket Users dataset is split into several related tables, each
serving a different access pattern. The tables use the same user_address,
prediction_id, market_id, and event_id identifiers and can be joined on
them; note that market_id is stored as int64 in some tables and as str in
others (see Identifiers).
Map
| metadata | features | PnL | trade data |
|---|---|---|---|
| markets | user_features | user_pnl_summary | trades |
| events | | pnl_daily | ohlcv_1d |
| | | pnl_category_daily | ohlcv_1h |
| | | | ohlcv_5m |
Pick the right PnL table
Three tables expose PnL at different granularities. Pick the smallest one that satisfies your access pattern:
| If you want… | Use |
|---|---|
| A single number per user (terminal PnL across variants, ± portfolio/cash split for base) | user_pnl_summary |
| Per-user time-series PnL | pnl_daily — needs forward-fill |
| Per-(user, category) time-series PnL | pnl_category_daily |
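The forward-fill that pnl_daily needs can be sketched in pandas as follows. This is a minimal sketch, not the dataset's reference implementation: it assumes pnl_daily holds a running PnL value and only has rows on days where that value changed. The column names `date` and `pnl`, the toy rows, and the helper `forward_fill_daily` are all hypothetical; check the real schema before reusing it.

```python
import pandas as pd

# Toy stand-in for pnl_daily. The `date` and `pnl` column names are assumptions.
pnl_daily = pd.DataFrame({
    "user_address": ["0xaaa", "0xaaa", "0xbbb"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-02"]),
    "pnl": [10.0, 25.0, -5.0],
})


def forward_fill_daily(df, start, end):
    """Expand every user to a dense daily calendar and carry the last PnL forward.

    Hypothetical helper: assumes at most one row per (user_address, date).
    """
    users = df["user_address"].unique()
    calendar = pd.date_range(start, end, freq="D")
    # One row per (user, day), including days with no source row.
    grid = pd.MultiIndex.from_product(
        [users, calendar], names=["user_address", "date"]
    )
    dense = df.set_index(["user_address", "date"])["pnl"].reindex(grid)
    # Fill within each user only, so values never leak across users.
    dense = dense.groupby(level="user_address").ffill()
    return dense.reset_index()


dense_pnl = forward_fill_daily(pnl_daily, "2024-01-01", "2024-01-04")
```

Days before a user's first row stay NaN in this sketch; whether those should be treated as zero depends on how the table defines PnL before a user's first activity.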
Identifiers
| Column | Type | Source | Joins to |
|---|---|---|---|
| user_address | str | Polygon wallet address (end user, after reconciliation) | All user-level tables |
| prediction_id | str | CTF Exchange conditional token id | trades.prediction_id, ohlcv_*.prediction_id, predictions.prediction_id |
| market_id | int64 in markets / predictions, str in trades / ohlcv_* | Polymarket internal market id | markets.market_id, predictions.market_id, trades.market_id, ohlcv_*.market_id; cast trades' string id to int64 when joining back to markets |
| event_id | str | Polymarket internal event id | events.event_id, markets.event_id, trades.event_id |
| category | str | Canonical category, tag-based (Sports, Crypto, Finance, Politics, Tech, Culture, Weather; or Untagged) | markets, events, trades, pnl_category_daily* |
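The market_id type mismatch is the main thing to watch when joining trades back to markets. The pandas sketch below shows one way to do the cast before merging. The toy rows and the `size` and `question` columns are hypothetical and only illustrate the join; the market_id dtypes follow the table above.

```python
import pandas as pd

# Toy stand-ins for the real tables. `question` and `size` are hypothetical columns.
markets = pd.DataFrame({
    "market_id": pd.Series([101, 102], dtype="int64"),
    "question": ["Example market A", "Example market B"],
})
trades = pd.DataFrame({
    "user_address": ["0xaaa", "0xbbb", "0xaaa"],
    "market_id": ["101", "102", "101"],  # stored as str in trades
    "size": [5.0, 2.5, 1.0],
})

# Cast the string id to int64 so both join keys have the same dtype.
trades_typed = trades.assign(market_id=trades["market_id"].astype("int64"))

# Many trades map to one market; `validate` raises if markets has duplicate ids.
joined = trades_typed.merge(
    markets, on="market_id", how="left", validate="many_to_one"
)
```

Without the cast, pandas either raises on the mismatched key dtypes or finds no matches, depending on the version, so casting explicitly is the safer habit.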
See the Methodology page for how categories are assigned and how end-user addresses are reconciled from the CTF Exchange event stream.