
Datasets

The Polymarket Users dataset is split into several related tables, each serving a different access pattern. They share the user_address, prediction_id, market_id, and event_id identifiers and can be joined on them. The one caveat is that market_id is stored as int64 in some tables and as str in others (see Identifiers).

Map

metadata         features          PnL                    trade data
─────────        ─────────         ──────                 ──────────
markets          user_features     user_pnl_summary       trades
events                             pnl_daily              ohlcv_1d
                                   pnl_category_daily     ohlcv_1h
                                                          ohlcv_5m

Pick the right PnL table

Three tables expose PnL at different granularities. Pick the smallest one that satisfies your access pattern:

If you want…                                     Use
────────────                                     ───
A single number per user (terminal PnL across    user_pnl_summary
variants, ± portfolio/cash split for base)
Per-user time-series PnL                         pnl_daily (needs forward-fill)
Per-(user, category) time-series PnL             pnl_category_daily
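
The "needs forward-fill" note on pnl_daily can be illustrated with a minimal pandas sketch. It assumes the note means that rows exist only for days on which a user's PnL changed, so a dense daily series has to carry the last known value forward. The column names `date` and `pnl`, the sample rows, and the helper `forward_fill_daily` are hypothetical and not part of the documented schema.

```python
import pandas as pd

# Hypothetical sparse extract of pnl_daily. The column names `date` and `pnl`
# are assumptions for illustration, not the documented schema.
pnl_daily = pd.DataFrame(
    {
        "user_address": ["0xaaa", "0xaaa", "0xbbb"],
        "date": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-02"]),
        "pnl": [10.0, 25.0, -5.0],
    }
)


def forward_fill_daily(df: pd.DataFrame, end: str) -> pd.DataFrame:
    """Expand each user's rows to one row per calendar day up to `end`,
    carrying the last observed value forward."""
    end_ts = pd.Timestamp(end)
    frames = []
    for user, grp in df.groupby("user_address"):
        series = grp.set_index("date")["pnl"].sort_index()
        # Dense daily index from the user's first observed day to `end`.
        full_index = pd.date_range(series.index.min(), end_ts, freq="D", name="date")
        filled = series.reindex(full_index).ffill()
        frames.append(filled.reset_index().assign(user_address=user))
    return pd.concat(frames, ignore_index=True)[["user_address", "date", "pnl"]]


dense = forward_fill_daily(pnl_daily, end="2024-01-04")
```

Each user's series starts at that user's first observed day rather than at a global start date, so no values are invented before the first observation. If the real table stores daily deltas rather than running totals, a cumulative sum would be needed before filling.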

Identifiers

Column          Type                  Source                              Joins to
──────          ────                  ──────                              ────────
user_address    str                   Polygon wallet address (end user,   All user-level tables
                                      after reconciliation)
prediction_id   str                   CTF Exchange conditional token id   trades.prediction_id, ohlcv_*.prediction_id,
                                                                          predictions.prediction_id
market_id       int64 in markets /    Polymarket internal market id       markets.market_id, predictions.market_id,
                predictions; str in                                       trades.market_id, ohlcv_*.market_id
                trades / ohlcv_*                                          (cast trades' string id to int64 when
                                                                          joining back to markets)
event_id        str                   Polymarket internal event id        events.event_id, markets.event_id, trades.event_id
category        str                   Canonical category, tag-based       markets, events, trades, pnl_category_daily*
                                      (Sports, Crypto, Finance,
                                      Politics, Tech, Culture,
                                      Weather; or Untagged)
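
The market_id type mismatch is the one join that needs care. Below is a minimal pandas sketch of casting the string id in trades to int64 before joining back to markets. The sample rows and the `question` and `size` columns are hypothetical stand-ins; only market_id, user_address, and the dtypes come from the table above.

```python
import pandas as pd

# Toy stand-ins for the real tables. `question` and `size` are hypothetical
# columns used only to make the join result visible.
markets = pd.DataFrame(
    {
        "market_id": pd.Series([101, 102], dtype="int64"),
        "question": ["A?", "B?"],
    }
)
trades = pd.DataFrame(
    {
        "market_id": ["101", "102", "101"],  # stored as str in trades
        "user_address": ["0xaaa", "0xbbb", "0xaaa"],
        "size": [5.0, 2.5, 1.0],
    }
)

# Cast the string id to int64 so both join keys have the same dtype.
trades_int = trades.assign(market_id=trades["market_id"].astype("int64"))

# A left join keeps every trade; `validate` raises if markets has duplicate ids.
joined = trades_int.merge(markets, on="market_id", how="left", validate="many_to_one")
```

Casting on the trades side follows the note in the table. If any string id were non-numeric, `astype("int64")` would raise, which surfaces bad keys early instead of silently dropping rows.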

See the Methodology page for how categories are assigned and how end-user addresses are reconciled from the CTF Exchange event stream.