predictions
Per-conditional-token lookup table. One row per token (each multi-outcome
market has two or more tokens; a binary market has one Yes token and one
No token). Use it to join the trades table (which carries prediction_id)
back to market-level metadata.
Layout
| Path | Format |
|---|---|
| predictions.parquet | Single parquet file |
Load
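# Option 1: read the parquet file directly with polars
import polars as pl
predictions = pl.read_parquet("predictions.parquet")

# Option 2: load the "predictions" config from the Hugging Face Hub
from datasets import load_dataset
predictions = load_dataset("vgregoire/polymarket-users", "predictions")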
Schema
| Column | Type | Description |
|---|---|---|
| prediction_id | str | The conditional token id (matches trades.prediction_id) |
| market_id | int64 | Parent market identifier (matches markets.market_id) |
| outcome | str | Outcome label (e.g., "Yes", "Trump", team name, …) |
| outcome_idx | int | Zero-based outcome index within the market |
| n_outcomes | int | Total number of outcome tokens in the parent market |
| negative_token | str | The complementary token id (the other side, for two-outcome markets) |
| winner | bool | Whether this outcome resolved as the winner (nullable until resolution) |
Common joins
import polars as pl

trades = pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)
predictions = pl.read_parquet("predictions.parquet").lazy()
markets = pl.read_parquet("markets.parquet").lazy()

# Trade-level outcome label + parent market category
enriched = (
    trades
    .drop("market_id")  # use the int64 market_id from predictions instead
    .join(predictions, on="prediction_id", how="left")
    .join(markets.select("market_id", "category"), on="market_id", how="left")
    .collect()
)