predictions

Per-conditional-token lookup table with one row per token. A binary market has one Yes token and one No token; a multi-outcome market has two or more tokens. Use this table to join trades (which carry prediction_id) back to market-level metadata.

Layout

| Path | Format |
| --- | --- |
| predictions.parquet | Single parquet file |

Load

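# Option 1: Polars, reading a local copy of the file
import polars as pl
predictions = pl.read_parquet("predictions.parquet")

# Option 2: Hugging Face datasets
# (returns a DatasetDict keyed by split unless split= is passed)
from datasets import load_dataset
predictions = load_dataset("vgregoire/polymarket-users", "predictions")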

Schema

| Column | Type | Description |
| --- | --- | --- |
| prediction_id | str | The conditional token id (matches trades.prediction_id) |
| market_id | int64 | Parent market identifier (matches markets.market_id) |
| outcome | str | Outcome label (e.g., "Yes", "Trump", team name, …) |
| outcome_idx | int | Zero-based outcome index within the market |
| n_outcomes | int | Total number of outcome tokens in the parent market |
| negative_token | str | The complementary token id (the other side, for two-outcome markets) |
| winner | bool | Whether this outcome resolved as the winner (nullable until resolution) |

Common joins

import polars as pl

trades = pl.scan_parquet("trades/**/*.parquet", hive_partitioning=False)
predictions = pl.read_parquet("predictions.parquet").lazy()
markets = pl.read_parquet("markets.parquet").lazy()

# Trade-level outcome label + parent market category
enriched = (
    trades
    .drop("market_id")  # use the int64 market_id from predictions instead
    .join(predictions, on="prediction_id", how="left")
    .join(markets.select("market_id", "category"), on="market_id", how="left")
    .collect()
)