pnl_daily and pnl_category_daily

Sparse delta-encoded daily PnL panels. Only rows where PnL changed that day are stored. To get a dense daily series, forward-fill from the last observation (see Recipes).

Layout

Hive-partitioned by date:

pnl_daily/year=YYYY/month=MM/day=DD/data.parquet
pnl_category_daily/year=YYYY/month=MM/day=DD/data.parquet

Load

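# Option 1: lazily scan local Parquet files with Polars
import polars as pl

# hive_partitioning=False keeps the year/month/day path keys out of the schema
pnl = pl.scan_parquet("pnl_daily/**/*.parquet", hive_partitioning=False)
pnl_cat = pl.scan_parquet("pnl_category_daily/**/*.parquet", hive_partitioning=False)

# Option 2: load from the Hugging Face Hub
from datasets import load_dataset

pnl = load_dataset("vgregoire/polymarket-users", "pnl_daily")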

Schema

pnl_daily

| Column | Type | Description |
| --- | --- | --- |
| user_address | str | End-user wallet |
| snapshot_time | datetime | End-of-day timestamp (UTC, +1 day convention for ASOF joins) |
| pnl | float64 | Mark-to-market portfolio_value + usdc_balance |
| portfolio_value | float64 | Value of open positions at market mid |
| usdc_balance | float64 | USDC cash account |

pnl_category_daily

| Column | Type | Description |
| --- | --- | --- |
| user_address | str | End-user wallet |
| snapshot_time | datetime | End-of-day timestamp |
| category | str | One of Sports, Crypto, Finance, Politics, Tech, Culture, Weather |
| portfolio_value | float64 | Value of open positions in this category |
| usdc_balance | float64 | Category-attributed USDC balance |
| pnl | float64 | portfolio_value + usdc_balance |

Untagged markets excluded

pnl_category_daily excludes markets without a category label. As a result, summing pnl_category_daily across categories for a given user-day will not in general equal the same user-day's row in pnl_daily. The difference is PnL from untagged markets.

Variant panels

The same per-(user, day) panel and its per-(user, category, day) companion are also shipped restricted to three market subsets. Use these to reproduce the resolved-only, no-fee, or resolved-no-fee variants of the paper-profits exhibits. The wide user_pnl_summary columns are a poorer fit for that purpose: their per-(user, category) user set includes users with no positions in that variant for that category (filled with 0.0), which inflates denominators on concentration_by_category, probit_regression, and pnl_spread_decomposition.

| Subset | Description |
| --- | --- |
| pnl_daily_resolved | pnl_daily restricted to markets whose close_time is on or before the sample end (2026-03-29). |
| pnl_daily_no_fee | pnl_daily restricted to markets with no taker fees, i.e. those that predate the Q4 2024 fee introduction. Equivalent to filtering markets.has_fee = False. |
| pnl_daily_resolved_no_fee | Intersection of the two filters above. |
| pnl_category_daily_resolved | pnl_category_daily with the same close_time filter. |
| pnl_category_daily_no_fee | pnl_category_daily restricted to markets with no taker fees. |
| pnl_category_daily_resolved_no_fee | Intersection of the two filters above. |

Schema and layout are identical to the corresponding base panel; only the subset of markets contributing to each row differs.
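Since the variants share the base layout, a loader can be parameterized on the subset name. The helper below is a hypothetical sketch: variant_name and variant_glob are not part of the dataset, and it assumes each variant lives in a directory named after the subset, which the table suggests but does not state.

```python
def variant_name(base: str, resolved: bool = False, no_fee: bool = False) -> str:
    """Build a subset name such as 'pnl_daily_resolved_no_fee' (hypothetical helper)."""
    suffix = ("_resolved" if resolved else "") + ("_no_fee" if no_fee else "")
    return base + suffix


def variant_glob(base: str, resolved: bool = False, no_fee: bool = False) -> str:
    """Glob over the variant's partitions, assuming the directory matches the subset name."""
    return f"{variant_name(base, resolved, no_fee)}/**/*.parquet"


# The resulting string can be passed to pl.scan_parquet like the base panels.
glob = variant_glob("pnl_daily", resolved=True, no_fee=True)
```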

Snapshot time convention (+1 day right-boundary)

snapshot_time is labelled with the right boundary of the day it summarizes. A row with snapshot_time = 2024-03-30 00:00:00 UTC is the close of 2024-03-29 (equivalently, the open of 2024-03-30). The Hive partition path matches the column value (day=30 for 2024-03-30 00:00 UTC), so the partition is one day after the calendar day whose state is captured. The convention is chosen for compatibility with Polars' join_asof against daily price grids; see the Time convention section on the home page for the full picture, including the pnl_change_* panels.
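The one-day offset is easy to get wrong when locating a partition by hand. As a minimal sketch, the two hypothetical helpers below (summarized_day and partition_path are illustrative names, not part of the dataset) convert between a snapshot label and the calendar day it summarizes, assuming snapshots sit exactly at 00:00 UTC and that path components are zero-padded as the YYYY/MM/DD layout suggests.

```python
from datetime import date, datetime, timedelta, timezone


def summarized_day(snapshot_time: datetime) -> date:
    """Calendar day whose closing state a snapshot row captures (label minus one day)."""
    return (snapshot_time - timedelta(days=1)).date()


def partition_path(panel: str, day: date) -> str:
    """Partition holding the close-of-`day` snapshot, which is labelled day + 1."""
    label = day + timedelta(days=1)
    return (
        f"{panel}/year={label.year:04d}/month={label.month:02d}"
        f"/day={label.day:02d}/data.parquet"
    )


snap = datetime(2024, 3, 30, tzinfo=timezone.utc)
captured = summarized_day(snap)             # date(2024, 3, 29)
path = partition_path("pnl_daily", captured)
```

Here the state at the close of 2024-03-29 resolves to the day=30 partition, matching the example in the text.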

Why sparse?

Storing PnL only when it changes shrinks the panel by an order of magnitude without information loss. A user who hasn't traded in 6 months gets one row at the start of the period and one when they resume; everything in between can be forward-filled. See the Recipes page for a one-shot polars idiom.