pnl_daily and pnl_category_daily

Sparse delta-encoded daily PnL panels. Only rows where PnL changed that day are stored. To get a dense daily series, forward-fill from the last observation (see Recipes).

Layout

Hive-partitioned by date:

pnl_daily/year=YYYY/month=MM/day=DD/data.parquet
pnl_category_daily/year=YYYY/month=MM/day=DD/data.parquet

Load

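import polars as pl

# Option 1: lazily scan the Parquet files from a local copy with polars.
# hive_partitioning=False keeps the year/month/day path keys out of the schema.
pnl = pl.scan_parquet("pnl_daily/**/*.parquet", hive_partitioning=False)
pnl_cat = pl.scan_parquet("pnl_category_daily/**/*.parquet", hive_partitioning=False)

# Option 2: load from the Hugging Face Hub with the datasets library.
from datasets import load_dataset

pnl = load_dataset("vgregoire/polymarket-users", "pnl_daily")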

Schema

pnl_daily

Column | Type | Description
--- | --- | ---
user_address | str | End-user wallet
snapshot_time | datetime | End-of-day timestamp (UTC, +1 day convention for ASOF joins)
pnl | float64 | Mark-to-market portfolio_value + usdc_balance
portfolio_value | float64 | Value of open positions at market mid
usdc_balance | float64 | USDC cash account

pnl_category_daily

Column | Type | Description
--- | --- | ---
user_address | str | End-user wallet
snapshot_time | datetime | End-of-day timestamp
category | str | One of Sports, Crypto, Finance, Politics, Tech, Culture, Weather
portfolio_value | float64 | Value of open positions in this category
usdc_balance | float64 | Category-attributed USDC balance
pnl | float64 | portfolio_value + usdc_balance

Untagged markets excluded

pnl_category_daily excludes markets without a category label. As a result, summing pnl_category_daily across categories for a given user-day will not in general equal the same user-day's row in pnl_daily. The difference is PnL from untagged markets.

Variant panels

The same per-(user, day) panel and its per-(user, category, day) companion are also shipped restricted to three useful market subsets. Use these to reproduce the resolved-only, no-fee, or resolved-no-fee variants of the paper-profits exhibits.

Subset | Description
--- | ---
pnl_daily_resolved | pnl_daily restricted to markets whose close_time is on or before the sample end (2026-03-29).
pnl_daily_no_fee | pnl_daily restricted to markets with no taker fees (those predating the Q4 2024 fee introduction). Equivalent to filtering on markets.has_fee = False.
pnl_daily_resolved_no_fee | Intersection of the two filters above.
pnl_category_daily_resolved | pnl_category_daily with the same close_time filter.
pnl_category_daily_no_fee | pnl_category_daily restricted to markets with no taker fees.
pnl_category_daily_resolved_no_fee | Intersection of the two filters above.

Schema and layout are identical to the corresponding base panel.

Variant filtering happens at the position level (v1.1)

Starting in v1.1, the variant filter is applied to the underlying trades when positions are constructed, not at PnL aggregation time. A trade on a market outside the variant's subset contributes nothing to the variant (no token position, no USDC movement, no settlement), even if the user later sold back to flat. This makes each variant's terminal PnL structurally zero-sum within its market subset (modulo platform-collected fees and non-user counterparties). Users who traded only outside a variant's subset correctly do not appear in that variant's panels.
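To make the position-level idea concrete, here is a toy sketch in plain Python. It is not the dataset's actual pipeline, which is not shown on this page: the trade fields (user, market, tokens, usdc), the subset, and the build_positions helper are all hypothetical and exist only to illustrate filtering trades before positions are built.

```python
# Hypothetical toy trades: token quantity bought (+) or sold (-) and USDC cash flow.
trades = [
    {"user": "0xaaa", "market": "m1", "tokens": 10, "usdc": -4.0},
    {"user": "0xaaa", "market": "m2", "tokens": 5, "usdc": -2.5},
    {"user": "0xbbb", "market": "m2", "tokens": -5, "usdc": 2.5},
]

# Hypothetical variant subset: only market m1 qualifies.
subset = {"m1"}


def build_positions(trades, subset):
    """Accumulate (tokens, usdc) per (user, market), skipping out-of-subset trades."""
    positions = {}
    for trade in trades:
        if trade["market"] not in subset:
            # Out-of-subset trades leave no trace: no tokens and no cash movement.
            continue
        key = (trade["user"], trade["market"])
        tokens, usdc = positions.get(key, (0.0, 0.0))
        positions[key] = (tokens + trade["tokens"], usdc + trade["usdc"])
    return positions


positions = build_positions(trades, subset)
users_in_variant = {user for user, _ in positions}
```

Here 0xbbb traded only on m2, so they never enter the variant, and the USDC that 0xaaa spent on m2 does not reduce their balance in the variant.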

Snapshot time convention (+1 day right-boundary)

snapshot_time is labelled with the right boundary of the day it summarizes. A row with snapshot_time = 2024-03-30 00:00:00 UTC is the close of 2024-03-29 (equivalently, the open of 2024-03-30). The Hive partition path matches the column value (day=30 corresponds to 2024-03-30 00:00 UTC), so the partition is one day after the calendar day whose state is captured. The convention is chosen for compatibility with polars join_asof against daily price grids; see the Time convention section on the home page for the full picture, including the pnl_change_* panels.
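The convention can be expressed as two small helpers. This is an illustrative sketch using only the standard library: summarized_day and partition_path are hypothetical names, not part of the dataset tooling, and the path format assumes the zero-padded YYYY/MM/DD layout shown in the Layout section.

```python
from datetime import date, datetime, timedelta, timezone


def summarized_day(snapshot_time: datetime) -> date:
    """Calendar day whose end-of-day state a snapshot_time captures."""
    return (snapshot_time - timedelta(days=1)).date()


def partition_path(panel: str, snapshot_time: datetime) -> str:
    """Partition path for a snapshot; it follows snapshot_time, not the summarized day."""
    return (
        f"{panel}/year={snapshot_time:%Y}/month={snapshot_time:%m}"
        f"/day={snapshot_time:%d}/data.parquet"
    )


snap = datetime(2024, 3, 30, tzinfo=timezone.utc)
day = summarized_day(snap)                # 2024-03-29
path = partition_path("pnl_daily", snap)  # pnl_daily/year=2024/month=03/day=30/data.parquet
```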

Why sparse?

Storing PnL only when it changes shrinks the panel by an order of magnitude without information loss. A user who hasn't traded in 6 months gets one row at the start of the period and one when they resume; everything in between can be forward-filled. See the Recipes page for a one-shot polars idiom.