
pnl_change_daily and pnl_change_monthly

Per-user PnL change (delta), not the level. Each row is the change in mark-to-market PnL over the calendar day (or month) for one user. Useful for return-style analyses where you'd otherwise have to first-difference the level series in pnl_daily yourself.

Layout

| Subset | Path |
| --- | --- |
| pnl_change_daily | pnl_change_daily/year=YYYY/month=MM/day=DD/data.parquet |
| pnl_change_monthly | pnl_change_monthly.parquet |
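
To fetch a single daily partition rather than globbing the whole tree, the path pattern above can be filled in from a date. The sketch below is illustrative only: the helper name `daily_partition_path` is hypothetical, and zero-padded month and day values are an assumption read off the `MM`/`DD` placeholders.

```python
from datetime import date


def daily_partition_path(day: date) -> str:
    """Relative path of the pnl_change_daily file for one value of the day column.

    Assumption: month and day are zero-padded to two digits (MM / DD).
    """
    return (
        f"pnl_change_daily/year={day.year:04d}/month={day.month:02d}"
        f"/day={day.day:02d}/data.parquet"
    )


print(daily_partition_path(date(2025, 6, 15)))
# pnl_change_daily/year=2025/month=06/day=15/data.parquet
```

The date passed in is the value of the day column, that is, the label of the row, not necessarily the day on which the PnL change occurred.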

Load

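# Option 1: Polars, reading the Parquet files directly
import polars as pl

# hive_partitioning=False: do not parse the year=/month=/day= path segments into extra columns
daily   = pl.scan_parquet("pnl_change_daily/**/*.parquet", hive_partitioning=False)
monthly = pl.read_parquet("pnl_change_monthly.parquet")

# Option 2: Hugging Face datasets
from datasets import load_dataset

daily   = load_dataset("vgregoire/polymarket-users", "pnl_change_daily")
monthly = load_dataset("vgregoire/polymarket-users", "pnl_change_monthly")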

Schema

pnl_change_daily

| Column | Type | Description |
| --- | --- | --- |
| user_address | str | End-user wallet |
| day | datetime | Calendar day (UTC) |
| pnl_change | float64 | Day-over-day change in mark-to-market PnL |

pnl_change_monthly

| Column | Type | Description |
| --- | --- | --- |
| user_address | str | End-user wallet |
| month | datetime | First day of the calendar month (UTC) |
| pnl_change | float64 | Month-over-month change in mark-to-market PnL |

+1-day right-boundary labelling

day and month use the same +1-day right-boundary convention as pnl_daily.snapshot_time (see the Time convention section on the home page). A row labelled day = 2025-06-15 00:00 UTC holds the change accumulated during 2025-06-14, not during 2025-06-15. The Hive partition path always matches the column value, so the path encodes the label date rather than the date on which the change accrued.
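
To line the daily rows up with data that is labelled by the day on which activity actually happened, the label has to be shifted back by one day. A minimal sketch of that shift, using only the standard library; the function name `accumulation_day` is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def accumulation_day(label: datetime) -> datetime:
    """Start of the calendar day whose PnL change is stored under `label`.

    The label is the right boundary of the day, so the change accrued
    during the 24 hours ending at `label`.
    """
    return label - timedelta(days=1)


label = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(accumulation_day(label).date())  # 2025-06-14
```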

Relation to pnl_daily

pnl_daily stores the cumulative PnL level, sparsely (only rows where it changed). pnl_change_daily stores the daily delta. By construction, pnl_change_daily(day = X) = pnl_daily(snapshot_time = X) - pnl_daily(snapshot_time = X - 1 day), treating absent pnl_daily rows as forward-filled. Summing pnl_change_daily for a user from the start of the sample therefore reconstructs the level series.
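
The identity can be checked on toy data for a single user. The sketch below forward-fills a sparse level series, first-differences it, and then cumulatively sums the deltas back into levels. It rests on two assumptions not stated above: the level is taken to be 0.0 before the first stored row, and zero-change days are written out explicitly (the dataset may or may not store such rows). The function names and the sample values are made up for illustration.

```python
from datetime import date, timedelta
from itertools import accumulate


def daily_changes(sparse_levels: dict[date, float]) -> dict[date, float]:
    """First-difference a sparse level series, forward-filling absent days.

    Assumption: the level is 0.0 before the first stored row.
    """
    days = sorted(sparse_levels)
    changes = {}
    prev = 0.0
    day = days[0]
    while day <= days[-1]:
        level = sparse_levels.get(day, prev)  # absent row: carry the last level forward
        changes[day] = level - prev
        prev = level
        day += timedelta(days=1)
    return changes


def rebuild_levels(changes: dict[date, float]) -> dict[date, float]:
    """Cumulative sum of the daily deltas, in date order."""
    days = sorted(changes)
    return dict(zip(days, accumulate(changes[d] for d in days)))


# Toy pnl_daily rows for one user: only days on which the level changed.
levels = {date(2025, 6, 10): 10.0, date(2025, 6, 12): 4.5, date(2025, 6, 15): 7.0}

changes = daily_changes(levels)
rebuilt = rebuild_levels(changes)

print(changes[date(2025, 6, 12)])  # -5.5
print(rebuilt[date(2025, 6, 14)])  # 4.5 (forward-filled level)
```

On the days where pnl_daily has a row, the rebuilt level equals the stored level; on the days in between, it equals the forward-filled value.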