pnl_change_daily and pnl_change_monthly
Per-user PnL change (delta), not the level. Each row is the change in
mark-to-market PnL over the calendar day (or month) for one user. Useful
for return-style analyses where you'd otherwise have to first-difference
the level series in pnl_daily yourself.
Layout
| Subset | Path |
|---|---|
| pnl_change_daily | pnl_change_daily/year=YYYY/month=MM/day=DD/data.parquet |
| pnl_change_monthly | pnl_change_monthly.parquet |
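To fetch a single daily file rather than scanning the whole tree, the path can be built from the `day` label. The sketch below is a minimal illustration: the helper name `daily_partition_path` is hypothetical, and it assumes the month and day components are zero-padded to two digits, as the `MM` and `DD` placeholders suggest.

```python
from datetime import date


def daily_partition_path(day_label: date, root: str = "pnl_change_daily") -> str:
    """Build the relative path of the daily file for one `day` label.

    Assumption: month and day are zero-padded (MM, DD) in the partition names.
    """
    return (
        f"{root}/year={day_label.year:04d}"
        f"/month={day_label.month:02d}"
        f"/day={day_label.day:02d}/data.parquet"
    )


print(daily_partition_path(date(2025, 6, 15)))
# pnl_change_daily/year=2025/month=06/day=15/data.parquet
```

The date passed in is the value of the `day` column, since the partition path follows the column value.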
Load
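```python
# Option A: Polars, reading the Parquet files directly
import polars as pl

# Lazy scan over every daily partition file
daily = pl.scan_parquet("pnl_change_daily/**/*.parquet", hive_partitioning=False)
# Single file, read eagerly
monthly = pl.read_parquet("pnl_change_monthly.parquet")
```

```python
# Option B: Hugging Face datasets, loading each subset by name
from datasets import load_dataset

daily = load_dataset("vgregoire/polymarket-users", "pnl_change_daily")
monthly = load_dataset("vgregoire/polymarket-users", "pnl_change_monthly")
```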
Schema
pnl_change_daily
| Column | Type | Description |
|---|---|---|
| user_address | str | End-user wallet |
| day | datetime | Calendar day (UTC) |
| pnl_change | float64 | Day-over-day change in mark-to-market PnL |
pnl_change_monthly
| Column | Type | Description |
|---|---|---|
| user_address | str | End-user wallet |
| month | datetime | First day of the calendar month (UTC) |
| pnl_change | float64 | Month-over-month change in mark-to-market PnL |
+1 day right-boundary labelling
day and month use the same +1-day right-boundary convention as
pnl_daily.snapshot_time — see the
Time convention section on the home
page. A row labelled day = 2025-06-15 00:00 UTC is the change
accumulated during 2025-06-14, not during 2025-06-15. The Hive
partition path always matches the column value.
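For the daily subset, mapping a label back to the calendar date on which the change accrued is a one-day shift. The following is a minimal sketch of that shift using only the standard library; the function name `activity_date` is hypothetical and not part of the dataset tooling.

```python
from datetime import date, datetime, timedelta, timezone


def activity_date(day_label: datetime) -> date:
    """Return the calendar date during which the labelled change accumulated.

    The label is the right boundary of the interval, so step back one day.
    """
    return (day_label - timedelta(days=1)).date()


label = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(activity_date(label))  # 2025-06-14
```

Applying this shift before joining against data keyed by trade date avoids an off-by-one-day misalignment.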
Relation to pnl_daily
pnl_daily stores the cumulative PnL level, sparsely (only rows
where it changed). pnl_change_daily stores the daily delta.
By construction, pnl_change_daily(day = X) = pnl_daily(snapshot_time = X) - pnl_daily(snapshot_time = X - 1 day),
treating absent pnl_daily rows as forward-filled. Summing pnl_change_daily for a user
from the start of the sample therefore reconstructs the level series.
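The relation can be checked on a toy example. The sketch below uses made-up numbers for a single user and rests on two assumptions that are not stated on this page: the level is taken to be 0 before the user's first row, and a day with no row in the toy data is treated as a zero change. It rebuilds the level with a running sum, then verifies the first-difference identity with forward-filling.

```python
from datetime import date, timedelta
from itertools import accumulate

# Hypothetical daily changes for one user, keyed by `day` label.
# 2025-06-15 has no row and is treated as a zero change (assumption).
changes = {
    date(2025, 6, 13): 1.5,
    date(2025, 6, 14): -0.5,
    date(2025, 6, 16): 2.0,
}


def level_series(changes: dict) -> dict:
    """Running sum of the changes in date order, starting from 0 (assumption)."""
    days = sorted(changes)
    return dict(zip(days, accumulate(changes[d] for d in days)))


def level_at(levels: dict, day: date) -> float:
    """Forward-fill: latest level at or before `day`, or 0.0 before the first row."""
    prior = [d for d in levels if d <= day]
    return levels[max(prior)] if prior else 0.0


levels = level_series(changes)

# Each change equals the level at X minus the forward-filled level at X - 1 day.
for day, delta in changes.items():
    assert level_at(levels, day) - level_at(levels, day - timedelta(days=1)) == delta

print(levels[date(2025, 6, 16)])  # 3.0
```

On the real data, the same idea would be a per-user cumulative sum over rows sorted by `day`.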