pnl_daily and pnl_category_daily
Sparse delta-encoded daily PnL panels. Only rows where PnL changed that day are stored. To get a dense daily series, forward-fill from the last observation (see Recipes).
Layout
Hive-partitioned by date:
```
pnl_daily/year=YYYY/month=MM/day=DD/data.parquet
pnl_category_daily/year=YYYY/month=MM/day=DD/data.parquet
```
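The path template can be generated from a date. This is a minimal stdlib sketch, assuming month and day are zero-padded as the YYYY/MM/DD placeholders suggest. partition_path is a hypothetical helper, and the date it takes is the snapshot_time date (see the snapshot time convention note below), not the calendar day being summarized.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_path(panel: str, snapshot_day: date) -> PurePosixPath:
    """Hive partition file for one snapshot_time date (hypothetical helper).

    Assumes zero-padded month and day, per the YYYY/MM/DD template.
    """
    return PurePosixPath(
        panel,
        f"year={snapshot_day.year:04d}",
        f"month={snapshot_day.month:02d}",
        f"day={snapshot_day.day:02d}",
        "data.parquet",
    )


print(partition_path("pnl_daily", date(2024, 3, 30)))
# pnl_daily/year=2024/month=03/day=30/data.parquet
```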
Load
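```python
import polars as pl

# Lazy scans over every partition file. snapshot_time is stored in the files,
# so the year/month/day path segments are not added as extra columns.
pnl = pl.scan_parquet("pnl_daily/**/*.parquet", hive_partitioning=False)
pnl_cat = pl.scan_parquet("pnl_category_daily/**/*.parquet", hive_partitioning=False)
```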
Schema
pnl_daily
| Column | Type | Description |
|---|---|---|
| user_address | str | End-user wallet |
| snapshot_time | datetime | End-of-day timestamp (UTC), labelled with the following midnight (+1 day convention for ASOF joins) |
| pnl | float64 | Mark-to-market total: portfolio_value + usdc_balance |
| portfolio_value | float64 | Value of open positions at market mid |
| usdc_balance | float64 | USDC cash account balance |
pnl_category_daily
| Column | Type | Description |
|---|---|---|
| user_address | str | End-user wallet |
| snapshot_time | datetime | End-of-day timestamp (UTC, same +1 day convention as pnl_daily) |
| category | str | One of Sports, Crypto, Finance, Politics, Tech, Culture, Weather |
| portfolio_value | float64 | Value of open positions in this category |
| usdc_balance | float64 | USDC balance attributed to this category |
| pnl | float64 | portfolio_value + usdc_balance |
Untagged markets excluded
pnl_category_daily excludes markets without a category label. As a
result, summing pnl_category_daily across categories for a given
user-day will not in general equal the same user-day's row in
pnl_daily. The difference is PnL from untagged markets.
Variant panels
The same per-(user, day) panel and its per-(user, category, day)
companion are also shipped restricted to three market subsets:
resolved markets, fee-free markets, and their intersection. Use these
panels, rather than the wide user_pnl_summary columns, to reproduce
the resolved-only, no-fee, or resolved-no-fee variants of the
paper-profits exhibits. The per-(user, category) user set behind those
wide columns includes users with no positions in that variant for that
category (filled with 0.0), which inflates the denominators in
concentration_by_category, probit_regression, and
pnl_spread_decomposition.
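The denominator effect can be seen with a toy share-of-profitable-users statistic. This is an illustration only, not the actual metric in concentration_by_category or the other exhibits. share_profitable and the demo values are hypothetical, and None marks a user with no positions in that variant and category.

```python
from typing import Dict, Optional


def share_profitable(pnl_by_user: Dict[str, Optional[float]], zero_fill: bool) -> float:
    """Fraction of users with pnl > 0.

    zero_fill=True mimics the wide-column behaviour: users with no positions
    (None) are kept with pnl 0.0. zero_fill=False drops them.
    """
    if zero_fill:
        values = [0.0 if v is None else v for v in pnl_by_user.values()]
    else:
        values = [v for v in pnl_by_user.values() if v is not None]
    return sum(v > 0 for v in values) / len(values)


users = {"0xa": 5.0, "0xb": 2.0, "0xc": -1.0, "0xd": None, "0xe": None}
print(share_profitable(users, zero_fill=False))  # 2 of 3 users with positions
print(share_profitable(users, zero_fill=True))   # 2 of 5 once zero-filled: 0.4
```

The numerator is unchanged; only the two zero-filled users enlarge the denominator.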
| Panel | Description |
|---|---|
| pnl_daily_resolved | pnl_daily restricted to markets whose close_time is on or before the sample end (2026-03-29). |
| pnl_daily_no_fee | pnl_daily restricted to markets with no taker fees (taker fees were introduced in Q4 2024). Equivalent to filtering markets.has_fee = False. |
| pnl_daily_resolved_no_fee | Intersection of the two filters above. |
| pnl_category_daily_resolved | pnl_category_daily with the same close_time filter. |
| pnl_category_daily_no_fee | pnl_category_daily restricted to markets with no taker fees. |
| pnl_category_daily_resolved_no_fee | Intersection of the two filters above. |
Schema and layout are identical to the corresponding base panel; only the subset of markets contributing to each row differs.
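Because every variant shares the base layout, the files to scan can be derived from three flags. This is a minimal sketch, assuming each panel lives in a top-level directory named exactly as in the table, as the base panels do in the Layout section. panel_glob is a hypothetical helper whose output can be passed to pl.scan_parquet in place of the literal globs in the Load snippet.

```python
def panel_glob(category: bool = False, resolved: bool = False, no_fee: bool = False) -> str:
    """Glob for a base or variant panel, e.g. pnl_daily_resolved_no_fee/**/*.parquet.

    Assumes the directory name equals the panel name in the table.
    """
    name = "pnl_category_daily" if category else "pnl_daily"
    if resolved:
        name += "_resolved"
    if no_fee:
        name += "_no_fee"  # suffix order matches the table: _resolved before _no_fee
    return f"{name}/**/*.parquet"


print(panel_glob(resolved=True, no_fee=True))
# pnl_daily_resolved_no_fee/**/*.parquet
```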
Snapshot time convention (+1 day right-boundary)
snapshot_time is labelled with the right boundary of the day it
summarizes. A row with snapshot_time = 2024-03-30 00:00:00 UTC is
the close of 2024-03-29 (equivalently, the open of 2024-03-30).
The Hive partition path matches the column value (day=30 ↔
2024-03-30 00:00 UTC), so the partition is one day after the
calendar day whose state is captured. The convention is chosen for
compatibility with polars' DataFrame.join_asof against daily price
grids. See the Time convention section on the
home page for the full picture, including the pnl_change_* panels.
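Converting between a snapshot_time label and the calendar day it summarizes is a one-day shift in either direction. This is a minimal stdlib sketch, assuming timestamps sit exactly at UTC midnight. captured_day and snapshot_for are hypothetical helper names.

```python
from datetime import date, datetime, timedelta, timezone


def captured_day(snapshot_time: datetime) -> date:
    """Calendar day whose end-of-day state the row summarizes."""
    return (snapshot_time - timedelta(days=1)).date()


def snapshot_for(day: date) -> datetime:
    """snapshot_time label for the close of `day`: the following UTC midnight."""
    return datetime(day.year, day.month, day.day, tzinfo=timezone.utc) + timedelta(days=1)


label = datetime(2024, 3, 30, tzinfo=timezone.utc)
print(captured_day(label))              # 2024-03-29
print(snapshot_for(date(2024, 3, 29)))  # 2024-03-30 00:00:00+00:00
```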
Why sparse?
Storing PnL only when it changes shrinks the panel by an order of magnitude without information loss. A user whose PnL does not change for six months (for example, one with no open positions who does not trade) has one row where the flat stretch begins and the next only when PnL changes again; every day in between can be forward-filled. See the Recipes page for a one-shot polars idiom.