pnl_daily and pnl_category_daily¶
Sparse delta-encoded daily PnL panels. Only rows where PnL changed that day are stored. To get a dense daily series, forward-fill from the last observation (see Recipes).
Layout¶
Hive-partitioned by date:
pnl_daily/year=YYYY/month=MM/day=DD/data.parquet
pnl_category_daily/year=YYYY/month=MM/day=DD/data.parquet
Load¶
import polars as pl
pnl = pl.scan_parquet("pnl_daily/**/*.parquet", hive_partitioning=False)
pnl_cat = pl.scan_parquet("pnl_category_daily/**/*.parquet", hive_partitioning=False)
Schema¶
pnl_daily¶
| Column | Type | Description |
|---|---|---|
| user_address | str | End-user wallet |
| snapshot_time | datetime | End-of-day timestamp (UTC, +1 day convention for ASOF joins) |
| pnl | float64 | Mark-to-market portfolio_value + usdc_balance |
| portfolio_value | float64 | Value of open positions at market mid |
| usdc_balance | float64 | USDC cash account |
pnl_category_daily¶
| Column | Type | Description |
|---|---|---|
| user_address | str | End-user wallet |
| snapshot_time | datetime | End-of-day timestamp |
| category | str | One of Sports, Crypto, Finance, Politics, Tech, Culture, Weather |
| portfolio_value | float64 | Value of open positions in this category |
| usdc_balance | float64 | Category-attributed USDC balance |
| pnl | float64 | portfolio_value + usdc_balance |
Untagged markets excluded
pnl_category_daily excludes markets without a category label. As a
result, summing pnl_category_daily across categories for a given
user-day will not in general equal the same user-day's row in
pnl_daily. The difference is PnL from untagged markets.
Variant panels¶
The same per-(user, day) panel and its per-(user, category, day) companion are also shipped restricted to three useful market subsets. Use these to reproduce the resolved-only, no-fee, or resolved-no-fee variants of the paper-profits exhibits.
| Subset | Description |
|---|---|
| pnl_daily_resolved | pnl_daily restricted to markets whose close_time is on or before the sample end (2026-03-29). |
| pnl_daily_no_fee | pnl_daily restricted to markets with no taker fees (markets that predate the Q4 2024 fee introduction). Equivalent to filtering markets.has_fee = False. |
| pnl_daily_resolved_no_fee | Intersection of the two filters above. |
| pnl_category_daily_resolved | pnl_category_daily with the same close_time filter. |
| pnl_category_daily_no_fee | pnl_category_daily restricted to markets with no taker fees. |
| pnl_category_daily_resolved_no_fee | Intersection of the two filters above. |
Schema and layout are identical to the corresponding base panel.
Variant filtering happens at the position level (v1.1)
Starting in v1.1, the variant filter is applied to the underlying trades when positions are constructed, not at PnL aggregation time. A trade on a market outside the variant's subset contributes nothing to the variant: no token position, no USDC movement, and no settlement, even if the user later sold back to flat. This makes each variant's terminal PnL structurally zero-sum within its market subset (modulo platform-collected fees and non-user counterparties). Users who traded only outside a variant's subset therefore do not appear in that variant's panels.
Snapshot time convention (+1 day right-boundary)
snapshot_time is labelled with the right boundary of the day it
summarizes. A row with snapshot_time = 2024-03-30 00:00:00 UTC is
the close of 2024-03-29 (equivalently, the open of 2024-03-30).
The Hive partition path matches the column value (day=30 ↔
2024-03-30 00:00 UTC), so the partition is one day after the
calendar day whose state is captured. The convention is chosen for
compatibility with polars.join_asof against daily price grids — see
the Time convention section on the
home page for the full picture including the pnl_change_* panels.
Why sparse?¶
Storing PnL only when it changes shrinks the panel by an order of magnitude without information loss. A user who hasn't traded in 6 months gets one row at the start of the period and one when they resume; everything in between can be forward-filled. See the Recipes page for a one-shot polars idiom.