About

Disclaimer

This is an independent academic research dataset. The authors are not affiliated with, endorsed by, or sponsored by Polymarket. "Polymarket" is a trademark of its respective owner; it is referenced here only to identify the source platform of the underlying public on-chain data.

Citation

If you use this dataset, please cite the companion paper:

@unpublished{akey2026prediction,
  title  = {Who Wins and Who Loses In Prediction Markets? Evidence from Polymarket},
  author = {Akey, Pat and Gr{\'e}goire, Vincent and Harvie, Nicolas and Martineau, Charles},
  note   = {Working Paper},
  year   = {2026},
  url    = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6443103}
}

Authors

  • Pat Akey — ESSEC Business School, CEPR, ECGI
  • Vincent Grégoire — HEC Montréal
  • Nicolas Harvie — Rotman School of Management, University of Toronto
  • Charles Martineau — Rotman School of Management & UTSC Management, University of Toronto

License

The processed data is released under CC-BY 4.0 — use, modify, and redistribute freely (including commercially); please cite the paper.

Scope. CC-BY 4.0 covers the authors' contribution: cleaning, end-user reconciliation, category classification, computed PnL, behavioral feature engineering, OHLCV aggregation, and the layout of the parquet files in this release.

It does not cover fields that originate from the Polymarket API — market question text, descriptions, slugs, raw platform tags, and lifecycle timestamps. Those fields are included for convenience and reproducibility but remain subject to Polymarket's terms of use. If you plan to redistribute those fields (e.g., re-publish a derivative dataset), consult Polymarket's terms directly.

No warranty. As CC-BY 4.0 §5 states, this dataset is provided "AS IS" without warranty of any kind, express or implied.

Source code

The processing pipeline that produced this dataset is maintained in a private research repository. Researchers needing access for reproducibility purposes can contact the authors via the Discussions tab on the dataset page or by email.

The full column-level schema for every subset in this release is documented in the Datasets section of this site — those pages are the canonical data dictionary.

Reporting issues

Please open a thread in the Discussions tab on the dataset page.

Acknowledgments

This research was undertaken, in part, thanks to funding from the Canada Research Chairs Program, the Social Sciences and Humanities Research Council, Rotman FinHub, and the Ontario Securities Commission.