BaseMax/data-polars-performance

data-polars-performance

Production-grade, end-to-end benchmarking framework for Polars at scale.

Purpose

  • Provide reproducible, realistic benchmarks for Polars across 1e7–1e9 rows.
  • Focus on OLAP aggregations, time-series joins, and windowed analytics with Parquet-backed IO and lazy execution.

Quickstart

  1. Install dependencies (Poetry recommended):
poetry install
  2. Generate synthetic dataset (edit config/dataset.toml first):
poetry run python scripts/generate_data.py
  3. Run benchmarks (edit config/benchmark.toml to tune):
poetry run python scripts/run_benchmarks.py
  4. Run tests and type checks:
poetry run pytest
poetry run mypy src/

Repository layout

config/           # dataset and benchmark configs (TOML)
data/             # generated Parquet datasets
src/              # core library: dataset, benchmarks, engine, metrics
scripts/          # small CLI helpers to generate and run benchmarks
results/          # raw numeric outputs and reports
tests/            # unit tests
pyproject.toml
Makefile
README.md
LICENSE

Design notes

  • Data is written to Parquet (Snappy/ZSTD) and read with Polars lazy scans to measure realistic IO + CPU behavior.
  • Benchmarks measure wall-clock time and peak memory; each benchmark is repeated several times, and the raw results are written to results/raw/.
  • The codebase is modular so you can add new workloads under src/benchmarks/ and new data generators under src/dataset/.
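The measurement loop described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's harness: `run_benchmark`, the JSON record layout, and the stand-in workload are hypothetical. Peak memory is read from `resource.getrusage`, which is Unix-only and reports `ru_maxrss` in kilobytes on Linux. Python's `tracemalloc` would not be a good fit here, because Polars allocates in native code outside the Python allocator.

```python
import json
import resource  # Unix-only; matches the Linux/WSL2 target
import statistics
import tempfile
import time
from pathlib import Path
from typing import Callable


def run_benchmark(
    name: str, fn: Callable[[], object], repeats: int, out_dir: Path
) -> dict:
    """Time fn() `repeats` times and write the raw samples to out_dir/<name>.json."""
    wall_times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        wall_times.append(time.perf_counter() - t0)

    # ru_maxrss is a process-wide high-water mark that never decreases, so
    # isolating one workload's peak would need a fresh process per benchmark.
    peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    record = {
        "name": name,
        "repeats": repeats,
        "wall_times_s": wall_times,
        "median_s": statistics.median(wall_times),
        "peak_rss_kb": peak_rss_kb,
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{name}.json").write_text(json.dumps(record, indent=2))
    return record


# Stand-in workload; a real run would collect a Polars lazy query here.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "results" / "raw"
    record = run_benchmark(
        "sum_squares",
        lambda: sum(i * i for i in range(100_000)),
        repeats=3,
        out_dir=raw_dir,
    )
    saved = json.loads((raw_dir / "sum_squares.json").read_text())

print(record["median_s"], record["peak_rss_kb"])
```

Storing every sample rather than only the median keeps the raw output usable for later analysis, such as spotting warm-up effects in the first repetition.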

Hardware guidance

  • Target: Linux/WSL2, 8+ cores, 32–128GB RAM, NVMe SSD. Dataset size and benchmark settings live in the TOML configs, so the same code can be run on smaller or larger machines without changes.

Contributing

  • Open an issue or PR on GitHub. Add tests for new benchmarks and run mypy.

License

This project is released under the MIT License — see LICENSE.

Copyright

2025 Seyyed Ali Mohammadiyeh
