A fast Python package for reading NielsenIQ data from the Kilts Center and saving it as .parquet files.
- RetailReader — Retail Scanner Data (store-level weekly sales)
- PanelReader — Consumer Panel Data (household purchases)
Built on PyArrow (>= 17.0). All data is returned as PyArrow Tables for speed and memory efficiency. Convert to pandas anytime with .to_pandas().
uv pip install git+https://github.com/chrisconlon/kiltsnielsen
Requires Python >= 3.9. Installs pyarrow, pandas, and numpy automatically.
Get access through your institution at the Kilts Center, then download:
- Scanner data: Build extracts via the Kilts File Selection System. Available as .tgz by group/module/year.
- Panel data: Download from Globus. Available as .tgz by year.
You can either extract the .tgz files or use them directly. If extracting, preserve the original directory structure.
Performance tip: For large scanner archives (5GB+), extracting is ~100x faster than reading from .tgz. Panel archives (~500MB each) show little difference. Extract for repeated use; use .tgz directly for one-off reads only if storage is an issue.
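If you choose to extract, one way to do it from Python is the standard-library `tarfile` module, which keeps the directory layout stored in the archive. This is a minimal sketch: `extract_archive` is a hypothetical helper, not part of this package, and the paths in the usage comment are placeholders.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> list[Path]:
    """Extract a .tgz archive into dest, keeping the archive's directory structure."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases: refuse unsafe members (absolute paths, etc.).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Return the extracted files so the caller can check what landed on disk.
    return sorted(p for p in dest.rglob("*") if p.is_file())


# Usage (placeholder paths):
# files = extract_archive(Path("/path/to/archive.tgz"), Path("/path/to/scanner/data"))
```

Extracting into the directory you later pass to the reader avoids moving files around afterwards and leaves the original structure intact.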
from kiltsreader import RetailReader
from pathlib import Path
rr = RetailReader(Path('/path/to/scanner/data'))
rr.filter_years(drop=[2006, 2019])
rr.read_stores()
rr.filter_stores(keep_dmas=[506, 517], keep_channels=['F'])
rr.read_products(keep_modules=[1344])
rr.read_sales()
rr.write_data(Path('output'), stub='cereal')

from kiltsreader import PanelReader
from pathlib import Path
pr = PanelReader(Path('/path/to/panel/data'))
pr.filter_years(keep=range(2007, 2014))
pr.read_retailers()
pr.read_products(keep_groups=[5002])
pr.read_annual(keep_states=['CT'])
pr.write_data(Path('output'), stub='ct_liquor')

See Example.py for a more detailed walkthrough, and the API Guide for full method documentation.