An ELT (Extract-Load-Transform) pipeline that retrieves historical daily temperature data from the Open-Meteo API, loads it into a DuckDB database, and optionally exports it to a .parquet file for analysis in tools like Power BI.
This project is ideal for climate analysts, researchers, and data scientists investigating weather trends around the world and comparing recent years with past decades.
- Extracts weather data (min/max temperatures) using city name and year
- Automatically geocodes cities using Open-Meteo's geocoding API
- Loads data into DuckDB
- Exports all collected data as a Parquet file for analysis
- CLI-powered: simple and scriptable
- Minimal setup required
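As a rough illustration of the extract step, the sketch below builds the two Open-Meteo requests involved (geocoding, then historical daily data) and flattens the response into rows. The endpoint URLs and query parameters are Open-Meteo's public ones, but the function names are hypothetical and the project's own code may be organized differently, for example by requesting one month at a time rather than a whole year.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Open-Meteo endpoints (assumed here; the project may configure others).
GEOCODING_URL = "https://geocoding-api.open-meteo.com/v1/search"
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def geocoding_request(city: str) -> str:
    """Build a geocoding URL asking for the single best match for a city name."""
    return f"{GEOCODING_URL}?{urlencode({'name': city, 'count': 1})}"


def archive_request(latitude: float, longitude: float, year: int) -> str:
    """Build an archive URL for one calendar year of daily min/max temperatures."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": f"{year}-01-01",
        "end_date": f"{year}-12-31",
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": "auto",
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


def parse_daily(payload: dict) -> list[tuple[str, float, float]]:
    """Turn the column-oriented "daily" block into (date, temp_min, temp_max) rows."""
    daily = payload["daily"]
    return list(
        zip(daily["time"], daily["temperature_2m_min"], daily["temperature_2m_max"])
    )


def fetch_json(url: str) -> dict:
    """Perform a GET request and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

A caller would pass the `latitude` and `longitude` of the first entry in the geocoding response's `results` list to `archive_request`, then hand the archive response to `parse_daily`.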
This project uses [uv](https://github.com/astral-sh/uv) for managing Python dependencies.
Install `uv` (only once):
pip install uv
Create a virtual environment:
uv venv
Activate the virtual environment:
- On Windows: `.venv\Scripts\activate`
- On macOS/Linux: `source .venv/bin/activate`
Install dependencies from `pyproject.toml`:
uv pip install -r pyproject.toml
Use the CLI to extract weather data for any city and year.
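For readers curious how a command line of this shape can be wired up, here is a minimal `argparse` sketch with the two subcommands this section documents, `get` and `export`. It is an assumption about the structure, not the project's actual entry point, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with a `get <city> <year>` and an `export` subcommand."""
    parser = argparse.ArgumentParser(prog="python .", description="Weather ELT pipeline")
    sub = parser.add_subparsers(dest="command", required=True)

    # `get` takes a city name and a year; the year is converted to int.
    get = sub.add_parser("get", help="extract one city/year and load it into DuckDB")
    get.add_argument("city")
    get.add_argument("year", type=int)

    # `export` takes no arguments.
    sub.add_parser("export", help="export the database to a Parquet file")
    return parser


args = build_parser().parse_args(["get", "Lisbon", "2024"])
print(args.command, args.city, args.year)  # get Lisbon 2024
```

In a real entry point, `parse_args()` would be called without arguments so that it reads `sys.argv`.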
Extract + Load (to DB):
python . get <city> <year>
Example:
python . get Lisbon 2024
This:
- Geocodes the city
- Extracts daily weather data for all months in that year
- Loads the results into data/weather.duckdb
If data for that city/year already exists in the database, it will be skipped.
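The skip-if-present behavior can be sketched as a check before the insert. The table name `weather`, its columns, and the function names below are assumptions made for illustration; the real schema may differ. The functions only rely on `execute(...).fetchone()` and `executemany(...)` with `?` placeholders, which a DuckDB connection provides.

```python
def ensure_table(con) -> None:
    """Create the (assumed) weather table if it does not exist yet."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "city VARCHAR, year INTEGER, date DATE, temp_min DOUBLE, temp_max DOUBLE)"
    )


def already_loaded(con, city: str, year: int) -> bool:
    """Return True if any row for this city/year is already stored."""
    row = con.execute(
        "SELECT COUNT(*) FROM weather WHERE city = ? AND year = ?",
        [city, year],
    ).fetchone()
    return row[0] > 0


def load_year(con, city: str, year: int, rows: list) -> int:
    """Insert (date, temp_min, temp_max) rows unless the city/year exists.

    Returns the number of rows inserted; 0 means the load was skipped.
    """
    ensure_table(con)
    if not rows or already_loaded(con, city, year):
        return 0
    con.executemany(
        "INSERT INTO weather VALUES (?, ?, ?, ?, ?)",
        [(city, year, day, temp_min, temp_max) for day, temp_min, temp_max in rows],
    )
    return len(rows)
```

With DuckDB installed, `con` would be `duckdb.connect("data/weather.duckdb")`. Checking before inserting keeps repeated `get` runs from duplicating rows, at the cost of not refreshing a city/year that was only partially loaded.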
Once you’ve collected data for all the cities and years you want:
python . export
This exports all contents of the DuckDB database to:
data/export/weather.parquet
You can now load this file directly into Power BI or any other analysis tool that supports Parquet format.
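One way such an export can be done is with DuckDB's `COPY ... TO ... (FORMAT PARQUET)` statement. The sketch below assumes all data lives in a single table named `weather`, which is a guess about the schema; `export_sql` and `export` are hypothetical names.

```python
from pathlib import Path


def export_sql(table: str = "weather", path: str = "data/export/weather.parquet") -> str:
    """Build a DuckDB COPY statement that writes a whole table to Parquet."""
    return f"COPY (SELECT * FROM {table}) TO '{path}' (FORMAT PARQUET)"


def export(con, table: str = "weather", path: str = "data/export/weather.parquet") -> None:
    """Create the output directory if needed, then run the COPY statement."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    con.execute(export_sql(table, path))
```

Here `con` is expected to be a DuckDB connection, for example `duckdb.connect("data/weather.duckdb")`. The table name and path are interpolated directly into the SQL, so they should come from trusted configuration rather than user input.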
- License: MIT
- Created with Cookiecutter and the audreyr/cookiecutter-pypackage template