
Weather Data Pipeline

Production-ready ETL system for weather data using Open-Meteo API, PostgreSQL, and Streamlit


Overview · Quick Start · Dashboard · Documentation


Overview

A complete ETL pipeline that extracts real-time weather data from the Open-Meteo API, transforms it using Polars DataFrames, loads it into PostgreSQL, and provides an interactive Streamlit dashboard for visualization.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Weather Data Pipeline                         │
└─────────────────────────────────────────────────────────────────┘

  EXTRACT              TRANSFORM              LOAD              VISUALIZE
     │                     │                   │                    │
┌────▼────┐           ┌────▼────┐         ┌────▼────┐         ┌─────▼────┐
│  Open-  │   JSON    │ Polars  │  Batch  │Postgre- │  Query  │Streamlit │
│  Meteo  │──────────►│ Engine  │────────►│ SQL 15  │────────►│Dashboard │
│   API   │           │         │         │         │         │          │
└─────────┘           └─────────┘         └─────────┘         └──────────┘
• Retry logic      • Validation        • Connection pool    • Plotly charts
• Rate limiting    • Type safety       • Idempotent writes  • Smart caching
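
The "parameterized" and "idempotent writes" noted under the LOAD stage can be sketched as an upsert keyed on city and timestamp. This is a minimal illustration, not the repo's actual code: it uses sqlite3 as a stand-in for PostgreSQL, and the `weather` table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL 15 here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE weather (
        city          TEXT,
        observed_at   TEXT,
        temperature_c REAL,
        PRIMARY KEY (city, observed_at)
    )
""")

def upsert_reading(conn, city, observed_at, temperature_c):
    """Parameterized placeholders block SQL injection; ON CONFLICT
    makes the write idempotent, so re-running a batch updates rows
    instead of duplicating them."""
    conn.execute(
        """
        INSERT INTO weather (city, observed_at, temperature_c)
        VALUES (?, ?, ?)
        ON CONFLICT (city, observed_at)
        DO UPDATE SET temperature_c = excluded.temperature_c
        """,
        (city, observed_at, temperature_c),
    )

upsert_reading(conn, "Cairo", "2024-01-01T12:00", 21.5)
upsert_reading(conn, "Cairo", "2024-01-01T12:00", 22.0)  # same key: update, not duplicate
rows = conn.execute("SELECT city, temperature_c FROM weather").fetchall()
print(rows)  # → [('Cairo', 22.0)]
```

With PostgreSQL the same `INSERT ... ON CONFLICT ... DO UPDATE` statement works; only the placeholder style changes (`%s` with psycopg-family drivers instead of `?`).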

Features

  • High Performance: Polars DataFrames process data 5-10x faster than pandas
  • Reliable: Automatic retry logic with exponential backoff
  • Secure: Parameterized queries prevent SQL injection
  • Scalable: Connection pooling supports 100+ cities
  • Interactive Dashboard: Three-page Streamlit app with filtering and visualizations
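
The retry behavior described above can be sketched as a small wrapper: wait 1s, 2s, 4s, ... between attempts and re-raise once attempts are exhausted. This is a generic sketch, not the repo's implementation; `fetch_with_retry` and `flaky_fetch` are illustrative names, and the injectable `sleep` exists only to make the example observable without real delays.

```python
import time

def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Simulate an API that fails twice before succeeding.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return {"temperature_2m": 21.5}

delays = []
result = fetch_with_retry(flaky_fetch, sleep=delays.append)
print(result, delays)  # → {'temperature_2m': 21.5} [1.0, 2.0]
```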

Quick Start

Prerequisites

  • Python 3.11+
  • Docker and Docker Compose

Setup

  1. Clone and install dependencies:
git clone <repository-url>
cd weather-data-pipeline

# With uv (recommended)
uv sync

# Or with pip
python -m venv .venv
source .venv/bin/activate
pip install -e .
  2. Start the database:
docker-compose up -d
  3. Run the pipeline:
uv run python src/pipeline.py
  4. Launch the dashboard:
uv run streamlit run dashboard/app.py

Tip

Access the dashboard at http://localhost:8501 and pgAdmin at http://localhost:5050

Default Cities

The pipeline fetches weather data for 5 cities by default:

  • Cairo, London, Tokyo, New York, Sydney
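
Open-Meteo's forecast endpoint is addressed by latitude and longitude, so a city list like the one above reduces to a coordinate table plus a URL builder. A minimal sketch follows; the `CITIES` mapping is illustrative (coordinates are approximate, not taken from the repo), while the endpoint and the `latitude`/`longitude`/`current_weather` query parameters are standard Open-Meteo usage.

```python
from urllib.parse import urlencode

# Approximate coordinates for the five default cities (illustrative values).
CITIES = {
    "Cairo":    (30.04, 31.24),
    "London":   (51.51, -0.13),
    "Tokyo":    (35.68, 139.69),
    "New York": (40.71, -74.01),
    "Sydney":   (-33.87, 151.21),
}

def forecast_url(city, base="https://api.open-meteo.com/v1/forecast"):
    """Build the request URL for one city's current conditions."""
    lat, lon = CITIES[city]
    params = {"latitude": lat, "longitude": lon, "current_weather": "true"}
    return f"{base}?{urlencode(params)}"

print(forecast_url("Cairo"))
```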

Environment Variables

Default values work for local development. Create a .env file if needed:

cp .env.example .env

Key variables:

  • DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD - Database connection
  • API_BASE_URL - Open-Meteo API endpoint (default provided)
  • DASHBOARD_PORT - Streamlit port (default: 8501)
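
The "defaults work for local development" behavior can be sketched with `os.getenv` fallbacks. The variable names mirror the list above; the default values here are illustrative assumptions, not necessarily the repo's.

```python
import os

def db_config():
    """Read connection settings from the environment,
    falling back to local-development defaults."""
    return {
        "host":     os.getenv("DB_HOST", "localhost"),
        "port":     int(os.getenv("DB_PORT", "5432")),
        "dbname":   os.getenv("DB_NAME", "weather"),
        "user":     os.getenv("DB_USER", "postgres"),
        "password": os.getenv("DB_PASSWORD", ""),
    }

cfg = db_config()
print(cfg["host"], cfg["port"])
```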

Dashboard

The Streamlit dashboard provides three pages:

Page                 Description
Current Conditions   Real-time weather with temperature, humidity, wind, precipitation
Historical Trends    Time-series charts over custom date ranges
City Comparison      Side-by-side metrics across multiple cities

Interactive Controls:

  • Multi-city selection filter
  • Date range picker
  • Temperature unit toggle (°C / °F)
  • 5-minute automatic data caching
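
In a Streamlit app, the 5-minute cache is typically provided by decorating the loader with `st.cache_data(ttl=300)`. The idea itself is just time-to-live memoization, sketched below in plain Python (hypothetical `ttl_cache` and `load_weather` names, not the repo's code):

```python
import time

def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache a zero-argument loader's result for ttl_seconds
    (the dashboard uses 300 s = 5 minutes)."""
    def decorator(load):
        state = {"value": None, "expires": 0.0}
        def cached():
            now = clock()
            if now >= state["expires"]:          # expired (or first call)
                state["value"] = load()          # refresh from the source
                state["expires"] = now + ttl_seconds
            return state["value"]
        return cached
    return decorator

calls = {"n": 0}

@ttl_cache(ttl_seconds=300)
def load_weather():
    calls["n"] += 1                              # count real loads
    return [("Cairo", 21.5)]

load_weather()
load_weather()                                   # served from cache
print(calls["n"])  # → 1
```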

Documentation

Detailed guides are available in the docs/ directory.

Tech Stack

Component          Technology
Language           Python 3.11+
Data Processing    Polars 0.20+
Database           PostgreSQL 15
Dashboard          Streamlit 1.35+
Visualization      Plotly
Containerization   Docker Compose

Troubleshooting

Database connection failed

docker-compose ps           # Check container status
docker-compose logs db      # View error logs

Pipeline fails with API errors

curl -I https://api.open-meteo.com  # Check connectivity

Dashboard shows no data

uv run python src/pipeline.py  # Run pipeline first

Built by Eslam Mohamed
