Toetrandro-etl is a full-stack ETL and analytics pipeline that collects, processes, and visualizes weather data to answer a real-world question:
When is the best time to visit a city based on weather conditions?
This project combines automation, data modeling, and interactive dashboards to deliver actionable travel recommendations based on real-time and historical climate data.
- Automate daily ETL workflows using Apache Airflow
- Integrate real-time and historical weather data
- Clean and model data for climate-based travel scoring
- Visualize insights through an interactive dashboard
Can we recommend the best times to visit a city based on weather comfort?
- Ideal temperature range (e.g., 22°C–28°C)
- Low precipitation and wind speed
- Monthly comfort scores and ideal day counts
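As a rough sketch of how such criteria could be turned into a score, the check below treats a day as "ideal" when all three conditions hold; the exact thresholds (beyond the 22°C–28°C range stated above) and function names are illustrative assumptions, not the project's actual scoring rules:

```python
def is_ideal_day(temp_c: float, precip_mm: float, wind_kmh: float) -> bool:
    # Hypothetical thresholds: temperature in the stated comfort range,
    # plus assumed cut-offs for precipitation and wind.
    return 22.0 <= temp_c <= 28.0 and precip_mm < 1.0 and wind_kmh < 20.0

def comfort_score(days: list[tuple[float, float, float]]) -> float:
    # Share of ideal days in the period, as a percentage.
    ideal = sum(is_ideal_day(*d) for d in days)
    return round(100 * ideal / len(days), 1)
```

A monthly score is then just this percentage computed over that month's days.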
| Layer | Tools/Technologies Used |
|---|---|
| Automation | Apache Airflow |
| Data Handling | Python, Pandas, GeoCoder |
| Data Sources | OpenWeather API, CSV / Open-Meteo |
| Visualization | Jupyter Notebooks, Metabase |
| Orchestration | DAGs with task-based architecture |
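To illustrate the extraction layer, a minimal OpenWeather client could look like the sketch below. The endpoint and the `q`/`appid`/`units` query parameters come from OpenWeather's public current-weather API; the function names are hypothetical, not the repo's actual client:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_weather_url(city: str, api_key: str) -> str:
    # units=metric asks OpenWeather to return temperatures in °C
    return f"{BASE_URL}?{urlencode({'q': city, 'appid': api_key, 'units': 'metric'})}"

def fetch_current_weather(city: str, api_key: str) -> dict:
    # Network call; requires a valid OpenWeather API key.
    with urlopen(build_weather_url(city, api_key), timeout=10) as resp:
        return json.load(resp)
```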
- Daily automated extraction of weather data
- Historical dataset integration (CSV, APIs)
- ETL pipeline with modular Airflow tasks: `extract`, `transform`, `merge`, `migrate`
- Data cleaning & normalization for schema consistency
- Star schema modeling for analytics-ready structure
- Interactive dashboard with filters by city, month, and metric
```
toetrandro-etl/
├── workflows/
│   ├── dags/            # Airflow DAGs
│   ├── scripts/         # Task logic
│   └── config/          # Airflow variables/settings
├── data/
│   ├── raw/             # Raw extracted data
│   ├── merged/          # Final merged dataset
│   └── processed/       # Cleaned, transformed data
├── notebooks/           # Jupyter Notebooks for EDA & modeling
├── migration/           # PostgreSQL database table setup
├── src/
│   ├── api/             # OpenWeather API client
│   ├── core/            # ETL logic
│   └── utils/           # Logging, helpers
├── tests/               # Unit tests
├── requirements.txt     # Dependencies
└── README.md
```
- `establish_city_config` – Defines cities and config
- `extract_weather_data` – Pulls real-time weather from the OpenWeather API
- `transform_enriched_data` – Cleans and enriches the dataset
- `merge_processed_files` – Combines historical and real-time data
- `migrate_data_to_postgres` – Loads data into a star schema in PostgreSQL
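The hand-off between these tasks can be pictured as a simple sequential flow. The following is a hypothetical pure-Python sketch with stubbed bodies (the real logic lives in `workflows/scripts/` and `src/core/`); the function names mirror the task names above, and the sample city record is made up:

```python
def establish_city_config() -> list[dict]:
    # Stub: in the real pipeline this comes from Airflow config/variables.
    return [{"city": "Mahajanga"}]

def extract_weather_data(cities: list[dict]) -> list[dict]:
    # Stub: would call the OpenWeather API per city.
    return [{"city": c["city"], "temp_c": 26.0} for c in cities]

def transform_enriched_data(raw: list[dict]) -> list[dict]:
    # Stub: cleaning/enrichment reduced to a single derived flag.
    return [{**r, "is_ideal": 22.0 <= r["temp_c"] <= 28.0} for r in raw]

def merge_processed_files(processed: list[dict], historical: list[dict]) -> list[dict]:
    return historical + processed

def migrate_data_to_postgres(merged: list[dict]) -> int:
    # Stub: would insert into the star schema; returns a row count here.
    return len(merged)

cities = establish_city_config()
rows = migrate_data_to_postgres(
    merge_processed_files(transform_enriched_data(extract_weather_data(cities)), [])
)
```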
Detailed documentation is available in the doc folder:
- Pipeline Process – Detailed breakdown of each ETL step
- Airflow Configuration – Airflow variables and environment setup
- Model Documentation – How the data model is designed (star schema)
The Toetrandro dashboard is designed to answer two complementary questions: how do cities compare overall, and how comfortable is a specific city over time?

The global view provides a high-level comparison across all cities and time periods. It answers:
- Which city has the highest annual comfort score?
- How many ideal days were recorded across all cities?
- Which cities are best to visit overall?
- Which months are most comfortable for travel?
- How does seasonal comfort vary by city?
This global perspective helps travelers compare destinations and choose the best months to travel based on aggregated climate comfort.
This city-specific dashboard allows users to select a city, month, and year to explore detailed comfort trends. It answers:
- What is the most ideal month to visit this city?
- How many ideal days occurred in the selected year?
- What proportion of days were ideal vs. not ideal?
- How has the comfort score evolved over the years?
- How does the number of ideal days change month by month?
- Is the city becoming more or less comfortable over time?
For example, selecting Mahajanga in 2025 reveals:
- June is the most ideal month
- 26 ideal days recorded that year
- 40.4% of days were ideal
- A steady increase in comfort score from 2020 to 2025
- Monthly breakdown showing June peaking with 12 ideal days
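The monthly ideal-day counts behind such a breakdown amount to a small pandas aggregation. This sketch assumes a frame with a datetime `date` column and a boolean `is_ideal` column (hypothetical names, not necessarily the project's schema):

```python
import pandas as pd

def monthly_ideal_counts(df: pd.DataFrame) -> pd.Series:
    # Count ideal days per calendar month (True sums as 1).
    return df.groupby(df["date"].dt.month)["is_ideal"].sum()
```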
- Unit tests for all ETL components
- Retry logic and logging in Airflow tasks
- Secure API key handling via Airflow Variables
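A unit test for an ETL component might look like the pytest-style sketch below; `normalize_temperature` is a hypothetical helper invented for illustration, not an actual function from this repo:

```python
def normalize_temperature(record: dict) -> dict:
    # Hypothetical cleaning step: convert a Kelvin reading to Celsius
    # without mutating the input record.
    out = dict(record)
    out["temp_c"] = round(record["temp_k"] - 273.15, 2)
    return out

def test_normalize_temperature():
    assert normalize_temperature({"temp_k": 300.15})["temp_c"] == 27.0
    original = {"temp_k": 273.15}
    assert normalize_temperature(original)["temp_c"] == 0.0
    assert "temp_c" not in original  # input left untouched
```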
- Clone the repository

  ```shell
  git clone https://github.com/Abega1642/toetrandro-etl.git
  ```

- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Set environment variables

  Follow the instructions in `airflow_env.md`
Important note: before running the DAG, ensure that your PostgreSQL database is properly configured:

- The database must be created and accessible with the correct credentials.
- All required tables must be initialized using the SQL script provided below.
- The database user must have sufficient privileges (e.g., `CREATE`, `INSERT`, `SELECT`, `REFERENCES`) to execute all operations.

Initialization script: `toetrandro_db_script.sql`
- Initialize Airflow

  ```shell
  airflow db init
  airflow users create --username admin ...
  ```

- Launch Airflow

  ```shell
  airflow scheduler &
  airflow api-server &
  airflow dag-processor
  ```
- Add more cities and weather APIs
- Enhance dashboard with maps and geospatial filters
- Dockerize the pipeline for easier deployment
- Abegà Razafindratelo
This project is licensed under the MIT License.

