🌀️ Toetrandro-etl: Travel Recommendation Based on Climate

🧭 Project Overview

Toetrandro-etl is a full-stack ETL and analytics pipeline that collects, processes, and visualizes weather data to answer a real-world question:

🗺️ When is the best time to visit a city based on weather conditions?

This project combines automation, data modeling, and interactive dashboards to deliver actionable travel recommendations based on real-time and historical climate data.


🎯 Project Goals

  • 📦 Automate daily ETL workflows using Apache Airflow
  • 🌍 Integrate real-time and historical weather data
  • 🧼 Clean and model data for climate-based travel scoring
  • 📊 Visualize insights through an interactive dashboard

🌐 Use Case: Climate & Tourism

❓ Problem Statement

Can we recommend the best times to visit a city based on weather comfort?

📈 Key Metrics

  • ✅ Ideal temperature range (e.g., 22°C–28°C)
  • 🌧️ Low precipitation and wind speed
  • 📅 Monthly comfort scores and ideal day counts
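
As a rough illustration of how these metrics could combine into a daily verdict, the sketch below flags a day as ideal when its temperature falls within the 22°C–28°C range and both precipitation and wind speed stay under a limit. The names `DailyWeather` and `is_ideal_day`, the field names, and the precipitation and wind thresholds are assumptions made for this example, not values taken from the project.

```python
from dataclasses import dataclass

# Temperature range from the metrics above. The other two limits are
# hypothetical placeholders, not the project's actual thresholds.
IDEAL_TEMP_RANGE_C = (22.0, 28.0)
MAX_PRECIPITATION_MM = 1.0  # assumed
MAX_WIND_SPEED_MS = 6.0     # assumed


@dataclass
class DailyWeather:
    temp_c: float
    precipitation_mm: float
    wind_speed_ms: float


def is_ideal_day(day: DailyWeather) -> bool:
    """Return True when all three comfort conditions hold."""
    low, high = IDEAL_TEMP_RANGE_C
    return (
        low <= day.temp_c <= high
        and day.precipitation_mm <= MAX_PRECIPITATION_MM
        and day.wind_speed_ms <= MAX_WIND_SPEED_MS
    )


print(is_ideal_day(DailyWeather(25.0, 0.0, 3.0)))  # True
print(is_ideal_day(DailyWeather(31.0, 0.0, 3.0)))  # False (too hot)
```

Counting the days for which such a check returns True, month by month, would give the "ideal day counts" mentioned above.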

⚙️ Technical Stack

Layer            Tools/Technologies Used
Automation       Apache Airflow
Data Handling    Python, Pandas, GeoCoder
Data Sources     OpenWeather API, CSV/OpenMeteo
Visualization    Jupyter Notebooks, Metabase
Orchestration    DAGs with task-based architecture

🛠️ Key Features

  • 📑 Daily automated extraction of weather data
  • 📂 Historical dataset integration (CSV, APIs)
  • 🔄 ETL pipeline with modular Airflow tasks: extract, transform, merge, migrate
  • 🧽 Data cleaning & normalization for schema consistency
  • 🌟 Star schema modeling for analytics-ready structure
  • 📊 Interactive dashboard with filters by city, month, and metric
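
To make the star schema idea concrete, here is a minimal sketch that splits flat weather records into two dimension tables and one fact table. The table names (`dim_city`, `dim_date`, `fact_weather`), the field names, and the sample rows are all hypothetical; the project's actual schema is defined by its SQL initialization script.

```python
def to_star_schema(records):
    """Split flat weather records into two dimensions and one fact table."""
    dim_city = {}       # city name -> surrogate key
    dim_date = {}       # ISO date string -> surrogate key
    fact_weather = []   # one row per (city, date) measurement

    for rec in records:
        # setdefault returns the existing key, or assigns the next integer.
        city_id = dim_city.setdefault(rec["city"], len(dim_city) + 1)
        date_id = dim_date.setdefault(rec["date"], len(dim_date) + 1)
        fact_weather.append(
            {"city_id": city_id, "date_id": date_id, "temp_c": rec["temp_c"]}
        )
    return dim_city, dim_date, fact_weather


# Made-up sample rows, for illustration only.
flat = [
    {"city": "Mahajanga", "date": "2025-06-01", "temp_c": 25.1},
    {"city": "Mahajanga", "date": "2025-06-02", "temp_c": 24.3},
    {"city": "Antananarivo", "date": "2025-06-01", "temp_c": 17.8},
]
dim_city, dim_date, fact_weather = to_star_schema(flat)
print(dim_city)  # {'Mahajanga': 1, 'Antananarivo': 2}
```

Keeping city and date attributes in their own tables is what lets a dashboard filter by city or month without scanning repeated text values in the fact rows.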

📁 Repository Structure

toetrandro-etl/
├── workflows/
│   ├── dags/                   # Airflow DAGs
│   ├── scripts/                # Task logic
│   └── config/                 # Airflow variables/settings
├── data/
│   ├── raw/                    # Raw extracted data
│   ├── merged/                 # Final merged dataset
│   └── processed/              # Cleaned, transformed data
├── notebooks/                  # Jupyter Notebooks for EDA & modeling
├── migration/                  # PostgreSQL, the database table set-up
├── src/
│   ├── api/                    # OpenWeather API client
│   ├── core/                   # ETL logic
│   └── utils/                  # Logging, helpers
├── tests/                      # Unit tests
├── requirements.txt            # Dependencies
└── README.md

🔁 Pipeline Logic (Airflow DAG)

  1. establish_city_config – Defines cities and config
  2. extract_weather_data – Pulls real-time weather from OpenWeather API
  3. transform_enriched_data – Cleans and enriches the dataset
  4. merge_processed_files – Combines historical and real-time data
  5. migrate_data_to_postgres – Loads data into a star schema in PostgreSQL
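
The order of the five tasks can be illustrated with plain Python functions that hand data from one step to the next. This is only a sketch of the data flow, not Airflow code: the function bodies are stubs, the field names and sample values are made up, and the rule that real-time rows overwrite historical rows for the same city and date is an assumption.

```python
def establish_city_config():
    # Hypothetical configuration; the real task may read it from elsewhere.
    return ["Mahajanga"]


def extract_weather_data(cities):
    # Stub standing in for the OpenWeather API call; values are made up.
    return [{"city": c, "date": "2025-06-01", "temp": "25.4"} for c in cities]


def transform_enriched_data(raw):
    # Cast types and rename fields to a consistent schema.
    return [
        {"city": r["city"], "date": r["date"], "temp_c": float(r["temp"])}
        for r in raw
    ]


def merge_processed_files(realtime, historical):
    # Keep one row per (city, date); real-time rows win on conflict (assumed).
    merged = {(r["city"], r["date"]): r for r in historical}
    merged.update({(r["city"], r["date"]): r for r in realtime})
    return sorted(merged.values(), key=lambda r: (r["city"], r["date"]))


def migrate_data_to_postgres(rows):
    # Stub: a real task would insert into PostgreSQL; here we only count rows.
    return len(rows)


# Made-up historical row, for illustration only.
historical = [{"city": "Mahajanga", "date": "2025-05-31", "temp_c": 26.0}]

cities = establish_city_config()
processed = transform_enriched_data(extract_weather_data(cities))
merged = merge_processed_files(processed, historical)
loaded = migrate_data_to_postgres(merged)
print(loaded)  # 2
```

In the actual DAG each function would correspond to one task, with the same left-to-right dependency order.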

📚 Additional Documentation

Detailed documentation is available in the doc folder.


📊 Dashboard Overview

The Toetrandro dashboard is designed to answer two complementary questions:


🌍 Global View: Where and when is the best place to travel?

[Image: Global Toetrandro dashboard]

This view provides a high-level comparison across all cities and time periods. It answers:

  • 🏆 Which city has the highest annual comfort score?
  • 📅 How many ideal days were recorded across all cities?
  • 🌟 Which cities are best to visit overall?
  • 📆 Which months are most comfortable for travel?
  • ❄️ How does seasonal comfort vary by city?

This global perspective helps travelers compare destinations and choose the best months to travel based on aggregated climate comfort.


πŸ™οΈ Local View: What’s the best time to visit a specific city?

[Image: Local Toetrandro dashboard]

This city-specific dashboard allows users to select a city, month, and year to explore detailed comfort trends. It answers:

  • 📌 What is the most ideal month to visit this city?
  • 📅 How many ideal days occurred in the selected year?
  • 📊 What proportion of days were ideal vs. not ideal?
  • 📈 How has the comfort score evolved over the years?
  • 🔄 How does the number of ideal days change month by month?
  • 🕰️ Is the city becoming more or less comfortable over time?

For example, selecting Mahajanga in 2025 reveals:

  • ✅ June is the most ideal month
  • 📅 26 ideal days recorded that year
  • 📊 40.4% of days were ideal
  • 📈 A steady increase in comfort score from 2020 to 2025
  • 🔄 Monthly breakdown showing June peaking with 12 ideal days
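
Figures of this kind (best month, ideal-day count, share of ideal days, monthly breakdown) could be derived from per-day flags along the following lines. This is a sketch only: the function name `summarize_ideal_days`, the input shape, and the sample flags are hypothetical, and the share is computed over the days present in the input rather than over a full calendar year.

```python
from collections import Counter
from datetime import date


def summarize_ideal_days(daily_flags):
    """daily_flags maps a datetime.date to True when that day was ideal."""
    ideal_per_month = Counter(
        d.month for d, ideal in daily_flags.items() if ideal
    )
    total_ideal = sum(ideal_per_month.values())
    share = 100.0 * total_ideal / len(daily_flags) if daily_flags else 0.0
    best_month = (
        max(ideal_per_month, key=ideal_per_month.get)
        if ideal_per_month
        else None
    )
    return {
        "ideal_days": total_ideal,
        "ideal_share_pct": round(share, 1),
        "best_month": best_month,
        "ideal_per_month": dict(ideal_per_month),
    }


# Made-up flags for four days, for illustration only.
flags = {
    date(2025, 5, 30): False,
    date(2025, 6, 1): True,
    date(2025, 6, 2): True,
    date(2025, 7, 1): True,
}
summary = summarize_ideal_days(flags)
print(summary["best_month"], summary["ideal_share_pct"])  # 6 75.0
```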

🧪 Testing & Reliability

  • ✅ Unit tests for all ETL components
  • 🔁 Retry logic and logging in Airflow tasks
  • 🔐 Secure API key handling via Airflow Variables
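
Retry behavior in Airflow is usually configured through the `retries` and `retry_delay` task arguments. The fragment below shows one possible setting; the specific values are hypothetical and are not taken from this project's DAGs.

```python
from datetime import timedelta

# Hypothetical values; tune them to the API's rate limits and failure modes.
default_args = {
    "retries": 3,                         # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}
```

A dictionary like this is typically passed as `default_args` when the DAG is declared, so that every task inherits the same retry policy.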

🚀 Getting Started

  1. Clone the repository

    git clone https://github.com/Abega1642/toetrandro-etl.git
  2. Install dependencies

    pip install -r requirements.txt
  3. Set environment variables
    Follow the instructions in airflow_env.md


⚠️ Important Note: Before running the DAG, ensure that your PostgreSQL database is properly configured:

  • ✅ The database must be created and accessible with the correct credentials.
  • 🧱 All required tables must be initialized using the SQL script provided below.
  • 🔐 The database user must have sufficient privileges (e.g., CREATE, INSERT, SELECT, REFERENCES) to execute all operations.

📄 Initialization script: toetrandro_db_script.sql


  4. Initialize Airflow

    airflow db init
    airflow users create --username admin ...
  5. Launch Airflow

    airflow scheduler &
    airflow api-server &
    airflow dag-processor

📌 Future Improvements

  • 🌍 Add more cities and weather APIs
  • 🗺️ Enhance dashboard with maps and geospatial filters
  • 🐳 Dockerize the pipeline for easier deployment

👥 Author

  • Abegà Razafindratelo

📄 License

This project is licensed under the MIT License.
