Toetrandro-etl is a full-stack ETL and analytics pipeline that collects, processes, and visualizes weather data to answer a real-world question:
When is the best time to visit a city based on weather conditions?
This project combines automation, data modeling, and interactive dashboards to deliver actionable travel recommendations based on real-time and historical climate data.
- Automate daily ETL workflows using Apache Airflow
- Integrate real-time and historical weather data
- Clean and model data for climate-based travel scoring
- Visualize insights through an interactive dashboard
Can we recommend the best times to visit a city based on weather comfort?
- Ideal temperature range (e.g., 22°C–28°C)
- Low precipitation and wind speed
- Monthly comfort scores and ideal day counts
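As a rough sketch of how such criteria could be turned into a score, the check below treats a day as "ideal" when all three conditions hold; the exact thresholds (beyond the 22°C–28°C range stated above) and function names are illustrative assumptions, not the project's actual scoring rules:

```python
def is_ideal_day(temp_c: float, precip_mm: float, wind_kmh: float) -> bool:
    # Hypothetical thresholds: temperature in the stated comfort range,
    # plus assumed cut-offs for precipitation and wind.
    return 22.0 <= temp_c <= 28.0 and precip_mm < 1.0 and wind_kmh < 20.0

def comfort_score(days: list[tuple[float, float, float]]) -> float:
    # Share of ideal days in the period, as a percentage.
    ideal = sum(is_ideal_day(*d) for d in days)
    return round(100 * ideal / len(days), 1)
```

A monthly score is then just this percentage computed over that month's days.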
| Layer | Tools/Technologies Used |
|---|---|
| Automation | Apache Airflow |
| Data Handling | Python, Pandas, GeoCoder |
| Data Sources | OpenWeather API, CSV / Open-Meteo |
| Visualization | Jupyter Notebooks, Metabase |
| Orchestration | DAGs with task-based architecture |
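To illustrate the extraction layer, a minimal OpenWeather client could look like the sketch below. The endpoint and the `q`/`appid`/`units` query parameters come from OpenWeather's public current-weather API; the function names are hypothetical, not the repo's actual client:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_weather_url(city: str, api_key: str) -> str:
    # units=metric asks OpenWeather to return temperatures in °C
    return f"{BASE_URL}?{urlencode({'q': city, 'appid': api_key, 'units': 'metric'})}"

def fetch_current_weather(city: str, api_key: str) -> dict:
    # Network call; requires a valid OpenWeather API key.
    with urlopen(build_weather_url(city, api_key), timeout=10) as resp:
        return json.load(resp)
```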
- Daily automated extraction of weather data
- Historical dataset integration (CSV, APIs)
- ETL pipeline with modular Airflow tasks: `extract`, `transform`, `merge`, `migrate`
- Data cleaning & normalization for schema consistency
- Star schema modeling for analytics-ready structure
- Interactive dashboard with filters by city, month, and metric
```
toetrandro-etl/
├── workflows/
│   ├── dags/            # Airflow DAGs
│   ├── scripts/         # Task logic
│   └── config/          # Airflow variables/settings
├── data/
│   ├── raw/             # Raw extracted data
│   ├── merged/          # Final merged dataset
│   └── processed/       # Cleaned, transformed data
├── notebooks/           # Jupyter Notebooks for EDA & modeling
├── migration/           # PostgreSQL database table setup
├── src/
│   ├── api/             # OpenWeather API client
│   ├── core/            # ETL logic
│   └── utils/           # Logging, helpers
├── tests/               # Unit tests
├── requirements.txt     # Dependencies
└── README.md
```
- `establish_city_config` – Defines cities and config
- `extract_weather_data` – Pulls real-time weather from the OpenWeather API
- `transform_enriched_data` – Cleans and enriches the dataset
- `merge_processed_files` – Combines historical and real-time data
- `migrate_data_to_postgres` – Loads data into a star schema in PostgreSQL
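The hand-off between these tasks can be pictured as a simple sequential flow. The following is a hypothetical pure-Python sketch with stubbed bodies (the real logic lives in `workflows/scripts/` and `src/core/`); the function names mirror the task names above, and the sample city record is made up:

```python
def establish_city_config() -> list[dict]:
    # Stub: in the real pipeline this comes from Airflow config/variables.
    return [{"city": "Mahajanga"}]

def extract_weather_data(cities: list[dict]) -> list[dict]:
    # Stub: would call the OpenWeather API per city.
    return [{"city": c["city"], "temp_c": 26.0} for c in cities]

def transform_enriched_data(raw: list[dict]) -> list[dict]:
    # Stub: cleaning/enrichment reduced to a single derived flag.
    return [{**r, "is_ideal": 22.0 <= r["temp_c"] <= 28.0} for r in raw]

def merge_processed_files(processed: list[dict], historical: list[dict]) -> list[dict]:
    return historical + processed

def migrate_data_to_postgres(merged: list[dict]) -> int:
    # Stub: would insert into the star schema; returns a row count here.
    return len(merged)

cities = establish_city_config()
rows = migrate_data_to_postgres(
    merge_processed_files(transform_enriched_data(extract_weather_data(cities)), [])
)
```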
Detailed documentation is available in the doc folder:
- Pipeline Process – Detailed breakdown of each ETL step
- Airflow Configuration – Airflow variables and environment setup
- Model Documentation – How the data model is designed (star schema)
The Toetrandro dashboard is designed to answer two complementary questions: how do cities compare overall, and how comfortable is a specific city over time?

The global view provides a high-level comparison across all cities and time periods. It answers:
- Which city has the highest annual comfort score?
- How many ideal days were recorded across all cities?
- Which cities are best to visit overall?
- Which months are most comfortable for travel?
- How does seasonal comfort vary by city?
This global perspective helps travelers compare destinations and choose the best months to travel based on aggregated climate comfort.
This city-specific dashboard allows users to select a city, month, and year to explore detailed comfort trends. It answers:
- What is the most ideal month to visit this city?
- How many ideal days occurred in the selected year?
- What proportion of days were ideal vs. not ideal?
- How has the comfort score evolved over the years?
- How does the number of ideal days change month by month?
- Is the city becoming more or less comfortable over time?
For example, selecting Mahajanga in 2025 reveals:
- June is the most ideal month
- 26 ideal days recorded that year
- 40.4% of days were ideal
- A steady increase in comfort score from 2020 to 2025
- Monthly breakdown showing June peaking with 12 ideal days
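The monthly ideal-day counts behind such a breakdown amount to a small pandas aggregation. This sketch assumes a frame with a datetime `date` column and a boolean `is_ideal` column (hypothetical names, not necessarily the project's schema):

```python
import pandas as pd

def monthly_ideal_counts(df: pd.DataFrame) -> pd.Series:
    # Count ideal days per calendar month (True sums as 1).
    return df.groupby(df["date"].dt.month)["is_ideal"].sum()
```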
- Unit tests for all ETL components
- Retry logic and logging in Airflow tasks
- Secure API key handling via Airflow Variables
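A unit test for an ETL component might look like the pytest-style sketch below; `normalize_temperature` is a hypothetical helper invented for illustration, not an actual function from this repo:

```python
def normalize_temperature(record: dict) -> dict:
    # Hypothetical cleaning step: convert a Kelvin reading to Celsius
    # without mutating the input record.
    out = dict(record)
    out["temp_c"] = round(record["temp_k"] - 273.15, 2)
    return out

def test_normalize_temperature():
    assert normalize_temperature({"temp_k": 300.15})["temp_c"] == 27.0
    original = {"temp_k": 273.15}
    assert normalize_temperature(original)["temp_c"] == 0.0
    assert "temp_c" not in original  # input left untouched
```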
- Clone the repository

  ```shell
  git clone https://github.com/Abega1642/toetrandro-etl.git
  ```

- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Set environment variables

  Follow the instructions in `airflow_env.md`
Important note: before running the DAG, ensure that your PostgreSQL database is properly configured:

- The database must be created and accessible with the correct credentials.
- All required tables must be initialized using the SQL script provided below.
- The database user must have sufficient privileges (e.g., `CREATE`, `INSERT`, `SELECT`, `REFERENCES`) to execute all operations.

Initialization script: `toetrandro_db_script.sql`
- Initialize Airflow

  ```shell
  airflow db init
  airflow users create --username admin ...
  ```

- Launch Airflow

  ```shell
  airflow scheduler &
  airflow api-server &
  airflow dag-processor
  ```
- Add more cities and weather APIs
- Enhance dashboard with maps and geospatial filters
- Dockerize the pipeline for easier deployment
- Abegà Razafindratelo
This project is licensed under the MIT License.

