✈️ AeroDelay - Flight Delay Prediction System

A machine learning system for predicting flight delays using Random Forest regression. This project demonstrates an end-to-end ML workflow: data preprocessing, feature engineering, model training, hyperparameter tuning, and deployment through a web interface.

🎯 Project Overview

Flight delays cost airlines and passengers billions of dollars annually. This project builds a predictive model that forecasts flight delays from historical data, enabling proactive decision-making for airlines, airports, and travelers.

Key Features:

🔄 Robust data preprocessing pipeline
🎛️ Feature engineering from temporal and categorical data
🌲 Random Forest model with hyperparameter tuning
📊 Comprehensive model evaluation and visualization
🚀 Interactive web application using Streamlit

📁 Project Structure

AeroDelay/
├── data/
│   ├── raw/                    # Original flight data
│   └── processed/              # Preprocessed data ready for modeling
├── models/                     # Saved trained models
├── notebooks/
│   ├── preprocessing.ipynb           # Data exploration and preprocessing
│   └── model_training_and_evaluation.ipynb  # Model training and evaluation
├── src/
│   ├── preprocess.py          # Data preprocessing functions
│   ├── train.py               # Model training pipeline
│   └── predict.py             # Prediction module with FlightDelayPredictor class
├── app.py                     # Streamlit web application
├── requirements.txt           # Project dependencies
└── README.md

🛠️ Installation

Prerequisites

Python 3.8 or higher
pip package manager

Setup

Clone the repository

git clone <repository-url>
cd AeroDelay

Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate        # macOS/Linux
venv\Scripts\activate           # Windows

Install dependencies

pip install -r requirements.txt

🚀 Usage

1. Data Preprocessing

Preprocess raw flight data using the preprocessing module:

python src/preprocess.py

Or explore the preprocessing notebook:

jupyter notebook notebooks/preprocessing.ipynb

Preprocessing Steps:

Remove unnecessary columns (IDs, data leakage features)
Handle missing values and duplicates
Feature engineering (flight duration, temporal features, weekend indicator)
Encode categorical variables (Airline, Origin, Destination, Aircraft Type)
Scale numeric features using StandardScaler
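
The feature engineering step above can be sketched with the standard library alone. This is a minimal illustration, not the code in src/preprocess.py: the input field names (`scheduled_departure`, `scheduled_arrival`), the timestamp format, and the `DayOfWeek` and `IsWeekend` output names are assumptions made for the example. Only `ScheduledDuration` and `DepartureHour` appear elsewhere in this README.

```python
from datetime import datetime

# Assumed timestamp format for the raw data (hypothetical).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def engineer_features(record):
    """Derive temporal features from one flight record.

    `record` is a dict with hypothetical keys `scheduled_departure`
    and `scheduled_arrival`.
    """
    departure = datetime.strptime(record["scheduled_departure"], TIMESTAMP_FORMAT)
    arrival = datetime.strptime(record["scheduled_arrival"], TIMESTAMP_FORMAT)

    # Scheduled flight duration in minutes.
    duration_min = (arrival - departure).total_seconds() / 60

    return {
        "ScheduledDuration": duration_min,
        "DepartureHour": departure.hour,
        "DayOfWeek": departure.weekday(),            # Monday = 0, Sunday = 6
        "IsWeekend": int(departure.weekday() >= 5),  # 1 for Saturday or Sunday
    }


example = {
    "scheduled_departure": "2024-03-09 18:30",  # a Saturday
    "scheduled_arrival": "2024-03-09 21:05",
}
features = engineer_features(example)
print(features)
```

Encoding and scaling would follow this step; they are left out here to keep the sketch free of third-party dependencies.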

2. Model Training

Train the Random Forest model with hyperparameter tuning:

python src/train.py

Or use the training notebook:

jupyter notebook notebooks/model_training_and_evaluation.ipynb

Training Process:

Train/test split (80/20)
Baseline Random Forest model
GridSearchCV for hyperparameter optimization
Model evaluation with multiple metrics
Feature importance analysis
Model persistence using joblib
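
The training process above could look roughly like the following. It is a sketch under stated assumptions rather than the contents of src/train.py: the data is synthetic, the parameter grid values and the scoring metric are illustrative choices, and the output file name is hypothetical. The 80/20 split, 3-fold cross-validation, GridSearchCV, and joblib persistence follow the steps listed in this README.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed feature matrix (assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(scale=2.0, size=200)

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative grid; the project's actual search space is not documented here.
param_grid = {"n_estimators": [20, 50], "max_depth": [4, None]}

# 3-fold cross-validated grid search over the Random Forest hyperparameters.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluate the best estimator on the held-out test set.
best_model = search.best_estimator_
mae = mean_absolute_error(y_test, best_model.predict(X_test))
print("Best parameters:", search.best_params_)
print(f"Test MAE: {mae:.2f}")

# Persist the tuned model with joblib (hypothetical file name), then reload it.
model_path = os.path.join(tempfile.gettempdir(), "flight_delay_model_demo.pkl")
joblib.dump(best_model, model_path)
reloaded = joblib.load(model_path)
```

The reloaded model should produce the same predictions as `best_model`, which is a cheap sanity check after saving.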

3. Make Predictions

Use the trained model to predict flight delays:

from src.predict import FlightDelayPredictor

# Load model
predictor = FlightDelayPredictor("models/flight_delay_model.pkl")

# Make predictions
# X_test: feature data prepared the same way as the training data
predictions = predictor.predict(X_test)

Or run the example:

python src/predict.py

4. Web Application

Launch the interactive Streamlit app:

streamlit run app.py

📊 Top Important Features

ScheduledDuration - Flight duration in minutes
Distance - Flight distance
DepartureHour - Hour of departure
Airline_Encoded - Airline carrier
Origin_Encoded - Origin airport

🔬 Technical Details

Machine Learning Pipeline

Data Preprocessing
- Label encoding for categorical features
- Standard scaling for numerical features
- Time-based feature extraction
Model Architecture
- Algorithm: Random Forest Regressor
- Hyperparameters: Optimized via GridSearchCV
- Cross-validation: 3-fold CV
Evaluation Metrics
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- Residual analysis
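
For reference, the evaluation metrics listed above can be computed by hand as follows. The delay values are made up for illustration; in practice `sklearn.metrics` provides equivalent functions.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted delays."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative delays in minutes (not project data).
actual = [10.0, 0.0, 45.0, 20.0]
predicted = [12.0, 4.0, 39.0, 20.0]

# Residuals (actual minus predicted) are the input to residual analysis.
residuals = [a - p for a, p in zip(actual, predicted)]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE:  {mae:.2f} minutes")
print(f"RMSE: {rmse:.2f} minutes")
```

RMSE is never smaller than MAE on the same data, so a large gap between the two suggests a few predictions with very large errors.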

Delay Categories

The system categorizes predictions into severity levels:

🟢 On Time: < 15 minutes
🟡 Minor Delay: 15-30 minutes
🟠 Moderate Delay: 30-60 minutes
🔴 Major Delay: > 60 minutes
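
A minimal sketch of how these thresholds might map a predicted delay to a category. The function name `categorize_delay` is hypothetical, and the handling of the shared boundary is an assumption: the ranges above list 30 minutes under both Minor and Moderate, and this sketch assigns exactly 30 minutes to Moderate Delay. Exactly 60 minutes stays Moderate because Major Delay is defined as more than 60.

```python
def categorize_delay(minutes):
    """Map a predicted delay in minutes to a severity label.

    Boundary handling is an assumption: 15 counts as Minor,
    30 and 60 count as Moderate.
    """
    if minutes < 15:
        return "On Time"
    if minutes < 30:
        return "Minor Delay"
    if minutes <= 60:
        return "Moderate Delay"
    return "Major Delay"


for predicted in (5, 15, 30, 60, 61):
    print(predicted, "->", categorize_delay(predicted))
```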

📈 Visualizations

The notebooks include comprehensive visualizations:

Distribution of delay minutes
Average delay by airline
Feature importance bar charts
Residual plots
Predicted vs Actual scatter plots

🧪 Testing the System

Run a quick test to ensure everything works:

# Test preprocessing
python -c "from src.preprocess import preprocess_data; print('✓ Preprocessing module OK')"

# Test prediction
python src/predict.py
