xiaotuzi7877/ML-Major-Delay-Cause-Classifier

ML-Ensemble Major-Delay-Cause-Classifier

This repository presents a machine learning approach to classify the primary cause of flight delays at U.S. airports. The methodology emphasizes ensemble learning, extensive feature engineering, and class imbalance handling.


📄 Report

👉 [Click here to view the full PDF report](./Major_Delay_Cause for an airport.pdf)

📒 Notebook

👉 [Major_Delay_Cause.ipynb](./Major_Delay_Cause for an airport.ipynb)


Overview

  • Task: Multi-class classification of delay causes: carrier, weather, NAS, late aircraft
  • Dataset: 13,000+ aggregated monthly airport-carrier delay records
  • Target Variable: Major_Delay_Cause (derived from delay components)
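
The target derivation can be sketched as follows. This is a minimal illustration, assuming that Major_Delay_Cause is the delay component with the largest total for each record. The column names (`carrier_delay`, `weather_delay`, `nas_delay`, `late_aircraft_delay`) and the `major_delay_cause` helper are hypothetical; the notebook may use different names or a different tie-breaking rule.

```python
# Hypothetical component names; the README does not list the actual columns.
DELAY_COMPONENTS = ("carrier", "weather", "nas", "late_aircraft")


def major_delay_cause(record):
    """Return the delay component with the largest total for one record.

    Missing or None values count as zero. Returns None when no component
    has a positive delay. Ties go to the component listed first.
    """
    totals = {
        name: record.get(f"{name}_delay") or 0.0
        for name in DELAY_COMPONENTS
    }
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None


# One made-up monthly airport-carrier record (delay minutes per component).
example = {
    "carrier_delay": 120.0,
    "weather_delay": 15.0,
    "nas_delay": 310.0,
    "late_aircraft_delay": 95.0,
}
print(major_delay_cause(example))  # nas
```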

Methodology

  • Preprocessing:
    • Missing value imputation (median)
    • Standardization of numerical features
    • One-hot encoding of categorical variables
  • Feature Engineering:
    • Delay rates, risk scores, historical performance, seasonality indicators
    • Z-score normalization for airport/carrier
  • Modeling:
    • Baselines: Logistic Regression, Decision Tree, KNN
    • Ensembles: Random Forest, Gradient Boosting, Bagging, AdaBoost
    • Final ensemble via Soft Voting on tuned Gradient Boosting and Random Forest
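
To illustrate the soft-voting step, here is a library-free sketch: each model contributes per-class probabilities, the probabilities are averaged, and the class with the highest average wins. The `soft_vote` helper and the probability values are made up for illustration and are not taken from the notebook. In scikit-learn the same idea is available as `VotingClassifier(voting="soft")`, though the README does not say which implementation was used.

```python
def soft_vote(prob_sets, weights=None):
    """Average per-class probabilities across models and pick the top class.

    prob_sets: list of dicts mapping class label -> predicted probability,
               one dict per model, all for the same sample.
    weights:   optional per-model weights (defaults to equal weighting).
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total = sum(weights)
    classes = prob_sets[0].keys()
    averaged = {
        c: sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in classes
    }
    return max(averaged, key=averaged.get), averaged


# Illustrative probabilities for one sample from two tuned models.
gb_probs = {"carrier": 0.50, "weather": 0.05, "nas": 0.15, "late_aircraft": 0.30}
rf_probs = {"carrier": 0.30, "weather": 0.05, "nas": 0.10, "late_aircraft": 0.55}

label, averaged = soft_vote([gb_probs, rf_probs])
print(label)  # late_aircraft
```

In this example the two models disagree on the top class, so a majority (hard) vote would be tied. Soft voting resolves the tie by taking each model's confidence into account.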

Performance

  • Best F1 (Macro): 0.827 (Gradient Boosting with SMOTE)
  • Voting Ensemble Accuracy: 88%
  • Evaluation Metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix
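
Macro F1 is reported alongside accuracy because it weights every class equally, so a rarely occurring delay cause counts as much as a common one. The sketch below computes it from scratch on a small made-up example. The labels are illustrative only, and the `macro_f1` helper is a hypothetical name rather than the notebook's code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Per-class F1 is 2*TP / (2*TP + FP + FN); a class with no true or
    predicted samples contributes 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up imbalanced example: the single "weather" record is misclassified.
y_true = ["carrier", "carrier", "carrier", "carrier", "weather", "nas"]
y_pred = ["carrier", "carrier", "carrier", "carrier", "carrier", "nas"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 2))                  # 0.83
print(round(macro_f1(y_true, y_pred), 2))  # 0.63
```

Accuracy stays high because the majority class is predicted well, while macro F1 drops sharply because the minority class is missed entirely.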
