This repository presents a machine learning approach to classify the primary cause of flight delays at U.S. airports. The methodology emphasizes ensemble learning, extensive feature engineering, and class imbalance handling.
[Click here to view the full PDF report](<./Major_Delay_Cause for an airport.pdf>)

[Major_Delay_Cause.ipynb](<./Major_Delay_Cause for an airport.ipynb>)

- Task: Multi-class classification of delay causes: `carrier`, `weather`, `NAS`, `late aircraft`
- Dataset: 13,000+ aggregated monthly airport-carrier delay records
- Target Variable: `Major_Delay_Cause` (derived from delay components)
- Preprocessing:
  - Missing value imputation (median)
  - Standardization of numerical features
  - One-hot encoding of categorical variables
- Feature Engineering:
  - Delay rates, risk scores, historical performance, seasonality indicators
  - Z-score normalization for airport/carrier
- Modeling:
  - Baselines: Logistic Regression, Decision Tree, KNN
  - Ensembles: Random Forest, Gradient Boosting, Bagging, AdaBoost
  - Final ensemble via Soft Voting on tuned Gradient Boosting and Random Forest
- Best F1 (Macro): 0.827 (Gradient Boosting with SMOTE)
- Voting Ensemble Accuracy: 88%
- Evaluation Metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix
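
The preprocessing, soft-voting, and evaluation steps listed above could be wired together roughly as follows. This is a minimal sketch with scikit-learn, not the notebook's actual code: the data is synthetic, the column names (`arr_flights`, `delay_rate`, `carrier`, `month`) are hypothetical, and the hyperparameters are untuned placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.impute import SimpleImputer
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): real features come from the notebook.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "arr_flights": rng.integers(50, 500, n).astype(float),
    "delay_rate": rng.random(n),
    "carrier": rng.choice(["AA", "DL", "UA"], n),
    "month": rng.integers(1, 13, n).astype(str),
})
X.loc[rng.choice(n, 20, replace=False), "delay_rate"] = np.nan  # simulate gaps
y = rng.choice(["carrier", "weather", "NAS", "late_aircraft"], n)

numeric = ["arr_flights", "delay_rate"]
categorical = ["carrier", "month"]

# Median imputation + standardization for numeric columns,
# one-hot encoding for categorical columns.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Soft voting averages the predicted class probabilities of both models.
voter = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
model = Pipeline([("prep", preprocess), ("clf", voter)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

labels = sorted(set(y))
macro_f1 = f1_score(y_test, y_pred, average="macro")
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(f"macro F1: {macro_f1:.3f}")
print(cm)
```

Because the labels here are random, the scores are near chance and say nothing about the reported results. SMOTE is left out of the sketch since it lives in the separate `imbalanced-learn` package; if added, it would normally be applied to the training split only.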
- `Major_Delay_Cause.ipynb`: Full pipeline including EDA, modeling, and evaluation
- `exported_output.pdf`: Complete project report with visualizations and analysis
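
The target `Major_Delay_Cause` is described as derived from delay components, but the exact rule is not stated here. One plausible reading, shown below as an assumption, labels each record with the component that contributes the most delay minutes. The column names (`carrier_delay`, `weather_delay`, `nas_delay`, `late_aircraft_delay`) and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical delay-minute columns mapped to the four class labels.
components = {
    "carrier_delay": "carrier",
    "weather_delay": "weather",
    "nas_delay": "NAS",
    "late_aircraft_delay": "late aircraft",
}

# Toy records (made-up values) standing in for monthly airport-carrier rows.
df = pd.DataFrame({
    "carrier_delay": [120.0, 10.0, 30.0],
    "weather_delay": [15.0, 200.0, 5.0],
    "nas_delay": [40.0, 20.0, 90.0],
    "late_aircraft_delay": [60.0, 50.0, 80.0],
})

# Assumed rule: the label is the component with the largest delay in each row.
df["Major_Delay_Cause"] = df[list(components)].idxmax(axis=1).map(components)
print(df["Major_Delay_Cause"].tolist())  # ['carrier', 'weather', 'NAS']
```

Ties resolve to the first column in `components` order under this rule; the notebook may handle ties or zero-delay rows differently.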