This Streamlit application analyzes airline passenger satisfaction data through exploratory data analysis and predictive modeling. It helps business leaders understand factors affecting passenger satisfaction and enables targeted customer service interventions by identifying potentially unsatisfied customers.
The application provides:
- Data loading and preprocessing capabilities
- Exploratory data visualization
- Machine learning model for predicting passenger satisfaction
- Threshold optimization for focusing customer service resources
- Model evaluation and export functionality
Airlines need to efficiently identify unsatisfied passengers and understand key factors affecting satisfaction. With limited customer service resources, it's crucial to prioritize outreach to potentially unsatisfied customers to address their concerns promptly.
- Interactive Data Loading: Upload your data or use the provided example dataset
- Variable Visualization: Explore distributions with boxplots, histograms, and percentile plots
- Feature Engineering: Create derived features and handle outliers
- Model Training: Build a Gradient Boosting classifier with customizable parameters
- Model Evaluation: View comprehensive performance metrics and visualizations
- Threshold Optimization: Tune decision thresholds to balance precision and recall
- Export Functionality: Save models, thresholds, and processed data for deployment
- Python 3.8 or higher
- Git (for cloning the repository)
git clone https://github.com/GuyenSoto/PBI-Airline.git
cd PBI-Airline
# On Windows
python -m venv venv
venv\Scripts\activate
# On macOS/Linux
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the application
streamlit run airline_2025.py
This will start the Streamlit server and open the application in your default web browser. If it doesn't open automatically, navigate to the URL displayed in your terminal (typically http://localhost:8501).
- Load Data:
  - Upload your CSV file or use the provided example dataset
  - Review basic statistics and configure your target column
- Variable Visualization:
  - Select variables to visualize their distributions
  - Create new features if needed
  - Remove outliers to improve model performance
- Model Training and Evaluation:
  - Configure model parameters
  - Train the model
  - Review performance metrics, confusion matrices, and feature importance
- Threshold Optimization:
  - Adjust decision thresholds to balance precision and recall
  - Understand the trade-offs between different threshold values
- Export Results:
  - Save your trained model, optimized threshold, and processed data
  - Generate a summary report of findings
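To make the threshold step more concrete, here is a minimal sketch of how a decision threshold trades precision against recall. It is not the application's actual code: the helper names `precision_recall_at` and `pick_threshold`, the toy labels and probabilities, and the convention that class 1 means "unsatisfied" are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall for class 1 (assumed here: 'unsatisfied')."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold  # passenger flagged for outreach
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    # Convention chosen for this sketch: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    min_recall: float,
    candidates: Optional[List[float]] = None,
) -> Optional[Tuple[float, float, float]]:
    """Lowest threshold with the best precision among those meeting min_recall.

    Returns (threshold, precision, recall), or None if no candidate qualifies.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best = None
    for t in candidates:
        p, r = precision_recall_at(y_true, y_prob, t)
        if r >= min_recall and (best is None or p > best[1]):
            best = (t, p, r)
    return best


# Toy data (hypothetical): 1 = unsatisfied, probabilities from some classifier.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.65, 0.6, 0.4, 0.35, 0.2, 0.1]

best = pick_threshold(y_true, y_prob, min_recall=0.75)
print(best)
```

Raising `min_recall` catches more unsatisfied passengers at the cost of contacting more satisfied ones; lowering it concentrates limited customer service capacity on the most confident predictions.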
The sample dataset (satisfaction.csv) contains:
- Passenger demographic information
- Flight and trip details
- Ratings for various service aspects
- Verified satisfaction labels
Key features include:
- Flight distance
- Departure and arrival delays
- Service ratings (food, entertainment, etc.)
- Customer demographics (age, gender, etc.)
The application uses a Gradient Boosting Classifier with:
- Customizable hyperparameters (n_estimators, learning_rate, max_depth)
- Preprocessing pipelines for numerical and categorical features
- Feature importance analysis
- Optimized decision thresholds for focusing on unsatisfied passengers
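The components listed above could be wired together roughly as follows. This is a hedged sketch with scikit-learn, not the code in airline_2025.py: the `build_pipeline` helper, the column names, the imputation and scaling choices, and the synthetic data are assumptions for illustration, and the real column names in satisfaction.csv may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols,
                   n_estimators=100, learning_rate=0.1, max_depth=3):
    """Preprocessing plus Gradient Boosting in one scikit-learn Pipeline."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed strategy
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    model = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=max_depth,
        random_state=42,
    )
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "flight_distance": rng.integers(100, 4000, n),
    "departure_delay": rng.exponential(15, n),
    "travel_class": rng.choice(["Eco", "Business"], n),
})
# Toy target: 1 = unsatisfied when the departure delay exceeds 20 minutes.
y = (df["departure_delay"] > 20).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

pipe = build_pipeline(["flight_distance", "departure_delay"], ["travel_class"])
pipe.fit(X_train, y_train)

# Probability of the positive class, the input to threshold tuning.
proba = pipe.predict_proba(X_test)[:, 1]
# One importance value per transformed feature (2 numeric + 2 one-hot columns).
importances = pipe.named_steps["model"].feature_importances_
print(pipe.score(X_test, y_test))
```

Keeping preprocessing inside the pipeline means the exported model applies the same transformations at prediction time as during training.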
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Data based on airline passenger satisfaction surveys
- Built with Streamlit, Pandas, Scikit-learn, and Matplotlib








