Airline Passenger Satisfaction Analysis

Project Overview

This Streamlit application analyzes airline passenger satisfaction data through exploratory data analysis and predictive modeling. It helps business leaders understand factors affecting passenger satisfaction and enables targeted customer service interventions by identifying potentially unsatisfied customers.

The application provides:

Data loading and preprocessing capabilities
Exploratory data visualization
Machine learning model for predicting passenger satisfaction
Threshold optimization for focusing customer service resources
Model evaluation and export functionality

Key Visualizations

Flight Distance Distribution

Feature Importance

Precision-Recall Curve

Business Problem

Airlines need to efficiently identify unsatisfied passengers and understand key factors affecting satisfaction. With limited customer service resources, it's crucial to prioritize outreach to potentially unsatisfied customers to address their concerns promptly.

Features

Interactive Data Loading: Upload your data or use the provided example dataset
Variable Visualization: Explore distributions with boxplots, histograms, and percentile plots
Feature Engineering: Create derived features and handle outliers
Model Training: Build a Gradient Boosting classifier with customizable parameters
Model Evaluation: View comprehensive performance metrics and visualizations
Threshold Optimization: Tune decision thresholds to balance precision and recall
Export Functionality: Save models, thresholds, and processed data for deployment

Installation

Prerequisites

Python 3.8 or higher
Git (for cloning the repository)

Clone the Repository

git clone https://github.com/GuyenSoto/PBI-Airline.git
cd PBI-Airline

Set up a Virtual Environment (recommended)

# On Windows
python -m venv venv
venv\Scripts\activate

# On macOS/Linux
python3 -m venv venv
source venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Usage

Running the Application

streamlit run airline_2025.py

This will start the Streamlit server and open the application in your default web browser. If it doesn't open automatically, navigate to the URL displayed in your terminal (typically http://localhost:8501).

Application Workflow

1. Load Data:
   - Upload your CSV file or use the provided example dataset
   - Review basic statistics and configure your target column
2. Variable Visualization:
   - Select variables to visualize their distributions
   - Create new features if needed
   - Remove outliers to improve model performance
3. Model Training and Evaluation:
   - Configure model parameters
   - Train the model
   - Review performance metrics, confusion matrices, and feature importance
4. Threshold Optimization:
   - Adjust decision thresholds to balance precision and recall
   - Understand the trade-offs between different threshold values
5. Export Results:
   - Save your trained model, optimized threshold, and processed data
   - Generate a summary report of findings
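
The outlier-removal step in the workflow above is not described in detail here. As a rough illustration only, one common approach is the interquartile-range (IQR) rule, sketched below with the standard library. The function names, the `k=1.5` multiplier, and the `departure_delay` key are assumptions made for this example, not the app's actual code or the column names in satisfaction.csv.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) cut-offs: k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies inside the IQR bounds."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]


# Made-up delays in minutes; the 600-minute delay is the obvious outlier.
rows = [{"departure_delay": d} for d in [0, 5, 10, 12, 15, 20, 600]]
cleaned = drop_outliers(rows, "departure_delay")
print(len(rows), "->", len(cleaned))
```

Delay columns are typically heavily right-skewed, so a rule like this can discard many legitimate long delays. It is worth comparing the distributions before and after removal rather than applying it blindly.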

Dataset

The sample dataset (satisfaction.csv) contains:

Passenger demographic information
Flight and trip details
Ratings for various service aspects
Verified satisfaction labels

Key features include:

Flight distance
Departure and arrival delays
Service ratings (food, entertainment, etc.)
Customer demographics (age, gender, etc.)

Delay Distributions

Departure Delay

Arrival Delay

Model Details

The application uses a Gradient Boosting Classifier with:

Customizable hyperparameters (n_estimators, learning_rate, max_depth)
Preprocessing pipelines for numerical and categorical features
Feature importance analysis
Optimized decision thresholds for focusing on unsatisfied passengers
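
To make the pieces listed above concrete, here is a minimal sketch of how a preprocessing pipeline and a Gradient Boosting classifier can be combined in scikit-learn. It is not the code from airline_2025.py. The `build_model` helper, the imputation choices, and the column names in the toy frame are assumptions for illustration only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_model(numeric_cols, categorical_cols,
                n_estimators=100, learning_rate=0.1, max_depth=3):
    """Preprocessing and classifier chained in one Pipeline (hypothetical helper)."""
    preprocess = ColumnTransformer([
        # Fill missing numeric values with the column median.
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        # One-hot encode categories; categories unseen at fit time become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    classifier = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=max_depth,
        random_state=0,
    )
    return Pipeline([("preprocess", preprocess), ("clf", classifier)])


# Tiny made-up frame; these column names are NOT taken from satisfaction.csv.
df = pd.DataFrame({
    "flight_distance": [300, 2500, 800, 1200, 4000, 150, 950, 3100],
    "departure_delay": [0, 45, 5, None, 120, 0, 10, 60],
    "travel_class": ["Eco", "Business", "Eco", "Eco",
                     "Business", "Eco", "Business", "Eco"],
    "unsatisfied": [0, 1, 0, 0, 1, 0, 0, 1],
})
X = df.drop(columns="unsatisfied")
y = df["unsatisfied"]

model = build_model(["flight_distance", "departure_delay"], ["travel_class"],
                    n_estimators=20)
model.fit(X, y)

# Probability of the positive class (here: unsatisfied) for each passenger.
proba_unsatisfied = model.predict_proba(X)[:, 1]
# Feature importances from the fitted classifier, one per transformed column.
importances = model.named_steps["clf"].feature_importances_
```

Keeping preprocessing inside the pipeline means the same transformations are applied at training and prediction time, and the whole object can be saved as a single artifact.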

Model Evaluation

Confusion Matrix (Default Threshold)

Confusion Matrix (Optimized Threshold)

Threshold Analysis

Precision vs Threshold

Recall vs Threshold
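
The trade-off behind the threshold analysis can be illustrated with a short, dependency-free sketch. It treats "unsatisfied" as the positive class (label 1) and picks the highest threshold that still reaches a target recall. The function names, the selection rule, and the sample scores are assumptions for this example. The app may choose its threshold differently.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, y_true) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, y_true) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, y_true) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_recall):
    """Highest threshold whose recall is at least min_recall, or None."""
    # Recall never increases as the threshold rises, so scan from high to low
    # and stop at the first threshold that satisfies the target.
    for t in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(y_true, scores, t)
        if recall >= min_recall:
            return t
    return None


# Made-up labels (1 = unsatisfied) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1]

for target in (0.75, 1.0):
    t = pick_threshold(y_true, scores, target)
    p, r = precision_recall_at(y_true, scores, t)
    print(f"target recall {target}: threshold={t}, precision={p:.2f}, recall={r:.2f}")
```

In this toy data, catching every unsatisfied passenger requires lowering the threshold to 0.3, which also flags three satisfied passengers. That is the cost a customer service team pays for higher recall.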

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add some amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Data based on airline passenger satisfaction surveys
Built with Streamlit, Pandas, Scikit-learn, and Matplotlib
