This project analyzes historical airfare data across different US cities using SQL queries to uncover insights about pricing patterns, routes, and carrier competition.
The analysis uses a comprehensive airfare dataset containing flight information from 1996 to 2018, including:
- Flight routes between US cities
- Fare information and passenger counts
- Carrier data (largest market share and lowest-fare carriers)
- Distance between cities
- Temporal information (year and quarter)
Key dataset dimensions:
- Total records: 90,021
- Time span: 23 years (1996-2018)
- Distinct cities: 163
- Quarterly data points
-
Price Range
- Average fare: $193.19
- Minimum fare: $56.42
- Maximum fare: $540.45
- Average passenger count per route: 803
-
Route Distances
- Shortest route: Los Angeles to San Diego (109 nautical miles)
- Longest route: Miami to Seattle (2,724 nautical miles)
- Average distance: 1,053 nautical miles
- Median distance: 926 nautical miles
-
Most Popular Routes (by total passenger volume)
- Los Angeles ↔ San Francisco: 1,738,198 passengers (avg. fare: $109.96)
- Miami ↔ New York City: 1,225,918 passengers (avg. fare: $155.11)
- Los Angeles ↔ New York City: 893,802 passengers (avg. fare: $325.60)
-
Carrier Analysis
- In 66.5% of routes, the carrier with the largest market share wasn't the lowest-fare carrier
- Average fare difference when carriers differ: $49.46
- Maximum fare difference observed: $451.83 (Orlando to San Antonio route)
-
Seasonal Patterns
- Q1 shows lowest average passenger numbers (750) despite similar flight frequencies
- Average fares remain relatively stable across quarters ($190-$196)
- Q4 shows slightly lower fare variability (standard deviation: 58.77)
- Docker
- Docker Compose
- PostgreSQL client (optional, for direct DB access)
- Clone this repository:
git clone https://github.com/cacaprog/airfare-analysis-sql
cd airfare-analysis-sql- Create a
docker-compose.ymlfile:
version: '3.8'
services:
db:
image: postgres:14
environment:
POSTGRES_USER: airfare_user
POSTGRES_PASSWORD: your_password
POSTGRES_DB: airfare_db
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
volumes:
postgres_data:- Start the PostgreSQL container:
docker-compose up -d- Import the data:
# If using psql client
psql -h localhost -U airfare_user -d airfare_db -f schema.sql
psql -h localhost -U airfare_user -d airfare_db -f data.sqlairfare-analysis/
├── README.md
├── docker-compose.yml
├── schema.sql
├── data.sql
├── queries/
│ ├── basic_stats.sql
│ ├── carrier_analysis.sql
│ ├── route_analysis.sql
│ └── seasonal_analysis.sql
└── results/
└── visualizations/
All SQL queries used in this analysis are available in the queries/ directory. Each file contains documented queries for different aspects of the analysis:
basic_stats.sql: Basic statistical analysiscarrier_analysis.sql: Carrier competition analysisroute_analysis.sql: Popular routes and distance analysisseasonal_analysis.sql: Seasonal patterns and trends
- Add time series analysis to track fare trends over the 23-year period
- Incorporate geographical visualization of routes
- Analyze the impact of market competition on fares
- Study the correlation between distance and fare pricing
- Investigate carrier market share changes over time
Feel free to fork this project and submit pull requests with improvements or additional analyses.
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset source: airfare_data.csv
- Analysis inspiration: Codecademy project