Flight Analytics & Recommendation System

Overview

A comprehensive flight analytics and route recommendation system built for the MariaDB Hackathon 2025. This project demonstrates advanced database features including vector similarity search, high-performance analytics, and AI-powered recommendations using the OpenFlights dataset.

Live Demo

✈️ [View Live Demo] ✈️

Key Features

📊 Real-time Analytics Dashboard
- Busiest airports by route count
- Top airlines by route coverage
- Most popular flight routes
- Airport distribution by country
- Overall statistics (7,698 airports, 6,162 airlines, 67,240 routes, 237 countries)

🤖 AI-Powered Route Recommendations
- Vector similarity search using TF-IDF embeddings
- Direct route discovery
- Alternative route suggestions
- Cosine similarity-based ranking
- Search by IATA airport codes

🎨 Modern UI/UX
- Glass-morphism design
- Gradient color schemes
- Smooth animations and transitions
- Fully responsive layout
- Dark theme optimized for readability

Technology Stack

Backend

FastAPI - Modern Python web framework
MongoDB - NoSQL database with vector storage
scikit-learn - TF-IDF vectorization and cosine similarity
Motor - Async MongoDB driver
Python 3.11

Frontend

React 19 - UI library
Tailwind CSS - Utility-first CSS framework
shadcn/ui - Beautiful UI components
Lucide React - Modern icon library
Axios - HTTP client

Data

OpenFlights Dataset - 7,698 airports, 6,162 airlines, 67,240 routes
Source: OpenFlights GitHub

Architecture

┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│  React Frontend │───API──▶│ FastAPI Backend │───DB───▶│     MongoDB     │
│    Port 3000    │         │    Port 8001    │         │   Port 27017    │
└─────────────────┘         └─────────────────┘         └─────────────────┘
                                     │
                                     v
                           ┌────────────────────┐
                           │  OpenFlights Data  │
                           │  (CSV Ingestion)   │
                           └────────────────────┘

Project Structure

flight-analytics/
├── backend/
│   ├── server.py                 # FastAPI application with all endpoints
│   ├── data_ingestion.py         # Data loading and embedding generation
│   ├── requirements.txt          # Python dependencies
│   ├── .env                      # Environment variables
│   └── data/                     # OpenFlights CSV files
│       ├── airports.dat
│       ├── airlines.dat
│       └── routes.dat
├── frontend/
│   ├── src/
│   │   ├── pages/
│   │   │   ├── Dashboard.jsx      # Main landing page
│   │   │   ├── Analytics.jsx      # Analytics dashboard
│   │   │   └── Recommendations.jsx # Route search & recommendations
│   │   ├── App.js                # Main app component with routing
│   │   ├── App.css               # Global styles
│   │   └── components/ui/        # shadcn/ui components
│   ├── package.json
│   ├── tailwind.config.js
│   └── .env
└── README.md

Quick Start (Local Development)

Prerequisites

Node.js 16+ and Yarn
Python 3.9+
MongoDB (local or cloud instance)

1. Clone the Repository

git clone <your-repo-url>
cd flight-analytics

2. Backend Setup

cd backend

# Create virtual environment (optional)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env and set:
# MONGO_URL=mongodb://localhost:27017
# DB_NAME=flight_analytics

# OpenFlights data is already included in the data/ folder.
# To re-download it:
cd data
curl -sL "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat" -o airports.dat
curl -sL "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat" -o airlines.dat
curl -sL "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat" -o routes.dat
cd ..

# Ingest data into MongoDB
python3 data_ingestion.py

# Start backend server
uvicorn server:app --host 0.0.0.0 --port 8001 --reload

Backend will be running at http://localhost:8001

API docs available at http://localhost:8001/docs
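
The MONGO_URL and DB_NAME values set in backend/.env could be read at startup along these lines. This is a minimal sketch, not the project's actual server.py: the load_settings helper is hypothetical, and the defaults simply repeat the local values from the setup steps.

```python
import os


def load_settings(env=None):
    """Return MongoDB settings, falling back to the local development defaults."""
    # Hypothetical helper: accepts a mapping so it can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    return {
        "mongo_url": env.get("MONGO_URL", "mongodb://localhost:27017"),
        "db_name": env.get("DB_NAME", "flight_analytics"),
    }


settings = load_settings({})
print(settings["mongo_url"], settings["db_name"])
```

Passing a mapping instead of reading os.environ directly keeps the function easy to test and makes the fallback values explicit.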

3. Frontend Setup

cd frontend

# Install dependencies
yarn install

# Configure environment
cp .env.example .env
# Edit .env and set:
# REACT_APP_BACKEND_URL=http://localhost:8001

# Start frontend
yarn start

Frontend will be running at http://localhost:3000

API Endpoints

Analytics

GET /api/analytics/stats - Overall statistics
GET /api/analytics/busiest-airports?limit=10 - Busiest airports by route count
GET /api/analytics/top-airlines?limit=10 - Top airlines by route coverage
GET /api/analytics/popular-routes?limit=10 - Most popular routes
GET /api/analytics/airports-by-country?limit=20 - Airports by country

Search

GET /api/search/airports?q=london&limit=10 - Search airports by name/city/IATA
GET /api/search/routes/{airport_id}?limit=20 - Get routes for specific airport

Recommendations

GET /api/recommendations/similar-routes?source=JFK&destination=LAX&top_k=10 - AI-powered similar routes
GET /api/recommendations/direct-routes?source=JFK&destination=LAX - Direct routes between airports

Example API Calls

# Get overall stats
curl http://localhost:8001/api/analytics/stats

# Get top 5 busiest airports
curl "http://localhost:8001/api/analytics/busiest-airports?limit=5"

# Search for airports in London
curl "http://localhost:8001/api/search/airports?q=london&limit=5"

# Get route recommendations for JFK to LAX
curl "http://localhost:8001/api/recommendations/similar-routes?source=JFK&destination=LAX&top_k=10"
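
The same requests can be assembled from Python. The sketch below only builds the URLs, using the endpoint paths and parameters listed above; build_url is a hypothetical helper, and BASE_URL assumes the local backend from Quick Start.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8001"  # assumed local backend address


def build_url(path, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


stats_url = build_url("/api/analytics/stats")
similar_routes_url = build_url(
    "/api/recommendations/similar-routes",
    source="JFK", destination="LAX", top_k=10,
)
print(stats_url)
print(similar_routes_url)
```

With the backend running, either URL can be passed to urllib.request.urlopen (or any HTTP client) to fetch the JSON response.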

How It Works

1. Data Ingestion

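import pickle
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Load OpenFlights CSVs (the .dat files are comma-separated with no header row;
# the column names below are assigned here for readability)
route_cols = ['airline', 'airline_id', 'source', 'source_id',
              'dest', 'dest_id', 'codeshare', 'stops', 'equipment']
airports = pd.read_csv('airports.dat', header=None)
airlines = pd.read_csv('airlines.dat', header=None)
routes = pd.read_csv('routes.dat', header=None, names=route_cols)

# Build the text that gets embedded, e.g. "JFK-LAX"
routes['route_text'] = routes['source'] + '-' + routes['dest']

# Compute TF-IDF embeddings for routes
vectorizer = TfidfVectorizer(analyzer='char_wb', ngram_range=(2, 5))
embeddings = vectorizer.fit_transform(routes['route_text'])

# Store in MongoDB (runs inside an async function; db is a Motor database handle)
await db.routes.insert_many([
    {'source': row.source, 'dest': row.dest, 'embedding': pickle.dumps(embeddings[i])}
    for i, row in enumerate(routes.itertuples(index=False))
])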
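
For readers who want to see the row-to-document step without pandas, here is a minimal standard-library sketch. It assumes the OpenFlights routes.dat column order (airline, airline ID, source, source ID, destination, destination ID, codeshare, stops, equipment) and the \N marker for unknown values. The parse_routes helper, the decision to skip rows with unknown airport IDs, and the sample rows are illustrative assumptions, not the project's ingestion code.

```python
import csv
import io

# Column order of OpenFlights routes.dat (the file has no header row).
ROUTE_COLUMNS = ["airline", "airline_id", "source", "source_id",
                 "dest", "dest_id", "codeshare", "stops", "equipment"]


def parse_routes(text):
    """Yield one route document per CSV line, skipping rows with unknown airport IDs."""
    for row in csv.reader(io.StringIO(text)):
        record = dict(zip(ROUTE_COLUMNS, row))
        if "\\N" in (record["source_id"], record["dest_id"]):
            continue  # assumption: drop rows whose airport ID is the \N marker
        yield {
            "source": record["source"],
            "source_id": int(record["source_id"]),
            "dest": record["dest"],
            "dest_id": int(record["dest_id"]),
            "airline": record["airline"],
            "route_text": f"{record['source']}-{record['dest']}",
        }


# Two sample rows in the routes.dat layout; the second has an unknown source ID.
sample = "2B,410,AER,2965,KZN,2990,,0,CR2\nXX,\\N,AAA,\\N,BBB,100,,0,737\n"
docs = list(parse_routes(sample))
print(docs)
```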

2. Similarity Search

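from sklearn.metrics.pairwise import cosine_similarity

# Transform query
query_vector = vectorizer.transform(["JFK-LAX"])

# Compute cosine similarity; the result has shape (1, n_routes), so take row 0
similarities = cosine_similarity(query_vector, all_embeddings)[0]

# Get top K results, highest similarity first
top_indices = similarities.argsort()[-k:][::-1]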
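
Selecting the top K indices does not require a full sort. As a hedged alternative sketch (not the project's code), heapq.nlargest from the standard library picks the K best scores directly; top_k_indices and the sample scores are made up for illustration.

```python
import heapq


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


scores = [0.12, 0.95, 0.40, 0.71, 0.03]  # illustrative similarity scores
best = top_k_indices(scores, 3)
print(best)  # [1, 3, 2]
```

The returned indices can then be used to look up the matching route documents in the same order as the stored embeddings.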

3. Analytics Queries

Using MongoDB aggregation pipelines:

# Busiest airports
pipeline = [
    {"$group": {"_id": "$dest_id", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10}
]
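
To show what this pipeline computes, here is a plain-Python equivalent of the $group, $sort, and $limit stages over an in-memory list. It is only an illustration: busiest_airports is a hypothetical helper, and the three sample routes reuse airport IDs from the schema examples.

```python
from collections import Counter


def busiest_airports(routes, limit=10):
    """Count routes per destination airport ID and return the top entries."""
    counts = Counter(route["dest_id"] for route in routes)
    # most_common sorts by count descending, like {"$sort": {"count": -1}}
    return [{"_id": dest_id, "count": n} for dest_id, n in counts.most_common(limit)]


routes = [
    {"source_id": 3797, "dest_id": 3484},
    {"source_id": 3682, "dest_id": 3484},
    {"source_id": 3484, "dest_id": 3797},
]
result = busiest_airports(routes, limit=2)
print(result)
```

The output has the same shape as the aggregation result: one document per destination with its route count.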

Database Schema

Collections

airports

{
  "id": 3682,
  "name": "Hartsfield Jackson Atlanta International Airport",
  "city": "Atlanta",
  "country": "United States",
  "iata": "ATL",
  "icao": "KATL",
  "latitude": 33.6367,
  "longitude": -84.4281
}

airlines

{
  "id": 324,
  "name": "American Airlines",
  "iata": "AA",
  "icao": "AAL",
  "country": "United States",
  "active": true
}

routes

{
  "id": 1,
  "source": "JFK",
  "source_id": 3797,
  "dest": "LAX",
  "dest_id": 3484,
  "airline": "AA",
  "airline_id": 324,
  "route_text": "JFK-LAX",
  "embedding": "<binary_vector_data>"
}

Machine Learning Implementation

TF-IDF Vectorization

Input: Route text (e.g., "JFK-LAX")
Method: Character-level n-grams (2-5)
Output: 128-dimensional sparse vector
Storage: Serialized as binary in MongoDB
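
To make the character n-gram input concrete, here is a rough pure-Python approximation of a word-boundary character analyzer: the text is lowercased, each whitespace-separated word is padded with a space on both sides, and every substring of the requested lengths is emitted. The char_wb_ngrams helper is hypothetical and only illustrates the idea; scikit-learn's own analyzer may differ in details such as the handling of very short words.

```python
def char_wb_ngrams(text, min_n=2, max_n=5):
    """Character n-grams taken inside each space-padded, lowercased word."""
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            if n > len(padded):
                break  # the word is too short for this n-gram length
            ngrams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return ngrams


grams = char_wb_ngrams("JFK-LAX", min_n=2, max_n=3)
print(grams)
```

Because "JFK-LAX" contains no whitespace, it is treated as a single word, so n-grams such as "k-l" span both airport codes.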

Cosine Similarity

Formula: similarity = (A · B) / (||A|| * ||B||)
Range: 0 to 1 for non-negative TF-IDF vectors (0 = no shared n-grams, 1 = same direction)
Use Case: Finding similar routes based on shared character patterns in the route text
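
The formula can be checked by hand with a small pure-Python sketch. For simplicity it uses raw character-bigram counts rather than TF-IDF weights, so the numbers are illustrative only; cosine_similarity and bigram_counts here are hypothetical helpers, not the scikit-learn function.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty vector as having no similarity
    return dot / (norm_a * norm_b)


def bigram_counts(text):
    """Count overlapping two-character substrings of the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


same = cosine_similarity(bigram_counts("JFK-LAX"), bigram_counts("JFK-LAX"))
shared = cosine_similarity(bigram_counts("JFK-LAX"), bigram_counts("JFK-SFO"))
disjoint = cosine_similarity(bigram_counts("JFK-LAX"), bigram_counts("ORD-MIA"))
print(same, shared, disjoint)
```

"JFK-LAX" and "JFK-SFO" share three of their six bigrams, so their similarity is about 0.5, while routes with no bigrams in common score 0.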

Performance Optimization

Indexing: Created indexes on frequently queried fields (IATA codes, IDs)
Batch Processing: Routes inserted in batches of 1,000 during ingestion
Async Operations: All database operations are asynchronous
Caching: Frontend caches API responses
Lazy Loading: Paginated results for large datasets
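
Batch processing of this kind can be sketched with a small generator. The batched helper below is an assumption about how the 1,000-row batches might be formed, not the project's ingestion code.

```python
from itertools import islice


def batched(items, size=1000):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in batched(range(2500), size=1000)]
print(sizes)  # [1000, 1000, 500]
```

During ingestion, each batch of route documents could be passed to a single insert_many call, which keeps memory use and request sizes bounded.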

Deployment

Deploy to Emergent (Recommended)

This project is already deployed on Netlify.

🚀 Live Demo 🚀

Deploy to Other Platforms

Docker Deployment (create docker-compose.yml):

version: '3.8'
services:
  mongodb:
    image: mongo:7
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db

  backend:
    build: ./backend
    ports:
      - "8001:8001"
    environment:
      - MONGO_URL=mongodb://mongodb:27017
      - DB_NAME=flight_analytics
    depends_on:
      - mongodb

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - REACT_APP_BACKEND_URL=http://localhost:8001
    depends_on:
      - backend

volumes:
  mongo_data:

Run with:

docker-compose up -d

Testing

Backend API Tests

# Test stats endpoint
curl http://localhost:8001/api/analytics/stats

# Test recommendations
curl "http://localhost:8001/api/recommendations/similar-routes?source=JFK&destination=LAX&top_k=5"

Frontend Tests

Navigate to http://localhost:3000
Verify dashboard loads with statistics
Click "View Analytics" and check all tabs
Click "Find Routes" and search for JFK → LAX
Verify recommendations appear with similarity scores

Features Demonstrated

✅ Database Operations

Complex aggregation pipelines
Vector storage (embeddings as binary)
Multi-collection joins
Indexing strategies

✅ Machine Learning

TF-IDF vectorization
Cosine similarity ranking
Feature engineering

✅ API Design

RESTful endpoints
Query parameters
Error handling
CORS configuration

✅ Frontend Development

Modern React with hooks
Responsive design
Smooth animations
Component architecture

Future Enhancements

MariaDB ColumnStore integration for faster analytics
Galera Cluster for high availability
Real-time flight status integration
Interactive route maps (Leaflet/Mapbox)
User authentication and saved searches
Export analytics to CSV/PDF
GraphQL API support
Mobile app (React Native)
Enhanced ML models (BERT embeddings, neural networks)

Troubleshooting

Backend won't start

# Check MongoDB is running
mongosh --eval "db.serverStatus()"

# Check Python dependencies
pip install -r requirements.txt

# Check logs
tail -f /var/log/supervisor/backend.err.log

Frontend won't start

# Clear cache and reinstall
rm -rf node_modules yarn.lock
yarn install

# Check backend URL in .env
cat frontend/.env

Data ingestion fails

# Ensure data files exist
ls -la backend/data/

# Re-download data
cd backend/data
curl -sL "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat" -o airports.dat

Contributing

This is a hackathon project, but contributions are welcome!

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License - feel free to use this project for learning or hackathons!

Acknowledgments

OpenFlights - For providing comprehensive aviation data
MariaDB Foundation - For hosting the hackathon
FastAPI - For the excellent Python web framework
React & Tailwind CSS - For the modern frontend stack
shadcn/ui - For beautiful UI components
Railway - For the deployment platform

Contact

Built for MariaDB Hackathon 2025

Made with ❤️ and lots of ☕ by your name

🚀 [View Live Demo] 🚀
