- Taxi trip duration in NYC varies significantly by time, location, and traffic
- Inaccurate estimates impact passenger ETAs, fleet utilization, and pricing decisions
- Rule-based or manual estimation does not scale to city-level operations
- Trained a regression model on historical NYC Taxi trip data
- Built an API-first inference service using FastAPI
- Enforced schema validation to block invalid inputs
- Reused the same preprocessing pipeline for training and inference to avoid train–serve skew
- Deployed as a Dockerized service for reproducibility and consistency
- Target: Trip duration (minutes)
- Evaluated using regression metrics (MAE, RMSE) on held-out data
- Prioritized stable, reliable predictions on unseen data over overfitting to benchmark scores
- Enables real-time ETA estimation for passengers
- Supports driver allocation and fleet planning
- Helps detect abnormal or inefficient trips
- Architecture reusable for other prediction services (ETA, demand, churn, fraud)
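
The shared preprocessing idea from the list above (one pipeline for training and inference) can be sketched as a single feature function that both the training script and the API import. This is a minimal illustration, not the project's actual pipeline: the function names, the input field names (`pickup_datetime`, `pickup_latitude`, and so on), and the chosen features are all assumptions.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_features(trip):
    """Turn one raw trip record (hypothetical field names) into model features.

    Importing this one function from both the training code and the serving
    code keeps the feature logic identical in the two places.
    """
    pickup = datetime.fromisoformat(trip["pickup_datetime"])
    return {
        "distance_km": haversine_km(
            trip["pickup_latitude"], trip["pickup_longitude"],
            trip["dropoff_latitude"], trip["dropoff_longitude"],
        ),
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday == 0
        "passenger_count": int(trip["passenger_count"]),
    }


# Example record with made-up coordinates (roughly Midtown to JFK).
example_trip = {
    "pickup_datetime": "2016-03-14T08:30:00",
    "pickup_latitude": 40.7580,
    "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.6413,
    "dropoff_longitude": -73.7781,
    "passenger_count": 2,
}
features = build_features(example_trip)
```

Because training and serving would call the same `build_features`, a change to a feature definition applies to both at once, which is what prevents skew.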
This project turns real-world NYC taxi data into a reliable, production-ready machine-learning prediction service.
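
For reference, the two evaluation metrics named above (MAE and RMSE) can be computed as in the sketch below. It uses plain Python and made-up durations purely for illustration. The project itself may rely on a library implementation, and these numbers are not its results.

```python
import math


def _check_lengths(y_true, y_pred):
    """Both metrics need non-empty sequences of equal length."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in minutes."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more than MAE does."""
    _check_lengths(y_true, y_pred)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Illustrative trip durations in minutes (not real data).
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 35]

print(f"MAE:  {mae(actual, predicted):.2f} min")   # MAE:  3.00 min
print(f"RMSE: {rmse(actual, predicted):.2f} min")  # RMSE: 3.24 min
```

RMSE is never smaller than MAE on the same data. A wide gap between the two suggests a few trips with very large errors rather than uniformly mediocre predictions.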
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start API
uvicorn app.main:app --reload

# Then open http://localhost:8000 in a browser
```