The Food Delivery Time Estimation project utilizes machine learning models to predict the delivery time of food orders based on various factors such as geographical distance, delivery driver attributes, and vehicle type. This project leverages Python and libraries like Pandas, Scikit-learn, and Streamlit to provide a user-friendly web interface for predictions and data analysis.
- Interactive Web Application: Built with Streamlit to provide a seamless user interface.
- Multiple ML Algorithms: Supports Random Forest, Gradient Boosting, Linear Regression, Decision Tree, Extra Trees, and K-Neighbors regression models.
- Data Visualization: Includes detailed exploratory data analysis with visualizations using Matplotlib and Seaborn.
- Distance Calculation: Implements the Haversine formula to compute the great-circle distance between the restaurant and the delivery location.
- Customizable Inputs: Allows users to input data dynamically for real-time predictions.
- Programming Language: Python
- Data Processing: Pandas, NumPy
- Machine Learning: Scikit-learn
- Visualization: Matplotlib, Seaborn
- Web Interface: Streamlit
- Big Data Processing: PySpark (optional for larger datasets)
The project uses a food delivery dataset sourced from Kaggle, which contains details such as:
- Restaurant and delivery location coordinates (latitude and longitude).
- Delivery driver attributes (e.g., age, vehicle type).
- Delivery time taken (in minutes).
- Clone the repository: `git clone https://github.com/your-username/food-delivery-time-estimation.git`
- Navigate to the project directory: `cd food-delivery-time-estimation`
- Install dependencies: `pip install -r requirements.txt`
- Run the application: `streamlit run app.py`
- Open the application in your browser.
- Navigate between the following tabs:
  - Application: Input details to estimate delivery time.
  - Data Analysis: Explore and visualize the dataset.
  - README: View project documentation directly within the app.
- Choose a machine learning algorithm and provide the necessary inputs, such as restaurant and customer coordinates, vehicle type, and driver age.
- View the estimated delivery time and additional model accuracy metrics.
- Clean and preprocess the dataset.
- Compute the distance between restaurant and delivery location using the Haversine formula.
- Convert categorical variables (e.g., vehicle type) to numerical using one-hot encoding.
- Train multiple machine learning models using Scikit-learn.
- Evaluate models based on metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared.
- Accept user inputs via the Streamlit interface.
- Perform predictions using the selected machine learning model.
- Display results along with optional performance metrics.
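
The encoding and training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `Delivery_person_Age` and `Type_of_vehicle` are assumptions, the rows are made-up values rather than records from the dataset, and the hyperparameters are arbitrary.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Tiny made-up frame for illustration only; these are NOT real dataset rows.
# 'Delivery_person_Age' and 'Type_of_vehicle' are assumed column names.
df = pd.DataFrame({
    "Delivery_person_Age": [25, 31, 22, 38, 29, 34, 27, 40],
    "Type_of_vehicle": ["motorcycle", "scooter", "bicycle", "motorcycle",
                        "scooter", "motorcycle", "bicycle", "scooter"],
    "distance": [3.2, 7.5, 1.1, 12.0, 5.4, 9.8, 2.0, 6.3],  # km
    "Time_taken(min)": [18, 32, 12, 45, 26, 38, 15, 29],
})

# One-hot encode the categorical vehicle type into indicator columns.
encoded = pd.get_dummies(df, columns=["Type_of_vehicle"])

# Separate the features from the target column.
X = encoded.drop(columns=["Time_taken(min)"])
y = encoded["Time_taken(min)"]

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit one of the supported model types and predict on the held-out rows.
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Swapping in another regressor, such as `GradientBoostingRegressor` or `LinearRegression`, only changes the line that constructs `model`.
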
The project includes detailed EDA with:
- Distribution analysis of key variables.
- Correlation heatmaps.
- Summary statistics for numeric data.
As part of the EDA, we calculated the correlation between each numerical column and the `Time_taken(min)` column to identify the variables that most affect delivery time.
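
A correlation check of this kind might look like the sketch below. The frame is a made-up example, and every column name other than `Time_taken(min)` and `distance` is an assumption.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the dataset.
df = pd.DataFrame({
    "distance": [1.0, 2.0, 3.0, 4.0],
    "Delivery_person_Age": [30, 25, 35, 28],           # assumed column name
    "Type_of_vehicle": ["scooter", "bicycle",
                        "scooter", "motorcycle"],      # assumed column name
    "Time_taken(min)": [10, 20, 30, 40],
})

# Keep only numeric columns, then correlate each one with the target.
numeric = df.select_dtypes(include="number")
correlations = (
    numeric.corr()["Time_taken(min)"]
    .drop("Time_taken(min)")          # the target correlates trivially with itself
    .sort_values(ascending=False)
)
print(correlations)
```
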

A scatter plot was created to show the geographic locations of restaurants and delivery points by latitude and longitude. Using the Haversine function defined below, a new column, `distance`, was added to the DataFrame for further analysis.
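
For reference, the standard Haversine formula can be written as follows, where $\varphi_1, \varphi_2$ are the two latitudes and $\lambda_1, \lambda_2$ the two longitudes in radians, $\Delta\varphi = \varphi_2 - \varphi_1$, $\Delta\lambda = \lambda_2 - \lambda_1$, and $R$ is the Earth's mean radius (about 6371 km):

$$
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right), \qquad
c = 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right), \qquad
d = R\,c
$$

The result $d$ is the great-circle distance in kilometres, which is a lower bound on the distance actually travelled by road.
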

    import math as mt

    def haversine(lat1, lon1, lat2, lon2):
        R = 6371  # Earth's radius is 6371 km.
        # Convert latitude and longitude from degrees to radians.
        lat1 = mt.radians(lat1)
        lon1 = mt.radians(lon1)
        lat2 = mt.radians(lat2)
        lon2 = mt.radians(lon2)
        # Compute the difference between the two positions on each axis.
        Δlat = lat2 - lat1
        Δlon = lon2 - lon1
        # Haversine formula
        a = mt.sin(Δlat/2)**2 + mt.cos(lat1) * mt.cos(lat2) * mt.sin(Δlon/2)**2
        c = 2 * mt.atan2(mt.sqrt(a), mt.sqrt(1-a))
        d = R * c
        return d  # great-circle distance in km

To analyse the distribution of delivery times by vehicle type, we plotted a kernel density estimation (KDE) plot.

The age distribution of delivery drivers was visualised using a histogram.

The models were evaluated based on:
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R-squared (R²)
Random Forest Regression provided the best results with the highest R² score and lowest error metrics.
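
The four metrics above can be computed directly from true and predicted values. The sketch below is written in plain Python so that the formulas are visible; the helper name `regression_metrics` and the sample numbers are hypothetical, and in practice Scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` provide the same quantities.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e ** 2 for e in errors) / n      # mean of squared errors
    rmse = math.sqrt(mse)                      # same units as the target (minutes)
    mae = sum(abs(e) for e in errors) / n      # mean of absolute errors

    # R-squared: 1 minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Made-up delivery times (minutes) for illustration only.
metrics = regression_metrics([20, 30, 40, 50], [22, 28, 41, 47])
print(metrics)
```

Lower MSE, RMSE, and MAE indicate smaller prediction errors, while an R² closer to 1 means the model explains more of the variance in delivery time.
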
The application offers a range of machine learning models: Random Forest Regression, Gradient Boosting Regression, Linear Regression, Decision Tree Regression, Extra Trees Forest, and K-Neighbors Regression.
Furthermore, within the web interface the user can view the exploratory data analysis and work through the data step by step, as we did, which supports a better overall understanding of data analysis.
Include screenshots here to showcase the Streamlit application, demonstrating features such as:
- Input forms for predictions.
- Model selection options.
- Prediction results and data visualization tabs.
- Incorporate additional features like weather and traffic data.
- Support larger datasets using distributed computing with PySpark.
- Optimize models for better performance.
- Vasileios Katotomichelakis (Π2020132)
- Charalampos Makrylakis (Π2019214)
This project is licensed under the MIT License. See the LICENSE file for details.
Special thanks to Kaggle for the dataset and open-source libraries used in this project.






