Food Delivery Time Estimation

Overview

The Food Delivery Time Estimation project utilizes machine learning models to predict the delivery time of food orders based on various factors such as geographical distance, delivery driver attributes, and vehicle type. This project leverages Python and libraries like Pandas, Scikit-learn, and Streamlit to provide a user-friendly web interface for predictions and data analysis.

Features

Interactive Web Application: Built with Streamlit to provide a seamless user interface.
Multiple ML Algorithms: Supports models such as Random Forest, Gradient Boosting, Linear Regression, and more.
Data Visualization: Includes detailed exploratory data analysis with visualizations using Matplotlib and Seaborn.
Distance Calculation: Implements the Haversine formula to compute the great-circle distance between the restaurant and the delivery location.
Customizable Inputs: Allows users to input data dynamically for real-time predictions.

Technologies Used

Programming Language: Python
Data Processing: Pandas, NumPy
Machine Learning: Scikit-learn
Visualization: Matplotlib, Seaborn
Web Interface: Streamlit
Big Data Processing: PySpark (optional for larger datasets)

Dataset

The project uses a food delivery dataset sourced from Kaggle, which contains details such as:

Restaurant and delivery location coordinates (latitude and longitude).
Delivery driver attributes (e.g., age, vehicle type).
Delivery time taken (in minutes).

Installation

Clone the repository:
```
git clone https://github.com/your-username/food-delivery-time-estimation.git
```
Navigate to the project directory:
```
cd food-delivery-time-estimation
```
Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
streamlit run food-delivery-time.py
```

Usage

Open the application in your browser.
Navigate between the following tabs:
- Application: Input details to estimate delivery time.
- Data Analysis: Explore and visualize the dataset.
- README: View project documentation directly within the app.
Choose a machine learning algorithm and provide necessary inputs like restaurant and customer coordinates, vehicle type, and driver age.
View estimated delivery time and additional model accuracy metrics.

Project Workflow

1. Data Processing

Clean and preprocess the dataset.
Compute the distance between restaurant and delivery location using the Haversine formula.
Convert categorical variables (e.g., vehicle type) to numerical using one-hot encoding.
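
As a rough illustration of the encoding step, the sketch below one-hot encodes a vehicle-type field using only the standard library. The field names `driver_age` and `vehicle_type`, the sample values, and the helper `one_hot` are assumptions made for this example and are not taken from the dataset; in the project itself the same result would more likely come from a pandas call such as `pandas.get_dummies`.

```python
def one_hot(rows, column):
    """Return copies of rows with `column` replaced by 0/1 indicator fields."""
    # Sort the categories so the indicator fields have a stable order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample rows; the field names are illustrative only.
orders = [
    {"driver_age": 29, "vehicle_type": "motorcycle"},
    {"driver_age": 35, "vehicle_type": "scooter"},
    {"driver_age": 41, "vehicle_type": "motorcycle"},
]
encoded_orders = one_hot(orders, "vehicle_type")
print(encoded_orders[0])
```

Each category becomes its own 0/1 field, so models that expect numeric input can use the vehicle type without implying an order between categories.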

2. Model Training

Train multiple machine learning models using Scikit-learn.
Evaluate models based on metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared.
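
To make the four metrics concrete, here is a minimal sketch that computes them by hand. The function name `regression_metrics` and the sample delivery times are hypothetical; the project most likely relies on the equivalents in `sklearn.metrics` (`mean_squared_error`, `mean_absolute_error`, `r2_score`) instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e ** 2 for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        # R-squared is undefined when every target value is identical.
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical delivery times in minutes (not taken from the dataset).
actual = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 41.0, 47.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are in the same unit as the target (minutes), which makes them easier to interpret than MSE; R-squared is unitless and closer to 1 is better.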

3. Real-Time Predictions

Accept user inputs via the Streamlit interface.
Perform predictions using the selected machine learning model.
Display results along with optional performance metrics.

Exploratory Data Analysis (EDA)

The project includes detailed EDA with:

Distribution analysis of key variables.
Correlation heatmaps.
Summary statistics for numeric data.

Correlational Analysis

We calculated the correlation between each numerical column and the `Time_taken(min)` column to identify the variables that most affect delivery time.
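
The following sketch shows one way such a correlation could be computed, assuming the Pearson coefficient is meant. The column names `distance` and `driver_age` and all values are made up for illustration; with pandas, `DataFrame.corr()` would give the same numbers for a whole DataFrame at once.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numeric columns and target (not taken from the dataset).
columns = {
    "distance": [1.0, 2.0, 3.0, 4.0],
    "driver_age": [40, 25, 35, 30],
}
time_taken = [15.0, 20.0, 25.0, 30.0]

# Correlate every numeric column with the delivery time.
correlations = {name: pearson(values, time_taken) for name, values in columns.items()}
print(correlations)
```

Values near +1 or -1 indicate a strong linear relationship with delivery time, while values near 0 suggest the column carries little linear signal on its own.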

Geospatial Visualisation

A scatter plot shows the geographic locations of restaurants and delivery points by latitude and longitude. Using the Haversine function described in the next section, a new `distance` column was added to the DataFrame for further analysis.

Haversine Method

```
import math as mt


def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # The radius of the Earth is 6371 km.

    # Convert latitude and longitude from degrees to radians.
    lat1 = mt.radians(lat1)
    lon1 = mt.radians(lon1)
    lat2 = mt.radians(lat2)
    lon2 = mt.radians(lon2)

    # Compute the difference between the two positions in each coordinate.
    Δlat = lat2 - lat1
    Δlon = lon2 - lon1

    # Haversine formula
    a = mt.sin(Δlat/2)**2 + mt.cos(lat1) * mt.cos(lat2) * mt.sin(Δlon/2)**2
    c = 2 * mt.atan2(mt.sqrt(a), mt.sqrt(1-a))
    d = R * c  # Great-circle distance in km.

    return d
```
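
A quick way to sanity-check the formula is to compare it against a case with a known answer: two points on the same meridian, one degree of latitude apart, should be about 6371 · π / 180 ≈ 111.19 km apart. The sketch below restates the function with the standard `math` module so that it runs on its own; the name `haversine_km` and the coordinates are arbitrary choices for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# One degree of latitude along a meridian spans about 6371 * pi / 180 km.
one_degree = haversine_km(10.0, 20.0, 11.0, 20.0)

# Identical points must be zero kilometres apart.
same_point = haversine_km(10.0, 20.0, 10.0, 20.0)

print(round(one_degree, 2), same_point)
```

Note that this is a straight-line distance over the Earth's surface, so it will usually underestimate the route a driver actually travels.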

Temporal Analysis

We plotted kernel density estimation (KDE) curves to analyse how delivery times are distributed for each type of vehicle.
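
For readers unfamiliar with KDE, the sketch below shows the idea behind such a plot: each observation contributes a small Gaussian bump, and the bumps are averaged into a smooth density curve per vehicle type. The bandwidth of 2.0, the vehicle names, and the delivery times are assumptions for illustration; the project's plots were presumably produced with a library routine such as Seaborn's `kdeplot`, which also selects the bandwidth automatically.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian kernel centred on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical delivery times in minutes for two vehicle types.
times_by_vehicle = {
    "motorcycle": [22.0, 25.0, 27.0, 30.0],
    "scooter": [18.0, 20.0, 21.0, 24.0],
}
densities = {
    vehicle: gaussian_kde(times, bandwidth=2.0)
    for vehicle, times in times_by_vehicle.items()
}

# Evaluate each curve on a grid from 10 to 40 minutes in 0.5-minute steps.
grid = [10.0 + 0.5 * i for i in range(61)]
curves = {vehicle: [f(x) for x in grid] for vehicle, f in densities.items()}
```

Plotting `grid` against each entry of `curves` gives one smooth curve per vehicle type, which makes it easy to see whether one type tends to deliver faster than another.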

Age Distribution

A histogram visualises the age distribution of delivery drivers.

Results

The models were evaluated based on:

Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
R-squared (R²)

Random Forest Regression provided the best results with the highest R² score and lowest error metrics.

Streamlit Web Interface

The web interface offers a range of machine learning models: Random Forest Regression, Gradient Boosting Regression, Linear Regression, Decision Tree Regression, Extra Trees Forest, and K-Neighbors Regression.

The web interface also presents the Exploratory Data Analysis, so users can follow the analysis step by step, as we carried it out, and build a better understanding of the data.

Include screenshots here to showcase the Streamlit application, demonstrating features such as input forms for predictions, model selection options, and the prediction results and data visualization tabs.

Future Enhancements

Incorporate additional features like weather and traffic data.
Support larger datasets using distributed computing with PySpark.
Optimize models for better performance.

Contributors

Vasileios Katotomichelakis (Π2020132)
Charalampos Makrylakis (Π2019214)

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Special thanks to Kaggle for the dataset and open-source libraries used in this project.
