Predictive Analysis for UK Road Accident Data (2020)

Overview

This project analyzes UK road accident data for 2020. The primary goal is to provide actionable insights for policy-making aimed at reducing both the number and the severity of accidents. Using statistical and machine learning techniques, the project identifies patterns, determines key contributing factors, and builds predictive models that classify accidents by severity.

Dataset

The analysis utilizes four key tables extracted from the database:

Accident: Contains accident-related information.
Vehicle: Details about the vehicles involved.
Casualty: Information about the casualties.
LSOA: Geospatial data about accident locations.

The data was preprocessed, cleaned, and integrated into a unified format for analysis.
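
As a rough illustration of the integration step, the sketch below joins small in-memory stand-ins for the Accident, Vehicle, and Casualty tables with pandas. The column names (`accident_index`, `accident_severity`, `vehicle_type`, `casualty_class`) and the choice of join key are assumptions made for illustration; the project's actual schema and merge logic may differ.

```python
import pandas as pd

# Tiny stand-in tables; column names are assumed, not taken from the project schema.
accident = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3"],
    "accident_severity": [3, 2, 1],
})
vehicle = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2"],
    "vehicle_type": ["car", "motorcycle", "car"],
})
casualty = pd.DataFrame({
    "accident_index": ["A1", "A2", "A2"],
    "casualty_class": ["driver", "pedestrian", "passenger"],
})

# Aggregate the one-to-many tables to one row per accident before joining,
# so that the merged frame does not multiply accident rows.
vehicle_counts = (
    vehicle.groupby("accident_index").size().rename("n_vehicles").reset_index()
)
casualty_counts = (
    casualty.groupby("accident_index").size().rename("n_casualties").reset_index()
)

# Left joins keep every accident, even those with no vehicle or casualty rows.
unified = (
    accident
    .merge(vehicle_counts, on="accident_index", how="left")
    .merge(casualty_counts, on="accident_index", how="left")
    .fillna({"n_vehicles": 0, "n_casualties": 0})
    .astype({"n_vehicles": int, "n_casualties": int})
)

print(unified)
```

Aggregating before the join is one possible design choice: it keeps one row per accident, which suits severity classification at the accident level. A row-per-vehicle or row-per-casualty layout would need a plain merge instead.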

Features

Outlier Analysis: Detected and treated outliers using Grubbs' test.
Accident Analysis by Time and Day:
- Most accidents occurred between 8 AM and 8 PM, with a spike at 5 PM.
- Fridays recorded the highest number of accidents, Sundays the fewest.
Motorcycle and Pedestrian Analysis:
- Motorbike accidents peaked during rush hours, with engine capacity influencing accident patterns.
- Pedestrian accidents were most common between 3 PM and 5 PM on weekdays.
Clustering:
- K-Means clustering identified accident hotspots in Kingston upon Hull.
- Visualized clusters on a map using the folium package.
Association Rule Mining: Leveraged the Apriori algorithm to uncover significant patterns contributing to accident severity.
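
The outlier step listed above can be sketched as follows. This is a minimal two-sided Grubbs' test, assuming roughly normal data and checking only the single most extreme value per call; the function name `grubbs_outlier`, the significance level, and the sample values are illustrative and not taken from the project notebook.

```python
import numpy as np
from scipy import stats


def grubbs_outlier(values, alpha=0.05):
    """Return the index of the most extreme value if the two-sided
    Grubbs' test flags it as an outlier at level alpha, else None."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        return None  # the test is undefined for fewer than three points
    mean, sd = x.mean(), x.std(ddof=1)
    if sd == 0:
        return None  # all values identical, nothing to flag
    idx = int(np.argmax(np.abs(x - mean)))
    g = abs(x[idx] - mean) / sd
    # Critical value derived from the t distribution with n - 2 degrees of freedom.
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return idx if g > g_crit else None


speeds = [30, 31, 29, 30, 32, 28, 30, 95]  # made-up values with one obvious outlier
print(grubbs_outlier(speeds))
```

Because the test examines one value at a time, treating several outliers would mean removing the flagged value and repeating the call until it returns `None`.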

Predictive Modeling

Models Implemented:

Decision Tree Classifier:
- Accuracy: 74.03%
- Strengths: Balanced precision and recall.
- Weaknesses: Risk of overfitting.
K-Nearest Neighbors Classifier:
- Accuracy: 66.05%
- Strengths: Slightly better at identifying fatal cases.
- Weaknesses: Lower precision.
Random Forest Classifier:
- Accuracy: 83.37%
- Strengths: High accuracy and recall for fatalities.
- Weaknesses: More complex and harder to interpret than a single decision tree.
Gaussian Naive Bayes Classifier:
- Accuracy: 54.47%
- Strengths: High recall for fatalities.
- Weaknesses: Low precision.
Stacked Ensemble Model:
- Combined the predictions of the four models above using a Logistic Regression meta-learner.
- Improved overall stability and robustness.
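
One way to build a stack like the one described above is scikit-learn's `StackingClassifier`. The sketch below is an assumption about how such a stack could be wired up, not the project's actual code: it runs on synthetic, imbalanced three-class data standing in for severity labels, and every hyperparameter shown is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced three-class data; not the accident dataset.
X, y = make_classification(
    n_samples=1500, n_features=12, n_informative=6, n_classes=3,
    weights=[0.7, 0.25, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_models = [
    ("tree", DecisionTreeClassifier(max_depth=8, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

# The meta-learner is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# Per-class precision and recall say more than accuracy when one class is rare.
print(classification_report(y_test, y_pred, zero_division=0))
```

Reporting per-class precision and recall alongside accuracy is worthwhile here because a rare class can be missed entirely while overall accuracy still looks high.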

Installation

Clone the repository:
```
git clone https://github.com/your-username/uk-road-accident-analysis.git
cd uk-road-accident-analysis
```

Install dependencies:
```
pip install -r requirements.txt
```
Set up the database and extract data:
- Execute SQL queries to extract data from the provided schema.
- Load the data into the data/ directory.

Usage

Data Preprocessing:
```
python preprocess.py
```
Run Analysis:
```
python analyze.py
```
Train Models:
```
python train_models.py
```
Visualize Results:
```
python visualize_clusters.py
```

Results

Accident Trends:
- Most accidents occurred between 8 AM and 8 PM, with a spike at 5 PM.
- Fridays had the most accidents, Sundays the fewest.
Predictive Modeling:
- Random Forest Classifier emerged as the best individual model.
- Stacked ensemble showed improved stability and predictive performance.
Clustering:
- Identified three key accident clusters in Kingston upon Hull.
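
The clustering result above could be reproduced along the following lines. This is a minimal sketch using scikit-learn's `KMeans` on synthetic coordinates scattered around three made-up centres near Kingston upon Hull; the points, the centres, and the choice of `n_clusters=3` are assumptions for illustration, not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic (latitude, longitude) points around three made-up centres;
# these are not real accident locations.
centres = np.array([[53.745, -0.335], [53.770, -0.370], [53.760, -0.300]])
points = np.vstack([c + rng.normal(scale=0.004, size=(50, 2)) for c in centres])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Summarize each cluster by its centre and the number of points assigned to it.
for label, (lat, lon) in enumerate(kmeans.cluster_centers_):
    size = int((kmeans.labels_ == label).sum())
    print(f"cluster {label}: centre=({lat:.3f}, {lon:.3f}), accidents={size}")
```

K-Means on raw latitude and longitude treats a degree of longitude as equal to a degree of latitude, which is only an approximation at this latitude; projecting the coordinates to metres first would be a reasonable refinement. The resulting centres and labels could then be drawn on a folium map, for example with `folium.Map` and `folium.CircleMarker`.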

Applications

Policy Recommendations:
- Increase traffic regulation during peak hours.
- Implement awareness campaigns for motorbike safety.
- Improve pedestrian infrastructure near schools.
Proactive Safety Measures:
- Use predictive models to forecast potential accident hotspots.

Future Work

Enhance clustering analysis with additional geospatial features.
Explore more advanced ensemble techniques.
Incorporate real-time data for dynamic modeling.

License

This project is licensed under the MIT License. See the LICENSE file for details.
