duonghieu7104/US_Accidents_DWH

🚗 US Traffic Accident Data Warehouse (2016–2023)

📌 Project Overview

This project builds a data warehouse for managing and analyzing traffic accident data in the United States from 2016 to 2023. Using public datasets from Kaggle, the team implemented a full ETL pipeline with SSIS, built analytical cubes with SSAS, and visualized the results in Power BI dashboards. The system is intended to help stakeholders such as government agencies and researchers identify risk factors and propose effective safety measures.


🎯 Objectives

  • Design a Kimball-style dimensional data warehouse.
  • Enable multidimensional queries to analyze accident causes, time, location, and other related attributes.
  • Deliver interactive dashboards that provide actionable insights for traffic safety improvement.

🧠 Technologies Used

Component               | Tools / Technologies
ETL                     | SQL Server Integration Services (SSIS)
Data Warehouse          | SQL Server 2022
OLAP & Cubes            | SQL Server Analysis Services (SSAS)
Data Visualization      | Power BI
Development Environment | Visual Studio 2022
Dataset Source          | Kaggle: US Accidents (7.7M records)

🗂️ System Architecture

  • Fact Table: FactAccident

  • Dimension Tables:

    • DimDate
    • DimLocation
    • DimDriver
    • DimVehicle
    • DimWeather
    • DimTwilight
    • DimRoadFeature
    • DimSpeedLimit

📌 Grain: Each fact record represents a single traffic accident.
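
As a rough illustration of this star schema, the sketch below creates the fact table and two of the eight dimensions, then rolls accidents up by state. It is not the project's actual DDL: the column names (DateKey, LocationKey, Severity, and so on) are assumptions, the sample rows are made up, and SQLite stands in for SQL Server 2022 only so the example runs on its own.

```python
import sqlite3

# Hypothetical, trimmed-down star schema. Column names are assumptions,
# and only two of the eight dimensions are shown.
SCHEMA = """
CREATE TABLE DimDate (
    DateKey   INTEGER PRIMARY KEY,  -- yyyymmdd, e.g. 20230113
    Year      INTEGER NOT NULL,
    Month     INTEGER NOT NULL,
    DayOfWeek TEXT    NOT NULL
);
CREATE TABLE DimLocation (
    LocationKey INTEGER PRIMARY KEY,
    City        TEXT NOT NULL,
    State       TEXT NOT NULL
);
CREATE TABLE FactAccident (
    AccidentKey INTEGER PRIMARY KEY,  -- one row per accident (the grain)
    DateKey     INTEGER NOT NULL REFERENCES DimDate(DateKey),
    LocationKey INTEGER NOT NULL REFERENCES DimLocation(LocationKey),
    Severity    INTEGER NOT NULL
);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create the sketch schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO DimDate VALUES (?, ?, ?, ?)",
        [(20230113, 2023, 1, "Friday"), (20230114, 2023, 1, "Saturday")],
    )
    conn.executemany(
        "INSERT INTO DimLocation VALUES (?, ?, ?)",
        [(1, "Miami", "FL"), (2, "Houston", "TX")],
    )
    conn.executemany(
        "INSERT INTO FactAccident VALUES (?, ?, ?, ?)",
        [(1, 20230113, 1, 2), (2, 20230113, 1, 3), (3, 20230114, 2, 2)],
    )
    return conn


def accidents_by_state(conn: sqlite3.Connection) -> list:
    """Roll the fact table up by one dimension attribute."""
    return conn.execute(
        """
        SELECT l.State, COUNT(*) AS Accidents
        FROM FactAccident AS f
        JOIN DimLocation AS l ON l.LocationKey = f.LocationKey
        GROUP BY l.State
        ORDER BY Accidents DESC, l.State
        """
    ).fetchall()
```

Because every fact row is exactly one accident, a plain COUNT(*) over the fact table gives accident counts at any dimension level without double counting.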


🔄 Implementation Process

  1. Data collection and preprocessing (from Kaggle)
  2. Dimensional model design (Star Schema)
  3. ETL pipeline built using SSIS
  4. OLAP Cube creation using SSAS
  5. Analysis via SSAS and Excel PivotTables
  6. Interactive dashboards built in Power BI
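
The ETL step (3) is implemented in SSIS packages, which are not reproduced here. The Python sketch below only illustrates the kind of transformation such a pipeline typically performs: deriving a date key from a timestamp and replacing natural keys with surrogate keys. The field names Start_Time, City, State, and Severity are assumed to match the source CSV, timestamps are assumed to have no fractional seconds, and DimensionLookup is a hypothetical helper.

```python
from datetime import datetime


def date_key(timestamp: str) -> int:
    """Turn a 'YYYY-MM-DD HH:MM:SS' timestamp into a yyyymmdd integer key."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return dt.year * 10000 + dt.month * 100 + dt.day


class DimensionLookup:
    """Hypothetical helper: map natural keys to surrogate keys.

    A natural key seen for the first time gets the next integer key;
    repeated natural keys reuse the key already assigned.
    """

    def __init__(self) -> None:
        self._keys = {}

    def key_for(self, natural_key) -> int:
        if natural_key not in self._keys:
            self._keys[natural_key] = len(self._keys) + 1
        return self._keys[natural_key]


def transform(rows):
    """Convert raw accident rows into fact rows at one-row-per-accident grain."""
    locations = DimensionLookup()
    facts = []
    for row in rows:
        facts.append(
            {
                "DateKey": date_key(row["Start_Time"]),
                "LocationKey": locations.key_for((row["City"], row["State"])),
                "Severity": int(row["Severity"]),
            }
        )
    return facts


# Made-up sample rows, for illustration only.
sample_rows = [
    {"Start_Time": "2023-01-13 16:05:00", "City": "Miami", "State": "FL", "Severity": "2"},
    {"Start_Time": "2023-01-13 17:30:00", "City": "Miami", "State": "FL", "Severity": "3"},
    {"Start_Time": "2023-01-14 09:00:00", "City": "Houston", "State": "TX", "Severity": "2"},
]
sample_facts = transform(sample_rows)
```

In this sketch the two Miami rows share one LocationKey, which is the behavior a dimension lookup is meant to provide.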

📊 Power BI Dashboards

1. Accidents by Location

  • Miami, Los Angeles, and Houston have the highest number of accidents.
  • States like California and Florida lead in both accident count and vehicle damage.

2. Environmental & Driver Factors

  • Drivers aged 26–45 are involved in the most accidents, likely because of higher mobility.
  • Most accidents occur under clear or partly cloudy weather, with dry road surfaces being the most common.
  • Daytime has more accidents, but nighttime accidents are more severe.

3. Time-based Trends

  • Accidents peak during the afternoon rush hours (3 PM – 6 PM) and on Fridays.
  • December has the highest accident count; January and February the lowest.
  • More severe weather in the mid-year months is associated with greater accident impact.
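
The dashboards compute these trends in Power BI. As a rough sketch of the underlying rollup, the Python function below counts accidents by hour of day and by weekday from a list of start timestamps. The timestamp format and the sample values are assumptions made for illustration, not data from the project.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_profile(start_times):
    """Count accidents per hour of day and per weekday.

    Expects timestamps formatted as 'YYYY-MM-DD HH:MM:SS' (an assumption).
    """
    by_hour = Counter()
    by_weekday = Counter()
    for ts in start_times:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_hour[dt.hour] += 1
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        by_weekday[WEEKDAYS[dt.weekday()]] += 1
    return by_hour, by_weekday


# Made-up sample timestamps, for illustration only.
sample_times = [
    "2023-01-13 16:05:00",
    "2023-01-13 17:30:00",
    "2023-01-14 09:00:00",
]
hours, weekdays = time_profile(sample_times)
```

Calling `weekdays.most_common(1)` on the result returns the busiest weekday together with its count.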

📂 Dataset References


📚 Files and Artifacts

  • Vehicle.csv — Vehicle and driver information
  • US_Accidents.csv — Traffic accident details
  • .dtsx — SSIS packages for ETL
  • .bim — SSAS cube definition
  • .pbix — Power BI dashboards

🚀 Future Improvements

  • Integrate machine learning models to predict accident risk.
  • Incorporate real-time data sources such as weather APIs and traffic cameras.
  • Schedule and automate ETL jobs for continuous data refresh.
