This project aims to build a comprehensive data warehouse system to manage and analyze traffic accident data in the United States from 2016 to 2023. Leveraging public datasets from Kaggle, the team implemented a full ETL pipeline using SSIS, built analytical cubes with SSAS, and visualized insights through Power BI dashboards. The system helps stakeholders such as government agencies and researchers identify risk factors and propose effective safety measures.
- Design a Kimball-style dimensional data warehouse.
- Enable multidimensional queries to analyze accident causes, time, location, and other related attributes.
- Deliver interactive dashboards that provide actionable insights for traffic safety improvement.
| Component | Tools / Technologies |
|---|---|
| ETL | SQL Server Integration Services (SSIS) |
| Data Warehouse | SQL Server 2022 |
| OLAP & Cubes | SQL Server Analysis Services (SSAS) |
| Data Visualization | Power BI |
| Development Environment | Visual Studio 2022 |
| Dataset Source | Kaggle: US Accidents (7.7M records) |
-
Fact Table:
FactAccident -
Dimension Tables:
DimDateDimLocationDimDriverDimVehicleDimWeatherDimTwilightDimRoadFeatureDimSpeedLimit
📌 Grain: Each fact record represents a single traffic accident.
- Data collection and preprocessing (from Kaggle)
- Dimensional model design (Star Schema)
- ETL pipeline built using SSIS
- OLAP Cube creation using SSAS
- Analysis via SSAS and Excel PivotTables
- Interactive dashboards built in Power BI
- Miami, Los Angeles, and Houston have the highest number of accidents.
- States like California and Florida lead in both accident count and vehicle damage.
- Age groups 26–45 are most accident-prone due to higher mobility.
- Most accidents occur under clear or partly cloudy weather, with dry road surfaces being the most common.
- Daytime has more accidents, but nighttime accidents are more severe.
- Accidents peak during rush hours (3 PM – 6 PM) and Fridays.
- December has the highest accident count; January and February the lowest.
- Weather severity in mid-year months increases accident impact.
Vehicle.csv— Vehicle and driver informationUS_Accidents.csv— Traffic accident details.dtsx— SSIS packages for ETL.bim— SSAS cube definition.pbix— Power BI dashboards
- Integrate machine learning models to predict accident risk.
- Incorporate real-time data sources such as weather APIs and traffic cameras.
- Schedule and automate ETL jobs for continuous data refresh.