# Data Engineering Project on NYC Parking Violations Data (~50M records) Using Docker, Airflow, AWS, Snowflake, dbt, Tableau
Notes on the Project: Data Engineering NYC Parking Violations
This data engineering project involves processing and analyzing NYC parking violations data, which consists of approximately 50 million records. Below is a brief overview of the workflow and the steps involved.
- Introduction
- Data Source
- Data Ingestion and Storage
- Data Warehousing
- Data Transformation
- Data Visualization and Reporting
- Programming Languages and Tools
- Project Setup
- Usage
- Video Link
## Introduction

This project aims to process and analyze NYC parking violations data (~50M records) to derive insights. The workflow covers data ingestion, storage, transformation, and visualization using a range of tools and technologies.
## Data Source

The data originates from NYC OpenData, which provides a large public dataset of parking violations issued in New York City.
## Data Ingestion and Storage

- Apache Airflow orchestrates the data pipeline, automating the extraction of the parking violations data from the NYC OpenData portal (a minimal DAG sketch follows this list).
- The extracted data is then staged as raw files in Amazon S3 (Simple Storage Service).
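A minimal sketch of what the extract-and-stage DAG could look like, assuming Airflow 2.4+ with the Amazon provider installed; the dataset endpoint, bucket name, connection id, and DAG id below are illustrative assumptions, not the project's actual values:

```python
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Assumed Socrata-style endpoint for the NYC OpenData parking violations
# dataset; replace EXAMPLE with the real dataset id.
DATA_URL = "https://data.cityofnewyork.us/resource/EXAMPLE.csv"
S3_BUCKET = "nyc-parking-violations-raw"  # assumed name of the staging bucket


def extract_to_s3(**context):
    """Download one batch of violations and stage it as a raw CSV in S3."""
    response = requests.get(DATA_URL, params={"$limit": 50000}, timeout=300)
    response.raise_for_status()
    S3Hook(aws_conn_id="aws_default").load_string(
        response.text,
        key=f"raw/violations_{context['ds']}.csv",
        bucket_name=S3_BUCKET,
        replace=True,
    )


with DAG(
    dag_id="nyc_parking_violations_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
```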
## Data Warehousing

From Amazon S3, the data is loaded into Snowflake, a cloud-based data warehousing solution. This step shapes the raw data into structured tables suitable for analysis.
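Under the hood, this load is typically a `COPY INTO` from an external stage that points at the S3 bucket. A minimal sketch using the Snowflake Python connector; the stage, table, and credential values are placeholders, not the project's actual configuration:

```python
import snowflake.connector

# Placeholder credentials; in practice these would come from environment
# variables or a secrets manager.
conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT",
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH",
    database="PARKING",
    schema="RAW",
)

# COPY INTO reads the staged CSVs from an external stage that points at the
# S3 bucket; the stage and table names here are assumed.
COPY_SQL = """
COPY INTO RAW.PARKING_VIOLATIONS
FROM @RAW.S3_PARKING_STAGE/raw/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
ON_ERROR = 'CONTINUE'
"""

with conn.cursor() as cur:
    cur.execute(COPY_SQL)
    print(cur.fetchall())  # one row of load results per staged file
conn.close()
```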
## Data Transformation

dbt (data build tool) transforms the data within Snowflake, refining the raw tables into analysis-ready models.
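One way to run the dbt step from Python (for example, inside an Airflow task) is to shell out to the dbt CLI. A minimal sketch, assuming the dbt project and profiles live in a `dbt/` directory, which is an assumption about this repository's layout:

```python
import subprocess


def run_dbt(command: str) -> None:
    """Run a dbt CLI command against the Snowflake profile; fail loudly on error."""
    result = subprocess.run(
        ["dbt", command, "--project-dir", "dbt", "--profiles-dir", "dbt"],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    result.check_returncode()  # raises CalledProcessError on a non-zero exit


run_dbt("run")   # build the transformed models in Snowflake
run_dbt("test")  # validate them with dbt tests
```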
## Data Visualization and Reporting

Once the transformed data is available in Snowflake, Tableau is used to create visualizations, dashboards, and reports, enabling stakeholders to derive insights from the parking violations data.
## Programming Languages and Tools

- Python and SQL are the primary languages used throughout the project: Python for scripting and automation, and SQL for querying and managing the data within Snowflake (a small query sketch follows this list).
- The entire workflow runs inside Docker containers, ensuring a consistent environment for every component of the project.
- The project runs on Linux, providing a robust and scalable platform for the data pipeline.
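For example, a quick sanity check on the transformed data might look like this in Python; the `FCT_PARKING_VIOLATIONS` model and its columns are assumed names for the dbt output, not taken from the actual project:

```python
import snowflake.connector

# Placeholder credentials, as in the load sketch above.
conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT", user="YOUR_USER", password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH", database="PARKING", schema="ANALYTICS",
)
with conn.cursor() as cur:
    # Top ten violation codes by count in the (assumed) transformed fact table.
    cur.execute(
        """
        SELECT violation_code, COUNT(*) AS n_violations
        FROM FCT_PARKING_VIOLATIONS
        GROUP BY violation_code
        ORDER BY n_violations DESC
        LIMIT 10
        """
    )
    for code, n in cur.fetchall():
        print(code, n)
conn.close()
```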
## Project Setup

To set up the project, follow these steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/BadreeshShetty/Data-Engineering-ETL-Airflow-DBT-Parking.git
   cd Data-Engineering-ETL-Airflow-DBT-Parking
   ```

2. Build and start the Docker containers:

   ```bash
   docker-compose up --build -d
   ```

3. Access the Airflow web UI to monitor the data pipeline.
## Usage

- Trigger the Airflow DAG to start the data ingestion process (see the REST API sketch after this list).
- Monitor the data extraction and loading into Amazon S3.
- Check the data transformation process in Snowflake using dbt.
- Access Tableau to create visualizations and reports based on the transformed data.
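As an alternative to clicking through the web UI, a DAG can be triggered programmatically via Airflow's stable REST API. A minimal sketch, assuming the default docker-compose credentials and the hypothetical DAG id from the ingestion sketch above:

```python
import requests

# Trigger a new DAG run through Airflow's stable REST API (Airflow 2.x).
response = requests.post(
    "http://localhost:8080/api/v1/dags/nyc_parking_violations_ingest/dagRuns",
    auth=("airflow", "airflow"),  # default docker-compose credentials; change in production
    json={"conf": {}},
    timeout=30,
)
response.raise_for_status()
print(response.json()["dag_run_id"])
```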
## Video Link

https://youtu.be/PNe9POTPx4I
