This project showcases an end-to-end Azure Data Engineering pipeline built to process Spotify streaming datasets.
The primary focus is on duplicate data detection, data quality enforcement, and scalable analytics delivery using Azure-native services.
The solution follows the Medallion Architecture (Bronze, Silver, Gold) to ensure data reliability and performance.
## Medallion Architecture

### Bronze Layer (Raw)
- Raw Spotify data ingestion (CSV/JSON); a minimal read sketch follows this list
- Stored in Azure Data Lake Storage Gen2
- Orchestrated via Azure Data Factory
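
For reference, a minimal PySpark sketch of the Bronze read. The `abfss://` path, container name, and CSV layout are assumptions for illustration, not actual project settings:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spotify-bronze-ingest").getOrCreate()

# Hypothetical ADLS Gen2 location; container and storage account are placeholders.
raw_path = "abfss://bronze@<storage-account>.dfs.core.windows.net/spotify/streams/"

# Read the raw CSV drops as-is; schema enforcement happens in the Silver layer,
# so inference is acceptable at this stage.
bronze_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(raw_path)
)
```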
### Silver Layer (Cleansed)
- Data cleansing and schema standardization
- Duplicate record removal using PySpark (see the sketch after this list)
- Delta Lake used for ACID compliance
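
A sketch of the deduplication and Delta write, continuing from the Bronze read above. The natural key of `user_id`, `track_id`, `played_at` and the load timestamp `ingest_ts` are hypothetical column names, not confirmed fields of the dataset:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep exactly one row per logical event, preferring the most recently ingested copy.
dedup_window = (
    Window.partitionBy("user_id", "track_id", "played_at")
          .orderBy(F.col("ingest_ts").desc())
)

silver_df = (
    bronze_df
    .withColumn("rn", F.row_number().over(dedup_window))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Writing in Delta format provides the ACID guarantees on the Silver zone.
(
    silver_df.write.format("delta")
    .mode("overwrite")
    .save("abfss://silver@<storage-account>.dfs.core.windows.net/spotify/streams/")
)
```

Where no tie-breaking timestamp exists, `bronze_df.dropDuplicates([...])` over the key columns achieves the same effect with less ceremony.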
### Gold Layer (Curated)
- Aggregated, analytics-ready datasets
- Served via Azure Synapse Analytics
## Tech Stack

| Layer | Technology |
|---|---|
| Ingestion | Azure Data Factory |
| Storage | Azure Data Lake Storage Gen2 |
| Processing | Azure Databricks (PySpark) |
| Warehouse | Azure Synapse Analytics |
| Format | Delta Lake |
## Pipeline Steps

1. Ingest raw Spotify data into ADLS Gen2
2. Validate schema and metadata (a validation sketch follows this list)
3. Detect and eliminate duplicate records
4. Store cleansed data in Delta format
5. Aggregate data for analytics
6. Query curated datasets using Synapse
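
Step 2 could be implemented along these lines. The expected schema below is a hypothetical Spotify export layout, not the project's actual contract:

```python
from pyspark.sql import DataFrame
from pyspark.sql.types import (
    IntegerType, StringType, StructField, StructType, TimestampType,
)

# Hypothetical expected schema; the real column set depends on the Spotify export.
EXPECTED_SCHEMA = StructType([
    StructField("user_id", StringType(), nullable=False),
    StructField("track_id", StringType(), nullable=False),
    StructField("played_at", TimestampType(), nullable=False),
    StructField("ms_played", IntegerType(), nullable=True),
])

def validate_schema(df: DataFrame, expected: StructType) -> None:
    """Fail the run early if expected columns are missing from the raw frame."""
    missing = {f.name for f in expected.fields} - set(df.columns)
    if missing:
        raise ValueError(f"Schema validation failed, missing columns: {missing}")

validate_schema(bronze_df, EXPECTED_SCHEMA)
```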
## Key Features

- ✔ End-to-end automated pipeline
- ✔ Duplicate data detection & removal
- ✔ Idempotent and re-runnable workflows (see the MERGE sketch after this list)
- ✔ Scalable medallion architecture
- ✔ Enterprise-grade data quality handling
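
Idempotency typically comes from loading with a Delta `MERGE` rather than a blind append: re-running the same batch updates matched rows instead of inserting duplicates. A sketch, reusing the hypothetical key columns from the deduplication example:

```python
from delta.tables import DeltaTable

silver_path = "abfss://silver@<storage-account>.dfs.core.windows.net/spotify/streams/"
target = DeltaTable.forPath(spark, silver_path)

# Upsert the new batch; a second run of the same batch leaves the table unchanged.
(
    target.alias("t")
    .merge(
        silver_df.alias("s"),
        "t.user_id = s.user_id AND t.track_id = s.track_id AND t.played_at = s.played_at",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```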
## Analytics Use Cases

- Top streamed tracks and artists (an example aggregate follows this list)
- Popularity trend analysis
- User listening behavior insights
- BI-ready datasets for reporting tools
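
As an illustration of the first use case, a Gold-layer aggregate of top streamed tracks might look like this; `track_name` and `artist_name` are assumed column names:

```python
from pyspark.sql import functions as F

# Rank tracks by total stream count and keep the top 100 for reporting.
top_tracks = (
    silver_df
    .groupBy("track_id", "track_name", "artist_name")
    .agg(F.count("*").alias("stream_count"))
    .orderBy(F.col("stream_count").desc())
    .limit(100)
)

# Persist to the Gold zone, where Synapse can query the curated Delta table.
(
    top_tracks.write.format("delta")
    .mode("overwrite")
    .save("abfss://gold@<storage-account>.dfs.core.windows.net/spotify/top_tracks/")
)
```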