lakhanqurban/data-engineering

Data Engineering Zoomcamp Cohort 2026

This repo contains the coursework and final project for the Data Engineering Zoomcamp by DataTalks.Club (Cohort 2026).

About the Course

The Data Engineering Zoomcamp is a free course covering the fundamentals of data engineering — from containerization and infrastructure-as-code to batch and stream processing. It is taught by Alexey Grigorev and the DataTalks.Club team.

Technologies & Tools

| Area | Tools |
| --- | --- |
| Containerization | Docker, Docker Compose |
| Infrastructure as Code | Terraform (GCP & AWS) |
| Workflow Orchestration | Kestra |
| Data Ingestion | dlt (data load tool) |
| Data Warehouse | BigQuery, DuckDB |
| Cloud Storage | AWS S3 (Parquet, partitioned) |
| Analytics Engineering | dbt |
| Data Platforms | Bruin |
| Batch Processing | Apache Spark |
| Stream Processing | Apache Kafka, Apache Flink |
| Dashboard | Streamlit, Plotly |
| Language | Python, SQL |

Homework

All homework solutions for each module are in the cohorts/2026/ directory:

| Module | Topic | Homework |
| --- | --- | --- |
| 1 | Docker & Terraform | |
| 2 | Workflow Orchestration (Kestra) | |
| 3 | Data Warehouse (BigQuery) | |
| 4 | Analytics Engineering (dbt) | |
| 5 | Data Platforms (Bruin) | |
| 6 | Batch Processing (Spark) | |
| 7 | Stream Processing (Kafka & Flink) | |
| Workshop 1 | Data Ingestion with dlt | |

Final Project — 🌍 Global Earthquake Analytics Dashboard

An end-to-end data engineering pipeline analyzing ~168,000 global earthquake events (2020–2025) from the USGS Earthquake Hazards Program, with cloud infrastructure on AWS and an interactive Streamlit dashboard.

Full Project README →.

Pipeline Architecture

USGS API ──► CSV (Data Lake + S3) ──► DuckDB + S3 Parquet (Warehouse) ──► dbt ──► Streamlit
                                              ▲
                                        Terraform (AWS IaC)
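
As a rough illustration of the first hop in the diagram (USGS API to CSV), the sketch below builds one request URL per calendar quarter against the public USGS FDSN event endpoint. It is not the project's actual ingestion code: the function names `quarterly_chunks` and `chunk_url` are hypothetical, and the 2020-01-01 to 2025-12-31 range is an assumption based on the "2020–2025" figure above. Chunking is a common way to stay under the per-request event cap that the USGS service enforces.

```python
from datetime import date
from urllib.parse import urlencode

# Public USGS FDSN event endpoint (format, starttime, endtime are documented parameters).
USGS_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def quarterly_chunks(start_year, end_year):
    """Yield (start, end) dates for every calendar quarter in the year range."""
    for year in range(start_year, end_year + 1):
        for first_month in (1, 4, 7, 10):
            start = date(year, first_month, 1)
            # The end of a quarter is the first day of the next one.
            if first_month == 10:
                end = date(year + 1, 1, 1)
            else:
                end = date(year, first_month + 3, 1)
            yield start, end


def chunk_url(start, end):
    """Build the CSV query URL for one chunk (hypothetical helper)."""
    params = {
        "format": "csv",
        "starttime": start.isoformat(),
        "endtime": end.isoformat(),
    }
    return f"{USGS_URL}?{urlencode(params)}"


# Assumed range: all quarters from 2020 through 2025.
chunks = list(quarterly_chunks(2020, 2025))
urls = [chunk_url(start, end) for start, end in chunks]
print(len(urls))
print(urls[0])
```

Because the service treats `endtime` as inclusive, an event stamped exactly on a quarter boundary could show up in two chunks; deduplicating on the `id` column of the CSV after download would cover that case.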

Key Features

  • Cloud Infrastructure — Terraform provisions AWS S3 buckets (data lake + warehouse) with versioning & lifecycle policies
  • Ingestion — Fetches earthquake data from the USGS REST API in quarterly chunks; uploads to S3 data lake
  • Warehouse — Loads into DuckDB with sorted tables for zone-map optimization; exports partitioned Parquet to S3
  • Transformations — dbt staging + fact table + 3 mart models; 15 schema tests (all passing)
  • Dashboard — 4 interactive tiles: temporal trends, magnitude distribution, top regions, global earthquake map
  • Orchestration — Makefile runs the full pipeline (make all) + infra (make infra-up) + dashboard (make dashboard)
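
The Warehouse bullet above mentions sorted tables for zone-map optimization. The toy sketch below, in plain Python rather than DuckDB, shows why sorting matters: a zone map keeps a min/max pair per row group, and a range filter can skip any group whose range does not overlap the filter. The data, the group size of 2,048 rows, and the helper names are all assumptions made for illustration; DuckDB's real row groups are much larger and its internals differ.

```python
import random


def build_zone_map(values, group_size):
    """Record (min, max) for each consecutive row group of `group_size` rows."""
    zone_map = []
    for i in range(0, len(values), group_size):
        group = values[i:i + group_size]
        zone_map.append((min(group), max(group)))
    return zone_map


def groups_to_scan(zone_map, low, high):
    """Count row groups whose [min, max] range overlaps the filter [low, high]."""
    return sum(1 for zmin, zmax in zone_map if zmax >= low and zmin <= high)


rng = random.Random(42)
# Hypothetical column: day offsets for 100,000 events spread over about six years.
event_days = [rng.randrange(0, 2192) for _ in range(100_000)]

unsorted_map = build_zone_map(event_days, group_size=2048)
sorted_map = build_zone_map(sorted(event_days), group_size=2048)

# Filter on a single 30-day window.
unsorted_scans = groups_to_scan(unsorted_map, 1000, 1029)
sorted_scans = groups_to_scan(sorted_map, 1000, 1029)
print(f"unsorted: scan {unsorted_scans} of {len(unsorted_map)} row groups")
print(f"sorted:   scan {sorted_scans} of {len(sorted_map)} row groups")
```

With unsorted data nearly every row group spans the full date range, so almost nothing can be skipped; after sorting, the 30-day window touches only a group or two.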

Dashboard Highlights

| Tile | Description |
| --- | --- |
| 📈 Earthquake Activity Over Time | Monthly count colored by avg magnitude |
| 📊 Magnitude Distribution | Pie + stacked bar (Minor → Great) |
| 🏔️ Top Active Regions | 20 most earthquake-prone regions |
| 🗺️ Global Earthquake Map | Interactive scatter geo with 5,000 sampled events |
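
The Magnitude Distribution tile groups events into classes from Minor to Great. A minimal sketch of such a bucketing is shown below, assuming the commonly used whole-number boundaries (below 4.0 is Minor, 8.0 and above is Great). The thresholds, the intermediate labels, and the sample magnitudes are assumptions for illustration; the project's dbt models may define the classes differently.

```python
from bisect import bisect_right
from collections import Counter

# Assumed lower bounds for each class above "Minor".
THRESHOLDS = [4.0, 5.0, 6.0, 7.0, 8.0]
LABELS = ["Minor", "Light", "Moderate", "Strong", "Major", "Great"]


def magnitude_class(mag):
    """Map a magnitude to its class label using the assumed thresholds."""
    # bisect_right counts how many thresholds are <= mag, which is the label index.
    return LABELS[bisect_right(THRESHOLDS, mag)]


# Made-up magnitudes, only to exercise the function.
sample_magnitudes = [2.5, 4.0, 4.7, 5.1, 6.3, 7.8, 8.2, 3.9]
class_counts = Counter(magnitude_class(m) for m in sample_magnitudes)

for label in LABELS:
    print(label, class_counts[label])
```

Keeping the boundaries in one sorted list makes it easy to adjust the classes without rewriting a chain of comparisons.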

Quick Start

cd earthquake-analytics
pip install -r requirements.txt
make infra-up    # (optional) provision AWS S3
make all         # Run full pipeline (ingest → load → transform)
make dashboard   # Launch Streamlit dashboard

Repository Structure

.
├── home/                # Homework solutions for each module
│   ├── 01/
│   ├── 02/
│   ├── 03/
│   ├── 04/
│   ├── 05/
│   ├── 06/
│   ├── 07/
│   └── workshop/
├── earthquake-analytics/        # Final project
│   ├── pipeline/                #   Ingestion & warehouse loading
│   ├── dbt_earthquake/          #   dbt models & tests
│   ├── dashboard/               #   Streamlit app
│   ├── terraform/               #   AWS S3 infrastructure (IaC)
│   └── Makefile                 #   Pipeline orchestration
├── 01-docker-terraform/         # Course material (modules 1-7)
├── 02-workflow-orchestration/
├── ...
└── README.md                

Acknowledgements

Thanks to Alexey Grigorev and the entire DataTalks.Club team for offering this incredible course for free. Special thanks to all the instructors and the community on Slack for their support throughout the cohort.
