This repo contains coursework and the final project for the Data Engineering Zoomcamp by DataTalks.Club (Cohort 2026).
The Data Engineering Zoomcamp is a free course covering the fundamentals of data engineering — from containerization and infrastructure-as-code to batch and stream processing. It is taught by Alexey Grigorev and the DataTalks.Club team.
| Area | Tools |
|---|---|
| Containerization | Docker, Docker Compose |
| Infrastructure as Code | Terraform (GCP & AWS) |
| Workflow Orchestration | Kestra |
| Data Ingestion | dlt (data load tool) |
| Data Warehouse | BigQuery, DuckDB |
| Cloud Storage | AWS S3 (Parquet, partitioned) |
| Analytics Engineering | dbt |
| Data Platforms | Bruin |
| Batch Processing | Apache Spark |
| Stream Processing | Apache Kafka, Apache Flink |
| Dashboard | Streamlit, Plotly |
| Language | Python, SQL |
All homework solutions for each module are in the home/ directory:
| Module | Topic | Homework |
|---|---|---|
| 1 | Docker & Terraform | |
| 2 | Workflow Orchestration (Kestra) | |
| 3 | Data Warehouse (BigQuery) | |
| 4 | Analytics Engineering (dbt) | |
| 5 | Data Platforms (Bruin) | |
| 6 | Batch Processing (Spark) | |
| 7 | Stream Processing (Kafka & Flink) | |
| Workshop 1 | Data Ingestion with dlt | |
An end-to-end data engineering pipeline analyzing ~168,000 global earthquake events (2020–2025) from the USGS Earthquake Hazards Program, with cloud infrastructure on AWS and an interactive Streamlit dashboard.
```
USGS API ──► CSV (Data Lake + S3) ──► DuckDB + S3 Parquet (Warehouse) ──► dbt ──► Streamlit
                                              ▲
                                    Terraform (AWS IaC)
```
- Cloud Infrastructure — Terraform provisions AWS S3 buckets (data lake + warehouse) with versioning & lifecycle policies
- Ingestion — Fetches earthquake data from the USGS REST API in quarterly chunks; uploads to S3 data lake
- Warehouse — Loads into DuckDB with sorted tables for zone-map optimization; exports partitioned Parquet to S3
- Transformations — dbt staging + fact table + 3 mart models; 15 schema tests (all passing)
- Dashboard — 4 interactive tiles: temporal trends, magnitude distribution, top regions, global earthquake map
- Orchestration — Makefile targets run the full pipeline (`make all`), provision infrastructure (`make infra-up`), and launch the dashboard (`make dashboard`)
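The quarterly-chunk ingestion above can be sketched in plain Python. The query-string shape (`format`, `starttime`, `endtime`) matches the public USGS FDSN event API, but the helper names, the `minmagnitude` filter, and the exact chunking are illustrative assumptions, not the project's actual pipeline code:

```python
from datetime import date

# Public USGS FDSN event endpoint (real URL; query parameters below are standard).
USGS_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

def quarterly_windows(start_year: int, end_year: int):
    """Yield (start, end) date pairs, one per calendar quarter."""
    for year in range(start_year, end_year + 1):
        for q_start_month in (1, 4, 7, 10):
            start = date(year, q_start_month, 1)
            end_month = q_start_month + 3
            # Roll over into January of the next year after Q4.
            end = date(year + 1, 1, 1) if end_month > 12 else date(year, end_month, 1)
            yield start, end

def query_url(start: date, end: date) -> str:
    """Build a CSV query URL for one quarter (minmagnitude here is an assumption)."""
    return f"{USGS_URL}?format=csv&starttime={start}&endtime={end}&minmagnitude=2.5"

windows = list(quarterly_windows(2020, 2025))
print(len(windows))            # 24 quarters for 2020-2025
print(query_url(*windows[0]))
```

Fetching each URL (e.g. with `requests`) and uploading the CSV to the S3 data lake would complete the ingestion step.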
| Tile | Description |
|---|---|
| 📈 Earthquake Activity Over Time | Monthly count colored by avg magnitude |
| 📊 Magnitude Distribution | Pie + stacked bar (Minor → Great) |
| 🏔️ Top Active Regions | 20 most earthquake-prone regions |
| 🗺️ Global Earthquake Map | Interactive scatter geo with 5,000 sampled events |
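The Minor → Great buckets in the magnitude-distribution tile can be derived with a simple binning function. This is a sketch using one common USGS-style class scheme; the exact bin edges and labels in the project's dbt models may differ:

```python
# Upper bounds (exclusive) for each class; magnitudes >= 8.0 fall into "Great".
# These edges are an assumption, not taken from the project code.
MAGNITUDE_CLASSES = [
    (4.0, "Minor"),
    (5.0, "Light"),
    (6.0, "Moderate"),
    (7.0, "Strong"),
    (8.0, "Major"),
]

def magnitude_class(mag: float) -> str:
    """Map a magnitude value to its class label."""
    for upper, label in MAGNITUDE_CLASSES:
        if mag < upper:
            return label
    return "Great"

print(magnitude_class(3.2))  # Minor
print(magnitude_class(8.8))  # Great
```

Applying this per event and grouping by label yields the counts behind the pie and stacked-bar views.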
```bash
cd earthquake-analytics
pip install -r requirements.txt
make infra-up   # (optional) provision AWS S3
make all        # run full pipeline (ingest → load → transform)
make dashboard  # launch Streamlit dashboard
```
```
├── home/                        # Homework solutions for each module
│   ├── 01/
│   ├── 02/
│   ├── 03/
│   ├── 04/
│   ├── 05/
│   ├── 06/
│   ├── 07/
│   └── workshop/
├── earthquake-analytics/        # Final project
│   ├── pipeline/                # Ingestion & warehouse loading
│   ├── dbt_earthquake/          # dbt models & tests
│   ├── dashboard/               # Streamlit app
│   ├── terraform/               # AWS S3 infrastructure (IaC)
│   └── Makefile                 # Pipeline orchestration
├── 01-docker-terraform/         # Course material (modules 1-7)
├── 02-workflow-orchestration/
├── ...
└── README.md
```
Thanks to Alexey Grigorev and the entire DataTalks.Club team for offering this incredible course for free. Special thanks to all the instructors and the community on Slack for their support throughout the cohort.