Data Engineer Prep

License: MIT | Contributions Welcome

Prep for your next data engineering interview. Work through PySpark notebooks framed as real problems from Zephyr Coffee Co. (a fictional 200-store chain with messy data), review the theory docs before senior rounds, and drill the quizzes the night before.

Built in the open. Contributions welcome — see below.


📚 Navigate


What's in here now

The PySpark module. READMEs inside guide you through it based on your level (beginner / intermediate / senior).

Hands-on notebooks (each framed as a Slack message from a Zephyr colleague asking you to solve a realistic problem):

Theory docs (10-min night-before-interview reviews):

Self-check quizzes (collapsible Q&A, 🟢 basics → ⚡ senior judgment):


Roadmap

Phase 2 (next, no dates):

  • Null handling & deduplication notebook (Zephyr's 2023 POS duplicate incident)
  • Nested data notebook (exploding loyalty event structs)
  • Structured streaming notebook
  • Delta Lake notebook
  • Quiz + theory coverage for each

Phase 3:

  • SQL module (window functions in SQL, gaps-and-islands, SCDs, query optimization)
  • Python for DE module (collections, generators, pandas↔Spark, testing)
  • System design scenarios for DE interviews
  • DE interview question bank

The repo aims to be honest about what's built and what's not. No fake timelines.


Resources for Data Engineers

Thought leaders worth following

  1. Sumit Mittal — Founder of BigDataBySumit
  2. Joe Reis — Co-author of Fundamentals of Data Engineering
  3. Zach Wilson — Data engineering specialist
  4. Shashank Mishra — Data engineer & educator
  5. Gowtham SB — Big data & cloud
  6. Manish Kumar - Interview questions and interview experiences
  7. Darshil Parmar - Crisp DE videos
  8. Ansh Lamba - Best for Azure and Databricks

Resources I love

  1. Data Pathshala: data engineering preparation by Manish Kumar
  2. Data Engineer Handbook: DE concepts

Resume examples


Contributing

Typo fixes, clearer explanations, new quiz questions, Zephyr scenario ideas, and blog-link additions (with a one-line justification for why it beats what's already linked) are all welcome. Open an issue or a PR.

Please don't send: random link dumps, self-promotional content, or AI-generated filler. The curation is the point — every external link in this repo was added because it's genuinely the best free resource for that topic, not because it exists.