Thanks for helping improve Data Engineer Prep. This repo is meant to stay practical, curated, and interview-focused. Small fixes are welcome, and high-signal additions are even better.
- Typo fixes, broken-link fixes, and clearer explanations.
- New quiz questions with concise answers.
- Stronger examples for existing PySpark, SQL, data modeling, or AI-for-data-engineering topics.
- Zephyr Coffee Co. scenario ideas that make a concept feel realistic.
- Company interview patterns based on public sources or your own experience.
- External resources with a short reason for why they are better than what is already linked.
Please do not submit:
- Random link dumps.
- Self-promotional content.
- AI-generated filler that has not been checked by a human.
- Exact live interview questions copied verbatim.
- Proprietary datasets, take-home assignments, internal rubrics, interviewer names, or confidential offer details.
The curation is the point. If a link, question, or explanation does not make the reader more prepared, leave it out.
Before opening a pull request, check that your change:
- Is directly useful for data engineering interview prep.
- Matches the existing tone: practical, direct, and no fluff.
- Explains the "why", not only the "what".
- Uses correct Markdown links and relative paths.
- Avoids unsupported claims. If a fact depends on a source, cite it.
- Keeps examples small enough to understand, but realistic enough to matter.
Company interview pages are useful only when they are trustworthy. Please follow these rules carefully.
For community-sourced research:
- Copy `company_interviews/_template_online.md`.
- Use public sources only, such as GeeksForGeeks, Glassdoor, Reddit, Medium, YouTube, Naukri, or candidate blogs.
- Add a `Sources` section with actual URLs.
- Summarize patterns in your own words. Do not copy-paste from other sites.
- Prefer question patterns over exact wording.
For personal interview experiences:
- Copy `company_interviews/_template_personal.md`.
- Share round structure, topic areas, difficulty, prep strategy, and what the interview was testing.
- Do not include confidential or exact question text. Rephrase as a pattern.
- Include a "Delta from online.md" section if an `online.md` file exists for that company.
When in doubt, describe the type of question instead of the question itself.
New notebooks should:
- Be self-contained or clearly explain how to load any sample data.
- Start with the business problem before the code.
- Use the Zephyr Coffee Co. framing where it helps.
- Include a walkthrough, at least one exercise, and a solution section.
- Keep outputs clean before committing.
- Include a Colab badge if the notebook is meant to run in Colab.
If you add a notebook, update the relevant module README and the root README.md if needed.
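If "clean outputs" is read as "outputs cleared", a small script can do it before each commit. The sketch below is one possible approach, not a tool this repo ships: it assumes the standard `.ipynb` JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and the function names are hypothetical.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Remove outputs and execution counts from every code cell."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path: Path) -> None:
    """Rewrite one .ipynb file in place with its outputs cleared."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleaned = clear_outputs(notebook)
    # indent=1 is an assumption chosen to keep diffs small; adjust to taste.
    path.write_text(
        json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

Calling `clear_file(Path("my_notebook.ipynb"))` (a placeholder filename) rewrites the file in place, so run it on a committed or backed-up copy first.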
Quizzes should use the existing collapsible Q&A style. Within each quiz:
- Start with basics.
- Move into intermediate judgment.
- End with senior/interview-style tradeoffs.
- Keep answers short enough to review the night before an interview.
Before opening a PR:
- Run a spell check or at least read the changed file end to end.
- Click any new links you added.
- Check that relative links work from the file's folder.
- Make sure no private or confidential information is included.
- Keep the PR focused. One topic per PR is easiest to review.
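The relative-link check can be partly automated. The following is a minimal sketch, assuming inline Markdown links of the form `[text](target)`; the function name `broken_relative_links` is hypothetical and not part of this repo. It resolves each target from the file's own folder and skips absolute URLs and in-page anchors. It does not handle reference-style links, URL-encoded paths, or targets that start with `/`.

```python
import re
from pathlib import Path

# Matches inline Markdown links and images: [text](target) or ![alt](target "title")
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Matches targets that begin with a URL scheme such as https: or mailto:
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_PATTERN.findall(text):
        # Skip absolute URLs, mail links, and in-page anchors.
        if SCHEME_PATTERN.match(target) or target.startswith("#"):
            continue
        # Drop any #fragment, then resolve from the file's own folder.
        path_part = target.split("#", 1)[0]
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

To scan a whole checkout, loop over `Path(".").rglob("*.md")` and print any file whose result is non-empty. External links still need a manual click.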
This repo values clear, battle-tested explanations over volume. A short correction that prevents someone from learning the wrong thing is a great contribution.