
feat: add Databricks Platform Migration workshop and module #61

Open
devin-ai-integration[bot] wants to merge 4 commits into main from devin/1778091246-databricks-migration-workshop

Conversation


@devin-ai-integration (bot) commented May 6, 2026

Summary

Adds a new Databricks Platform Migration workshop and module targeting Databricks-user audiences. The narrative: "Your team has standard data engineering code (PySpark, Airflow, dbt, SQL ETL) — now convert it to Databricks-native equivalents."

New files

  • modules/data-engineering/databricks-platform-migration.md — Challenge module covering PySpark to Databricks Notebooks, Airflow to Databricks Workflows, dbt-bigquery to dbt-databricks, Parquet to Delta Lake.
  • workshops/databricks-migration/README.md — Two-track workshop (6 labs) plus scheduled O&M section:
    • Track A (Open-Source to Databricks): Labs A1-A3 run as parallel Devin sessions on streamify-data-engineering
    • Track B (Legacy to Databricks): Labs B1-B3 cover CDW to Delta Lake, SQL ETL to PySpark notebooks, data quality framework
    • Scheduled Maintenance: 5 recurring O&M prompts (Delta table maintenance, dbt drift detection, notebook code quality, dependency hygiene, data quality monitoring)
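The Airflow-to-Databricks-Workflows lab targets the multi-task job format of the Databricks Jobs API 2.1. As a minimal sketch of that target shape (task keys and notebook paths below are hypothetical placeholders, not taken from the workshop repos), an Airflow DAG's task list and upstream dependencies map onto `tasks` and `depends_on` like this:

```python
# Sketch: the target shape for an Airflow DAG migrated to a Databricks
# Workflows job (Jobs API 2.1 multi-task format). Task keys and notebook
# paths are hypothetical placeholders.

def dag_to_job_spec(job_name, tasks):
    """tasks: list of (task_key, notebook_path, upstream_task_keys)."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": key,
                "notebook_task": {"notebook_path": path},
                # Airflow's upstream dependencies map onto depends_on
                "depends_on": [{"task_key": up} for up in upstream],
            }
            for key, path, upstream in tasks
        ],
    }

spec = dag_to_job_spec(
    "streamify_etl",
    [
        ("extract", "/Repos/etl/extract", []),
        ("transform", "/Repos/etl/transform", ["extract"]),
        ("load", "/Repos/etl/load", ["transform"]),
    ],
)
```

In practice the resulting dict would be submitted via the Jobs API or checked in as asset-bundle config; the sketch only illustrates the dependency mapping the lab asks Devin to produce.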

Updated files

  • modules/data-engineering/README.md — Added new module and repo entries
  • workshops/README.md — Added Databricks Migration to workshops table
  • catalog/repos.md — Enriched streamify-data-engineering description, added etl-workflow (alphabetically ordered)
  • catalog/upstream-map.yaml — Added etl-workflow entry (alphabetically ordered)

Review & Testing Checklist for Human

  • Run at least one Track A prompt on streamify-data-engineering to verify Devin output quality
  • Verify all repo links resolve to existing org repos
  • Review parallel session design (Track A) for merge conflict risk
  • Check module/workshop cross-references are consistent

Notes

  • streamify-data-engineering has no explicit LICENSE file (upstream: ankurchavda/streamify)
  • Track A designed as parallelization showcase: 3 sessions, 1 repo, 3 PRs
  • Scheduled Maintenance section added for Databricks O&M narrative (Delta Lake maintenance, drift detection, code quality, dependency hygiene, data quality monitoring)
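The Delta table maintenance prompt centers on two standard Delta Lake commands: OPTIMIZE (file compaction, with optional Z-ordering) and VACUUM (removal of stale data files). A minimal sketch of the statements involved, with a hypothetical table and column name; on Databricks these strings would be run via `spark.sql()`:

```python
# Sketch of the two recurring Delta maintenance commands behind the O&M
# prompt: OPTIMIZE (compaction / Z-ordering) and VACUUM (stale-file
# cleanup). Table and column names are hypothetical.

def delta_maintenance_sql(table, zorder_cols=(), retain_hours=168):
    stmts = [f"OPTIMIZE {table}"]
    if zorder_cols:
        stmts[0] += f" ZORDER BY ({', '.join(zorder_cols)})"
    # 168 hours = 7 days, Delta Lake's default/minimum recommended retention
    stmts.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return stmts

stmts = delta_maintenance_sql("streamify.listen_events", ("user_id",))
# stmts[0] == "OPTIMIZE streamify.listen_events ZORDER BY (user_id)"
# stmts[1] == "VACUUM streamify.listen_events RETAIN 168 HOURS"
```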

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/e069cd377a4e4fc3b94f08fa1f2d6295
Requested by: @bsmitches


Open in Devin Review

- New module: modules/data-engineering/databricks-platform-migration.md
  Converts standard PySpark/Airflow/dbt stacks to Databricks Notebooks,
  Delta Lake, Workflows, and dbt-databricks. Uses streamify-data-engineering
  and uc-data-source-migration-legacy-to-modern repos.

- New workshop: workshops/databricks-migration/README.md
  Two tracks (6 labs) for Databricks-user audiences:
  Track A: Open-source stack → Databricks Lakehouse (parallel sessions)
  Track B: Legacy data → Databricks Lakehouse

- Updated data-engineering modules README with new module and repo entries
- Updated workshops README with new workshop listing
- Updated catalog repos.md with enriched streamify-data-engineering entry
  and new etl-workflow entry
- Updated upstream-map.yaml with etl-workflow entry
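For the Parquet-to-Delta-Lake piece of the module, Delta Lake supports in-place conversion via the CONVERT TO DELTA statement. A minimal sketch (the path and partition column are hypothetical; on Databricks the generated string would be passed to `spark.sql()`):

```python
# Sketch of the Parquet-to-Delta step using Delta Lake's in-place
# CONVERT TO DELTA statement. Path and partition column are hypothetical.

def convert_to_delta_sql(parquet_path, partition_spec=None):
    stmt = f"CONVERT TO DELTA parquet.`{parquet_path}`"
    if partition_spec:
        # Partitioned Parquet tables must declare their partition schema
        stmt += f" PARTITIONED BY ({partition_spec})"
    return stmt

stmt = convert_to_delta_sql("/data/events", "event_date DATE")
# → "CONVERT TO DELTA parquet.`/data/events` PARTITIONED BY (event_date DATE)"
```

In-place conversion avoids rewriting the Parquet files: it adds a Delta transaction log over the existing directory, which is why it is the usual first step before the OPTIMIZE/VACUUM maintenance cycle.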
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


… maintenance section

- Move etl-workflow entry to correct alphabetical position in
  upstream-map.yaml (between dotnet-modular-monolith-fe-react and fineract)
  and repos.md (between angular-1.x-dashboard and ts-informatica-powercenter)
- Add Scheduled Maintenance section to Databricks Migration workshop with
  5 recurring O&M prompts (Delta table maintenance, dbt drift detection,
  notebook code quality, dependency hygiene, data quality monitoring)

- Add etl-workflow to module Repositories section with full detailed
  section (anchor, repo link, Step 1-4 instructions)
- Remove explicit upstream_url: null from etl-workflow in upstream-map.yaml
  to match pattern of other original repos
- Update modules/README.md navigation index: add Databricks Platform
  Migration and COBOL Copybook rows to Data Engineering table, update
  module count from 7 to 9
