chore: retire ETL stage from unified pipeline #206

Merged
DSuveges merged 1 commit into 2606-public-release from chore/retire-etl-stage
Jun 8, 2026

@d0choa d0choa commented Jun 8, 2026

Collaborator

Summary

The ETL stage in the unified pipeline DAG has had zero step consumers since #195. This PR retires the orphan configuration, loaders, DAG stage function, and supporting operator/enum entries.
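
As a rough illustration of what "zero step consumers" means, the sketch below reports stages that no step is assigned to. The `Stage` enum, the `steps` mapping, and `orphan_stages()` are hypothetical stand-ins for illustration only; they are not the repository's `UnifiedPipelineStage` model or its config layout.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stand-in for a pipeline stage enum."""

    PTS = "pts"
    ETL = "etl"


def orphan_stages(steps: dict[str, Stage]) -> set[Stage]:
    """Return every stage that no step is assigned to."""
    used = set(steps.values())
    return {stage for stage in Stage if stage not in used}


# Illustrative step-to-stage assignments (assumed, not the real pipeline config).
steps = {"pts_target": Stage.PTS}

print(orphan_stages(steps))
```

Under these assumptions, a stage that shows up in the result has nothing left to run, which is the situation this PR describes for the ETL stage.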

Targets the 2606-public-release branch (PR #201) so it lands together with the public release work.

Removed

  • clusters.yaml: etl and etl_literature clusters, plus the step_job_properties.etl block.
  • unified_pipeline.yaml: etl_version and the etl_literature step entry.
  • etl.conf: deleted entirely.
  • config/unified_pipeline.py: the etl AppConfig loader (and its PPP overlay), etl_version / etl_jar_origin_uri, the now-unused jar_uri() helper, and the exts map in config_uri().
  • dags/unified_pipeline.py: the etl_stage() function, its call site, and the imports that only it used (ETLJobBuilder, CopyBlobOperator, to_hocon).
  • operators/dataproc.py: ETLJobBuilder class.
  • models/step.py: UnifiedPipelineStage.ETL enum entry.
  • operators/diff.py: refreshed docstring examples that referenced the removed etl_stage task IDs (now use pts_target).

Net diff: +8 / -410 across 8 files.
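
One way to double-check a removal like this is to scan the tree for leftover mentions of the removed identifiers. The sketch below is a hypothetical helper, not part of this PR: `find_leftovers()` and its defaults are assumptions, and only the names in `REMOVED_NAMES` come from the list above.

```python
import re
from pathlib import Path

# Identifiers this PR lists as removed.
REMOVED_NAMES = (
    "etl_stage",
    "ETLJobBuilder",
    "etl_version",
    "etl_jar_origin_uri",
    "jar_uri",
)


def find_leftovers(root: Path, names=REMOVED_NAMES, suffixes=(".py", ".yaml")):
    """Return (relative path, line number, name) for each whole-word mention."""
    # Word boundaries avoid matching longer identifiers that merely contain a name.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path.relative_to(root).as_posix(), lineno, match.group(1)))
    return hits
```

Calling `find_leftovers(Path("src"))` on a clean tree would be expected to return an empty list; any hit points at a file and line that still mentions a removed name. The `src` path here is taken from the ruff command in the test plan and may not cover every directory touched by the PR.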

Test plan

  • uv run ruff check src/ tests/ — passes
  • uv run ruff check . (excluding untracked locals) — passes
  • uv run deptry . (excluding untracked locals) — passes
  • uv run pytest tests/ — 161 passed, 1 xfailed
  • Render the unified_pipeline DAG in Airflow to confirm it loads without the ETL stage

The ETL stage in the unified pipeline DAG has had zero step consumers
since PR #195. Remove the now-orphan configuration, loaders, DAG stage
function, and supporting operator/enum entries:

- clusters.yaml: drop the `etl` and `etl_literature` clusters plus the
  `step_job_properties.etl` block.
- unified_pipeline.yaml: drop `etl_version` and the `etl_literature`
  step entry.
- etl.conf: deleted (no longer loaded).
- config/unified_pipeline.py: drop the `etl` AppConfig loader (and its
  PPP overlay), `etl_version`/`etl_jar_origin_uri`, the now-unused
  `jar_uri()` helper, and the `exts` map in `config_uri()`.
- dags/unified_pipeline.py: drop the `etl_stage()` function, its call,
  and the imports that only it used (`ETLJobBuilder`, `CopyBlobOperator`,
  `to_hocon`).
- operators/dataproc.py: delete `ETLJobBuilder`.
- models/step.py: drop `UnifiedPipelineStage.ETL`.
- operators/diff.py: refresh docstring examples that referenced the
  removed `etl_stage` task IDs.

@DSuveges DSuveges left a comment

Contributor

Thank you for cleaning the Scala ETL-specific logic and configuration out of the orchestration.

  • Removal of etl config
  • Removal of etl steps from the unified pipeline
  • Removal of etl/etl literature pipeline definitions.
  • Removal of logic on how etl steps should be called

@DSuveges DSuveges merged commit c6d5035 into 2606-public-release Jun 8, 2026
2 checks passed
@DSuveges DSuveges deleted the chore/retire-etl-stage branch June 8, 2026 14:40