2606 public release #201

Merged
project-defiant merged 58 commits into dev from 2606-public-release on Jun 17, 2026

@DSuveges DSuveges commented May 28, 2026

Summary

While preparing the 2026.06 release, the following changes were made to the configuration and the orchestration framework.

Config changes

Unified pipeline config

  • PTS version bumped
  • PIS version bumped
  • ETL removed
  • Gentropy version bumped
  • ChEMBL version bumped
  • EFO version bumped
  • DepMap version bumped
  • MONDO version bumped
  • Curation version bumped
  • GTEx version bumped
  • STRING version bumped
  • Heritability step added
  • PTS literature steps added
  • PTS openFDA step added
  • PTS timeseries view generation is no longer PPP-specific
  • Release metrics turned on

Clusters

  • PTS cluster is multi-node, autoscaling enabled
  • ETL cluster configuration removed
  • PTS literature cluster config added

PIS

  • Clinical mining update
  • PanelApp update
  • Literature dating update
  • Heritability estimate update

PTS

  • Baseline expression aggregation config added
  • OpenFDA added from ETL
  • Target update with UniProt SSL
  • DepMap file change follow-up
  • Clinical precedence QC tag update
  • OnToma literature step added
  • drug_molecule QC tag update
  • clinical_report update with new summaries and LLM extraction results
  • Evidence p-value scaling is now in config
  • Europe PMC evidence is now parquet
  • Search facet uses Reactome
  • STRING DB version fixed

ETL

  • ETL configuration dropped entirely

Gentropy

  • L2G feature matrix generation updated: interactions considered, training on partial dataset

Code changes

Some changes required updating not only the configuration but also the logic that interprets it.

  • Remove logic to manage ETL steps from unified_pipeline.py
  • ETL job builder dropped

ireneisdoomed and others added 22 commits June 17, 2026 13:42
- The file has the same schema and identical format
- The new release has 33k fewer lines, which might indicate that we are not getting all ratings. This might not affect the final evidence and association counts.
The ETL stage in the unified pipeline DAG has had zero step consumers
since PR #195. Remove the now-orphan configuration, loaders, DAG stage
function, and supporting operator/enum entries:

- clusters.yaml: drop the `etl` and `etl_literature` clusters plus the
  `step_job_properties.etl` block.
- unified_pipeline.yaml: drop `etl_version` and the `etl_literature`
  step entry.
- etl.conf: deleted (no longer loaded).
- config/unified_pipeline.py: drop the `etl` AppConfig loader (and its
  PPP overlay), `etl_version`/`etl_jar_origin_uri`, the now-unused
  `jar_uri()` helper, and the `exts` map in `config_uri()`.
- dags/unified_pipeline.py: drop the `etl_stage()` function, its call,
  and the imports that only it used (`ETLJobBuilder`, `CopyBlobOperator`,
  `to_hocon`).
- operators/dataproc.py: delete `ETLJobBuilder`.
- models/step.py: drop `UnifiedPipelineStage.ETL`.
- operators/diff.py: refresh docstring examples that referenced the
  removed `etl_stage` task IDs.
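
As a rough illustration of what "zero step consumers" means, the sketch below lists declared stages that no step is assigned to. The step-to-stage mapping, the `gentropy_l2g` step name, and the `orphan_stages` helper are hypothetical and are not taken from the orchestration code.

```python
from collections import Counter

# Hypothetical step -> stage mapping; illustrative names only.
STEP_STAGES = {
    "pis_drug": "pis",
    "pts_chembl_molecule": "pts",
    "gentropy_l2g": "gentropy",  # assumed step name
}

# Stages still declared in the DAG, including the orphaned one.
DECLARED_STAGES = ["pis", "pts", "etl", "gentropy"]


def orphan_stages(declared, step_stages):
    """Return declared stages that no step is assigned to, in declaration order."""
    consumers = Counter(step_stages.values())
    return [stage for stage in declared if consumers[stage] == 0]


print(orphan_stages(DECLARED_STAGES, STEP_STAGES))  # prints ['etl']
```

A stage reported by a check like this can be removed along with its loaders and operators, which is what the commit above does for ETL.
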
Mirror the PTS config change (opentargets/pts#142): the chembl_molecule step
now reads input/clinical_report/aact_extraction_batch_results to mine
clinical-trial (AACT) synonyms. That input is staged by pis_clinical_report, so
pts_chembl_molecule now also depends on it in the unified pipeline (otherwise
the DAG could run chembl_molecule before the AACT batch is present).
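
The ordering concern can be sketched with the standard library's `graphlib`, assuming a plain step-to-prerequisites mapping. This is not how the unified pipeline declares its edges (those live in the Airflow DAG); the `DEPENDENCIES` dict and `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: step -> set of steps it must wait for.
DEPENDENCIES = {
    "pis_drug": set(),
    "pis_clinical_report": set(),
    # With the added edge, chembl_molecule waits for the AACT batch to be staged.
    "pts_chembl_molecule": {"pis_drug", "pis_clinical_report"},
}


def run_order(dependencies):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(order)
```

Without the `pis_clinical_report` entry in the `pts_chembl_molecule` set, nothing would stop a scheduler from starting that step before the AACT batch exists.
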
…_molecule dep

Mirror opentargets/pis#202: download aact_extraction_batch_results in the PIS
drug step (clinical_report glob split into top-level / aact / chembl subtrees to
exclude it). Since pts_chembl_molecule already depends on pis_drug, revert the
earlier pts_chembl_molecule -> pis_clinical_report edge — the DAG dependencies
stay as they were.
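
A minimal sketch of the glob split, assuming a simplified bucket layout: replacing one broad pattern with narrower ones lets a sibling directory fall outside the clinical report download. The file names and patterns are hypothetical; only the directory names come from the commit message, and the real PIS configuration may differ.

```python
from pathlib import PurePosixPath

# Hypothetical object names for illustration.
PATHS = [
    "clinical_report/summary.json",
    "clinical_report/aact/part-0.parquet",
    "clinical_report/chembl/part-0.parquet",
    "clinical_report/aact_extraction_batch_results/part-0.jsonl",
]

# Assumed patterns: top-level files plus the aact and chembl subtrees.
CLINICAL_REPORT_PATTERNS = [
    "clinical_report/*.json",
    "clinical_report/aact/*",
    "clinical_report/chembl/*",
]
# Assumed pattern for the batch results fetched by the drug step.
DRUG_PATTERNS = ["clinical_report/aact_extraction_batch_results/*"]


def select(paths, patterns):
    """Keep paths matching at least one pattern; `*` does not cross `/`."""
    return [
        p for p in paths
        if any(PurePosixPath(p).match(pattern) for pattern in patterns)
    ]


print(select(PATHS, CLINICAL_REPORT_PATTERNS))
print(select(PATHS, DRUG_PATTERNS))
```

Because `aact_extraction_batch_results` is matched only by the drug-step pattern, the file arrives with `pis_drug`, which `pts_chembl_molecule` already waits for.
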
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@DSuveges DSuveges force-pushed the 2606-public-release branch from e54ccc5 to 418f3e9 Compare June 17, 2026 12:43
@DSuveges DSuveges marked this pull request as ready for review June 17, 2026 13:22
@DSuveges DSuveges requested a review from project-defiant June 17, 2026 13:22
Removed 'UNVALIDATED_INDICATION' from invalid clinical report QC settings.
@DSuveges This change was uncommitted in Airflow. Can you confirm this is correct?
@project-defiant project-defiant merged commit b275f10 into dev Jun 17, 2026
2 checks passed
4 participants