feat(drug): download aact_extraction_batch_results in the drug step #202
Merged
d0choa added a commit to opentargets/orchestration that referenced this pull request Jun 11, 2026
…_molecule dep Mirror opentargets/pis#202: download aact_extraction_batch_results in the PIS drug step (clinical_report glob split into top-level / aact / chembl subtrees to exclude it). Since pts_chembl_molecule already depends on pis_drug, revert the earlier pts_chembl_molecule -> pis_clinical_report edge — the DAG dependencies stay as they were.
6fe831d to 6824ed6
The AACT clinical-trial batch extraction feeds chembl_molecule (opentargets/pts#142, tracking opentargets/issues#4414). It lives in its own standalone source (gs://ot-team/irene/clinical_mining/aact_extraction_batch_result/output), so the drug step copies it (via copy_many) to input/clinical_report/aact_extraction_batch_results. chembl_molecule already depends on pis_drug, so this needs no pts_chembl_molecule -> pis_clinical_report DAG edge. The clinical_report step is unchanged (the batch is no longer nested under its input tree).
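The dependency argument above (`chembl_molecule` is already safe because it depends on `pis_drug`) can be checked mechanically. The sketch below models only the three tasks named in this PR as a hypothetical Python dict of task to predecessors; it is not the orchestration repo's DAG code, and the names `DEPENDENCIES`, `upstream` and `aact_ready_before` are illustrative. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to produce one valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of the unified-pipeline DAG (assumption: only the tasks
# named in this PR are modelled). Each task maps to its direct predecessors.
DEPENDENCIES: dict[str, set[str]] = {
    "pis_drug": set(),                    # now also stages the AACT batch
    "pis_clinical_report": set(),         # no longer stages the AACT batch
    "pts_chembl_molecule": {"pis_drug"},  # existing edge; no clinical_report edge
}


def upstream(task: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every task that `task` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(deps.get(task, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen


def aact_ready_before(consumer: str, producer: str, deps: dict[str, set[str]]) -> bool:
    """True if `producer` is guaranteed to finish before `consumer` starts."""
    return producer in upstream(consumer, deps)


# One valid execution order; static_order() raises CycleError if the graph has a cycle.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

With `pis_drug` as the producer the check passes; pointing it at `pis_clinical_report` (the earlier layout) fails, which is why that extra edge was needed only while the batch was staged by the clinical_report step.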
6824ed6 to 5ac4a3f
DSuveges pushed a commit to opentargets/orchestration that referenced this pull request Jun 11, 2026
…_molecule dep Mirror opentargets/pis#202: download aact_extraction_batch_results in the PIS drug step (clinical_report glob split into top-level / aact / chembl subtrees to exclude it). Since pts_chembl_molecule already depends on pis_drug, revert the earlier pts_chembl_molecule -> pis_clinical_report edge — the DAG dependencies stay as they were.
DSuveges pushed a commit to opentargets/orchestration that referenced this pull request Jun 17, 2026
…_molecule dep Mirror opentargets/pis#202: download aact_extraction_batch_results in the PIS drug step (clinical_report glob split into top-level / aact / chembl subtrees to exclude it). Since pts_chembl_molecule already depends on pis_drug, revert the earlier pts_chembl_molecule -> pis_clinical_report edge — the DAG dependencies stay as they were.
project-defiant pushed a commit to opentargets/orchestration that referenced this pull request Jun 17, 2026
* chore: update data and software versions
* chore: add clinical_report llm dep
* chore: update pis paths
* fix: split ontoma into two steps to avoid circular issue
* fix: gentropy version typo
* chore: download essentiality from depmap directly
* fix: point `pts_literature_publication_match` to `pts_ontoma_literature`
* revert: essentiality task cannot pull from depmap
* fix: update openfda config
* fix: add missing target dep
* fix: add missing dep for search_facet
* fix: add string_version as a pts env variable
* fix: remove qc flags from drug_molecule
* fix: typo
* fix: update essentiality filename
* fix: add pts_target to pts_evidence_postprocess_clinical_precedence deps
* chore: update pts
* fix: split openfda subtasks into independent tasks (spark job goes idle)
* fix: baseline_expression step only triggers a single spark job
* fix: add pis_heritability to unified dag
* fix: update score expression for some sources
* chore: rename pts_ontoma_literature to run on literature cluster
* perf: improve pts cluster settings to allow parallel jobs
* perf: improve pts cluster settings to allow parallel jobs
* fix: typo
* fix(gentropy): add interactions
* fix: baseline expression path typo
* chore: avoid preemptible secondary workers in literature cluster
* revert: baseline expression path typo
* chore: uncomment metrics
* fix(epmc): evidence format is parquet
* chore: rename baseline_expression_aggregated to baseline_expression
* chore(l2g): set `train_on_full_dataset` to false
* chore: update pts to check target fix
* chore: update pts to check target fix
* chore(pis): updating PanelApp data source for 2026.05.11 release
  - The file has the same schema and identical format
  - The new release has 33k fewer lines, which might indicate we are not getting all ratings. It might not impact the number of evidence and associations at the end.
* chore: bump chembl_version to 37
* chore: retire ETL stage from unified pipeline
  The ETL stage in the unified pipeline DAG has had zero step consumers since PR #195. Remove the now-orphan configuration, loaders, DAG stage function, and supporting operator/enum entries:
  - clusters.yaml: drop the `etl` and `etl_literature` clusters plus the `step_job_properties.etl` block.
  - unified_pipeline.yaml: drop `etl_version` and the `etl_literature` step entry.
  - etl.conf: deleted (no longer loaded).
  - config/unified_pipeline.py: drop the `etl` AppConfig loader (and its PPP overlay), `etl_version`/`etl_jar_origin_uri`, the now-unused `jar_uri()` helper, and the `exts` map in `config_uri()`.
  - dags/unified_pipeline.py: drop the `etl_stage()` function, its call, and the imports that only it used (`ETLJobBuilder`, `CopyBlobOperator`, `to_hocon`).
  - operators/dataproc.py: delete `ETLJobBuilder`.
  - models/step.py: drop `UnifiedPipelineStage.ETL`.
  - operators/diff.py: refresh docstring examples that referenced the removed `etl_stage` task IDs.
* chore: enable pts_association_timeseries_view for non-PPP runs
* refactor: delete etl config
* chore(pts): propagating changes evidence_clinical_precedence config
* fix: revert testing output
* chore(uv): update lockfile
* fix(pts): literature config added
* fix(ot_crispr): study table is now exported in csv
* feat(pts): wire aact_extraction_batch_results into chembl_molecule
  Mirror the PTS config change (opentargets/pts#142): the chembl_molecule step now reads input/clinical_report/aact_extraction_batch_results to mine clinical-trial (AACT) synonyms. That input is staged by pis_clinical_report, so pts_chembl_molecule now also depends on it in the unified pipeline (otherwise the DAG could run chembl_molecule before the AACT batch is present).
* refactor(pis): move aact batch download to drug step, drop the chembl_molecule dep
  Mirror opentargets/pis#202: download aact_extraction_batch_results in the PIS drug step (clinical_report glob split into top-level / aact / chembl subtrees to exclude it). Since pts_chembl_molecule already depends on pis_drug, revert the earlier pts_chembl_molecule -> pis_clinical_report edge; the DAG dependencies stay as they were.
* refactor(pis): point aact glob at standalone source, drop clinical_report split
* refactor(pis): use copy_many for the aact batch download
* chore: bump pis_version to 26.06.0-dev.2 and pts_version to 26.06.0-dev.4
* chore: remove unnecessary flag
* chore: configuration updates
* chore(pts): migrate partition_count configs from pts repo
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: update for new run
* fix(colocalisation): fixing gentropy tag (v3.3.0-dev.56) for cluster
* fix(clinical_target): remove 'UNVALIDATED_INDICATION' flag
  Removed 'UNVALIDATED_INDICATION' from invalid clinical report QC settings.
* fix(credible_set): add `pts_target` as dependency for `isTransQtl`
  @DSuveges This change was uncommitted in Airflow. Can you confirm this is correct?
* chore(metrics): add pts_clinical_target as dependency

---------

Co-authored-by: Irene Lopez <irene.lopezs@protonmail.com>
Co-authored-by: David Ochoa <ochoa@ebi.ac.uk>
Co-authored-by: root <root@inst-builder-debian-11-build-build-8rm9w.europe-west4-b.c.gce-image-builder.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Irene López Santiago <45119610+ireneisdoomed@users.noreply.github.com>
Summary
Download the AACT clinical-trial batch extraction (`aact_extraction_batch_results`) in the drug step. This input feeds `chembl_molecule` (opentargets/pts#142, tracking issue opentargets/issues#4414). It lives in its own standalone source, `gs://ot-team/irene/clinical_mining/aact_extraction_batch_result/output`, so the drug step copies it to `input/clinical_report/aact_extraction_batch_results/` (the path both `chembl_molecule` and `clinical_report` read).

Because `chembl_molecule` already depends on `pis_drug`, this needs no new `pts_chembl_molecule → pis_clinical_report` DAG edge.

What changed
- `config.yaml`, drug step only: adds a `copy_many` that copies `…/aact_extraction_batch_result/output/*` → `input/clinical_report/aact_extraction_batch_results/`. The clinical_report step is unchanged (the batch is no longer nested under its input tree).

Companion changes
- `chembl_molecule` source path unchanged.
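The `copy_many` entry in the drug step pairs a source glob with a destination prefix. As a rough illustration only, the sketch below computes which objects such a step would copy and where they would land, without touching GCS. `plan_copies` is a hypothetical helper, not the PIS `copy_many` implementation; the destination is shown relative to the release bucket, whose root this PR does not spell out.

```python
import posixpath
from fnmatch import fnmatchcase

# Source glob and destination prefix as described in this PR. The destination
# is relative to the release bucket root (assumption: root not shown here).
SOURCE_GLOB = "gs://ot-team/irene/clinical_mining/aact_extraction_batch_result/output/*"
DEST_PREFIX = "input/clinical_report/aact_extraction_batch_results/"


def plan_copies(source_uris: list[str], source_glob: str, dest_prefix: str) -> dict[str, str]:
    """Map each URI matching `source_glob` to its destination path.

    Hypothetical helper: it decides what a copy_many-style step would copy
    and where, but performs no copy itself.
    """
    # Directory part of the glob, e.g. ".../output/"; matched URIs keep
    # whatever follows it as their relative path under `dest_prefix`.
    base = posixpath.dirname(source_glob) + "/"
    plan: dict[str, str] = {}
    for uri in source_uris:
        # fnmatchcase is case-sensitive on every platform, like GCS object names.
        if fnmatchcase(uri, source_glob):
            plan[uri] = dest_prefix + uri[len(base):]
    return plan
```

Note that `fnmatch`'s `*` also matches `/`, so objects nested below `output/` would be planned with their relative path preserved; whether the real `copy_many` recurses the same way is not stated here.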