Skip to content

Post RFC100 cleanup#1334

Merged
jamesqo merged 9 commits into
masterfrom
post-rfc100-cleanup
Jun 22, 2026
Merged

Post RFC100 cleanup#1334
jamesqo merged 9 commits into
masterfrom
post-rfc100-cleanup

Conversation

@jamesqo

@jamesqo jamesqo commented May 5, 2026

Copy link
Copy Markdown
Contributor

This can be reviewed properly once the initial rfc100 PR is merged

@jamesqo

jamesqo commented May 6, 2026

Copy link
Copy Markdown
Contributor Author

Addressing Rob's PR feedback from #1318:

  • Source automation-environment for each of the airflow called scripts
  • Removing the test-msk-import-scripts
  • Change the name of validate_blue_green_study.py
  • Change whitespace
  • Make sure fetch-ddp-and-cmo-access-data script is not running
  • Decide if we're turning off : PDX, Spectrum, Extract
  • Decide whether to keep the tests around

@jamesqo jamesqo force-pushed the post-rfc100-cleanup branch from e25fd2e to 89c1dcb Compare May 6, 2026 19:54
@jamesqo jamesqo changed the title [ no merge ] Post rfc100 cleanup Post RFC100 cleanup May 6, 2026
Comment thread dags/import_base.py Outdated
Comment thread dags/import_genie_dag.py Outdated
"""
import_genie_dag.py
Imports Genie study to MySQL and ClickHouse databases using blue/green deployment strategy.
import_genie_clickhouse_dag.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thin kwe can just standardize to just import_genie? no reference to mysql or clickhouse in general

Comment thread dags/import_genie_dag.py Outdated
data_nodes=("importer_ssh",),

tasks["data_repos"] >> tasks["verify_management_state"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove al lthese extra new lines

Comment thread dags/import_genie_dag.py Outdated
),
db_properties_filename="manage_genie_database_update_tools.properties",
db_properties_filename="manage_genie_clickhouse_database_update_tools.properties",
# disabled on pipelines5 machine during testing phase

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove this comment

Comment thread dags/import_public_dag.py Outdated
"""
import_public_dag.py
Imports to Public cBioPortal MySQL and ClickHouse databases using blue/green deployment strategy.
import_public_clickhouse_dag.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments -- a little confused how why the script name here differs from file name

Comment thread import-scripts/airflow-import-direct-to-clickhouse.sh
Comment thread import-scripts/import-cmo-data-msk.sh

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believed this was dead code, can restore

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this erased?

- Rename airflow-import-sql.sh to airflow-import-direct-to-clickhouse.sh
- Fix docstrings in genie/public DAGs (remove ClickHouse references)
- Collapse extra blank lines in genie DAG
- Remove stale 'disabled on pipelines5' comment
- Restore monitor-stalled-jobs.sh and test_if_impact_has_lost_allele_count.sh

@sheridancbio sheridancbio left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. We are dropping much of the "-clickhouse" strings from names which were distinguishing the clickhouse flavor of the functionality (since that is now the only way).

One larger concern:
Have we actually decided to permanently stop importing these ? (or maybe do it now with airflow?):

  • extract projects
  • pdx data
  • tempo data
  • msk-mind
  • spectrum

There are commented out lines in the dmp wrapper script which should just be deleted now I guess?

Comment thread import-scripts/airflow-import-direct-to-clickhouse.sh
@jamesqo jamesqo merged commit 815899b into master Jun 22, 2026
2 checks passed
@jamesqo jamesqo deleted the post-rfc100-cleanup branch June 22, 2026 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants