refactor: align deCODE harmonisation to shared config pattern and common/processing utilities #1244

Open
project-defiant wants to merge 1 commit into dev from feat/decode-harmonisation-refactor

@project-defiant

Contributor

✨ Context

This PR follows up on the unification of harmonisation efforts in gentropy. Here we focus on deCODE summary statistics harmonisation.

🛠 What does this PR implement

  • Adopt common/processing helpers in deCODESummaryStatistics.from_source: replace private _infer_allele_frequency with shared infer_allele_frequency_from_maf; use flag_equal_alleles and flag_non_atgc_alleles in place of inline predicates
  • Collapse remove_star_alleles and remove_multiallelics into verify_atgc in both deCODESummaryStatisticsHarmonisationConfig and the step __init__, since the ATGC predicate already excludes both star and multiallelic markers
  • Remove inline Python defaults from deCODESummaryStatisticsHarmonisationStep __init__; all defaults are now declared once in the Hydra config dataclass
  • Rename deCODEHarmonisationConfig fields: min_mac → min_allele_count_threshold, min_sample_size → sample_size_threshold; add perform_* toggle booleans
  • Move DECODE_SCHEMA to module-level constant; add ConfigDict(extra="forbid") to deCODEHarmonisationConfig
  • Make EFOMapping.annotate_study_index generic (TypeVar S bound=StudyIndex) so it returns the concrete subclass type rather than base StudyIndex
  • Fix SessionConfig s3_configuration/gcs_configuration defaults to use field(default_factory=...) for Hydra structured config compatibility
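
For reviewers less familiar with the last two items, here is a minimal stdlib-only sketch of both patterns. It is not the gentropy code: the class bodies, the DecodeStudyIndex subclass, the rows attribute, the efo_by_trait argument and the dict[str, str] field types are all hypothetical stand-ins chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import TypeVar


# --- Pattern 1: TypeVar bound to a base class --------------------------------
class StudyIndex:
    """Hypothetical stand-in for the real StudyIndex dataset class."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.rows = rows


class DecodeStudyIndex(StudyIndex):
    """Hypothetical subclass, used only to show that the type is preserved."""


S = TypeVar("S", bound=StudyIndex)


def annotate_study_index(study_index: S, efo_by_trait: dict[str, str]) -> S:
    """Return a new object of the caller's concrete type with an EFO id added."""
    rows = [
        {**row, "efo": efo_by_trait.get(row["trait"], "")}
        for row in study_index.rows
    ]
    # type(study_index) keeps the subclass at runtime;
    # the S annotation keeps it for static type checkers.
    return type(study_index)(rows)


# --- Pattern 2: default_factory for mutable dataclass defaults ---------------
@dataclass
class SessionConfig:
    """Hypothetical, simplified config; the real fields may differ."""

    # Writing `= {}` here would raise ValueError when the class is defined,
    # so each instance gets its own fresh dict from the factory instead.
    s3_configuration: dict[str, str] = field(default_factory=dict)
    gcs_configuration: dict[str, str] = field(default_factory=dict)
```

With the bound TypeVar, passing a DecodeStudyIndex yields a DecodeStudyIndex back, so callers do not need a cast. With default_factory, two SessionConfig() instances never share the same dictionary.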

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. uv run pre-commit run --all-files)?

…mon/processing utilities

- Adopt common/processing helpers in deCODESummaryStatistics.from_source:
  replace private _infer_allele_frequency with shared
  infer_allele_frequency_from_maf; use flag_equal_alleles and
  flag_non_atgc_alleles in place of inline predicates
- Collapse remove_star_alleles and remove_multiallelics into verify_atgc in
  both deCODESummaryStatisticsHarmonisationConfig and the step __init__ —
  the ATGC predicate already excludes both star and multiallelic markers
- Remove inline Python defaults from deCODESummaryStatisticsHarmonisationStep
  __init__; all defaults are now declared once in the Hydra config dataclass
- Rename deCODEHarmonisationConfig fields: min_mac → min_allele_count_threshold,
  min_sample_size → sample_size_threshold; add perform_* toggle booleans
- Move DECODE_SCHEMA to module-level constant; add ConfigDict(extra="forbid")
  to deCODEHarmonisationConfig
- Make EFOMapping.annotate_study_index generic (TypeVar S bound=StudyIndex)
  so it returns the concrete subclass type rather than base StudyIndex
- Fix SessionConfig s3_configuration/gcs_configuration defaults to use
  field(default_factory=...) for Hydra structured config compatibility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>