Fix/association query and drug scoring #1
Open
dalloliogm wants to merge 6 commits into
The query used 'associatedTargets(Bs: $ensemblIds)' where 'Bs' is not a valid Open Targets GraphQL argument. This caused the primary query to always return zero rows, silently triggering a fallback that fetched 500 rows and filtered in Python on every call.

Replace with target→associatedDiseases(efoIds: [...]), which uses the correct documented filter. Remove the fallback entirely.

Update the three call sites that parsed the old data shape (data.disease.associatedTargets.rows) to use the new shape (data.target.associatedDiseases.rows): score_clinical, score_pathway, and the disease name extraction in validate().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
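A minimal sketch of the new query direction and response parsing described in this commit message. The `target`, `associatedDiseases(efoIds: ...)`, and `rows` names and the `ensemblId`/`efoId` variables come from the commit messages in this PR; the selected row fields (`disease { id name }`, `score`) and the helper names are assumptions for illustration and have not been checked against the live schema.

```python
# Query text follows the commit message: target -> associatedDiseases(efoIds: [...]).
# The row fields selected here are assumptions, not taken from the PR.
ASSOCIATION_QUERY = """
query Association($ensemblId: String!, $efoId: String!) {
  target(ensemblId: $ensemblId) {
    associatedDiseases(efoIds: [$efoId]) {
      rows {
        disease { id name }
        score
      }
    }
  }
}
"""


def build_association_payload(ensembl_id, efo_id):
    """Build the JSON body for a GraphQL POST (hypothetical helper)."""
    return {
        "query": ASSOCIATION_QUERY,
        "variables": {"ensemblId": ensembl_id, "efoId": efo_id},
    }


def extract_association_rows(response):
    """Read data.target.associatedDiseases.rows, tolerating missing or null levels."""
    data = response.get("data") or {}
    target = data.get("target") or {}
    associated = target.get("associatedDiseases") or {}
    return associated.get("rows") or []
```

Returning an empty list for a missing or null level keeps call sites such as score_clinical and score_pathway free of nested `None` checks, without reintroducing a fetch-and-filter fallback.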
The drugAndClinicalCandidates query used 'diseases { id name }' but the
API returns ClinicalDiseaseListItem objects, causing a 400 error that
silently masked all drug data. Correct query is 'diseases { disease { id name } }'.
The stage_map used legacy string values ('Approved', 'Phase III', etc.)
but the API now returns uppercase underscore format (APPROVAL, PHASE_3,
PHASE_2_3, etc.). Add new format entries while keeping legacy ones for
backwards compatibility.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
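As a rough illustration of the two fixes in this commit, the sketch below keeps both stage formats in one lookup table and reads disease IDs from the nested `diseases { disease { id name } }` shape. The stage labels are the ones named in the commit message; the numeric phase values (including 2.5 for PHASE_2_3) and the function names are assumptions, not the values used in open_targets_client.py.

```python
# Stage labels come from the commit message; the numeric values are assumed.
STAGE_MAP = {
    # Legacy string values, kept for backwards compatibility.
    "Approved": 4.0,
    "Phase III": 3.0,
    # Current uppercase underscore format.
    "APPROVAL": 4.0,
    "PHASE_3": 3.0,
    "PHASE_2_3": 2.5,
}


def phase_for_stage(stage):
    """Map a stage label to a numeric phase; unknown or missing labels give 0."""
    if not stage:
        return 0
    return STAGE_MAP.get(stage, 0)


def disease_ids(drug_row):
    """Collect disease IDs from the nested shape diseases -> disease -> id."""
    ids = []
    for item in drug_row.get("diseases") or []:
        disease = item.get("disease") or {}
        if disease.get("id"):
            ids.append(disease["id"])
    return ids
```

Because unknown labels map to 0, a future change in the API's stage format would lower scores quietly again; logging unrecognised labels would be one way to make that visible.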
Previously max_phase was computed across all drugs for the target regardless of disease, causing targets like IL17A to score 0 clinical for psoriasis even though secukinumab is approved for it.

Split drug rows into two buckets: indication-matched (disease EFO ID present in the drug's disease list) and off-indication. Use the indication max phase as the primary signal; off-indication drugs are capped at phase 2 so they can contribute evidence without inflating scores to approved-drug level.

Update reason strings to clearly distinguish 'this indication' vs 'other indications' for transparency in the output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
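The bucketing described above might look roughly like the following. This is a sketch under stated assumptions: the row shape (`phase` and `disease_ids` keys) and the function name are hypothetical, and combining the two buckets with `max` is one reading of "indication max phase as the primary signal"; the actual scoring function may combine them differently.

```python
OFF_INDICATION_PHASE_CAP = 2  # cap stated in the commit message


def effective_max_phase(drug_rows, efo_id):
    """Return (effective, indication_max, off_indication_max) phases.

    Each row is assumed to be a dict with a numeric 'phase' and a
    'disease_ids' list (hypothetical shape, not the API's own).
    """
    indication = [r["phase"] for r in drug_rows if efo_id in r["disease_ids"]]
    other = [r["phase"] for r in drug_rows if efo_id not in r["disease_ids"]]

    indication_max = max(indication, default=0)
    # Off-indication evidence counts, but never above the cap.
    other_max = min(max(other, default=0), OFF_INDICATION_PHASE_CAP)

    return max(indication_max, other_max), indication_max, other_max
```

Returning both bucket maxima alongside the combined value lets the caller build reason strings that distinguish 'this indication' from 'other indications'.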
Add tests/test_scoring.py with 8 unit tests covering:

- Approved drug for queried indication → clinical score 5
- Approved drug for other indication → capped at score 3
- No drugs, no genetics → score 0
- Strong genetic association → score 4
- Approved drug + genetics → score 5
- Phase 3 drug for indication → score 4
- get_association uses target→associatedDiseases query (not old Bs filter)
- get_association passes both ensemblId and efoId variables

Tests use hand-crafted fixture dicts mirroring the new API response shapes; no network calls required.

Add pytest>=8.0.0 to dev dependencies in pyproject.toml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three bugs in open_targets_client.py caused clinical scores to be systematically wrong. Most visibly, well-validated targets like IL17A and ADRA2A scored 0 clinical evidence even for diseases where approved drugs exist. The root cause was a combination of an invalid GraphQL argument, a broken API subquery, and a scoring function that ignored which disease a drug was actually approved for.
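The association query passed 'Bs', which is not a valid Open Targets GraphQL argument, so the primary query always returned zero rows and silently triggered a fallback that fetched 500 targets and filtered in Python on every single call. Fixed by switching query direction to target → associatedDiseases(efoIds: [...]), which uses the correct documented filter and eliminates the fallback entirely.

The drugAndClinicalCandidates query requested 'diseases { id name }', but the API returns ClinicalDiseaseListItem objects, so the correct selection is 'diseases { disease { id name } }'. The old query was rejected by the API, masking all drug data without surfacing an error to the user. In addition, stage_map only recognised legacy stage strings ('Approved', 'Phase III', etc.), so values in the current uppercase underscore format (APPROVAL, PHASE_3, PHASE_2_3, etc.) contributed a phase of 0. Both formats are now supported.

Previously max_phase was computed across all drugs for the target regardless of disease. Drug rows are now split into indication-matched (full phase value) and off-indication (capped at phase 2). This is why IL17A scored 0 for psoriasis despite secukinumab being approved for it.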