
Fix/association query and drug scoring #1

Open

dalloliogm wants to merge 6 commits into romainstuder:main from dalloliogm:fix/association-query-and-drug-scoring


@dalloliogm

Four bugs in open_targets_client.py caused clinical scores to be systematically wrong. Most visibly, well-validated targets such as IL17A and ADRA2A received a clinical evidence score of 0 even for diseases with approved
drugs. The root causes were an invalid GraphQL argument, a malformed drug subquery, stale phase-stage strings, and a scoring function that ignored which disease a drug was actually approved for.

  • get_association used a non-existent GraphQL filter (associatedTargets(Bs: $ensemblIds)); the Bs argument is not part of the Open Targets API schema. Every query silently returned zero rows, which
    triggered a fallback that fetched 500 targets and filtered them in Python on every call. Fixed by reversing the query direction to target → associatedDiseases(efoIds: [...]), which uses the documented
    filter and removes the fallback entirely.
  • The diseases subquery caused silent 400 errors. drugAndClinicalCandidates returns ClinicalDiseaseListItem objects, so the correct subquery is diseases { disease { id name } }, not diseases { id name }.
    The API rejected the old query, which hid all drug data without surfacing an error to the user.
  • stage_map used stale string values. The Open Targets API now returns APPROVAL, PHASE_3, PHASE_2_3, etc. instead of the legacy "Approved", "Phase III" format. Because no stage values matched, every drug
    contributed a phase of 0. Both formats are now supported.
  • score_clinical ignored the indication when scoring drugs. max_phase was computed globally across all of a target's drugs, regardless of whether they were approved for the queried disease; this is why
    IL17A scored 0 for psoriasis despite secukinumab being approved for it. Drugs are now split into indication-matched (full phase value) and off-indication (capped at phase 2).
  • Added tests/test_scoring.py with 8 unit tests covering the fixed logic. The tests make no network calls; they use hand-crafted fixture dicts that match the new API response shapes.

dalloliogm and others added 6 commits April 13, 2026 10:04
The query used 'associatedTargets(Bs: $ensemblIds)' where 'Bs' is not a
valid Open Targets GraphQL argument. This caused the primary query to
always return zero rows, silently triggering a fallback that fetched
500 rows and filtered in Python on every call.

Replace with target→associatedDiseases(efoIds: [...]), which uses the
correct documented filter. Remove the fallback entirely.
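A minimal sketch of the reworked request, assuming the query shape exactly as this commit describes it. The operation name, the fields selected under rows, and the helper build_association_request are illustrative rather than copied from the diff, and the efoIds argument name is taken from the PR text, not checked against the live schema.

```python
from typing import Any

# Query direction as described in this commit: start from the target and
# filter its associated diseases by EFO ID. The `efoIds` argument name comes
# from the PR text; the fields selected under `rows` are assumptions.
ASSOCIATION_QUERY = """
query Association($ensemblId: String!, $efoId: String!) {
  target(ensemblId: $ensemblId) {
    associatedDiseases(efoIds: [$efoId]) {
      rows {
        disease { id name }
        score
      }
    }
  }
}
"""


def build_association_request(ensembl_id: str, efo_id: str) -> dict[str, Any]:
    """Hypothetical helper: JSON body for one target/disease pair.

    Both IDs are sent as variables, so the server does the filtering and no
    Python-side fallback over a large page of rows is needed.
    """
    return {
        "query": ASSOCIATION_QUERY,
        "variables": {"ensemblId": ensembl_id, "efoId": efo_id},
    }
```

Because the filter is applied server-side, a single request per target/disease pair replaces the old fetch-500-and-filter path.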

Update the three call sites that parsed the old data shape
(data.disease.associatedTargets.rows) to use the new shape
(data.target.associatedDiseases.rows): score_clinical, score_pathway,
and the disease name extraction in validate().
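A sketch of reading the new response shape, as the updated call sites do. association_rows and disease_name are hypothetical helpers (the latter loosely mirrors the disease-name lookup in validate()), and the row fields disease.id, disease.name and score are assumptions.

```python
from typing import Any, Optional


def association_rows(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return association rows from the new response shape.

    New path (per this PR): data.target.associatedDiseases.rows
    Old path (removed):     data.disease.associatedTargets.rows
    Any missing or null level (e.g. "target": null) yields an empty list.
    """
    target = (response.get("data") or {}).get("target") or {}
    associations = target.get("associatedDiseases") or {}
    return associations.get("rows") or []


def disease_name(response: dict[str, Any], efo_id: str) -> Optional[str]:
    """Hypothetical lookup of the queried disease's name in the rows."""
    for row in association_rows(response):
        disease = row.get("disease") or {}
        if disease.get("id") == efo_id:
            return disease.get("name")
    return None
```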

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The drugAndClinicalCandidates query used 'diseases { id name }', but the
API returns ClinicalDiseaseListItem objects, so the request failed with a
400 error that silently masked all drug data. The correct query is
'diseases { disease { id name } }'.
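A sketch of the corrected fragment and of pulling EFO IDs out of the nested ClinicalDiseaseListItem shape. drug_disease_ids is a hypothetical helper; only the diseases { disease { id name } } nesting comes from this commit.

```python
from typing import Any

# Fragment as corrected in this commit: each entry in `diseases` is a
# ClinicalDiseaseListItem that wraps the actual disease object.
DRUG_DISEASES_FRAGMENT = "diseases { disease { id name } }"


def drug_disease_ids(drug_row: dict[str, Any]) -> set[str]:
    """Collect the EFO IDs listed for one drug row.

    Only the diseases[].disease nesting follows the PR text; the rest of
    the row structure is assumed. Entries without an ID are skipped.
    """
    ids: set[str] = set()
    for item in drug_row.get("diseases") or []:
        disease = item.get("disease") or {}
        if disease.get("id"):
            ids.add(disease["id"])
    return ids
```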

The stage_map used legacy string values ('Approved', 'Phase III', etc.),
but the API now returns an uppercase underscore format (APPROVAL, PHASE_3,
PHASE_2_3, etc.). Add entries for the new format while keeping the legacy
ones for backwards compatibility.
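A sketch of a stage_map that accepts both formats. Only APPROVAL, PHASE_3, PHASE_2_3, 'Approved' and 'Phase III' are named in this commit; the remaining keys, the numeric phase values, and mapping PHASE_2_3 to 2 are assumptions for illustration.

```python
from typing import Optional

# Numeric phase per stage string. Keys not named in the commit message, all
# numeric values, and the PHASE_2_3 choice are illustrative assumptions.
STAGE_MAP: dict[str, int] = {
    # Current API format (uppercase, underscore-separated)
    "APPROVAL": 4,
    "PHASE_3": 3,
    "PHASE_2_3": 2,  # assumption: credit a combined phase at its lower phase
    "PHASE_2": 2,
    "PHASE_1": 1,
    # Legacy format, kept for backwards compatibility
    "Approved": 4,
    "Phase III": 3,
    "Phase II": 2,
    "Phase I": 1,
}


def stage_to_phase(stage: Optional[str]) -> int:
    """Map a stage string to a phase number; unknown or missing stages give 0."""
    return STAGE_MAP.get(stage or "", 0)
```

Falling back to 0 for unknown strings reproduces the old failure mode, so if the API adds new stage values they will again score 0 until added here.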

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously max_phase was computed across all drugs for the target
regardless of disease, causing targets like IL17A to score 0 clinical
for psoriasis even though secukinumab is approved for it.

Split drug rows into two buckets: indication-matched (disease EFO ID
present in the drug's disease list) and off-indication. Use the
indication max phase as the primary signal; off-indication drugs are
capped at phase 2 so they can contribute evidence without inflating
scores to approved-drug level.

Update reason strings to clearly distinguish 'this indication' vs
'other indications' for transparency in the output.
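A sketch of the two-bucket split, assuming each drug row carries a stage string and the nested diseases list from the previous commit. The field name stage, the reduced stage table, the PHASE_TO_SCORE mapping and the reason wording are hypothetical; the mapping was chosen only so that the expectations listed in the test commit hold (approved here → 5, phase 3 here → 4, approved elsewhere → 3, nothing → 0).

```python
from typing import Any

# Reduced, assumed stage table (see the stage_map change above).
STAGE_MAP = {"APPROVAL": 4, "PHASE_3": 3, "PHASE_2": 2, "PHASE_1": 1}
OFF_INDICATION_CAP = 2  # off-indication drugs count as at most phase 2
# Hypothetical phase -> clinical score table, not taken from the diff.
PHASE_TO_SCORE = {0: 0, 1: 2, 2: 3, 3: 4, 4: 5}


def score_clinical_sketch(drug_rows: list[dict[str, Any]], efo_id: str) -> tuple[int, str]:
    """Score drugs for one target/disease pair; returns (score, reason)."""
    indication_max = 0
    other_max = 0
    for row in drug_rows:
        phase = STAGE_MAP.get(row.get("stage") or "", 0)
        disease_ids = {
            (item.get("disease") or {}).get("id")
            for item in row.get("diseases") or []
        }
        if efo_id in disease_ids:
            indication_max = max(indication_max, phase)  # full phase value
        else:
            other_max = max(other_max, phase)

    capped_other = min(other_max, OFF_INDICATION_CAP)
    if indication_max > 0 and indication_max >= capped_other:
        reason = f"max phase {indication_max} for this indication"
    elif capped_other > 0:
        reason = f"max phase {other_max} in other indications (capped at {OFF_INDICATION_CAP})"
    else:
        reason = "no clinical-stage drugs"
    return PHASE_TO_SCORE[max(indication_max, capped_other)], reason
```

This sketch takes the larger of the indication phase and the capped off-indication phase, which is one reading of "primary signal"; the actual code may weight the two buckets differently.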

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tests/test_scoring.py with 8 unit tests covering:
- Approved drug for queried indication → clinical score 5
- Approved drug for other indication → capped at score 3
- No drugs, no genetics → score 0
- Strong genetic association → score 4
- Approved drug + genetics → score 5
- Phase 3 drug for indication → score 4
- get_association uses target→associatedDiseases query (not old Bs filter)
- get_association passes both ensemblId and efoId variables

Tests use hand-crafted fixture dicts mirroring the new API response
shapes; no network calls required.
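A sketch of the get_association checks in the style of these tests: a fake transport records the outgoing payload instead of making a network call. The get_association signature here, with an injected post callable, is hypothetical; the real client may instead be patched with unittest.mock.

```python
from typing import Any, Callable

QUERY = """
query Association($ensemblId: String!, $efoId: String!) {
  target(ensemblId: $ensemblId) {
    associatedDiseases(efoIds: [$efoId]) { rows { disease { id name } score } }
  }
}
"""

Post = Callable[[dict[str, Any]], dict[str, Any]]


def get_association(ensembl_id: str, efo_id: str, post: Post) -> list[dict[str, Any]]:
    """Hypothetical client function; `post` stands in for the HTTP call."""
    payload = {"query": QUERY, "variables": {"ensemblId": ensembl_id, "efoId": efo_id}}
    data = post(payload).get("data") or {}
    target = data.get("target") or {}
    return (target.get("associatedDiseases") or {}).get("rows") or []


def test_get_association_uses_target_query_and_both_variables() -> None:
    sent: list[dict[str, Any]] = []
    fixture = {"data": {"target": {"associatedDiseases": {"rows": [
        {"disease": {"id": "EFO_0000676", "name": "psoriasis"}, "score": 0.9}
    ]}}}}

    def fake_post(payload: dict[str, Any]) -> dict[str, Any]:
        sent.append(payload)  # record the request instead of hitting the network
        return fixture

    rows = get_association("ENSG_TEST", "EFO_0000676", fake_post)

    assert len(sent) == 1
    assert "associatedDiseases(efoIds:" in sent[0]["query"]
    assert "Bs:" not in sent[0]["query"]
    assert sent[0]["variables"] == {"ensemblId": "ENSG_TEST", "efoId": "EFO_0000676"}
    assert rows[0]["disease"]["id"] == "EFO_0000676"
```

Injecting the transport keeps the test free of patching; with pytest, the test function above needs no fixtures.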

Add pytest>=8.0.0 to dev dependencies in pyproject.toml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>