Fix/association query and drug scoring by dalloliogm · Pull Request #1 · romainstuder/target-ai

dalloliogm · 2026-04-13T09:08:44Z

Three bugs in open_targets_client.py caused clinical scores to be systematically wrong. Most visibly, well-validated targets like IL17A and ADRA2A scored 0 clinical evidence even for diseases where approved drugs exist. The root cause was a combination of an invalid GraphQL argument, a broken API subquery, and a scoring function that ignored which disease a drug was actually approved for.

- get_association used a non-existent GraphQL filter (associatedTargets(Bs: $ensemblIds)): the Bs argument is not part of the Open Targets API schema. Every query returned zero rows silently, triggering a fallback that fetched 500 targets and filtered in Python on every single call. Fixed by switching query direction to target → associatedDiseases(efoIds: [...]), which uses the correct documented filter and eliminates the fallback entirely.
- diseases subquery caused silent 400 errors: drugAndClinicalCandidates returns ClinicalDiseaseListItem objects, so the correct subquery is diseases { disease { id name } }, not diseases { id name }. The old query was rejected by the API, masking all drug data without surfacing an error to the user.
- stage_map used stale string values: the Open Targets API now returns APPROVAL, PHASE_3, PHASE_2_3, etc. instead of the legacy "Approved", "Phase III" format. No stage values were matching, so all drugs contributed a phase of 0. Both formats are now supported.
- score_clinical ignored indication when scoring drugs: max_phase was computed across all drugs for a target globally, regardless of whether they were approved for the queried disease. Drugs are now split into indication-matched (full phase value) and off-indication (capped at phase 2). This is why IL17A scored 0 for psoriasis despite secukinumab being approved for it.

Added tests/test_scoring.py with 8 unit tests covering the fixed logic. The tests make no network calls and use hand-crafted fixture dicts matching the new API response shapes.

The query used 'associatedTargets(Bs: $ensemblIds)' where 'Bs' is not a valid Open Targets GraphQL argument. This caused the primary query to always return zero rows, silently triggering a fallback that fetched 500 rows and filtered in Python on every call.

Replace with target→associatedDiseases(efoIds: [...]), which uses the correct documented filter. Remove the fallback entirely.

Update the three call sites that parsed the old data shape (data.disease.associatedTargets.rows) to use the new shape (data.target.associatedDiseases.rows): score_clinical, score_pathway, and the disease name extraction in validate().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
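For readers following the change of query direction, here is a minimal sketch of what the new request and response handling might look like. The argument name efoIds and the response path data.target.associatedDiseases.rows are taken from this PR's description and have not been checked against the live schema; the function names, the selected fields, and the example IDs are hypothetical and are not the code in open_targets_client.py.

```python
from typing import Any

# Query shape as described in this PR: start from the target and filter its
# associated diseases by EFO ID. Field selection below is an assumption.
ASSOCIATION_QUERY = """
query TargetDiseaseAssociation($ensemblId: String!, $efoId: String!) {
  target(ensemblId: $ensemblId) {
    id
    associatedDiseases(efoIds: [$efoId]) {
      rows {
        disease { id name }
        score
      }
    }
  }
}
"""


def build_association_payload(ensembl_id: str, efo_id: str) -> dict[str, Any]:
    """Build the JSON body for a GraphQL POST (hypothetical helper)."""
    return {
        "query": ASSOCIATION_QUERY,
        "variables": {"ensemblId": ensembl_id, "efoId": efo_id},
    }


def extract_association_rows(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Read rows from data.target.associatedDiseases.rows.

    Returns an empty list when any level is missing or null, so callers
    do not need a fallback query.
    """
    target = (response.get("data") or {}).get("target") or {}
    return (target.get("associatedDiseases") or {}).get("rows") or []


# Illustrative IDs only (intended as IL17A and psoriasis).
payload = build_association_payload("ENSG00000112115", "EFO_0000676")
```

Because the extraction helper tolerates a null target, an unknown Ensembl ID yields an empty list rather than an exception, which is one way to keep "no association" distinct from "query failed".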

The drugAndClinicalCandidates query used 'diseases { id name }' but the API returns ClinicalDiseaseListItem objects, causing a 400 error that silently masked all drug data. Correct query is 'diseases { disease { id name } }'.

The stage_map used legacy string values ('Approved', 'Phase III', etc.) but the API now returns uppercase underscore format (APPROVAL, PHASE_3, PHASE_2_3, etc.). Add new format entries while keeping legacy ones for backwards compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
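A small sketch of the two fixes described above, under stated assumptions. Only the stage strings actually named in this PR are included (the "etc." entries are left out), the numeric phase assigned to each stage is an assumption, and the helper names and row shape are hypothetical.

```python
from typing import Any, Optional

# New-format and legacy stage strings side by side. The numeric values are
# an assumption; PHASE_2_3 is treated conservatively as phase 2 here.
STAGE_MAP: dict[str, int] = {
    "APPROVAL": 4,
    "PHASE_3": 3,
    "PHASE_2_3": 2,
    "Approved": 4,
    "Phase III": 3,
}


def stage_to_phase(stage: Optional[str]) -> int:
    """Map a stage string to a numeric phase; unknown or missing gives 0."""
    if not stage:
        return 0
    return STAGE_MAP.get(stage, 0)


def disease_ids(drug_row: dict[str, Any]) -> list[str]:
    """Collect disease IDs from the nested diseases { disease { id name } } shape."""
    ids = []
    for item in drug_row.get("diseases") or []:
        disease = item.get("disease") or {}
        if disease.get("id"):
            ids.append(disease["id"])
    return ids


# Hand-crafted row mirroring the nested shape described in this PR.
row = {
    "stage": "APPROVAL",
    "diseases": [{"disease": {"id": "EFO_0000676", "name": "psoriasis"}}],
}
```

Returning 0 for an unrecognised stage reproduces the original silent failure mode, so a real client might prefer to log unknown stage strings instead of dropping them quietly.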

Previously max_phase was computed across all drugs for the target regardless of disease, causing targets like IL17A to score 0 clinical for psoriasis even though secukinumab is approved for it.

Split drug rows into two buckets: indication-matched (disease EFO ID present in the drug's disease list) and off-indication. Use the indication max phase as the primary signal; off-indication drugs are capped at phase 2 so they can contribute evidence without inflating scores to approved-drug level.

Update reason strings to clearly distinguish 'this indication' vs 'other indications' for transparency in the output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
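The bucket split can be sketched roughly as follows. This is an illustration, not the score_clinical implementation: the simplified row shape (a numeric phase plus a flat list of disease IDs), the function name, and the tie-breaking rule (take whichever of the two signals is larger after capping) are all assumptions, and the mapping from phase to final clinical score is not shown.

```python
from typing import Any

OFF_INDICATION_CAP = 2  # cap stated in this PR for off-indication drugs


def effective_max_phase(
    drug_rows: list[dict[str, Any]], efo_id: str
) -> tuple[int, str]:
    """Return (phase, reason) after splitting drugs by indication match.

    Each row is assumed to look like {"phase": int, "disease_ids": [str, ...]}.
    """
    on_phases = [r["phase"] for r in drug_rows if efo_id in r["disease_ids"]]
    off_phases = [r["phase"] for r in drug_rows if efo_id not in r["disease_ids"]]

    on_max = max(on_phases, default=0)
    # Off-indication evidence counts, but never above the cap.
    off_max = min(max(off_phases, default=0), OFF_INDICATION_CAP)

    if on_max > 0 and on_max >= off_max:
        return on_max, f"max phase {on_max} for this indication"
    if off_max > 0:
        return off_max, f"max phase {off_max} (capped) for other indications"
    return 0, "no clinical-stage drugs"


# Illustrative fixture: an approved drug whose disease list includes the query.
rows = [{"name": "secukinumab", "phase": 4, "disease_ids": ["EFO_0000676"]}]
phase, reason = effective_max_phase(rows, "EFO_0000676")
```

Returning the reason alongside the phase keeps the "this indication" versus "other indications" distinction visible to whoever reads the score.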

Add tests/test_scoring.py with 8 unit tests covering:
- Approved drug for queried indication → clinical score 5
- Approved drug for other indication → capped at score 3
- No drugs, no genetics → score 0
- Strong genetic association → score 4
- Approved drug + genetics → score 5
- Phase 3 drug for indication → score 4
- get_association uses target→associatedDiseases query (not old Bs filter)
- get_association passes both ensemblId and efoId variables

Tests use hand-crafted fixture dicts mirroring the new API response shapes; no network calls required.

Add pytest>=8.0.0 to dev dependencies in pyproject.toml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


dalloliogm and others added 6 commits April 13, 2026 10:04

Remove accidentally committed __pycache__ files

711db39

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Python/pytest entries to .gitignore

a01fc12

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dalloliogm wants to merge 6 commits into romainstuder:main from dalloliogm:fix/association-query-and-drug-scoring
