
fix(clinical_report): only flag aact studies based on intervention intention #144

Merged
DSuveges merged 1 commit into main from il-clinical-indirect-fix on Jun 22, 2026


@ireneisdoomed (Contributor)

The problem: ClinicalReport.flag_indirect_primary_purpose was flagging all clinical reports where drug_intent is null.
That condition is correct for AACT trials, but the function is called on the full set of reports, so reports from every other source were flagged as well, because they are not covered by the LLM extraction.

The fix: the flagging condition in flag_indirect_primary_purpose is unchanged, but it is now applied only to AACT reports.
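A minimal sketch of the corrected condition, written in plain Python rather than the pipeline's actual dataframe code. The Report record, its field names (source, drug_intent, quality_controls) and the "AACT" label string are assumptions for illustration; only the logic (flag a report when its source is AACT and drug_intent is null) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels; the real code imports source names from the library.
AACT = "AACT"
INDIRECT_PRIMARY_PURPOSE = "INDIRECT_PRIMARY_PURPOSE"


@dataclass
class Report:
    """Hypothetical stand-in for one clinical report row."""

    source: str
    drug_intent: Optional[str]
    quality_controls: List[str] = field(default_factory=list)


def flag_indirect_primary_purpose(reports: List[Report]) -> List[Report]:
    """Add INDIRECT_PRIMARY_PURPOSE to AACT reports that have no drug_intent.

    The buggy version checked only `drug_intent is None`, which also matched
    every non-AACT report, because those are never covered by the LLM extraction.
    """
    for report in reports:
        if report.source == AACT and report.drug_intent is None:
            report.quality_controls.append(INDIRECT_PRIMARY_PURPOSE)
    return reports


if __name__ == "__main__":
    flagged = flag_indirect_primary_purpose(
        [
            Report("AACT", None),
            Report("AACT", "treatment"),  # hypothetical drug_intent value
            Report("DailyMed", None),
        ]
    )
    for r in flagged:
        print(r.source, r.drug_intent, r.quality_controls)
```

In this sketch the DailyMed report keeps an empty quality_controls list even though its drug_intent is missing, which is the behaviour the fix intends for every non-AACT source.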

The consequences can be seen downstream in the evidence set:

┌───────────────┬────────┬───────────┬─────────┐
│ clinicalStage ┆ count  ┆ old_count ┆ diff    │
│ ---           ┆ ---    ┆ ---       ┆ ---     │
│ str           ┆ u32    ┆ u32       ┆ i64     │
╞═══════════════╪════════╪═══════════╪═════════╡
│ PHASE_2       ┆ 298610 ┆ 180369    ┆ 118241  │
│ UNKNOWN       ┆ 93148  ┆ 45782     ┆ 47366   │
│ PHASE_1_2     ┆ 69078  ┆ 31743     ┆ 37335   │
│ PHASE_1       ┆ 101779 ┆ 72167     ┆ 29612   │
│ PHASE_3       ┆ 137848 ┆ 118380    ┆ 19468   │
│ PHASE_2_3     ┆ 21917  ┆ 14387     ┆ 7530    │
│ EARLY_PHASE_1 ┆ 11389  ┆ 6919      ┆ 4470    │
│ PHASE_4       ┆ 22899  ┆ 25639     ┆ -2740   │
│ APPROVAL      ┆ 690    ┆ 102184    ┆ -101494 │
└───────────────┴────────┴───────────┴─────────┘
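The diff column is count minus old_count, and the rows are sorted by it in descending order. A quick Python check with the values copied from the table reproduces the column and confirms that PHASE_4 and APPROVAL are the only stages with a negative difference.

```python
# (count, old_count) per clinicalStage, copied from the table above.
STAGE_COUNTS = {
    "PHASE_2": (298610, 180369),
    "UNKNOWN": (93148, 45782),
    "PHASE_1_2": (69078, 31743),
    "PHASE_1": (101779, 72167),
    "PHASE_3": (137848, 118380),
    "PHASE_2_3": (21917, 14387),
    "EARLY_PHASE_1": (11389, 6919),
    "PHASE_4": (22899, 25639),
    "APPROVAL": (690, 102184),
}

# diff = count - old_count
diffs = {stage: new - old for stage, (new, old) in STAGE_COUNTS.items()}

# Sort stages by diff, largest gain first (the order used in the table).
ordered = sorted(diffs, key=diffs.get, reverse=True)

# Stages that have fewer rows than in old_count.
negative = [stage for stage in ordered if diffs[stage] < 0]

for stage in ordered:
    print(f"{stage:<14}{diffs[stage]:>9}")
print("negative:", negative)
```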

Note: although PHASE_4 and APPROVAL are the only stages with a negative difference, this bug affects all stages: we are effectively building indications/evidence only from AACT trials that pass the criteria. Once the fix is in place, we should see more evidence across the board.

I will add exact numbers once I have run it locally.

Changelog

  • Fix in flag_indirect_primary_purpose
  • Added test_flag_indirect_primary_purpose_non_aact_not_flagged
  • Small fix: clinical source names are no longer hardcoded; they are imported from the library instead
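A sketch of what the new regression test is meant to check, written as a self-contained pytest-style function that reuses the test name from the changelog. The ClinicalSource enum only stands in for the source names that are now imported from the library, and the helper is_indirect_primary_purpose is hypothetical; neither is taken from the actual codebase.

```python
from enum import Enum
from typing import Optional


class ClinicalSource(str, Enum):
    """Hypothetical stand-in for the library's clinical source names."""

    AACT = "AACT"
    DAILYMED = "DailyMed"
    TTD = "TTD"
    EMA = "EMA"


def is_indirect_primary_purpose(source: str, drug_intent: Optional[str]) -> bool:
    """Hypothetical helper: flag only AACT reports with a missing drug_intent."""
    # Compare against the imported constant instead of a hardcoded "AACT" string.
    return source == ClinicalSource.AACT.value and drug_intent is None


def test_flag_indirect_primary_purpose_non_aact_not_flagged() -> None:
    # Non-AACT sources are not covered by the LLM extraction, so their
    # drug_intent is null; they must still not be flagged.
    for source in ClinicalSource:
        if source is not ClinicalSource.AACT:
            assert not is_indirect_primary_purpose(source.value, None)

    # AACT reports with a null drug_intent keep the flag.
    assert is_indirect_primary_purpose(ClinicalSource.AACT.value, None)
    # AACT reports with an extracted intent (hypothetical value) are not flagged.
    assert not is_indirect_primary_purpose(ClinicalSource.AACT.value, "treatment")


if __name__ == "__main__":
    test_flag_indirect_primary_purpose_non_aact_not_flagged()
    print("ok")
```

Iterating over the enum means a newly added source is covered by the test automatically, which is one benefit of importing the names rather than hardcoding them.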

@ireneisdoomed (Contributor, Author)

The fix works. We recover all the evidence from the non-AACT sources: existing associations gain a lot of precedence, and about 5k associations that were previously missing are added.

Clinical Report Metrics after fix

| qualityControls (exploded) | After (fixed) | Before (dev2) |
| --- | ---: | ---: |
| null | 185,904 | 130,020 |
| INDIRECT_PRIMARY_PURPOSE | 54,688 | 113,653 |
| UNVALIDATED_INDICATION | 44,449 | 44,478 |
| NO_DISEASE | 21,986 | 21,986 |
| PHASE_IV_NOT_APPROVED | 13,707 | 13,671 |

| clinicalStage | After (fixed) | Before (dev2) |
| --- | ---: | ---: |
| APPROVAL | 697 | 690 |
| EARLY_PHASE_1 | 11,532 | 11,389 |
| PHASE_1 | 104,390 | 101,779 |
| PHASE_1_2 | 71,065 | 69,078 |
| PHASE_2 | 306,013 | 298,610 |
| PHASE_2_3 | 22,636 | 21,917 |
| PHASE_3 | 141,923 | 137,848 |
| PHASE_4 | 23,407 | 22,899 |
| UNKNOWN | 94,466 | 93,148 |
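To make the before/after comparison of the exploded qualityControls table explicit, the per-value change can be computed directly from the numbers above. Almost all of the movement is a drop in INDIRECT_PRIMARY_PURPOSE and a rise in null rows, while the other flags barely change.

```python
# (after_fixed, before_dev2) per exploded qualityControls value,
# copied from the table above; None stands for the null row.
QC_COUNTS = {
    None: (185904, 130020),
    "INDIRECT_PRIMARY_PURPOSE": (54688, 113653),
    "UNVALIDATED_INDICATION": (44449, 44478),
    "NO_DISEASE": (21986, 21986),
    "PHASE_IV_NOT_APPROVED": (13707, 13671),
}

# A positive delta means the fixed run has more rows with that value.
deltas = {flag: after - before for flag, (after, before) in QC_COUNTS.items()}

for flag, delta in deltas.items():
    print(f"{str(flag):<26}{delta:>+8,}")
```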

Evidence Metrics after fix

| Metric | After (fixed) | Old (dev2, pre-fix) |
| --- | ---: | ---: |
| Total rows | 891,674 | 776,129 |
| Unique clinicalReportId | 116,441 | 93,812 |
| Unique diseaseFromSourceMappedId | 3,230 | 3,035 |
| Unique targetFromSourceId | 1,516 | 1,489 |
| Unique drugId | 4,457 | 3,732 |
| Unique (disease, target) pairs | 107,589 | 102,160 |

| source | After (fixed) | Before (dev2) |
| --- | ---: | ---: |
| AACT | 775,941 | 776,129 |
| DailyMed | 80,685 | 0 |
| TTD | 15,114 | 0 |
| EMA Human Drugs | 7,413 | 0 |
| ATC | 4,896 | 0 |
| PMDA | 2,866 | 0 |
| EMA | 2,114 | 0 |
| FDA | 1,476 | 0 |
| USAN | 1,080 | 0 |
| INN | 89 | 0 |
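The per-source breakdown is consistent with the totals: before the fix every evidence row came from AACT, and after the fix the non-AACT sources account for the extra rows. A short check using the numbers from the tables above, including the change in unique (disease, target) pairs behind the "5k new associations" figure:

```python
# Evidence rows per source after the fix, copied from the table above.
AFTER_BY_SOURCE = {
    "AACT": 775941,
    "DailyMed": 80685,
    "TTD": 15114,
    "EMA Human Drugs": 7413,
    "ATC": 4896,
    "PMDA": 2866,
    "EMA": 2114,
    "FDA": 1476,
    "USAN": 1080,
    "INN": 89,
}
TOTAL_AFTER = 891674
TOTAL_BEFORE = 776129  # all rows were AACT before the fix
PAIRS_AFTER, PAIRS_BEFORE = 107589, 102160

non_aact = sum(n for source, n in AFTER_BY_SOURCE.items() if source != "AACT")
new_pairs = PAIRS_AFTER - PAIRS_BEFORE

# The per-source counts add up to the reported total.
print("sum matches total:", sum(AFTER_BY_SOURCE.values()) == TOTAL_AFTER)
print("non-AACT rows:", non_aact)
print("AACT change:", AFTER_BY_SOURCE["AACT"] - TOTAL_BEFORE)
print("new (disease, target) pairs:", new_pairs)
```

The AACT count itself moves by only -188 rows, so the gain comes almost entirely from the previously flagged non-AACT sources.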

@ireneisdoomed requested a review from DSuveges on June 22, 2026 08:57

@DSuveges (Contributor) left a comment

All changes make perfect sense. I haven't seen any dealbreakers.

@DSuveges force-pushed the il-clinical-indirect-fix branch from 95bfe54 to b680a82 on June 22, 2026 10:10
@DSuveges merged commit c689893 into main on Jun 22, 2026
2 checks passed
@DSuveges deleted the il-clinical-indirect-fix branch on June 22, 2026 10:30