fix(clickhouse): drug synonyms/tradeNames are {label, source} struct arrays #130
Merged
…arrays drug_molecule's `synonyms` and `tradeNames` changed from array<string> to array<struct<label, source>> (opentargets/pts#142, tracking opentargets/issues#4414). The drug_log ClickHouse table ingests output/drug_molecule directly, so its column DDL must match the parquet — loading the struct arrays into Array(String) columns would fail. Update both to Array(Tuple(label String, source String)), mirroring the existing crossReferences tuple. The final drug table (postload SELECT *) inherits the new types, so no other change is needed.
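The column change described above can be sketched as follows. This is a minimal illustration, not the contents of the real `drug.sql`: the table name `drug_log_sketch`, the `id` column, the `Log` engine, and the inserted values are all assumptions made so the snippet is self-contained. Only the two `Array(Tuple(label String, source String))` column types come from the description.

```sql
-- Hypothetical, trimmed-down stand-in for the drug_log table.
-- Table name, id column, and engine are assumptions for illustration.
CREATE TABLE IF NOT EXISTS drug_log_sketch
(
    id         String,
    -- Previously Array(String); now one named tuple per name.
    synonyms   Array(Tuple(label String, source String)),
    tradeNames Array(Tuple(label String, source String))
)
ENGINE = Log;

-- Placeholder row; the values are made up, not real data.
INSERT INTO drug_log_sketch VALUES
    ('EXAMPLE_ID',
     [('example synonym', 'ChEMBL')],
     [('example trade name', 'AACT')]);

-- Recover plain label strings from the tuple array when only names are needed.
SELECT
    id,
    arrayMap(s -> tupleElement(s, 'label'), synonyms) AS synonymLabels
FROM drug_log_sketch;
```

Naming the tuple elements keeps each field addressable by name rather than by position, which is the same approach the description attributes to the existing `crossReferences` column.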
remo87 pushed a commit to opentargets/platform-api that referenced this pull request on Jun 17, 2026
drug_molecule's `synonyms` and `tradeNames` changed from array<string> to array<struct<label, source>> (opentargets/pts#142, tracking opentargets/issues#4414), and the POS ClickHouse drug table now stores Array(Tuple(label, source)) (opentargets/pos#130). Read both as Seq[LabelAndSource], the existing type that Target already uses. The ClickHouse JSON read path parses the named tuple as a {label, source} object, the same way crossReferences (Tuple(source, ids)) already works. The Drug GraphQL type is derived, so the field type auto-updates to [LabelAndSource]; the synonyms/tradeNames field docs are updated to describe the provenance.
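To see the shape the JSON read path relies on, a query along these lines may help. It is a sketch under stated assumptions: the literal value is a made-up placeholder, the query does not touch the real drug table, and whether the API sets `output_format_json_named_tuples_as_objects` explicitly or relies on the server default is not stated in the commit message.

```sql
-- Build a single placeholder value with the new column type and emit it as JSON.
-- The label/source values are invented for illustration only.
SELECT
    CAST(
        [('example synonym', 'ChEMBL')],
        'Array(Tuple(label String, source String))'
    ) AS synonyms
-- With this setting enabled, named tuples are written as JSON objects
-- keyed by element name instead of positional arrays.
SETTINGS output_format_json_named_tuples_as_objects = 1
FORMAT JSONEachRow;
```

With the setting enabled, each array element should come back as an object with `label` and `source` keys, which is the form a `LabelAndSource` reader would expect.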
Summary
`drug_molecule`'s `synonyms` and `tradeNames` fields changed from `array<string>` to `array<struct<label, source>>` (opentargets/pts#142, tracking issue opentargets/issues#4414): each name now carries provenance (ChEMBL, or `AACT` for names mined from clinical trials).

POS ingests `output/drug_molecule` directly into the `drug_log` ClickHouse table via `config/clickhouse/schema/drug.sql`, so the column DDL must match the parquet. Loading the new struct arrays into `Array(String)` columns would fail.

What changed
`config/clickhouse/schema/drug.sql`: `synonyms` and `tradeNames` change from `Array(String)` to `Array(Tuple(label String, source String))`, mirroring the existing `crossReferences` tuple. The final `drug` table (`scripts/drug.sql`, `SELECT *`) inherits the new types, so nothing else changes.

Scope (verified)
This is the only POS consequence:
- No `drug_molecule` index; the drug dataset goes only to ClickHouse.
- `search_drug` is built from the flattened search dataset (`.label` strings, schema unchanged); `indication` doesn't carry these fields.
- No `src/` Python or other ClickHouse script references drug `synonyms`/`tradeNames`.