
Automate UMLS data download and artifact build pipeline #61

@AlexMikhalev

Summary

Add a script that authenticates with the UMLS API, downloads the release files, filters them to the relevant subsets, and rebuilds the automata artifacts.

Details

  • Replaces manual download process
  • Complements or supersedes #35 (feat: Python backup script to download SNOMED CT without UMLS API key)
  • UMLS API supports programmatic access with API key authentication
  • Key files to download: MRCONSO.RRF, MRREL.RRF, MRSTY.RRF
  • Filter to relevant SABs (SNOMEDCT_US, RXNORM, NCI, NDFRT, ICD11)
  • Rebuild umls_automata.bin.zst artifact after download
  • Should be runnable in CI for periodic updates
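The filtering step above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation. It assumes the standard pipe-delimited RRF layout, with the SAB field at zero-based index 11 in MRCONSO.RRF and index 10 in MRREL.RRF; those positions should be checked against the MRCOLS.RRF and MRFILES.RRF shipped with the release. MRSTY.RRF has no SAB column, so the sketch filters it by the CUIs that survive the MRCONSO filter. The names `RELEVANT_SABS`, `SAB_COLUMN`, `filter_by_sab`, `cuis_of`, and `filter_by_cui` are hypothetical.

```python
from typing import Iterable, Iterator, Set

# Source vocabularies listed in this issue.
RELEVANT_SABS = frozenset({"SNOMEDCT_US", "RXNORM", "NCI", "NDFRT", "ICD11"})

# Assumed zero-based position of the SAB field in each file
# (verify against MRCOLS.RRF / MRFILES.RRF in the actual release).
SAB_COLUMN = {"MRCONSO.RRF": 11, "MRREL.RRF": 10}


def filter_by_sab(
    lines: Iterable[str], sab_index: int, sabs: frozenset = RELEVANT_SABS
) -> Iterator[str]:
    """Yield only the rows whose SAB field is one of `sabs`."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > sab_index and fields[sab_index] in sabs:
            yield line


def cuis_of(lines: Iterable[str]) -> Set[str]:
    """Collect the CUI (first field) of every row."""
    return {line.split("|", 1)[0] for line in lines}


def filter_by_cui(lines: Iterable[str], cuis: Set[str]) -> Iterator[str]:
    """Yield rows whose first field is a CUI in `cuis` (used for MRSTY.RRF)."""
    for line in lines:
        if line.split("|", 1)[0] in cuis:
            yield line
```

Because the functions work on any iterable of lines, the multi-gigabyte MRCONSO.RRF can be streamed from an open file handle rather than loaded into memory, which matters for a CI runner with limited RAM.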

Acceptance Criteria

  • Script authenticates with UMLS API using stored credentials
  • Downloads and extracts required RRF files
  • Filters to relevant vocabularies
  • Rebuilds automata artifact
  • Can run headless in CI environment
  • Documented in README
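The first two criteria could look something like the sketch below. It is a hedged outline rather than a tested integration: it assumes the UTS download endpoint at `https://uts-ws.nlm.nih.gov/download`, which takes the release file URL and an API key as query parameters (confirm against the current UTS documentation), and it reads the key from a hypothetical `UMLS_API_KEY` environment variable so it can run headless in CI. The function names are illustrative only.

```python
import os
import shutil
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path

# Assumption: NLM's UTS download endpoint; verify before relying on it.
UTS_DOWNLOAD_ENDPOINT = "https://uts-ws.nlm.nih.gov/download"
REQUIRED_FILES = ("MRCONSO.RRF", "MRREL.RRF", "MRSTY.RRF")


def build_download_url(release_url: str, api_key: str) -> str:
    """Build the authenticated URL for one release file."""
    query = urllib.parse.urlencode({"url": release_url, "apiKey": api_key})
    return f"{UTS_DOWNLOAD_ENDPOINT}?{query}"


def download_release(release_url: str, dest: str, env_var: str = "UMLS_API_KEY") -> str:
    """Stream the release archive to `dest`, reading the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; store it as a CI secret")
    url = build_download_url(release_url, api_key)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def extract_required(zip_path, out_dir, wanted=REQUIRED_FILES) -> dict:
    """Extract only the wanted RRF files, flattening any directory prefix."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    found = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = Path(info.filename).name
            if info.is_dir() or name not in wanted:
                continue
            target = out / name
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            found[name] = target
    missing = set(wanted) - set(found)
    if missing:
        raise FileNotFoundError(f"archive is missing: {sorted(missing)}")
    return found
```

Failing fast on a missing key or a missing RRF file keeps a scheduled CI job from silently rebuilding the artifact from stale or partial data.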

See also: #35

Priority: P3

Labels: enhancement
