Populate accession_id for 4DN files and collections
Depends on #36 (adds the accession_id field, GraphQL wiring, and index).
Description
During the 4DN sync/enrichment pipeline, populate the accession_id field (added in #36) for 4DN files and collections by extracting the accession from each entity's persistent_id.
Motivation
4DN local_id values are opaque UUIDs; the accession that users recognize lives only inside the persistent_id URL. Verified against the live DB (sampled, 100% consistent):
- file.persistent_id → https://data.4dnucleome.org/4DNFI6IJS617 (file accession 4DNF…)
- collection.persistent_id → https://data.4dnucleome.org/4DNEXGXBT684 (experiment accession 4DNE…); the same accession is also already present verbatim in collection.abbreviation.
Extraction helpers already exist and can be reused: extract_accession() (4DNF[A-Z0-9]+) and extract_experiment_accession() (4DNE[A-Z][A-Z0-9]+) in src/cfdb/services/fourdn.py.
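As a rough illustration of what the two patterns match, the sketch below applies them to the sample URLs above. The real helpers in src/cfdb/services/fourdn.py are not reproduced here: the function names, signatures, and None-on-miss behaviour in this sketch are assumptions, and only the two regular expressions come from this issue.

```python
import re
from typing import Optional

# Patterns quoted in this issue; everything else below is a stand-in.
FILE_ACCESSION_RE = re.compile(r"4DNF[A-Z0-9]+")
EXPERIMENT_ACCESSION_RE = re.compile(r"4DNE[A-Z][A-Z0-9]+")


def sketch_extract_file_accession(persistent_id: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in: return the 4DNF accession in the URL, else None."""
    if not persistent_id:
        return None
    match = FILE_ACCESSION_RE.search(persistent_id)
    return match.group(0) if match else None


def sketch_extract_experiment_accession(persistent_id: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in: return the 4DNE accession in the URL, else None."""
    if not persistent_id:
        return None
    match = EXPERIMENT_ACCESSION_RE.search(persistent_id)
    return match.group(0) if match else None


file_acc = sketch_extract_file_accession("https://data.4dnucleome.org/4DNFI6IJS617")
exp_acc = sketch_extract_experiment_accession("https://data.4dnucleome.org/4DNEXGXBT684")
print(file_acc, exp_acc)
```

Returning None for a missing or unparseable persistent_id keeps the caller's null-handling simple; whether the existing helpers behave the same way should be checked before reuse.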
Expected Outcome
- Every 4DN file document has accession_id set to its 4DNF… accession, parsed from persistent_id via extract_accession().
- Every 4DN collection has accession_id set to its 4DNE… accession (from extract_experiment_accession() / persistent_id; abbreviation is an available cross-check).
- A 4DN file/collection with no parseable accession leaves accession_id null and is logged, rather than failing the sync.
- After a sync, querying 4DN files/collections by accession_id returns the expected entities.
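The null-and-log behaviour described above could look roughly like the following. This is a minimal sketch, not the pipeline's actual code: set_accession_id, the logger name, the document shape (plain dicts with local_id and persistent_id keys), and the stand-in extractor are all assumptions made for illustration. In the real change, the existing helpers would be passed in instead.

```python
import logging
import re
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("fourdn_sync_sketch")  # hypothetical logger name


def set_accession_id(
    doc: Dict[str, Any],
    extract: Callable[[str], Optional[str]],
) -> Dict[str, Any]:
    """Set doc['accession_id'] from persistent_id; log and leave None on a miss."""
    persistent_id = doc.get("persistent_id") or ""
    accession = extract(persistent_id)
    doc["accession_id"] = accession
    if accession is None:
        # Log instead of raising so one bad entity does not fail the sync.
        logger.warning(
            "No accession parsed for local_id=%s persistent_id=%r",
            doc.get("local_id"),
            persistent_id,
        )
    return doc


def _stand_in_extract(persistent_id: str) -> Optional[str]:
    """Stand-in for the existing file helper, using the 4DNF pattern from this issue."""
    match = re.search(r"4DNF[A-Z0-9]+", persistent_id)
    return match.group(0) if match else None


docs = [
    {"local_id": "uuid-1", "persistent_id": "https://data.4dnucleome.org/4DNFI6IJS617"},
    {"local_id": "uuid-2", "persistent_id": None},
]
results = [set_accession_id(d, _stand_in_extract) for d in docs]
```

Passing the extractor as an argument lets the same function serve files and collections; for collections, a comparison against abbreviation could be added as the cross-check mentioned above.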