Populate accession_id for ENCODE files and collections
Depends on #36 (adds the accession_id field, GraphQL wiring, and index).
Description
Populate the accession_id field (added in #36) for ENCODE files and collections during ENCODE ingestion. For ENCODE the accession is already the local_id, so this is a direct copy at materialization time.
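The direct copy described above could look roughly like the following. This is a minimal sketch, not the actual code in src/cfdb/services/encode.py: the helper name `with_accession_id` and the plain-dict document shape are assumptions made for illustration.

```python
from typing import Any


def with_accession_id(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an ENCODE document with accession_id copied from local_id.

    Hypothetical helper: the real materialization code may build documents
    differently. Only the local_id -> accession_id copy is taken from the issue.
    """
    local_id = doc.get("local_id")
    if not local_id:
        # Fail loudly rather than writing an empty accession.
        raise ValueError("ENCODE document is missing local_id")
    # Copy instead of mutating so the caller's input stays untouched.
    return {**doc, "accession_id": local_id}


# The two example accessions quoted in this issue.
file_doc = with_accession_id({"local_id": "ENCFF951PII"})
collection_doc = with_accession_id({"local_id": "ENCSR282IXW"})
```

Returning a new dict keeps the step free of side effects, which makes it easy to apply to both files and collections at materialization time.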
Motivation
Verified against the live DB and confirmed in src/cfdb/services/encode.py (sampled, 100% consistent):
- file.local_id is the ENCFF file accession (e.g. ENCFF951PII), set from the File accession TSV column.
- collection.local_id is the ENCSR experiment accession (e.g. ENCSR282IXW), set from the Experiment accession column.
- persistent_id for both is already a derived URL built from the accession.
ENCODE is already queryable by accession via local_id, but populating accession_id makes ENCODE consistent with the cross-DCC field added in #36, so a single accession_id query input works uniformly across DCCs.
Expected Outcome
- ENCODE file documents have accession_id set to the ENCFF accession (copy of local_id / File accession).
- ENCODE collections have accession_id set to the ENCSR accession (copy of local_id / Experiment accession).
- After ingestion, querying ENCODE files/collections by accession_id returns the expected entities.
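One way to spot-check these outcomes after ingestion is sketched below. It assumes documents are available as plain dicts with local_id and accession_id keys; the function name `find_mismatches` and the `ENCODE_PREFIX` table are hypothetical, and only the ENCFF / ENCSR prefixes come from this issue.

```python
from typing import Any, Iterable

# Expected accession prefix per entity kind, as stated in this issue.
ENCODE_PREFIX = {"file": "ENCFF", "collection": "ENCSR"}


def find_mismatches(kind: str, docs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return documents whose accession_id is missing, differs from local_id,
    or does not start with the expected ENCODE prefix for this kind."""
    prefix = ENCODE_PREFIX[kind]
    bad = []
    for doc in docs:
        accession = doc.get("accession_id")
        if (
            accession is None
            or accession != doc.get("local_id")
            or not accession.startswith(prefix)
        ):
            bad.append(doc)
    return bad


# Illustrative sample: one correctly populated file, one with accession_id unset.
sample_files = [
    {"local_id": "ENCFF951PII", "accession_id": "ENCFF951PII"},
    {"local_id": "ENCFF951PII", "accession_id": None},
]
mismatched = find_mismatches("file", sample_files)
```

An empty result for both kinds would indicate that every sampled ENCODE document carries the copied accession.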