Add accession_id field to FileMetadataModel and Collection
Description
Add an optional accession_id string field to both FileMetadataModel and Collection in src/cfdb/models.py, and wire it through the GraphQL query layer so users can filter files and collections by accession directly. This issue covers the schema, query plumbing, and indexing only — per-DCC population is handled in follow-up issues (4DN, ENCODE; HuBMAP deferred).
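A minimal sketch of the model change, written with stdlib dataclasses so it runs without dependencies. The real classes in src/cfdb/models.py may use a different base class and carry many more fields; only local_id, persistent_id, abbreviation, collections, and accession_id are taken from this issue, and the field types and defaults shown here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    # Existing fields mentioned in this issue; the real model has more.
    local_id: str
    abbreviation: Optional[str] = None
    # New: optional and nullable, left as None until a per-DCC follow-up fills it.
    accession_id: Optional[str] = None


@dataclass
class FileMetadataModel:
    # Existing fields mentioned in this issue; the real model has more.
    local_id: str
    persistent_id: Optional[str] = None
    collections: List[Collection] = field(default_factory=list)
    # New: same name and semantics as on Collection.
    accession_id: Optional[str] = None
```

Because the field defaults to None, existing documents and loaders that never set it should keep working unchanged.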
Motivation
DCC users identify files and experiments by their accession IDs, but those IDs are not uniformly queryable today:
- 4DN stores opaque UUIDs in local_id (e.g. 5c463b50-ac5b-461e-9ea0-3b79da124fc4). The human-facing accession (4DNF… for files, 4DNE… for experiments) only appears embedded in the persistent_id URL (and, for collections, in abbreviation). A user who knows 4DNFI6IJS617 cannot query for it without reconstructing the full persistent_id URL.
- ENCODE already stores the accession as local_id (ENCFF… files, ENCSR… experiments), so it is queryable but under a different field than 4DN.
A single, consistently named accession_id field across DCCs gives users one query input that works everywhere, decoupled from URL construction and from each DCC's local_id convention.
Expected Outcome
- FileMetadataModel and Collection expose an optional, nullable accession_id field.
- The GraphQL schema accepts accession_id as a queryable input on both files and the collection sub-selection, following the existing OR/AND clause conventions, and exposes it as an output field.
- A MongoDB index exists on accession_id (and the nested collections.accession_id) on the denormalized files collection (~3.5M docs) so accession lookups are not collection scans.
- The field is nullable and unpopulated by default; populating it per-DCC is out of scope here.
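The query and index outcomes above could be sketched roughly as follows. The function names are hypothetical, and the clause convention used here (values within one input are OR'ed, separate inputs are AND'ed) is an assumption standing in for the project's existing OR/AND conventions, which this issue does not spell out. The index helper only assumes an object exposing pymongo's `create_index` method.

```python
from typing import Any, Dict, List, Optional


def build_accession_filter(
    file_accessions: Optional[List[str]] = None,
    collection_accessions: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a MongoDB filter for the denormalized files collection.

    Assumed convention: values within one input are OR'ed ($in),
    and the file and collection inputs are AND'ed together.
    """
    clauses: List[Dict[str, Any]] = []
    if file_accessions:
        clauses.append({"accession_id": {"$in": file_accessions}})
    if collection_accessions:
        # Dotted path matches any element of the nested collections array.
        clauses.append({"collections.accession_id": {"$in": collection_accessions}})
    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


def ensure_accession_indexes(files_collection: Any) -> List[str]:
    """Create both accession indexes and return the index names.

    `files_collection` is expected to behave like a pymongo Collection.
    """
    return [
        files_collection.create_index("accession_id"),
        files_collection.create_index("collections.accession_id"),
    ]
```

Since collections is an array on each file document, MongoDB builds the second index as a multikey index. Whether a sparse or partial index is a better fit while the field is mostly unpopulated is left open here.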