
Add accession_id field to FileMetadataModel and Collection #36

Description

@conradbzura


Add an optional accession_id string field to both FileMetadataModel and Collection in src/cfdb/models.py, and wire it through the GraphQL query layer so users can filter files and collections by accession directly. This issue covers the schema, query plumbing, and indexing only — per-DCC population is handled in follow-up issues (4DN, ENCODE; HuBMAP deferred).
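A minimal sketch of the model change described above. The real definitions in src/cfdb/models.py are not shown in this issue, so standard-library dataclasses stand in for whatever base class the project uses, and every field other than accession_id is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    # Stand-in for the real Collection model; only local_id is shown.
    local_id: str
    # New field: optional and nullable, left unpopulated by default.
    accession_id: Optional[str] = None


@dataclass
class FileMetadataModel:
    # Stand-in for the real FileMetadataModel.
    local_id: str
    # New field: optional and nullable, left unpopulated by default.
    accession_id: Optional[str] = None
    # Nested collections, assumed from the collections.accession_id path.
    collections: List[Collection] = field(default_factory=list)


# Existing records that carry no accession still construct unchanged.
legacy_file = FileMetadataModel(local_id="5c463b50-ac5b-461e-9ea0-3b79da124fc4")
```

Defaulting the field to None keeps the change backward compatible: documents ingested before per-DCC population lands still load, and simply report no accession.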

Motivation

DCC users identify files and experiments by their accession IDs, but those IDs are not uniformly queryable today:

  • 4DN stores opaque UUIDs in local_id (e.g. 5c463b50-ac5b-461e-9ea0-3b79da124fc4). The human-facing accession (4DNF… for files, 4DNE… for experiments) only appears embedded in the persistent_id URL (and, for collections, in abbreviation). A user who knows 4DNFI6IJS617 cannot query for it without reconstructing the full persistent_id URL.
  • ENCODE already stores the accession as local_id (ENCFF… files, ENCSR… experiments), so it is queryable but under a different field than 4DN.

A single, consistently-named accession_id field across DCCs gives users one query input that works everywhere, decoupled from URL construction and from each DCC's local_id convention.

Expected Outcome

  • FileMetadataModel and Collection expose an optional, nullable accession_id field.
  • The GraphQL schema accepts accession_id as a queryable input on both files and the collection sub-selection, following the existing OR/AND clause conventions, and exposes it as an output field.
  • MongoDB indexes exist on accession_id and on the nested collections.accession_id in the denormalized files collection (~3.5M docs), so accession lookups do not fall back to collection scans.
  • The field is nullable and unpopulated by default; populating it per-DCC is out of scope here.
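The indexing and filtering outcomes above could be sketched roughly as follows. This is an illustration under assumptions, not the project's query layer: the helper names ensure_accession_indexes and accession_filter are hypothetical, the existing OR/AND clause conventions are not shown in this issue (here, values within one input are OR-ed via $in and the two inputs are AND-ed), and files is assumed to be a pymongo-style collection exposing create_index.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

ASCENDING = 1  # same value as pymongo.ASCENDING

# One single-field index per queryable path on the denormalized files collection.
ACCESSION_INDEX_KEYS: List[List[Tuple[str, int]]] = [
    [("accession_id", ASCENDING)],
    [("collections.accession_id", ASCENDING)],
]


def ensure_accession_indexes(files: Any) -> List[str]:
    """Create both accession indexes and return the index names.

    `files` is assumed to behave like a pymongo Collection.
    """
    return [files.create_index(keys) for keys in ACCESSION_INDEX_KEYS]


def accession_filter(
    file_accessions: Optional[Sequence[str]] = None,
    collection_accessions: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build a MongoDB filter document from optional accession inputs.

    Assumed semantics: values inside one input are alternatives ($in),
    and the file-level and collection-level inputs must both match ($and).
    """
    clauses: List[Dict[str, Any]] = []
    if file_accessions:
        clauses.append({"accession_id": {"$in": list(file_accessions)}})
    if collection_accessions:
        clauses.append(
            {"collections.accession_id": {"$in": list(collection_accessions)}}
        )
    if not clauses:
        return {}  # no accession input: do not constrain the query
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

Because collections is an array of sub-documents, MongoDB treats the index on collections.accession_id as a multikey index, so a file matches when any of its nested collections carries the requested accession.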

Metadata

Labels

enhancement (New feature or request)
