Populate accession_id for ENCODE files and collections #38

@conradbzura

Depends on #36 (adds the accession_id field, GraphQL wiring, and index).

Description

Populate the accession_id field (added in #36) for ENCODE files and collections during ENCODE ingestion. For ENCODE the accession is already the local_id, so this is a direct copy at materialization time.
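The direct copy described above could look roughly like the following. This is a minimal sketch, not the code in src/cfdb/services/encode.py: the helper name `with_accession_id` and the plain-dict document shape are assumptions made for illustration.

```python
from typing import Any


def with_accession_id(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an ENCODE document with accession_id mirrored from local_id.

    Hypothetical helper: the real materialization step may build documents
    differently. Only the local_id -> accession_id copy comes from the issue.
    """
    local_id = doc.get("local_id")
    if not local_id:
        # For ENCODE the accession is the local_id, so a missing local_id
        # means there is nothing to copy.
        raise ValueError("ENCODE document is missing local_id")
    # Build a new dict rather than mutating the input.
    return {**doc, "accession_id": local_id}


# Example accessions taken from the issue text.
encode_file = with_accession_id({"local_id": "ENCFF951PII"})
encode_collection = with_accession_id({"local_id": "ENCSR282IXW"})
```

Returning a new dict keeps the copy free of side effects; an in-place assignment would work equally well if the ingestion code already owns the document.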

Motivation

The following was verified against a sample of the live DB (100% consistent) and confirmed in src/cfdb/services/encode.py:

  • file.local_id is the ENCFF file accession (e.g. ENCFF951PII), set from the File accession TSV column.
  • collection.local_id is the ENCSR experiment accession (e.g. ENCSR282IXW), set from the Experiment accession TSV column.
  • persistent_id for both is already a derived URL built from the accession.

ENCODE is already queryable by accession via local_id, but populating accession_id makes ENCODE consistent with the cross-DCC field added in #36, so a single accession_id query input works uniformly across DCCs.

Expected Outcome

  • ENCODE file documents have accession_id set to the ENCFF accession (copy of local_id / File accession).
  • ENCODE collections have accession_id set to the ENCSR accession (copy of local_id / Experiment accession).
  • After ingestion, querying ENCODE files/collections by accession_id returns the expected entities.
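The outcomes above could be checked after ingestion with something like the sketch below. It runs against in-memory dicts rather than the real database or GraphQL API; `find_by_accession_id`, `check_accession_ids`, and the document shape are hypothetical names for illustration, and only the ENCFF/ENCSR prefixes and the local_id copy come from this issue.

```python
from typing import Any, Iterable

# Accession prefixes stated in the issue: ENCFF for files, ENCSR for collections.
EXPECTED_PREFIX = {"file": "ENCFF", "collection": "ENCSR"}


def find_by_accession_id(
    docs: Iterable[dict[str, Any]], accession_id: str
) -> list[dict[str, Any]]:
    """In-memory stand-in for a query filtered on accession_id."""
    return [d for d in docs if d.get("accession_id") == accession_id]


def check_accession_ids(
    docs: Iterable[dict[str, Any]], entity_type: str
) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    prefix = EXPECTED_PREFIX[entity_type]
    problems = []
    for doc in docs:
        local_id = doc.get("local_id")
        accession_id = doc.get("accession_id")
        if accession_id is None or accession_id != local_id:
            problems.append(f"{local_id!r}: accession_id is not a copy of local_id")
        elif not accession_id.startswith(prefix):
            problems.append(f"{local_id!r}: expected prefix {prefix}")
    return problems


files = [{"local_id": "ENCFF951PII", "accession_id": "ENCFF951PII"}]
collections = [{"local_id": "ENCSR282IXW", "accession_id": "ENCSR282IXW"}]

file_problems = check_accession_ids(files, "file")
collection_problems = check_accession_ids(collections, "collection")
matches = find_by_accession_id(files, "ENCFF951PII")
```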

Labels: enhancement (New feature or request)
