Populate accession_id for 4DN files and collections #37

@conradbzura

Depends on #36 (adds the accession_id field, GraphQL wiring, and index).

Description

During the 4DN sync/enrichment pipeline, populate the accession_id field (added in #36) for 4DN files and collections by extracting the accession from each entity's persistent_id.

Motivation

4DN local_id values are opaque UUIDs; the accession that users recognize lives only inside the persistent_id URL. Verified against the live DB (sampled, 100% consistent):

  • file.persistent_id: https://data.4dnucleome.org/4DNFI6IJS617 (file accession 4DNF…)
  • collection.persistent_id: https://data.4dnucleome.org/4DNEXGXBT684 (experiment accession 4DNE…); the same accession is also already present verbatim in collection.abbreviation.

Extraction helpers already exist and can be reused: extract_accession() (4DNF[A-Z0-9]+) and extract_experiment_accession() (4DNE[A-Z][A-Z0-9]+) in src/cfdb/services/fourdn.py.
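For reference, a minimal sketch of what the two extractions amount to, using only the patterns quoted above. The function names, signatures, and None-on-miss behavior here are assumptions for illustration; the real helpers in src/cfdb/services/fourdn.py may differ and should be reused rather than duplicated.

```python
import re
from typing import Optional

# Patterns as quoted in this issue; the constant names are hypothetical.
FILE_ACCESSION_RE = re.compile(r"4DNF[A-Z0-9]+")
EXPERIMENT_ACCESSION_RE = re.compile(r"4DNE[A-Z][A-Z0-9]+")


def sketch_extract_accession(persistent_id: Optional[str]) -> Optional[str]:
    """Return the first 4DNF... accession found in persistent_id, or None."""
    if not persistent_id:
        return None
    match = FILE_ACCESSION_RE.search(persistent_id)
    return match.group(0) if match else None


def sketch_extract_experiment_accession(persistent_id: Optional[str]) -> Optional[str]:
    """Return the first 4DNE... accession found in persistent_id, or None."""
    if not persistent_id:
        return None
    match = EXPERIMENT_ACCESSION_RE.search(persistent_id)
    return match.group(0) if match else None


file_acc = sketch_extract_accession("https://data.4dnucleome.org/4DNFI6IJS617")
exp_acc = sketch_extract_experiment_accession("https://data.4dnucleome.org/4DNEXGXBT684")
```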

Expected Outcome

  • Every 4DN file document has accession_id set to its 4DNF… accession, parsed from persistent_id via extract_accession().
  • Every 4DN collection has accession_id set to its 4DNE… accession, parsed from persistent_id via extract_experiment_accession(); abbreviation is available as a cross-check.
  • A 4DN file/collection with no parseable accession leaves accession_id null and is logged, rather than failing the sync.
  • After a sync, querying 4DN files/collections by accession_id returns the expected entities.
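The outcomes above could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual code: documents are modeled as plain dicts, populate_accession_id and the logger name are hypothetical, and the regexes are inlined so the block runs on its own (the real change would call the existing helpers instead).

```python
import logging
import re
from typing import Any, Dict

logger = logging.getLogger("fourdn.sync")  # hypothetical logger name

# Inlined stand-ins for extract_accession() / extract_experiment_accession().
_PATTERNS = {
    "file": re.compile(r"4DNF[A-Z0-9]+"),
    "collection": re.compile(r"4DNE[A-Z][A-Z0-9]+"),
}


def populate_accession_id(doc: Dict[str, Any], kind: str) -> Dict[str, Any]:
    """Set doc['accession_id'] from persistent_id; leave it None and log on a miss."""
    match = _PATTERNS[kind].search(doc.get("persistent_id") or "")
    if match is None:
        # Do not fail the sync: record the gap and move on.
        doc["accession_id"] = None
        logger.warning(
            "No parseable accession for 4DN %s local_id=%s", kind, doc.get("local_id")
        )
        return doc

    accession = match.group(0)
    doc["accession_id"] = accession

    # Optional cross-check for collections: abbreviation should carry the same value.
    abbreviation = doc.get("abbreviation")
    if kind == "collection" and abbreviation and abbreviation != accession:
        logger.warning(
            "Accession %s differs from abbreviation %s for local_id=%s",
            accession, abbreviation, doc.get("local_id"),
        )
    return doc


good_file = populate_accession_id(
    {"local_id": "uuid-1", "persistent_id": "https://data.4dnucleome.org/4DNFI6IJS617"},
    "file",
)
good_collection = populate_accession_id(
    {
        "local_id": "uuid-2",
        "persistent_id": "https://data.4dnucleome.org/4DNEXGXBT684",
        "abbreviation": "4DNEXGXBT684",
    },
    "collection",
)
unparseable = populate_accession_id({"local_id": "uuid-3", "persistent_id": None}, "file")
```

Logging a warning rather than raising keeps a single malformed persistent_id from aborting the whole sync, which matches the third outcome.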

Labels: enhancement (New feature or request)
