ETL convention: multiple timestamps for lab measurements #16

@BorisDelange

Context

Raised by Niklas Rodemund in the context of the SICdb 2 → OMOP ETL:

For many values (e.g., lab results), there are at least four time points:

  • the sample collection time (the lab value corresponds to this time),
  • the measurement time (which may include correction factors),
  • the charting time (when the result becomes available in the EHR),
  • and the order time (interesting for process-related questions).

Ultimately, a model might need multiple times - at least including the charting time - since otherwise data leakage would occur.

OMOP doesn't seem to take this into account - do we have a plan?
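The leakage concern can be illustrated with a minimal sketch. The record layout and field names (`collected_at`, `charted_at`) are hypothetical and are not taken from SICdb or OMOP; the point is only that filtering on collection time can expose a result before it was visible in the EHR.

```python
from datetime import datetime

# Hypothetical lab records; field names are illustrative only.
labs = [
    {"name": "lactate", "value": 4.1,
     "collected_at": datetime(2024, 1, 1, 8, 0),
     "charted_at": datetime(2024, 1, 1, 9, 30)},
    {"name": "creatinine", "value": 1.2,
     "collected_at": datetime(2024, 1, 1, 7, 0),
     "charted_at": datetime(2024, 1, 1, 7, 45)},
]


def visible_at(records, prediction_time, time_field):
    """Return the records whose `time_field` is at or before `prediction_time`."""
    return [r for r in records if r[time_field] <= prediction_time]


prediction_time = datetime(2024, 1, 1, 8, 30)

# Filtering on collection time includes lactate, which was not charted until 09:30.
leaky = visible_at(labs, prediction_time, "collected_at")

# Filtering on charting time keeps only results already available at 08:30.
safe = visible_at(labs, prediction_time, "charted_at")
```

Here `leaky` contains both results while `safe` contains only creatinine, which is why the charting time needs to survive the ETL.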

Short answer

OMOP CDM v5.4 does not natively expose multiple timestamps per measurement, but a clean ETL pattern exists by combining the SPECIMEN table and additional OBSERVATION rows linked via the v5.4 *_event_id mechanism.

What the OMOP spec actually says

MEASUREMENT.measurement_datetime semantics are explicitly defined (CDM v5.4 MEASUREMENT):

"If there are multiple dates in the source data associated with a record such as order_date, draw_date, and result_date, choose the one that is closest to the date the sample was drawn from the patient."

measurement_datetime therefore corresponds to the sample collection time.

Why use the SPECIMEN table?

In OMOP, MEASUREMENT represents the measured value (e.g. glucose = 1.2 g/L), while SPECIMEN represents the physical sample itself (this blood tube, drawn at 08:32, from this anatomic site, of this volume). One specimen typically produces multiple measurements — a single EDTA tube yields ~20 CBC values. SPECIMEN avoids duplicating the collection metadata across all derived measurement rows and lets you store additional collection properties (specimen_concept_id, quantity, anatomic_site_concept_id, disease_status_concept_id).
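As a rough sketch of the one-specimen-to-many-measurements idea, the function below creates one SPECIMEN-like row per physical sample. It assumes the source exposes a tube identifier, here called `sample_id`, which is a hypothetical field; only a subset of SPECIMEN columns is populated, and the surrogate key scheme is illustrative.

```python
from datetime import datetime


def build_specimens(lab_rows):
    """Create one SPECIMEN-like dict per distinct (person_id, sample_id).

    `sample_id` is a hypothetical source key identifying the physical tube.
    Returns (specimens, key_to_specimen_id).
    """
    specimens = []
    key_to_id = {}
    for row in lab_rows:
        key = (row["person_id"], row["sample_id"])
        if key not in key_to_id:
            key_to_id[key] = len(specimens) + 1  # illustrative surrogate specimen_id
            specimens.append({
                "specimen_id": key_to_id[key],
                "person_id": row["person_id"],
                "specimen_date": row["collected_at"].date(),
                "specimen_datetime": row["collected_at"],
            })
    return specimens, key_to_id


# Two CBC values from the same tube plus one value from a later tube.
rows = [
    {"person_id": 1, "sample_id": "T1", "test": "hemoglobin",
     "collected_at": datetime(2024, 1, 1, 8, 32)},
    {"person_id": 1, "sample_id": "T1", "test": "platelets",
     "collected_at": datetime(2024, 1, 1, 8, 32)},
    {"person_id": 1, "sample_id": "T2", "test": "glucose",
     "collected_at": datetime(2024, 1, 1, 12, 5)},
]
specimens, key_to_id = build_specimens(rows)
```

Three source rows yield two specimen rows, and `key_to_id` gives each measurement the `specimen_id` to link to.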

Proposed ETL convention for INDICATE

| # | Source datetime | Target in OMOP |
|---|---|---|
| 1 | Sample collection time | MEASUREMENT.measurement_datetime and SPECIMEN.specimen_datetime |
| 2 | Analysis / measurement time (analyser) | OBSERVATION row, observation_concept_id = 3043556 (Date of analysis of unspecified specimen, LOINC 45353-0) |
| 3 | Charting / result-reported time | OBSERVATION row, observation_concept_id = 1175684 (Date and time lab result reported, LOINC 90056-3) |
| 4 | Order time | OBSERVATION row, observation_concept_id = 42529317 (Lab order date, LOINC 82785-7) |

All target concepts are standard LOINC concepts with domain_id = Observation.
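The mapping for rows 2 to 4 could be sketched as follows. The concept IDs are copied from the proposal and are not re-verified here. The source field names (`analysed_at`, `charted_at`, `ordered_at`) are hypothetical, and storing each timestamp in `observation_date` / `observation_datetime` is an assumption, since the proposal does not say which OBSERVATION column carries the value.

```python
from datetime import datetime

# Source field (hypothetical name) -> observation_concept_id from the proposal.
TIMESTAMP_CONCEPTS = {
    "analysed_at": 3043556,   # Date of analysis of unspecified specimen
    "charted_at": 1175684,    # Date and time lab result reported
    "ordered_at": 42529317,   # Lab order date
}


def timestamp_observations(source, person_id):
    """Build one OBSERVATION-like dict per non-missing extra timestamp."""
    rows = []
    for field, concept_id in TIMESTAMP_CONCEPTS.items():
        ts = source.get(field)
        if ts is None:
            continue  # skip timestamps the source does not provide
        rows.append({
            "person_id": person_id,
            "observation_concept_id": concept_id,
            # Assumption: the timestamp itself is stored as the observation time.
            "observation_date": ts.date(),
            "observation_datetime": ts,
        })
    return rows


source = {
    "collected_at": datetime(2024, 1, 1, 8, 32),
    "analysed_at": None,  # not recorded for this result
    "charted_at": datetime(2024, 1, 1, 9, 30),
    "ordered_at": datetime(2024, 1, 1, 7, 50),
}
obs_rows = timestamp_observations(source, person_id=1)
```

The collection time is deliberately not turned into an OBSERVATION row, because it already lives in MEASUREMENT.measurement_datetime and SPECIMEN.specimen_datetime.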

Linkage (v5.4 native mechanism)

  • MEASUREMENT → SPECIMEN: set MEASUREMENT.measurement_event_id = SPECIMEN.specimen_id and MEASUREMENT.meas_event_field_concept_id = 1147822 (concept representing the SPECIMEN table).
  • OBSERVATION → MEASUREMENT: set OBSERVATION.observation_event_id = MEASUREMENT.measurement_id and OBSERVATION.obs_event_field_concept_id to the concept representing the MEASUREMENT table.
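A minimal sketch of the two links, using plain dicts in place of table rows. The value 1147822 is the one quoted in this proposal; the concept ID for the MEASUREMENT table is not given here, so it is passed in as a parameter and the `0` used in the example is a placeholder, not a real concept ID.

```python
SPECIMEN_FIELD_CONCEPT_ID = 1147822  # value quoted in the proposal


def link_measurement_to_specimen(measurement, specimen):
    """Return a copy of the measurement row pointing at its specimen."""
    linked = dict(measurement)
    linked["measurement_event_id"] = specimen["specimen_id"]
    linked["meas_event_field_concept_id"] = SPECIMEN_FIELD_CONCEPT_ID
    return linked


def link_observation_to_measurement(observation, measurement,
                                    measurement_field_concept_id):
    """Return a copy of the observation row pointing at its measurement.

    `measurement_field_concept_id` must be supplied by the caller because
    the proposal does not state the concept ID for the MEASUREMENT table.
    """
    linked = dict(observation)
    linked["observation_event_id"] = measurement["measurement_id"]
    linked["obs_event_field_concept_id"] = measurement_field_concept_id
    return linked


specimen = {"specimen_id": 10}
measurement = link_measurement_to_specimen({"measurement_id": 500}, specimen)
charting_obs = link_observation_to_measurement(
    {"observation_id": 900, "observation_concept_id": 1175684},
    measurement,
    measurement_field_concept_id=0,  # placeholder, not a real concept ID
)
```

Returning copies rather than mutating the inputs keeps the helpers easy to test; an actual ETL would more likely set these columns in SQL.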

This is preferred over FACT_RELATIONSHIP, the pre-v5.4 workaround (forum thread), which the v5.4 *_event_id mechanism supersedes.

@MaximMoinat — could you validate this proposed convention as INDICATE's OMOP reference?
