Review request: new position_metadata and distance_metadata shapes #45

@stevevanhooser

Reviewer: jess@walthamdatascience.com

Both classes were redesigned in the V2-import branch (claude/did-matlab-v2-import-Rs8AX) to faithfully match the v1 corpora rather than the previous concrete-scalar drafts. The earlier V_delta drafts assumed these classes carried numeric values; the v1 corpora (PRED, 20211116, B, Dab, JH) show they carry ontology-typed descriptors with the actual values streaming from linked elements. Filing this for your review before the changes settle.

Live schemas on the branch:

Matching did-matlab migrators on claude/did-matlab-v2-import-Rs8AX:

  • src/did/+did2/+convert/+migrators/position_metadata.m
  • src/did/+did2/+convert/+migrators/distance_metadata.m

position_metadata

Old schema (concrete scalars, value-style)

super: [base]
fields:
* x                  double
* y                  double
* z                  double
* position_units     char
  coordinate_system  char

New schema (semantic, descriptor-style; mirrors probe_location)

super: [base]
depends_on:
* element_id

fields:
* measurement (ontology_term)              — what kind of position
                                              (e.g., "midpoint position")
* units       (ontology_term)              — measurement unit
  dimensions  (structure, !mustBeScalar)   — array of per-axis records:
                  axis  (char, required)   — "axis_1", "axis_2", ...
                  node  (char, required)   — per-axis CURIE
                  name  (char, optional)   — resolved label

Conversion rules

| did_v1 location | V_delta location | Transformation |
|---|---|---|
| position_metadata.ontologyNode | position_metadata.measurement | wrap as ontology_term {node, name}; name resolved via ndi.ontology.lookup |
| position_metadata.units | position_metadata.units | wrap as ontology_term; name resolved via lookup |
| position_metadata.dimensions (comma-separated CURIE list) | position_metadata.dimensions(i).{axis, node, name} | split on ,; build one record per CURIE with axis = "axis_1", "axis_2", …; name resolved via lookup |
| (no v1 source) | depends_on[name="element_id"] | the v1 doc already had this as its only depends_on entry; preserved verbatim |
| position_metadata.x / .y / .z / .position_units / .coordinate_system | (removed) | v1 documents never carried numeric values for this class (no files block either); the previous V_delta draft assumed it stored coordinates |

Worked v1 → V_delta example

v1 (JH corpus):

"position_metadata": {
  "ontologyNode": "EMPTY:0000137",
  "units":        "NCIT:C48367",
  "dimensions":   "NCIT:C44477,NCIT:C44478"
}

V_delta after migration (names from ndi.ontology.lookup; empty if library unavailable):

"position_metadata": {
  "measurement": {"node": "EMPTY:0000137", "name": "midpoint position"},
  "units":       {"node": "NCIT:C48367",   "name": "Micrometer"},
  "dimensions":  [
    {"axis": "axis_1", "node": "NCIT:C44477", "name": "Horizontal Axis"},
    {"axis": "axis_2", "node": "NCIT:C44478", "name": "Vertical Axis"}
  ]
}
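
The position_metadata rules above can be sketched in a few lines of Python. This is a minimal illustration, not the migrator on the branch: `ontology_term`, `migrate_position_metadata`, and the `lookup` callable are hypothetical names, and `lookup` merely stands in for ndi.ontology.lookup. With no lookup supplied, the name fields stay empty, matching the "library unavailable" case.

```python
from typing import Callable, Optional

Lookup = Optional[Callable[[str], str]]


def ontology_term(node: str, lookup: Lookup = None) -> dict:
    """Wrap a CURIE as {node, name}; name stays empty if it cannot be resolved."""
    name = ""
    if lookup is not None:
        try:
            name = lookup(node)
        except Exception:  # unknown CURIE or lookup backend failure
            name = ""
    return {"node": node, "name": name}


def migrate_position_metadata(v1: dict, lookup: Lookup = None) -> dict:
    """Apply the conversion rules: wrap the two terms, split the CURIE list."""
    raw = v1.get("dimensions") or ""
    curies = [c.strip() for c in raw.split(",") if c.strip()]
    dimensions = [
        {"axis": f"axis_{i}", **ontology_term(curie, lookup)}
        for i, curie in enumerate(curies, start=1)
    ]
    return {
        "measurement": ontology_term(v1["ontologyNode"], lookup),
        "units": ontology_term(v1["units"], lookup),
        "dimensions": dimensions,
    }


# v1 block copied from the JH example above; no lookup, so names are empty.
v1_position = {
    "ontologyNode": "EMPTY:0000137",
    "units": "NCIT:C48367",
    "dimensions": "NCIT:C44477,NCIT:C44478",
}
migrated_position = migrate_position_metadata(v1_position)
```

Passing a real resolver as `lookup` would fill in the name fields shown in the worked example; the axis labels depend only on list order.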

distance_metadata

Old schema (scalar distance)

super: [base]
depends_on:
* element_id_1
* element_id_2

fields:
* distance       double
* distance_units char

New schema (paired endpoint array-of-records)

super: [base]
depends_on:
* element_id

fields:
* endpoints (structure, !mustBeScalar)     — array of per-endpoint records:
                  label           (char, required)  — "A", "B"
                  measurement     (ontology_term, required)
                  integer_ids     (matrix of integer, optional)
                  string_ids      (string array, optional)
                  numeric_values  (matrix of double, optional)
* units     (ontology_term, required)      — measurement unit

Conversion rules

| did_v1 location | V_delta location | Transformation |
|---|---|---|
| distance_metadata.ontologyNode_X (X = A or B) | distance_metadata.endpoints(i).measurement | wrap as ontology_term; name resolved via ndi.ontology.lookup |
| distance_metadata.integerIDs_X (scalar or array) | distance_metadata.endpoints(i).integer_ids | passthrough as double row vector |
| distance_metadata.ontologyStringValues_X (comma-separated did_uid string) | distance_metadata.endpoints(i).string_ids | split on ,; cast to string array |
| distance_metadata.ontologyNumericValues_X (often empty) | distance_metadata.endpoints(i).numeric_values | passthrough as double row vector |
| (per-record label) | distance_metadata.endpoints(i).label | the X suffix (A, B) preserved verbatim; migrator discovers labels by regex-scanning ontologyNode_X keys |
| distance_metadata.units | distance_metadata.units | wrap as ontology_term; name resolved via lookup |
| (no v1 source) | depends_on[name="element_id"] | v1 docs ship a single element_id edge already |
| distance_metadata.distance / .distance_units (old V_delta draft) | (removed) | v1 has no scalar distance value; the document records the schema of a distance measurement between two endpoint sets |
| depends_on[name="element_id_1"] / element_id_2 (old V_delta draft) | (removed) | replaced with the single element_id v1 actually uses |

Worked v1 → V_delta example

v1 (JH corpus, abbreviated):

"distance_metadata": {
  "ontologyNode_A":          "EMPTY:0000096",
  "integerIDs_A":            1,
  "ontologyNumericValues_A": [],
  "ontologyStringValues_A":  "41269430c5d0f467_40b3ce7c80fe06c8",
  "ontologyNode_B":          "EMPTY:0000134",
  "integerIDs_B":            [1, 2, 3, ..., 19],
  "ontologyNumericValues_B": [],
  "ontologyStringValues_B":  "41269430c5a8949f_40c21c6cafc6a2f4,41269430c5a8a8a7_c0d94f2bda76778c, ... (19 ids)",
  "units":                   "NCIT:C48367"
}

V_delta after migration:

"distance_metadata": {
  "endpoints": [
    {
      "label":          "A",
      "measurement":    {"node": "EMPTY:0000096", "name": "<resolved>"},
      "integer_ids":    [1],
      "string_ids":     ["41269430c5d0f467_40b3ce7c80fe06c8"],
      "numeric_values": []
    },
    {
      "label":          "B",
      "measurement":    {"node": "EMPTY:0000134", "name": "<resolved>"},
      "integer_ids":    [1, 2, 3, /* ... */, 19],
      "string_ids":     ["41269430c5a8949f_40c21c6cafc6a2f4", /* ... 19 entries ... */],
      "numeric_values": []
    }
  ],
  "units": {"node": "NCIT:C48367", "name": "Micrometer"}
}
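
As a rough sketch of the distance_metadata rules, the Python below discovers endpoint labels by regex-scanning the ontologyNode_X keys and builds one record per label. All function names are hypothetical, `lookup` again stands in for ndi.ontology.lookup, and sorting the labels alphabetically is an assumption (the rules above do not state an ordering). The sample input is the JH example with endpoint B shortened to its first two ids for illustration.

```python
import re
from typing import Callable, Optional

Lookup = Optional[Callable[[str], str]]
LABEL_KEY = re.compile(r"^ontologyNode_(\w+)$")


def ontology_term(node: str, lookup: Lookup = None) -> dict:
    """Wrap a CURIE as {node, name}; name stays empty if it cannot be resolved."""
    name = ""
    if lookup is not None:
        try:
            name = lookup(node)
        except Exception:  # unknown CURIE or lookup backend failure
            name = ""
    return {"node": node, "name": name}


def as_list(value) -> list:
    """Normalize a v1 scalar-or-array field to a flat list ([] when absent)."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def split_ids(value) -> list:
    """Split a comma-separated did_uid string into a list of ids."""
    return [s.strip() for s in (value or "").split(",") if s.strip()]


def migrate_distance_metadata(v1: dict, lookup: Lookup = None) -> dict:
    # Discover labels from the key names; alphabetical order is an assumption.
    labels = sorted(m.group(1) for m in map(LABEL_KEY.match, v1) if m)
    endpoints = [
        {
            "label": label,
            "measurement": ontology_term(v1[f"ontologyNode_{label}"], lookup),
            "integer_ids": as_list(v1.get(f"integerIDs_{label}")),
            "string_ids": split_ids(v1.get(f"ontologyStringValues_{label}")),
            "numeric_values": as_list(v1.get(f"ontologyNumericValues_{label}")),
        }
        for label in labels
    ]
    return {"endpoints": endpoints, "units": ontology_term(v1["units"], lookup)}


# JH example from above, with endpoint B shortened to two ids for illustration.
v1_distance = {
    "ontologyNode_A": "EMPTY:0000096",
    "integerIDs_A": 1,
    "ontologyNumericValues_A": [],
    "ontologyStringValues_A": "41269430c5d0f467_40b3ce7c80fe06c8",
    "ontologyNode_B": "EMPTY:0000134",
    "integerIDs_B": [1, 2],
    "ontologyNumericValues_B": [],
    "ontologyStringValues_B": "41269430c5a8949f_40c21c6cafc6a2f4,41269430c5a8a8a7_c0d94f2bda76778c",
    "units": "NCIT:C48367",
}
migrated_distance = migrate_distance_metadata(v1_distance)
```

Because labels come from the keys rather than a fixed A/B pair, the same code would pick up a third endpoint if a corpus ever carried one.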

Design choices to push back on if you disagree

  1. Position-axis labels are positional (axis_1, axis_2, ...) rather than x, y, z. v1 had no axis names (the ontology nodes themselves classified the axes), so positional labels avoid baking in a spatial assumption. The migrator could rewrite them to x/y/z later if a convention is added.
  2. Distance endpoints keep the v1 A/B labels rather than being recoded to endpoint_1/endpoint_2. This felt more faithful, since v1 was explicit about the pairing.
  3. endpoints.string_ids left as inline string arrays rather than lifted into depends_on entries (endpoint_a_target_1, endpoint_b_target_1, …). The alternative would give real graph-level referential integrity once did2.validate.references lands, but the depends_on names would be verbose. Open to flipping.
  4. Name fields on ontology_term composites stay empty if ndi.ontology.lookup is unavailable (CI without ndi-ontology-matlab installed, unknown CURIEs). The V_delta validator only requires the value to be a struct, so the inner shape is permissive.
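
To make the trade-off in choice 3 concrete, here is a hypothetical sketch of the alternative: lifting each endpoint's string_ids into depends_on entries named after the endpoint_a_target_1 pattern. The function name and the {name, value} entry shape are assumptions for illustration; nothing like this exists on the branch.

```python
def lift_string_ids(endpoints: list) -> list:
    """Build depends_on entries named endpoint_<label>_target_<n> (hypothetical)."""
    edges = []
    for endpoint in endpoints:
        label = endpoint["label"].lower()
        for n, uid in enumerate(endpoint.get("string_ids", []), start=1):
            # The {name, value} shape is assumed, not taken from the schema.
            edges.append({"name": f"endpoint_{label}_target_{n}", "value": uid})
    return edges


# Endpoints in the migrated shape shown earlier, trimmed to the relevant fields.
sample_endpoints = [
    {"label": "A", "string_ids": ["41269430c5d0f467_40b3ce7c80fe06c8"]},
    {"label": "B", "string_ids": [
        "41269430c5a8949f_40c21c6cafc6a2f4",
        "41269430c5a8a8a7_c0d94f2bda76778c",
    ]},
]
lifted_edges = lift_string_ids(sample_endpoints)
```

With the 19-id endpoint B from the JH example this would produce 19 entries for that endpoint alone, which is the verbosity concern raised in choice 3.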

Corpus impact (Python simulator, mirrors did-matlab +did2.+convert.+v1_to_v2):

| corpus | total | migrated | quarantined |
|---|---|---|---|
| PRED | 14 | 14 | 0 |
| 20211116 | 1220 | 1220 | 0 |
| B | 12917 | 12917 | 0 |
| JH | 78688 | 78688 | 0 (was 15672 before; -2078 position, -2078 distance) |
| Dab | 27561 | 25956 | 1605 (only stimulus_bath remains, unrelated) |
