Reviewer: jess@walthamdatascience.com
Both classes were redesigned in the V2-import branch (claude/did-matlab-v2-import-Rs8AX) to faithfully match the v1 corpora rather than the previous concrete-scalar drafts. The earlier V_delta drafts assumed these classes carried numeric values; the v1 corpora (PRED, 20211116, B, Dab, JH) show they carry ontology-typed descriptors with the actual values streaming from linked elements. Filing this for your review before the changes settle.
Live schemas on the branch:
stable/position_metadata.json
stable/distance_metadata.json
Matching did-matlab migrators on claude/did-matlab-v2-import-Rs8AX:
src/did/+did2/+convert/+migrators/position_metadata.m
src/did/+did2/+convert/+migrators/distance_metadata.m
position_metadata
Old schema (concrete scalars, value-style)
super: [base]
fields:
* x double
* y double
* z double
* position_units char
coordinate_system char
New schema (semantic, descriptor-style; mirrors probe_location)
super: [base]
depends_on:
* element_id
fields:
* measurement (ontology_term): what kind of position (e.g., "midpoint position")
* units (ontology_term): measurement unit
dimensions (structure, !mustBeScalar): array of per-axis records:
    axis (char, required): "axis_1", "axis_2", ...
    node (char, required): per-axis CURIE
    name (char, optional): resolved label
Conversion rules
| did_v1 location | V_delta location | Transformation |
| --- | --- | --- |
| position_metadata.ontologyNode | position_metadata.measurement | wrap as ontology_term {node, name}; name resolved via ndi.ontology.lookup |
| position_metadata.units | position_metadata.units | wrap as ontology_term; name resolved via lookup |
| position_metadata.dimensions (comma-separated CURIE list) | position_metadata.dimensions(i).{axis, node, name} | split on ","; build one record per CURIE with axis = "axis_1", "axis_2", …; name resolved via lookup |
| (no v1 source) | depends_on[name="element_id"] | the v1 doc already had this as its only depends_on entry; preserved verbatim |
| position_metadata.x / .y / .z / .position_units / .coordinate_system | (removed) | v1 documents never carried numeric values for this class (no files block either); the previous V_delta draft assumed it stored coordinates |
Worked v1 → V_delta example
v1 (JH corpus):
"position_metadata": {
  "ontologyNode": "EMPTY:0000137",
  "units": "NCIT:C48367",
  "dimensions": "NCIT:C44477,NCIT:C44478"
}
V_delta after migration (names from ndi.ontology.lookup; empty if library unavailable):
"position_metadata": {
  "measurement": {"node": "EMPTY:0000137", "name": "midpoint position"},
  "units": {"node": "NCIT:C48367", "name": "Micrometer"},
  "dimensions": [
    {"axis": "axis_1", "node": "NCIT:C44477", "name": "Horizontal Axis"},
    {"axis": "axis_2", "node": "NCIT:C44478", "name": "Vertical Axis"}
  ]
}
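The position_metadata conversion rules above can be sketched in Python, in the spirit of the simulator mentioned under corpus impact. This is an illustrative sketch, not the migrator on the branch: `migrate_position_metadata`, `to_ontology_term`, and `lookup_name` are hypothetical names, and `lookup_name` is a dictionary-backed stand-in for ndi.ontology.lookup that returns an empty name for anything it does not know.

```python
from typing import Callable, Dict

# Hypothetical stand-in for ndi.ontology.lookup. The labels are the ones shown
# in the worked example; unknown CURIEs resolve to an empty name.
_LABELS = {
    "EMPTY:0000137": "midpoint position",
    "NCIT:C48367": "Micrometer",
    "NCIT:C44477": "Horizontal Axis",
    "NCIT:C44478": "Vertical Axis",
}


def lookup_name(curie: str) -> str:
    return _LABELS.get(curie, "")


def to_ontology_term(curie: str, lookup: Callable[[str], str] = lookup_name) -> Dict[str, str]:
    """Wrap a bare CURIE as an ontology_term {node, name}."""
    return {"node": curie, "name": lookup(curie)}


def migrate_position_metadata(v1: dict, lookup: Callable[[str], str] = lookup_name) -> dict:
    """Apply the conversion table to a v1 position_metadata block."""
    # Split the comma-separated CURIE list, dropping blanks and stray spaces.
    curies = [c.strip() for c in v1.get("dimensions", "").split(",") if c.strip()]
    return {
        "measurement": to_ontology_term(v1["ontologyNode"], lookup),
        "units": to_ontology_term(v1["units"], lookup),
        # One record per CURIE, with positional axis labels starting at axis_1.
        "dimensions": [
            {"axis": f"axis_{i}", "node": curie, "name": lookup(curie)}
            for i, curie in enumerate(curies, start=1)
        ],
    }
```

Passing `lookup=lambda curie: ""` approximates the case where the ontology library is unavailable: the nodes are preserved and every name comes back empty.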
distance_metadata
Old schema (scalar distance)
super: [base]
depends_on:
* element_id_1
* element_id_2
fields:
* distance double
* distance_units char
New schema (paired endpoint array-of-records)
super: [base]
depends_on:
* element_id
fields:
* endpoints (structure, !mustBeScalar): array of per-endpoint records:
    label (char, required): "A", "B"
    measurement (ontology_term, required)
    integer_ids (matrix of integer, optional)
    string_ids (string array, optional)
    numeric_values (matrix of double, optional)
* units (ontology_term, required): measurement unit
Conversion rules
| did_v1 location | V_delta location | Transformation |
| --- | --- | --- |
| distance_metadata.ontologyNode_X (X = A or B) | distance_metadata.endpoints(i).measurement | wrap as ontology_term; name resolved via ndi.ontology.lookup |
| distance_metadata.integerIDs_X (scalar or array) | distance_metadata.endpoints(i).integer_ids | passthrough as double row vector |
| distance_metadata.ontologyStringValues_X (comma-separated did_uid string) | distance_metadata.endpoints(i).string_ids | split on ","; cast to string array |
| distance_metadata.ontologyNumericValues_X (often empty) | distance_metadata.endpoints(i).numeric_values | passthrough as double row vector |
| (per-record label) | distance_metadata.endpoints(i).label | the X suffix (A, B) preserved verbatim; migrator discovers labels by regex-scanning ontology_node_X keys |
| distance_metadata.units | distance_metadata.units | wrap as ontology_term; name resolved via lookup |
| (no v1 source) | depends_on[name="element_id"] | v1 docs ship a single element_id edge already |
| distance_metadata.distance / .distance_units (old V_delta draft) | (removed) | v1 has no scalar distance value; the document records the schema of a distance measurement between two endpoint sets |
| depends_on[name="element_id_1"] / element_id_2 (old V_delta draft) | (removed) | replaced with the single element_id v1 actually uses |
Worked v1 → V_delta example
v1 (JH corpus, abbreviated):
"distance_metadata": {
  "ontologyNode_A": "EMPTY:0000096",
  "integerIDs_A": 1,
  "ontologyNumericValues_A": [],
  "ontologyStringValues_A": "41269430c5d0f467_40b3ce7c80fe06c8",
  "ontologyNode_B": "EMPTY:0000134",
  "integerIDs_B": [1, 2, 3, ..., 19],
  "ontologyNumericValues_B": [],
  "ontologyStringValues_B": "41269430c5a8949f_40c21c6cafc6a2f4,41269430c5a8a8a7_c0d94f2bda76778c, ... (19 ids)",
  "units": "NCIT:C48367"
}
V_delta after migration:
"distance_metadata": {
  "endpoints": [
    {
      "label": "A",
      "measurement": {"node": "EMPTY:0000096", "name": "<resolved>"},
      "integer_ids": [1],
      "string_ids": ["41269430c5d0f467_40b3ce7c80fe06c8"],
      "numeric_values": []
    },
    {
      "label": "B",
      "measurement": {"node": "EMPTY:0000134", "name": "<resolved>"},
      "integer_ids": [1, 2, 3, /* ... */, 19],
      "string_ids": ["41269430c5a8949f_40c21c6cafc6a2f4", /* ... 19 entries ... */],
      "numeric_values": []
    }
  ],
  "units": {"node": "NCIT:C48367", "name": "Micrometer"}
}
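A comparable Python sketch for the distance_metadata rules follows. It is an assumption-laden illustration rather than the branch's migrator: `migrate_distance_metadata` and `lookup_name` are hypothetical names, the label regex scans for the `ontologyNode_X` key spelling used in the worked v1 example, and values are passed through as plain Python lists with no cast to double.

```python
import re
from typing import Any, Callable, List

# Matches keys such as "ontologyNode_A" and captures the endpoint label ("A").
# Assumption: keys are spelled as in the worked v1 example above.
_LABEL_KEY = re.compile(r"^ontologyNode_(\w+)$")


def lookup_name(curie: str) -> str:
    """Hypothetical stand-in for ndi.ontology.lookup; resolves nothing."""
    return ""


def _as_list(value: Any) -> List[Any]:
    """Normalize a scalar, list, or missing value to a flat list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def migrate_distance_metadata(v1: dict, lookup: Callable[[str], str] = lookup_name) -> dict:
    """Apply the conversion table to a v1 distance_metadata block."""
    # Discover endpoint labels from the keys instead of hard-coding A and B.
    labels = sorted(
        match.group(1)
        for match in (_LABEL_KEY.match(key) for key in v1)
        if match
    )
    endpoints = []
    for label in labels:
        node = v1[f"ontologyNode_{label}"]
        raw_ids = v1.get(f"ontologyStringValues_{label}") or ""
        endpoints.append({
            "label": label,  # suffix preserved verbatim
            "measurement": {"node": node, "name": lookup(node)},
            "integer_ids": _as_list(v1.get(f"integerIDs_{label}")),
            "string_ids": [s.strip() for s in raw_ids.split(",") if s.strip()],
            "numeric_values": _as_list(v1.get(f"ontologyNumericValues_{label}")),
        })
    units = v1["units"]
    return {"endpoints": endpoints, "units": {"node": units, "name": lookup(units)}}
```

Discovering labels from the keys means a document with only one endpoint, or with a suffix other than A or B, still migrates without special-casing. Whether such documents should migrate or be quarantined is a separate question this sketch does not answer.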
Design choices to push back on if you disagree
- Position-axis labels are positional (axis_1, axis_2, ...) rather than x, y, z. v1 had no axis names (the ontology nodes themselves classified the axes), so positional labels avoid baking in a spatial assumption. The migrator could rewrite to x/y/z later if a convention is added.
- Distance endpoints keep the v1 A/B labels rather than being recoded to endpoint_1/endpoint_2. This felt more faithful since v1 was explicit about the pairing.
- endpoints.string_ids are left as inline string arrays rather than lifted into depends_on entries (endpoint_a_target_1, endpoint_b_target_1, …). The alternative would give real graph-level referential integrity once did2.validate.references lands, but the depends_on names would be verbose. Open to flipping.
- Name fields on ontology_term composites stay empty if ndi.ontology.lookup is unavailable (CI without ndi-ontology-matlab installed) or the CURIE is unknown. The V_delta validator only requires the value to be a struct, so the inner shape is permissive.
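To make the string_ids trade-off concrete, here is a minimal Python sketch of the alternative that was not chosen: lifting each inline id into a named depends_on entry. `lift_string_ids` is a hypothetical helper, and the `{"name", "value"}` entry shape is an assumption based on the depends_on[name="element_id"] notation used in the conversion tables.

```python
from typing import Dict, List


def lift_string_ids(endpoints: List[dict]) -> List[Dict[str, str]]:
    """Turn inline endpoint string_ids into named depends_on entries.

    Names follow the endpoint_a_target_1, endpoint_b_target_1, ... pattern
    from the design note; the entry shape is assumed, not confirmed.
    """
    edges = []
    for endpoint in endpoints:
        prefix = f"endpoint_{endpoint['label'].lower()}_target_"
        for i, uid in enumerate(endpoint.get("string_ids", []), start=1):
            edges.append({"name": f"{prefix}{i}", "value": uid})
    return edges
```

For the JH example above (one id on endpoint A, 19 on endpoint B), this would add 20 depends_on entries to a single document, which is the verbosity cost weighed against referential integrity.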
Corpus impact (Python simulator, mirrors did-matlab +did2.+convert.+v1_to_v2):
| corpus | total | migrated | quarantined |
| --- | --- | --- | --- |
| PRED | 14 | 14 | 0 |
| 20211116 | 1220 | 1220 | 0 |
| B | 12917 | 12917 | 0 |
| JH | 78688 | 78688 | 0 (was 15672 before; -2078 position, -2078 distance) |
| Dab | 27561 | 25956 | 1605 (only stimulus_bath remains, unrelated) |