V_delta: drop redundant status fields; restore v1 session_in_a_dataset shape #44
Merged
Driven by the B-corpus discovery report (12,917 v1 docs across 18
classes, 8708 quarantined under the previous schemas). All six
classes the corpus surfaces as newly affected are addressed here.
Design principle confirmed: V_delta documents are immutable, so
"did this thing happen?" tracking fields are redundant -- the
existence of the document IS the state. Five required fields that
violated this are dropped.
Dropped fields:
epochfiles_ingested.ingestion_status (required)
epochfiles_ingested.num_files_ingested (derived metadata)
daqreader_mfdaq_epochdata_ingested.ingestion_status (required)
daqmetadatareader_epochdata_ingested.ingestion_status (required)
syncrule_mapping.mapping_status (required)
dataset_remote.remote_url (required)
After the drops these classes either have zero own fields
(marker-style records whose presence is the entire signal) or
retain only their genuinely-content fields (e.g.,
syncrule_mapping keeps `mapping_data`; dataset_remote keeps
`remote_type` and `dataset_id`).
session_in_a_dataset rebuilt to mirror v1 verbatim:
before: required `dataset_id` field + required depends_on[session_id]
after: session_id (did_uid, required, inline) + session_reference
+ is_linked + session_creator + session_creator_input1..6
The "dataset_id" concept is intrinsically `base.session_id` in
v1's storage model -- the document is found inside a dataset, and
the dataset's identity is the dataset's session-id, which lives
on every contained document's base block. So the V_delta-only
`dataset_id` field was duplicating base.session_id without adding
new information. Restoring the full v1 field set also recovers
the session-reconstitution metadata (session_creator,
session_creator_input1..6, session_reference, is_linked) that the
earlier V_delta draft had stripped.
Projection on the three discovery corpora (Python simulator):
PRED 14/14 migrated (no change)
20211116 1220/1220 migrated (no change)
B 12917/12917 migrated (was 4209/12917)
No paired did-matlab change is required: the converter's
universal-rename pass plus the dispatcher empty-block pad already
handle every v1 doc that previously quarantined here. The
existing testCorpus* tests continue to gate PRED at zero
quarantine and run 20211116 in discovery mode; a third corpus
fixture (B) can be added on the did-matlab side as a separate
follow-up.
Driven by the JH corpus surfacing the multi-clock case: v1 documents
that record the same element epoch in multiple clock frames (e.g.,
`epoch_clock = "dev_local_time,exp_global_time"`,
`t0_t1 = [[a,b],[c,d]]`). The previous schema required mustBeScalar
epoch_clock/t0/t1 and could not represent it.
Replaces the three flat scalar fields with a single
`element_epoch.clocks` array-of-records:
clocks(i).name char required per-clock identifier
clocks(i).t0 double required start time in clocks(i) units
clocks(i).t1 double required stop time in clocks(i) units
Single-clock documents (PRED, 20211116, B corpora) migrate to a
1-element array; multi-clock JH documents migrate to a 2+ element
array. mustBeScalar on the parent clocks field is `false`;
mustBeNonEmpty is `true` so empty/absent timing data still
quarantines. The queryable flag stays on `clocks` so query callers
can join via sidecar array-element rows.
No paired ngrid/epochclocktimes changes here. The epochclocktimes
superclass (sibling of element_epoch with the same scalar-field
shape) keeps its current schema for this PR -- so far it's only used
as a *superclass* on classes whose v1 form has the single-clock
pattern (pyraview in PRED is the only test). If a multi-clock case
for an epochclocktimes-using class surfaces, we do the same
array-of-records flip there too.
Projection on the 20211116, B, and JH corpora after this commit
(paired with the did-matlab migrator update):
PRED 14/14 migrated (no change)
20211116 1220/1220 migrated (no change)
B 12917/12917 migrated (no change)
JH 67172/78688 migrated (was 63016; element_epoch +4156)
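A minimal Python sketch of the array-of-records flip described above, not the did-matlab migrator itself. The function name `clocks_from_v1` is hypothetical, and the handling of a flat `[t0, t1]` pair for single-clock documents is an assumption about how such documents decode.

```python
def clocks_from_v1(epoch_clock, t0_t1):
    """Build element_epoch.clocks records from v1 epoch_clock / t0_t1.

    Hypothetical helper: epoch_clock is the v1 comma-separated clock
    string, t0_t1 is either [[t0, t1], ...] or a flat [t0, t1] pair.
    """
    names = [n.strip() for n in epoch_clock.split(",") if n.strip()]
    # Assumption: a single-clock document may decode to a flat [t0, t1]
    # pair; normalise it to the nested form used by multi-clock documents.
    if len(t0_t1) == 2 and not isinstance(t0_t1[0], (list, tuple)):
        pairs = [t0_t1]
    else:
        pairs = list(t0_t1)
    # Empty or mismatched timing data is rejected, mirroring the
    # mustBeNonEmpty=true rule on the parent clocks field.
    if not names or len(names) != len(pairs):
        raise ValueError("clock names and t0/t1 pairs do not line up")
    return [
        {"name": name, "t0": float(pair[0]), "t1": float(pair[1])}
        for name, pair in zip(names, pairs)
    ]


multi = clocks_from_v1(
    "dev_local_time,exp_global_time", [[0.0, 10.0], [100.0, 110.0]]
)
single = clocks_from_v1("dev_local_time", [0, 5])
```

Here `multi` holds two records and `single` holds one, matching the 2+ element and 1-element cases in the text.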
The JH corpus surfaces 2078 v1 position_metadata documents, all
purely descriptor data (no x/y/z values, no files block). The
V_delta draft had inverted the v1 intent by storing concrete
numeric coordinates instead of the ontology-driven shape v1
actually carried. Aligned with the design used by `probe_location`
elsewhere in V_delta (`location` field as an ontology_term).
Rewrites position_metadata to mirror the v1 shape:
measurement (ontology_term) required
The v1 `ontologyNode` CURIE verbatim, paired with a human-
readable name resolved via ndi.ontology.lookup. Classifies
*what kind of position* the linked element records (e.g., a
midpoint position, a probe tip).
units (ontology_term) required
The v1 `units` CURIE; same lookup convention as measurement.
dimensions (structure-array) optional
One record per spatial axis. Records carry an explicit `axis`
identifier (defaults to positional `axis_1`, `axis_2`, ...) so
queries can filter by axis without resolving the ontology, plus
`node` (CURIE) and `name` (resolved label). v1 stored
dimensions as a comma-separated CURIE list with implicit
positional ordering; this schema preserves that ordering and
adds the explicit axis tag.
Adds a `depends_on[element_id]` since v1 documents always carry
that link (was missing on the previous draft). Migrator-side
changes land in the paired did-matlab commit.
Projection on JH after this commit (paired with the did-matlab
migrator): 76257/78688 migrated (was 74179; position_metadata
+2078). Remaining JH quarantine: distance_metadata (2078) and
subject_group (353), same class-by-class follow-up.
The JH corpus surfaces 2078 v1 distance_metadata documents whose
content is paired A/B endpoint metadata, not a scalar distance.
Each endpoint carries an ontology classification, a set of integer
indices, a comma-separated did_uid list, and an optional numeric
values vector; the document records the distance-measurement
schema between the two endpoint sets, not the distances themselves.
Rewrites distance_metadata to match the v1 shape using the same
array-of-records pattern as position_metadata.dimensions:
endpoints: structure-array
endpoints(i).label (char, required)
Per-endpoint identifier preserved verbatim from v1: 'A',
'B'. Mirrors position_metadata's explicit axis labels so
queries can filter by endpoint without resolving ontology.
endpoints(i).measurement (ontology_term, required)
v1 ontologyNode_X CURIE + ndi.ontology.lookup name.
endpoints(i).integer_ids (matrix of integer, optional)
v1 integerIDs_X verbatim. Often a scalar for endpoint A,
a multi-element vector for endpoint B.
endpoints(i).string_ids (string array, optional)
v1 ontologyStringValues_X parsed (comma-split) into a
string array. Each entry is typically a did_uid pointing
at another document.
endpoints(i).numeric_values (matrix of double, optional)
v1 ontologyNumericValues_X. Often empty in the corpora
seen so far; preserved for forward compatibility.
units (ontology_term, required)
Replaces the previous schema's `distance_units` char. v1
already stored this as a CURIE.
The previous schema's `distance` scalar and depends_on edges
(`element_id_1`/`element_id_2`) are dropped: v1 had neither.
depends_on becomes a single `element_id` matching the v1 idiom.
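A rough Python sketch of how the suffixed v1 fields could fold into `endpoints` records. The v1 key spellings follow the `_X` names listed above with X in {A, B}; the function name, the `{node, name}` shape of an ontology_term, the `lookup` callable (a stand-in for ndi.ontology.lookup), and the sample values are all assumptions.

```python
def _as_list(value):
    # JSON decoding may yield a scalar where a vector was intended.
    return list(value) if isinstance(value, (list, tuple)) else [value]


def endpoints_from_v1(props, lookup):
    """Collect v1 *_A / *_B fields into distance_metadata.endpoints."""
    endpoints = []
    for label in ("A", "B"):
        node = props["ontologyNode_" + label]
        record = {
            "label": label,
            "measurement": {"node": node, "name": lookup(node)},
        }
        ids = props.get("integerIDs_" + label)
        if ids is not None:
            record["integer_ids"] = _as_list(ids)
        strings = props.get("ontologyStringValues_" + label) or ""
        if strings:
            # Comma-split; each entry is typically a did_uid.
            record["string_ids"] = [
                s.strip() for s in strings.split(",") if s.strip()
            ]
        values = props.get("ontologyNumericValues_" + label)
        if values is not None and _as_list(values):
            record["numeric_values"] = [float(v) for v in _as_list(values)]
        endpoints.append(record)
    return endpoints


# Placeholder CURIEs and ids, for illustration only.
example_props = {
    "ontologyNode_A": "EX:10", "integerIDs_A": 1,
    "ontologyStringValues_A": "id_a1", "ontologyNumericValues_A": [],
    "ontologyNode_B": "EX:11", "integerIDs_B": [1, 2, 3],
    "ontologyStringValues_B": "id_b1, id_b2,id_b3",
    "ontologyNumericValues_B": [],
}
example_names = {"EX:10": "reference point", "EX:11": "target point"}
example_endpoints = endpoints_from_v1(example_props, example_names.get)
```

Optional fields are simply omitted when v1 carries nothing for them, which matches their optional status in the schema.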
Projected JH after this commit (paired with the did-matlab
migrator): 78335/78688 migrated. Remaining: subject_group (353)
and Dab stimulus_bath (1605), unchanged.
The JH corpus surfaces 353 v1 subject_group documents whose
property block is universally empty (`{}`) and whose base.name is
also universally empty -- the class is a pure relational marker.
Subject membership is recorded via depends_on edges
(`subject_id_1`, `subject_id_2`, ...). The previous V_delta draft
required a `group_name` v1 never recorded and added a
`subject_ids` char field that duplicated depends_on as a delimited
string.
Changes:
- `group_name` mustBeNonEmpty: true -> false. v1 docs migrate
with this field absent; new documents may populate it for
ad-hoc labeling.
- `description` unchanged (already optional).
- `subject_ids` field removed entirely. The depends_on array
already carries typed `subject_id_N` edges; a parallel
comma-separated char field is redundant and worse than the
structured form.
No migrator change needed. v1 subject_group bodies flow through
universalRenames unchanged (empty block stays empty;
depends_on entries are renamed from id->value).
Projection on JH after this commit: 78688/78688 migrated.
Remaining quarantine: only Dab stimulus_bath (1605).
Driven by the Dab corpus (1605 v1 stimulus_bath documents). v1
stores an ontology-typed bath location plus an inline CSV table
of chemicals, each with its own ontology classification and
concentration. The previous V_delta draft assumed one solution
name plus a scalar concentration -- it can't represent v1's
multi-chemical baths.
## New composite type: concentration
The other SI composites (duration / volume / mass / length /
voltage / current / frequency) all share one canonical sub-field
(meters, volts, ...) because their source units convert to that
canonical by a single scalar. Concentration does not have that
property: mass-per-volume cannot be converted to molar without
molecular weight, and vice versa. Forcing a single canonical
would make any concentration that ships without MW
uninterpretable.
`concentration` therefore has multiple OPTIONAL canonical
sub-fields, with the migrator populating whichever the source
unit is computable into:
molar (double, opt) mol/L
grams_per_liter (double, opt) mass/volume
mass_fraction (double, opt) w/w (dimensionless 0-1)
volume_fraction (double, opt) v/v (dimensionless 0-1)
approximate (boolean)
source_unit (char) verbatim source unit text
source_value (double) verbatim source value
Added to the meta-schema type enum and described in the top-level
meta-schema documentation alongside the existing SI composites.
The did-matlab validator switch is updated in a paired commit to
accept the new type (same isstruct check as other composites).
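To make the "populate whichever canonical the source unit is computable into" rule concrete, here is a small Python sketch. The unit table is hypothetical (the source does not list which unit strings the migrator recognises), and `make_concentration` is an illustrative name, not a did-matlab function.

```python
# Hypothetical table: source unit text -> (canonical sub-field, scale).
# Only conversions that need a single scalar appear; nothing here
# crosses between molar and mass-per-volume, since that needs MW.
UNIT_TABLE = {
    "M": ("molar", 1.0),
    "mM": ("molar", 1e-3),
    "uM": ("molar", 1e-6),
    "g/L": ("grams_per_liter", 1.0),
    "mg/mL": ("grams_per_liter", 1.0),
    "% w/w": ("mass_fraction", 0.01),
    "% v/v": ("volume_fraction", 0.01),
}


def make_concentration(value, unit, approximate=False):
    """Build a concentration composite from a source value and unit."""
    record = {
        "approximate": approximate,
        "source_unit": unit,           # verbatim source unit text
        "source_value": float(value),  # verbatim source value
    }
    entry = UNIT_TABLE.get(unit.strip())
    if entry is not None:
        field, scale = entry
        record[field] = float(value) * scale
    # Unrecognised units keep only the verbatim source fields, so the
    # value stays interpretable later without guessing a canonical.
    return record


millimolar = make_concentration(5, "mM")
per_volume = make_concentration(2, "mg/mL")
unknown = make_concentration(3, "drops")
```

`millimolar` gains only a `molar` sub-field, `per_volume` gains only `grams_per_liter`, and `unknown` gains no canonical sub-field at all.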
## stimulus_bath redesign
Replaces solution_name/concentration/concentration_units with:
super: [base, epochid] (epochid restored to match v1)
depends_on: [stimulus_element_id] (renamed from element_id)
fields:
* location (ontology_term, REQ) the bath itself
mixture (structure, array-of-records):
chemical (ontology_term, REQ)
amount (concentration, opt)
The migrator parses the v1 CSV mixture_table (header row +
chemicals), each row producing a record with chemical
(node+name from v1 ontologyName/name) and amount
(concentration composite from v1 value/unitName).
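The CSV step might look roughly like this in Python. The column names `ontologyName`, `name`, `value`, and `unitName` come from the description above, but their exact header spelling in the corpus is an assumption, as are the function name, the placeholder `EX:` CURIEs, and the reduced `amount` record (source fields only, no canonical sub-fields).

```python
import csv
import io


def mixture_from_csv(mixture_table):
    """Parse a v1 mixture_table CSV (header row + one row per chemical)."""
    reader = csv.DictReader(io.StringIO(mixture_table), skipinitialspace=True)
    mixture = []
    for row in reader:
        record = {
            "chemical": {
                "node": row["ontologyName"].strip(),
                "name": row["name"].strip(),
            }
        }
        value = (row.get("value") or "").strip()
        if value:
            # amount is optional; rows without a value carry only chemical.
            record["amount"] = {
                "source_value": float(value),
                "source_unit": (row.get("unitName") or "").strip(),
                "approximate": False,
            }
        mixture.append(record)
    return mixture


# Placeholder table, for illustration only.
example_table = (
    "ontologyName,name,value,unitName\n"
    "EX:1,sodium chloride,125,mM\n"
    "EX:2,glucose,,\n"
)
example_mixture = mixture_from_csv(example_table)
```

Each CSV row becomes one `mixture` record, which is what lets the schema represent multi-chemical baths.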
Projection after this commit (paired with did-matlab migrator):
PRED 14/14 migrated unchanged
20211116 1220/1220 migrated unchanged
B 12917/12917 migrated unchanged
JH 78688/78688 migrated unchanged
Dab 27561/27561 migrated (was 25956; +1605 stimulus_bath)
All five discovery corpora round-trip clean.
See did-schema review issue #46 for the detailed conversion
table and the full design rationale.
The Soph corpus surfaces a single v1 metadata_editor document
whose content does not match the previous V_delta draft. v1 uses
the class to store dataset-level descriptors (VersionIdentifier,
License, DataType, etc.); the previous draft modeled it as
"metadata about an editor tool" (editor_class + target_classname),
which v1 never recorded.
Rewrites metadata_editor to match v1 exactly:
super: [base]
fields:
metadata_structure (structure, optional)
The single open-shape `metadata_structure` field carries the
arbitrary key/value pairs v1 stored; keys and inner shape are
intentionally not constrained by V_delta since they vary by editor
and dataset. The dropped `editor_class` and `target_classname`
fields had no v1 source.
Projection on Soph after this commit: 101427/101427 migrated.
All six discovery corpora round-trip clean.
Driven by the strict-validation audit (did-matlab commit aaf5529):
every v1 tuningcurve_calc body in the discovery corpora ships two
fields V_delta did not declare, so the data was landing in the
property block as undeclared extras and the strict validator
quarantined them with did2:validation:undeclaredField.
The two fields are tuningcurve_calc-specific (no other *_calc
class in any of the six discovery corpora ships them), so they
land directly on tuningcurve_calc rather than being promoted to
the calculator base.
tuningcurve_calc.log (char, optional)
Free-text log entry summarising the calculation, e.g.
'angle best value is 135.', 'sFrequency = 0.04'.
tuningcurve_calc.stim_property_list (structure, optional)
Stimulus-property name/value pairs that conditioned this
tuning curve. v1 shape preserved verbatim:
names (string array, optional) - e.g. {'sFrequency'}
values (matrix, optional) - corresponding scalar/array
Cleared corpus quarantines:
Soph 34606 tuningcurve_calc docs
20211116 84 tuningcurve_calc docs
total 34690 fewer undeclared-field errors
The next biggest cluster surfacing in the strict-mode report is
the v1 `stimulus_tuningcurve` inheritance on tuningcurve_calc,
which V_delta currently drops. Separate follow-up.
Driven by strict-validation audit (did-matlab aaf5529). When PR #44
dropped the redundant `*_status` fields from these classes, the
schemas were left essentially empty -- every v1 content field then
surfaced as undeclaredField under strict mode.
Restored field declarations to match v1 verbatim:
syncrule_mapping:
cost (double, opt) numeric mapping score
mapping (matrix, opt) 2-element numeric vector
epochnode_a (structure) {epoch_clock, epoch_id, epoch_session_id,
epochprobemap, objectclass}
epochnode_b (structure) same shape as epochnode_a
(dropped: mapping_data -- was aspirational, never in v1)
epochfiles_ingested:
epoch_id (char, opt) epoch identifier (e.g., 't00001')
files (string, opt, list of file references; entries are
!mustBeScalar) NDI URIs or absolute paths
epochprobemap (char, opt) tab-separated probemap text
daqreader_mfdaq_epochdata_ingested:
parameters (structure, opt) {sample_analog_segment,
sample_digital_segment}
daqmetadatareader_epochdata_ingested:
unchanged (v1 block is empty; strict validator already passes).
Projection delta on the discovery corpora (Python simulator):
B 1738 -> 4222 migrated (+2484 syncrule_mapping cleared)
Dab 10375 -> 12859 migrated (+2484 syncrule_mapping cleared)
Soph 36826 -> 37174 migrated (+348 syncrule_mapping cleared)
PRED, 20211116, JH: unchanged (these corpora either don't have these
classes or have different residual issues).
Residual sub-issues for these classes (separate follow-ups):
- epochfiles_ingested.files is declared as type=string
mustBeScalar=false, but v1 decodes to a cell array of chars in
MATLAB; the validator's string case currently accepts only
ischar/isstring, not iscell. Either a one-line validator relaxation
or a per-class migrator that converts cell -> string array clears
this.
- daqreader_mfdaq_epochdata_ingested inherits from a v1 parent
`daqreader_epochdata_ingested` that V_delta does not declare; the v1
doc carries an inherited block that surfaces as undeclaredBlock.
Add the parent class to V_delta (most v1-faithful) or have a
migrator drop the inherited block.
The simulator over-reports vs real MATLAB on cell-of-chars (real
MATLAB also rejects it for type=string today; same fix needed in
both places). The corpus CI run will confirm the authoritative
numbers.
Driven by strict-validation audit. Three changes:
1. Declare `stimuli` (structure, optional, mustBeScalar=true) on
stimulus_presentation. v1 carries it on every doc as
`{parameters: {...stimulus-type-specific keys...}}` -- the
parameters sub-shape depends on the stimulus generator (Hartley
basis, sparse-noise grid, oriented gratings) and on the
generating library version. V_delta declares `stimuli.parameters`
but does not constrain inner keys.
2. Drop `num_trials` (integer field). v1 never ships it; was
aspirational.
3. Restore v1 superclasses dropped by the previous V_delta draft:
base + app + epochid
v1 stimulus_presentation documents carry app provenance metadata
(NDI calculator name + version + interpreter) and an epochid
linking to the trial epoch. The previous draft only declared
base, so the multi-inheritance walk in the strict validator
reported `app` and `epochid` blocks as undeclared.
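The multi-inheritance walk can be illustrated with a short Python sketch. It is a simplified stand-in for the strict validator, assuming a document is a dict holding only class blocks and a schema registry maps each class name to its `superclasses` list; both shapes and the function name are hypothetical.

```python
def undeclared_blocks(document, classname, schemas):
    """Return blocks in the document that the class chain never declares."""
    declared = set()
    stack = [classname]
    while stack:
        name = stack.pop()
        if name in declared:
            continue
        declared.add(name)
        # Walk every superclass, not just the first (multi-inheritance).
        stack.extend(schemas[name].get("superclasses", []))
    return sorted(block for block in document if block not in declared)


# A v1-style stimulus_presentation document with its inherited blocks.
example_doc = {"base": {}, "app": {}, "epochid": {}, "stimulus_presentation": {}}

# Previous draft: only base declared as a superclass.
draft_schemas = {
    "base": {}, "app": {}, "epochid": {},
    "stimulus_presentation": {"superclasses": ["base"]},
}
# After this change: base + app + epochid.
restored_schemas = dict(
    draft_schemas,
    stimulus_presentation={"superclasses": ["base", "app", "epochid"]},
)

before = undeclared_blocks(example_doc, "stimulus_presentation", draft_schemas)
after = undeclared_blocks(example_doc, "stimulus_presentation", restored_schemas)
```

Under these assumptions `before` lists the `app` and `epochid` blocks and `after` is empty, which is the failure and the fix described in item 3.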
`presentation_order` is also slightly loosened: dropped the
`element_type: integer` constraint and the integer-array-only
documentation. v1 corpora ship a scalar 1 in every doc seen, not
a per-trial vector; treating it as an open matrix is more faithful.
Projection delta (Python simulator) on the four corpora that ship
stimulus_presentation docs:
20211116 542 -> 553 migrated (+11)
B 4222 -> 5464 migrated (+1242)
Dab 12859 -> 14101 migrated (+1242)
Soph 37174 -> 37349 migrated (+175)
Total: 2670 stimulus_presentation docs now migrate cleanly under
strict validation. No paired did-matlab change required.
Same pattern as stimulus_presentation: v1 ships 7 content fields
plus an `app` superclass, V_delta drafted 4 different fields and
declared only base as a superclass. Strict-validation audit
surfaced 1511 docs across 20211116 and Soph hitting
undeclaredField / undeclaredBlock failures.
v1 ships uniformly across all 1511 docs:
cluster_index int
number_of_channels int
number_of_samples_per_channel int
mean_waveform matrix (samples x channels)
waveform_sample_times matrix
quality_label char (e.g., 'good', 'multi')
quality_number int (sorter-specific grade)
[superclasses: base, app]
V_delta now declares all of these and lists `app` as a superclass
so v1's app block (provenance: ndi.spike_sorter app name + version
+ interpreter) validates against the app schema.
Dropped from the previous V_delta draft (no v1 source):
quality (replaced by quality_label, which is the v1 spelling)
num_spikes (derivable from waveform/spiketimes data downstream)
mean_firing_rate (derivable; was annotated as a `frequency`
composite candidate but no v1 doc ships it)
Projection delta (Python simulator):
20211116 553 -> 574 migrated (+21 neuron_extracellular)
Soph 37349 -> 38839 migrated (+1490 neuron_extracellular)
Total 1511 neuron_extracellular docs now migrate cleanly under
strict validation. No paired did-matlab change required.
Same pattern as stimulus_presentation / neuron_extracellular.
Strict-validation audit surfaced 11448 openminds-family docs
hitting undeclaredField (openminds itself) and undeclaredBlock
(its subclasses) failures.
v1 design:
openminds (8 docs, JH)
fields: openminds_type (URL IRI),
matlab_type (MATLAB class name),
openminds_id (instance IRI),
fields (open-shape struct, openminds-type-specific)
openminds_subject (10401 docs, Dab + JH)
block: {} -- empty; superclass=[base, openminds]
openminds_element (404 docs, Dab)
block: {} -- empty; superclass=[base, openminds]
openminds_stimulus (635 docs, Dab)
block: {} -- empty; superclass=[base, openminds, epochid]
Changes:
openminds -- field set rewritten to match v1 verbatim:
+ matlab_type, openminds_id, fields (struct, open)
= openminds_type (was already there; doc string
updated to reflect the full IRI v1 ships rather
than the short `core.Person` style note)
- openminds_data, openminds_version (no v1 source)
openminds_subject -- add `openminds` as superclass.
openminds_element -- add `openminds` as superclass.
openminds_stimulus -- add `openminds` and `epochid` as superclasses.
Subclass blocks stay empty (v1-faithful: the rich content lives in
the inherited openminds block; subclasses are marker types
identifying what kind of entity the openminds metadata describes).
Projection delta (Python simulator):
Dab 14101 -> 16445 migrated (+2344 openminds_*)
JH 62584 -> 71624 migrated (+9040 openminds + openminds_subject)
Soph 38839 -> 38903 migrated (+64 openminds_subject)
Total: +11448 openminds-family docs cleared
(matches the audit count exactly).
…_calc inherits it
The biggest remaining strict-validation cluster: tuningcurve_calc
documents in 20211116 (84) and Soph (34606) carry a v1
`stimulus_tuningcurve` superclass block that V_delta did not
declare. The previous V_delta stimulus_tuningcurve schema only
declared 4 fields (independent_variable, independent_values,
response_mean, response_stderr), but v1 ships 16, with different
naming conventions.
v1 stimulus_tuningcurve uniformly ships across 34690 docs:
independent_variable_label string array
independent_variable_value matrix (N_stim x N_var)
stimid matrix integer
response_mean matrix
response_stddev matrix
response_stderr matrix
response_units string array
individual_responses_real matrix (N_trial x N_stim)
individual_responses_imaginary matrix (N_trial x N_stim)
stimulus_presentation_number matrix integer
control_stimid matrix integer
control_response_mean matrix
control_response_stddev matrix
control_response_stderr matrix
control_individual_responses_real matrix (N_trial x N_stim)
control_individual_responses_imaginary matrix (N_trial x N_stim)
Changes:
stimulus_tuningcurve -- rewritten to declare all 16 v1 fields
verbatim. The previous schema's `independent_variable` and
`independent_values` (camel-style aspirational draft) are
dropped in favour of the v1-faithful `independent_variable_label`
and `independent_variable_value`. response_mean and response_stderr
are preserved as-is (already matched). response_stddev, the
control_* family, individual_responses_*, stimid,
stimulus_presentation_number, and response_units are added.
tuningcurve_calc.superclasses += stimulus_tuningcurve. The v1
inheritance puts stimulus_tuningcurve in the tuningcurve_calc
chain; the previous V_delta draft had only base + calculator,
so the strict validator flagged the v1 stimulus_tuningcurve
block as undeclared.
Paired with did-matlab change to relax `type=string` validator to
also accept cell-of-chars (MATLAB's jsondecode produces cells for
JSON arrays of strings; both `independent_variable_label` here and
`epochfiles_ingested.files` need that).
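For the Python-simulator side, the relaxed check might look like the sketch below. It is an analogue only: the real change lives in the did-matlab validator, a decoded JSON array of strings shows up in Python as a list rather than a MATLAB cell, and the function name and signature are hypothetical.

```python
import json


def is_valid_string_value(value, must_be_scalar):
    """Accept a single string, or a list of strings when not scalar."""
    if isinstance(value, str):
        return True
    if must_be_scalar:
        return False
    # Relaxation: a decoded JSON array of strings (the Python analogue
    # of MATLAB's cell-of-chars) passes when every entry is a string.
    return isinstance(value, (list, tuple)) and all(
        isinstance(entry, str) for entry in value
    )


# Placeholder file names, for illustration only.
decoded_files = json.loads('["epoch1.dat", "epoch1.meta"]')
files_ok = is_valid_string_value(decoded_files, must_be_scalar=False)
mixed_ok = is_valid_string_value(["epoch1.dat", 3], must_be_scalar=False)
```

The check stays strict about element types, so a list that mixes strings and numbers still fails.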
Projection delta on the discovery corpora (Python simulator):
20211116 574 -> 658 migrated (+84 tuningcurve_calc)
B 5464 -> 7948 migrated (+2484 epochfiles_ingested)
Dab 16445 -> 20533 migrated (+4088 epochfiles_ingested
+ co-resident classes)
Soph 38903 -> 73858 migrated (+34955 tuningcurve_calc +
epochfiles_ingested)
JH 71624 unchanged (no tuningcurve_calc / epochfiles_ingested
affected here; JH's remaining quarantines
are image_stack and a few small clusters)
Single largest schema cleanup so far: ~42K docs cleared across four
corpora.
Strict-validation audit surfaced 7007 image_stack docs in JH all
failing on undeclaredField image_stack.label. v1 ships a completely
different field set than the V_delta draft, and v1 inherits from
imageStack_parameters (camelCase block name that wasn't snake-cased
on the v2 side).
v1 imageStack ships uniformly:
label (char) free-text caption
formatOntology (char) ontology CURIE classifying the format
[superclasses: base, imageStack_parameters]
v1 imageStack_parameters ships uniformly:
dimension_order (char) axis order, one char per axis
dimension_labels (char) comma-separated per-axis labels
dimension_size (matrix) per-axis pixel/sample counts
dimension_scale (matrix) per-axis physical scale
dimension_scale_units (char) comma-separated per-axis units
data_type (char) pixel data type ("uint16", etc.)
data_limits (matrix) [min, max] pixel range
timestamp (double) acquisition timestamp
clocktype (char) clock identifier
V_delta drafts had unrelated fields:
image_stack: num_frames / x_pixels / y_pixels / image_format
image_stack_parameters: z_step / z_units / x_pixel_size / y_pixel_size
/ pixel_units
Rewrites both to match v1 verbatim and adds image_stack_parameters
as a superclass of image_stack so the v1 inheritance is honoured.
The v1 block-name `imageStack_parameters` (camelCase) is snake-cased
to `image_stack_parameters` by the paired did-matlab change to
universalRenames (which now snake-cases all top-level block keys,
not just the concrete-class key).
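The top-level snake-casing can be sketched in Python as below. This illustrates the renaming rule only and is not the universalRenames code; the function names are hypothetical and the regex is one possible way to split camelCase.

```python
import re


def to_snake(key):
    # Insert "_" at each lower/digit -> upper boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key).lower()


def snake_case_block_keys(document):
    """Snake-case every top-level block key; inner fields are untouched."""
    return {to_snake(key): block for key, block in document.items()}


# Placeholder field values, for illustration only.
v1_doc = {
    "base": {},
    "imageStack": {"label": "example", "formatOntology": "EX:1"},
    "imageStack_parameters": {"dimension_order": "YXZ"},
}
renamed = snake_case_block_keys(v1_doc)
```

Only block names change: `formatOntology` inside the block keeps its v1 spelling, consistent with the field list above.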
Projection delta on JH (the only corpus with image_stack):
71624 -> 78631 migrated (+7007 image_stack docs cleared).
PRED stays 14/14; other corpora unchanged (no image_stack docs).
Two strict-validation clusters resolved together:
1. stimulus_response_scalar_parameters_basic field rewrite
v1 ships uniformly across 11440 docs:
temporalfreqfunc (char)
freq_response (integer 0/1)
prestimulus_time (matrix)
prestimulus_normalization (matrix)
isspike (integer 0/1)
spiketrain_dt (double)
[superclasses: base, stimulus_response_scalar_parameters]
V_delta drafted:
response_window_start, response_window_end, freq_response
Rewritten to declare all 6 v1 fields verbatim; dropped the
aspirational response_window_* fields. Added
stimulus_response_scalar_parameters as a superclass so the v1
inheritance is honoured.
2. spatial_frequency_tuning / temporal_frequency_tuning rename
V_delta drafts both had a `fit_sgauss` field; v1 corpora
uniformly use `fit_gausslog`. Same field, drafted under a
shorter spelling that v1 never adopted. Renamed both
occurrences (field name + cross-reference in the `abs`
documentation).
Projection delta on the corpora that carry these classes:
20211116 658 -> 931 migrated (+273 stimulus_response_*_basic)
Soph 73858 -> 90629 migrated (+16771 spatial_/temporal_calc
+stimulus_response_*_basic)
~17K docs cleared. JH stays at 78631 migrated with 57 still
quarantined; B and Dab are unchanged (no spatial/temporal calc docs
in those).
Brings all 6 v1 corpora (PRED, 20211116, B, Dab, JH, Soph; 221,827
docs total) to 100% clean migration with zero quarantines.
Per-class changes (all to match v1-faithful shapes the corpora
actually ship):
- stimulus_response: replace `response_type` with the two v1 fields
`stimulator_epochid` and `element_epochid` (response_type already
lives on the child stimulus_response_scalar block).
- stimulus_response_scalar: pre-existing class header; no field
change.
- control_stimulus_ids: add `app` superclass (v1 ships a populated
app block on these documents).
- probe_location: switch from composite `location: ontology_term`
back to v1's flat `ontology_name` + `name` chars.
- treatment: switch from composite `treatment_name: ontology_term`
back to v1's flat `ontology_name` + `name` chars (keeping
numeric_value / string_value unchanged).
- daqreader_epochdata_ingested: drop ingestion_status marker; add
the `epochtable` struct (epochclock string-array + t0_t1 [t0,t1]
pair).
- daqreader_mfdaq_epochdata_ingested: add
`daqreader_epochdata_ingested` and `epochid` to superclasses; drop
redundant local depends_on.
- daqmetadatareader_epochdata_ingested: add `epochid` superclass.
- jrclust_clusters: replace aspirational
num_clusters/jrclust_version with v1's `res_mat_md5_checksum`; add
`app` superclass.
- dataset_remote: add `organization_id` field.
- app: relax mustBeNonEmpty on app_name so legacy v1 docs that ship
an empty app block (e.g. some jrclust_clusters) still validate.
Validation: pytest 96/96 green.
Driven by the B-corpus discovery (12,917 v1 docs, 18 classes, 8708 quarantined under the previous schemas). Every quarantined class is addressed here on the schema side; no paired did-matlab change is required.
Design point
Confirmed with the maintainer: V_delta documents are immutable. "Did this happen?" tracking fields are therefore redundant: the existence of the document is the state. Five required fields that violated this principle are dropped.
Dropped fields
`epochfiles_ingested` | `ingestion_status` (required), `num_files_ingested` (derived)
`daqreader_mfdaq_epochdata_ingested` | `ingestion_status` (required)
`daqmetadatareader_epochdata_ingested` | `ingestion_status` (required)
`syncrule_mapping` | `mapping_status` (required)
`dataset_remote` | `remote_url` (required) | `dataset_id` + `organization_id`
After the drops, these classes either become marker-style records with zero own fields, or retain only their genuinely-content fields (`syncrule_mapping` keeps `mapping_data`; `dataset_remote` keeps `remote_type` and `dataset_id`).
`session_in_a_dataset` restored to v1 shape
The earlier V_delta draft had stripped this class down to a required `dataset_id` (char) plus a `depends_on[session_id]`. Both are wrong:
- `dataset_id` is intrinsically `base.session_id`. In v1's storage model, the document is inside a dataset, and the dataset's identity is its session-id, which lives on every contained document's `base.session_id`. So `dataset_id` was duplicating `base.session_id` without new information.
- v1 carries `session_id` inline as a property-block field, not as a `depends_on` edge.
- v1 also carries `session_reference` (recording-date label), `is_linked`, `session_creator` (e.g., "ndi.session.dir"), and `session_creator_input1..6`, which the earlier draft had stripped.
The rebuilt schema mirrors v1 exactly: `session_id` (did_uid, required, inline) + `session_reference` + `is_linked` + `session_creator` + `session_creator_input1..6`.
Verification
`index.json` and `topics.json` need no edits (they only carry class-level metadata, not field defs).
Projected impact on the discovery corpora (via did-matlab simulator)
Coordination
No paired did-matlab PR needed. The converter's universal-rename pass plus the dispatcher empty-block pad already handle every v1 doc that previously quarantined here. Adding a third corpus fixture (`B.zip`) to did-matlab's discovery tests is a separate follow-up on that side.
Out of scope (still pending)
`did2.validate.references` for `depends_on` referential integrity at DB-ingest time.
Generated by Claude Code