
V_delta: drop redundant status fields; restore v1 session_in_a_dataset shape#44

Merged
stevevanhooser merged 16 commits into main from claude/did-matlab-v2-import-Rs8AX
May 17, 2026


@stevevanhooser

Driven by the B-corpus discovery (12,917 v1 docs, 18 classes, 8708 quarantined under the previous schemas). Every quarantined class is addressed here on the schema side — no paired did-matlab change required.

Design point

Confirmed with the maintainer: V_delta documents are immutable. "Did this happen?" tracking fields are therefore redundant: the existence of the document is the state. The five required fields that violated this principle are dropped, along with one derived count (num_files_ingested).

Dropped fields

| schema | field dropped | reason |
|---|---|---|
| epochfiles_ingested | ingestion_status (required), num_files_ingested (derived) | presence = ingested |
| daqreader_mfdaq_epochdata_ingested | ingestion_status (required) | presence = ingested |
| daqmetadatareader_epochdata_ingested | ingestion_status (required) | presence = ingested |
| syncrule_mapping | mapping_status (required) | presence = mapped |
| dataset_remote | remote_url (required) | not needed; v1 has dataset_id + organization_id |

After the drops, these classes either become marker-style records with zero own fields, or retain only their genuinely-content fields (syncrule_mapping keeps mapping_data; dataset_remote keeps remote_type and dataset_id).
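
The "presence = ingested" rule can be illustrated with a minimal Python sketch. It assumes documents are plain dicts carrying a `document_class` key and an `epoch_id` key; both key names and the `is_ingested` helper are hypothetical and are not part of did-schema or did-matlab.

```python
def is_ingested(documents, class_name, epoch_id):
    """Return True if a document of class_name exists for epoch_id.

    With immutable documents there is no status flag to inspect:
    the existence of the document is the state.
    """
    return any(
        doc.get("document_class") == class_name and doc.get("epoch_id") == epoch_id
        for doc in documents
    )


# Hypothetical marker-style record: no status field at all.
docs = [{"document_class": "epochfiles_ingested", "epoch_id": "t00001"}]
```

A caller asks whether the marker exists rather than reading a status value, so there is no field that can drift out of sync with reality.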

session_in_a_dataset restored to v1 shape

The earlier V_delta draft had stripped this class down to a required dataset_id (char) plus a depends_on[session_id]. Both choices were wrong, and the draft also lost v1 metadata:

  • dataset_id is intrinsically base.session_id. In v1's storage model, the document is inside a dataset, and the dataset's identity is its session-id, which lives on every contained document's base.session_id. So dataset_id was duplicating base.session_id without new information.
  • v1 stored session_id inline as a property-block field, not as a depends_on edge.
  • v1 also carried session-reconstitution metadata that V_delta had dropped: session_reference (recording-date label), is_linked, session_creator (e.g., "ndi.session.dir"), and session_creator_input1..6.

The rebuilt schema mirrors v1 exactly: session_id (did_uid, required, inline) + session_reference + is_linked + session_creator + session_creator_input1..6.
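
As a rough illustration of the rebuilt shape, the following Python sketch checks a property block against the field list above. The field names come from this description; the checker itself, and the assumption that every field other than session_id is optional, are hypothetical.

```python
# Field names are taken from the PR text; optionality of the
# non-session_id fields is an assumption made for this sketch.
REQUIRED = {"session_id"}
OPTIONAL = {"session_reference", "is_linked", "session_creator"} | {
    f"session_creator_input{i}" for i in range(1, 7)
}


def check_session_in_a_dataset(block):
    """Return a list of problems found in a property block (a dict)."""
    keys = set(block)
    problems = [f"missing:{name}" for name in sorted(REQUIRED - keys)]
    problems += [f"undeclared:{name}" for name in sorted(keys - REQUIRED - OPTIONAL)]
    return problems
```

Under this sketch a block that still carries the old `dataset_id` field is reported as undeclared, which is the behaviour the restored v1 shape implies.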

Verification

  • Meta-schema validation passes on all six touched schemas.
  • Existing pytest suite stays at 96/96 passing (V_beta + V_gamma coverage).
  • index.json and topics.json need no edits (they only carry class-level metadata, not field defs).

Projected impact on the discovery corpora (via did-matlab simulator)

| corpus | before | after |
|---|---|---|
| PRED | 14/14 | 14/14 |
| 20211116 | 1220/1220 | 1220/1220 |
| B | 4209/12917 | 12917/12917 |

Coordination

No paired did-matlab PR needed. The converter's universal-rename pass plus the dispatcher empty-block pad already handle every v1 doc that previously quarantined here. Adding a third corpus fixture (B.zip) to did-matlab's discovery tests is a separate follow-up on that side.

Out of scope (still pending)

  • did2.validate.references for depends_on referential integrity at DB-ingest time.
  • Step 7 NDI-matlab port.

Generated by Claude Code

claude added 16 commits May 15, 2026 18:41
…t shape

Driven by the B-corpus discovery report (12,917 v1 docs across 18
classes, 8708 quarantined under the previous schemas). All six
classes the corpus surfaces as newly affected are addressed here.

Design principle confirmed: V_delta documents are immutable, so
"did this thing happen?" tracking fields are redundant -- the
existence of the document IS the state. Five required fields that
violated this are dropped.

Dropped fields:

  epochfiles_ingested.ingestion_status            (required)
  epochfiles_ingested.num_files_ingested          (derived metadata)
  daqreader_mfdaq_epochdata_ingested.ingestion_status   (required)
  daqmetadatareader_epochdata_ingested.ingestion_status (required)
  syncrule_mapping.mapping_status                 (required)
  dataset_remote.remote_url                       (required)

After the drops these classes either have zero own fields
(marker-style records whose presence is the entire signal) or
retain only their genuinely-content fields (e.g.,
syncrule_mapping keeps `mapping_data`; dataset_remote keeps
`remote_type` and `dataset_id`).

session_in_a_dataset rebuilt to mirror v1 verbatim:

  before: required `dataset_id` field + required depends_on[session_id]
  after:  session_id (did_uid, required, inline) + session_reference
          + is_linked + session_creator + session_creator_input1..6

The "dataset_id" concept is intrinsically `base.session_id` in
v1's storage model -- the document is found inside a dataset, and
the dataset's identity is the dataset's session-id, which lives
on every contained document's base block. So the V_delta-only
`dataset_id` field was duplicating base.session_id without adding
new information. Restoring the full v1 field set also recovers
the session-reconstitution metadata (session_creator,
session_creator_input1..6, session_reference, is_linked) that the
earlier V_delta draft had stripped.

Projection on the three discovery corpora (Python simulator):

  PRED        14/14    migrated (no change)
  20211116  1220/1220  migrated (no change)
  B        12917/12917 migrated (was 4209/12917)

No paired did-matlab change is required: the converter's
universal-rename pass plus the dispatcher empty-block pad already
handle every v1 doc that previously quarantined here. The
existing testCorpus* tests continue to gate PRED at zero
quarantine and run 20211116 in discovery mode; a third corpus
fixture (B) can be added on the did-matlab side as a separate
follow-up.

Driven by the JH corpus surfacing the multi-clock case: v1
documents that record the same element epoch in multiple clock
frames (e.g., `epoch_clock = "dev_local_time,exp_global_time"`,
`t0_t1 = [[a,b],[c,d]]`). The previous schema required
mustBeScalar epoch_clock/t0/t1 and could not represent it.

Replaces the three flat scalar fields with a single
`element_epoch.clocks` array-of-records:

  clocks(i).name   char    required   per-clock identifier
  clocks(i).t0     double  required   start time in clocks(i) units
  clocks(i).t1     double  required   stop  time in clocks(i) units

Single-clock documents (PRED, 20211116, B corpora) migrate to a
1-element array; multi-clock JH documents migrate to a 2+ element
array.

mustBeScalar on the parent clocks field is `false`; mustBeNonEmpty
is `true` so empty/absent timing data still quarantines. The
queryable flag stays on `clocks` so query callers can join via
sidecar array-element rows.
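
The flat-to-array conversion described above could look roughly like this in Python. This is a sketch only: the `to_clocks` helper is hypothetical, and it assumes `epoch_clock` is a comma-separated string and `t0_t1` is either one [t0, t1] pair or a list of such pairs.

```python
def to_clocks(epoch_clock, t0_t1):
    """Convert v1 flat timing fields into a list of clock records.

    epoch_clock: comma-separated clock names,
                 e.g. "dev_local_time,exp_global_time"
    t0_t1:       one [t0, t1] pair, or a list of pairs (one per clock)
    """
    names = [n.strip() for n in epoch_clock.split(",") if n.strip()]
    # A single-clock document may carry a bare [t0, t1] pair.
    if t0_t1 and not isinstance(t0_t1[0], (list, tuple)):
        pairs = [t0_t1]
    else:
        pairs = list(t0_t1)
    # Mirrors mustBeNonEmpty: empty or mismatched timing data is rejected.
    if not names or len(names) != len(pairs):
        raise ValueError("clock names and t0_t1 pairs must be non-empty and match")
    return [
        {"name": n, "t0": float(p[0]), "t1": float(p[1])}
        for n, p in zip(names, pairs)
    ]
```

A single-clock document yields a 1-element list and a multi-clock document yields one record per clock, matching the migration rule stated above.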

No paired ngrid/epochclocktimes changes here. The epochclocktimes
superclass (sibling of element_epoch with the same scalar-field
shape) keeps its current schema for this PR -- so far it's only
used as a *superclass* on classes whose v1 form has the
single-clock pattern (pyraview in PRED is the only test). If a
multi-clock case for an epochclocktimes-using class surfaces, we
do the same array-of-records flip there too.

Projection on the PRED, 20211116, B, and JH corpora after this commit
(paired with the did-matlab migrator update):
  PRED        14/14    migrated  (no change)
  20211116  1220/1220  migrated  (no change)
  B        12917/12917 migrated  (no change)
  JH        67172/78688 migrated (was 63016; element_epoch +4156)

The JH corpus surfaces 2078 v1 position_metadata documents, all
purely descriptor data (no x/y/z values, no files block). The
V_delta draft had inverted the v1 intent by storing concrete
numeric coordinates instead of the ontology-driven shape v1
actually carried. Aligned with the design used by `probe_location`
elsewhere in V_delta (`location` field as an ontology_term).

Rewrites position_metadata to mirror the v1 shape:

  measurement (ontology_term)  required
    The v1 `ontologyNode` CURIE verbatim, paired with a human-
    readable name resolved via ndi.ontology.lookup. Classifies
    *what kind of position* the linked element records (e.g., a
    midpoint position, a probe tip).

  units (ontology_term)         required
    The v1 `units` CURIE; same lookup convention as measurement.

  dimensions (structure-array)  optional
    One record per spatial axis. Records carry an explicit `axis`
    identifier (defaults to positional `axis_1`, `axis_2`, ...) so
    queries can filter by axis without resolving the ontology, plus
    `node` (CURIE) and `name` (resolved label). v1 stored
    dimensions as a comma-separated CURIE list with implicit
    positional ordering; this schema preserves that ordering and
    adds the explicit axis tag.

Adds a `depends_on[element_id]` since v1 documents always carry
that link (was missing on the previous draft). Migrator-side
changes land in the paired did-matlab commit.
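
The `dimensions` conversion can be sketched in Python as follows. The `to_dimensions` helper and the `lookup` callable are hypothetical; `lookup` stands in for ndi.ontology.lookup, whose real signature is not shown in this PR, and the CURIEs in the usage example are placeholders.

```python
def to_dimensions(curie_list, lookup):
    """Split a comma-separated CURIE list into per-axis records.

    Positional order is preserved and made explicit through the
    `axis` tag (axis_1, axis_2, ...).
    """
    nodes = [c.strip() for c in curie_list.split(",") if c.strip()]
    return [
        {"axis": f"axis_{i}", "node": node, "name": lookup(node)}
        for i, node in enumerate(nodes, start=1)
    ]


# Placeholder ontology table, for illustration only.
labels = {"EX:0001": "first axis", "EX:0002": "second axis"}
dims = to_dimensions("EX:0001, EX:0002", labels.get)
```

Because each record carries its own `axis` tag, a query can select `axis_2` directly instead of relying on list position.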

Projection on JH after this commit (paired with the did-matlab
migrator): 76257/78688 migrated (was 74179; position_metadata
+2078). Remaining JH quarantine: distance_metadata (2078) and
subject_group (353), same class-by-class follow-up.

The JH corpus surfaces 2078 v1 distance_metadata documents whose
content is paired A/B endpoint metadata, not a scalar distance.
Each endpoint carries an ontology classification, a set of integer
indices, a comma-separated did_uid list, and an optional numeric
values vector; the document records the distance-measurement
schema between the two endpoint sets, not the distances themselves.

Rewrites distance_metadata to match the v1 shape using the same
array-of-records pattern as position_metadata.dimensions:

  endpoints: structure-array
    endpoints(i).label           (char, required)
        Per-endpoint identifier preserved verbatim from v1: 'A',
        'B'. Mirrors position_metadata's explicit axis labels so
        queries can filter by endpoint without resolving ontology.

    endpoints(i).measurement     (ontology_term, required)
        v1 ontologyNode_X CURIE + ndi.ontology.lookup name.

    endpoints(i).integer_ids     (matrix of integer, optional)
        v1 integerIDs_X verbatim. Often a scalar for endpoint A,
        a multi-element vector for endpoint B.

    endpoints(i).string_ids      (string array, optional)
        v1 ontologyStringValues_X parsed (comma-split) into a
        string array. Each entry is typically a did_uid pointing
        at another document.

    endpoints(i).numeric_values  (matrix of double, optional)
        v1 ontologyNumericValues_X. Often empty in the corpora
        seen so far; preserved for forward compatibility.

  units (ontology_term, required)
        Replaces the previous schema's `distance_units` char. v1
        already stored this as a CURIE.

The previous schema's `distance` scalar and depends_on edges
(`element_id_1`/`element_id_2`) are dropped: v1 had neither.
depends_on becomes a single `element_id` matching the v1 idiom.
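
A possible shape for the A/B endpoint conversion, sketched in Python. The v1 field names (ontologyNode_X, integerIDs_X, ontologyStringValues_X, ontologyNumericValues_X) come from the description above; the helpers, the `lookup` callable (a stand-in for ndi.ontology.lookup), and the `{node, name}` layout of an ontology_term are assumptions.

```python
def _as_list(value):
    """Wrap a scalar in a list; copy lists and tuples unchanged."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def to_endpoints(v1_block, lookup):
    """Build the endpoints array-of-records from a v1 property block."""
    endpoints = []
    for label in ("A", "B"):
        node = v1_block[f"ontologyNode_{label}"]
        raw_ids = v1_block.get(f"ontologyStringValues_{label}", "")
        endpoints.append({
            "label": label,
            "measurement": {"node": node, "name": lookup(node)},
            "integer_ids": _as_list(v1_block.get(f"integerIDs_{label}", [])),
            # v1 stores the did_uid list as one comma-separated string.
            "string_ids": [s.strip() for s in raw_ids.split(",") if s.strip()],
            "numeric_values": _as_list(
                v1_block.get(f"ontologyNumericValues_{label}", [])
            ),
        })
    return endpoints
```

The scalar-or-vector handling reflects the note that endpoint A often carries a scalar integer ID while endpoint B carries a vector.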

Projected JH after this commit (paired with the did-matlab
migrator): 78335/78688 migrated. Remaining: subject_group (353)
and Dab stimulus_bath (1605), unchanged.

The JH corpus surfaces 353 v1 subject_group documents whose
property block is universally empty (`{}`) and whose base.name is
also universally empty -- the class is a pure relational marker.
Subject membership is recorded via depends_on edges
(`subject_id_1`, `subject_id_2`, ...). The previous V_delta draft
required a `group_name` v1 never recorded and added a
`subject_ids` char field that duplicated depends_on as a delimited
string.

Changes:

  - `group_name` mustBeNonEmpty: true -> false. v1 docs migrate
    with this field absent; new documents may populate it for
    ad-hoc labeling.
  - `description` unchanged (already optional).
  - `subject_ids` field removed entirely. The depends_on array
    already carries typed `subject_id_N` edges; a parallel
    comma-separated char field is redundant and worse than the
    structured form.

No migrator change needed. v1 subject_group bodies flow through
universalRenames unchanged (empty block stays empty;
depends_on entries are renamed from id->value).
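
The id->value rename on depends_on entries might be sketched like this in Python. It is an illustration only: the real universalRenames pass lives in did-matlab and is not shown here, and the `name` key on each entry is an assumption.

```python
def rename_depends_on(entries):
    """Rename the 'id' key of each depends_on entry to 'value'.

    All other keys are passed through unchanged.
    """
    return [
        {("value" if key == "id" else key): val for key, val in entry.items()}
        for entry in entries
    ]


# Hypothetical v1 entries for a two-member subject_group.
v1_edges = [
    {"name": "subject_id_1", "id": "abc"},
    {"name": "subject_id_2", "id": "def"},
]
v2_edges = rename_depends_on(v1_edges)
```

An empty property block needs no handling at all under this scheme, which is consistent with the statement that no migrator change is needed.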

Projection on JH after this commit: 78688/78688 migrated.
Remaining quarantine: only Dab stimulus_bath (1605).

Driven by the Dab corpus (1605 v1 stimulus_bath documents). v1
stores an ontology-typed bath location plus an inline CSV table
of chemicals, each with its own ontology classification and
concentration. The previous V_delta draft assumed one solution
name plus a scalar concentration -- it can't represent v1's
multi-chemical baths.

## New composite type: concentration

The other SI composites (duration / volume / mass / length /
voltage / current / frequency) all share one canonical sub-field
(meters, volts, ...) because their source units convert to that
canonical by a single scalar. Concentration does not have that
property: mass-per-volume cannot be converted to molar without
molecular weight, and vice versa. Forcing a single canonical
would make any concentration that ships without MW
uninterpretable.

`concentration` therefore has multiple OPTIONAL canonical
sub-fields, with the migrator populating whichever the source
unit is computable into:

  molar           (double, opt)   mol/L
  grams_per_liter (double, opt)   mass/volume
  mass_fraction   (double, opt)   w/w (dimensionless 0-1)
  volume_fraction (double, opt)   v/v (dimensionless 0-1)
  approximate     (boolean)
  source_unit     (char)          verbatim source unit text
  source_value    (double)        verbatim source value

Added to the meta-schema type enum and described in the top-level
meta-schema documentation alongside the existing SI composites.
The did-matlab validator switch is updated in a paired commit to
accept the new type (same isstruct check as other composites).
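
One way to picture the "populate whichever canonical the source unit allows" rule is the Python sketch below. The sub-field names match the list above; the unit table, its spellings, and the `to_concentration` helper are hypothetical and do not describe the actual migrator.

```python
# Hypothetical unit table: source unit -> (canonical sub-field, scale).
UNIT_TABLE = {
    "M": ("molar", 1.0),
    "mM": ("molar", 1e-3),
    "uM": ("molar", 1e-6),
    "g/L": ("grams_per_liter", 1.0),
    "mg/mL": ("grams_per_liter", 1.0),  # 1 mg/mL equals 1 g/L
    "% w/w": ("mass_fraction", 0.01),
    "% v/v": ("volume_fraction", 0.01),
}


def to_concentration(value, unit):
    """Build a concentration record from a source value and unit.

    Only the canonical sub-field that the unit converts into by a
    single scalar is populated; no molecular weight is assumed.
    """
    record = {
        "approximate": False,
        "source_unit": unit,
        "source_value": float(value),
    }
    if unit in UNIT_TABLE:
        field, scale = UNIT_TABLE[unit]
        record[field] = float(value) * scale
    return record
```

An unrecognised unit still yields a record with the verbatim source value and unit, so nothing is lost even when no canonical sub-field can be computed.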

## stimulus_bath redesign

Replaces solution_name/concentration/concentration_units with:

  super: [base, epochid]          (epochid restored to match v1)
  depends_on: [stimulus_element_id]   (renamed from element_id)

  fields:
  * location  (ontology_term, REQ)         the bath itself
    mixture   (structure, array-of-records):
                chemical (ontology_term, REQ)
                amount   (concentration, opt)

The migrator parses the v1 CSV mixture_table (header row +
chemicals), each row producing a record with chemical
(node+name from v1 ontologyName/name) and amount
(concentration composite from v1 value/unitName).
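
The CSV parsing step could be sketched in Python as follows. The column names (ontologyName, name, value, unitName) are taken from the description above; the `parse_mixture_table` helper is hypothetical, the sample row is a placeholder, and for brevity the `amount` record here carries only the verbatim source fields.

```python
import csv
import io


def parse_mixture_table(csv_text):
    """Parse a v1 mixture_table (header row + one row per chemical)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "chemical": {"node": row["ontologyName"], "name": row["name"]},
            "amount": {
                "source_value": float(row["value"]),
                "source_unit": row["unitName"],
            },
        }
        for row in rows
    ]


# Placeholder table, for illustration only.
table = "ontologyName,name,value,unitName\nEX:0001,example salt,5,mM\n"
mixture = parse_mixture_table(table)
```

Each CSV row becomes one record in `mixture`, which is what lets a multi-chemical bath be represented.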

Projection after this commit (paired with did-matlab migrator):

  PRED         14/14    migrated  unchanged
  20211116   1220/1220  migrated  unchanged
  B        12917/12917  migrated  unchanged
  JH       78688/78688  migrated  unchanged
  Dab      27561/27561  migrated  (was 25956; +1605 stimulus_bath)

All five discovery corpora round-trip clean.

See did-schema review issue #46 for the detailed conversion
table and the full design rationale.

The Soph corpus surfaces a single v1 metadata_editor document
whose content does not match the previous V_delta draft. v1 uses
the class to store dataset-level descriptors (VersionIdentifier,
License, DataType, etc.); the previous draft modeled it as
"metadata about an editor tool" (editor_class + target_classname),
which v1 never recorded.

Rewrites metadata_editor to match v1 exactly:

  super: [base]
  fields:
    metadata_structure (structure, optional)

The single open-shape `metadata_structure` field carries the
arbitrary key/value pairs v1 stored; keys and inner shape are
intentionally not constrained by V_delta since they vary by editor
and dataset. The dropped `editor_class` and `target_classname`
fields had no v1 source.

Projection on Soph after this commit: 101427/101427 migrated.
All six discovery corpora round-trip clean.

Driven by the strict-validation audit (did-matlab commit aaf5529):
every v1 tuningcurve_calc body in the discovery corpora ships two
fields V_delta did not declare, so the data was landing in the
property block as undeclared extras and the strict validator
quarantined them with did2:validation:undeclaredField.

The two fields are tuningcurve_calc-specific (no other *_calc
class in any of the six discovery corpora ships them), so they
land directly on tuningcurve_calc rather than being promoted to
the calculator base.

  tuningcurve_calc.log               (char, optional)
    Free-text log entry summarising the calculation, e.g.
    'angle best value is 135.', 'sFrequency = 0.04'.

  tuningcurve_calc.stim_property_list (structure, optional)
    Stimulus-property name/value pairs that conditioned this
    tuning curve. v1 shape preserved verbatim:
      names  (string array, optional) - e.g. {'sFrequency'}
      values (matrix, optional)       - corresponding scalar/array

Cleared corpus quarantines:
  Soph        34606 tuningcurve_calc docs
  20211116       84 tuningcurve_calc docs
  total       34690 fewer undeclared-field errors

The next biggest cluster surfacing in the strict-mode report is
the v1 `stimulus_tuningcurve` inheritance on tuningcurve_calc,
which V_delta currently drops. Separate follow-up.

Driven by strict-validation audit (did-matlab aaf5529). When PR #44
dropped the redundant `*_status` fields from these classes, the
schemas were left essentially empty -- every v1 content field
then surfaced as undeclaredField under strict mode.

Restored field declarations to match v1 verbatim:

syncrule_mapping:
  cost          (double, opt)   numeric mapping score
  mapping       (matrix, opt)   2-element numeric vector
  epochnode_a   (structure)     {epoch_clock, epoch_id,
                                 epoch_session_id, epochprobemap,
                                 objectclass}
  epochnode_b   (structure)     same shape as epochnode_a
  (dropped: mapping_data -- was aspirational, never in v1)

epochfiles_ingested:
  epoch_id      (char, opt)     epoch identifier (e.g., 't00001')
  files         (string, opt, !mustBeScalar)
                                list of file references; entries are
                                NDI URIs or absolute paths
  epochprobemap (char, opt)     tab-separated probemap text

daqreader_mfdaq_epochdata_ingested:
  parameters    (structure, opt) {sample_analog_segment,
                                  sample_digital_segment}

daqmetadatareader_epochdata_ingested: unchanged (v1 block is empty;
strict validator already passes).

Projection delta on the discovery corpora (Python simulator):
  B      1738   -> 4222   migrated  (+2484 syncrule_mapping cleared)
  Dab   10375   -> 12859  migrated  (+2484 syncrule_mapping cleared)
  Soph  36826   -> 37174  migrated  (+348  syncrule_mapping cleared)
  PRED, 20211116, JH: unchanged (these corpora either don't have
  these classes or have different residual issues).

Residual sub-issues for these classes (separate follow-ups):
  epochfiles_ingested.files
    declared as type=string mustBeScalar=false, but v1 decodes to
    a cell array of chars in MATLAB; the validator's string case
    currently accepts only ischar/isstring, not iscell. Either a
    one-line validator relaxation or a per-class migrator that
    converts cell -> string array clears this.
  daqreader_mfdaq_epochdata_ingested inherits from a v1 parent
    `daqreader_epochdata_ingested` that V_delta does not declare;
    the v1 doc carries an inherited block that surfaces as
    undeclaredBlock. Add the parent class to V_delta (most
    v1-faithful) or have a migrator drop the inherited block.

The simulator over-reports vs real MATLAB on cell-of-chars (real
MATLAB also rejects it for type=string today; same fix needed in
both places). The corpus CI run will confirm the authoritative
numbers.

Driven by strict-validation audit. Three changes:

1. Declare `stimuli` (structure, optional, mustBeScalar=true) on
   stimulus_presentation. v1 carries it on every doc as
   `{parameters: {...stimulus-type-specific keys...}}` -- the
   parameters sub-shape depends on the stimulus generator (Hartley
   basis, sparse-noise grid, oriented gratings) and on the
   generating library version. V_delta declares `stimuli.parameters`
   but does not constrain inner keys.

2. Drop `num_trials` (integer field). v1 never ships it; was
   aspirational.

3. Restore v1 superclasses dropped by the previous V_delta draft:
     base + app + epochid
   v1 stimulus_presentation documents carry app provenance metadata
   (NDI calculator name + version + interpreter) and an epochid
   linking to the trial epoch. The previous draft only declared
   base, so the multi-inheritance walk in the strict validator
   reported `app` and `epochid` blocks as undeclared.

`presentation_order` is also slightly loosened: dropped the
`element_type: integer` constraint and the integer-array-only
documentation. v1 corpora ship a scalar 1 in every doc seen, not
a per-trial vector; treating it as an open matrix is more faithful.
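
The undeclared-block failure from item 3 can be reproduced with a small Python sketch of a superclass walk. The schema layout (a dict with a `superclasses` list per class) and both helpers are hypothetical; the real strict validator is in did-matlab.

```python
def declared_blocks(class_name, schemas):
    """Collect class_name plus every superclass reachable from it."""
    seen, stack = set(), [class_name]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(schemas[name].get("superclasses", []))
    return seen


def undeclared_blocks(doc, class_name, schemas):
    """Return document block names not covered by the class chain."""
    return sorted(set(doc) - declared_blocks(class_name, schemas))


doc = {"base": {}, "app": {}, "epochid": {}, "stimulus_presentation": {}}
old = {"base": {}, "stimulus_presentation": {"superclasses": ["base"]}}
new = {
    "base": {}, "app": {}, "epochid": {},
    "stimulus_presentation": {"superclasses": ["base", "app", "epochid"]},
}
```

With the `old` schema set the `app` and `epochid` blocks are reported as undeclared; with `new` the same document passes.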

Projection delta (Python simulator) on the four corpora that ship
stimulus_presentation docs:

  20211116    542 -> 553   migrated  (+11)
  B          4222 -> 5464  migrated  (+1242)
  Dab       12859 -> 14101 migrated  (+1242)
  Soph      37174 -> 37349 migrated  (+175)

Total: 2670 stimulus_presentation docs now migrate cleanly under
strict validation. No paired did-matlab change required.

neuron_extracellular follows the same pattern as
stimulus_presentation: v1 ships 7 content fields
plus an `app` superclass, V_delta drafted 4 different fields and
declared only base as a superclass. Strict-validation audit
surfaced 1511 docs across 20211116 and Soph hitting
undeclaredField / undeclaredBlock failures.

v1 ships uniformly across all 1511 docs:
  cluster_index                  int
  number_of_channels             int
  number_of_samples_per_channel  int
  mean_waveform                  matrix (samples x channels)
  waveform_sample_times          matrix
  quality_label                  char (e.g., 'good', 'multi')
  quality_number                 int  (sorter-specific grade)
  [superclasses: base, app]

V_delta now declares all of these and lists `app` as a superclass
so v1's app block (provenance: ndi.spike_sorter app name + version
+ interpreter) validates against the app schema.

Dropped from the previous V_delta draft (no v1 source):
  quality (replaced by quality_label, which is the v1 spelling)
  num_spikes (derivable from waveform/spiketimes data downstream)
  mean_firing_rate (derivable; was annotated as a `frequency`
                    composite candidate but no v1 doc ships it)

Projection delta (Python simulator):
  20211116    553 -> 574    migrated  (+21  neuron_extracellular)
  Soph      37349 -> 38839  migrated  (+1490 neuron_extracellular)

Total 1511 neuron_extracellular docs now migrate cleanly under
strict validation. No paired did-matlab change required.

Same pattern as stimulus_presentation / neuron_extracellular.
Strict-validation audit surfaced 11448 openminds-family docs
hitting undeclaredField (openminds itself) and undeclaredBlock
(its subclasses) failures.

v1 design:
  openminds                         (8 docs, JH)
    fields: openminds_type (URL IRI),
            matlab_type   (MATLAB class name),
            openminds_id  (instance IRI),
            fields        (open-shape struct, openminds-type-specific)
  openminds_subject                 (10401 docs, Dab + JH + Soph)
    block: {} -- empty; superclass=[base, openminds]
  openminds_element                 (404 docs, Dab)
    block: {} -- empty; superclass=[base, openminds]
  openminds_stimulus                (635 docs, Dab)
    block: {} -- empty; superclass=[base, openminds, epochid]

Changes:

  openminds  -- field set rewritten to match v1 verbatim:
                + matlab_type, openminds_id, fields (struct, open)
                = openminds_type (was already there; doc string
                  updated to reflect the full IRI v1 ships rather
                  than the short `core.Person` style note)
                - openminds_data, openminds_version (no v1 source)

  openminds_subject  -- add `openminds` as superclass.
  openminds_element  -- add `openminds` as superclass.
  openminds_stimulus -- add `openminds` and `epochid` as superclasses.

Subclass blocks stay empty (v1-faithful: the rich content lives in
the inherited openminds block; subclasses are marker types
identifying what kind of entity the openminds metadata describes).

Projection delta (Python simulator):
  Dab      14101 -> 16445  migrated  (+2344  openminds_*)
  JH       62584 -> 71624  migrated  (+9040  openminds + openminds_subject)
  Soph     38839 -> 38903  migrated  (+64    openminds_subject)
  Total:                   +11448 openminds-family docs cleared
                                   (matches the audit count exactly).

…_calc inherits it

The biggest remaining strict-validation cluster: tuningcurve_calc
documents in 20211116 (84) and Soph (34606) carry a v1
`stimulus_tuningcurve` superclass block that V_delta did not
declare. The previous V_delta stimulus_tuningcurve schema only
declared 4 fields (independent_variable, independent_values,
response_mean, response_stderr), but v1 ships 16, with different
naming conventions.

v1 stimulus_tuningcurve uniformly ships across 34690 docs:
  independent_variable_label                string array
  independent_variable_value                matrix (N_stim x N_var)
  stimid                                    matrix integer
  response_mean                             matrix
  response_stddev                           matrix
  response_stderr                           matrix
  response_units                            string array
  individual_responses_real                 matrix (N_trial x N_stim)
  individual_responses_imaginary            matrix (N_trial x N_stim)
  stimulus_presentation_number              matrix integer
  control_stimid                            matrix integer
  control_response_mean                     matrix
  control_response_stddev                   matrix
  control_response_stderr                   matrix
  control_individual_responses_real         matrix (N_trial x N_stim)
  control_individual_responses_imaginary    matrix (N_trial x N_stim)

Changes:

  stimulus_tuningcurve  -- rewritten to declare all 16 v1 fields
    verbatim. The previous schema's `independent_variable` and
    `independent_values` (shortened names from the aspirational
    draft) are dropped in favour of the v1-faithful
    `independent_variable_label` and `independent_variable_value`.
    response_mean and response_stderr are preserved as-is (already
    matched). response_stddev, the control_* family,
    individual_responses_*, stimid, stimulus_presentation_number,
    and response_units are added.

  tuningcurve_calc.superclasses += stimulus_tuningcurve. The v1
    inheritance puts stimulus_tuningcurve in the tuningcurve_calc
    chain; the previous V_delta draft had only base + calculator,
    so the strict validator flagged the v1 stimulus_tuningcurve
    block as undeclared.

Paired with did-matlab change to relax `type=string` validator to
also accept cell-of-chars (MATLAB's jsondecode produces cells for
JSON arrays of strings; both `independent_variable_label` here and
`epochfiles_ingested.files` need that).
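
As a loose Python analogue of the relaxed check (the real validator is MATLAB code in did-matlab and is not shown here), a `type=string` test that also accepts a list of strings might look like this. The `is_valid_string` helper is hypothetical; a Python list of str stands in for a MATLAB cell array of chars.

```python
def is_valid_string(value, must_be_scalar):
    """Accept a single string, or a list of strings when not scalar.

    The list branch plays the role of accepting cell-of-chars for
    fields declared with mustBeScalar=false.
    """
    if isinstance(value, str):
        return True
    if must_be_scalar:
        return False
    return isinstance(value, (list, tuple)) and all(
        isinstance(item, str) for item in value
    )
```

Keeping the list branch behind `must_be_scalar` means scalar string fields stay as strict as before.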

Projection delta on the discovery corpora (Python simulator):
  20211116   574  -> 658    migrated  (+84   tuningcurve_calc)
  B         5464  -> 7948   migrated  (+2484 epochfiles_ingested)
  Dab      16445  -> 20533  migrated  (+4088 epochfiles_ingested
                                              + co-resident classes)
  Soph     38903  -> 73858  migrated  (+34955 tuningcurve_calc +
                                               epochfiles_ingested)
  JH       71624  unchanged (no tuningcurve_calc / epochfiles_ingested
                             affected here; JH's remaining quarantines
                             are image_stack and a few small clusters)

Single largest schema cleanup so far: ~42K docs cleared across four
corpora.

Strict-validation audit surfaced 7007 image_stack docs in JH all
failing on undeclaredField image_stack.label. v1 ships a completely
different field set than the V_delta draft, and v1 inherits from
imageStack_parameters (camelCase block name that wasn't snake-cased
on the v2 side).

v1 imageStack ships uniformly:
  label            (char)  free-text caption
  formatOntology   (char)  ontology CURIE classifying the format
  [superclasses:   base, imageStack_parameters]

v1 imageStack_parameters ships uniformly:
  dimension_order        (char)   axis order, one char per axis
  dimension_labels       (char)   comma-separated per-axis labels
  dimension_size         (matrix) per-axis pixel/sample counts
  dimension_scale        (matrix) per-axis physical scale
  dimension_scale_units  (char)   comma-separated per-axis units
  data_type              (char)   pixel data type ("uint16", etc.)
  data_limits            (matrix) [min, max] pixel range
  timestamp              (double) acquisition timestamp
  clocktype              (char)   clock identifier

V_delta drafts had unrelated fields:
  image_stack:            num_frames / x_pixels / y_pixels / image_format
  image_stack_parameters: z_step / z_units / x_pixel_size / y_pixel_size
                          / pixel_units

Rewrites both to match v1 verbatim and adds image_stack_parameters
as a superclass of image_stack so the v1 inheritance is honoured.

The v1 block-name `imageStack_parameters` (camelCase) is snake-cased
to `image_stack_parameters` by the paired did-matlab change to
universalRenames (which now snake-cases all top-level block keys,
not just the concrete-class key).
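
The top-level snake-casing can be sketched in Python. Both helpers are hypothetical and only illustrate the rule as described; the actual universalRenames implementation is in did-matlab.

```python
import re


def snake_case(key):
    """Insert '_' at each lower-to-upper boundary, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key).lower()


def rename_top_level_blocks(doc):
    """Snake-case only the top-level block keys of a document.

    Field names nested inside each block are left untouched.
    """
    return {snake_case(key): value for key, value in doc.items()}


v1_doc = {
    "base": {},
    "imageStack": {"label": "example", "formatOntology": "EX:0001"},
    "imageStack_parameters": {"data_type": "uint16"},
}
v2_doc = rename_top_level_blocks(v1_doc)
```

Only block names change here, so v1 field names such as `formatOntology` survive verbatim inside the renamed block.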

Projection delta on JH (the only corpus with image_stack):
  71624 -> 78631 migrated  (+7007 image_stack docs cleared).

PRED stays 14/14; other corpora unchanged (no image_stack docs).

Two strict-validation clusters resolved together:

1. stimulus_response_scalar_parameters_basic field rewrite

  v1 ships uniformly across 11440 docs:
    temporalfreqfunc           (char)
    freq_response              (integer 0/1)
    prestimulus_time           (matrix)
    prestimulus_normalization  (matrix)
    isspike                    (integer 0/1)
    spiketrain_dt              (double)
    [superclasses: base, stimulus_response_scalar_parameters]

  V_delta drafted:
    response_window_start, response_window_end, freq_response

  Rewritten to declare all 6 v1 fields verbatim; dropped the
  aspirational response_window_* fields. Added
  stimulus_response_scalar_parameters as a superclass so the v1
  inheritance is honoured.

2. spatial_frequency_tuning / temporal_frequency_tuning rename

  V_delta drafts both had a `fit_sgauss` field; v1 corpora
  uniformly use `fit_gausslog`. Same field, drafted under a
  shorter spelling that v1 never adopted. Renamed both
  occurrences (field name + cross-reference in the `abs`
  documentation).

Projection delta on the corpora that carry these classes:
  20211116    658 -> 931    migrated  (+273  stimulus_response_*_basic)
  Soph      73858 -> 90629  migrated  (+16771 spatial_/temporal_calc
                                        +stimulus_response_*_basic)

~17K docs cleared. JH stays at 78631 migrated with 57 still
quarantined; B and Dab unchanged (no spatial/temporal calc docs
in those).

Brings all 6 v1 corpora (PRED, 20211116, B, Dab, JH, Soph; 221,827 docs
total) to 100% clean migration with zero quarantines.
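
The 221,827 total can be cross-checked against the per-corpus totals quoted in the earlier commit messages. A short Python check, using only those quoted figures:

```python
# Per-corpus document totals as quoted in the commit messages above.
corpus_totals = {
    "PRED": 14,
    "20211116": 1220,
    "B": 12917,
    "Dab": 27561,
    "JH": 78688,
    "Soph": 101427,
}

total_docs = sum(corpus_totals.values())
```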

Per-class changes (all to match v1-faithful shapes the corpora actually
ship):

- stimulus_response: replace `response_type` with the two v1 fields
  `stimulator_epochid` and `element_epochid` (response_type already lives
  on the child stimulus_response_scalar block).
- stimulus_response_scalar: pre-existing class header; no field change.
- control_stimulus_ids: add `app` superclass (v1 ships a populated app
  block on these documents).
- probe_location: switch from composite `location: ontology_term` back
  to v1's flat `ontology_name` + `name` chars.
- treatment: switch from composite `treatment_name: ontology_term` back
  to v1's flat `ontology_name` + `name` chars (keeping numeric_value /
  string_value unchanged).
- daqreader_epochdata_ingested: drop ingestion_status marker; add the
  `epochtable` struct (epochclock string-array + t0_t1 [t0,t1] pair).
- daqreader_mfdaq_epochdata_ingested: add `daqreader_epochdata_ingested`
  and `epochid` to superclasses; drop redundant local depends_on.
- daqmetadatareader_epochdata_ingested: add `epochid` superclass.
- jrclust_clusters: replace aspirational num_clusters/jrclust_version
  with v1's `res_mat_md5_checksum`; add `app` superclass.
- dataset_remote: add `organization_id` field.
- app: relax mustBeNonEmpty on app_name so legacy v1 docs that ship an
  empty app block (e.g. some jrclust_clusters) still validate.

Validation: pytest 96/96 green.
@stevevanhooser stevevanhooser merged commit eab2c63 into main May 17, 2026
4 checks passed
@stevevanhooser stevevanhooser deleted the claude/did-matlab-v2-import-Rs8AX branch May 17, 2026 15:53