
V_delta: relax mustBeScalar/mustNotHaveNaN flags to match v1 corpus shapes #47

Merged
stevevanhooser merged 1 commit into main from claude/did-matlab-v2-import-Rs8AX
May 17, 2026

@stevevanhooser

Loosens two schema flags to match what v1 corpora actually ship. Both surfaced while closing a divergence between the Python conversion simulator (which previously skipped mustBeScalar / mustNotHaveNaN entirely) and did-matlab's V_delta validator (which enforces both).

Changes

| schema | field | flag | from | to |
|---|---|---|---|---|
| stimulus_presentation | stimuli | mustBeScalar | true | false |
| stimulus_tuningcurve | stimulus_presentation_number | mustNotHaveNaN | true | false |

Why

stimulus_presentation.stimuli: v1 ships this as either a scalar struct (one stimulus configuration) or a struct array (one entry per presented stimulus, e.g. a 225-element array for a Hartley basis). Across the 6 v1 corpora, the array form appears in 2,462 documents, distributed as follows:

| corpus | array-form stimuli docs |
|---|---|
| 20211116 | 1 |
| B | 1,113 |
| Dab | 1,113 |
| Soph | 175 |

The previous mustBeScalar: true was an aspirational tightening that no real v1 corpus respects. Documentation updated to call out the array form explicitly.
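To make the relaxed flag concrete, here is a minimal Python sketch of a mustBeScalar check. It assumes that a v1 scalar struct decodes from JSON as an object (a dict) and a struct array as a JSON list, so only lists can violate the flag. The helpers is_scalar and check_must_be_scalar, and the stimulus_id field, are hypothetical illustrations, not code from the simulator or from did-matlab's V_delta validator.

```python
from typing import Any


def is_scalar(value: Any) -> bool:
    """Return False only for a JSON list that does not hold exactly one element.

    Assumption: a MATLAB scalar struct decodes to a dict and a struct array
    decodes to a list, so only lists can violate mustBeScalar here.
    """
    if isinstance(value, list):
        return len(value) == 1
    return True


def check_must_be_scalar(value: Any, must_be_scalar: bool) -> list[str]:
    """Return violation messages for the mustBeScalar flag (empty list means pass)."""
    if must_be_scalar and not is_scalar(value):
        return [f"mustBeScalar: expected 1 element, got {len(value)}"]
    return []


# Scalar form (one configuration) vs. array form (e.g. a 225-element Hartley basis).
# "stimulus_id" is a hypothetical field used only for illustration.
scalar_form = {"stimulus_id": 1}
array_form = [{"stimulus_id": i} for i in range(225)]

print(check_must_be_scalar(array_form, must_be_scalar=True))   # ['mustBeScalar: expected 1 element, got 225']
print(check_must_be_scalar(array_form, must_be_scalar=False))  # []
```

With the flag at false, both shapes pass; with the old true setting, every array-form document would be rejected.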

stimulus_tuningcurve.stimulus_presentation_number: v1 uses NaN as the missing-trial sentinel in this (N_trials x N_stimuli) index matrix. Concrete example from Soph (4126933d3e1418d3_40b1714033d71086.json):

[..., [149, NaN, 148, NaN, NaN, 159, NaN]]

60 documents in Soph carry NaN sentinels. Documentation updated to state explicitly that NaN marks the absent-presentation slot.
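A minimal sketch of how a consumer might treat the NaN sentinel, using the row quoted above. It assumes the document JSON is parsed with Python's json module, which accepts the bare NaN token by default and decodes it as a float NaN. The helpers contains_nan, check_must_not_have_nan and present_presentations are hypothetical, not the simulator's code.

```python
import json
import math
from typing import Any


def contains_nan(value: Any) -> bool:
    """Recursively look for a float NaN inside nested JSON lists."""
    if isinstance(value, list):
        return any(contains_nan(v) for v in value)
    return isinstance(value, float) and math.isnan(value)


def check_must_not_have_nan(value: Any, must_not_have_nan: bool) -> list[str]:
    """Return violation messages for the mustNotHaveNaN flag (empty list means pass)."""
    if must_not_have_nan and contains_nan(value):
        return ["mustNotHaveNaN: value contains NaN"]
    return []


def present_presentations(matrix: list[list[float]]) -> list[dict[int, int]]:
    """For each trial row, map stimulus column -> presentation number, skipping NaN slots."""
    return [
        {col: int(x) for col, x in enumerate(row) if not math.isnan(x)}
        for row in matrix
    ]


# Python's json module accepts the bare NaN token by default.
matrix = json.loads("[[149, NaN, 148, NaN, NaN, 159, NaN]]")
print(check_must_not_have_nan(matrix, must_not_have_nan=True))   # ['mustNotHaveNaN: value contains NaN']
print(check_must_not_have_nan(matrix, must_not_have_nan=False))  # []
print(present_presentations(matrix))                             # [{0: 149, 2: 148, 5: 159}]
```

Keeping the column index in the result preserves which stimulus each surviving presentation number belongs to, which a plain filtered list would lose.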

Verification

The Python conversion simulator was tightened in lockstep to enforce the mustBeScalar, mustNotHaveNaN, and maxLength constraints (previously it checked only mustBeNonEmpty and enum). Under the strict simulator, all 6 v1 corpora migrate cleanly:

| corpus | total | migrated | quarantined |
|---|---|---|---|
| PRED | 14 | 14 | 0 |
| 20211116 | 1,220 | 1,220 | 0 |
| B | 12,917 | 12,917 | 0 |
| Dab | 27,561 | 27,561 | 0 |
| JH | 78,688 | 78,688 | 0 |
| Soph | 101,427 | 101,427 | 0 |
| total | 221,827 | 221,827 | 0 |

pytest tests/ 96/96 passing.
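Because the simulator is not checked into either repo, the following is only a hypothetical Python sketch of the kind of per-field check and migrate/quarantine split described above. The FieldSpec dataclass, validate_field and migrate_corpus are illustrative names, and the flag semantics (for example, that mustBeScalar only rejects JSON lists) are assumptions rather than a transcription of cache.m or the simulator.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldSpec:
    """Hypothetical per-field constraints; attribute names mirror the schema flags."""
    must_be_non_empty: bool = False
    enum: Optional[list] = None
    must_be_scalar: bool = False
    must_not_have_nan: bool = False
    max_length: Optional[int] = None


def _is_empty(value: Any) -> bool:
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def _has_nan(value: Any) -> bool:
    if isinstance(value, list):
        return any(_has_nan(v) for v in value)
    return isinstance(value, float) and math.isnan(value)


def validate_field(name: str, value: Any, spec: FieldSpec) -> list[str]:
    """Apply every enabled constraint and collect all violations, not just the first."""
    errors = []
    if spec.must_be_non_empty and _is_empty(value):
        errors.append(f"{name}: mustBeNonEmpty violated")
    if spec.enum is not None and value not in spec.enum:
        errors.append(f"{name}: {value!r} not in enum")
    if spec.must_be_scalar and isinstance(value, list) and len(value) != 1:
        errors.append(f"{name}: mustBeScalar violated ({len(value)} elements)")
    if spec.must_not_have_nan and _has_nan(value):
        errors.append(f"{name}: mustNotHaveNaN violated")
    if (spec.max_length is not None and hasattr(value, "__len__")
            and len(value) > spec.max_length):
        errors.append(f"{name}: length {len(value)} exceeds maxLength {spec.max_length}")
    return errors


def migrate_corpus(docs: list[dict], schema: dict[str, FieldSpec]):
    """Split docs into (migrated, quarantined); quarantined entries keep their errors."""
    migrated, quarantined = [], []
    for doc in docs:
        errors = []
        for field, spec in schema.items():
            if field not in doc:
                if spec.must_be_non_empty:
                    errors.append(f"{field}: missing")
                continue
            errors.extend(validate_field(field, doc[field], spec))
        if errors:
            quarantined.append((doc, errors))
        else:
            migrated.append(doc)
    return migrated, quarantined


# One scalar-form and one array-form stimulus_presentation document (hypothetical contents).
docs = [{"stimuli": {"stimulus_id": 1}},
        {"stimuli": [{"stimulus_id": i} for i in range(225)]}]
strict = {"stimuli": FieldSpec(must_be_non_empty=True, must_be_scalar=True)}
relaxed = {"stimuli": FieldSpec(must_be_non_empty=True, must_be_scalar=False)}
for label, schema in (("strict", strict), ("relaxed", relaxed)):
    migrated, quarantined = migrate_corpus(docs, schema)
    print(label, len(docs), len(migrated), len(quarantined))
# strict 2 1 1
# relaxed 2 2 0
```

Collecting every violation per document, rather than stopping at the first, makes a quarantine report more useful when tracking down which flag a corpus actually trips.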

Pairs with

did-matlab V2 already has the matching changes: commit 2e5684c (string-type accepts an empty-array sentinel) addresses the related response_units: [] reports that came from the same testCorpus20211116 discovery run, and PR #130 is already merged.
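For the paired did-matlab change, a rough Python analogue of a string-type check that tolerates the empty-array sentinel might look like this. It assumes response_units: [] arrives as an empty JSON list; the function check_string_type and its allow_empty_array parameter are hypothetical, and the actual fix in 2e5684c lives in did-matlab and is not reproduced here.

```python
from typing import Any


def check_string_type(name: str, value: Any, allow_empty_array: bool = True) -> list[str]:
    """Accept a JSON string or, optionally, an empty JSON array as an empty-string sentinel."""
    if isinstance(value, str):
        return []
    if allow_empty_array and isinstance(value, list) and len(value) == 0:
        return []
    return [f"{name}: expected string, got {type(value).__name__}"]


print(check_string_type("response_units", []))                           # []
print(check_string_type("response_units", [], allow_empty_array=False))  # ['response_units: expected string, got list']
```

Only the empty list is treated as a sentinel; a non-empty list is still a type error.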

Out of scope

The simulator is a /tmp developer helper, not part of either repo's tree. Codifying it as a shared reference implementation between did-matlab and did-schema is a separate (worthwhile) follow-up.

https://claude.ai/code/session_011wtV7T1TKrxbGBeW71ebQn


Generated by Claude Code


Two flags surfaced by closing the divergence between the Python
simulator (which previously skipped mustBeScalar / mustNotHaveNaN)
and did-matlab's V_delta validator (which enforces both).

- stimulus_presentation.stimuli: mustBeScalar true -> false. v1
  ships this as either a scalar struct (one stimulus configuration)
  or a struct array (e.g. a 225-element array for a Hartley basis).
  Across the 6 v1 corpora, 2462 docs ship the array form. The
  scalar-only declaration was incorrect.

- stimulus_tuningcurve.stimulus_presentation_number: mustNotHaveNaN
  true -> false. v1 uses NaN as the missing-trial sentinel in this
  (N_trials x N_stimuli) index matrix. Soph has 60 such docs.
  Documentation now states that explicitly.

Verification: Python simulator (now also enforcing mustBeScalar +
mustNotHaveNaN, mirroring cache.m) reports 0 quarantines across all
6 v1 corpora (PRED 14, 20211116 1220, B 12917, Dab 27561, JH 78688,
Soph 101427 = 221827 docs total). pytest 96/96.
stevevanhooser merged commit c6ae52f into main on May 17, 2026
4 checks passed
stevevanhooser deleted the claude/did-matlab-v2-import-Rs8AX branch on May 17, 2026, 18:07