Merged
32 commits
add0604
v2: scaffold +did2 package (document, schema.cache) per PLAN.md §9 st…
stevevanhooser May 11, 2026
cab747f
ci(test-code): also run on V2 and PRs targeting V2
stevevanhooser May 11, 2026
b05361a
ci(codespell): also run on V2 and PRs targeting V2
stevevanhooser May 11, 2026
ca7fceb
ci(test-code): add workflow_dispatch for manual runs
stevevanhooser May 11, 2026
bfca084
ci(codespell): add workflow_dispatch for manual runs
stevevanhooser May 11, 2026
1e4f345
ci(test-code): pin matbox-actions to last known-good SHA
stevevanhooser May 11, 2026
49cdd29
ci(test-code): flatten to NDI-matlab's working pattern
stevevanhooser May 11, 2026
d56900f
ci(test-code): drop check-code step to dodge matbox codecheck breakage
stevevanhooser May 11, 2026
0ae38ac
ci(test-code): bump MATLAB from R2021b to latest
stevevanhooser May 11, 2026
497a999
ci(test-code): restore check-code step now that release: latest is in
stevevanhooser May 11, 2026
1bb7ce2
ci(test-code): run check-code before installRequirements
stevevanhooser May 11, 2026
d2ac002
tools: move codecheckToolbox wrapper into +didtools to stop shadowing
stevevanhooser May 11, 2026
1ae7ad5
tools: delete shadowed bare codecheckToolbox wrapper
stevevanhooser May 11, 2026
2dcc611
v2: add V_gamma test fixtures for +did2 self-tests
stevevanhooser May 12, 2026
905ff8d
v2: implement did2.schema.cache for V_gamma class-scoped wire shape
stevevanhooser May 12, 2026
7bd0912
v2: did2.document — V_gamma class-scoped shape and toJSON rewrite
stevevanhooser May 12, 2026
bf275a7
v2: add testSchemaCache covering V_gamma class-scoped shape
stevevanhooser May 12, 2026
cbc31c3
v2: PLAN.md — record V_gamma class-scoped shape + close step 1
stevevanhooser May 12, 2026
e1a2f32
v2: did2.document — use isscalar in assignNested per Code Analyzer
stevevanhooser May 12, 2026
4da98f8
v2: update fixtures to V_gamma document_class header
stevevanhooser May 12, 2026
cbcc3dd
v2: cache.m — track V_gamma document_class header restoration
stevevanhooser May 12, 2026
397c27b
v2: did2.document — read class metadata from document_class header
stevevanhooser May 12, 2026
0e409d2
v2: fixtures — drop all underscore prefixes per V_gamma SPEC update
stevevanhooser May 12, 2026
ec466b9
v2: cache.m — direct field access for plain-key V_gamma shape
stevevanhooser May 12, 2026
af00929
v2: did2.document — drop toJSON regex rewrite for plain-key shape
stevevanhooser May 12, 2026
a160bcf
v2: testSchemaCache.m — assertions on plain-key V_gamma shape
stevevanhooser May 12, 2026
d5d63ed
v2: PLAN.md — simplify §4.1 + log the plain-key V_gamma shape
stevevanhooser May 12, 2026
56fa3b7
v2: fixtures — move maturity_level inside document_class
stevevanhooser May 12, 2026
40ab0d9
v2: relocate did2 tests under +unittest, reserve +test and +symmetry …
stevevanhooser May 12, 2026
04226f5
v2: remove old top-level tests/+did2/testDocumentScaffold.m
stevevanhooser May 12, 2026
4d6839a
v2: remove old top-level tests/+did2/testSchemaCache.m
stevevanhooser May 12, 2026
6e52dfd
ci(test-code): discover did2 tests via TestSuite.fromPackage
stevevanhooser May 12, 2026
6 changes: 4 additions & 2 deletions .github/workflows/run-codespell.yml
@@ -2,10 +2,12 @@ name: Run Codespell
 
 on:
   push:
-    branches: [ "main" ]
+    branches: [ "main", "V2" ]
 
   pull_request:
-    branches: [ "main" ]
+    branches: [ "main", "V2" ]
 
+  workflow_dispatch:
+
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
70 changes: 61 additions & 9 deletions .github/workflows/test-code.yml
@@ -2,28 +2,80 @@ name: Test code
 
 on:
   push:
-    branches: main
+    branches: [main, V2]
     paths-ignore:
       - '*.md'
       - '.github/**'
 
   pull_request:
-    branches: main
+    branches: [main, V2]
     paths-ignore:
       - '*.md'
       - '.github/**'
 
+  workflow_dispatch:
+
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
 
 jobs:
   test-code:
+    # Flat workflow modeled on NDI-matlab/.github/workflows/run-tests.yml.
+    # Order matters: check-code runs immediately after install-matbox,
+    # before any other step touches the MATLAB path. An earlier ordering
+    # that ran matbox.installRequirements() first made check-code fail
+    # with "codecheckToolbox: Too many input arguments", even on
+    # release: latest. installRequirements now runs after check-code:
+    # the tests need mksqlite and vhlab-toolbox-matlab, but the static
+    # analysis does not.
     name: Analyse and test code
-    uses: ehennestad/matbox-actions/.github/workflows/test-code-workflow.yml@v1
-    with:
-      matlab_release: R2021b
-      matlab_use_cache: true
-      matlab_products: Statistics_and_Machine_Learning_Toolbox
-    secrets:
-      CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up MATLAB
+        uses: matlab-actions/setup-matlab@v2
+        with:
+          release: latest
+          cache: true
+          products: Statistics_and_Machine_Learning_Toolbox
+
+      - name: Install MatBox
+        uses: ehennestad/matbox-actions/install-matbox@v1
+
+      - name: Check code
+        uses: ehennestad/matbox-actions/check-code@v1
+        with:
+          source_directory: 'src'
+
+      - name: Install repo dependencies (mksqlite, vhlab-toolbox-matlab)
+        uses: matlab-actions/run-command@v2
+        if: always()
+        with:
+          command: |
+            addpath(genpath("src"));
+            addpath(genpath("tools"));
+            matbox.installRequirements(didtools.projectdir());
+
+      - name: Run tests
+        uses: matlab-actions/run-command@v2
+        if: always()
+        with:
+          command: |
+            addpath(genpath("src"));
+            addpath(genpath("tests"));
+            import matlab.unittest.TestRunner;
+            import matlab.unittest.TestSuite;
+            runner = TestRunner.withTextOutput;
+            % did2 self-tests now live under tests/+did2/+unittest/
+            % (matching the legacy +did/+unittest/ layout). TestSuite.fromPackage
+            % with IncludingSubpackages=true picks up the function-based test
+            % files and any future +abstract base classes. Mirrors the
+            % discovery pattern in test-symmetry.yml.
+            suite = TestSuite.fromPackage("did2.unittest", "IncludingSubpackages", true);
+            results = runner.run(suite);
+            disp(table(results));
+            nFailed = sum([results.Failed]);
+            assert(nFailed == 0, sprintf("%d test(s) failed", nFailed));
165 changes: 162 additions & 3 deletions docs/v2/PLAN.md
@@ -19,6 +19,8 @@ users.
| 4 | Keep the `matlabdumbjsondb` backend. | Useful for tests, trivial deployments, and as a non-SQL reference implementation of the query model. |
| 5 | Validate on insert by default; expose an `unsafe_insert` escape hatch for bulk loads; offer a `revalidate_all` maintenance op. | Schema files are the source of truth for what "valid" means. |
| 6 | Plan lives at `docs/v2/PLAN.md` on the v2 development branch. | This file. |
| 7 | Provisional namespace: `+did2`. | Picked from §10 option A for the scaffold. Revisit before v2 reaches `main`. |
| 8 | Document instances use a top-level `document_class` header plus class-scoped property blocks (one block per class in the chain, keyed by `class_name` verbatim). | See §4.1. Matches V_gamma_SPEC.md "JSON Format: Document Instances" after the SPEC's two-step revision: (i) restore class-scoped blocks; (ii) drop the underscore prefix on all NDI-extension keys. Every key in the wire shape is a plain MATLAB identifier, so the in-memory MATLAB struct is the JSON shape verbatim. |

Open questions are in §10.

@@ -87,7 +89,7 @@ CREATE INDEX depends_on_name_value ON depends_on(name, value);

Test 4 in the JSON1 probe confirmed that `STORED GENERATED ALWAYS AS
(json_extract(body, '$.foo.bar'))` works with `mksqlite`. So for each scalar
-`_queryable: true` path declared by the V_gamma schemas, we add a stored
+`queryable: true` path declared by the V_gamma schemas, we add a stored
generated column directly on `documents` plus an index on it.
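As a rough illustration, here is that layout in Python's standard `sqlite3` module rather than `mksqlite`. It assumes a SQLite build with JSON1 and generated-column support (3.31 or later). The `documents` table and `body` column follow the text; the queryable path `demoA.value`, the column name `demoA__value`, and the index name are hypothetical. One constraint is worth noting: SQLite does not allow `ALTER TABLE ... ADD COLUMN` to add a `STORED` generated column, so these columns have to be declared in `CREATE TABLE` (or the table rebuilt).

```python
import json
import sqlite3

# Hypothetical mapping: queryable scalar path -> generated column name.
QUERYABLE_SCALARS = {"demoA.value": "demoA__value"}


def create_documents_table(conn):
    """Create `documents` with one STORED generated column (plus an index)
    per queryable scalar path."""
    generated = ", ".join(
        f"{col} GENERATED ALWAYS AS (json_extract(body, '$.{path}')) STORED"
        for path, col in QUERYABLE_SCALARS.items()
    )
    conn.execute(
        f"CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT NOT NULL, {generated})"
    )
    for col in QUERYABLE_SCALARS.values():
        conn.execute(f"CREATE INDEX idx_{col} ON documents({col})")


conn = sqlite3.connect(":memory:")
create_documents_table(conn)
doc = {"base": {"id": "abc"}, "demoA": {"value": 42}}
conn.execute(
    "INSERT INTO documents (id, body) VALUES (?, ?)",
    (doc["base"]["id"], json.dumps(doc)),
)
# The WHERE clause filters on the generated column, which the index can serve.
print(conn.execute("SELECT id FROM documents WHERE demoA__value = 42").fetchall())
# prints [('abc',)]
```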

The set of paths is computed at database open by walking the loaded schemas:
@@ -154,6 +156,51 @@ Validation timing: explicit, deferred. The database layer calls
bulk loads. A `revalidate_all(db)` maintenance op exists for the case where
schemas change.

### 4.1 In-memory document shape

A `did2.document`'s `documentProperties` is a MATLAB struct that mirrors the
V_gamma JSON shape *as specified in V_gamma_SPEC.md, "JSON Format: Document
Instances"*, exactly. After V_gamma's "drop underscore prefixes" pass,
every key in the wire shape is a plain identifier with no leading
underscore, so the MATLAB struct field names match the JSON keys
one-to-one. `jsonencode` / `jsondecode` round-trip without any rewrite.
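The "plain identifier" property can be checked mechanically. The sketch below walks a decoded JSON value and reports every object key that is not a valid MATLAB identifier. These are the keys MATLAB's `jsondecode` has to rename through `matlab.lang.makeValidName` (for example, `_queryable` becomes `x_queryable`), which is what previously forced a rewrite pass. It is written in Python only so that it runs standalone, and the function name `non_identifier_keys` is hypothetical.

```python
import json
import re

# A MATLAB identifier starts with a letter, continues with letters, digits or
# underscores, and is at most 63 characters long (namelengthmax).
MATLAB_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,62}")
# MATLAB keywords (see iskeyword) are not valid variable or field names either.
MATLAB_KEYWORDS = {
    "break", "case", "catch", "classdef", "continue", "else", "elseif", "end",
    "for", "function", "global", "if", "otherwise", "parfor", "persistent",
    "return", "spmd", "switch", "try", "while",
}


def non_identifier_keys(value, prefix=""):
    """Return the dot-paths of all object keys that are not valid MATLAB
    identifiers, i.e. keys that jsondecode would have to rename."""
    bad = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if not MATLAB_IDENTIFIER.fullmatch(key) or key in MATLAB_KEYWORDS:
                bad.append(path)
            bad.extend(non_identifier_keys(child, path))
    elif isinstance(value, list):
        for child in value:
            bad.extend(non_identifier_keys(child, prefix))
    return bad


gamma = {
    "document_class": {"class_name": "demoB", "class_version": "1.0.0", "superclasses": []},
    "depends_on": [],
    "base": {"id": "abc", "session_id": "", "name": "", "datestamp": ""},
    "demoB": {},
}
assert non_identifier_keys(gamma) == []
assert json.loads(json.dumps(gamma)) == gamma  # lossless round trip

legacy = {"demoA": {"_queryable": True}}
print(non_identifier_keys(legacy))  # ['demoA._queryable']
```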

Top-level keys populated by `did2.schema.cache.buildBlankDocument`:

| Key | Type | Contents |
|------------------|--------------|----------|
| `document_class` | struct | `class_name` (concrete class), `class_version` (semver), `superclasses` (struct array; each entry has `class_name` + `class_version` — the document-instance form). |
| `depends_on` | struct array | Each entry: `name` (role) and `value` (the referenced document's id). Empty by default. |
| `base` | struct | Property block with the four base fields (`id`, `session_id`, `name`, `datestamp`). `id` auto-minted via `did.ido.unique_id`, `datestamp` set to current UTC millisecond ISO-8601 with trailing `Z`. |
| `<class_name>` | struct | One property block per class in the chain (root through concrete class). Each populated with `blank_value` for the fields *that class* declares. Empty `{}` if it declares none. |
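Here is a sketch of the shape `buildBlankDocument` produces, written in Python against a hypothetical in-memory schema table (`SCHEMAS`, with single inheritance and made-up blank values for `demoA`/`demoB`). `uuid.uuid4().hex` is only a stand-in for `did.ido.unique_id`, whose format is not described here. Listing every ancestor root-first in `superclasses` is also an assumption.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical schema table: class -> version, direct superclass, and blank
# values for the fields that class itself declares.
SCHEMAS = {
    "base": {"class_version": "1.0.0", "superclass": None,
             "blank": {"id": "", "session_id": "", "name": "", "datestamp": ""}},
    "demoA": {"class_version": "1.0.0", "superclass": "base", "blank": {"value": 0}},
    "demoB": {"class_version": "1.0.0", "superclass": "demoA", "blank": {}},
}


def class_chain(class_name):
    """Root-first chain of classes ending in class_name (single inheritance)."""
    chain = []
    while class_name is not None:
        chain.append(class_name)
        class_name = SCHEMAS[class_name]["superclass"]
    return chain[::-1]


def utc_datestamp():
    """Current UTC time, millisecond precision, ISO-8601 with a trailing Z."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_blank_document(class_name):
    chain = class_chain(class_name)
    doc = {
        "document_class": {
            "class_name": class_name,
            "class_version": SCHEMAS[class_name]["class_version"],
            # Assumption: the header lists every ancestor, root first.
            "superclasses": [
                {"class_name": c, "class_version": SCHEMAS[c]["class_version"]}
                for c in chain[:-1]
            ],
        },
        "depends_on": [],
    }
    for c in chain:
        doc[c] = dict(SCHEMAS[c]["blank"])  # copy so the schema stays untouched
    doc["base"]["id"] = uuid.uuid4().hex  # stand-in for did.ido.unique_id()
    doc["base"]["datestamp"] = utc_datestamp()
    return doc


doc = build_blank_document("demoB")
print(list(doc))  # ['document_class', 'depends_on', 'base', 'demoA', 'demoB']
```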

Field identity is `(declaring_class, name)`. Same-named fields in
different classes of the chain are distinct paths (`base.id` vs.
`<subclass>.id`), not an override.

V_alpha → V_gamma at the document level:

```
V_alpha V_gamma
------- -------
document_class.class_name document_class.class_name
document_class.class_version document_class.class_version
document_class.superclasses document_class.superclasses
document_class.property_list_name (gone; block key == class_name)
document_class.definition (gone; schema files own this)
document_class.validation (gone; schema files own this)
base.id, base.session_id, ... base.id, base.session_id, ...
<property_list_name>.<field> <class_name>.<field>
depends_on depends_on
```

The converter (§7) is now a thin per-document data migration: strip the
extra `document_class` sub-keys (`property_list_name`, `definition`,
`validation`); rename each property block whose `property_list_name`
differs from its `class_name` so the block key equals the class name;
done. NDI-matlab consumers that already speak the V_alpha class-scoped
layout need no source-code rewrites for the wire shape itself.
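The migration can be sketched as a pure function over decoded JSON. This is a hypothetical Python rendering, not the planned `v1_to_v2.m`. It assumes the caller supplies `block_renames` for superclass blocks whose `property_list_name` differs from the class name, because a V_alpha header only names the concrete class's block. The `definition`/`validation` values in the example are placeholders.

```python
import copy

# document_class sub-keys that V_gamma drops (schema files own this information).
V_ALPHA_ONLY_HEADER_KEYS = ("property_list_name", "definition", "validation")


def v_alpha_to_v_gamma(doc, block_renames=None):
    """Return a V_gamma copy of a V_alpha document.

    block_renames is a hypothetical {property_list_name: class_name} mapping for
    superclass blocks; the concrete class's entry is read from the header.
    """
    out = copy.deepcopy(doc)
    header = out["document_class"]
    renames = dict(block_renames or {})
    if "property_list_name" in header:
        renames[header["property_list_name"]] = header["class_name"]
    for key in V_ALPHA_ONLY_HEADER_KEYS:
        header.pop(key, None)
    # Rename each block so that its key equals its class name.
    for old_key, class_name in renames.items():
        if old_key != class_name and old_key in out:
            out[class_name] = out.pop(old_key)
    return out


alpha = {
    "document_class": {
        "class_name": "demoB", "class_version": "1.0.0", "superclasses": [],
        "property_list_name": "demo_b",
        "definition": "$DOCUMENT_PATH/demoB.json",  # placeholder
        "validation": "$SCHEMA_PATH/demoB_schema.json",  # placeholder
    },
    "depends_on": [],
    "base": {"id": "abc"},
    "demo_b": {"value": 1},
}
gamma = v_alpha_to_v_gamma(alpha)
print(sorted(gamma))  # ['base', 'demoB', 'depends_on', 'document_class']
```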

---

## 5. Schema cache
@@ -162,7 +209,7 @@ A `+did2/+schema/cache.m` (or similar) loads all V_gamma schema files once,
resolves superclass chains, and pre-computes:

- For each classname: the full inherited field list.
-- The subset of fields with `_queryable: true`, split into scalar paths and
+- The subset of fields with `queryable: true`, split into scalar paths and
array-iteration paths.
- The named composite type expansions (`duration` → `.seconds`,
`.approximate`, `.source_unit`, `.source_value`).
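Here is a hypothetical sketch of the queryable-path split. It assumes that each flattened field carries a boolean `queryable` flag and an optional composite `type`, and that array-iteration paths are written with the `[*]` iterator that `did2.document.iterate` uses. The data shapes are illustrative, not the cache's actual internal representation.

```python
# Named composite expansions from the list above.
COMPOSITE_EXPANSIONS = {
    "duration": ("seconds", "approximate", "source_unit", "source_value"),
}


def split_queryable(fields):
    """Split queryable paths into (scalar_paths, array_paths).

    fields: hypothetical list of (dot_path, field_def) pairs, where field_def
    may carry a boolean "queryable" and a composite "type" name.
    """
    scalar, array = [], []
    for path, field_def in fields:
        if not field_def.get("queryable", False):
            continue
        # Named composites expand into one path per component.
        subs = COMPOSITE_EXPANSIONS.get(field_def.get("type"))
        paths = [f"{path}.{s}" for s in subs] if subs else [path]
        for p in paths:
            # Assumption: array-iteration paths are spelled with the [*] iterator.
            (array if "[*]" in p else scalar).append(p)
    return scalar, array


fields = [
    ("base.id", {"queryable": True}),
    ("base.name", {}),
    ("demoA.elapsed", {"queryable": True, "type": "duration"}),
    ("demoA.tags[*]", {"queryable": True}),
]
scalar, array = split_queryable(fields)
print(array)  # ['demoA.tags[*]']
```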
@@ -210,7 +257,7 @@ A `+did2/+convert/v1_to_v2.m` tool:
the table lives next to the v2 schema package).
3. Renames top-level keys (`base.id` → `id`, etc.), rewrites collapsed fields
on classes that bumped to `2.0.0` (`probe_location`, `treatment`,
-`ontology_image`, `ontology_label`), and reshapes `_ontology` annotations
+`ontology_image`, `ontology_label`), and reshapes `ontology` annotations
to the V_gamma two-key form.
4. Validates against V_gamma. Successful docs insert into the new DB; failures
land in a `quarantine` table with the original body and a reason string.
@@ -293,3 +340,115 @@ no-op, so it no longer appears in `compile_options`. The functional tests
Test 4 passing is the decisive simplification: queryable scalar paths can
live as `STORED` generated columns on `documents` with their own indexes
(§3.2), with no separate sidecar table for scalars.

---

## 12. Progress log

### 2026-05-11 — step 1 scaffold

Started step 1 of §9 on branch `claude/start-v2-development-tA41P`.

Added:

- `src/did/+did2/document.m` — V_gamma document object. API surface
in place (construct from JSON / struct / `(className, values)`,
`get` / `set` / `iterate`, `toJSON` / `toStruct`, `className` /
`classVersion`, `validate`, plus static `fromJSON` / `fromStruct` /
`blank`). Dot-path get/set is implemented in full. The `[*]` array
iterator is implemented via `iterate(arrayPath)`; the bare `get`
rejects paths containing `[*]` to keep the scalar/array distinction
honest. `validate` and `blank` delegate to the schema cache.
- `src/did/+did2/+schema/cache.m` — schema cache class. Singleton
bootstrap, schema-path resolution (env override
`DID_SCHEMA_PATH`, or sibling `did-schema/schemas/V_gamma` checkout),
`getClass`, and `superclasses` traversal are implemented; the
heavier methods (`fieldsFor`, `queryablePaths`,
`buildBlankDocument`, `validateDocument`) currently throw
`did2:notImplemented` and will be filled in next.
- `src/did/+did2/Contents.m` — package overview.
- `tests/+did2/testDocumentScaffold.m` — function-based unit tests
covering construction, dot-path get/set, iterate, round-trip JSON,
and the documented error IDs. Tests that depend on the schema cache
beyond what is implemented are deferred.
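The dot-path and `[*]` semantics described above can be sketched over plain nested dicts. This Python version is illustrative only: it supports a single `[*]` per path, raises `ValueError` instead of the MATLAB error IDs, and the helper names are hypothetical.

```python
def _check_no_iterator(path):
    # Scalar accessors refuse [*] so the scalar/array distinction stays explicit.
    if "[*]" in path:
        raise ValueError(f"'{path}' contains [*]; use iterate() for array paths")


def get_path(doc, path):
    """Read a scalar value at a dot-path such as 'base.id'."""
    _check_no_iterator(path)
    node = doc
    for key in path.split("."):
        node = node[key]
    return node


def set_path(doc, path, value):
    """Write a value at a dot-path, creating intermediate objects as needed."""
    _check_no_iterator(path)
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def iterate(doc, array_path):
    """Yield one value per element for 'prefix[*]' or 'prefix[*].suffix'
    (a single [*] only in this sketch)."""
    prefix, found, suffix = array_path.partition("[*]")
    if not found:
        raise ValueError(f"'{array_path}' has no [*] iterator")
    for element in get_path(doc, prefix):
        yield get_path(element, suffix.lstrip(".")) if suffix else element


doc = {
    "base": {"id": "abc"},
    "depends_on": [{"name": "subject_id", "value": "s1"},
                   {"name": "probe_id", "value": "p1"}],
}
set_path(doc, "demoA.value", 3)
print(get_path(doc, "demoA.value"))               # 3
print(list(iterate(doc, "depends_on[*].value")))  # ['s1', 'p1']
```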

Provisional decision (logged in §1 as #7): use `+did2` for the v2
namespace during the scaffold, leaving the §10 rename-vs-parallel
question open for resolution before v2 reaches `main`.

Next up: fill in `did2.schema.cache.fieldsFor`,
`queryablePaths`, and `buildBlankDocument`; then `validateDocument`
against the V_gamma meta-schema; then start the in-memory query
evaluator (step 2).

### 2026-05-12 — class-scoped property blocks, then drop underscores

Two upstream did-schema SPEC revisions landed back-to-back, and both
required reworking the `+did2` in-memory shape:

1. **Class-scoped property blocks restored** (did-schema commit
`137f583`). V_gamma was amended to organise document instances
into per-class property blocks keyed by class name (one per class
in the chain), instead of the earlier flat namespace. It also moved
`class_name`/`class_version`/`superclasses` under a top-level
`document_class` header.
2. **Drop underscore prefixes** (did-schema commit `77c6363`). The
`_<key>` convention for NDI-extension keys was replaced by plain
keys (`maturity_level`, `depends_on`, `file`, `fields`,
`mustBeNonEmpty`, `blank_value`, `ontology`, etc.). The
authoritative reserved-name list moved to upstream
`ndi_reserved_keys.json`.

Combined, every key in a V_gamma wire shape is now a plain MATLAB
identifier, so the in-memory MATLAB struct is the JSON shape verbatim
— no `x_<name>` aliasing, no `jsonencode`-time rewrite pass, no
`extractField` underscore-probe helper. Round-tripping a V_gamma
document is `jsondecode` then `jsonencode`.

Implemented in `src/did/+did2/+schema/cache.m`:

- `classChain(className)` — root-first list including the class itself
(e.g., `demoB -> {base, demoA, demoB}`).
- `ownFields(className)` — the `fields` list the class declares
directly (no inheritance), via direct `s.fields` access.
- `fieldsFor(className)` — merged inherited fields tagged with their
declaring class. Returns a struct array
`{declaringClass, fieldDef}`.
- `superclasses(className)` — walks
`s.document_class.superclasses[i].class_name` up the chain.
- `buildBlankDocument(className)` — class-scoped V_gamma document:
  - `doc.document_class.{class_name, class_version, superclasses}`
  - `doc.depends_on`: an empty struct array of `{name, value}`
  - `doc.<class_name>` for each class in the chain; the base block has
    `id` auto-minted via `did.ido.unique_id()` and `datestamp` set to
    the current UTC time in ISO-8601.
- `validateDocument(docOrStruct)` — accepts a `did2.document` or its
underlying struct, walks the class chain, and validates each
class's `fields` against its property block. Error messages use
the qualified `<class>.<name>` form; new error IDs
`did2:validation:missingClassBlock` and `did2:validation:badClassBlock`.
- `queryablePaths` stays a stub (belongs to steps 3 and 4).
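The chain walk, field tagging, and per-block validation can be sketched as follows, assuming single inheritance and a hypothetical `CLASS_SCHEMAS` table. Only `missingClassBlock` and `badClassBlock` mirror the error IDs named above; the `missingField` label is made up for the sketch. `demoB.id` and `base.id` come out as two distinct `(declaring_class, name)` entries, matching the field-identity rule in §4.1.

```python
# Hypothetical schemas: class -> (direct superclass or None, fields it declares).
CLASS_SCHEMAS = {
    "base": (None, ["id", "session_id", "name", "datestamp"]),
    "demoA": ("base", ["value"]),
    "demoB": ("demoA", ["id"]),  # distinct from base.id: identity is (class, name)
}


def class_chain(class_name):
    """Root-first list including the class itself (single inheritance)."""
    chain = []
    while class_name is not None:
        chain.append(class_name)
        class_name = CLASS_SCHEMAS[class_name][0]
    return chain[::-1]


def fields_for(class_name):
    """Inherited fields tagged with their declaring class."""
    return [(cls, name) for cls in class_chain(class_name)
            for name in CLASS_SCHEMAS[cls][1]]


def validate_document(doc):
    """Return problems as '<kind>: <class>[.<name>]' strings (empty if valid)."""
    problems = []
    for cls in class_chain(doc["document_class"]["class_name"]):
        block = doc.get(cls)
        if block is None:
            problems.append(f"missingClassBlock: {cls}")
        elif not isinstance(block, dict):
            problems.append(f"badClassBlock: {cls}")
        else:
            problems.extend(f"missingField: {cls}.{name}"
                            for name in CLASS_SCHEMAS[cls][1] if name not in block)
    return problems


doc = {
    "document_class": {"class_name": "demoB"},
    "base": {"id": "a", "session_id": "", "name": "", "datestamp": ""},
    "demoB": [],
}
print(validate_document(doc))  # ['missingClassBlock: demoA', 'badClassBlock: demoB']
```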

In `src/did/+did2/document.m`:

- `className` / `classVersion` read
`documentProperties.document_class.class_name` /
`documentProperties.document_class.class_version`.
- `toJSON` is a bare `jsonencode` (no rewrite pass). The previous
`rewriteXUnderscoreKeys` helper is removed.

Fixtures at `tests/+did2/fixtures/V_gamma/` (`base.json`,
`demoA.json`, `demoB.json`, `demoC.json`, `demoFile.json`,
`CURIE_lookups_meta.json`, `README.md`) rewritten to the
plain-key V_gamma shape.

`tests/+did2/testSchemaCache.m` updated: 22 tests assert on the
plain-key shape (`doc.document_class.class_name`, `doc.depends_on`,
etc.) and check that a V_gamma document round-trips through
`toJSON`/`fromJSON` unchanged.

Step 1 is complete to the level the rest of the plan needs.
`queryablePaths` is the only intentional stub left in the cache;
detailed per-named-composite validation and dependency-value checks
are deferred to focused follow-ups. Next up: step 2 — the in-memory
query evaluator over the class-qualified dot-paths.