Add idempotency short-circuit to v1_to_v2 converter #133
Merged
Bodies already in V_delta shape (base.schema_version == 'V_delta' AND no v1-only underscore-prefixed top-level markers) now skip universalRenames and the per-class migrators. ensureClassBlocks and validate still run so the chain rebuild and drift gate happen on every body.

This makes the converter safely re-runnable, so the database normalisation (issue #3) and migration commands (issues #9, #10) can resume after an interruption without corrupting already-converted docs.

Both gates must hold to short-circuit: the schema-version check alone would let a V_delta-tagged body with legacy field shapes slip through, and the underscore-marker check alone would silently skip the bulk of v1 corpora that do not happen to use the legacy markers.

Tests:
- short-circuit fires on a V_delta body: v1-shaped fields in the epochclocktimes block (clocktype, t0_t1) stay verbatim because the superclass migrator never runs
- double-run idempotency: feeding a freshly-migrated body back through v1_to_v2 produces the same struct
- mixed batch of v1 and V_delta bodies all migrate, summary counts collapse correctly
- short-circuit does not fire when schema_version is absent, even when there are no underscore markers (guards the AND reading)

Closes VH-Lab/NDI-matlab#777.
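The two-gate check and the stages that always run can be sketched as follows. This is a minimal Python illustration, not the converter's actual code: the names `is_already_v_delta` and `convert`, the dict-based body, and the rule that any underscore-prefixed top-level key counts as a v1-only marker are all assumptions made for the example. The migration, class-block, and validation stages are passed in as stand-in callables.

```python
from typing import Any, Callable, Dict

Body = Dict[str, Any]


def is_already_v_delta(body: Body) -> bool:
    """Return True only when BOTH gates hold (hypothetical helper)."""
    base = body.get("base")
    # Gate 1: the body is explicitly tagged as V_delta.
    tagged = isinstance(base, dict) and base.get("schema_version") == "V_delta"
    # Gate 2: no legacy markers at the top level.
    # ASSUMPTION: every underscore-prefixed top-level key is a v1-only marker.
    has_legacy_marker = any(key.startswith("_") for key in body)
    return tagged and not has_legacy_marker


def convert(
    body: Body,
    full_migration: Callable[[Body], Body],
    ensure_class_blocks: Callable[[Body], Body],
    validate: Callable[[Body], None],
) -> Body:
    """Run the pipeline, skipping only the migration stage when short-circuiting."""
    if not is_already_v_delta(body):
        # Stand-in for universalRenames plus the per-class migrators.
        body = full_migration(body)
    # These two stages run on every body, short-circuited or not.
    body = ensure_class_blocks(body)
    validate(body)
    return body
```

A body with no schema_version fails the first gate even when it carries no underscore markers, so it still goes through the full migration; that is the case the last test in the list above guards.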
The previous version asserted that the treatment migrator collapsed ontology_name + name into a treatment_name composite, but that migrator was deleted in #130 (did-schema PR #44 reverted the treatment class to the v1 flat shape, so the dispatcher's identity fallback handles it). Verify the universalRenames pass another way: feed a v1-shaped depends_on (carries `id` and the legacy `version` key, no `value`) and assert universalRenames promoted id -> value and dropped the legacy keys. Still exercises the "no schema_version => no short circuit, full pipeline runs" path.
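The depends_on check described above can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than the real universalRenames: the function name `rename_depends_on_entry`, the `name` key, and the sample values are hypothetical. Only the promotion of `id` to `value` and the removal of the legacy `id` and `version` keys come from the description.

```python
from typing import Any, Dict

# Keys described as legacy in the v1-shaped depends_on entry.
LEGACY_KEYS = ("id", "version")


def rename_depends_on_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Promote id -> value and drop legacy keys (hypothetical stand-in)."""
    # Copy everything except the legacy keys.
    migrated = {k: v for k, v in entry.items() if k not in LEGACY_KEYS}
    # Promote the legacy id only when no value is present yet.
    if "value" not in migrated and "id" in entry:
        migrated["value"] = entry["id"]
    return migrated


# ASSUMPTION: a v1-shaped entry carries `id` and `version` but no `value`.
v1_entry = {"name": "element_id", "id": "abc123", "version": 1}
migrated = rename_depends_on_entry(v1_entry)

assert migrated == {"name": "element_id", "value": "abc123"}
# Running the rename again leaves an already-migrated entry unchanged.
assert rename_depends_on_entry(migrated) == migrated
```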
Summary
Implements an idempotency short-circuit in the v1-to-v2 document converter so that re-runs and mixed batches of already-converted and unconverted documents are handled safely. Documents that are already V_delta-shaped skip the expensive universalRenames and per-class migrators, so the converter can be re-run after an interruption.
Key Changes
- Added isAlreadyVDelta() helper function that detects V_delta-shaped documents by checking two conditions:
  - base.schema_version == 'V_delta' (set by universalRenames or external writers)
  - no v1-only underscore-prefixed top-level markers
- Modified main conversion loop to short-circuit when isAlreadyVDelta() returns true:
  - Skips universalRenames and per-class migrators
  - Still runs ensureClassBlocks (rebuilds superclass chain) and validation
- Added comprehensive test coverage:
  - testShortCircuitOnAlreadyVDeltaBody: Verifies v1-shaped fields are preserved when short-circuiting
  - testIdempotencyOfDoubleRun: Confirms running converter twice produces identical output
  - testMixedBatchOfV1AndVDeltaBodies: Tests batches with both v1 and V_delta documents
  - testShortCircuitSkippedWhenSchemaVersionMissing: Guards against incomplete short-circuit conditions

Implementation Details
- Short-circuited bodies still pass through ensureClassBlocks to maintain schema consistency

https://claude.ai/code/session_01PcQ9ZBthfXnHaiNcQhQLQd