validate-after-apply with rollback — and the triple-write span-splice bug it caught#2
Open
Wldc4rd wants to merge 1 commit into
Open
validate-after-apply with rollback — and the triple-write span-splice bug it caught#2Wldc4rd wants to merge 1 commit into
Wldc4rd wants to merge 1 commit into
Conversation
…iple-write span splice Two coupled changes to the apply/auto-apply write path: 1. VALIDATE-AFTER-APPLY (new): every applied write is re-parsed as TOML immediately after writing. On a parse failure the pre-write content is restored atomically and the failure is surfaced — apply exits 4 with a rolled_back JSON field; auto-apply reports a new per-agent "rolled-back" status (counted in the summary). A malformed write can no longer be left in place to break the next config load (the bad-pin-silently-stops-the-scheduler footgun). Rollback granularity is deliberately the SINGLE write, not the once-per-run .advisor-bak-*: when several agents share one config file (the city.toml case), an earlier valid apply from the same sweep survives a later agent's failed write. 2. SPAN-SPLICE FIX (latent bug the validator caught on first contact): set_tier_fields does three sequential set_field writes but target.span is resolve-time byte offsets; the first insert grows a block-scoped target, so the next write sliced the body with a stale end — cutting an existing field line mid-string and splicing its remainder after the replacement: model = "claude-haiku-4-5" became model = "claude-opus-4-8"-4-5" (unparseable). The repo's own shared-file scenario (test_single_backup_per_run_for_shared_file) silently produced a broken file; its assertions checked substrings, not parseability. Fix: re-classify the target (fresh span) before each subsequent write; flat targets (span=None) unaffected. Tests: +6 (validator unit; apply rollback human+JSON exit 4; auto-apply rolled-back status + summary; shared-file earlier-apply-survives-rollback; span-splice regression). Suite: 258 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds the missing safety half of
apply/auto-apply: every applied write is re-parsed as TOML immediately after writing; on a parse failure the pre-write content is restored atomically and the failure is surfaced. A malformed write can no longer be left in place to break the next config load — the bad-pin-silently-stops-the-scheduler footgun.apply: exits 4 withFAILED:/restoredtext, or"rolled_back": true+"error"in--json.auto-apply: new per-agentrolled-backstatus (in the summary roll-up), isolated per agent likeerror.cli.py:validate_toml_file(),set_tier_fields_validated(),ApplyValidationError.Rollback granularity is deliberately the single write, not the once-per-run
.advisor-bak-*: when several agents share one config file (the city.toml case your own tests model), an earlier valid apply from the same sweep survives a later agent's failed write. There's a test proving exactly that.The bug the validator caught on first contact
Wiring the validator in made
test_single_backup_per_run_for_shared_filefail — because witness's write has been silently producing unparseable TOML all along (the test asserts substrings, not parseability):set_tier_fieldsdoes three sequentialset_fieldwrites, buttarget.spanis resolve-time byte offsets. The first write (insertingprovider) grows a block-scoped target; the second write slices the body with the stale end offset, cutting mid-way through an existingmodelline near the block tail and splicing its orphaned remainder back after the replacement:Insert-only triple-writes recombine safely, which is why the other block-apply tests stayed green — it needs an existing field line near the tail being replaced after an insert grew the block.
Fix: re-classify the target (fresh span) before each subsequent write; flat targets (
span=None) are unaffected. With the fix, witness applies cleanly and the shared-file test passes again — now with the validator guaranteeing parseability behind it.Tests
+6: validator unit; apply rollback (human + JSON, exit 4); auto-apply
rolled-backstatus + summary count; shared-file earlier-apply-survives-later-rollback; span-splice regression. Suite: 258 passed.Found while evaluating model-advisor for adoption in our gc city — the eval's code review flagged "no validate-after-apply" as the one gap in an otherwise unusually defensive write path (dry-run defaults, Critical pins, evidence gates, backups all already there). Companion catalog PR: jsgerman-oss/provider-forge#1.
🤖 Generated with Claude Code