Skip to content

validate-after-apply with rollback — and the triple-write span-splice bug it caught#2

Open
Wldc4rd wants to merge 1 commit into
jsgerman-oss:mainfrom
Wldc4rd:validate-after-apply
Open

validate-after-apply with rollback — and the triple-write span-splice bug it caught#2
Wldc4rd wants to merge 1 commit into
jsgerman-oss:mainfrom
Wldc4rd:validate-after-apply

Conversation

@Wldc4rd

@Wldc4rd Wldc4rd commented Jun 9, 2026

Copy link
Copy Markdown

What

Adds the missing safety half of apply/auto-apply: every applied write is re-parsed as TOML immediately after writing; on a parse failure the pre-write content is restored atomically and the failure is surfaced. A malformed write can no longer be left in place to break the next config load — the bad-pin-silently-stops-the-scheduler footgun.

  • apply: exits 4 with FAILED:/restored text, or "rolled_back": true + "error" in --json.
  • auto-apply: new per-agent rolled-back status (in the summary roll-up), isolated per agent like error.
  • New primitives in cli.py: validate_toml_file(), set_tier_fields_validated(), ApplyValidationError.

Rollback granularity is deliberately the single write, not the once-per-run .advisor-bak-*: when several agents share one config file (the city.toml case your own tests model), an earlier valid apply from the same sweep survives a later agent's failed write. There's a test proving exactly that.

The bug the validator caught on first contact

Wiring the validator in made test_single_backup_per_run_for_shared_file fail — because witness's write has been silently producing unparseable TOML all along (the test asserts substrings, not parseability):

set_tier_fields does three sequential set_field writes, but target.span is resolve-time byte offsets. The first write (inserting provider) grows a block-scoped target; the second write slices the body with the stale end offset, cutting mid-way through an existing model line near the block tail and splicing its orphaned remainder back after the replacement:

model = "claude-opus-4-8"-4-5"     # ← was model = "claude-haiku-4-5"

Insert-only triple-writes recombine safely, which is why the other block-apply tests stayed green — it needs an existing field line near the tail being replaced after an insert grew the block.

Fix: re-classify the target (fresh span) before each subsequent write; flat targets (span=None) are unaffected. With the fix, witness applies cleanly and the shared-file test passes again — now with the validator guaranteeing parseability behind it.

Tests

+6: validator unit; apply rollback (human + JSON, exit 4); auto-apply rolled-back status + summary count; shared-file earlier-apply-survives-later-rollback; span-splice regression. Suite: 258 passed.

Found while evaluating model-advisor for adoption in our gc city — the eval's code review flagged "no validate-after-apply" as the one gap in an otherwise unusually defensive write path (dry-run defaults, Critical pins, evidence gates, backups all already there). Companion catalog PR: jsgerman-oss/provider-forge#1.

🤖 Generated with Claude Code

…iple-write span splice

Two coupled changes to the apply/auto-apply write path:

1. VALIDATE-AFTER-APPLY (new): every applied write is re-parsed as TOML
   immediately after writing. On a parse failure the pre-write content is
   restored atomically and the failure is surfaced — apply exits 4 with a
   rolled_back JSON field; auto-apply reports a new per-agent
   "rolled-back" status (counted in the summary). A malformed write can
   no longer be left in place to break the next config load (the
   bad-pin-silently-stops-the-scheduler footgun).

   Rollback granularity is deliberately the SINGLE write, not the
   once-per-run .advisor-bak-*: when several agents share one config file
   (the city.toml case), an earlier valid apply from the same sweep
   survives a later agent's failed write.

2. SPAN-SPLICE FIX (latent bug the validator caught on first contact):
   set_tier_fields does three sequential set_field writes but target.span
   is resolve-time byte offsets; the first insert grows a block-scoped
   target, so the next write sliced the body with a stale end — cutting an
   existing field line mid-string and splicing its remainder after the
   replacement: model = "claude-haiku-4-5" became
   model = "claude-opus-4-8"-4-5" (unparseable). The repo's own
   shared-file scenario (test_single_backup_per_run_for_shared_file)
   silently produced a broken file; its assertions checked substrings, not
   parseability. Fix: re-classify the target (fresh span) before each
   subsequent write; flat targets (span=None) unaffected.

Tests: +6 (validator unit; apply rollback human+JSON exit 4; auto-apply
rolled-back status + summary; shared-file earlier-apply-survives-rollback;
span-splice regression). Suite: 258 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant