validate-after-apply with rollback — and the triple-write span-splice bug it caught by Wldc4rd · Pull Request #2 · jsgerman-oss/model-advisor

Wldc4rd · 2026-06-09T19:00:25Z

What

Adds the missing safety half of apply/auto-apply: every applied write is re-parsed as TOML immediately after writing; on a parse failure the pre-write content is restored atomically and the failure is surfaced. A malformed write can no longer be left in place to break the next config load — the bad-pin-silently-stops-the-scheduler footgun.

apply: exits 4 with FAILED:/restored text, or "rolled_back": true + "error" in --json.
auto-apply: new per-agent rolled-back status (in the summary roll-up), isolated per agent like error.
New primitives in cli.py: validate_toml_file(), set_tier_fields_validated(), ApplyValidationError.

Rollback granularity is deliberately the single write, not the once-per-run .advisor-bak-*: when several agents share one config file (the city.toml case your own tests model), an earlier valid apply from the same sweep survives a later agent's failed write. There's a test proving exactly that.

The bug the validator caught on first contact

Wiring the validator in made test_single_backup_per_run_for_shared_file fail — because witness's write has been silently producing unparseable TOML all along (the test asserts substrings, not parseability):

set_tier_fields does three sequential set_field writes, but target.span holds byte offsets computed once, at resolve time. The first write (inserting provider) grows a block-scoped target; the second write then slices the body with the now-stale end offset, cutting midway through an existing model line near the block tail and splicing its orphaned remainder back in after the replacement:

model = "claude-opus-4-8"-4-5"     # ← was model = "claude-haiku-4-5"

Insert-only triple-writes recombine safely, which is why the other block-apply tests stayed green. The bug only triggers when an existing field line near the block tail is replaced after an earlier insert has grown the block.

Fix: re-classify the target (fresh span) before each subsequent write; flat targets (span=None) are unaffected. With the fix, witness applies cleanly and the shared-file test passes again — now with the validator guaranteeing parseability behind it.

Tests

+6: validator unit; apply rollback (human + JSON, exit 4); auto-apply rolled-back status + summary count; shared-file earlier-apply-survives-later-rollback; span-splice regression. Suite: 258 passed.

Found while evaluating model-advisor for adoption in our gc city — the eval's code review flagged "no validate-after-apply" as the one gap in an otherwise unusually defensive write path (dry-run defaults, Critical pins, evidence gates, backups all already there). Companion catalog PR: jsgerman-oss/provider-forge#1.

🤖 Generated with Claude Code

…iple-write span splice

Two coupled changes to the apply/auto-apply write path:

1. VALIDATE-AFTER-APPLY (new): every applied write is re-parsed as TOML immediately after writing. On a parse failure the pre-write content is restored atomically and the failure is surfaced: apply exits 4 with a rolled_back JSON field; auto-apply reports a new per-agent "rolled-back" status (counted in the summary). A malformed write can no longer be left in place to break the next config load (the bad-pin-silently-stops-the-scheduler footgun). Rollback granularity is deliberately the SINGLE write, not the once-per-run .advisor-bak-*: when several agents share one config file (the city.toml case), an earlier valid apply from the same sweep survives a later agent's failed write.

2. SPAN-SPLICE FIX (latent bug the validator caught on first contact): set_tier_fields does three sequential set_field writes but target.span is resolve-time byte offsets; the first insert grows a block-scoped target, so the next write sliced the body with a stale end, cutting an existing field line mid-string and splicing its remainder after the replacement: model = "claude-haiku-4-5" became model = "claude-opus-4-8"-4-5" (unparseable). The repo's own shared-file scenario (test_single_backup_per_run_for_shared_file) silently produced a broken file; its assertions checked substrings, not parseability. Fix: re-classify the target (fresh span) before each subsequent write; flat targets (span=None) unaffected.

Tests: +6 (validator unit; apply rollback human+JSON exit 4; auto-apply rolled-back status + summary; shared-file earlier-apply-survives-rollback; span-splice regression). Suite: 258 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wldc4rd wants to merge 1 commit into jsgerman-oss:main from Wldc4rd:validate-after-apply
