
refactor: v0.14.0 verified-safe cleanups (close the inflated compression milestone) #96

Merged
JFK merged 1 commit into main from 89-refactor/v0140-precision-fixes-safe-dedup
Jun 13, 2026

Conversation

JFK commented Jun 13, 2026

Owner

Closes #89

TL;DR

The v0.14.0 "token + precision optimization" milestone was re-audited against the actual files (using the token-baseline tool shipped in #94). Its premise turned out to be largely invalid. This PR ships the only verified-safe, genuinely-beneficial residue — two small cleanups netting −69 tokens (−0.09%). That negligible number is the finding.

What the re-audit found

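Compression is mostly infeasible, not "~28% / ~6,400 tokens": slash commands load whole (no runtime include or conditional load), so relocating sections to an appendix saves nothing, and the bulk of start.md/ship.md is load-bearing executable spec that must not be compressed.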

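The precision "bugs" were phantom: step-18b precedence is already an If/Else-if chain; verdict last-wins is already explicit (and now test-guarded by #95); the propose.md "parallel Skill" instruction is correct (batched Skill calls are supported); and the propose.md "regex mismatch" is a harmless subset, not a contradiction.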

What this PR actually does (the safe residue)

  • start.md: delete a verbatim-redundant lang != "en" localization line (was line 649) that duplicated line 647.
  • goal.md: convert the red-verdict force-continue prose into a compact decision table, preserving every load-bearing detail (the gate2.binary_gate fail exception, phase routing, continue-to steps).
  • Refreshed tests/fixtures/token-baseline.txt (per the workflow from #87, "test: deterministic token-count baseline for command files"): TOTAL ~78,424 → ~78,355 tokens.
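
As a quick sanity check, the refreshed totals can be compared against the −69 tokens (−0.09%) quoted in the TL;DR. The snippet below only redoes that arithmetic from the two totals stated above; it is not part of the repository's tooling.

```python
# Approximate totals quoted in this PR (from tests/fixtures/token-baseline.txt).
before = 78_424
after = 78_355

delta = after - before            # net change in tokens
percent = delta / before * 100    # relative change, in percent

print(f"{delta:+d} tokens ({percent:+.2f}%)")
```

The relative change is about −0.088%, which rounds to the −0.09% reported.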

Milestone disposition

The real value of this whole effort already shipped: #94 (the token-baseline measurement tool) and #95 (decline-token parser contract coverage). Those are worth keeping; the compression headline was not real.

Verification

  • tests/token-baseline.sh --check → reductions shown; snapshot refreshed.
  • 28/28 verdict-parser fixtures, enum-sync, token-baseline self-test, frontmatter — all green.
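
For readers unfamiliar with snapshot-style checks, here is a minimal sketch of the general idea behind a token-baseline comparison. It is an illustration only: the PR does not describe how tests/token-baseline.sh counts tokens, so the `estimate_tokens` heuristic (about 4 characters per token), the function names, and the sample data are all assumptions.

```python
import math

def estimate_tokens(text: str) -> int:
    """Hypothetical estimator: assumes roughly 4 characters per token."""
    return math.ceil(len(text) / 4)

def diff_against_baseline(baseline: dict[str, int], files: dict[str, str]) -> dict[str, int]:
    """Return {path: delta} for each file whose estimate differs from the baseline."""
    drift = {}
    for path, text in files.items():
        delta = estimate_tokens(text) - baseline.get(path, 0)
        if delta != 0:
            drift[path] = delta
    return drift

# Made-up sample data, not the real command files or their counts.
baseline = {"start.md": 10, "goal.md": 5}
files = {"start.md": "x" * 36, "goal.md": "y" * 20}
drift = diff_against_baseline(baseline, files)
print(drift)
```

A check of this kind fails (or reports reductions) when `drift` is non-empty, and refreshing the snapshot amounts to rewriting the baseline with the current estimates.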

🤖 Generated via /gh-issue-driven (milestone v0.14.0 wind-down)

The v0.14.0 "token + precision optimization" milestone was re-audited against
the actual files (using the token-baseline tool from #94). The premise turned
out to be largely invalid:

- The "~28% / ~6,400 token" compression is not achievable: slash commands load
  whole (no runtime include / conditional load), so relocating sections to an
  appendix saves nothing, and the bulk of start.md/ship.md is load-bearing
  executable spec that must not be compressed.
- The claimed precision bugs were phantom: step-18b precedence is already an
  If/Else-if chain; verdict last-wins is already explicit (and now test-guarded
  by #95); the propose.md "parallel Skill" instruction is correct (batched Skill
  calls are supported); the propose.md "regex mismatch" is a harmless subset, not
  a contradiction.
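
To illustrate what "last-wins" means for verdict parsing, here is a minimal sketch. The `VERDICT:` line format and the green/yellow/red value set are assumptions made for this example; the actual grammar is defined in the command files and guarded by the fixtures from #95.

```python
import re
from typing import Optional

# Hypothetical verdict line format, chosen for illustration only.
VERDICT_RE = re.compile(r"^VERDICT:\s*(green|yellow|red)\s*$", re.MULTILINE)

def parse_verdict(output: str) -> Optional[str]:
    """Return the last verdict found in the output, or None if there is none."""
    matches = VERDICT_RE.findall(output)
    return matches[-1] if matches else None

sample = "VERDICT: red\nreviewer notes...\nVERDICT: green\n"
print(parse_verdict(sample))
```

Taking the final match rather than the first means a later, corrected verdict overrides any earlier one in the same output.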

This commit ships the ONLY verified-safe, genuinely-beneficial residue:

- start.md: delete a verbatim-redundant `lang != "en"` localization line (649)
  that duplicated line 647.
- goal.md: convert the red-verdict force-continue prose (phase-aware bullets)
  into a compact decision table, preserving every load-bearing detail (the
  gate2.binary_gate `fail` exception, phase routing, continue-to steps).

Net effect (per tests/token-baseline.sh): TOTAL ~78,424 -> ~78,355 tokens
(-69 tokens, -0.09%). The negligible number is itself the finding — it
demonstrates the milestone's compression premise was unfounded, and the
token-baseline tool (#87/#94) measuring it is working as intended.

Closes #89

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

refactor: optimize start.md — compress ~28% + fix dry-run/verdict/step-18b precision bugs
