test: add deterministic token-count baseline for command files (#87) by JFK · Pull Request #94 · JFK/gh-issue-driven

JFK · 2026-06-13T05:25:28Z

Closes #87

Summary

Add tests/token-baseline.sh, a deterministic size census of commands/*.md (lines / bytes / ~tokens ≈ bytes/4), with a committed snapshot at tests/fixtures/token-baseline.txt. This is the measurement floor for the v0.14.0 compression milestone, issues #89 ("refactor: optimize start.md — compress ~28% + fix dry-run/verdict/step-18b precision bugs") through #92 ("refactor: optimize config/doctor/status — PMRP->appendix, schema->bullets"): it lets the milestone prove per-command reductions and catch accidental bloat without an LLM or tokenizer.
--check prints the table + drift vs the snapshot and always exits 0 (informational; a bloat hard-fail guard is deferred per gate1). --update refreshes the snapshot.
tests/token-baseline-test.sh self-tests the tool (non-vacuous: row-count, ~tokens==bytes/4, --update sha1 idempotence, OK-match, and the drift→exit-0+WARN contract).
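The census described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of tests/token-baseline.sh: the function name `census`, the space-separated row format, and the demo directory are all hypothetical. Only the CR stripping and the integer `bytes/4` estimate come from the description.

```shell
#!/usr/bin/env bash
set -euo pipefail

# census DIR: print "name lines bytes ~tokens" for each *.md in DIR,
# sorted bytewise by name, followed by a TOTAL row.
# (Hypothetical helper; the real script's output format may differ.)
census() {
  local dir=$1 f lines bytes
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    # Strip CR first so CRLF and LF checkouts produce the same numbers.
    lines=$(tr -d '\r' < "$f" | wc -l)
    bytes=$(tr -d '\r' < "$f" | wc -c)
    # ~tokens is plain integer division; no tokenizer is involved.
    printf '%s %d %d %d\n' "$(basename "$f")" "$lines" "$bytes" "$(( bytes / 4 ))"
  done | LC_ALL=C sort |
    awk '{ print; l += $2; b += $3; t += $4 }
         END { printf "TOTAL %d %d %d\n", l, b, t }'
}

# Demo on throwaway files (not the repo's commands/ directory).
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/a.md"            # 2 lines, 8 bytes
printf 'alpha\nbeta\ngamma\n' > "$demo/b.md"  # 3 lines, 17 bytes

census "$demo"
# → a.md 2 8 2
# → b.md 3 17 4
# → TOTAL 5 25 6
```

In this sketch the TOTAL row sums the per-file estimates rather than dividing the byte total by four. Whether the real tool does the same is not stated in the PR.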

Implementation notes

Reproducibility across Windows/WSL and Linux CI: the script strips CR before counting, and commands/*.md, the snapshot, and *.sh are pinned to LF via .gitattributes. (The working tree was CRLF — raw wc -c differed from the normalized count, so this is load-bearing for the milestone's "prove reduction" goal.)
Wired into .github/workflows/lint.yml: the self-test gates CI; --check runs as an informational step.
CONTRIBUTING.md documents the refresh procedure.
Baseline at this branch: TOTAL 4573 lines / 313710 bytes / ~78424 tokens (top cost centers: start.md ~21277, ship.md ~12214).
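The CRLF point in the notes above can be illustrated with a small sketch. The helper names `raw_bytes` and `normalized_bytes` are hypothetical and the sample file is synthetic. The point is only that a raw `wc -c` on a CRLF working tree overcounts by one byte per line ending compared with the CR-stripped count.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Byte count exactly as the file sits in the working tree.
raw_bytes() { wc -c < "$1" | tr -d ' '; }

# Byte count after deleting every CR, i.e. what an LF checkout would hold.
normalized_bytes() { tr -d '\r' < "$1" | wc -c | tr -d ' '; }

sample=$(mktemp)
printf '# title\r\nbody line\r\nlast\r\n' > "$sample"  # three CRLF-terminated lines

echo "raw=$(raw_bytes "$sample") normalized=$(normalized_bytes "$sample")"
# → raw=26 normalized=23
```

The three-byte gap equals the three line endings, which is why an unnormalized count would make a Windows checkout look larger than the same content on Linux CI.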

Pre-PR review summary

gate2 mode: advisor-only (gate2.binary_gate = none)
audit: skipped
cso: green
qa-lead: yellow → addressed in commit 8e36504 (added the missing drift-case assertion the review flagged)
cto: green
gate1: green via /claude-c-suite:ask (QA Lead lens)
review provider: code-review

Full reviews are saved in the plugin cache (<branch-flat>.gate1.md / .gate2.md).

🤖 Generated via /gh-issue-driven:ship (autonomous=red-only, milestone v0.14.0)

Add tests/token-baseline.sh, a deterministic size census of commands/*.md (lines, bytes, ~tokens ≈ bytes/4) with a committed snapshot at tests/fixtures/token-baseline.txt, so the v0.14.0 compression milestone (#89-#92) can prove per-command reductions and catch accidental bloat without an LLM.

- --check prints the table + drift vs snapshot, ALWAYS exits 0 (informational; a bloat hard-fail guard is deferred per gate1)
- --update refreshes the snapshot
- CR stripped before counting, and commands/*.md, the snapshot, and *.sh pinned to LF via .gitattributes, so byte counts are reproducible across Windows/WSL and Linux CI
- tests/token-baseline-test.sh self-tests the census tool; both wired into .github/workflows/lint.yml (self-test gates, --check is informational)
- CONTRIBUTING.md documents the refresh procedure

Baseline at this commit: TOTAL 4573 lines / 313710 bytes / ~78424 tokens (start.md ~21277, ship.md ~12214 are the top cost centers).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Gate2 (qa-lead) flagged that the tool's core contract, AC #3 ("snapshot drift must still exit 0 (informational) and warn"), was only verified in the match case, never under actual drift. Add a drift-case assertion: append a sentinel row to the snapshot, run --check, assert exit 0 AND a "WARN: size drift" line (captured from stderr), then restore the snapshot via --update.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
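The shape of that drift-case assertion might look like the sketch below. It is self-contained, so `check` is a hypothetical stand-in for `tests/token-baseline.sh --check`, and the snapshot rows and the "OK" message are invented for illustration. The sketch also restores the snapshot from a backup copy, whereas the commit restores it via --update.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
snapshot="$work/token-baseline.txt"
printf 'a.md 2 8 2\nTOTAL 2 8 2\n' > "$snapshot"

# Hypothetical stand-in for `--check`: compare a fixed "current" census
# with the snapshot, warn on stderr when they differ, always return 0.
check() {
  local current
  current=$(printf 'a.md 2 8 2\nTOTAL 2 8 2')
  if [ "$current" = "$(cat "$snapshot")" ]; then
    echo "OK: snapshot matches"
  else
    echo "WARN: size drift vs snapshot" >&2
  fi
  return 0
}

# Drift case: append a sentinel row, run the check, then restore.
cp "$snapshot" "$snapshot.bak"
echo 'sentinel.md 1 4 1' >> "$snapshot"

status=0
stderr=$(check 2>&1 >/dev/null) || status=$?   # capture stderr only
mv "$snapshot.bak" "$snapshot"

[ "$status" -eq 0 ] || { echo "FAIL: check must exit 0 on drift"; exit 1; }
case $stderr in
  *"WARN: size drift"*) echo "PASS: drift warned, exit 0" ;;
  *) echo "FAIL: missing drift warning"; exit 1 ;;
esac
```

Asserting both halves matters: a check that silently exits 0 on drift, or one that warns but exits non-zero, would each pass a match-only test.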

The scripts were committed mode 100644 because `chmod +x` on Windows does not set git's executable bit. CI invokes them via `bash` (which works), but the self-test's `[ -x ]` contract assertion correctly failed on a clean Linux checkout: the repo convention is executable tests/*.sh (rwxr-xr-x). Set the git exec bit via `git update-index --chmod=+x` on both scripts (blob content unchanged, mode 100644 -> 100755).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
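The mode change described in that commit can be reproduced in a throwaway repository. The sketch below assumes `git` is on the PATH. The file name `demo.sh` and the helper `mode_of` are hypothetical; `git update-index --chmod=+x` and `git ls-files -s` are the only git features it relies on.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
printf '#!/usr/bin/env bash\necho ok\n' > "$repo/demo.sh"
chmod 644 "$repo/demo.sh"            # simulate a file staged without the exec bit
git -C "$repo" add demo.sh

# Print the mode git has recorded in the index for a path.
mode_of() { git -C "$repo" ls-files -s -- "$1" | cut -d' ' -f1; }

before=$(mode_of demo.sh)
git -C "$repo" update-index --chmod=+x demo.sh   # flips the index mode only
after=$(mode_of demo.sh)

echo "$before -> $after"
# → 100644 -> 100755
```

Because only the index entry's mode changes, the blob hash stays the same, which matches the commit's "blob content unchanged" note.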

The v0.14.0 "token + precision optimization" milestone was re-audited against the actual files (using the token-baseline tool from #94). The premise turned out to be largely invalid:

- The "~28% / ~6,400 token" compression is not achievable: slash commands load whole (no runtime include / conditional load), so relocating sections to an appendix saves nothing, and the bulk of start.md/ship.md is load-bearing executable spec that must not be compressed.
- The claimed precision bugs were phantom: step-18b precedence is already an If/Else-if chain; verdict last-wins is already explicit (and now test-guarded by #95); the propose.md "parallel Skill" instruction is correct (batched Skill calls are supported); the propose.md "regex mismatch" is a harmless subset, not a contradiction.

This commit ships the ONLY verified-safe, genuinely-beneficial residue:

- start.md: delete a verbatim-redundant `lang != "en"` localization line (649) that duplicated line 647.
- goal.md: convert the red-verdict force-continue prose (phase-aware bullets) into a compact decision table, preserving every load-bearing detail (the gate2.binary_gate `fail` exception, phase routing, continue-to steps).

Net effect (per tests/token-baseline.sh): TOTAL ~78,424 -> ~78,355 tokens (-69 tokens, -0.09%). The negligible number is itself the finding: it demonstrates the milestone's compression premise was unfounded, and the token-baseline tool (#87/#94) measuring it is working as intended.

Closes #89

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
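The net-effect figures quoted in that commit can be checked with a few lines of arithmetic. This is a quick sanity sketch using only the two totals stated above; the variable names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

before=78424   # ~tokens total before the cleanup, as reported
after=78355    # ~tokens total after the cleanup, as reported

delta=$(( after - before ))
# awk handles the floating-point percentage; bash arithmetic is integer-only.
pct=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.2f", (a - b) / b * 100 }')

echo "delta=$delta tokens (${pct}%)"
# → delta=-69 tokens (-0.09%)
```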

JFK and others added 2 commits June 13, 2026 14:20

JFK marked this pull request as ready for review June 13, 2026 05:27

JFK merged commit 583cef7 into main Jun 13, 2026
1 check passed

JFK deleted the 87-test/test-deterministic-token-count-baseline branch June 13, 2026 05:40

JFK mentioned this pull request Jun 13, 2026

refactor: v0.14.0 verified-safe cleanups (close the inflated compression milestone) #96

Merged

JFK merged 3 commits into main from 87-test/test-deterministic-token-count-baseline
