feat(taskrunner): ratchet mode — measure-before-after validation#24

Open
stackbilt-admin wants to merge 3 commits into `main` from `feat/ratchet-mode`

@stackbilt-admin
Member

Summary

Closes #16. Adds an opt-in measurement gate. It captures a baseline snapshot of `npm run typecheck` + `npm test` pass/fail on `main` before the task branch is created, then re-runs the same checks on the branch after the task commits. If any check transitioned `pass → fail`, the task is reverted automatically (delete branch, skip push/PR, mark failed), so broken PRs never reach origin.

Opt-in paths

| Path | How |
|---|---|
| Per-task | `"ratchet": true` in task JSON |
| Category default | `refactor` and `bugfix` tasks ratchet automatically |
| Environment | `CC_RATCHET=1` forces on for every task |

Never ratcheted: `docs`, `tests`, `research`, `deploy`. These categories either have no regression surface or produce outcomes that aren't code-level.
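The opt-in paths above could be resolved roughly as follows. This is a minimal sketch, not the code in `taskrunner.sh`: the function name `ratchet_should_run`, its arguments, and the precedence order (environment, then never-ratcheted categories, then the per-task field, then the category default) are assumptions, as is treating `"ratchet": false` as a per-task opt-out.

```shell
# Hypothetical helper: prints "1" if ratchet should run for a task, else "0".
# $1 = task category, $2 = the task's "ratchet" field ("true", "false", or empty).
ratchet_should_run() {
  local category="$1" task_flag="${2:-}"

  # Environment override wins over task fields (CC_RATCHET=1|0).
  case "${CC_RATCHET:-}" in
    1) echo 1; return 0 ;;
    0) echo 0; return 0 ;;
  esac

  # Categories listed as never ratcheted (assumed to beat the per-task field).
  case "$category" in
    docs|tests|research|deploy) echo 0; return 0 ;;
  esac

  # Per-task field: "true" opts in; "false" as an opt-out is an assumption.
  case "$task_flag" in
    true)  echo 1; return 0 ;;
    false) echo 0; return 0 ;;
  esac

  # Category defaults: refactor and bugfix ratchet automatically.
  case "$category" in
    refactor|bugfix) echo 1 ;;
    *) echo 0 ;;
  esac
}
```

For example, `ratchet_should_run refactor ""` would print `1` when `CC_RATCHET` is unset, and `CC_RATCHET=0 ratchet_should_run bugfix true` would print `0`.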

Decision rule

Only `pass → fail` transitions revert. Key edge cases:

  • `fail → fail` → keep (unchanged broken surface, task wasn't expected to fix it)
  • `skip → fail` → keep (first-time check on pre-existing breakage)
  • `fail → pass` → keep (improvement, as expected)

This keeps ratchet from punishing tasks for inheriting broken state.
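The rule is small enough to sketch directly. The real `ratchet_decision()` signature is not shown in this PR, so the two function names and the `before:after` argument shape below are illustrative assumptions; only the `pass → fail` rule and the `rc=1` on revert come from the description.

```shell
# Classify one check: prints "revert" only for pass -> fail, otherwise "keep".
ratchet_transition() {
  local before="$1" after="$2"
  if [ "$before" = "pass" ] && [ "$after" = "fail" ]; then
    echo revert
  else
    echo keep
  fi
}

# Decide for a whole snapshot. Each argument is a "before:after" pair, one per check.
# Returns 1 (revert) if any check regressed, 0 (keep) otherwise.
ratchet_decide_all() {
  local pair
  for pair in "$@"; do
    if [ "$(ratchet_transition "${pair%%:*}" "${pair##*:}")" = "revert" ]; then
      return 1
    fi
  done
  return 0
}
```

With this shape, `ratchet_decide_all pass:pass fail:fail` returns 0 (keep), while `ratchet_decide_all pass:pass pass:fail` returns 1 (revert) because a single regressed check is enough.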

Snapshot surface

  • `npm run typecheck` exit code → `pass` / `fail` / `skip`
  • `npm test` exit code → `pass` / `fail` / `skip`

Each check degrades to `skip` when `package.json` has no matching script. Zero new dependencies — pure bash + python3.
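One way a single check could map to `pass` / `fail` / `skip` is sketched below. It assumes `python3` and coreutils `timeout` are on `PATH`; the function names `has_npm_script` and `ratchet_check` are hypothetical, and counting a timeout as `fail` is an assumption rather than documented behavior.

```shell
# Succeeds only if ./package.json defines scripts.<name>.
has_npm_script() {
  [ -f package.json ] || return 1
  python3 -c '
import json, sys
with open("package.json") as f:
    scripts = json.load(f).get("scripts", {})
sys.exit(0 if sys.argv[1] in scripts else 1)
' "$1" 2>/dev/null
}

# $1 = script name to look up in package.json, remaining args = command to run.
# Prints pass, fail, or skip.
ratchet_check() {
  local script="$1"; shift
  if ! has_npm_script "$script"; then
    echo skip   # no matching script (or unreadable package.json)
    return 0
  fi
  # CC_RATCHET_TIMEOUT is the per-check timeout in seconds (default 180).
  if timeout "${CC_RATCHET_TIMEOUT:-180}" "$@" >/dev/null 2>&1; then
    echo pass
  else
    echo fail   # any non-zero exit, including a timeout
  fi
}

# Usage:
#   ratchet_check typecheck npm run typecheck
#   ratchet_check test npm test
```

Passing the command separately from the script name keeps the skip logic independent of how the check is actually invoked.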

Integration points

  • Baseline: captured right after `git pull --ff-only`, before branch checkout → measures true main state
  • Post-validation: runs after task commits but before push → regressed branches never reach origin
  • State scoping: `RATCHET_ENABLED` + `RATCHET_BASELINE` are `local` in `execute_task()` and initialized up front so operator-authority tasks (which skip branch creation) don't trip `set -u` unbound-variable errors
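The state-scoping point can be illustrated with a stripped-down outline. This is not the body of `execute_task()`, which the PR does not show: the function name `execute_task_sketch`, its arguments, the placeholder snapshot string, and the choice to skip ratchet for operator-authority tasks are all assumptions made for the example.

```shell
set -u  # reading an unset variable is a fatal error

# $1 = authority ("operator" skips branch creation), $2 = whether ratchet is wanted (1 or 0).
execute_task_sketch() {
  local authority="$1" want_ratchet="$2"
  # Initialize up front so every later read is bound, whichever path runs.
  local RATCHET_ENABLED=0
  local RATCHET_BASELINE=""

  if [ "$authority" != "operator" ]; then
    # Branch path: the baseline would be captured here, on main, before checkout.
    RATCHET_ENABLED="$want_ratchet"
    RATCHET_BASELINE="typecheck=pass test=pass"   # placeholder snapshot
  fi

  # This guard runs on every path; it is safe under set -u because of the init above.
  if [ "$RATCHET_ENABLED" = "1" ] && [ -n "$RATCHET_BASELINE" ]; then
    echo "ratchet: validate against [$RATCHET_BASELINE]"
  else
    echo "ratchet: skipped"
  fi
}
```

Without the two up-front `local` assignments, the operator path would reach the final guard with both variables unset and the shell would abort under `set -u`.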

Test plan

  • Shell syntax clean (`bash -n`) on both `taskrunner.sh` and `plugin/taskrunner.sh`
  • `ratchet_decision()` smoke-tested against 5 transition cases:
    • `skip → skip` → keep ✓
    • `pass → pass` → keep ✓
    • `pass → fail` → revert (rc=1) ✓
    • `fail → fail` → keep ✓
    • `skip → fail` → keep ✓
  • End-to-end dogfood: queue a synthetic refactor that breaks typecheck, verify ratchet reverts before PR
  • End-to-end dogfood: queue a valid refactor, verify ratchet keeps and PR lands

Applied symmetrically

Both `taskrunner.sh` and `plugin/taskrunner.sh` carry the same ratchet helpers and hooks, following the precedent from #19 (empty-stash fix) and #20 (blast gate).

Env knobs

  • `CC_RATCHET=1|0` — force enable/disable (overrides task fields)
  • `CC_RATCHET_TIMEOUT=<seconds>` — per-check timeout in seconds (default: 180)
  • `CC_DISABLE_RATCHET=1` — legacy alias for `CC_RATCHET=0`
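A possible normalization of these knobs is sketched here. The function name `ratchet_env_config` and the `auto` mode (meaning "defer to task fields and category defaults") are hypothetical, and letting an explicit `CC_RATCHET` win over the legacy alias when both are set is an assumption.

```shell
# Prints "<mode> <timeout>", where mode is "on", "off", or "auto".
ratchet_env_config() {
  local mode="auto"
  # Legacy alias: CC_DISABLE_RATCHET=1 behaves like CC_RATCHET=0.
  if [ "${CC_DISABLE_RATCHET:-}" = "1" ]; then mode="off"; fi
  # Assumption: an explicit CC_RATCHET takes priority if both are set.
  case "${CC_RATCHET:-}" in
    1) mode="on" ;;
    0) mode="off" ;;
  esac
  # Per-check timeout in seconds, defaulting to 180.
  echo "$mode ${CC_RATCHET_TIMEOUT:-180}"
}
```

For instance, `CC_RATCHET=1 CC_RATCHET_TIMEOUT=60 ratchet_env_config` would print `on 60`, and with no knobs set the result is `auto 180`.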

Closes #16

🤖 Generated with Claude Code

Codebeast and others added 3 commits April 10, 2026 06:34
Adds the standardized Stackbilt-dev security reporting template to this
repository. The template is the canonical per-repo security file rolled
out across the entire Stackbilt-dev organization as part of the outbound
disclosure policy (Stackbilt-dev/docs#15).

Key points:
- Primary reporting channel: admin@stackbilt.dev
- GitHub Security Advisory link scoped to this repo
- Response target matrix (critical 24h ack / 7d fix, high 48h / 14d)
- Full policy link at https://docs.stackbilt.dev/security/
- Explicit "do not open public GH issues for vulns" rule

This replaces the implicit policy that existed via the Stackbilt-dev
organization profile with an explicit per-repo file, so the GitHub
security tab surfaces it and external researchers have a clear
reporting path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #16. Adds an opt-in guard that captures a baseline snapshot of
typecheck + test state on main BEFORE the task branch is created,
re-runs the same checks on the branch AFTER the task commits, and
automatically reverts the branch (delete locally, skip push/PR, mark
failed) when any check transitioned pass→fail.

Opt-in paths
- Per-task: `"ratchet": true` in the task JSON
- Category default: `refactor` and `bugfix` tasks ratchet automatically
- Environment: `CC_RATCHET=1` force-enables for every task

Never ratcheted
- `docs`, `tests`, `research`, `deploy` categories (no regression surface
  or outcomes aren't code-level)

Decision rule
Only pass→fail transitions revert. fail→fail (unchanged broken surface)
and skip→fail (first-time check on a pre-existing breakage) are both
`keep`. fail→pass is `keep`. The goal is to gate regressions, not
punish tasks for inheriting broken state.

Snapshot surface
- `npm run typecheck` exit code → pass/fail/skip
- `npm test` exit code → pass/fail/skip
- Each check is independent and degrades to `skip` when the repo has
  no corresponding script in `package.json`. Zero new dependencies.

Integration points
- Baseline captured right after `git pull --ff-only`, before the task
  branch is checked out (so we measure true main state).
- Post-validation runs after commits but BEFORE push, so a regressed
  branch never reaches origin and never opens a PR.
- Ratchet state is local to each execute_task() call — initialized up
  front so operator-authority tasks (which skip branch creation) don't
  trip unbound-variable errors under set -u.

Applied symmetrically to taskrunner.sh and plugin/taskrunner.sh.

Smoke-tested ratchet_decision() against 5 transition cases:
- skip→skip: keep ✓
- pass→pass: keep ✓
- pass→fail: revert (rc=1) ✓
- fail→fail: keep (no regression) ✓
- skip→fail: keep (first-time surface) ✓

Env knobs
- CC_RATCHET=1|0              force-enable/disable, overrides task fields
- CC_RATCHET_TIMEOUT=<seconds> per-check timeout (default: 180)
- CC_DISABLE_RATCHET=1         legacy alias for CC_RATCHET=0

Closes #16

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Should've been in the prior commit but Edit bailed on an unread file.
Squash candidate on merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

feat: ratchet mode — measure-before-after for autonomous improvements

1 participant