Releases: broomva/bstack
v0.21.1 — Phase 4a content & media migration
Three more skills migrated from standalone broomva/<name> repos into the broomva/skills Tier-2 monorepo (broomva/skills PR #4 merge 2f5aec4):
- blog-post: full-stack blog post production (substantial: 28KB SKILL.md + examples/ + references/ + scripts/publish.sh + templates/). NEW registry entry (previously bundled; not registered separately).
- brand-icons: brand icon and visual identity asset generation. Registry entry updated from repo: broomva/brand-icons → repo: broomva/skills, skillPath: skills/brand-icons/SKILL.md.
- seo-llmeo: SEO and LLM Engine Optimization (audits, meta tags, structured data, llms.txt). Registry entry updated to the monorepo path.
Each source repo carries a redirect-stub README during a 6-month deprecation window (until 2026-11-25):
- broomva/blog-post PR #1 (merge a7d90b6)
- broomva/brand-icons PR #1 (merge 2e20534)
- broomva/seo-llmeo PR #1 (merge 2b635d6)
Files changed
- references/companion-skills.yaml: 2 entries rewritten (brand-icons, seo-llmeo), 1 new entry added (blog-post)
- references/skills-roster.md: install commands updated to monorepo paths
- VERSION: 0.21.0 → 0.21.1 (additive patch: new entries + corrected install paths)
- CHANGELOG.md: this entry
Pattern note: multi-source-repo migration
Phase 3 migrated 9 sub-skills from ONE bundled source (broomva/strategy-skills/.skills/). Phase 4a tests the multi-source pattern: 3 separate standalone source repos, each with a full skill layout (SKILL.md + scripts/ + references/ + assets/). The migration command is now well rehearsed and ready to crystallize into the bstack skill graduate CLI (Phase 6b).
broomva/blog-post's root canonical content was preserved; 24 IDE-specific dotfile-mirror dirs (.agent/, .claude/, .continue/, etc.) were excluded as deployment artifacts — downstream agents resolve to skills/blog-post/SKILL.md per the agentskills.io spec.
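A minimal Python sketch of the two mechanical steps described above: rewriting a registry entry to the monorepo path and filtering out IDE dotfile mirrors. It is illustrative, not the actual migration command. The entry shape (a dict with a name key), the helper names, and the KEEP_DOTDIRS allowlist are all assumptions.

```python
from pathlib import PurePosixPath

MONOREPO = "broomva/skills"
# Hypothetical allowlist: .github/ holds CI config, not an IDE mirror.
KEEP_DOTDIRS = {".github"}


def migrate_entry(entry: dict) -> dict:
    """Point a standalone-repo registry entry at the Tier-2 monorepo."""
    updated = dict(entry)  # copy so the caller's entry is not mutated
    updated["repo"] = MONOREPO
    updated["skillPath"] = f"skills/{entry['name']}/SKILL.md"
    return updated


def is_deployment_artifact(path: str) -> bool:
    """True for files inside a top-level dot-directory such as .claude/ or .agent/."""
    parts = PurePosixPath(path).parts
    return len(parts) > 1 and parts[0].startswith(".") and parts[0] not in KEEP_DOTDIRS


print(migrate_entry({"name": "brand-icons", "repo": "broomva/brand-icons"}))
files = ["SKILL.md", "scripts/publish.sh", ".claude/SKILL.md", ".continue/SKILL.md"]
print([f for f in files if not is_deployment_artifact(f)])
# → ['SKILL.md', 'scripts/publish.sh']
```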
v0.20.0 — Cross-review CLI — restore the P20 mechanism (BRO-1227 Fix B)
The P20 (broomva/cross-review) primitive, cross-model adversarial review on substantive PRs before merge, failed repeatedly during the 2026-05-21 Wave 3 dispatch session: both Cato sub-agent dispatches stalled within 6-7 tool uses with path-resolution errors. The Cato agent was invoked from the miami workspace but asked to read files at ~/broomva/broomva.tech/..., and that working tree was on a different branch than the PR's head SHA, so Read-tool calls drifted into "let me locate the actual repo" loops and never produced output.
This release ships Fix B: a bstack cross-review CLI that reads PR contents via gh pr diff + gh api repos/.../contents/<path>?ref=<sha>. Working-tree state is eliminated as a variable — the CLI can be invoked from any cwd; only --repo <owner/name> + PR number matter.
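Working-tree independence comes from addressing everything by repo, PR number, and commit SHA. The sketch below builds the three gh calls and decodes a GitHub contents-API response, whose file body arrives base64-encoded. The helper names are hypothetical and this is not the script's code; headRefOid and files are standard gh pr view --json fields, but the exact fields cross-review.py requests are not stated in this note.

```python
import base64
import json


def fetch_commands(repo: str, pr: int, path: str, head_sha: str) -> list[list[str]]:
    """gh invocations for PR metadata, the diff, and one post-change file.

    None of them depend on the current working directory.
    """
    return [
        ["gh", "pr", "view", str(pr), "--repo", repo, "--json", "headRefOid,files"],
        ["gh", "pr", "diff", str(pr), "--repo", repo],
        ["gh", "api", f"repos/{repo}/contents/{path}?ref={head_sha}"],
    ]


def decode_contents(api_json: str) -> str:
    """Decode the body of a GitHub contents-API response into text."""
    payload = json.loads(api_json)
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # The payload may be wrapped across lines; b64decode (non-validating by
    # default) discards the embedded newlines.
    return base64.b64decode(payload["content"]).decode("utf-8")
```

Each command list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True). Paths containing spaces or other special characters would need URL-encoding before they go into the contents URL.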
New files (3)
- NEW scripts/cross-review.py: argparse CLI. Fetches PR metadata (gh pr view), the diff (gh pr diff), and post-change file contents (gh api …/contents/<path>?ref=<head_sha>); skips lock files and >2000-line adds; bundles everything into a structured codex prompt; invokes codex exec --sandbox read-only --model gpt-5.4 --skip-git-repo-check with a 240s default timeout; parses the JSON verdict (with a fallback try_parse_json extractor that recovers a balanced {…} object from prose-wrapped output); writes structured JSON to .bstack-cross-review/<pr>.json and markdown to <pr>.md. Verdict schema: verdict (pass/concerns/fail/skipped), anti_slop_score (0-10), criticality (high/medium/low), findings[], blind_spots_surfaced[], summary. Optional --post-comment posts the markdown verdict back to the PR. Exit codes: 0 pass · 10 concerns · 20 fail · 30 skipped · 2 invocation/gh failure.
- NEW bin/bstack-cross-review: thin shim mirroring bin/bstack-wave; dispatches to scripts/cross-review.py and forwards argv unchanged.
- NEW tests/cross-review.test.sh: 8-test hermetic offline smoke suite (dispatcher routing, argparse rejection cases, module import, try_parse_json recovery from prose-wrapped JSON, exit_code_for_verdict mapping). No network calls; end-to-end validation is the per-PR --dry-run against real PRs documented in the PR body.
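The fallback extractor and exit-code mapping listed above can be approximated as follows. This is a sketch under assumptions, not the shipped try_parse_json: it scans for the first balanced {...} span that parses as JSON, tracks string literals so braces inside quoted values do not affect depth, and moves on to the next { when a balanced candidate fails to parse. Unknown verdicts raise KeyError here; how the real CLI treats them is not documented.

```python
import json

EXIT_CODES = {"pass": 0, "concerns": 10, "fail": 20, "skipped": 30}


def exit_code_for_verdict(verdict: str) -> int:
    """Map a verdict string to the documented CLI exit code (KeyError if unknown)."""
    return EXIT_CODES[verdict]


def try_parse_json(text: str):
    """Return the first balanced {...} object in text that parses as JSON, else None."""
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but not JSON: try the next "{"
        start = text.find("{", start + 1)
    return None
```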
Changed files (3)
- CHANGED bin/bstack: adds a cross-review) dispatch entry routing to bin/bstack-cross-review. Adds a "Review" usage section with a one-liner pointing at cross-review <pr-num> --repo <owner/name> (≥ 0.20.0). Adds the canonical invocation to the Examples block.
- CHANGED SKILL.md: the Quick start block lists bstack cross-review with the BRO-1227 Fix B annotation and the 0.20.0 introduction marker.
- CHANGED VERSION: 0.19.0 → 0.20.0.
Why Fix B over Fix A or Fix C
- Fix A (add a --cwd parameter to Cato dispatch) leaves the working-tree-state coupling intact: every future P20 invocation has to remember to set it, and a stale checkout silently degrades review quality. The failure mode returns the next time someone uses Cato across repos.
- Fix B (always read from git, never the working tree) eliminates the failure mode by construction. The bug surface goes away.
- Fix C (full skill repo + agent definition + tmp-checkout pipeline) is the complete answer, but it requires writing the ~/.claude/skills/cross-review/ skill, deciding whether the Cato agent stays as the codex-exec frontend or gets re-architected, and managing the tmp-checkout cleanup contract. Larger blast radius; deferred to a follow-up once Fix B has soaked.
Test plan executed
```bash
bash -n bin/bstack-cross-review # syntax OK
bash -n bin/bstack # syntax OK
python3 -m py_compile scripts/cross-review.py # OK
bin/bstack --help | grep cross-review # 2 lines
bin/bstack cross-review --help | head -20 # argparse usage
bin/bstack cross-review 195 --repo broomva/broomva.tech --dry-run # 9 files fetched, 1 lock skipped
bash tests/cross-review.test.sh # 8/8 pass
```
What's next (not in this release)
- Apply bstack cross-review to the 3 PRs that merged WITHOUT P20 cross-review last session (broomva.tech#195, #196, life#1427) and post retro-verdicts as PR comments. Out of scope for this PR (no code change needed; this PR ships the tool).
- File a follow-up for the full ~/.claude/skills/cross-review/ skill (Fix C scope) once Fix B has soaked through ≥3 P20 invocations.
Backreferences
- BRO-1227 — P20 cross-review mechanism gap (closes via Fix B)
- 2026-05-22 session handoff: /Users/broomva/conductor/archived-contexts/broomva/wave-3-dispatch-and-linear-updates/handoffs/2026-05-22-SESSION-HANDOFF.md §"Queued + ready to dispatch"
- CLAUDE.md §"Cross-Review (P20)": the discipline rule this mechanism enforces
v0.19.0 — Closure Contract — generalize 5-tuple from 4 RCS layers to N declared arcs
Builds on v0.18.0 (Phase 8 federation, BRO-47) — the federation registry is the substrate that lets per-workspace arc declarations roll up via bstack status --aggregate. Together, v0.18.0 + v0.19.0 close the substrate-completion arc through the user-defined-arcs layer.
v0.14.0 + v0.16.0 already shipped a 5-tuple (plant, sensor, controller, actuator, termination) for 4 hard-coded RCS layers via assets/templates/rcs-parameters.toml.template + scripts/compute-budget-status.sh. This release lifts the same pattern from those 4 layers to N user-declared domain arcs the workspace actually runs every day (PR greenflow, bookkeeping promotion quality, deploy reliability, etc.).
The closure contract: every arc declares (id, plant_surfaces, sensor, actuator, termination, tau_a, shield_ref). The agent's reasoning is the universal Π (controller) — that's not declared, it's the default binding when actuator.kind == "agent_reasoning". Script / mcp_tool / http actuators bind specific mechanisms while keeping the agent in the supervisory role.
Companion: point-in-time → trend monitoring for composite-ω. compute-budget-status.sh --trend appends one snapshot per call to .control/audit/composite-omega-history.jsonl, then reads the last 7 days and reports slope + verdict (stable_flat | drift_up | drift_down | volatile). Doctor §21 surfaces a hard gap only on drift_down — composite stability shrinking is the signal worth interrupting on.
New files (4)
- NEW schemas/arcs.v1.json: JSON-schema draft-07 for .control/arcs.yaml. Mirrors the style of schemas/policy.v1.json and schemas/workspaces.v1.json. Required arc fields: id (same character class as the workspace registry name), plant_surfaces (free-form URIs), sensor (enum: exit_code | json_path | log_match | metric_threshold), actuator (enum: agent_reasoning | script | mcp_tool | http), termination (enum: predicate | wallclock | score_threshold | exit_zero), tau_a (number, seconds). Optional: shield_ref pointing at a policy.yaml gate.
- NEW assets/templates/arcs.yaml.template: declarative arcs template with 2 worked examples and heavy commentary mirroring the rcs-parameters.toml.template intro cadence. Example 1: code-pr-greenflow (json_path sensor, agent_reasoning actuator, predicate termination, tau_a=1800s). Example 2: bookkeeping-promotion-quality (exit_code sensor, agent_reasoning actuator, score_threshold termination, tau_a=86400s).
- NEW scripts/compute-arc-status.sh: per-arc verdict reader; mirrors the shape of scripts/compute-budget-status.sh exactly. Reads .control/arcs.yaml and falls back to the bundled template. For each arc: runs the sensor (bash -c for exit_code/json_path/metric_threshold; regex against the log file for log_match), reads the most recent termination event from .control/audit/arc-<id>.jsonl, evaluates the termination predicate, and emits a verdict of green | yellow | red | unknown. Outputs JSON (default) or a --human table. Exit codes: 0 all green, 1 ≥1 red, 2 config missing, 3 python3 unavailable. Ships its own inline minimal YAML parser (modeled on _yaml_minimal_parse in scripts/workspace.py): PyYAML is preferred, the inline parser is the fallback when it is absent, and the test suite exercises both code paths.
- NEW tests/arcs-validation.test.sh + NEW tests/omega-drift-trend.test.sh: hermetic bash test suites in the tests/metrics-pipeline.test.sh style. 6 + 6 tests; both GREEN under system Python (PyYAML, no tomllib path) AND homebrew Python (tomllib, no PyYAML path). The tests exercise schema rejection (schema_version: 99), template loading, override precedence, drift_down / drift_up / stable_flat verdicts on synthetic data, and idempotent history-line writes per --trend call.
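A rough Python check of one arc against the constraints listed for schemas/arcs.v1.json. It is not the schema itself or a JSON-schema validator: it hand-codes the documented enums and required fields and reuses the v0.18.0 registry name pattern for id. Whether sensor, actuator, and termination are bare strings or objects with a kind key is an assumption (the closure-contract text mentions actuator.kind), so both shapes are accepted; the plant_surfaces URI in the example is hypothetical.

```python
import re

# Same character class as the workspace registry name.
ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
ENUMS = {
    "sensor": {"exit_code", "json_path", "log_match", "metric_threshold"},
    "actuator": {"agent_reasoning", "script", "mcp_tool", "http"},
    "termination": {"predicate", "wallclock", "score_threshold", "exit_zero"},
}
REQUIRED = ("id", "plant_surfaces", "sensor", "actuator", "termination", "tau_a")


def validate_arc(arc: dict) -> list[str]:
    """Return problems with one arc declaration (an empty list means valid)."""
    errors = [f"missing field: {name}" for name in REQUIRED if name not in arc]
    if "id" in arc and not ID_RE.fullmatch(str(arc["id"])):
        errors.append(f"id {arc['id']!r} has invalid characters")
    for name, allowed in ENUMS.items():
        if name not in arc:
            continue  # already reported as missing
        value = arc[name]
        # Accept either a bare enum string or an object carrying a "kind" key.
        kind = value.get("kind") if isinstance(value, dict) else value
        if kind not in allowed:
            errors.append(f"{name}: {kind!r} not in {sorted(allowed)}")
    tau_a = arc.get("tau_a")
    if "tau_a" in arc and (isinstance(tau_a, bool) or not isinstance(tau_a, (int, float))):
        errors.append("tau_a must be a number of seconds")
    return errors


example = {
    "id": "code-pr-greenflow",
    "plant_surfaces": ["repo://example/pulls"],  # hypothetical URI
    "sensor": {"kind": "json_path"},
    "actuator": {"kind": "agent_reasoning"},
    "termination": {"kind": "predicate"},
    "tau_a": 1800,
}
print(validate_arc(example))  # → []
```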
Changed files (2)
- CHANGED scripts/compute-budget-status.sh: adds a --trend flag. Without --trend, the existing point-in-time behavior is preserved. With --trend, it appends a {ts, omega, per_layer} snapshot to .control/audit/composite-omega-history.jsonl, then reads the last 7 days and computes the least-squares slope, baseline (median of the first day in the window), deviation, volatility (coefficient of variation, CV), and verdict. The verdict heuristic prefers drift detection over volatility when there is a clear directional signal (slope sign matches the relative-deviation sign, with magnitude > 1%); volatile is the residual category. The trend block surfaces in --human as one extra line and in --json as a top-level trend object.
- CHANGED scripts/doctor.sh: adds §20 and §21. §20 reads .control/arcs.yaml (informational when absent), reports the arc count and completeness count, and surfaces the last-termination-event timestamp per arc; hard gap only if schema_version != 1. §21 reads composite-omega-history.jsonl, calls compute-budget-status.sh --trend --json, and reports last/baseline/slope/verdict; hard gap only if verdict == drift_down.
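The --trend verdict heuristic can be sketched in Python. Assumptions, none confirmed by this note: points are (unix_ts, omega) pairs already filtered to the 7-day window; "first day" means points within 24h of the oldest one; deviation is the newest value relative to that baseline; CV is population standard deviation over the mean. Only the 1% and 10% thresholds and the "fewer than 2 points stays stable_flat" behavior come from this release's text.

```python
from statistics import mean, median, pstdev

DAY = 86400.0


def trend_verdict(points, drift_threshold=0.01, cv_threshold=0.10) -> str:
    """Classify composite-omega history: list of (unix_ts, omega), oldest first."""
    if len(points) < 2:
        return "stable_flat"  # a single point cannot show a trend
    ts = [t for t, _ in points]
    ys = [y for _, y in points]
    t_bar, y_bar = mean(ts), mean(ys)
    # Ordinary least-squares slope of omega against time.
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in points) / sxx if sxx else 0.0
    # Baseline: median of the points inside the first 24h of the window.
    baseline = median([y for t, y in points if t - ts[0] < DAY])
    rel_dev = (ys[-1] - baseline) / baseline if baseline else 0.0
    cv = pstdev(ys) / y_bar if y_bar else 0.0
    # Drift wins when slope and deviation agree in sign and |deviation| > threshold.
    if abs(rel_dev) > drift_threshold and slope * rel_dev > 0:
        return "drift_up" if slope > 0 else "drift_down"
    if cv > cv_threshold:
        return "volatile"
    return "stable_flat"


falling = [(d * DAY, 0.0064 - d * 0.0001) for d in range(7)]
print(trend_verdict(falling))  # → drift_down
```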
Test plan executed
```bash
bash -n scripts/compute-arc-status.sh # syntax OK
bash -n scripts/compute-budget-status.sh # syntax OK
bash -n scripts/doctor.sh # syntax OK
python3 -c "import json; json.loads(open('schemas/arcs.v1.json').read())" # schema parses
bash scripts/compute-arc-status.sh --human # reads template, prints table for both arcs
bash scripts/compute-budget-status.sh --trend --human # writes 1 history line, prints trend line
bash tests/arcs-validation.test.sh # 6/6 GREEN under both python envs
bash tests/omega-drift-trend.test.sh # 6/6 GREEN
bash scripts/doctor.sh # run against ~/broomva: §20 + §21 visible; 87/89 (2 pre-existing gaps unrelated)
```
Honest scope caveats
- The minimal inline YAML parser inside compute-arc-status.sh covers exactly the shape schemas/arcs.v1.json declares. Workspaces that hand-write .control/arcs.yaml with PyYAML-only features (anchors, multi-doc, flow-style) need PyYAML installed; otherwise stick to the block-scalar shape shown in the template.
- arc-<id>.jsonl audit-event writers are not shipped in this PR. compute-arc-status.sh reads termination events when present; for now, only wallclock and exit_zero terminations evaluate without a prior recorded event. predicate and score_threshold arcs surface yellow (running) until an event lands. Follow-up: add scripts/arc-event-hook.sh so actuators can record verdicts as they close.
- Verdict thresholds in --trend (1% relative deviation, 10% coefficient of variation) are heuristic and calibrated for the broomva workspace's λ range. Tighten or loosen them via a follow-up policy.yaml block once rule-of-three failure modes accumulate.
- The "ω is shrinking" signal in §21 fires only after ≥ 2 history points span the 7-day window. Workspaces that don't periodically invoke --trend (no scheduled call from /loop or a cron) will see only stable_flat regardless of underlying drift.
Spec doc + cross-references
- Anchored arcs: the prior PR (v0.16.0) shipped the 4-layer hard-coded analogue (assets/templates/rcs-parameters.toml.template); v0.14.0 shipped the L3 enforcement; this PR generalizes both surfaces to N user-declared arcs.
- Why not a new primitive: the Closure Contract is the generalization of the existing (X, U, h, Π, T) substrate that L0–L3 already use. It's a declarative surface lift, not a new reflex. A P21 "Closure Contract" promotion candidate is logged; promotion to a numbered primitive is deferred until rule-of-three concrete failures are recorded (per the L3 stability budget for governance churn). The candidate ledger lives in research/entities/pattern/bstack-engine.md per CLAUDE.md §Bstack Engine.
v0.18.0 — Phase 8 — Multi-workspace federation registry
Closes the substrate-completion-arc Phase 8 backlog item: introduces an opt-in
host-level registry (~/.broomva/global/registry.yaml) that catalogues every
bstack-governed workspace on this machine, plus a bstack status --aggregate
rollup that walks the registry and surfaces cross-workspace composite-ω.
Federation is read-only aggregation — each workspace remains the source
of truth for its own state; the registry is the index, not the database.
This release lands on top of v0.16.0 (multi-layer RCS closure) and v0.17.0
(BROOMVA_ROOT convention). Together they form the substrate that PR2
(v0.19.0, BRO-48 — Closure Contract) generalizes from 4 hard-coded RCS layers
to N user-declared domain arcs.
New scripts + bin (3)
- NEW bin/bstack-workspace: 90-line bash dispatcher delegating to scripts/workspace.py. Subcommands register | list | info | deregister, with --json + --tag + per-subcommand --path/--name filters. Exit codes documented in the help block: 0 ok, 2 invalid args, 3 schema/parse error, 4 target not found, 5 name conflict at a different path.
- NEW scripts/workspace.py: 523-line Python registry manager with an inline minimal-YAML parser fallback (when PyYAML is absent), atomic writes (.tmp + replace), and schema-version checking. Honors the BSTACK_REGISTRY env var (default ~/.broomva/global/registry.yaml) and BSTACK_DIR for VERSION detection. SLO: register/list p50 < 100ms.
- NEW schemas/workspaces.v1.json: JSON-schema draft-07 contract for the registry. schema_version: 1 is the only valid value at v0.18.0; field bstack_version matches ^[0-9]+\.[0-9]+\.[0-9]+(-[A-Za-z0-9.-]+)?$; name matches ^[A-Za-z0-9][A-Za-z0-9._-]*$ (1–64 chars).
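Two of the workspace.py behaviors above, name/version validation and the .tmp + replace write, sketched in Python. The regexes are the ones documented for schemas/workspaces.v1.json; the function names are hypothetical, and the fsync call is an extra durability step this note does not claim workspace.py performs.

```python
import os
import re
import tempfile

# Patterns from the schemas/workspaces.v1.json description
# (anchors dropped because re.fullmatch already anchors both ends).
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+(-[A-Za-z0-9.-]+)?")


def valid_name(name: str) -> bool:
    """Registry name: documented character class, 1-64 characters."""
    return 1 <= len(name) <= 64 and NAME_RE.fullmatch(name) is not None


def valid_bstack_version(version: str) -> bool:
    return VERSION_RE.fullmatch(version) is not None


def atomic_write(path: str, text: str) -> None:
    """Write to <path>.tmp, flush it to disk, then replace <path> in one step."""
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic on POSIX when tmp and path share a filesystem


with tempfile.TemporaryDirectory() as workdir:
    registry = os.path.join(workdir, "registry.yaml")
    atomic_write(registry, "schema_version: 1\n")
    with open(registry, encoding="utf-8") as fh:
        print(repr(fh.read()), os.path.exists(registry + ".tmp"))
```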
New tests (1)
- NEW tests/workspace.test.sh: 203-line hermetic bash suite, 10 cases: fresh register, refresh on same path, name conflict at a different path (exit 5), info --path reports registered: true, deregister by path, deregister miss (exit 4), schema_version=99 → exit 3, tag accumulation on refresh, invalid name (exit 2), --help block renders. Every test uses BSTACK_REGISTRY=$(mktemp), so the host registry is untouched. All 10 GREEN.
Changed (4)
- CHANGED bin/bstack: the dispatcher gains a new workspace) exec "$BIN_DIR/bstack-workspace" "$@" case, plus a Federation: usage section and a status --aggregate cross-reference under Observability.
- CHANGED bin/bstack-status: adds the --aggregate (alias --multi-workspace) flag. Reads the registry via bash bin/bstack-workspace list --json, then for each entry attempts bash <path>/bstack/scripts/compute-budget-status.sh --json (falling back to reading <path>/.control/audit/composite-omega.jsonl) and emits a table: name × bstack_version × composite_ω × last_seen × verdict. JSON form via --json --aggregate. Read-only, no writes anywhere.
- CHANGED scripts/doctor.sh: adds §20 (Workspace federation registry): informational when no registry is present (federation is opt-in); hard gap on schema mismatch (schema_version != 1); soft warning on entries with last_seen_at > 30 days. Reads the BSTACK_REGISTRY env var or the default path.
- CHANGED assets/templates/{SKILL,AGENTS,CLAUDE}.md.template: light surface updates so freshly bootstrapped workspaces document the bstack workspace commands. Federation is not a new primitive (no P21); it composes existing primitives (Snapshot P15 + multi-layer ω from v0.16.0 §19).
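The §20 soft warning on stale entries reduces to a timestamp comparison. In this sketch the entry fields name and last_seen_at come from the text above, but storing last_seen_at as ISO-8601 with an explicit UTC offset is an assumption (datetime.fromisoformat only accepts a trailing Z from Python 3.11 on). The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_entries(entries, now: datetime, max_age_days: int = 30) -> list[str]:
    """Names of registry entries whose last_seen_at is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        entry["name"]
        for entry in entries
        if datetime.fromisoformat(entry["last_seen_at"]) < cutoff
    ]


entries = [
    {"name": "broomva", "last_seen_at": "2026-05-20T09:00:00+00:00"},
    {"name": "old-lab", "last_seen_at": "2026-03-01T00:00:00+00:00"},
]
print(stale_entries(entries, now=datetime(2026, 5, 22, tzinfo=timezone.utc)))  # → ['old-lab']
```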
Doctor section table after this release
| § | Title | Source | Hard gap when |
|---|---|---|---|
| §14 | RCS stability budget | compute-lambda.sh | any λᵢ ≤ 0 |
| §15 | L3 stability gate-flow wiring | install-l3-stability.sh | (informational only) |
| §16 | L0 plant audit | l0-tools.jsonl | >10000 events runaway |
| §17 | L1 autonomic reflex compliance | l1-reflexes.jsonl | compliance < 30% |
| §18 | L2 EGRI promotion throttle | l2-promotions.jsonl | over τ_a₂ budget |
| §19 | Multi-layer composite health | compute-budget-status.sh | any layer unstable |
| §20 | Workspace federation registry | bin/bstack-workspace list | schema_version != 1 |
Test plan executed
- bash -n syntax check across all new + modified scripts → clean.
- bash bin/bstack-workspace --help → renders subcommand block.
- BSTACK_REGISTRY=/tmp/test.yaml bash bin/bstack-workspace register --path /tmp --name test --json → exit 0, action: registered.
- BSTACK_REGISTRY=/tmp/test.yaml bash bin/bstack-workspace list --json → count: 1.
- BSTACK_REGISTRY=/tmp/test.yaml bash bin/bstack-workspace deregister --name test --json → exit 0, count: 0.
- bash tests/workspace.test.sh → 10/10 GREEN.
- bash scripts/doctor.sh against the worktree → 87/90 passed (baseline; §20 fires as informational with no registry, same total).
- bash scripts/compute-budget-status.sh --human against ~/broomva → composite still stable.
Honest scope caveats
- Federation is local-filesystem only. No network/IPC transport. A workspace on a remote host has to register itself locally; cross-host rollup is deferred.
- The registry is read-only aggregation. bstack status --aggregate does not mutate any registered workspace's state. CRDT-replicated cross-workspace promotion (the swarm-autoresearch-loop primitive) is a future Phase 8.5 spec, not in this release.
- last_seen_at is updated on register (which doubles as "refresh"); it is not automatically updated by --aggregate. A future hook may bump last_seen_at on every successful compute-budget-status.sh invocation per workspace.
Spec doc + cross-references
- Linear ticket: BRO-47
- Prior release: v0.17.0 (BROOMVA_ROOT convention, #47, BRO-1223 follow-up)
- Next release: v0.19.0 (Closure Contract: arcs.yaml + composite-ω drift trend, BRO-48) builds on this substrate.
- Concept entity: research/entities/concept/closure-contract.md (in the broomva workspace) captures the 5-tuple generalization that Phase 8 helps enable.
v0.16.0 — Multi-layer RCS closure — extending the control loop across L0/L1/L2
Closes the gap identified in PR #45 review: v0.14.0 wired enforcement only at L3 (governance), while L0/L1/L2 had no audit, no per-layer doctor sections, no programmatic feedback into the control loop. The receipts produced by broomva/dogfood were human-read artifacts but did not feed back into the multi-layer stability budget.
This release adds per-layer sensors, audit logs, doctor sections, and a composite multi-layer health report. The dogfood receipt and every tool call become the empirical sensors that calibrate the RCS hierarchy.
New scripts (5)
- NEW scripts/l0-tool-audit-hook.sh: Claude Code PostToolUse hook. Logs every tool call (tool name, latency_ms, is_error, file_path when applicable) as one JSONL line to .control/audit/l0-tools.jsonl. Always exits 0; never blocks.
- NEW scripts/l1-reflex-audit-hook.sh: Claude Code Stop hook. Scans the session transcript for evidence of 21 /autonomous reflexes firing (mechanism cube, lens intake, snapshot, dep-chain, worktree decision, ticket, dogfood plan, validation, first write, empirical, PR opened, watcher, healing, cross-review, deploy verify, receipt, PR comments, auto-merge, cleanup, bridge, bookkeeping). Writes a per-session compliance bitmask + anti-rationalization-line evaluation to .control/audit/l1-reflexes.jsonl.
- NEW scripts/l2-promotion-audit-hook.sh: L2 (EGRI / Crystallize P16) candidate-promotion sensor. Called by the bookkeeping.py promote step with promotion metadata. Counts promotions in the last τ_a₂ window (1h default) and enforces the budget (5 promotions/window default). Exits 2 with a warning when over budget; the caller SHOULD defer remaining promotions.
- NEW scripts/compute-budget-status.sh: multi-layer health reader. Reads all four audit logs (.control/audit/l[0-3]-*.jsonl) + parameters.toml, computes per-layer observed metrics in each layer's τ_a window, and emits a composite verdict (stable / stable_warn / unstable) per layer + overall.
- NEW scripts/install-rcs-stability.sh: unified multi-layer installer. Delegates L3 setup to install-l3-stability.sh (preserving v0.14.0 behavior), then merges PostToolUse (L0) + Stop (L1) hook entries into .claude/settings.json via _bstack_primitive markers. Creates the .control/audit/ directory. Idempotent.
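The L2 throttle's sliding-window count, sketched under assumptions: promotion history is a list of unix timestamps, the new promotion counts toward the budget (which is what makes the "5 + 1 in 1h" test-plan case exit 2), and the function name is hypothetical. The 1h window and budget of 5 are the documented defaults.

```python
def record_promotion(history: list, now: float, window_s: float = 3600.0, budget: int = 5) -> int:
    """Record one promotion; return 2 if promotions in the window exceed budget, else 0."""
    history.append(now)
    in_window = [t for t in history if now - t < window_s]
    return 2 if len(in_window) > budget else 0


history = [0.0, 600.0, 1200.0, 1800.0, 2400.0]  # five promotions already this hour
print(record_promotion(history, now=3000.0))  # → 2
```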
New template (1)
- NEW assets/templates/settings.json.multi-layer-hooks.snippet: PostToolUse + Stop hook entries. Composes additively with v0.14.0's settings.json.l3-stability-hook.snippet (PreToolUse); each hook entry is uniquely identified by its _bstack_primitive marker (L0-audit, L1-audit, L3-G0), so re-installation is structurally idempotent.
Doctor extensions (4 new sections)
- CHANGED scripts/doctor.sh adds:
  - §16 L0 plant audit: tool-call count + mean latency + error count over the last 10 min (informational; hard gap only on a >10000-event runaway).
  - §17 L1 autonomic reflex compliance: per-session mean compliance rate over the last 24h + dogfood-yes count (hard gap if < 30%; soft warn at 30-60%).
  - §18 L2 EGRI promotion throttle: promotions in the last τ_a₂ window vs budget; hard gap when over budget.
  - §19 Multi-layer composite health: calls compute-budget-status and surfaces per-layer verdicts (in the form L0=stable L1=stable L2=stable L3=stable). Hard gap only if any layer is "unstable".
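One way §17's thresholds could be computed from the per-session bitmasks written by the L1 Stop hook. The bit layout (one bit per reflex, 21 reflexes), the "info" result for an empty window, and treating exactly 60% as passing are assumptions; only the < 30% hard gap and the 30-60% soft warning come from the text.

```python
from statistics import mean

REFLEX_COUNT = 21  # reflexes scanned by the L1 Stop hook


def session_rate(bitmask: int, n_reflexes: int = REFLEX_COUNT) -> float:
    """Fraction of reflexes observed in one session, from its compliance bitmask."""
    return bin(bitmask).count("1") / n_reflexes


def classify_compliance(rates: list) -> str:
    """Doctor §17-style classification of mean per-session compliance."""
    if not rates:
        return "info"  # no sessions in the 24h window
    rate = mean(rates)
    if rate < 0.30:
        return "hard_gap"
    if rate < 0.60:
        return "soft_warn"
    return "ok"


print(classify_compliance([session_rate((1 << 21) - 1), session_rate(0b1111111)]))
```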
Onboard + repair (wired to install-rcs-stability)
- CHANGED scripts/onboard.sh calls install-rcs-stability.sh after bootstrap (replacing the v0.14.0 call to install-l3-stability.sh). Falls back to the L3-only installer if the multi-layer one is absent.
- CHANGED scripts/repair.sh runs install-rcs-stability.sh when doctor reports any L0/L1 audit-log gap, missing G0/G1/G2, or unstable λ. Falls back to the L3-only installer.
What changes operationally
Before v0.16.0:
- L0/L1/L2 stability was uncontrolled at the audit level.
- The dogfood receipt was a PR artifact, not a control-loop sensor.
After v0.16.0:
- Every tool call logs to the L0 audit. Every session ends with an L1 reflex-compliance record. Every bookkeeping promotion records to L2.
- bstack doctor §19 and bash scripts/compute-budget-status.sh --human show the multi-layer health on demand.
- The dogfood receipt's anti-rationalization line is parsed by the L1 Stop hook and recorded as a binary signal, so receipts auto-feed back into the control loop.
The 4-gate flow now per-layer
| Layer | Sensor | Audit log | Doctor section | Throttle |
|---|---|---|---|---|
| L0 plant | PostToolUse hook | l0-tools.jsonl | §16 | informational; runaway detection |
| L1 autonomic | Stop hook (transcript scanner) | l1-reflexes.jsonl | §17 | hard gap if compliance < 30% |
| L2 EGRI | bookkeeping promote step | l2-promotions.jsonl | §18 | hard gap if over τ_a₂ budget |
| L3 governance | PreToolUse + git pre-commit + GH Actions (v0.14.0) | l3-edits.jsonl | §14 + §15 | hard gap if λ₃ ≤ 0 |
Test plan executed
- bash -n scripts/*.sh: syntax clean on all new + modified scripts
- compute-budget-status.sh --human against ~/broomva → reports L0=stable L1=stable L2=stable L3=stable; composite ω = 0.006398
- L0 hook tested with synthetic JSON input → entry appended correctly to l0-tools.jsonl
- L2 hook tested with --slug --type --score --source → entry appended; over-budget case (5 + 1 in 1h) → exit 2 with warning
- install-rcs-stability.sh --dry-run on a fresh workspace → reports all install steps without writing
- install-rcs-stability.sh (real run) → L3 (4 files via install-l3-stability) + L0 + L1 hooks merged into settings.json + .control/audit/ created
- Re-run installer → all hooks skipped via _bstack_primitive markers (idempotent)
- doctor.sh against ~/broomva → §16-§18 informational (audit logs not yet wired in broomva); §19 reports L0=stable L1=stable L2=stable L3=stable composite. 89/91 passed, 2 pre-existing gaps unrelated.
Honest scope caveats
- L0 audit-log retention is unbounded by default; a rotation policy (policy.yaml [audit_retention]) is deferred to a follow-up.
- L2 wiring requires bookkeeping.py promote to call l2-promotion-audit-hook.sh. The hook itself ships in this PR; the bookkeeping integration is a small follow-up in the broomva/bookkeeping repo (a one-line subprocess.run call after the promotion write).
- The L1 transcript scanner uses heuristic substring matches against the session log. False positives are possible (e.g., "interceptor" in a non-validation context counts as r10_empirical). The heuristic is intentionally permissive at v0.16.0; tightening is a follow-up after rule-of-three failure cases accumulate in the audit log.
Spec doc + cross-references
- Spec: ~/broomva/conductor/workspaces/broomva/doha/docs/reports/2026-05-22-multi-layer-closure-spec.html
- Prior PR (L3 closure): #45 (v0.14.0, merged 2026-05-22)
- Dogfood skill: github.com/broomva/dogfood v0.1.0
- /autonomous flow record: ~/broomva/conductor/workspaces/broomva/doha/docs/reports/2026-05-22-autonomous-flow-achieved.html
v0.14.0 — L3 stability closure — compute + enforce λ in every bstack workspace
Closes the gap between the RCS paper (research/rcs/papers/p0-foundations/) and operational reality. Previously, λ₃ ≈ 0.006 was cited in CLAUDE.md and AGENTS.md as static text — no bstack script or hook computed or enforced it. This release wires the math + a four-gate flow (Claude Code hook → git pre-commit → CI → doctor) so every bstack-using workspace inherits computational stability checking on first onboard.
New scripts
- NEW scripts/compute-lambda.sh: bash CLI that recomputes per-level λᵢ from a workspace's parameters.toml (looks at .control/rcs-parameters.toml, research/rcs/data/parameters.toml, or the bundled template). Implements the formula λᵢ = γᵢ − L_θᵢ·ρᵢ − L_dᵢ·ηᵢ − βᵢ·τ̄ᵢ − ln(νᵢ)/τ_aᵢ. Emits JSON or human-readable output; exits 0 if all levels are stable, 1 if any λᵢ ≤ 0, 3 on drift > 1e-4 (--strict).
- NEW scripts/l3-rate-gate.sh: governance commit rate limiter. Reads L3-class path patterns from .control/rcs-parameters.toml [gates.l3_paths] (default: CLAUDE.md, AGENTS.md, .control/policy.yaml, .control/rcs-parameters.toml, METALAYER.md). Counts L3-class commits in the last τ_a₃ window (default 86400s = 1 day). Exits 0 within budget, 1 when exceeded. Supports --staged (includes staged-but-uncommitted changes, for pre-commit use) and --warn-only.
- NEW scripts/install-l3-stability.sh: one-shot installer that deploys the L3 gate flow into a workspace: .control/rcs-parameters.toml + .githooks/pre-commit (G1) + .github/workflows/l3-stability.yml (G2) + a .claude/settings.json PreToolUse hook entry (G0). Idempotent; existing files are preserved unless --force. The settings.json merge is structurally idempotent (skipped if _bstack_primitive: "L3-G0" is already present).
- NEW scripts/l3-stability-pretool-hook.sh: Claude Code PreToolUse hook backend. Receives tool-call JSON on stdin; emits a warning + audit-log entry when an Edit/Write/MultiEdit targets an L3 path. Defaults to approve (informational) and never blocks the agent; emits a reason string that Claude Code surfaces into context.
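The λᵢ formula from compute-lambda.sh, transcribed into Python together with the exit-code rules listed above. The parameter values in the usage lines are synthetic, not broomva's calibrated numbers, and checking λ ≤ 0 before drift is an assumed precedence between exit codes 1 and 3; the function and class names are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class LevelParams:
    gamma: float    # γᵢ
    L_theta: float  # L_θᵢ
    rho: float      # ρᵢ
    L_d: float      # L_dᵢ
    eta: float      # ηᵢ
    beta: float     # βᵢ
    tau_bar: float  # τ̄ᵢ
    nu: float       # νᵢ
    tau_a: float    # τ_aᵢ, seconds


def compute_lambda(p: LevelParams) -> float:
    """λᵢ = γᵢ − L_θᵢ·ρᵢ − L_dᵢ·ηᵢ − βᵢ·τ̄ᵢ − ln(νᵢ)/τ_aᵢ"""
    return (
        p.gamma
        - p.L_theta * p.rho
        - p.L_d * p.eta
        - p.beta * p.tau_bar
        - math.log(p.nu) / p.tau_a
    )


def exit_code(lambdas, cached=None, strict=False, tol=1e-4) -> int:
    """0 all stable, 1 if any λ ≤ 0, 3 if --strict and any λ drifts > tol from cache."""
    if any(lam <= 0 for lam in lambdas):
        return 1
    if strict and cached is not None and any(abs(a - b) > tol for a, b in zip(lambdas, cached)):
        return 3
    return 0


# Synthetic level: only γ and one cost term are nonzero (ν = 1 makes the ln term vanish).
base = LevelParams(gamma=0.01, L_theta=0.1, rho=0.02, L_d=0.0, eta=0.0,
                   beta=0.0, tau_bar=0.0, nu=1.0, tau_a=86400.0)
perturbed = replace(base, gamma=0.001)
print(compute_lambda(base), compute_lambda(perturbed), exit_code([compute_lambda(perturbed)]))
```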
New templates
- NEW assets/templates/rcs-parameters.toml.template: default workspace parameters (L0–L3 calibrated from research/rcs/data/parameters.toml). Includes [derived.lambda] cached values + [gates.l3_paths] patterns. Self-documenting, with the formula and customization notes for non-Life runtimes.
- NEW assets/templates/githook-pre-commit-l3-rate.sh.template: Gate G1 git pre-commit hook. Calls l3-rate-gate.sh --staged; blocks the commit if the rate is exceeded (override with git commit --no-verify). Chains to an existing .githooks/pre-commit.local if the user had a hook before bstack onboarded.
- NEW assets/templates/gh-workflow-l3-stability.yml.template: Gate G2 GitHub Actions workflow. Triggers on PRs touching L3 paths; runs compute-lambda.sh + l3-rate-gate.sh; comments on the PR with the per-level λ + composite ω + rate verdict; the stability-check status check fails if any λᵢ ≤ 0 (can be made required via branch protection).
- NEW assets/templates/settings.json.l3-stability-hook.snippet: Gate G0 Claude Code PreToolUse hook entry. Merged into .claude/settings.json by install-l3-stability.sh. Fires on Edit/Write/MultiEdit; the backend is scripts/l3-stability-pretool-hook.sh.
Doctor extensions
- CHANGED scripts/doctor.sh adds two sections:
  - §14 RCS stability budget: calls compute-lambda.sh; reports composite ω; HARD gap if any λᵢ ≤ 0; SOFT gap on drift > 1e-4 under --strict.
  - §15 L3 stability gate-flow wiring: verifies that G0 (settings.json hook), G1 (.githooks/pre-commit), G2 (.github/workflows/l3-stability.yml), and rcs-parameters.toml are present in the workspace. Each missing piece prints an [info] line with the install command. Informational only (not a hard gap).
Onboarding + repair
- CHANGED scripts/onboard.sh calls install-l3-stability.sh after bootstrap. New workspaces get the full L3 gate flow on first install.
- CHANGED scripts/repair.sh runs install-l3-stability.sh when doctor reports any G0/G1/G2 piece missing, parameters.toml absent, or λᵢ ≤ 0. Idempotent.
What this changes operationally
Before: λ₃ ≈ 0.006 was a citation. The agent could read it in CLAUDE.md and reason about it, but no machine path computed or enforced it.
After: every bstack-using workspace can run bash scripts/compute-lambda.sh --human and see the live λ values for its own parameters. bstack doctor recomputes on every run. bstack onboard installs the four gates. CI fails on λᵢ ≤ 0. Git pre-commit blocks excess governance churn. Claude Code agents see warnings when editing L3 files.
The 4-gate flow:
| Gate | Trigger | Action | Blocking? |
|---|---|---|---|
| G0 — Claude Code PreToolUse | Edit/Write to L3 path | Warning to agent + audit log entry | No |
| G1 — git pre-commit | Staged L3 path + over-rate | Block commit (bypassable with --no-verify) | Yes (bypassable) |
| G2 — GitHub Actions on PR | L3 path changed | Comment with λ + rate verdict; fail status if λ ≤ 0 | Yes (if branch protection) |
| G3 — bstack doctor §14 + §15 | Every SessionStart + on demand | Recompute λ + verify wiring | Informational |
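A sketch of the path classification shared by G0 and G1, plus the G1 rate count, under stated assumptions: a path matches an L3 pattern when its trailing components match it (so a nested docs/CLAUDE.md would also match), commits arrive as (unix_ts, changed_paths) pairs, and the budget must be passed explicitly because its default is not given here. The approve/reason dict mirrors this note's wording; the exact JSON a Claude Code PreToolUse hook must print should be checked against the Claude Code hooks documentation. All function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Default [gates.l3_paths] patterns listed for l3-rate-gate.sh.
DEFAULT_L3_PATHS = [
    "CLAUDE.md",
    "AGENTS.md",
    ".control/policy.yaml",
    ".control/rcs-parameters.toml",
    "METALAYER.md",
]


def is_l3_path(path: str, patterns=DEFAULT_L3_PATHS) -> bool:
    """True if the trailing components of path match any L3 pattern."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        want = PurePosixPath(pattern).parts
        tail = parts[-len(want):]
        if len(tail) == len(want) and all(fnmatchcase(p, w) for p, w in zip(tail, want)):
            return True
    return False


def pretool_decision(tool_input: dict) -> dict:
    """G0: never block; attach a warning reason only when an L3 file is targeted."""
    path = tool_input.get("file_path", "")
    if is_l3_path(path):
        return {"decision": "approve", "reason": f"L3 governance file: {path}"}
    return {"decision": "approve"}


def rate_gate(commits, now: float, budget: int, window_s: float = 86400.0) -> int:
    """G1: 0 within budget, 1 when L3-class commits in the window exceed it."""
    l3_commits = sum(
        1
        for ts, files in commits
        if now - ts < window_s and any(is_l3_path(f) for f in files)
    )
    return 1 if l3_commits > budget else 0


print(pretool_decision({"file_path": "/repo/src/foo.ts"}))  # → {'decision': 'approve'}
```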
Test plan executed
- compute-lambda.sh --human against broomva's parameters.toml → all 4 λᵢ match cached (drift ~ 0)
- compute-lambda.sh with γ₃ perturbed from 0.01 → 0.001 → λ₃ = −0.0026; exit 1
- l3-rate-gate.sh 24h window against ~/broomva → 0 commits, within budget, exit 0
- l3-rate-gate.sh --window=2592000 (30-day) → 142 L3 commits, exceeded, exit 1
- install-l3-stability.sh against a fake workspace → 4 files installed; re-run skipped 3 + idempotent settings.json merge
- l3-stability-pretool-hook.sh with {"file_path": ".../CLAUDE.md"} → emits reason warning; with {"file_path": ".../foo.ts"} → silent approve
- doctor.sh against ~/broomva → §14 reports composite ω = 0.006398, all stable; §15 reports G0/G1/G2 missing (informational, since broomva hasn't run install-l3-stability.sh yet)
Why this isn't a new primitive
The four gates are mechanisms, not new primitives. The primitive is the existing RCS L3 stability constraint (cited in the CLAUDE.md RCS Hierarchy section and the Self-Evolution Protocol). This release implements that constraint in code, just as v0.13.0 implemented P11 Empirical operationalization in code. The bstack primitive count is unchanged.
v0.13.0 — P11 Empirical operationalization — Dogfood Plan reflex + per-stack cookbook + doctor §13
P11 Empirical operationalization — Dogfood Plan reflex + per-stack cookbook + doctor §13
Closes P11's operationalization gap: the discipline of "validate by interacting" was well-defined, but agents lacked a concrete how-to keyed to the tech stack the workspace was instantiated from. This release adds:
- NEW `references/dogfood-patterns.md` — per-stack cookbook with a surfaces matrix (Tauri+sidecar / Next.js / Expo RN / Rust CLI / REST API / MCP server). Each pattern names the canonical arc, the skill toolkit (Interceptor mandatory for visual deploy verification; gstack, cliclick, screencapture, curl+jq compose per stack), the gotchas observed in production, and the receipt template. Anchored by the Houston `dogfood-pattern.html` worked example.
- CHANGED `references/primitives.md` §P11 — adds reflex rule 7 (Dogfood Plan keyed to detected stack): before substantive work, the agent produces a plan (entry surface · driver · evidence · smoke · end-to-end · receipt anchor) in the response and PR body, citing the per-stack pattern from the cookbook. A companion-reference callout points to the cookbook.
- CHANGED `assets/templates/AGENTS.md.template` §P11 — propagates reflex rule 7 to every new bstack'd project AND stubs a `## Dogfood Plan (Stack: TBD)` block with the row template, so the first substantive feature work has a concrete anchor to fill.
- CHANGED `SKILL.md` — surfaces `references/dogfood-patterns.md` in the on-demand reference index alongside the canonical primitive contract.
- CHANGED `scripts/doctor.sh` — adds §13 P11 Empirical dogfood-readiness. Auto-detects the tech stack from repo signals (`Cargo.toml` + `src-tauri/` → tauri-sidecar; `next.config.*` → nextjs; `app.json` + expo → expo-rn; `Cargo.toml` solo → rust-cli; `openapi.*` or REST-framework deps → rest-api; `mcp.{json,yaml}` → mcp-server). Verifies that a Dogfood Plan anchor exists at one of three accepted locations (`AGENTS.md` `## Dogfood Plan`, `docs/dogfood-plan.md`, or PR body). Informational — never blocks (rule-of-three not yet hit; promotion to a blocking gate requires ≥3 documented incidents per Crystallize P16).
- CHANGED `scripts/onboard.sh` — after bootstrap, auto-detects the stack and substitutes the `## Dogfood Plan (Stack: TBD)` placeholder with the detected stack name. Surfaces the cookbook reference in the next-step receipt. Persists the detected stack in the initialization marker.
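The §13 stack detection and anchor check reduce to a small ordered rule table. The Python sketch below encodes the signals listed for `scripts/doctor.sh`. Several details are assumptions, not the script's actual logic:
- the rule ordering after `nextjs`;
- the `"TBD"` fallback;
- the omission of REST-framework dependency sniffing;
- the names `detect_stack` and `has_dogfood_plan`.

```python
from fnmatch import fnmatch
from typing import Iterable


def detect_stack(paths: Iterable[str], app_json_text: str = "") -> str:
    """Map repo-relative paths to a stack label using the §13 signals; first match wins."""
    names = set(paths)

    def has(pattern: str) -> bool:
        return any(fnmatch(p, pattern) for p in names)

    if "Cargo.toml" in names and has("src-tauri/*"):
        return "tauri-sidecar"
    if has("next.config.*"):
        return "nextjs"
    if "app.json" in names and "expo" in app_json_text:
        return "expo-rn"
    if "Cargo.toml" in names:  # "Cargo.toml solo": no src-tauri/ alongside it
        return "rust-cli"
    if has("openapi.*"):  # REST-framework dependency sniffing omitted in this sketch
        return "rest-api"
    if "mcp.json" in names or "mcp.yaml" in names:
        return "mcp-server"
    return "TBD"  # same label as the "## Dogfood Plan (Stack: TBD)" placeholder


def has_dogfood_plan(agents_md: str, docs_plan_exists: bool, pr_body: str) -> bool:
    """True if any of the three accepted anchor locations carries a plan."""
    return ("## Dogfood Plan" in agents_md) or docs_plan_exists or ("Dogfood Plan" in pr_body)
```

The `tauri-sidecar` rule is checked before the plain `Cargo.toml` rule because a Tauri repo would otherwise also match `rust-cli`.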
L3 stability budget — why this isn't P21
P11 already covers "validate by interacting" with full reflex rules. The gap was operationalization (the how), not coverage. Adding a 21st primitive for the cookbook would consume L3 stability budget (λ₃ ≈ 0.006) for no policy delta. Instead: a sub-rule + cookbook + doctor check at the L2 operationalization layer.
Promotion gating for §13
The §13 check ships as informational (warn-only). Promotion to a `policy.yaml` blocking gate requires:
- (a) ≥3 documented incidents where a missing Dogfood Plan caused a user-visible regression;
- (b) an unambiguous blocking criterion ("PR has no Dogfood Plan in body");
- (c) a named failure mode (P11 ritual without substance);
- (d) available L3 stability budget.

Logged in the `research/entities/pattern/bstack-engine.md` candidate ledger.
Companion artifact
A human-readable record of the /autonomous flow integration lives at `~/broomva/docs/reports/2026-05-22-autonomous-flow-achieved.html` (in the workspace repo, separate PR). Per P18 Audience: HTML for human reading; this CHANGELOG, the cookbook, and the primitives reference stay markdown for agent loading.
v0.10.0 — Skill-evolution benchmark substrate (BRO-1205)
Skill-evolution benchmark substrate (BRO-1205)
Closes the Empirical (P11) substrate gap. Before 0.10.0, bstack had L3 stability margins (λ₃ ≈ 0.006) and a 20-primitive composition graph but no empirical performance number — every P-primitive promotion was faith-based. 0.10.0 ships the harness that makes those claims falsifiable.
Origin: HKUDS/OpenSpace research dive (see `research/entities/project/openspace.md` + `research/notes/2026-05-20-openspace-evolver-synthesis.md` in the consuming workspace). OpenSpace's `gdpval_bench/` reported 4.2× higher earned income than the ClawWork baseline plus a 46% Phase 2 token reduction on GDPVal. This release is the bstack-native port of that substrate, refactored to drop OpenSpace coupling and to apply P20 Cross-Review discipline (the judge model MUST differ from the agent model).
- NEW `scripts/bench/` Python package (stdlib only — zero third-party deps):
  - `orchestrator.py` — two-phase loop (Phase 1 cold → snapshot skills → Phase 2 warm → compare). Subcommands: `run | compare | tasks list | status`. Resume support via `--resume <run-id>`. Budget cap via `--budget-usd N` (exit 4 when exceeded). All-tasks-failed → exit 6.
  - `task_loader.py` — `Task` dataclass + JSONL loader. `BSTACK_BENCH_TASKS_DIR` env override for tests.
  - `agent_runner.py` — `DryRunRunner` (canned, deterministic, $0 cost) + `StubLiveRunner` (clear `NotImplementedError` pointing to spec). Pluggable contract for future `claude-code` / `codex` / `vanilla-anthropic` runners.
  - `evaluator.py` — `RubricMatchEvaluator` (deterministic rubric checks: `has_section` / `sentence_count_at_least` / `bullet_count_at_least` / `contains_any`) + `StubLLMJudgeEvaluator` for the future LLM judge. 0.6 quality cliff (matches OpenSpace + ClawWork policy: `quality < 0.6` → payment = 0).
  - `tasks/bstack-smoke.jsonl` — 3 hand-written bstack-themed tasks (Linear ticket triage, PR diff summary, primitive-symptom matching) with simple rubrics.
- NEW `bin/bstack-bench` — bash dispatcher. Mirrors `bstack-crystallize`'s shape. Robust Python interpreter discovery (PATH lookup + well-known absolute install paths for restricted-PATH environments).
- NEW `tests/bench-mvp.test.sh` — 14-assertion smoke test verifying: dispatcher `--help`, task set discovery, exit code shape (2/3/4/5/6), JSONL result schema, comparison + `REPORT.md` generation, Phase 2 token ratio < 1.0 + Δquality ≥ 0 (canned dry-run deltas), `compare` without args picks latest, `status` lists runs, budget cap behavior, live-stub-runner clear migration message, skill-snapshot tarball creation.
- CHANGED `bin/bstack` — dispatcher: `bench` subcommand wired alongside `crystallize`; usage text updated.
- CHANGED `SKILL.md` — Quick start section lists the `/bstack bench` triplet.
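A minimal sketch of how the four named rubric checks and the 0.6 quality cliff could combine. It makes two assumptions:
- a rubric is a list of `(check, argument)` pairs;
- quality is the fraction of checks passed.

These notes describe neither the JSONL rubric schema nor `RubricMatchEvaluator`'s scoring formula, so the matching rules below are illustrative.

```python
import re
from typing import Any, List, Tuple

QUALITY_CLIFF = 0.6  # quality < 0.6 -> payment = 0 (OpenSpace + ClawWork policy)


def check(kind: str, arg: Any, text: str) -> bool:
    """Evaluate one rubric check; the matching rules are illustrative assumptions."""
    lines = text.splitlines()
    if kind == "has_section":
        # A markdown heading whose title equals arg, case-insensitively.
        return any(line.startswith("#") and line.lstrip("#").strip().lower() == arg.lower()
                   for line in lines)
    if kind == "sentence_count_at_least":
        return len([s for s in re.split(r"[.!?]+\s*", text) if s.strip()]) >= arg
    if kind == "bullet_count_at_least":
        return sum(1 for line in lines if re.match(r"\s*[-*]\s+", line)) >= arg
    if kind == "contains_any":
        lowered = text.lower()
        return any(term.lower() in lowered for term in arg)
    raise ValueError(f"unknown rubric check: {kind}")


def score(text: str, rubric: List[Tuple[str, Any]], task_value_usd: float) -> Tuple[float, float]:
    """Return (quality, payment), where quality is the fraction of checks passed."""
    quality = sum(check(k, a, text) for k, a in rubric) / len(rubric) if rubric else 0.0
    payment = task_value_usd if quality >= QUALITY_CLIFF else 0.0  # full value or nothing
    return quality, payment
```

With five checks, passing three gives quality 0.6 and full payment, while passing two of four (0.5) pays $0. That is the cliff behavior described above.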
Design choices
- Stdlib only. No `anthropic`, no `litellm`, no third-party deps. Matches `crystallize.py`'s discipline. CI runners ship Python 3.10+; the macOS dev fallback probes `/opt/homebrew/bin/python3.X` directly.
- Dry-run is the default. v0.10.0 ships rubric matching + canned responses; live mode is a stub with a clear migration message ("install anthropic SDK + set ANTHROPIC_API_KEY"). This is the responsible /autonomous path — the substrate is exercisable for free; live mode opt-in flips on when SDK+key are wired in a future PR.
- Two-phase protocol from OpenSpace, not the SQLite content-snapshot+diff lineage. Git already gives us lineage on entity pages + skills; we don't need OpenSpace's content-snapshot SQLite schema.
- 0.6 quality cliff preserved (OpenSpace + ClawWork compatibility). Payment is $0 below the cliff and the full `task_value_usd` above it.
- Skill snapshot is synthetic in dry-run. `_simulate_phase1_skill_dir()` mints a tiny fake `phase1-skills/` between phases, so the snapshot tarball path is exercised without touching `~/.claude/skills/`.
- P20 Cross-Review forward compatibility. The evaluator docstring documents the upcoming constraint: the judge model MUST differ from the agent model. Enforcement lands when the LLM judge stub is replaced.
- State location follows bstack convention: `~/.config/bstack/bench/runs/<run-id>/` (mirrors `~/.config/broomva/p7/`, `~/.config/broomva/p8-janitor/`). Override via the `BSTACK_BENCH_HOME` env var (used by tests).
What this enables
- Future PRs can wire the FIX/DERIVED/RETIRE sub-modes of Crystallize (P16) — extending P16 from CAPTURED-only to all four sub-modes (the OpenSpace decomposition). Per-skill telemetry counters (`total_selections` / `total_applied` / `total_completions` / `total_fallbacks`) and metric-driven evolution triggers are the next layer; this PR is the measurement substrate they build on.
- The `live` runner stub is the integration point for the Anthropic SDK / `claude --print` subprocess path. Once wired, the same harness runs real-LLM benchmarks against any bstack-instrumented agent.
- The substrate composes with P9 (`p9 watch` long-running benches), P12 (`persist iterate` multi-hour campaigns), and P19 (Orchestrate cube cell selection for bench shape).
Linked artifacts (in the consuming workspace, not in this repo)
- Spec: `bstack/specs/bench-skill-evolution.md`
- Project entity: `research/entities/project/openspace.md` (9/9 Nous)
- Concept entity: `research/entities/concept/skill-self-evolution.md` (7/9 candidate)
- Synthesis: `research/notes/2026-05-20-openspace-evolver-synthesis.md`
- Linear ticket: BRO-1205
Cross-Review (P20) round-1 fixes (applied before merge)
A fresh-context subagent working under a devil's-advocate brief scored the first push at 5/10, below the ≥7/10 P20 threshold. It surfaced four correctness defects plus dead code. Round 1 closes all four with adversarial tests: one per defect, each written to demonstrate that the bug existed rather than merely to confirm that the fix works.
- Defect #1 — evaluator stub leaked raw traceback. `_run_task` caught `NotImplementedError` from `runner.run` but not from `evaluator.evaluate`. `--evaluator llm-judge` now exits 6 with a clean stderr migration message, mirroring the runner path. Test #15.
- Defect #2 — budget cap broken on resume. `spent` was initialized at 0.0 in `cmd_run`, ignoring prior-session costs in the existing JSONL, so `--resume --budget-usd 0.01` could spend a fresh $0.01 each session. Fixed: on `--resume`, sum `cost_usd` from every existing phase-results row before the phase loop. Refuses to start if prior cost already exceeds budget. Test #16.
- Defect #3 — aggregate double-counted resumed tasks. `_read_existing_results` returned every JSONL row. A task that failed and then re-ran successfully produced two rows under the same `task_id`, inflating `task_count` and `total_tokens`. Fixed: last-write-wins dedup by `task_id` in `_read_existing_results`. Resume-completion contract preserved (the success row, if any, is last). Test #17.
- Defect #4 — compare emitted phantom regression on phase-1-only runs. `_emit_compare` accepted empty phase2 lists and reported "Phase 2 = 0 tokens, Δquality = -0.8" — noise masquerading as data. Fixed: refuse to compare with exit 7 + clear message when either phase is empty. Test #18.
- Cleanups. Removed the dead `--workers` argparse flag, the unused `asdict` + `EvaluationResult` imports, the dead `tasks.set_defaults(func=cmd_tasks)` (subparser `required=True` makes it unreachable), and the `# pragma: no cover (defensive)` slop tell.
Test count: 14 → 18 assertions. New exit code 7 documented in dispatcher + orchestrator headers.
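The fixes for defects #2 through #4 can be illustrated together in Python. The sketch keeps the documented semantics:
- the budget counts every row;
- aggregates use last-write-wins by `task_id`;
- compare refuses empty phases with exit 7.

Only `task_id` and `cost_usd` appear in the notes. The `tokens` and `quality` row fields and all function names are assumptions.

```python
import json
from statistics import fmean
from typing import Dict, List, Optional, Tuple

EXIT_EMPTY_PHASE = 7  # compare refuses when either phase has no results


def read_rows(jsonl_text: str) -> List[dict]:
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def prior_spend(rows: List[dict]) -> float:
    # Sum every row, not the deduped view: failed attempts still cost money.
    return sum(row.get("cost_usd", 0.0) for row in rows)


def may_resume(rows: List[dict], budget_usd: float) -> bool:
    return prior_spend(rows) <= budget_usd  # refuse if prior cost already exceeds budget


def dedup_last_write_wins(rows: List[dict]) -> List[dict]:
    latest: Dict[str, dict] = {}
    for row in rows:
        latest[row["task_id"]] = row  # a later row for the same task replaces the earlier one
    return list(latest.values())


def compare(phase1: List[dict], phase2: List[dict]) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (exit_code, phase-2 / phase-1 token ratio, delta quality)."""
    p1_tokens = sum(r["tokens"] for r in phase1)
    if not phase1 or not phase2 or p1_tokens == 0:
        return EXIT_EMPTY_PHASE, None, None
    ratio = sum(r["tokens"] for r in phase2) / p1_tokens
    delta_q = fmean(r["quality"] for r in phase2) - fmean(r["quality"] for r in phase1)
    return 0, ratio, delta_q
```

Keeping the budget sum on raw rows and the aggregates on deduped rows is the point: cost is historical fact, whereas the aggregate should count each task once.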
v0.9.5 — Crystallization detection (Phase 7 of substrate completion)
Crystallization detection (Phase 7 of substrate completion)
Closes substrate-completion gap 4.4.1 — until now, P16 (rule-of-three crystallization) ran in the user's head. Phase 7 ships machine assistance: `bstack crystallize candidates` scans `docs/conversations/*.md` for patterns that recur in ≥3 distinct sessions with explicit failure-mode and acknowledgement signals. Candidates are surfaced for human approval; the substrate never auto-promotes a primitive.
- NEW `scripts/crystallize.py` — rule-of-three pattern detector (Python). Heuristics per spec §6 Phase 7:
  - Phrase recurs in ≥ `--min-sessions` distinct conversation files (default 3)
  - ≥1 occurrence co-locates within a 200-char window of a failure-mode keyword (`failed`, `orphaned`, `race`, `regression`, `shipped broken`, …)
  - ≥1 occurrence co-locates with a repetition-acknowledgement keyword (`again`, `twice`, `third time`, `recurring`, `had to redo`, …)
  - Substring suppression: keep the shorter phrase when it recurs strictly more often than a longer phrase containing it (the longer is a phrasing variant; the shorter is the recurring kernel)
  - n-gram window: 2–4 tokens. 2-grams require both tokens to be content words (no stop-words); 3+-grams reject a stop-word prefix or suffix and require ≥⌈n/2⌉ content tokens
  - Citation excerpts are scrubbed for common secret patterns (`sk-…`, `ghp_…`, `xoxb-…`, AWS/GCP keys, JWTs, generic `password:` / `token:` assignments) before emission — excerpts may flow into PR comments and CI artifacts
  - Failure/ack keyword lists overridable via `CRYSTALLIZE_FAILURE_KEYWORDS` / `CRYSTALLIZE_ACK_KEYWORDS` env vars
- NEW `bin/bstack-crystallize` — thin bash dispatcher; delegates to `scripts/crystallize.py`. Defaults `--conversations` to `$BSTACK_CONVERSATIONS` → `$BROOMVA_WORKSPACE/docs/conversations` → `$PWD/docs/conversations`.
- NEW `bstack crystallize candidates [--json] [--conversations <dir>] [--min-sessions <N>] [--limit <N>]` — surface detected candidates with citations + signal summaries.
- NEW `bstack crystallize promote <slug> [--json]` — draft a primitive scaffold (auto-detected pattern + failure mode + ack signals + citations + P16 manual-gate checklist). Explicitly does not auto-merge a primitive; the scaffold is a starting point, not a decision.
- NEW `tests/canary/05-crystallize.test.sh` — 14 assertions covering: fixtures present, `--help` lists both subcommands, `--json` shape, known squash-merge-race pattern surfaces at ≥3 sessions, `--min-sessions=99` returns 0 (no false positives), promote scaffold contains DRAFT + auto-merge disclaimer + P16 reference, unknown subcommand exits 2, missing conversations directory exits 3, unknown promote slug exits 4.
- NEW `tests/fixtures/conversations/{positive-1..4,negative-1..2}.md` — 4 fixtures sharing the `squash merge race` rule-of-three pattern + 2 negative fixtures (no recurrence / no failure-mode signal).
- CHANGED `bin/bstack` — dispatcher: `crystallize` subcommand wired alongside `skills`; usage text updated.
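The detection heuristics listed for `scripts/crystallize.py` can be sketched in about fifty lines of stdlib Python. The stop-word and keyword lists are abbreviated, hypothetical stand-ins. The following are also assumptions of this sketch:
- counting recurrence by distinct sessions for substring suppression;
- applying the same 200-character window to acknowledgement keywords as to failure keywords;
- all function names.

```python
import math
import re
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

# Abbreviated, hypothetical lists; crystallize.py's real stop-words differ, and its
# keyword lists can be overridden via CRYSTALLIZE_*_KEYWORDS env vars.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "on", "the", "to", "was", "we"}
FAILURE_KEYWORDS = ["failed", "orphaned", "race", "regression", "shipped broken"]
ACK_KEYWORDS = ["again", "twice", "third time", "recurring", "had to redo"]
WINDOW = 200  # characters on each side of an occurrence


def ngram_ok(tokens: List[str]) -> bool:
    content = [t for t in tokens if t not in STOP_WORDS]
    if len(tokens) == 2:
        return len(content) == 2  # both tokens must be content words
    if tokens[0] in STOP_WORDS or tokens[-1] in STOP_WORDS:
        return False  # no stop-word prefix or suffix
    return len(content) >= math.ceil(len(tokens) / 2)


def phrases(text: str) -> Iterator[str]:
    tokens = re.findall(r"[a-z0-9][a-z0-9'-]*", text.lower())
    for n in range(2, 5):  # 2- to 4-token n-grams
        for i in range(len(tokens) - n + 1):
            if ngram_ok(tokens[i:i + n]):
                yield " ".join(tokens[i:i + n])


def near(text: str, phrase: str, keywords: List[str]) -> bool:
    """True if any occurrence of phrase has a keyword within WINDOW chars of it."""
    low = text.lower()
    body = r"[^a-z0-9]+".join(re.escape(t) for t in phrase.split())
    for m in re.finditer(r"(?<![a-z0-9])" + body + r"(?![a-z0-9])", low):
        window = low[max(0, m.start() - WINDOW):m.end() + WINDOW]
        if any(k in window for k in keywords):
            return True
    return False


def candidates(sessions: Dict[str, str], min_sessions: int = 3) -> List[Tuple[str, Set[str]]]:
    """sessions maps a conversation file name to its text."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for name, text in sessions.items():
        for p in set(phrases(text)):
            seen[p].add(name)
    signalled = {
        p: s for p, s in seen.items()
        if len(s) >= min_sessions
        and any(near(sessions[n], p, FAILURE_KEYWORDS) for n in s)
        and any(near(sessions[n], p, ACK_KEYWORDS) for n in s)
    }
    # Substring suppression: drop a longer phrase when a shorter phrase inside it
    # recurs in strictly more sessions (the shorter one is the recurring kernel).
    kept = {
        p: s for p, s in signalled.items()
        if not any(q != p and f" {q} " in f" {p} " and len(signalled[q]) > len(s)
                   for q in signalled)
    }
    return sorted(kept.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

As an example, take three transcripts that each mention a squash merge race next to a failure keyword and a repetition signal. On those, the sketch surfaces `squash merge race` together with its two-word sub-phrases. The sub-phrases recur equally often, so suppression does not drop the longer phrase. Raising `min_sessions` to 99 returns nothing, which mirrors the canary's false-positive guard.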
Design choices
- Detection, not promotion. Phase 7's contract is that P16 stays a deliberate human decision. The scaffold output explicitly disclaims auto-merge and surfaces the four manual P16 gates (concrete mechanism, stated invariant, stated failure mode, short name).
- Bounded false-positive risk. Per the spec §8 risk table, the failure mode is false-positive ritual detection. Mitigation: candidates are surfaced for human review; the substring-suppression rule keeps the recurring kernel rather than every phrasing variant; the failure-mode + ack co-occurrence filter rejects phrases that recur without an actual problem signal.
- No new dependencies. Pure Python stdlib + standard `jq` (already a canary-suite dependency). The detector runs on the same Python interpreter that runs `scripts/wave.py` and the `measure-*.sh` setpoint scripts.
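The excerpt scrubbing from the heuristics list can be sketched as a pass of regex substitutions. The patterns below approximate the listed secret shapes: `sk-`, `ghp_`, `xoxb-`, an AWS access key ID, a Google API key shape standing in for GCP keys, JWTs, and `password:` / `token:` assignments. They are not the expressions `crystallize.py` actually uses, and `scrub` is a hypothetical name.

```python
import re

# Approximate secret shapes (assumptions, not crystallize.py's exact regexes).
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),           # sk-... API keys
    re.compile(r"\bghp_[A-Za-z0-9]{20,}"),            # GitHub personal access tokens
    re.compile(r"\bxoxb-[A-Za-z0-9-]{10,}"),          # Slack bot tokens
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),              # AWS access key IDs
    re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),           # Google API keys
    re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),       # JWTs: header.payload.signature
]
ASSIGNMENT = re.compile(r"(?i)\b(password|token)(\s*:\s*)\S+")


def scrub(excerpt: str, mask: str = "[REDACTED]") -> str:
    for pattern in SECRET_PATTERNS:
        excerpt = pattern.sub(mask, excerpt)
    # Keep the key name and separator; mask only the assigned value.
    return ASSIGNMENT.sub(lambda m: m.group(1) + m.group(2) + mask, excerpt)
```

The leading `\b` on `sk-` prevents a match inside ordinary words such as `task-runner`.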
SLO targets (introduced)
- `bstack crystallize candidates` (fixtures, ~6 files): p50 < 200ms, p99 < 1s
- `bstack crystallize candidates` (workspace, ~50 files): p50 < 2s, p99 < 5s
- `bstack crystallize promote <slug>`: p50 < 200ms, p99 < 1s (re-runs detection, then formats one candidate)
Exit codes
- `0` success (zero or more candidates surfaced)
- `2` invalid arguments / unknown subcommand
- `3` conversations directory missing
- `4` `promote` slug not found in current candidate set
Out of scope for v0.9.5 (deferred)
- Setpoint-history-driven trend detection (open question §10.1)
- Cross-workspace candidate aggregation (depends on Phase 8 federation)
- Auto-PR drafting via `gh pr create` from `crystallize promote` (deliberate — keeps P16 a manual decision)
v0.9.0 — Vendored upgrade path + canary suite (Phase 6 of substrate completion)
Vendored upgrade path + canary suite (Phase 6 of substrate completion)
Closes two v1.0 blockers from the substrate completion spec (§4.3.2, §4.6.2). Vendored installs (produced by `npx skills add`, with no `.git`) can now self-upgrade via release tarball + sha256 verification + atomic swap. The canary suite runs on every PR and verifies that the substrate's load-bearing contracts hold on a fresh install.
- NEW `bstack upgrade --self` for vendored installs (extends `bstack_upgrade_vendored` in `bin/bstack`):
  - Downloads `bstack-vX.Y.Z.tar.gz` from the GitHub Release
  - Downloads the matching `.sha256` sidecar
  - Mandatory sha256 verification — no `--skip-sha256` flag; fail-closed on mismatch
  - Atomic swap via `mv current → .bak`, `mv new → install`; rollback on swap failure
  - `BSTACK_DRY_RUN=1` env override prints the plan without writing
  - Falls back to manual `npx skills add` guidance if the tarball is missing (pre-v0.9.0 releases)
  - Structured log at `~/.bstack/auto-upgrade.log`
- CHANGED `.github/workflows/release.yml` — new `Package + publish vendored upgrade tarball` step:
  - Builds `bstack-vX.Y.Z.tar.gz` from the in-repo skill payload (excludes `.git`, `.github`, `tests`, worktree dirs)
  - Uses `tar --sort=name` for byte-deterministic tarballs
  - Computes sha256, uploads tarball + sha256 sidecar via `gh release upload --clobber`
- NEW `tests/canary/01-fresh-bootstrap.test.sh` — Plant Contract verification on a fresh workspace (10 assertions: bootstrap exits 0, governance files scaffold, hooks wired for SessionStart/Stop/PreToolUse, doctor produces expected summary, doctor exits 0 per HC-1)
- NEW `tests/canary/02-metrics-pipeline.test.sh` — Phase 1 (v0.4.0) end-to-end: collect produces valid JSON, `latest.json` written, observe single-setpoint returns id-matched output
- NEW `tests/canary/03-status-surface.test.sh` — Phase 2 (v0.5.0) end-to-end: 7 core sections render, `--json` shape valid, `--setpoint` deep-view, `--aggregate` Phase 8 placeholder
- NEW `tests/canary/04-schemas-validate.test.sh` — Phase 3 (v0.6.0) contracts: 4 schemas valid draft-07, `primitives.yaml` validates, `companion-skills.yaml` validates, `policy.yaml.template` validates (top-level shape + flat-schema parts)
- CHANGED `.github/workflows/ci.yml` — new `canary` job gated on `lint` + `doctor`; installs jq + jsonschema + PyYAML; runs `tests/canary/*.test.sh`
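The release step's goal is identical bytes for identical payloads. This sketch illustrates that goal with Python's `tarfile` and `gzip` modules as a swapped-in technique, instead of `tar --sort=name`. Besides sorting entries, it zeroes mtime and ownership and pins the gzip header timestamp. With GNU tar those are controlled by separate options such as `--mtime` and `--owner`. The `EXCLUDE` set covers only `.git`, `.github`, and `tests`, because the notes do not give the worktree directory names. `build_payload` is a hypothetical name.

```python
import gzip
import hashlib
import io
import os
import tarfile
from typing import Tuple

# Excluded directory names (matched at any depth in this sketch); worktree dir
# names are not given in the release notes, so they are left out.
EXCLUDE = {".git", ".github", "tests"}


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Strip per-machine metadata so identical trees yield identical bytes.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def build_payload(root: str, version: str) -> Tuple[bytes, str]:
    """Return (tar.gz bytes, sha256 hex) for the skill payload under root."""
    buf = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header deterministic too.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as tar:
            for dirpath, dirnames, filenames in os.walk(root):
                # Sorting in place fixes the walk order (the --sort=name analogue).
                dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDE)
                for name in sorted(filenames):
                    path = os.path.join(dirpath, name)
                    rel = os.path.relpath(path, root)
                    tar.add(path, arcname=f"bstack-{version}/{rel}",
                            recursive=False, filter=_normalize)
    data = buf.getvalue()
    return data, hashlib.sha256(data).hexdigest()
```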
SLO targets (introduced)
- `bstack upgrade --self` (vendored, cold network): p50 < 30s, p99 < 60s
- canary suite (4 tests, sequential): p99 < 30s
Supply-chain safety
- sha256 verification mandatory — no bypass flag
- Atomic swap with `.bak` rollback on failure
- Tarball excludes ephemeral state and CI tooling — only the canonical skill payload ships
- Backup retained until swap completes successfully
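A Python sketch of the fail-closed verification and the two-rename swap with `.bak` rollback described above. It makes two assumptions:
- the sidecar holds the hex digest as its first whitespace-separated field (the `sha256sum` layout);
- the backup is removed after a successful swap.

The real logic lives in `bin/bstack`, not in this code.

```python
import hashlib
import os
import shutil


class ChecksumMismatch(Exception):
    """Raised on digest mismatch; there is deliberately no skip option (fail-closed)."""


def verify_sha256(tarball: str, sidecar: str) -> None:
    with open(sidecar) as f:
        fields = f.read().split()
    expected = fields[0].lower() if fields else ""  # empty sidecar can never match
    digest = hashlib.sha256()
    with open(tarball, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise ChecksumMismatch(f"{tarball}: expected {expected!r}, got {digest.hexdigest()}")


def atomic_swap(install_dir: str, new_dir: str) -> None:
    """mv current -> .bak, mv new -> install; restore .bak if the second move fails."""
    backup = install_dir + ".bak"
    os.rename(install_dir, backup)
    try:
        os.rename(new_dir, install_dir)
    except OSError:
        os.rename(backup, install_dir)  # rollback to the previous install
        raise
    # The backup is kept until the swap has succeeded; removing it afterwards
    # is an assumption of this sketch.
    shutil.rmtree(backup)
```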
Out of scope for v0.9.0 (deferred to v0.9.1)
- Cosign signature verification — sha256 covers the principal integrity concern; cosign adds publisher identity verification
- `bstack reproduce` subcommand (drift detection vs fresh-install reference)
- Canary tests 05-08 (skills auto-install, gates audit, release pipeline E2E) — ship as Phase 4-6 deliverables stabilize