Merged
12 changes: 6 additions & 6 deletions WORKBOOK_v6.md
@@ -17,17 +17,17 @@
schema_version: 6
product: ordinary-user-loop-os
version_target: loop-os-v1
current_phase: V6-P3 # V6-P0..P6, see §3; the P2 card cockpit lands with this PR
current_substep: p3_real_proof_closeout_pending
last_session_id: s_v6_0002
current_phase: V6-P6 # V6-P0..P6, see §3; P4/P5 code lands with this PR
current_substep: p3_real_proof_still_operator_gated # P3 real proof still awaits the operator (real Draft PR + real verdict)
last_session_id: s_v6_0003
open_holds: 0
blocked_on: none
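test_baseline: 864 # main baseline, 0 fail; any regression in this cycle turns the gate red
test_baseline: 889 # main baseline, 0 fail; any regression in this cycle turns the gate red
merge_policy: human_only # the system never merges; auto-merge is disabled for this cycle
# next_action hard cap: 2 lines: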
next_action: |
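V6-P2 delivered: LoopCard five-card main surface + browser E2E asserting card type and next_step at every step (evidence committed).
Next, V6-P3: real-proof closeout (real Draft PR, real in-repo Gemini verdict, fail-closed regression)
V6-P4/P5 code complete: recursive planner + cycle ledger (loop:plan), soak operationalization (SOAK_OPERATIONS + soak:status).
Next, V6-P6 ordinary-user acceptance; V6-P3 real proof remains operator-gated (the cycle-1 proposal is exactly this gap)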
```

---
9 changes: 9 additions & 0 deletions docs/SESSION_LOG_v3.md
@@ -1,5 +1,14 @@
# SESSION LOG v3

## s_v6_0003 · 2026-06-11 · V6-P4 + V6-P5 code complete · recursive planner + soak operationalization

- **V6-P4 recursive planner** (TDD, 20 tests first): `packages/daemon/src/recursive-planner.ts` — pure `planNextCycle` refuses on dirty tree / red tests / blocked budget (carries the budget reason) / ambiguous SoT (≠1 live root-workbook claim, `detectSotAmbiguity`) / open holds / unmerged previous-cycle PR / unparseable §0 / empty gap registry, each with a human recovery action; otherwise proposes exactly ONE gap by the fixed order safety_evidence > user_ux > automation > fleet > polish (ties by stable input order). Cycle ledger `appendCycleLedger` → `evidence/loop-cycles/cycle-<n>.json` {decision, timestamp, workbook_phase, chosen_gap} AND event-sources `planner.cycle_planned` so `rebuildCycleLedgerFromEvents` reproduces the on-disk ledger (GR#5, tested). GR#8 kept: zero child_process in daemon src (eslint static guard stays green).
- Shell `scripts/loop-planner.ts` (`pnpm loop:plan`): gathers REAL inputs — `git status --porcelain`, WORKBOOK_v6 §0 yaml block, root-workbook SoT scan, budget via `checkHeadlessBudget` against `$AEDEV_HOME/state.db` when present (else default-allow with explicit `no-db` note), holds = max(db active holds, §0 open_holds), previous-cycle gate fail-closed unless `AEDEV_LOOP_PREV_PR_MERGED=1`, `pnpm test` actually run unless operator-asserted via `AEDEV_LOOP_TESTS_GREEN`. Prints a PlanCard-shaped JSON + human text, writes the ledger entry, and STOPS — it never implements, never pushes, never merges (GR#10).
- **First real run committed as evidence**: `evidence/loop-cycles/cycle-1.json` + full output — on a clean tree with the suite green it PROPOSED `v6-p3-real-proof-closeout` (top safety_evidence priority), i.e. the planner correctly points at the operator-gated V6-P3 closeout instead of inventing automatable work.
- **V6-P5 soak ops**: `docs/operations/SOAK_OPERATIONS.md` — exact one-week command (`AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak`), launchd plist (mirrors `scripts/launchd/` direct-node pattern, KeepAlive crash recovery), evidence dir contract (`evidence/fleet-soak/<ts>/`), one-step resume, failure-recovery table, ntfy wiring (notify-pr-ready.sh pattern; hold pushes already via watchdog), report classification template (GR#7). `soak-pending.json` artifact: `packages/daemon/src/soak-status.ts` (5 tests: build/derive/sticky-terminal/roundtrip/fail-closed reader) + `scripts/soak-status.ts` CLI (`pnpm soak:status [start|complete|fail]`); `running` past `expected_end` honestly reads `overdue`, never silent-completes.
- Gates: `pnpm typecheck` PASS; `pnpm lint` PASS; `pnpm test` PASS with 889 passed, 6 skipped (130 files; +25 over the 864 baseline, zero regressions).
- §0 → `current_phase: V6-P6`. Honesty (GR#7): V6-P4/P5 are **code complete**; V6-P3 real proof stays operator-gated (real Draft PR URL + real Gemini verdict still pending on the operator Mac — exactly what cycle-1 proposes); the one-week soak itself remains **unproven** until its evidence lands.
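
The refuse-or-propose rule described for `planNextCycle` can be sketched as a pure function. This is a minimal illustration, not the code in `recursive-planner.ts`: the `Input`, `Gap`, and `Decision` shapes, the order in which refusals are checked, and the recovery strings are all assumptions, and the unparseable-§0 refusal is left out for brevity. Only the category order and the tie rule (stable input order) come from the log entry above.

```typescript
// Sketch only: every type and field name here is hypothetical.
type GapCategory = "safety_evidence" | "user_ux" | "automation" | "fleet" | "polish";

// Fixed order from the log entry: safety_evidence > user_ux > automation > fleet > polish.
const PRIORITY: readonly GapCategory[] = ["safety_evidence", "user_ux", "automation", "fleet", "polish"];

interface Gap { id: string; category: GapCategory }
interface Input {
  repoDirty: boolean;
  testsGreen: boolean;
  budget: { allowed: boolean; reason: string };
  sotAmbiguous: boolean;
  openHolds: number;
  prevCyclePrMerged: boolean;
  gaps: Gap[];
}
type Decision =
  | { action: "refuse"; reason: string; recovery: string }
  | { action: "propose"; gap: Gap };

const rank = (g: Gap): number => PRIORITY.indexOf(g.category);

function planNext(input: Input): Decision {
  const refuse = (reason: string, recovery: string): Decision => ({ action: "refuse", reason, recovery });
  if (input.repoDirty) return refuse("dirty tree", "commit or stash local changes");
  if (!input.testsGreen) return refuse("red tests", "fix the failing tests first");
  // The budget refusal carries the budget guard's own reason.
  if (!input.budget.allowed) return refuse(`budget blocked: ${input.budget.reason}`, "wait until the budget allows a run");
  if (input.sotAmbiguous) return refuse("ambiguous source of truth", "leave exactly one live root workbook");
  if (input.openHolds > 0) return refuse("open holds", "resolve the open holds");
  if (!input.prevCyclePrMerged) return refuse("previous-cycle PR unmerged", "have a human merge or close it");

  const [first, ...rest] = input.gaps;
  if (first === undefined) return refuse("empty gap registry", "register at least one gap");

  // Strict "<" keeps the earliest gap when categories tie, so input order decides ties.
  let best = first;
  for (const gap of rest) {
    if (rank(gap) < rank(best)) best = gap;
  }
  return { action: "propose", gap: best }; // exactly one gap, never a batch
}

const example = planNext({
  repoDirty: false,
  testsGreen: true,
  budget: { allowed: true, reason: "no-db" },
  sotAmbiguous: false,
  openHolds: 0,
  prevCyclePrMerged: true,
  gaps: [
    { id: "dashboard-copy", category: "polish" },
    { id: "v6-p3-real-proof-closeout", category: "safety_evidence" },
  ],
});
console.log(JSON.stringify(example));
```

Keeping the function pure (no git, no filesystem, no process spawning) is what lets a shell script gather the real inputs while the decision itself stays unit-testable.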

## s_v6_0002 · 2026-06-11 · V6-P2 complete · card cockpit (UI renders the five loop cards)

- New `apps/dashboard/src/pages/cockpit/LoopCard.tsx` (TDD: 13 component tests first): renders exactly the five GR#11 card types from `overview.operatorView.card`; calm bilingual copy; `next_step` is always the prominent first row (`cockpit-loop-card-next-step`); the `machine` sub-object is never visible text — raw codes live only in `data-card-type` / `data-user-state` / `data-machine-stage` / `data-hold-code` / `data-pr-gate-code`. Blocker card shows `human_explanation` + `why_it_matters` + recovery actions, zero raw codes.
196 changes: 196 additions & 0 deletions docs/operations/SOAK_OPERATIONS.md
@@ -0,0 +1,196 @@
# SOAK OPERATIONS — one-week real fleet soak (V6-P5)

> Runbook for taking the proven in-container soak harness
> (`scripts/fleet-soak.ts`, 5/5 PASS with simulated executors) to a real,
> unattended, ≥1-week run on the operator's Mac. This supplies the apparatus
> for assessment gap #19, not the result. The soak RESULT stays honest: until
> a week-long run's evidence lands in-repo, rubric #19 remains **unproven** (GR#7).
>
> Status artifact contract: `packages/daemon/src/soak-status.ts` (tested).
> CLI: `pnpm soak:status` (`scripts/soak-status.ts`).

## 1. The one-week command

```bash
cd ~/projects/claude-code-247
# 604800000 ms = 7 days. Evidence lands in evidence/fleet-soak/<ISO-timestamp>/.
AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak
```

Recommended unattended wrapper (status artifact + ntfy on exit):

```bash
pnpm soak:status start \
  && if AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak; then
       pnpm soak:status complete
     else
       pnpm soak:status fail
     fi
```

Notes:
- The harness itself already enforces the safety/test env: remote writes off,
all external CLIs/APIs disabled — a week-long soak spends **zero** credit
while idle (idle-zero-credit is one of its PASS criteria).
- `AEDEV_SOAK_INTERVAL_MS` (default 200) can be raised to 1000–5000 for a
week-long run to keep CPU negligible.
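
The numbers above can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the harness performs one tick per `AEDEV_SOAK_INTERVAL_MS`; `tickCount` is a hypothetical helper and says nothing about how `scripts/fleet-soak.ts` actually schedules its work.

```typescript
// Sketch only: assumes one harness tick per interval, which the runbook does not state.
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86_400_000
const SOAK_MS = 7 * MS_PER_DAY; // 604_800_000, the value passed as AEDEV_SOAK_MS

// Hypothetical helper: number of whole intervals that fit into a run.
function tickCount(durationMs: number, intervalMs: number): number {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  return Math.floor(durationMs / intervalMs);
}

for (const intervalMs of [200, 1000, 5000]) {
  console.log(`${intervalMs} ms interval -> ${tickCount(SOAK_MS, intervalMs)} ticks per week`);
}
```

Under that assumption the default 200 ms interval means about 3.0 million ticks over the week, while 1000 ms and 5000 ms reduce that to 604,800 and 120,960.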

## 2. `soak-pending.json` status artifact (contract)

Path: `evidence/fleet-soak/soak-pending.json`
(override: `AEDEV_SOAK_PENDING_PATH`). Exact shape:

```json
{
  "started_at": "2026-06-11T00:00:00.000Z",
  "expected_end": "2026-06-18T00:00:00.000Z",
  "status": "running"
}
```

- `status` ∈ `running | completed | overdue | failed`.
- `expected_end = started_at + AEDEV_SOAK_MS` (default one week).
- `running` past `expected_end` **reads as** `overdue` — honest "needs a
human look", never a silent fake-complete. `completed`/`failed` are
terminal and sticky.
- Readers fail closed: a missing/corrupt artifact reads as "no soak pending"
(`pnpm soak:status` exits 1), so nothing acts on half-written state.
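
The contract above can be sketched in a few pure functions. This is an illustration under assumptions, not the contents of `packages/daemon/src/soak-status.ts`: the function names `buildPending`, `deriveStatus`, and `parsePending` are hypothetical, and only the field names, the four status values, and the three rules (window arithmetic, sticky terminal states, fail-closed read) are taken from the list.

```typescript
// Sketch only: helper names and signatures are hypothetical.
type SoakStatus = "running" | "completed" | "overdue" | "failed";

interface SoakPending {
  started_at: string; // ISO-8601 timestamp
  expected_end: string; // ISO-8601 timestamp
  status: SoakStatus;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const STATUSES: readonly string[] = ["running", "completed", "overdue", "failed"];

// expected_end = started_at + soakMs (default one week).
function buildPending(startedAt: Date, soakMs: number = WEEK_MS): SoakPending {
  return {
    started_at: startedAt.toISOString(),
    expected_end: new Date(startedAt.getTime() + soakMs).toISOString(),
    status: "running",
  };
}

// Stored non-running states are returned as-is (terminal states are sticky);
// `running` past the window reads as `overdue`, never as `completed`.
function deriveStatus(p: SoakPending, now: Date): SoakStatus {
  if (p.status !== "running") return p.status;
  return now.getTime() > Date.parse(p.expected_end) ? "overdue" : "running";
}

// Fail-closed read: anything that is not a well-formed artifact yields null ("no soak pending").
function parsePending(raw: string): SoakPending | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const obj = value as Record<string, unknown>;
    const startedAt = obj["started_at"];
    const expectedEnd = obj["expected_end"];
    const status = obj["status"];
    if (typeof startedAt !== "string" || Number.isNaN(Date.parse(startedAt))) return null;
    if (typeof expectedEnd !== "string" || Number.isNaN(Date.parse(expectedEnd))) return null;
    if (typeof status !== "string" || !STATUSES.includes(status)) return null;
    return { started_at: startedAt, expected_end: expectedEnd, status: status as SoakStatus };
  } catch {
    return null; // corrupt or half-written JSON
  }
}

const pending = buildPending(new Date("2026-06-11T00:00:00.000Z"));
console.log(pending.expected_end);
console.log(deriveStatus(pending, new Date("2026-06-19T00:00:00.000Z")));
```

Deriving `overdue` at read time, instead of writing it from a timer, means the artifact stays truthful even when nothing is running to update it.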

Commands:

```bash
pnpm soak:status # read + time-derived status
pnpm soak:status start # window from AEDEV_SOAK_MS (default 1 week)
pnpm soak:status complete # after the report is generated and checked
pnpm soak:status fail # the run died and will not be resumed
```

## 3. Evidence directory contract

```
evidence/fleet-soak/
  soak-pending.json      status artifact (§2)
  <ISO-timestamp>/       one directory per soak run (the harness creates it)
    soak-report.md       PASS/FAIL per criterion + honesty note
    metrics.json         machine-readable criteria, drill, idle counters
```

The report MUST keep the harness's real/simulated classification: real
daemon + real HTTP + real Ed25519 vs simulated executors. A week-long run
with simulated executors still does NOT check rubric #19's "real-CLI on
operator machines" box — say so in the report.

## 4. launchd (unattended + crash recovery)

Mirror of `scripts/launchd/com.claude247.daemon.plist.tpl` (node executes the
entry DIRECTLY so launchd tracks the real PID — no pnpm/tsx wrapper chain).
Save as `~/Library/LaunchAgents/com.claude247.fleet-soak.plist`, replacing the
`@@…@@` placeholders like `scripts/install_launchd.sh` does:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>com.claude247.fleet-soak</string>
  <key>ProgramArguments</key>
  <array>
    <string>@@NODE@@</string>
    <string>--import</string>
    <string>tsx</string>
    <string>@@REPO_ROOT@@/scripts/fleet-soak.ts</string>
  </array>
  <key>WorkingDirectory</key><string>@@REPO_ROOT@@</string>
  <key>KeepAlive</key>
  <dict>
    <key>SuccessfulExit</key><false/>
  </dict>
  <key>RunAtLoad</key><true/>
  <key>StandardOutPath</key><string>@@LOG_DIR@@/fleet-soak.out.log</string>
  <key>StandardErrorPath</key><string>@@LOG_DIR@@/fleet-soak.err.log</string>
  <key>EnvironmentVariables</key>
  <dict>
    <key>HOME</key><string>@@HOME@@</string>
    <key>PATH</key><string>@@PATH@@</string>
    <key>AEDEV_SOAK_MS</key><string>604800000</string>
    <key>AEDEV_SOAK_INTERVAL_MS</key><string>1000</string>
    <key>AEDEV_NTFY_TOPIC</key><string>@@NTFY_TOPIC@@</string>
  </dict>
</dict>
</plist>
```

```bash
launchctl load ~/Library/LaunchAgents/com.claude247.fleet-soak.plist # install + start
launchctl list | grep fleet-soak # check
launchctl unload ~/Library/LaunchAgents/com.claude247.fleet-soak.plist # stop/remove
```

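`KeepAlive.SuccessfulExit=false` is the crash recovery: launchd restarts the
harness after a crash, a `kill -9`, or any other non-zero exit; only an exit
with status 0 (a PASS) does not loop. Because a criterion FAIL exits non-zero
(§6), `launchctl unload` the job after a FAIL to keep it from rerunning.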
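
As a rough model of the restart rule that the `launchd.plist` manual describes for `KeepAlive` with `SuccessfulExit` set to false, the decision can be written as one predicate. The `Exit` type and function name are hypothetical, and the model deliberately ignores throttling and every other `KeepAlive` key.

```typescript
// Sketch only: a simplified model of SuccessfulExit=false, not launchd's implementation.
type Exit =
  | { kind: "exited"; code: number } // process ended by itself with an exit status
  | { kind: "signaled"; signal: string }; // process was killed, e.g. by SIGKILL

function restartsUnderSuccessfulExitFalse(exit: Exit): boolean {
  // Only a normal exit with status 0 counts as successful; everything else is restarted.
  return !(exit.kind === "exited" && exit.code === 0);
}

const cases: Array<[string, Exit]> = [
  ["exit 0", { kind: "exited", code: 0 }],
  ["exit 1", { kind: "exited", code: 1 }],
  ["kill -9", { kind: "signaled", signal: "SIGKILL" }],
];
for (const [label, exit] of cases) {
  console.log(`${label}: restart=${restartsUnderSuccessfulExitFalse(exit)}`);
}
```

Read this way, the exit status of the harness is the only thing launchd sees, so whether a finished run loops depends on the code it exits with.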

## 5. Resume after a crash / kill -9

The harness is self-contained per run (each start creates a fresh
`evidence/fleet-soak/<ts>/`): resume = restart. One step back to standby:

```bash
launchctl kickstart -k gui/$(id -u)/com.claude247.fleet-soak
# without launchd:
pnpm soak:status start && AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak
```

Honesty rule for the report: a restarted soak's wall-clock week starts over
(`soak-pending.json` shows the real `started_at`). Do not stitch two partial
runs into one "week" — record both directories and say what happened.

## 6. Failure recovery

| Symptom | Recovery |
|---|---|
| `pnpm soak:status` says `overdue` | The window elapsed without a completion mark. Check `@@LOG_DIR@@/fleet-soak.*.log` and the run's `soak-report.md`; then `pnpm soak:status complete` (report ok) or `fail` (run died) |
| Harness exits non-zero (criterion FAIL) | The report names the failing criterion. Read `metrics.json`, fix, restart (§5). Mark `pnpm soak:status fail` for the dead run |
| launchd crash-loop (`fleet-soak.err.log` repeats) | `launchctl unload`, fix the cause, `launchctl load` again. launchd does not restart the job after an exit with status 0, so a loop means the harness keeps exiting non-zero: a real startup error or a repeated criterion FAIL |
| Mac rebooted mid-soak | `RunAtLoad` restarts it on login; restart-honesty rule of §5 applies |
| Artifact corrupt/missing | Readers fail closed (§2); `pnpm soak:status start` rewrites it (a fresh window — say so in the report) |

## 7. ntfy wiring

Same pattern as `scripts/notify-pr-ready.sh` — topic from `AEDEV_NTFY_TOPIC`
(optional self-hosted base via `AEDEV_NTFY_URL`); without a topic it prints
instead of pushing (never blocks):

```bash
# soak finished (wrap the §1 command):
notify() { # $1 = title, $2 = priority, $3 = message
  if [ -z "${AEDEV_NTFY_TOPIC:-}" ]; then
    echo "[ntfy skipped, no topic] $1: $3" # print instead of pushing
    return 0
  fi
  curl -fsS -X POST "${AEDEV_NTFY_URL:-https://ntfy.sh}/$AEDEV_NTFY_TOPIC" \
    -H "Title: $1" -H "Priority: $2" -d "$3" \
    || echo "ntfy push failed (ignored)" >&2 # a failed push never blocks
}
pnpm soak:status start \
  && if AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak; then
       pnpm soak:status complete
       notify "aedev · fleet soak PASS" high "one-week soak complete; report in evidence/fleet-soak/"
     else
       pnpm soak:status fail
       notify "aedev · fleet soak FAIL" urgent "soak failed; check evidence/fleet-soak/ and logs"
     fi
```

Hold-change pushes during the soak are already covered by the daemon-side
watchdog (`packages/daemon/src/watchdog.ts` → `ntfy.ts`): every new
`HOLD-*` (including the forged-evidence drill's `HOLD-EVIDENCE-MISMATCH`)
emits `operator.notify_requested` + an ntfy push when the daemon runs with
`AEDEV_NTFY_TOPIC` set.
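
For readers who want the same push from TypeScript, the request can be built by a small pure function. This is a sketch under assumptions and not the code in `ntfy.ts`: `buildNtfyRequest` and `NtfyRequest` are hypothetical names. The request shape (POST to `<base>/<topic>` with `Title` and `Priority` headers and the message as the body) follows ntfy's HTTP publish API, and the environment variable names come from this section.

```typescript
// Sketch only: function and type names are hypothetical.
interface NtfyRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}
type Env = Record<string, string | undefined>;
type Priority = "default" | "high" | "urgent";

// Returns null when no topic is configured, so the caller can print instead of pushing.
function buildNtfyRequest(env: Env, title: string, priority: Priority, message: string): NtfyRequest | null {
  const topic = env["AEDEV_NTFY_TOPIC"];
  if (!topic) return null;
  // `||` treats an empty AEDEV_NTFY_URL like an unset one, matching ${VAR:-default} in shell.
  const base = (env["AEDEV_NTFY_URL"] || "https://ntfy.sh").replace(/\/+$/, "");
  return {
    url: `${base}/${encodeURIComponent(topic)}`,
    method: "POST",
    headers: { Title: title, Priority: priority },
    body: message,
  };
}

const request = buildNtfyRequest({ AEDEV_NTFY_TOPIC: "soak-demo" }, "fleet soak PASS", "high", "one-week soak complete");
console.log(request === null ? "no topic configured, printing only" : request.url);
```

Separating request construction from the network call keeps the no-topic branch testable without contacting any server; the returned object can then be handed to whatever HTTP client the caller uses.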

## 8. Report template (real/simulated explicit — GR#7)

The harness writes `soak-report.md` per run. For the week-long acceptance,
append this classification block before committing the evidence:

```markdown
## Classification (GR#7)
- real: daemon, HTTP fleet protocol, Ed25519 identities, freeze path, durations
- simulated: task executors (no subscription CLI was spawned)
- unproven-after-this-run: real-CLI multi-machine soak (rubric #19 full check)
- restarts during the week: <n> (directories: <list>)
```
59 changes: 59 additions & 0 deletions evidence/loop-cycles/cycle-1-loop-plan-output.txt
@@ -0,0 +1,59 @@

> aedev@2.4.0-patch1 loop:plan /home/user/claude-code-247
> tsx scripts/loop-planner.ts

[loop-planner] running the full test suite (set AEDEV_LOOP_TESTS_GREEN=1|0 to assert instead)…

=== loop-planner inputs (real, gathered by this run) ===
- repoDirty = false
via git status --porcelain → empty
- workbook §0 = current_phase=V6-P3
via extracted yaml block from /home/user/claude-code-247/WORKBOOK_v6.md
- sotAmbiguous = false
via root workbooks scanned: WORKBOOK_v4.md, WORKBOOK_v6.md · live SoT claimants: WORKBOOK_v6.md
- testsGreen = true
via ran `pnpm test` in this invocation and used its exit code
- budgetVerdict = {"allowed":true,"reason":"no-db"}
via no /root/.aedev/state.db in this environment — nothing has spent headless credit (default-allow with no-db note)
- openHolds = 0
via max(state.db active holds=0, WORKBOOK §0 open_holds=0)
- prevCyclePrMerged = true
via no previous proposal in evidence/loop-cycles — first cycle

=== PlanCard (decision) ===
{
"type": "plan",
"title": "Planner proposal · 递归 planner 提案",
"next_step": "Human decision: accept this proposal by starting a session on it, or ignore it. The planner stops here — it never implements, never pushes, never merges.",
"machine": {
"user_state": "planner_proposal",
"stage": "loop-planner",
"hold_code": null,
"pr_gate_code": null
},
"objective": "One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10).",
"phases": [
"V6-P3"
],
"acceptance_criteria": [
"The chosen gap moves with in-repo evidence (GR#7)",
"Output stops at evidence + at most a Draft PR; merge stays human-only (GR#10)"
],
"risk_level": "low",
"estimated_calls": 0,
"requires_approval": true,
"proposal": {
"gapId": "v6-p3-real-proof-closeout",
"phase": "V6-P3",
"rationale": "Highest-priority open gap by the fixed v6 order (safety_evidence > user_ux > automation > fleet > polish): category=safety_evidence, phase=V6-P3, workbook current_phase=V6-P3.",
"expectedDeliverable": "One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10)."
}
}

=== human text ===
PROPOSE cycle 1: gap "v6-p3-real-proof-closeout" (phase V6-P3)
why: Highest-priority open gap by the fixed v6 order (safety_evidence > user_ux > automation > fleet > polish): category=safety_evidence, phase=V6-P3, workbook current_phase=V6-P3.
deliverable: One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10).

Ledger entry written: /home/user/claude-code-247/evidence/loop-cycles/cycle-1.json
loop-planner stops here. A human (or a human-started session) acts on this — never this script.
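
Three of the inputs in the log above follow small fail-closed rules that can be sketched as pure helpers. The helper names are hypothetical and this is not the code in `scripts/loop-planner.ts`. The environment variable names, the `max` rule for holds, and the first-cycle case come from the log and the session notes; treating any value other than `1` or `0` as "not asserted" is an assumption.

```typescript
// Sketch only: helper names are hypothetical.
type Env = Record<string, string | undefined>;

// AEDEV_LOOP_TESTS_GREEN=1|0 asserts the result; undefined means "run the suite and use its exit code".
function assertedTestsGreen(env: Env): boolean | undefined {
  const value = env["AEDEV_LOOP_TESTS_GREEN"];
  if (value === "1") return true;
  if (value === "0") return false;
  return undefined; // assumption: unknown values are ignored rather than trusted
}

// Fail closed: once a previous proposal exists, only an explicit "1" counts as merged.
function prevCyclePrMerged(env: Env, hasPreviousProposal: boolean): boolean {
  if (!hasPreviousProposal) return true; // first cycle, nothing to gate on
  return env["AEDEV_LOOP_PREV_PR_MERGED"] === "1";
}

// Take the larger of the two sources so a stale workbook cannot hide a live hold, or the reverse.
function openHolds(dbActiveHolds: number, workbookOpenHolds: number): number {
  return Math.max(dbActiveHolds, workbookOpenHolds);
}

console.log(assertedTestsGreen({}), prevCyclePrMerged({}, false), openHolds(0, 0));
```

Each helper defaults to the cautious answer when information is missing, which is the same stance the planner takes when it refuses instead of guessing.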
15 changes: 15 additions & 0 deletions evidence/loop-cycles/cycle-1.json
@@ -0,0 +1,15 @@
{
"cycle": 1,
"decision": {
"action": "propose",
"cycle": {
"gapId": "v6-p3-real-proof-closeout",
"phase": "V6-P3",
"rationale": "Highest-priority open gap by the fixed v6 order (safety_evidence > user_ux > automation > fleet > polish): category=safety_evidence, phase=V6-P3, workbook current_phase=V6-P3.",
"expectedDeliverable": "One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10)."
}
},
"timestamp": "2026-06-11T02:15:38.901Z",
"workbook_phase": "V6-P3",
"chosen_gap": "v6-p3-real-proof-closeout"
}
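
The session notes say the on-disk ledger can be reproduced from `planner.cycle_planned` events. A minimal sketch of such a fold is shown below. It is not `rebuildCycleLedgerFromEvents` itself: the event envelope (`type` plus `payload`), the last-write-wins rule per cycle, and the nullable `chosen_gap` are assumptions. The entry fields follow the JSON above.

```typescript
// Sketch only: the event envelope and the dedup rule are assumptions.
interface CycleLedgerEntry {
  cycle: number;
  decision: { action: string };
  timestamp: string;
  workbook_phase: string;
  chosen_gap: string | null; // assumed to be null when the planner refuses
}
interface StoredEvent {
  type: string;
  payload: unknown;
}

const CYCLE_PLANNED = "planner.cycle_planned";

// Fold the event log into the ledger: keep planner events only, let the last event
// for a cycle win, and return the entries ordered by cycle number.
function rebuildLedger(events: readonly StoredEvent[]): CycleLedgerEntry[] {
  const byCycle = new Map<number, CycleLedgerEntry>();
  for (const event of events) {
    if (event.type !== CYCLE_PLANNED) continue;
    const entry = event.payload as CycleLedgerEntry; // a real reader would validate this
    byCycle.set(entry.cycle, entry);
  }
  return Array.from(byCycle.values()).sort((a, b) => a.cycle - b.cycle);
}

const cycleOne: CycleLedgerEntry = {
  cycle: 1,
  decision: { action: "propose" },
  timestamp: "2026-06-11T02:15:38.901Z",
  workbook_phase: "V6-P3",
  chosen_gap: "v6-p3-real-proof-closeout",
};
const rebuilt = rebuildLedger([
  { type: "operator.notify_requested", payload: {} },
  { type: CYCLE_PLANNED, payload: cycleOne },
]);
console.log(JSON.stringify(rebuilt) === JSON.stringify([cycleOne]));
```

Comparing the rebuilt entries with the committed files is one way to check that the files and the event log have not drifted apart.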
2 changes: 2 additions & 0 deletions package.json
@@ -30,6 +30,8 @@
"test:e2e:sandbox": "tsx scripts/e2e-sandbox.ts",
"test:hermus:mission": "tsx scripts/hermus-mission-smoke.ts",
"test:fleet:soak": "tsx scripts/fleet-soak.ts",
"loop:plan": "tsx scripts/loop-planner.ts",
"soak:status": "tsx scripts/soak-status.ts",
"test:mission-os:dry-soak": "node --import tsx scripts/mission-os-dry-soak.ts",
"test:workbook": "tsx scripts/workbook-acceptance.ts",
"typecheck": "pnpm -r typecheck",
24 changes: 24 additions & 0 deletions packages/daemon/src/index.ts
@@ -108,6 +108,30 @@ export type {
DefaultValidatorSecretStatus,
ValidatorSecretResolver,
} from './validator-factory.js'
// V6-P4: budget facts for the loop-planner shell (scripts/loop-planner.ts).
export { checkHeadlessBudget, countHeadlessCallsToday } from './headless-budget-guard.js'
export {
CYCLE_PLANNED_EVENT,
GAP_PRIORITY,
appendCycleLedger,
claimsSourceOfTruth,
detectSotAmbiguity,
parseSection0,
planNextCycle,
rebuildCycleLedgerFromEvents,
} from './recursive-planner.js'
export type {
AppendCycleLedgerOptions,
CycleLedgerEntry,
GapCategory,
PhaseGap,
PlanDecision,
PlannerInput,
Section0State,
} from './recursive-planner.js'
// V6-P5: soak-pending status artifact (shell: scripts/soak-status.ts).
export { WEEK_MS, buildSoakPending, deriveSoakStatus, readSoakPending, writeSoakPending } from './soak-status.js'
export type { SoakPending, SoakStatus } from './soak-status.js'
export { InterruptionPolicy } from './interruption-policy.js'
export type {
InterruptionReason,