CTlanston · Jun 11, 2026
diff --git a/WORKBOOK_v6.md b/WORKBOOK_v6.md
@@ -17,17 +17,17 @@
 schema_version: 6
 product: ordinary-user-loop-os
 version_target: loop-os-v1
-current_phase: V6-P3          # V6-P0..P6, see §3; the P2 card cockpit lands with this PR
-current_substep: p3_real_proof_closeout_pending
-last_session_id: s_v6_0002
+current_phase: V6-P6          # V6-P0..P6, see §3; P4/P5 code lands with this PR
+current_substep: p3_real_proof_still_operator_gated   # P3 real proof still waits on the operator (real Draft PR + real verdict)
+last_session_id: s_v6_0003
 open_holds: 0
 blocked_on: none
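-test_baseline: 864            # main baseline, 0 fail; any regression this cycle turns the gate red
+test_baseline: 889            # main baseline, 0 fail; any regression this cycle turns the gate red
 merge_policy: human_only      # the system never merges; auto-merge is disabled this cycle
 # next_action hard cap of 2 lines:
 next_action: |
-  V6-P2 delivered: LoopCard five-card main surface + browser E2E asserting card type and next_step at every step (evidence committed).
-  Next is V6-P3, real-proof closeout: a real Draft PR, a real in-repo Gemini verdict, fail-closed regression.
+  V6-P4/P5 code complete: recursive planner + cycle ledger (loop:plan), soak operationalization (SOAK_OPERATIONS + soak:status).
+  Next: V6-P6 ordinary-user acceptance; V6-P3 real proof stays operator-gated (the cycle-1 proposal is exactly this gap).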
 ```
 
 ---
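The §0 block in the hunk above is what the planner reads back later; the session log lists "unparseable §0" as one of its refusal reasons. Below is a minimal sketch of such a read, assuming a line-based parse of top-level `key: value` pairs. `parseSection0Sketch` and its return shape are hypothetical, and this is not the `parseSection0` exported from `recursive-planner.ts`.

```typescript
// Hypothetical sketch: reads top-level `key: value` pairs from a §0-style yaml block.
// Not the real parseSection0; the return shape is an assumption.
type Section0Sketch = Record<string, string>

function parseSection0Sketch(yaml: string): Section0Sketch | null {
  const out: Section0Sketch = {}
  for (const raw of yaml.split('\n')) {
    // Skip blank lines, comment-only lines, and indented block-scalar bodies.
    if (raw.trim() === '' || raw.trimStart().startsWith('#') || /^\s/.test(raw)) continue
    // Drop a trailing `# comment`, then require a `key: value` shape.
    const line = raw.replace(/\s+#.*$/, '')
    const match = /^([a-z_]+):\s*(.*)$/.exec(line)
    if (match === null) return null // fail closed on anything unexpected
    const [, key = '', value = ''] = match
    out[key] = value.trim()
  }
  // A block without current_phase is treated as unparseable.
  return 'current_phase' in out ? out : null
}

const section0Block = [
  'schema_version: 6',
  'current_phase: V6-P6          # V6-P0..P6',
  'open_holds: 0',
  'next_action: |',
  '  two lines at most',
].join('\n')

const parsedSection0 = parseSection0Sketch(section0Block)
console.log(parsedSection0?.current_phase, parsedSection0?.open_holds)
```

Returning `null` rather than a partial object keeps the caller's choice binary: either the block is usable, or the planner refuses with a recovery action.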

diff --git a/docs/SESSION_LOG_v3.md b/docs/SESSION_LOG_v3.md
@@ -1,5 +1,14 @@
 # SESSION LOG v3
 
+## s_v6_0003 · 2026-06-11 · V6-P4 + V6-P5 code complete · recursive planner + soak operationalization
+
+- **V6-P4 recursive planner** (TDD, 20 tests first): `packages/daemon/src/recursive-planner.ts` — pure `planNextCycle` refuses on dirty tree / red tests / blocked budget (carries the budget reason) / ambiguous SoT (≠1 live root-workbook claim, `detectSotAmbiguity`) / open holds / unmerged previous-cycle PR / unparseable §0 / empty gap registry, each with a human recovery action; otherwise proposes exactly ONE gap by the fixed order safety_evidence > user_ux > automation > fleet > polish (ties by stable input order). Cycle ledger `appendCycleLedger` → `evidence/loop-cycles/cycle-<n>.json` {decision, timestamp, workbook_phase, chosen_gap} AND event-sources `planner.cycle_planned` so `rebuildCycleLedgerFromEvents` reproduces the on-disk ledger (GR#5, tested). GR#8 kept: zero child_process in daemon src (eslint static guard stays green).
+- Shell `scripts/loop-planner.ts` (`pnpm loop:plan`): gathers REAL inputs — `git status --porcelain`, WORKBOOK_v6 §0 yaml block, root-workbook SoT scan, budget via `checkHeadlessBudget` against `$AEDEV_HOME/state.db` when present (else default-allow with explicit `no-db` note), holds = max(db active holds, §0 open_holds), previous-cycle gate fail-closed unless `AEDEV_LOOP_PREV_PR_MERGED=1`, `pnpm test` actually run unless operator-asserted via `AEDEV_LOOP_TESTS_GREEN`. Prints a PlanCard-shaped JSON + human text, writes the ledger entry, and STOPS — it never implements, never pushes, never merges (GR#10).
+- **First real run committed as evidence**: `evidence/loop-cycles/cycle-1.json` + full output — on a clean tree with the suite green it PROPOSED `v6-p3-real-proof-closeout` (top safety_evidence priority), i.e. the planner correctly points at the operator-gated V6-P3 closeout instead of inventing automatable work.
+- **V6-P5 soak ops**: `docs/operations/SOAK_OPERATIONS.md` — exact one-week command (`AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak`), launchd plist (mirrors `scripts/launchd/` direct-node pattern, KeepAlive crash recovery), evidence dir contract (`evidence/fleet-soak/<ts>/`), one-step resume, failure-recovery table, ntfy wiring (notify-pr-ready.sh pattern; hold pushes already via watchdog), report classification template (GR#7). `soak-pending.json` artifact: `packages/daemon/src/soak-status.ts` (5 tests: build/derive/sticky-terminal/roundtrip/fail-closed reader) + `scripts/soak-status.ts` CLI (`pnpm soak:status [start|complete|fail]`); `running` past `expected_end` honestly reads `overdue`, never silent-completes.
+- Gates: `pnpm typecheck` PASS; `pnpm lint` PASS; `pnpm test` PASS with 889 passed, 6 skipped (130 files; +25 over the 864 baseline, zero regressions).
+- §0 → `current_phase: V6-P6`. Honesty (GR#7): V6-P4/P5 are **code complete**; V6-P3 real proof stays operator-gated (real Draft PR URL + real Gemini verdict still pending on the operator Mac — exactly what cycle-1 proposes); the one-week soak itself remains **unproven** until its evidence lands.
+
 ## s_v6_0002 · 2026-06-11 · V6-P2 complete · card cockpit (UI renders the five loop cards)
 
 - New `apps/dashboard/src/pages/cockpit/LoopCard.tsx` (TDD: 13 component tests first): renders exactly the five GR#11 card types from `overview.operatorView.card`; calm bilingual copy; `next_step` is always the prominent first row (`cockpit-loop-card-next-step`); the `machine` sub-object is never visible text — raw codes live only in `data-card-type` / `data-user-state` / `data-machine-stage` / `data-hold-code` / `data-pr-gate-code`. Blocker card shows `human_explanation` + `why_it_matters` + recovery actions, zero raw codes.
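The refuse-or-propose behaviour described in the s_v6_0003 entry above can be sketched as follows. This is an illustration under stated assumptions, not the real `planNextCycle`: the type names, the `planSketch` function, the subset of refusal checks (budget, SoT ambiguity and §0 parsing are left out), their order, and the recovery strings are all hypothetical. Only the category order and the stable tie-break come from the log.

```typescript
// Hypothetical sketch of refuse-or-propose; names and shapes are assumptions.
type GapCategory = 'safety_evidence' | 'user_ux' | 'automation' | 'fleet' | 'polish'

// Fixed order from the session log: earlier entries win.
const PRIORITY: readonly GapCategory[] = ['safety_evidence', 'user_ux', 'automation', 'fleet', 'polish']

interface Gap { id: string; category: GapCategory }

interface PlannerFacts {
  repoDirty: boolean
  testsGreen: boolean
  openHolds: number
  prevCyclePrMerged: boolean
  gaps: readonly Gap[]
}

type Decision =
  | { action: 'refuse'; reason: string; recovery: string }
  | { action: 'propose'; gap: Gap }

const refuse = (reason: string, recovery: string): Decision => ({ action: 'refuse', reason, recovery })

function planSketch(facts: PlannerFacts): Decision {
  // Every refusal carries a human recovery action (placeholder wording).
  if (facts.repoDirty) return refuse('dirty tree', 'commit or stash local changes')
  if (!facts.testsGreen) return refuse('red tests', 'fix the failing tests first')
  if (facts.openHolds > 0) return refuse('open holds', 'resolve the open holds')
  if (!facts.prevCyclePrMerged) return refuse('unmerged previous-cycle PR', 'merge or close the previous PR')

  // Strict `<` keeps the first gap seen on a tie, i.e. stable input order.
  let best: Gap | undefined
  let bestRank = Number.POSITIVE_INFINITY
  for (const gap of facts.gaps) {
    const rank = PRIORITY.indexOf(gap.category)
    if (rank < bestRank) {
      best = gap
      bestRank = rank
    }
  }
  if (best === undefined) return refuse('empty gap registry', 'register at least one open gap')
  return { action: 'propose', gap: best } // exactly one gap per cycle
}

const cleanFacts: PlannerFacts = {
  repoDirty: false,
  testsGreen: true,
  openHolds: 0,
  prevCyclePrMerged: true,
  gaps: [
    { id: 'a', category: 'fleet' },
    { id: 'b', category: 'user_ux' },
    { id: 'c', category: 'user_ux' },
  ],
}
console.log(planSketch(cleanFacts))
```

Keeping the function pure (facts in, decision out) is what lets a shell script gather the real inputs separately, as the log describes for `scripts/loop-planner.ts`.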

diff --git a/docs/operations/SOAK_OPERATIONS.md b/docs/operations/SOAK_OPERATIONS.md
@@ -0,0 +1,196 @@
+# SOAK OPERATIONS — one-week real fleet soak (V6-P5)
+
+> Runbook for taking the proven in-container soak harness
+> (`scripts/fleet-soak.ts`, 5/5 PASS with simulated executors) to a real,
+> unattended, ≥1-week run on the operator's Mac. Closes assessment gap #19's
+> apparatus; the soak RESULT itself stays honest: until a week-long run's
+> evidence lands in-repo, rubric #19 remains **unproven** (GR#7).
+>
+> Status artifact contract: `packages/daemon/src/soak-status.ts` (tested).
+> CLI: `pnpm soak:status` (`scripts/soak-status.ts`).
+
+## 1. The one-week command
+
+```bash
+cd ~/projects/claude-code-247
+# 604800000 ms = 7 days. Evidence lands in evidence/fleet-soak/<ISO-timestamp>/.
+AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak
+```
+
+Recommended unattended wrapper (status artifact + ntfy on exit):
+
+```bash
+pnpm soak:status start \
+  && if AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak; then
+       pnpm soak:status complete
+     else
+       pnpm soak:status fail
+     fi
+```
+
+Notes:
+- The harness itself already enforces the safety/test env: remote writes off,
+  all external CLIs/APIs disabled — a week-long soak spends **zero** credit
+  while idle (idle-zero-credit is one of its PASS criteria).
+- `AEDEV_SOAK_INTERVAL_MS` (default 200) can be raised to 1000–5000 for a
+  week-long run to keep CPU negligible.
+
+## 2. `soak-pending.json` status artifact (contract)
+
+Path: `evidence/fleet-soak/soak-pending.json`
+(override: `AEDEV_SOAK_PENDING_PATH`). Exact shape:
+
+```json
+{
+  "started_at": "2026-06-11T00:00:00.000Z",
+  "expected_end": "2026-06-18T00:00:00.000Z",
+  "status": "running"
+}
+```
+
+- `status` ∈ `running | completed | overdue | failed`.
+- `expected_end = started_at + AEDEV_SOAK_MS` (default one week).
+- `running` past `expected_end` **reads as** `overdue` — honest "needs a
+  human look", never a silent fake-complete. `completed`/`failed` are
+  terminal and sticky.
+- Readers fail closed: a missing/corrupt artifact reads as "no soak pending"
+  (`pnpm soak:status` exits 1), so nothing acts on half-written state.
+
+Commands:
+
+```bash
+pnpm soak:status            # read + time-derived status
+pnpm soak:status start      # window from AEDEV_SOAK_MS (default 1 week)
+pnpm soak:status complete   # after the report is generated and checked
+pnpm soak:status fail       # the run died and will not be resumed
+```
+
+## 3. Evidence directory contract
+
+```
+evidence/fleet-soak/
+  soak-pending.json              status artifact (§2)
+  <ISO-timestamp>/               one directory per soak run (the harness creates it)
+    soak-report.md               PASS/FAIL per criterion + honesty note
+    metrics.json                 machine-readable criteria, drill, idle counters
+```
+
+The report MUST keep the harness's real/simulated classification: real
+daemon + real HTTP + real Ed25519 vs simulated executors. A week-long run
+with simulated executors still does NOT check rubric #19's "real-CLI on
+operator machines" box — say so in the report.
+
+## 4. launchd (unattended + crash recovery)
+
+Mirror of `scripts/launchd/com.claude247.daemon.plist.tpl` (node executes the
+entry DIRECTLY so launchd tracks the real PID — no pnpm/tsx wrapper chain).
+Save as `~/Library/LaunchAgents/com.claude247.fleet-soak.plist`, replacing the
+`@@…@@` placeholders like `scripts/install_launchd.sh` does:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key><string>com.claude247.fleet-soak</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>@@NODE@@</string>
+    <string>--import</string>
+    <string>tsx</string>
+    <string>@@REPO_ROOT@@/scripts/fleet-soak.ts</string>
+  </array>
+  <key>WorkingDirectory</key><string>@@REPO_ROOT@@</string>
+  <key>KeepAlive</key>
+  <dict>
+    <key>SuccessfulExit</key><false/>
+  </dict>
+  <key>RunAtLoad</key><true/>
+  <key>StandardOutPath</key><string>@@LOG_DIR@@/fleet-soak.out.log</string>
+  <key>StandardErrorPath</key><string>@@LOG_DIR@@/fleet-soak.err.log</string>
+  <key>EnvironmentVariables</key>
+  <dict>
+    <key>HOME</key><string>@@HOME@@</string>
+    <key>PATH</key><string>@@PATH@@</string>
+    <key>AEDEV_SOAK_MS</key><string>604800000</string>
+    <key>AEDEV_SOAK_INTERVAL_MS</key><string>1000</string>
+    <key>AEDEV_NTFY_TOPIC</key><string>@@NTFY_TOPIC@@</string>
+  </dict>
+</dict>
+</plist>
+```
+
+```bash
+launchctl load  ~/Library/LaunchAgents/com.claude247.fleet-soak.plist   # install + start
+launchctl list | grep fleet-soak                                        # check
+launchctl unload ~/Library/LaunchAgents/com.claude247.fleet-soak.plist  # stop/remove
+```
+
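+`KeepAlive.SuccessfulExit=false` is the crash-recovery: a crash (or `kill -9`) restarts the harness; a clean exit-0 PASS does not loop.
+A criterion FAIL exits non-zero (§6), so launchd restarts it as well: `launchctl unload` the job before investigating.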
+
+## 5. Resume after a crash / kill -9
+
+The harness is self-contained per run (each start creates a fresh
+`evidence/fleet-soak/<ts>/`): resume = restart. One step back to standby:
+
+```bash
+launchctl kickstart -k gui/$(id -u)/com.claude247.fleet-soak
+# without launchd:
+pnpm soak:status start && AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak
+```
+
+Honesty rule for the report: a restarted soak's wall-clock week starts over
+(`soak-pending.json` shows the real `started_at`). Do not stitch two partial
+runs into one "week" — record both directories and say what happened.
+
+## 6. Failure recovery
+
+| Symptom | Recovery |
+|---|---|
+| `pnpm soak:status` says `overdue` | The window elapsed without a completion mark. Check `@@LOG_DIR@@/fleet-soak.*.log` and the run's `soak-report.md`; then `pnpm soak:status complete` (report ok) or `fail` (run died) |
+| Harness exits non-zero (criterion FAIL) | The report names the failing criterion. Read `metrics.json`, fix, restart (§5). Mark `pnpm soak:status fail` for the dead run |
+| launchd crash-loop (`fleet-soak.err.log` repeats) | `launchctl unload`, fix the cause, `launchctl load` again. The plist does not restart after an exit-0 run, so a loop means repeated non-zero exits: a startup error or a criterion FAIL |
+| Mac rebooted mid-soak | `RunAtLoad` restarts it on login; restart-honesty rule of §5 applies |
+| Artifact corrupt/missing | Readers fail closed (§2); `pnpm soak:status start` rewrites it (a fresh window — say so in the report) |
+
+## 7. ntfy wiring
+
+Same pattern as `scripts/notify-pr-ready.sh` — topic from `AEDEV_NTFY_TOPIC`
+(optional self-hosted base via `AEDEV_NTFY_URL`); without a topic it prints
+instead of pushing (never blocks):
+
+```bash
+# soak finished (wrap the §1 command):
+pnpm soak:status start \
+  && if AEDEV_SOAK_MS=604800000 pnpm test:fleet:soak; then
+       pnpm soak:status complete
+       curl -fsS -X POST "${AEDEV_NTFY_URL:-https://ntfy.sh}/$AEDEV_NTFY_TOPIC" \
+         -H "Title: aedev · fleet soak PASS" -H "Priority: high" \
+         -d "one-week soak complete — report in evidence/fleet-soak/"
+     else
+       pnpm soak:status fail
+       curl -fsS -X POST "${AEDEV_NTFY_URL:-https://ntfy.sh}/$AEDEV_NTFY_TOPIC" \
+         -H "Title: aedev · fleet soak FAIL" -H "Priority: urgent" \
+         -d "soak failed — check evidence/fleet-soak/ and logs"
+     fi
+```
+
+Hold-change pushes during the soak are already covered by the daemon-side
+watchdog (`packages/daemon/src/watchdog.ts` → `ntfy.ts`): every new
+`HOLD-*` (including the forged-evidence drill's `HOLD-EVIDENCE-MISMATCH`)
+emits `operator.notify_requested` + an ntfy push when the daemon runs with
+`AEDEV_NTFY_TOPIC` set.
+
+## 8. Report template (real/simulated explicit — GR#7)
+
+The harness writes `soak-report.md` per run. For the week-long acceptance,
+append this classification block before committing the evidence:
+
+```markdown
+## Classification (GR#7)
+- real: daemon, HTTP fleet protocol, Ed25519 identities, freeze path, durations
+- simulated: task executors (no subscription CLI was spawned)
+- unproven-after-this-run: real-CLI multi-machine soak (rubric #19 full check)
+- restarts during the week: <n> (directories: <list>)
+```
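The status rules in §2 of the runbook above (time-derived `overdue`, sticky terminal states, fail-closed reads) can be sketched as follows. This is an illustration only: the function names and signatures are hypothetical and are not the `buildSoakPending` / `deriveSoakStatus` / `readSoakPending` API exported from `soak-status.ts`, and the reader takes the file text as a string rather than touching the filesystem.

```typescript
// Hypothetical sketch of the §2 status rules; not the real soak-status.ts API.
type SoakStatus = 'running' | 'completed' | 'overdue' | 'failed'

interface SoakPending {
  started_at: string
  expected_end: string
  status: SoakStatus
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000 // 604800000 ms

const STATUSES: readonly string[] = ['running', 'completed', 'overdue', 'failed']
const isStatus = (value: unknown): value is SoakStatus =>
  typeof value === 'string' && STATUSES.includes(value)
const isIsoDate = (value: unknown): value is string =>
  typeof value === 'string' && !Number.isNaN(Date.parse(value))

function buildPendingSketch(startedAt: Date, durationMs: number = WEEK_MS): SoakPending {
  return {
    started_at: startedAt.toISOString(),
    expected_end: new Date(startedAt.getTime() + durationMs).toISOString(),
    status: 'running',
  }
}

function deriveStatusSketch(pending: SoakPending, now: Date): SoakStatus {
  // completed / failed are terminal and sticky: time never changes them.
  if (pending.status === 'completed' || pending.status === 'failed') return pending.status
  // A run past its window reads as overdue, never as completed.
  return now.getTime() > Date.parse(pending.expected_end) ? 'overdue' : 'running'
}

function readPendingSketch(text: string): SoakPending | null {
  // Fail closed: anything missing or malformed reads as "no soak pending".
  try {
    const value: unknown = JSON.parse(text)
    if (typeof value !== 'object' || value === null) return null
    const { started_at, expected_end, status } = value as Record<string, unknown>
    if (!isIsoDate(started_at) || !isIsoDate(expected_end) || !isStatus(status)) return null
    return { started_at, expected_end, status }
  } catch {
    return null
  }
}

const soakPending = buildPendingSketch(new Date('2026-06-11T00:00:00.000Z'))
console.log(deriveStatusSketch(soakPending, new Date('2026-06-19T00:00:00.000Z')))
```

Deriving `overdue` at read time, instead of storing it, means a dead harness cannot leave a stale `running` on disk that looks healthy.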
diff --git a/evidence/loop-cycles/cycle-1-loop-plan-output.txt b/evidence/loop-cycles/cycle-1-loop-plan-output.txt
@@ -0,0 +1,59 @@
+
+> aedev@2.4.0-patch1 loop:plan /home/user/claude-code-247
+> tsx scripts/loop-planner.ts
+
+[loop-planner] running the full test suite (set AEDEV_LOOP_TESTS_GREEN=1|0 to assert instead)…
+
+=== loop-planner inputs (real, gathered by this run) ===
+- repoDirty = false
+    via git status --porcelain → empty
+- workbook §0 = current_phase=V6-P3
+    via extracted yaml block from /home/user/claude-code-247/WORKBOOK_v6.md
+- sotAmbiguous = false
+    via root workbooks scanned: WORKBOOK_v4.md, WORKBOOK_v6.md · live SoT claimants: WORKBOOK_v6.md
+- testsGreen = true
+    via ran `pnpm test` in this invocation and used its exit code
+- budgetVerdict = {"allowed":true,"reason":"no-db"}
+    via no /root/.aedev/state.db in this environment — nothing has spent headless credit (default-allow with no-db note)
+- openHolds = 0
+    via max(state.db active holds=0, WORKBOOK §0 open_holds=0)
+- prevCyclePrMerged = true
+    via no previous proposal in evidence/loop-cycles — first cycle
+
+=== PlanCard (decision) ===
+{
+  "type": "plan",
+  "title": "Planner proposal · 递归 planner 提案",
+  "next_step": "Human decision: accept this proposal by starting a session on it, or ignore it. The planner stops here — it never implements, never pushes, never merges.",
+  "machine": {
+    "user_state": "planner_proposal",
+    "stage": "loop-planner",
+    "hold_code": null,
+    "pr_gate_code": null
+  },
+  "objective": "One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10).",
+  "phases": [
+    "V6-P3"
+  ],
+  "acceptance_criteria": [
+    "The chosen gap moves with in-repo evidence (GR#7)",
+    "Output stops at evidence + at most a Draft PR; merge stays human-only (GR#10)"
+  ],
+  "risk_level": "low",
+  "estimated_calls": 0,
+  "requires_approval": true,
+  "proposal": {
+    "gapId": "v6-p3-real-proof-closeout",
+    "phase": "V6-P3",
+    "rationale": "Highest-priority open gap by the fixed v6 order (safety_evidence > user_ux > automation > fleet > polish): category=safety_evidence, phase=V6-P3, workbook current_phase=V6-P3.",
+    "expectedDeliverable": "One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10)."
+  }
+}
+
+=== human text ===
+PROPOSE cycle 1: gap "v6-p3-real-proof-closeout" (phase V6-P3)
+  why: Highest-priority open gap by the fixed v6 order (safety_evidence > user_ux > automation > fleet > polish): category=safety_evidence, phase=V6-P3, workbook current_phase=V6-P3.
+  deliverable: One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10).
+
+Ledger entry written: /home/user/claude-code-247/evidence/loop-cycles/cycle-1.json
+loop-planner stops here. A human (or a human-started session) acts on this — never this script.
diff --git a/evidence/loop-cycles/cycle-1.json b/evidence/loop-cycles/cycle-1.json
@@ -0,0 +1,15 @@
+{
+  "cycle": 1,
+  "decision": {
+    "action": "propose",
+    "cycle": {
+      "gapId": "v6-p3-real-proof-closeout",
+      "phase": "V6-P3",
+      "rationale": "Highest-priority open gap by the fixed v6 order (safety_evidence > user_ux > automation > fleet > polish): category=safety_evidence, phase=V6-P3, workbook current_phase=V6-P3.",
+      "expectedDeliverable": "One bounded cycle toward: Operator-gated real-proof closeout: real Draft PR URL + real in-repo Gemini verdict artifact (operator Mac, runbook docs/operations/P4-first-real-draft-pr.md). Output stops at evidence + at most a Draft PR — the system never merges (GR#10)."
+    }
+  },
+  "timestamp": "2026-06-11T02:15:38.901Z",
+  "workbook_phase": "V6-P3",
+  "chosen_gap": "v6-p3-real-proof-closeout"
+}
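The session log states that each ledger write also emits a `planner.cycle_planned` event so the on-disk ledger can be rebuilt from events. Here is a minimal sketch of that round trip, assuming the event payload is the ledger entry itself. The event envelope, the in-memory `Map` standing in for `cycle-<n>.json` files, the `null` value for `chosen_gap` on refusals, and both function names are hypothetical; the entry fields mirror `cycle-1.json` above.

```typescript
// Hypothetical sketch of an event-sourced ledger; envelope and storage are assumptions.
interface LedgerEntry {
  cycle: number
  decision: { action: string }
  timestamp: string
  workbook_phase: string
  chosen_gap: string | null // null for refusals is an assumption
}

interface EventSketch { type: string; payload: unknown }

const CYCLE_PLANNED = 'planner.cycle_planned'

// Write the entry and record the same fact as an event, in one call.
function appendSketch(events: EventSketch[], ledger: Map<number, string>, entry: LedgerEntry): void {
  ledger.set(entry.cycle, JSON.stringify(entry, null, 2)) // stands in for cycle-<n>.json
  events.push({ type: CYCLE_PLANNED, payload: entry })
}

// Replay only the planner events; unrelated events are ignored.
function rebuildSketch(events: readonly EventSketch[]): Map<number, string> {
  const rebuilt = new Map<number, string>()
  for (const event of events) {
    if (event.type !== CYCLE_PLANNED) continue
    const entry = event.payload as LedgerEntry
    rebuilt.set(entry.cycle, JSON.stringify(entry, null, 2))
  }
  return rebuilt
}

const eventLog: EventSketch[] = [{ type: 'operator.notify_requested', payload: {} }]
const ledgerFiles = new Map<number, string>()
appendSketch(eventLog, ledgerFiles, {
  cycle: 1,
  decision: { action: 'propose' },
  timestamp: '2026-06-11T02:15:38.901Z',
  workbook_phase: 'V6-P3',
  chosen_gap: 'v6-p3-real-proof-closeout',
})

const rebuiltFiles = rebuildSketch(eventLog)
console.log(rebuiltFiles.get(1) === ledgerFiles.get(1))
```

Comparing serialized entries, as the last line does, is one simple way to test that replay reproduces the ledger byte for byte.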
diff --git a/package.json b/package.json
@@ -30,6 +30,8 @@
     "test:e2e:sandbox": "tsx scripts/e2e-sandbox.ts",
     "test:hermus:mission": "tsx scripts/hermus-mission-smoke.ts",
     "test:fleet:soak": "tsx scripts/fleet-soak.ts",
+    "loop:plan": "tsx scripts/loop-planner.ts",
+    "soak:status": "tsx scripts/soak-status.ts",
     "test:mission-os:dry-soak": "node --import tsx scripts/mission-os-dry-soak.ts",
     "test:workbook": "tsx scripts/workbook-acceptance.ts",
     "typecheck": "pnpm -r typecheck",

diff --git a/packages/daemon/src/index.ts b/packages/daemon/src/index.ts
@@ -108,6 +108,30 @@ export type {
   DefaultValidatorSecretStatus,
   ValidatorSecretResolver,
 } from './validator-factory.js'
+// V6-P4: budget facts for the loop-planner shell (scripts/loop-planner.ts).
+export { checkHeadlessBudget, countHeadlessCallsToday } from './headless-budget-guard.js'
+export {
+  CYCLE_PLANNED_EVENT,
+  GAP_PRIORITY,
+  appendCycleLedger,
+  claimsSourceOfTruth,
+  detectSotAmbiguity,
+  parseSection0,
+  planNextCycle,
+  rebuildCycleLedgerFromEvents,
+} from './recursive-planner.js'
+export type {
+  AppendCycleLedgerOptions,
+  CycleLedgerEntry,
+  GapCategory,
+  PhaseGap,
+  PlanDecision,
+  PlannerInput,
+  Section0State,
+} from './recursive-planner.js'
+// V6-P5: soak-pending status artifact (shell: scripts/soak-status.ts).
+export { WEEK_MS, buildSoakPending, deriveSoakStatus, readSoakPending, writeSoakPending } from './soak-status.js'
+export type { SoakPending, SoakStatus } from './soak-status.js'
 export { InterruptionPolicy } from './interruption-policy.js'
 export type {
   InterruptionReason,