59 changes: 27 additions & 32 deletions plugins/codex-loop-engineering/skills/loop-engineering/SKILL.md
@@ -3,7 +3,7 @@ name: loop-engineering
description: Use when a substantial coding, research, content, product, or other long-running project needs Codex-centered multi-agent orchestration, role/lane design, cross-model Claude/Codex planning or review, cross-model debate, named Codex agent threads, dispatcher-mediated handoffs, thread ledgers, worklogs, batons, repair loops, or evidence-based arbitration.
---

# Codex Loop Engineering
# Loop Engineering

## Purpose

@@ -29,16 +29,14 @@ For tiny edits, config tweaks, docs-only notes, or simple local bug fixes, do no
First decide the smallest route that controls risk:

1. Choose the route tier before choosing agents.
2. Ask the user to define the topology before creating lanes: desired roles, parallel vs sequential work, communication rules, and whether Codex-only or Codex-plus-Claude is expected.
3. For each new task, derive the project-specific agent roles first, then build the matching identity table and worklog summary before execution starts.
4. Verify the six-interface contract: goal, state, context, act, capture, stop.
5. For T3/T4, create or identify a Strategic Loop Contract; see `references/strategic-loop-contract.md`.
6. If the strategic target or "good enough" completion criterion is missing, write `strategy-gap: <missing decision>` and stop for a user checkpoint.
7. For multi-round work, define how state and feedback will be recorded; see `references/state-feedback-schema.md`.
8. Same active loop correction or continuation? Reuse the existing `loop_id` and owner lane.
9. Choose `claude_policy` for handoffs and lane artifacts; see `references/claude-policy.md`.
10. Critical direction or degraded-tool decision? Use `references/user-checkpoints.md`.
11. Context or handoff risk? Drop a baton before more work or handoff.
2. Verify the six-interface contract: goal, state, context, act, capture, stop.
3. For T3/T4, create or identify a Strategic Loop Contract; see `references/strategic-loop-contract.md`.
4. If the strategic target or "good enough" completion criterion is missing, write `strategy-gap: <missing decision>` and stop for a user checkpoint.
5. For multi-round work, define how state and feedback will be recorded; see `references/state-feedback-schema.md`.
6. Same active loop correction or continuation? Reuse the existing `loop_id` and owner lane.
7. Choose `claude_policy` for handoffs and lane artifacts; see `references/claude-policy.md`.
8. Critical direction or degraded-tool decision? Use `references/user-checkpoints.md`.
9. Context or handoff risk? Drop a baton before more work or handoff.
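
A minimal sketch of the six-interface check and the `strategy-gap` line from the steps above. The function names and the dict-shaped contract are assumptions for illustration; the skill itself does not prescribe a data format.

```python
# Hypothetical helper: the six interface names come from the checklist above;
# representing the contract as a plain dict is an assumption of this sketch.
REQUIRED_INTERFACES = ("goal", "state", "context", "act", "capture", "stop")


def missing_interfaces(contract: dict) -> list[str]:
    """Return interface names that are absent or blank, in checklist order."""
    return [
        name
        for name in REQUIRED_INTERFACES
        if not str(contract.get(name) or "").strip()
    ]


def strategy_gap_line(missing_decision: str) -> str:
    """Format the marker written before stopping for a user checkpoint."""
    return f"strategy-gap: {missing_decision}"
```

A route with any missing interface would stop before lanes are chosen; how the stop is surfaced to the user is left to `references/user-checkpoints.md`.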

## Planning Lane Execution Firewall

@@ -132,7 +130,7 @@ If the strategic target is absent, write `strategy-gap: <missing decision>` and
For T3/T4, the strategic plan and operational route contract should live together as a Strategic Loop Contract, not as duplicate documents. Use `references/strategic-loop-contract.md`, then optionally run:

```bash
python3 skills/loop-engineering/scripts/validate-loop-contract.py <contract-or-merged-plan.md>
python3 /Users/apple/.codex/skills/loop-engineering/scripts/validate-loop-contract.py <contract-or-merged-plan.md>
```

## Execution Batch Sizing And Review Cadence
@@ -163,6 +161,14 @@ multi-lane process.

For substantial frontend/product-surface work, the execution contract must state the intended visible UI shape before edits: target screen, main panels/cards, empty/degraded states, backend placeholders, and the user calibration point. Prefer landing a visible skeleton tied to stable contracts before filling deep backend behavior when the user needs to judge the interface.

### Parallel Worktree Execution

When a large T4 execution or repair slice has separable surfaces, prefer a main
integrator/arbitrator plus isolated worktree lanes instead of one overloaded
thread. Never let multiple agents edit the same checkout. Detailed contract,
boundary, arbitration-repair, and lane-rotation rules live in
`references/lane-roles.md`.

Do not use long blank windows for ordinary work. Normal execution/review handoffs should use a practical first check and deadline; deadlines above 45 minutes need an explicit reason such as deep planning, whole-phase architecture review, long test/build operations, or slow external tools. For execution lanes, treat the deadline as a recovery threshold only when the lane appears idle, errored, or artifact-missing without active progress; if the execution lane is visibly active and still editing/testing, keep low-frequency artifact/status monitoring instead of interrupting or declaring failure. If the user says a lane is done, blocked, or wrong, treat that as an immediate state signal: check artifacts first, perform one recovery read if needed, update state/feedback, and route the next lane instead of waiting for the old deadline.
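
The monitoring rule above can be sketched as a small decision function. This is an illustration under stated assumptions: the action labels, the `lane_active` flag, and the function names are hypothetical, and only the 45-minute threshold and the user signals `done`, `blocked`, `wrong` come from the text.

```python
from datetime import datetime, timedelta

# From the text: deadlines above 45 minutes need an explicit reason.
MAX_DEFAULT_DEADLINE = timedelta(minutes=45)
# From the text: user statements that count as immediate state signals.
USER_STATE_SIGNALS = {"done", "blocked", "wrong"}


def deadline_needs_reason(start: datetime, deadline: datetime) -> bool:
    """True when the handoff deadline exceeds the 45-minute default."""
    return deadline - start > MAX_DEFAULT_DEADLINE


def next_action(now, check_after, deadline, lane_active, user_signal=None):
    """Pick the manager's next step (the returned labels are hypothetical)."""
    if user_signal in USER_STATE_SIGNALS:
        return "check_artifacts_now"    # do not wait for the old deadline
    if now < check_after:
        return "wait"
    if lane_active:
        return "monitor_low_frequency"  # visibly working: do not interrupt
    if now >= deadline:
        return "recover"                # idle, errored, or artifact-missing
    return "check_artifacts"
```

Note that an active lane never reaches the `recover` branch, which mirrors the rule that the deadline is a recovery threshold only for lanes without visible progress.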

## Admission And New-Lane Gates
@@ -242,14 +248,14 @@ docs/ai-handoffs/YYYY-MM-DD-slug/
For this project, prefer existing plan conventions:

```text
docs/loop-engineering/plans/YYYY-MM-DD-slug-claude-plan.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-codex-plan.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-merged-plan.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-codex-execution-report.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-claude-review.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-codex-subagent-review.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-arbitration.md
docs/loop-engineering/plans/YYYY-MM-DD-slug-final-report.md
docs/superpowers/plans/YYYY-MM-DD-slug-claude-plan.md
docs/superpowers/plans/YYYY-MM-DD-slug-codex-plan.md
docs/superpowers/plans/YYYY-MM-DD-slug-merged-plan.md
docs/superpowers/plans/YYYY-MM-DD-slug-codex-execution-report.md
docs/superpowers/plans/YYYY-MM-DD-slug-claude-review.md
docs/superpowers/plans/YYYY-MM-DD-slug-codex-subagent-review.md
docs/superpowers/plans/YYYY-MM-DD-slug-arbitration.md
docs/superpowers/plans/YYYY-MM-DD-slug-final-report.md
```

Rules:
@@ -288,16 +294,6 @@ Core defaults:

Loop lanes are role contracts, not fixed job titles. For coding loops the default roles are planning, execution, review, and arbitration; for non-code long projects, map the same pattern to domain roles such as producer, researcher, scriptwriter, editor, publisher, or QA.

Before any lanes exist, the skill should ask the user to define the topology rather than assuming one:

- What are the roles or lane names for this project?
- Which work should run in parallel, and which work must stay sequential?
- Should the manager be the only cross-lane communicator, or are some direct handoffs allowed?
- Do you want a review lane, an arbitration lane, or both?
- Is Claude optional, required, or not part of the topology?
- At which points should the user be brought in before the loop continues?
- What conditions mean the current plan should be revised instead of letting the loop continue?

Read `references/lane-roles.md` when creating, steering, recovering, or reviewing any lane. That reference defines planning, execution, review, arbitration, manager, and dispatcher behavior, including continuous manager monitoring and low-frequency artifact checks.

Essential constraints:
@@ -310,7 +306,6 @@ Essential constraints:
- Manager/dispatcher does not own planning/execution/review/arbitration decisions. It tracks artifacts, repairs coordination, and routes handoffs.
- Manager/dispatcher monitoring is artifact-driven and deadline-driven. Do not poll active lane threads every few seconds; each handoff should include `check_after`, `deadline`, and expected artifact paths when the lane may run long.
- When the user asks the manager/dispatcher to keep a loop moving, do not stop with a normal final while required lane artifacts are pending and no blocker has been reached.
- If the loop reaches a product, scope, or tradeoff decision that the user should own, stop and ask rather than continuing autonomously.
- Reviews stay independent: Claude review and Codex review do not read each other before arbitration.
- Arbitration repairs implementation defects inside the merged plan. Return to planning only for plan defects, scope-changing fixes, or user-goal mismatches.
- Critical direction changes and degraded Claude-required gates need a user checkpoint; see `references/user-checkpoints.md`.
@@ -631,7 +626,7 @@ Do not turn a one-off project lesson into a skill unless it generalizes beyond t
- Giving every lane broad project context when only planning needs it.
- Skipping `thread-ledger.md` rows for `send_message_to_thread`.
- Skipping agent worklog entries, losing lessons and repeated pitfalls.
- Making the manager lane the central relay for all messages instead of letting lanes hand off directly.
- Making `经理Agent` the central relay for all messages instead of letting lanes hand off directly.
- Treating the bootstrap thread as a main agent instead of assigning a real lane role.
- Starting a new lane set for a correction to the active loop instead of messaging the existing owner lane.
- Auto-archiving lane threads before the user has finished evaluating the loop.
Original file line number Diff line number Diff line change
@@ -23,6 +23,40 @@ For multi-round or multi-lane loops, use `state-feedback-schema.md` to record ho

For non-code workflows, map roles to the task instead of forcing coding labels. Example video workflow: topic planner -> researcher -> scriptwriter -> visual planner/editor -> QA/reviewer -> publisher, with manager/dispatcher coordinating artifacts. The durable value is reusable role memory and artifact flow inside Codex, not the specific plan/execute/review labels.

## Human-Readable Lane Naming

Lane names, thread titles, branch names, and worktree folder names are part of
the product surface for human operators. Manager/dispatcher must choose short,
stable, human-readable names before creating or continuing lanes.

Rules:

- Prefer `role + letter/number + purpose`, not long artifact slugs or opaque
generated ids.
- Keep visible thread titles short enough to scan in a sidebar: usually 3-6
words.
- Include the work type first: `Plan`, `Exec A`, `Exec B`, `Review`, `Arbitrate`,
`Repair A`, `QA`.
- Include the human purpose second: `Shell Pocket`, `Paper Mode`, `PPT Region`,
`Agent Panel`, `Safety QA`.
- Use compact branch/worktree names such as `codex/b12-a-shell-pocket`,
`codex/b12-b-paper`, `codex/b12-c-ppt`, `codex/b12-integrator`, and
`codex/b12-repair-a-paper`.
- Record machine ids separately in the ledger. Do not make humans infer purpose
from thread ids, pending worktree ids, UUIDs, or full artifact filenames.
- If a tool creates a pending worktree/thread with an opaque id, immediately map
it to a human label in the ledger and rename the visible thread when the tool
supports it.

Examples:

| lane | good visible title | good branch | avoid |
|---|---|---|---|
| main integration | `Exec Integrator - B12` | `codex/b12-integrator` | `019eef...` |
| shell/pocket | `Exec A - Shell Pocket` | `codex/b12-a-shell-pocket` | `batch12-parallel-preview-first-research-workflow-shell-pocket-relaunch` |
| paper | `Exec B - Paper Mode` | `codex/b12-b-paper` | `worktree-lane-b-phase2-batch12-paper-manuscript-mode` |
| arbitration | `Arbitrate - B12 Repair` | `codex/b12-arbitration` | `standing-arbitration-lane-019eeec2` |
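
As an illustration of the naming rules, the following sketch builds titles and branch names in the shape shown in the table. The helper names and the `codex` default prefix parameter are assumptions; the 3-6 word bound is the "usually" guideline from the rules, treated here as a soft check.

```python
import re


def slugify(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def thread_title(work_type: str, purpose: str) -> str:
    """Work type first, human purpose second, e.g. 'Exec A - Shell Pocket'."""
    return f"{work_type} - {purpose}"


def branch_name(batch: str, lane: str, purpose: str, prefix: str = "codex") -> str:
    """Compact branch/worktree name; empty parts are skipped."""
    slug = "-".join(slugify(part) for part in (batch, lane, purpose) if part)
    return f"{prefix}/{slug}"


def is_scannable(title: str) -> bool:
    """Soft check for the 3-6 word sidebar guideline (separator not counted)."""
    words = [word for word in title.split() if word != "-"]
    return 3 <= len(words) <= 6
```

Machine ids would still be recorded separately in the ledger; these helpers only produce the human-facing labels.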

## Lane Reuse Policy

Lanes are persistent roles when continuity helps: planning/product, execution, arbitration/repair, manager, and dispatcher may reuse an existing visible thread for the same loop if the thread is not blocked, polluted by incompatible scope, or too stale to recover. Reuse is preferred when the lane benefits from local project memory and repeated setup would waste context.
@@ -358,6 +392,64 @@ Rules:
- If a helper discovers a plan gap, scope change, mutation/runtime need, or product-direction
mismatch, the execution lane records the blocker and stops instead of silently expanding scope.

### Parallel Worktree Execution Lanes

For large T4 execution slices, manager/dispatcher may split implementation across
isolated worktrees instead of asking one execution lane to do everything. Use this
when the work naturally separates by product surface, data/model contract,
runtime boundary, fixture set, visual QA surface, or test/evidence layer.

Default shape:

```text
Main Integrator lane
owns shared contracts, final merge/integration, full verification, consolidated report

Parallel Worktree lanes A/B/C/...
each owns one isolated worktree/branch, one write scope, one lane report

Late QA/Safety lane
starts only after lane outputs or an integrated preview exist
```

Manager/dispatcher contract requirements:

- name every worktree path, branch, lane role, write scope, and forbidden scope;
- forbid concurrent edits to the same checkout/worktree;
- require the integrator to define shared component/data/CSS boundaries before
parallel lanes depend on them, or explicitly state the stable boundaries from
existing code;
- give each lane one expected artifact path and focused verification commands;
- require lane reports to include touched files, tests run, known conflicts,
remaining integration work, safety check, and lifecycle;
- require the integrator to consume lane reports/diffs, resolve conflicts, run
focused then full tests/build/browser evidence, and write one consolidated
execution report;
- state the integration order, such as shell/data boundary -> feature surface A
-> feature surface B -> Agent/interaction -> QA/safety.

Parallel lane rules:

- A lane may edit only its assigned worktree and write scope.
- A lane must not modify the product root, sibling worktrees, shared Git state,
or another lane's report.
- A lane must not stage, commit, push, reset, stash, or run destructive cleanup
unless that operation is explicitly authorized in the handoff.
- If a lane accidentally edits outside its worktree/scope, it must stop,
compare evidence, recover only clearly attributable out-of-scope edits, and
record `Boundary Incident / Recovery` in its report. If attribution is
uncertain, stop and escalate to manager instead of reverting.

Integrator rules:

- The integrator is the only lane that merges/selects patches from parallel
lanes.
- The integrator must not invent completed lane work when lane reports or diffs
are absent; it records blockers or partial integration honestly.
- The consolidated execution report must distinguish lane-local verification
from integrator verification, and must include git/worktree state for the
integration worktree plus any relevant boundary incidents.
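
A manager/dispatcher could check a parallel-lane contract before dispatch along these lines. This is a sketch, not the skill's validator: the `LaneContract` type, its field names, and the problem-message wording are assumptions that mirror the contract requirements listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneContract:
    # Hypothetical record; fields mirror the contract requirements above.
    role: str
    worktree: str
    branch: str
    write_scope: tuple
    forbidden_scope: tuple
    report_path: str


def contract_problems(lanes):
    """List missing fields and any worktree or branch claimed by two lanes."""
    problems = []
    owners = {"worktree": {}, "branch": {}}
    required = ("role", "worktree", "branch", "write_scope",
                "forbidden_scope", "report_path")
    for lane in lanes:
        for name in required:
            if not getattr(lane, name):
                problems.append(f"{lane.role or '<unnamed>'}: missing {name}")
        for key in ("worktree", "branch"):
            value = getattr(lane, key)
            if not value:
                continue
            # The first lane to claim a worktree/branch is recorded as owner.
            first = owners[key].setdefault(value, lane)
            if first is not lane:
                problems.append(
                    f"{lane.role}: shares {key} {value!r} with {first.role}"
                )
    return problems
```

An empty result means every lane names its own worktree, branch, scopes, and report path; it does not prove that the write scopes themselves are disjoint, which still needs human or integrator judgment.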

## Phase Sizing And Review Cadence

For substantial long-running work, spend more effort in planning and reduce review churn during execution.
@@ -376,7 +468,7 @@ Planning should define:

If planning cannot state the strategic target, write `strategy-gap: <missing decision>` and stop for a user checkpoint. Do not compensate by writing a longer execution task list.

Execution should group work into larger slices when the pieces belong to the same user workflow. For example, a content production workflow might group research collection, material preparation, draft creation, editing, and QA into one or two coherent execution slices instead of five helper-sized review loops.
Execution should group work into larger slices when the pieces belong to the same user workflow. Example for a biology learning workbench: daily guide/status command center, acquisition refresh facade, queue lifecycle commands, weekly project refresh, and PPT readiness command/report may be grouped into one or two coherent execution slices instead of five helper-sized review loops.

An execution slice is too small for a full review loop when it only changes one helper, one internal function, or one local cleanup without landing a visible workflow state, product surface, durable contract, migration boundary, or risk boundary.

@@ -460,6 +552,43 @@ Arbitration lane should:
- stay reusable for the active loop unless the full milestone closes, a replacement
lane is confirmed, the lane is corrupted/stale, or the user explicitly asks.

### Parallel Arbitration / Repair Worktrees

Large repair phases may use isolated worktree repair lanes, but adjudication
authority stays centralized. Use this when accepted findings span separable
surfaces, many files, or conflicting patches that would overload one arbitration
thread.

Default shape:

```text
Chief Arbitration lane
owns findings disposition, repair contract, integration, final report

Repair Worktree lanes A/B/C/...
each owns one accepted finding group or product surface

Optional QA/Safety lane
checks integrated repairs after the chief arbitration lane has a preview
```

Rules:

- Only the chief arbitration lane decides finding disposition:
`accept`, `reject`, `defer`, `third path`, or `needs more evidence`.
- Repair lanes may implement only already-dispositioned, scoped repairs. They do
not independently reinterpret review findings or expand product direction.
- Each repair lane needs its own worktree/branch, write scope, expected repair
report, focused verification, known conflict list, and stop condition.
- The chief arbitration lane alone integrates repair outputs, resolves
conflicts, runs required verification, and writes arbitration/final artifacts.
- If repair lanes reveal a plan gap, scope change, or uncertain evidence, they
stop and report back to chief arbitration. The chief lane decides whether to
gather evidence, return to planning, or defer.
- Boundary incidents follow the same rule as execution worktrees: recover only
clearly attributable out-of-scope edits and record the incident in the repair
report, ledger, and final arbitration summary.
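
The disposition gate can be sketched as follows. The five disposition values are taken from the rules above; the `Finding` type and the choice to dispatch repairs only for `accept` with a non-empty write scope are assumptions of this sketch, since the text does not say whether a `third path` disposition can also be routed to a repair lane.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    DEFER = "defer"
    THIRD_PATH = "third path"
    NEEDS_MORE_EVIDENCE = "needs more evidence"


@dataclass
class Finding:
    # Hypothetical record for one review finding.
    finding_id: str
    disposition: Optional[Disposition] = None  # set only by chief arbitration
    write_scope: tuple = ()


def may_dispatch_repair(finding: Finding) -> bool:
    """Assumption: only accepted, scoped findings go to a repair lane."""
    return finding.disposition is Disposition.ACCEPT and bool(finding.write_scope)
```

Keeping `disposition` unset by default makes an undecided finding fail the gate, so a repair lane cannot start from a raw review comment.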

## Repair Routing

After review finds a problem, route it by defect type:
@@ -498,6 +627,39 @@ Manager should:

Manager must not silently execute business-code changes, overrule arbitration without evidence, or centralize all lane communication as hidden chat.

### Manager / Planning Lane Rotation

Long-lived manager and planning lanes can become coordination debt when chat
history accumulates old heartbeats, closed batches, stale lane ids, and
superseded routes. Rotate them deliberately before context corruption affects
dispatch.

Rotate a manager or planning lane when any of these are true:

- repeated context compression or interrupted turns make current state
unreliable;
- old route instructions, obsolete heartbeats, or stale lane ids keep resurfacing;
- the lane has accumulated multiple completed phases and the next phase needs a
clean dispatch context;
- the user reports the manager/planner is stuck, slow, confused, or reviving old
work;
- the lane is near context limits and exact dirty state, active lanes, or
blockers matter.

Rotation contract:

- Write a baton/context pack before handoff. It must include loop id, roots,
canonical artifacts, active lanes, retired lanes, current phase status,
pending expected artifacts, safety boundaries, stale monitors to delete/update,
dirty git/worktree state, and next routing decision.
- Record the rotation in state-feedback, worklog, and thread-ledger.
- Mark the old manager/planning lane retired, archived, or checkpointed with
`next_expected_use: none` unless a specific future use is named.
- The new lane must start artifact-first from the baton/context pack and current
ledgers, not from inherited chat memory.
- Do not let both old and new manager lanes actively dispatch the same loop.
Overlapping managers are allowed only during explicit handoff validation.
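
One way to enforce the baton/context pack requirement is a presence check before the old lane is retired. The key names below are hypothetical snake_case renderings of the items listed in the rotation contract, and the dict-shaped baton is an assumption.

```python
# Hypothetical key names derived from the rotation contract's required items.
BATON_REQUIRED_KEYS = (
    "loop_id",
    "roots",
    "canonical_artifacts",
    "active_lanes",
    "retired_lanes",
    "current_phase_status",
    "pending_expected_artifacts",
    "safety_boundaries",
    "stale_monitors",
    "dirty_git_worktree_state",
    "next_routing_decision",
)


def missing_baton_keys(baton: dict) -> list[str]:
    """Return required keys that are absent from the baton.

    Presence is checked rather than truthiness, because an empty list
    (for example, no retired lanes yet) is still a valid recorded state.
    """
    return [key for key in BATON_REQUIRED_KEYS if key not in baton]
```

A new manager or planning lane would start only once this list is empty, reading the baton and current ledgers rather than inherited chat memory.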

## Dispatcher Role

Dispatcher is a physical delivery role.