Local-first scaffolding for a company-style subagent orchestration system. The repo implements the initial runtime contracts, prompt-layer assembly, phase gating, bundle review, collaboration requests, change re-entry, and a smoke-test workflow.
- A Python CLI for creating runs, decomposing goals, suggesting teams, generating role files, rendering prompts, executing tasks through Codex CLI, running objective/phase manager flows, reviewing bundles, creating phase reports, and managing change requests.
- A Codex CLI executor adapter that renders task prompts, runs
codex exec --json, validates structured output, writes completion reports, and creates collaboration requests when an agent blocks on another team. - On-disk JSON contracts for phase plans, objective maps, team registries, task assignments, completion reports, review bundles, planned collaboration handoffs, collaboration requests, phase reports, change requests, and change proposals.
- Markdown role assets for base roles, capability overlays, phase overlays, and reusable templates.
- A smoke-test scaffold that creates two isolated objectives and verifies prompt inheritance, context echo reporting, and acceptance review.
- A test suite covering phase locks, objective isolation, bundle rejection on blocking collaboration, phase advancement gating, and change re-entry.
Initialize a run from a goal file:
company-orchestrator init-run run-001 path/to/goal.md
company-orchestrator decompose-goal run-001
company-orchestrator suggest-teams run-001
company-orchestrator generate-roles run-001 --approve
company-orchestrator plan-phase run-001 --sandbox read-only --max-concurrency 3
company-orchestrator run-phase run-001 --sandbox read-only --max-concurrency 3Or use the higher-level bootstrap path to initialize the run, generate roles, and immediately plan the first phase:
company-orchestrator bootstrap-run run-001 path/to/goal.md --sandbox read-only --max-concurrency 3 --watchOr let the orchestrator keep going autonomously with policy guardrails:
company-orchestrator bootstrap-run run-001 path/to/goal.md --autonomous --approval-scope planning-only --stop-before-phase mvp-build --watch
company-orchestrator run-autonomous run-001 --approval-scope all --stop-before-phase polish --stop-on-recovery --watchStart from the fill-in template at orchestrator/templates/goal-template.md. The example todo-app goal lives at goal-draft.md.
- Keep the
## Objectivesheading exactly as written. - Put one objective per bullet.
- Use concrete wording in those bullets so capability inference has something useful to work with.
- Add real detail in the later sections so discovery workers can identify boundaries, unknowns, risks, and dependencies without inventing facts.
Scaffold and verify the minimal communication test:
company-orchestrator scaffold-smoke-test --run-id smoke-demo
company-orchestrator run-phase smoke-demo --sandbox read-only
company-orchestrator approve-phase smoke-demo discovery
company-orchestrator advance-phase smoke-demoCreate and analyze a change request:
company-orchestrator create-change smoke-demo chg-001 "Need interface updates" --interface-changed
company-orchestrator analyze-change smoke-demo chg-001
company-orchestrator approve-change smoke-demo chg-001
company-orchestrator scaffold-delta smoke-demo chg-001Prompts are assembled in this order:
orchestrator/roles/base/company.mdorchestrator/roles/base/<manager|worker|acceptance-manager>.mdorchestrator/roles/capabilities/<capability>.mdwhen presentorchestrator/roles/objectives/<objective-id>/approved/<role>.mdorcharter.md- A prompt-kind-specific phase overlay:
orchestrator/phase-overlays/<current-phase>/objective-planning.mdorchestrator/phase-overlays/<current-phase>/capability-planning.mdorchestrator/phase-overlays/<current-phase>/task-execution.md
- The rendered planning or task JSON payload
Every render writes a prompt log under runs/<run-id>/prompt-logs/.
Objective-specific roles can live either in the generic tree above or under an app-local tree such as apps/<app>/orchestrator/roles/objectives/<objective-id>/....
execute-task uses the local codex exec binary in non-interactive JSON mode.
- It removes
CODEX_API_KEYandOPENAI_API_KEYfrom the subprocess environment so execution stays on the local ChatGPT-login CLI path. - It passes the schema at
orchestrator/schemas/executor-response.v1.jsonto Codex and converts the final structured response intocompletion-report.v1. - If the response contains a
collaboration_request, the adapter writes a newcollaboration-request.v1file and links it from the completion report. - Raw stdout and stderr from each Codex execution are logged under
runs/<run-id>/executions/.
run-objective and run-phase provide deterministic manager orchestration on top of the live executor.
run-objectiveschedules all active-phase tasks for one objective, executes ready tasks, assembles the objective bundle, and runs acceptance review.run-phasedoes the same across every objective in the active phase, then writes the end-of-phase report automatically.- Task dependencies declared in
depends_onare respected before execution. - Blocking collaboration handoffs are now treated as scheduler gates. Downstream tasks wait until the producing task/report satisfies the handoff deliverables.
--max-concurrencycontrols how many safe tasks the controller may run at the same time. The default is3.- Tasks that are not safe to parallelize fall back to serialized execution with a warning instead of failing the run.
- Code-writing tasks use run-scoped git worktree isolation. Accepted work lands on a dedicated run integration branch
codex/run-<run-id>, not directly on your current branch. - Manager summaries are written under
runs/<run-id>/manager-runs/.
The orchestrator now writes live monitoring state under runs/<run-id>/live/ while planning and execution are in progress.
run-state.jsontracks the active phase plus aggregate activity counts.activities/<activity-id>.jsontracks the status, stage, progress, prompt path, and artifact paths for each planning or execution activity.events.jsonlcaptures normalized lifecycle events from the scheduler, planner, executor, bundle review, and phase reporting.llm-calls.jsonlrecords one structured observability entry per Codex call attempt, including latency, prompt size, queue wait, retries, timeout flags, and token usage.observability.jsonstores the aggregated run-level observability summary used by the dashboard and phase reports.autonomy-history.jsonlrecords one operator-readable audit entry per autonomous decision, including the applied policy, guidance snapshot, and any adaptive tuning decision.
Use the terminal monitor to inspect a run while it is active:
company-orchestrator watch-run run-001
company-orchestrator inspect-activity run-001 APP-A-DISC-001
company-orchestrator inspect-activity run-001 plan:discovery:app-a --followFor convenience, the main execution commands can run with the dashboard attached in the same terminal and then print their normal JSON result when they finish:
company-orchestrator execute-task run-001 APP-A-DISC-001 --watch
company-orchestrator plan-phase run-001 --sandbox read-only --watch
company-orchestrator run-phase run-001 --sandbox read-only --watchwatch-run shows the run header, a Next Action panel, summary counts, objective progress, active planning activities, active task activities, queued work, blocked work, collaboration handoffs, parallelism warnings, and a phase-level progress bar.
The Autonomy panel now also shows:
- approval scope
- stop-before phases
- whether the run stops on recovery
- whether adaptive tuning is enabled
- the last tuning decision, when one was applied
- the autonomy audit-log path
It also includes an LLM Observability panel with:
- total/completed/failed/timed out calls
- input, cached-input, and output token totals
- total prompt chars and lines
- average/max latency
- average queue wait
- retry counts
- active process count
- call counts by activity kind
The Next Action panel and command JSON output both surface:
run_statusrun_status_reasonnext_action_commandnext_action_reasonreview_doc_path
This is intended to tell the human exactly what to do next when a run is:
- actively working
- recoverable after interruption or stale planning
- ready for review
- ready to advance
Operator commands now print a compact human summary by default. Pass --json to any CLI command when you need the full machine-readable payload instead.
inspect-activity shows the activity metadata, full rendered prompt, latest live events, parallel fallback warnings, and the paths to stdout, stderr, workspace, branch, and the final output artifact.
Completed activity history is persisted under:
runs/<run-id>/live/activity-history.jsonl
When an objective belongs to an app-local orchestrator tree under apps/<app>/orchestrator/roles/objectives/..., the same history entries are also mirrored to:
apps/<app>/orchestrator/activity-logs/<run-id>.jsonl
Recovery-aware monitoring is also built in:
- activities can be marked
interrupted,recovering,recovered, orabandoned - interrupted and recovered work appears in dedicated dashboard sections
- activity detail views show attempt count, status reason, and recovery action
- normal long-running commands reconcile stale state before starting new work
Operator recovery commands:
company-orchestrator reconcile-run run-001
company-orchestrator reconcile-run run-001 --apply
company-orchestrator resume-phase run-001 --sandbox read-only --max-concurrency 3
company-orchestrator retry-activity run-001 APP-A-DISC-001 --sandbox read-onlyreconcile-run is a dry run by default and reports what would be reclassified. resume-phase applies reconciliation and continues safe incomplete work for the active phase. retry-activity starts a new attempt for a specific interrupted activity while preserving attempt lineage in the live dashboard and event history.
run-autonomous uses the same planning/execution/recovery engine as the manual CLI, but it can move a run forward without manual approval steps when policy allows it.
Autonomy policy controls:
--approval-scope {all,planning-only,none}--stop-before-phase <phase>(repeatable)--stop-on-recovery--no-adaptive-tuning--max-iterations N
Examples:
company-orchestrator run-autonomous run-001 --approval-scope planning-only --stop-before-phase mvp-build
company-orchestrator run-autonomous run-001 --approval-scope all --stop-before-phase polish --stop-on-recoveryAutonomous mode writes persistent state to:
runs/<run-id>/autonomy.jsonruns/<run-id>/live/autonomy-history.jsonl
The audit history is intended to answer:
- what action autonomy took
- why it took that action
- which policy was active
- what the run guidance said at the time
- whether adaptive tuning changed concurrency
plan-objective and plan-phase now run a two-stage planning flow through Codex:
- The objective manager returns an
objective-outline.v1describing capability lanes and cross-lane coordination. - One capability manager per lane then returns
capability-plan.v1task bundles for that lane. - Python aggregates those capability plans into the final
objective-plan.v1, validates it, writes it underruns/<run-id>/manager-plans/, and materializes the generatedtask-assignment.v1files. - Capability managers are expected to emit concrete
owned_paths,shared_asset_ids, andcollaboration_handoffs, not just lane-local task lists. - Planned cross-lane handoffs are materialized under
runs/<run-id>/collaboration-plans/ascollaboration-handoff.v1artifacts, consumed by the scheduler as real readiness gates, and summarized in phase reports. plan-objective --max-concurrency Ncan run multiple capability-manager planners at once for the same objective.plan-phase --max-concurrency Ncan run multiple objectives at once while sharing a bounded pool of planning slots across nested objective and capability managers.run-phasecontinues to honor the resultingbundle_planduring deterministic acceptance review.- Planning prompts are intended to be self-contained. Objective and capability managers should use the injected runtime context and planning inputs directly rather than exploring the repository.
- Planning prompts now use a compact goal/context view for manager reasoning, while worker input resolution still retains the full run payload.
- Planning prompt compaction is observability-driven. When recent planning calls in the same phase are large, slow, or timed out, later prompts automatically switch from
standardtocompactoraggressivepayload shaping. - Autonomous mode uses the same observability to tune runtime behavior. If recent planning or execution calls in the active phase timed out or became slow, autonomy can reduce its effective concurrency before launching the next action.
- Use
--replaceif you want a new manager plan to overwrite the current objective's tasks for the active phase.
python3 -m unittest discover -s tests -vRun the integrated todo MVP locally with one command:
npm run todo-runtime:startThe runtime starts the backend first, then a small frontend host that serves the React todo page and points it at the backend API without source edits. The wiring externalizes the live backend URL into the frontend runtime and applies the backend allow-origin setting for that frontend origin automatically.
Optional runtime environment variables:
TODO_BACKEND_HOSTdefaults to127.0.0.1TODO_BACKEND_PORTdefaults to3000TODO_FRONTEND_HOSTdefaults to127.0.0.1TODO_FRONTEND_PORTdefaults to4173TODO_BACKEND_DB_PATHdefaults toapps/todo/backend/data/todos.sqlite
The command prints one JSON line with the backend URL, frontend URL, and resolved database path, then keeps both servers running until interrupted.
Runtime-specific validation commands:
npm run validate:todo-e2e-smoke
npm run validate:todo-release-readiness
npm run validate:todo-review-evidence
npm run validate:todo-runtime-connectivity
npm run validate:todo-runtime-startupReview bundle notes for the integrated todo MVP live in mvp-integration-review-evidence.md.
Polish-phase release handoff notes live in:
Generic orchestration assets stay at the repo root:
company_orchestrator/orchestrator/runs/tests/
App-specific assets for the todo example now live under:
apps/todo/backend/apps/todo/frontend/apps/todo/runtime/apps/todo/scripts/apps/todo/docs/apps/todo/orchestrator/apps/todo/goal-draft.md
This repo now includes a live Codex CLI executor path, Codex-powered objective-manager and capability-manager planning for task decomposition, parallel-safe task execution with worktree isolation, live monitoring, and deterministic manager orchestration for task scheduling, bundle assembly, acceptance review, and phase-report generation. Acceptance remains deterministic Python logic; it is not yet a live Codex reviewer.