autopilot-headless-compute provides three headless runtime surfaces for the current NIP-90 + Spark flow:
- `relay`: a tiny local websocket relay for deterministic buyer/provider smoke runs
- `provider`: headless provider mode with a separate Nostr/Spark identity path
- `buyer`: a headless Buy Mode loop using the current wallet by default
The same repo now also has an app-owned desktop control plane for the running GUI:
- apps/autopilot-deprecated/src/desktop_control.rs
- apps/autopilot-deprecated/src/bin/autopilotctl.rs
- apps/autopilot-deprecated/src/compute_mcp.rs
- apps/autopilot-deprecated/src/bin/autopilot_compute_mcp.rs
That control plane is intentionally UI-synced. autopilotctl drives the running
desktop app through the same split-shell truth model the user sees on screen:
hotbar shell, Provider Control, Buy Mode, Wallet, Log Stream, and the mirrored
runtime status surfaces all come from the same app-owned snapshot instead of a
separate headless-only state machine.
The next Tauri shell has a separate, narrower control plane for the Pylon and
proof surfaces now being built in apps/autopilot. Use
docs/codex/AUTOPILOT_TAURI_CONTROL.md and
scripts/autopilot/tauri-control-smoke.sh for that app. Do not use the legacy
WGPUI autopilotctl docs as proof that the Tauri shell was exercised.
autopilotctl is the thin CLI client for the running desktop-control runtime.
It can:
- fetch the current desktop-control snapshot
- stream desktop-control event batches
- inspect advanced provider inventory truth for local, clustered, and sandbox compute surfaces, including projection source, open quantities, and section blockers
- inspect buyer procurement truth for spot and forward RFQs, quote selection, accepted orders, and the topology/proof/environment posture of quoted compute
- inspect cluster, sandbox, training, proof, and challenge status through the same app-owned snapshot the desktop uses, including delivery acceptance, validator outcomes, accepted training outcomes, checkpoint lineage, and settlement history when kernel authority is configured
- inspect and mutate the bounded `gemma4:e4b` finetuning surface through authenticated desktop control, including project binding, explicit split identity, tokenizer/template compatibility receipts, async jobs, checkpoint and artifact truth, and bounded promotion state
- list, open, focus, close, and inspect panes in the running desktop shell
- inspect the bounded contributor-beta control surface and its governed submission state through the same pane snapshot contract as the GUI
- emit structured desktop perf snapshots from the app-owned Frame Debugger surface through `autopilotctl perf`
- inspect the active local-runtime truth model (`local_runtime`) and raw GPT-OSS runtime state
- inspect the pooled-inference mesh surface the desktop currently sees, including local serving versus proxy posture, routed model inventory, warm replicas, and mesh membership
- refresh the active local runtime and wallet state
- inspect Apple FM adapter inventory and current session attachment truth from the same desktop snapshot the workbench uses
- load, unload, attach, and detach Apple FM adapters through desktop control
- refresh, warm, unload, and wait on GPT-OSS directly
- bring the provider online or offline
- inspect active-job and buy-mode state
- create, start, pause, reset, retarget, and inspect desktop-owned AttnRes lab runs without opening the pane UI
- create, upload, start, wait on, and inspect desktop-owned sandbox jobs
- download sandbox workspace files and declared sandbox artifacts through the control plane
- inspect the current Tailnet roster (`tailnet`) sourced from `tailscale status --json` through the same desktop-control snapshot the provider-status pane now uses
- inspect the current tunnel-status surface (`tunnels`) that will eventually reflect Psionic-backed service exposure when the desktop app starts owning those flows directly
- select the managed NIP-28 main channel
- list NIP-28 groups, channels, and recent messages
- send or retry NIP-28 chat messages
- inspect and operate the current internal Forge shared-session lane through `autopilotctl forge`, including hosted session discovery, hosted attach, and controller handoff commands for the Probe-backed coding shell, with no-window Forge-host autostart when no live desktop-control target is reachable
- start and stop Buy Mode against the same in-app state used by the GUI
- pull a daily or explicit-window NIP-90 sent-payments report backed by the app-owned buyer payment-attempt ledger instead of raw relay rows
autopilot-compute-mcp is the model-facing companion surface for the same
desktop-control contract. It speaks MCP over stdio and intentionally sits on
top of the running app's manifest, auth token, and action schema instead of
creating a second hidden compute RPC path.
The current MCP tool surface exposes:
- full compute snapshots and inventory summaries
- provider online/offline requests
- cluster status and topology inspection
- sandbox create/get/upload/start/wait/download operations
- proof and challenge inspection
It does not bypass desktop-control policy or kernel authority. If the desktop
would reject an operation through autopilotctl, the MCP layer returns the same
failure through the corresponding tool call.
The desktop control runtime writes and exposes:
- the `desktop-control.json` manifest
- the `latest.jsonl` session-log alias
- per-session JSONL logs
Those files are the source of truth for programmatic verification because they prove the UI, the control plane, and the runtime logs stayed in sync.
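Scripted verification can treat those JSONL rows as append-only typed events. A minimal sketch with a stand-in file and placeholder event fields; the real alias location and row schema come from the desktop-control manifest and session logs, not from this doc:

```shell
# Stand-in session log; real rows live under the manifest-advertised
# session-log alias and per-session JSONL files. Field names are placeholders.
log=/tmp/session-sample.jsonl
printf '%s\n' \
  '{"ts":1,"kind":"pane_open","pane":"wallet"}' \
  '{"ts":2,"kind":"provider_online"}' > "$log"
# Count how often a specific runtime claim appeared in the log.
grep -c '"kind":"provider_online"' "$log"
```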
For NIP-90 payment history, those JSONL files are audit and backfill inputs,
not the primary product read model. The desktop imports recoverable payment
facts from session logs into the app-owned
~/.openagents/autopilot-nip90-payment-facts-v1.json ledger with degraded
log-backfill provenance, and panes query that ledger instead of reparsing raw
logs on demand. During live desktop operation, that background import
intentionally defers hot session logs and caps the imported byte budget so UI
redraws do not block on large or actively growing JSONL files.
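A read-only peek at that ledger is just a JSON pretty-print. The payload shape below is a placeholder, since this doc does not pin the ledger schema; only the file location and the `log-backfill` provenance label come from the text above:

```shell
# Placeholder ledger shape; on a live machine the real file is
# ~/.openagents/autopilot-nip90-payment-facts-v1.json.
ledger=/tmp/payment-facts-sample.json
printf '{"version":1,"facts":[{"sats":21,"provenance":"log-backfill"}]}\n' > "$ledger"
python3 -m json.tool "$ledger"
```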
Useful autopilotctl starting points:
autopilotctl status
autopilotctl local-runtime status
autopilotctl local-runtime refresh
autopilotctl apple-fm status
autopilotctl apple-fm list
autopilotctl gpt-oss status
autopilotctl gpt-oss warm --wait
autopilotctl pooled-inference status
autopilotctl pooled-inference topology
autopilotctl pooled-inference export-join-bundle /tmp/mesh-home.join.json --mesh-root /tmp/mesh-home
autopilotctl pooled-inference import-join-bundle /tmp/mesh-home.join.json --mesh-root /tmp/mesh-joiner
autopilotctl wait gpt-oss-ready
autopilotctl attnres status
autopilotctl attnres start
autopilotctl attnres view inference
autopilotctl attnres view loss
autopilotctl attnres sublayer set 4
autopilotctl attnres speed set 5
autopilotctl wait attnres-completed
autopilotctl provider online
autopilotctl chat status
autopilotctl chat messages --tail 20
autopilotctl forge status
autopilotctl forge hosted sessions
autopilotctl forge hosted attach-shared forge-session-1
autopilotctl forge handoff request "taking over triage"
autopilotctl forge handoff accept "accepted from desktop-b"
autopilotctl buy-mode status
autopilotctl nip90-payments daily --date 2026-03-14
autopilotctl nip90-payments window --start 2026-03-14T05:00:00+00:00 --end 2026-03-15T05:00:00+00:00 --json
autopilotctl tailnet status
autopilotctl tunnels status
autopilotctl cluster status
autopilotctl sandbox status
autopilotctl training status
autopilotctl gemma-finetune status
autopilotctl gemma-finetune tenant create design-partner --display-name "Design Partner"
autopilotctl gemma-finetune tenant status --api-key <tenant-api-key>
autopilotctl gemma-finetune project create "Support agent" --tenant-id design-partner --api-key <tenant-api-key> --base-served-artifact-digest sha256:gemma4-e4b-base --hidden-size <hidden-state-width>
autopilotctl gemma-finetune dataset register support-agent-1760000000000 dataset://openagents/support-agent@2026.04 --api-key <tenant-api-key> /tmp/train.jsonl /tmp/validation.jsonl /tmp/holdout.jsonl --baseline-short-path /tmp/baseline-short.jsonl --final-report-path /tmp/final-report.jsonl --chat-template-digest sha256:gemma4-e4b-template --benchmark-ref benchmark://psionic/gemma4/e4b/finetune_eval
autopilotctl gemma-finetune job create support-agent-1760000000000 --api-key <tenant-api-key> --dataset-id support-agent-2026-04-1760000000500
autopilotctl gemma-finetune job get support-agent-1760000000000-1760000000600 --api-key <tenant-api-key>
autopilotctl gemma-finetune job cancel support-agent-1760000000000-1760000000600 --api-key <tenant-api-key>
autopilotctl gemma-finetune promote support-agent-1760000000000-1760000000600 --api-key <tenant-api-key> --checkpoint-id support-agent-r1760000000600-final --reviewer-id operator-1 --review-state approved
autopilotctl remote-training status
autopilotctl remote-training list
autopilotctl remote-training run parameter-golf-runpod-single-h100-live-sample
autopilotctl remote-training stale
autopilotctl remote-training refresh
autopilotctl training launch /tmp/train.jsonl /tmp/held-out.jsonl weather-helper --author "OpenAgents" --description "Repo-native Apple adapter operator run" --license Apache-2.0
autopilotctl training export weather-helper-1760000000000 /tmp/weather-helper.fmadapter
autopilotctl training accept weather-helper-1760000000000
autopilotctl proof status
autopilotctl challenge status
autopilotctl logs --tail 50
autopilotctl perf
autopilotctl perf --json
autopilotctl pane list
autopilotctl pane status provider_control
autopilotctl pane status frame_debugger --json
autopilotctl pane open contributor_beta
autopilotctl pane status contributor_beta --json
autopilotctl pane close provider_control
autopilotctl pane open provider_control

autopilotctl forge is the first honest programmatic control surface for the
current internal Forge MVP. It does not try to expose every future Forge object
yet. It covers the shared coding-session seam that already exists in the app.
For a narrow user-facing and agent-facing reference focused only on that Forge
CLI, use docs/codex/AUTOPILOTCTL_FORGE_CLI.md.
The repo now also ships a no-window Forge host:
cargo run -p autopilot-desktop --bin autopilot_headless_forge -- \
  --manifest-path /tmp/openagents-forge-desktop-control.json

autopilotctl forge ... will autostart that host shape automatically when:
- you did not pass an explicit `--base-url` and `--auth-token`
- the resolved manifest is missing or stale
- no desktop-control target is reachable at the recorded endpoint
If you want the hidden host isolated from a normal desktop session, pass an
explicit --manifest path to autopilotctl or start
autopilot_headless_forge manually with --manifest-path.
The repo-owned smoke script for that path is:
scripts/autopilot/headless-forge-smoke.sh

- `autopilotctl forge status [--thread-id <probe-session-id>]` prints the current shared-session/controller state for the active or named Probe thread
- `autopilotctl forge hosted sessions` lists visible hosted Forge sessions from the shared shell state
- `autopilotctl forge hosted attach-shared <shared-session-id>` and `autopilotctl forge hosted attach-probe <probe-session-id>` bind the current desktop to a hosted shared session and load its Probe session through the same attach flow the UI uses
- `autopilotctl forge handoff status [--thread-id <probe-session-id>]`
- `autopilotctl forge handoff request <summary> [--thread-id <probe-session-id>]`
- `autopilotctl forge handoff accept <summary> [--thread-id <probe-session-id>]`
- `autopilotctl forge handoff take <summary> [--thread-id <probe-session-id>]`
- `autopilotctl forge handoff note <summary> [--thread-id <probe-session-id>]`
- `autopilotctl forge handoff human <summary> [--thread-id <probe-session-id>]`
- `autopilotctl forge handoff agent <summary> [--thread-id <probe-session-id>]`

All of the handoff commands operate on the same app-owned participant roster and controller-lease truth the desktop shell already projects.
The AttnRes command group drives the same persisted Psionic-backed lab state the
desktop pane uses. autopilotctl attnres status hydrates the current lab state
from the app-owned controller, and the mutating commands update that same
desktop-owned state machine rather than creating a separate headless AttnRes
runtime.
autopilotctl perf is a convenience wrapper over the frame_debugger pane
snapshot. It emits the rolling redraw cadence plus grouped and recent timing
data for:
- per-pane paint CPU
- per-runtime background pump work
- desktop-control, provider-admin, and Codex-remote snapshot/signature/sync phases
That output is intended for harness automation, so prefer --json when you are
collecting or diffing perf runs.
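One way to diff two collected perf runs, assuming only that `autopilotctl perf --json` emits a JSON object. The sample payloads and field names below are stand-ins, not the real snapshot schema:

```shell
# On a live desktop, capture real snapshots instead of these stand-ins:
#   autopilotctl perf --json > /tmp/perf-a.json
#   autopilotctl perf --json > /tmp/perf-b.json
printf '{"redraw_hz":60,"panes":{"wallet_ms":1.2}}\n' > /tmp/perf-a.json
printf '{"redraw_hz":58,"panes":{"wallet_ms":1.4}}\n' > /tmp/perf-b.json
# Key-sorted pretty-printing keeps the diff stable regardless of key order.
python3 -m json.tool --sort-keys /tmp/perf-a.json > /tmp/perf-a.pretty
python3 -m json.tool --sort-keys /tmp/perf-b.json > /tmp/perf-b.pretty
diff /tmp/perf-a.pretty /tmp/perf-b.pretty || true
```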
autopilotctl pane status contributor_beta --json is the operator- and
contributor-facing read path for the bounded external beta. Without creating a
separate beta-only control protocol, it exposes the current contributor
identity posture, contract and worker-role posture, trust tier, accepted,
rejected, and in-review counts, confirmed and provisional credit state,
review-hold posture, review-queue depth and SLA posture, the current manual
review owner, the Tailnet-first M5 plus NVIDIA pilot roster, the known
external-operator roster, the retained governed-run, XTRAIN, and operational
report digests, recent submission state machines, credit dispositions, and the
latest runtime disagreement receipt lineage.
The contributor-beta pane snapshot now also carries the narrow lineage chain contributors actually need:
- source receipt id
- staging state
- quarantine state
- replay status
- training impact
- held-out impact
- credit disposition and credit reason
That is the bounded answer to "what happened to my submission?" without asking contributors to read repo code or raw ledgers.
autopilotctl status now prints the same app-owned inventory projection summary
the Provider Control pane uses, including the projection source, kernel snapshot
ID when present, per-section product and open-quantity counts, and any current
blocker reason for the local, cluster, or sandbox inventory sections.
It also prints the app-owned buyer procurement summary for compute RFQs and quotes, including the active quote mode, selected quote IDs, and the quoted backend, topology, proof posture, environment ref, and sandbox profile where those fields are present.
autopilotctl pooled-inference status is the first dedicated operator view for
the Psionic mesh-backed pooled-inference lane. Set
OPENAGENTS_PSIONIC_MESH_MANAGEMENT_BASE_URL to the Psionic management host,
then use the command to inspect the local machine's current pooled-inference
posture: whether this node is serving locally, standing by, or proxying into
the pool, which models are currently warm and targetable, how many warm
replicas exist across the visible mesh, and which reasons are currently driving
the local contribution posture. autopilotctl pooled-inference topology
extends that same view with the visible member list and per-node warm-model
inventory.
That same pooled-inference status now feeds the desktop provider inventory and kernel-market launch product identity. When the mesh is configured, the cluster inventory section no longer shows a placeholder. Instead it projects four distinct pooled inference products:
- `psionic.cluster.inference.pooled.remote_whole_request`
- `psionic.cluster.inference.pooled.replicated`
- `psionic.cluster.inference.pooled.dense_split`
- `psionic.cluster.inference.pooled.sparse_expert`
Those product IDs are published as generic pooled-inference products rather
than gpt_oss-only products. Each one carries explicit clustered-execution
truth, topology, provisioning, and live mesh state. The desktop and
autopilotctl pooled-inference status surfaces now show the models that are
actually targetable through the pool, the execution modes each model supports,
the topology used for that mode, and the participating machines currently
behind it.
The four pooled products map to plain operator-visible behaviors:
- `psionic.cluster.inference.pooled.remote_whole_request`: one machine in the pool handles the full request, even if the request entered through a different machine.
- `psionic.cluster.inference.pooled.replicated`: the pool keeps multiple warm copies of the same model ready so requests can fail over or be promoted into serving faster.
- `psionic.cluster.inference.pooled.dense_split`: one dense model request is split across multiple machines instead of requiring a single machine to hold the whole serving workload.
- `psionic.cluster.inference.pooled.sparse_expert`: an expert-sharded model runs across multiple machines, with different machines responsible for different expert shards.
The inventory and active-job surfaces also spell out the current revenue rule for each product:
- `psionic.local.inference.*.single_node` earns when a local delivery is accepted and wallet settlement is confirmed.
- `psionic.cluster.inference.pooled.remote_whole_request` earns when the pool serves the request and the clustered delivery proof is accepted.
- `psionic.cluster.inference.pooled.replicated` publishes standby capacity, but warm replicas alone do not earn; revenue starts only when a reserve window sells or a replica is promoted into accepted delivery.
- `psionic.cluster.inference.pooled.dense_split` earns when a single request is actually split across multiple machines and the clustered delivery proof is accepted.
- `psionic.cluster.inference.pooled.sparse_expert` earns when an expert-sharded request is executed across the pool and the clustered delivery proof is accepted.
When an active job or a persisted history row already carries pooled product
identity, the desktop now preserves the same compute_product_id,
market_receipt_class, and earnings summary through the live status view,
kernel receipt tags, and authoritative history rehydrate. autopilotctl status
and autopilotctl pooled-inference status therefore tell the operator whether
the current contribution is direct local serving, whole-request pooled serving,
standby replica capacity, split-across-machines serving, or expert-sharded
pooled serving.
The same command group now also owns the multi-machine join and invite path above Psionic mesh-lane truth:
- `autopilotctl pooled-inference export-join-bundle <output-path>`
- `autopilotctl pooled-inference import-join-bundle <input-path>`
Those commands shell out to psionic-mesh-lane, read or update the file-backed
lane root, and then project the resulting join state back through the normal
desktop-control snapshot. The desktop and CLI keep Psionic as the source of
truth for:
- the owned pooled-inference service binary and management surface; this path does not depend on a separate external mesh sidecar runtime
- whether the local lane is still `standalone` or has `joined`
- the last selected mesh preference, including label, namespace, cluster id, trust digest, and advertised control-plane addresses
- the last imported join bundle, including admission kind, trust posture, discovery posture, trust-bundle version, and export/import timestamps
Configure the lane root with either:
- `--mesh-root /absolute/path/to/mesh-root`
- `OPENAGENTS_PSIONIC_MESH_LANE_ROOT=/absolute/path/to/mesh-root`
If autopilotctl cannot find a colocated psionic-mesh-lane binary, point it
at one explicitly with:
OPENAGENTS_PSIONIC_MESH_LANE_BIN=/absolute/path/to/psionic-mesh-lane
Otherwise it first looks for a sibling binary beside the current desktop
artifacts and then falls back to running the sibling ../psionic checkout with
cargo run -p psionic-serve --bin psionic-mesh-lane -- ....
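That lookup order reads as a three-step fallback. A sketch of the documented precedence only; the real resolution lives inside autopilotctl, and the sibling path and override value here are illustrative:

```shell
resolve_mesh_lane() {
  # 1. An explicit env override wins outright.
  if [ -n "${OPENAGENTS_PSIONIC_MESH_LANE_BIN:-}" ]; then
    echo "$OPENAGENTS_PSIONIC_MESH_LANE_BIN"
  # 2. Otherwise prefer a colocated sibling binary.
  elif [ -x "./psionic-mesh-lane" ]; then
    echo "./psionic-mesh-lane"
  # 3. Finally fall back to the sibling checkout via cargo.
  else
    echo "cargo run -p psionic-serve --bin psionic-mesh-lane --"
  fi
}
OPENAGENTS_PSIONIC_MESH_LANE_BIN=/opt/bin/psionic-mesh-lane resolve_mesh_lane
# prints /opt/bin/psionic-mesh-lane
```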
Join-bundle export uses the mesh lane's configured service port and derives
advertised control-plane addresses from the current Tailnet self-device IPs
when you do not pass explicit --advertise <ip:port> values. That keeps the
common "share this machine with another machine on my Tailnet" path short:
export OPENAGENTS_PSIONIC_MESH_LANE_ROOT=/tmp/mesh-home
autopilotctl pooled-inference export-join-bundle /tmp/mesh-home.join.json

If you need to publish a different address set, override it directly:
autopilotctl pooled-inference export-join-bundle /tmp/mesh-home.join.json \
--mesh-root /tmp/mesh-home \
--mesh-label mesh-home \
--advertise 100.90.1.10:47470 \
  --advertise 100.90.1.11:47470

Import records the join bundle into the target lane root by running
psionic-mesh-lane install --join-bundle .... If the local pooled-inference
lane is already running, the file-backed join state may require a lane restart
before the management host reflects the new posture. autopilotctl reports
that explicitly through restart_required and restart_hint so operators do
not mistake "bundle recorded" for "running process already reloaded."
export OPENAGENTS_PSIONIC_MESH_LANE_ROOT=/tmp/mesh-joiner
autopilotctl pooled-inference import-join-bundle /tmp/mesh-home.join.json
autopilotctl pooled-inference status

autopilotctl tailnet status is the focused operator view for the same Tailnet
roster now surfaced in the desktop Tailnet Status pane. It shells out to
tailscale status --json, normalizes the local node plus peer devices into the
desktop-control snapshot, and prints the full relay, IP, traffic, and
last-seen fields without requiring a second remote-only control path. The pane
itself is the concise discovered-device roster, not the full raw dump.
autopilotctl training status now surfaces the app-owned training operator
projection sourced from kernel authority and the current desktop runtime. The
payload distinguishes lightweight control-plane orchestration from heavy
artifact-plane staging and includes run counts, accepted outcomes, environment
versions, checkpoint refs, stale-rollout and duplicate-handling counters,
validator verdict totals, sandbox pool readiness, and the currently visible
training runs or participants when those surfaces are available.
The same status surface now also projects the first decentralized adapter training read model. It includes:
- contributor readiness and inventory truth for the local `adapter_training` contributor lane
- the latest projected adapter windows, including window status, upload counts, accepted, quarantined, or replay-required contribution totals, validation gates, and promotion readiness
- projected contribution outcomes, including contributor node, validator disposition, aggregation eligibility, and payout-relevant settlement state
- a contributor-focused summary for the current desktop node so operators can see whether the local runtime is blocked, awaiting assignment, submitted, accepted, quarantined, replay-required, or settlement-ready
Desktop control now also carries a separate bounded gemma_finetune surface
for the first gemma4:e4b adapter-SFT MVP. This is intentionally narrower than
the Apple adapter operator and narrower than a general finetuning platform:
- it is a design-partner and operator-facing prep surface for the bounded CUDA-only `gemma4:e4b` lane
- it now has an app-owned tenant registry with one provisioned API key per design partner, explicit quota state, and fail-closed auth on tenant-scoped project, dataset, job, and promotion actions
- it records project identity, tenant identity, served-base binding, and the frozen Psionic training-family and eval-pack contract the project targets
- it records explicit `train`, `held_out_validation`, `baseline_short`, and `final_report` split identity before a job exists
- it surfaces tokenizer/template compatibility, assistant-mask posture, overlap review refs, and validation receipts through the same authenticated snapshot contract as the desktop shell
- it projects accepted finetune outcomes and customer-visible promoted-model inventory rows once a checkpoint clears the bounded promotion gate
This surface does not yet claim broad upload or raw conversation ingestion.
Today the bounded lane expects preprocessed hidden-state supervision files in
the Psionic GemmaE4bLmHeadSupervisionSample shape, provided either as JSON
arrays or JSONL. The operator must declare train, validation, and holdout files
explicitly, and the desktop-control layer truthfully rejects template drift
before any training job is created.
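Only the packaging is illustrated below: JSONL, one sample object per line, split into explicitly declared files. The field names are hypothetical placeholders, not the real `GemmaE4bLmHeadSupervisionSample` schema:

```shell
# Placeholder sample fields; only the one-JSON-object-per-line packaging and
# the explicit per-split file declaration reflect the documented contract.
cat > /tmp/train.jsonl <<'EOF'
{"sample_id":"s1","hidden_state_ref":"blob://s1","target_token_ids":[101,7]}
{"sample_id":"s2","hidden_state_ref":"blob://s2","target_token_ids":[42]}
EOF
wc -l < /tmp/train.jsonl
# → 2
```

The validation and holdout files follow the same shape and are passed as separate explicit paths, as in the `dataset register` example earlier.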
The new desktop-control actions are:
- `gemma-finetune-status`: returns the operator aggregate view: tenant quota rows, current project and dataset state, visible validation receipts, jobs, accepted-outcome rows, and promoted-model inventory
- `gemma-finetune-tenant-provision`: creates one bounded design-partner tenant record and returns the single API key used for tenant-scoped Gemma actions
- `gemma-finetune-tenant-view`: returns the authenticated tenant-scoped read model, including quota usage, projects, jobs, accepted outcomes, and published inventory rows
- `gemma-finetune-project-create`: binds one design-partner project to the bounded `gemma4:e4b` training lane and its canonical eval pack, but only when the supplied tenant API key is valid for that tenant and the bounded quota is still open
- `gemma-finetune-dataset-register`: registers explicit split files, computes split digests and counts, and records a validation receipt that captures tokenizer/template compatibility plus overlap-review posture under the authenticated tenant
- `gemma-finetune-job-create`: queues one bounded async Gemma job against the admitted dataset bound to a project, but fails closed if the tenant or the single-lane bounded runtime is already saturated
- `gemma-finetune-job-get`: returns per-job state, recent events, checkpoints, exported artifacts, and the current promotion record if one exists, scoped by the tenant API key
- `gemma-finetune-job-cancel`: requests cancellation and leaves the job in explicit `cancel_requested` or terminal `cancelled` truth instead of hiding the operator intent
- `gemma-finetune-checkpoint-promote`: records the operator review, scores the bounded Psionic promotion decision, and either publishes a discoverable promoted-model ref plus accepted-outcome and model-inventory truth or leaves the job in explicit `hold_for_review`/`reject` posture
autopilotctl status now prints a top-level gemma finetune: line alongside
the broader desktop snapshot, and autopilotctl gemma-finetune status expands
that into per-tenant quota, per-project, per-dataset, per-job, accepted-outcome,
and inventory detail. autopilotctl gemma-finetune tenant status --api-key ...
returns the same read model but scoped to one design partner. The job getter
prints recent events plus checkpoint and artifact rows. The create, register,
queue, cancel, and promote commands all operate against the same state the
desktop keeps on disk under
~/.openagents/logs/autopilot/gemma-finetune.json.
The current promotion path is intentionally narrow. OpenAgents now records the real Psionic baseline sweep, trainer progress, checkpoint digests, exported adapter digests, and operator review packet. The public Psionic scorer is not yet exported through this repo, so the current eval receipts are explicitly derived from the bounded held-out loss lane plus the frozen dataset/template contract checks. The promotion decision surfaces that truth directly instead of pretending the desktop already owns a broader finetune control plane.
Desktop control now also carries a separate remote_training mirror for the
Psionic Google and RunPod lanes. That surface is app-owned and provider-neutral:
- it mirrors the Psionic remote-training run index and retained bundles into an app-local cache instead of making panes parse provider storage layouts
- it refreshes active runs on a one-second cadence
- it keeps `stale`, `summary_only`, and full-series posture explicit through normalized run-list and selected-run detail state
- it surfaces bundle provenance, cached-bundle paths, heartbeat age, and stale reasons through the same desktop snapshot contract the GUI uses
The desktop-control action surface now exposes that same mirror directly:
- `remote-training-status` returns the normalized run list, selected run summary, refresh cadence, cache paths, provenance source, stale counts, and the last live-source error
- `remote-training-run` returns the selected or requested run together with the retained visualization bundle when the app has it cached locally
- `remote-training-refresh` forces an immediate mirror refresh before returning the normalized status payload
autopilotctl remote-training is the operator-facing wrapper over those
actions:
- `status` prints the same top-level remote-training summary the full desktop snapshot now includes
- `list` prints every mirrored run with provider, lane, result, series posture, heartbeat age, cache state, and stale diagnostics
- `run [run_id]` prints selected-run detail plus retained bundle counts, freshness, provenance, and source-root information
- `stale` filters the mirrored run list down to stale entries and their freshness failure reasons
- `refresh` forces a one-shot sync before printing the same normalized status payload
For the Apple adapter lane, the training command group is now an operator workflow rather than a status-only surface:
- `autopilotctl training launch ...` imports train and held-out datasets, runs the current repo-owned Apple operator flow, runs the Rust-native Psionic training/export lane to stage an Apple-valid `.fmadapter`, and records local held-out plus runtime-smoke results, including the bridge-reported runtime compatibility state used during validation
- `autopilotctl training export ...` materializes the staged package at an explicit export path without treating export as accepted market truth
- `autopilotctl training accept ...` is the authority boundary: it registers environment, benchmark, validator, checkpoint-family, training-policy, eval-run, training-run, and accepted-outcome records through kernel authority, but only after rerunning a bridge-backed Apple runtime drift check against the exported package so runtime or Background Assets changes are surfaced explicitly before publication
The status payload now renders the Apple operator stages separately so launched, evaluated, exported, and accepted states stay distinct across restart or replay. It also renders decentralized adapter window and contribution lines so scripted operation no longer requires dropping down to raw kernel API calls to inspect the latest contributor outcomes.
For the Apple adapter operator specifically, the same status surface now carries
live run telemetry rather than only coarse terminal states. While a run is
active, autopilotctl training status renders:
- a `training operator live:` line with the current phase, heartbeat timestamp, run and phase elapsed time, ETA, epoch and step counters, held-out eval progress, latest observed loss, and the latest checkpoint path
- recent typed `training operator event:` lines for major lifecycle points such as training start, epoch start, step completion, held-out eval samples, export, runtime smoke, and acceptance
autopilotctl training launch ... now returns immediately after the app-owned
operator run is created. The long-running train/eval/export/runtime-smoke work
continues in the background, and the status/event surfaces above are the
supported way to follow progress without waiting for the original launch command
to exit.
For the Apple adapter lane, that operator status is lifecycle truth, not
benchmark-usefulness truth. The text output now labels authority publication as
authority_accept and authority_outcome, and it prints an explicit note that
export, runtime smoke, and authority acceptance do not by themselves prove
benchmark-useful adapter quality. The canonical benchmark-useful gate remains
scripts/release/check-psionic-apple-architecture-explainer-acceptance.sh.
The latest live acceptance receipt on 2026-03-16 passed only the weak overfit
stage (520 score bps, 1428 pass-rate bps, 1 improved case) and still
rejected the standard stage (571 score bps, 1428 pass-rate bps, 1
improved case), so do not read autopilotctl training status as a substitute
for the acceptance harness.
The packaged Apple-lane release checks now also run
scripts/release/check-psionic-apple-rust-only-gate.sh before claiming Apple
training readiness. That gate fails if the shipped Apple operator path regresses
back to toolkit-root discovery, Python-interpreter discovery, or authoritative
toolkit shell-outs.
autopilotctl proof status now drills into recent delivery proofs instead of
stopping at a count. The payload includes proof posture, topology and
provisioning labels, linked challenge state, settlement outcomes, and review
refs such as proof-bundle refs, activation fingerprints, validator refs, and
runtime/session identity refs when those are present in kernel truth.
autopilotctl challenge status now surfaces the validator history against the
same kernel objects: linked delivery proofs, verdicts, reason codes, challenge
result refs, and the current settlement impact summary for the challenged
delivery.
autopilotctl nip90-payments daily --date YYYY-MM-DD interprets the supplied
date as the local calendar day on the machine running the CLI, converts it into
an explicit [start, end) epoch window, and asks desktop control for the same
wallet-authoritative NIP-90 send report the app can use elsewhere. The payload
includes exact window start/end timestamps, top-line payment_count and
total_sats_sent, fees and wallet debit totals, the currently connected relay
URLs considered, and the degraded-binding count for any recovered non-top-line
rows.
Useful MCP starting point:
cargo run -p autopilot-desktop --bin autopilot-compute-mcp -- --manifest \
  ~/.openagents/autopilot/desktop-control.json

Typical MCP clients should launch that stdio server after the desktop app is already running and the desktop-control manifest exists.
Apple-specific bridge flows still exist for the shipped macOS release path:
autopilotctl apple-fm status
autopilotctl apple-fm refresh --wait
autopilotctl apple-fm list
autopilotctl apple-fm load /absolute/path/to/adapter.fmadapter --adapter-id fixture-chat-adapter
autopilotctl apple-fm attach sess-1 fixture-chat-adapter
autopilotctl apple-fm detach sess-1
autopilotctl apple-fm unload fixture-chat-adapter
autopilotctl apple-fm smoke-test

autopilotctl apple-fm status now includes bridge reachability, model readiness,
adapter-inventory support, attach support, loaded adapter count, and the current
active-session adapter if the workbench has one selected. autopilotctl apple-fm list
prints the loaded bridge inventory entries with compatibility and attached-session
truth. The mutating load, unload, attach, and detach verbs queue the same
Apple FM workbench operations the desktop pane uses, so headless operators and the
GUI stay replay-safe against one app-owned control surface.
Sandbox lifecycle examples:
autopilotctl sandbox status
autopilotctl sandbox create pythonexec-profile job-1 /tmp/openagents-sandbox \
--entrypoint-type workspace-file \
--entrypoint scripts/job.py \
--expected-output result.txt
autopilotctl sandbox upload job-1 scripts/job.py ./job.py
autopilotctl sandbox start job-1
autopilotctl sandbox wait job-1 --timeout-ms 30000
autopilotctl sandbox job job-1
autopilotctl sandbox download-artifact job-1 result.txt --output /tmp/result.txt

The app now exposes one app-owned local_runtime contract across the split
desktop shell, desktop control, and autopilotctl, but the supported host
stories are still lane-specific:
- macOS Apple Silicon: Apple FM via foundation-bridge
- supported non-macOS NVIDIA hosts: GPT-OSS via the in-process Psionic CUDA lane
- retained GPT-OSS Metal/CPU backends can still appear in status/readiness views, but Go Online currently unlocks sell-compute only for CUDA
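The lane split above can be probed from a shell before launch. This is a hedged sketch of how an `auto` choice could resolve on a given host; the runtime's actual probe order is app-owned and not specified here:

```shell
# Resolve OPENAGENTS_GPT_OSS_BACKEND=auto into a concrete backend guess.
backend="${OPENAGENTS_GPT_OSS_BACKEND:-auto}"
if [ "$backend" = "auto" ]; then
  if command -v nvidia-smi >/dev/null 2>&1; then
    backend=cuda     # NVIDIA driver tooling present
  elif [ "$(uname -s)" = "Darwin" ]; then
    backend=metal    # retained Metal lane on macOS
  else
    backend=cpu      # retained CPU fallback
  fi
fi
echo "selected backend: $backend"
```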
Provider Control now renders the active local-runtime lane inline. On supported
NVIDIA/CUDA hosts that means the local-runtime area can show GPT-OSS readiness,
artifact state, load state, model path, and REFRESH / WARM / UNLOAD
actions directly, while the separate GPT-OSS workbench remains the prompt
playground and detailed model-management pane.
Use this on a supported non-macOS NVIDIA/CUDA host.
The GPT-OSS runtime reads:
OPENAGENTS_GPT_OSS_BACKEND=auto|cuda|metal|cpu
OPENAGENTS_GPT_OSS_MODEL_PATH=/path/to/gpt-oss-20b-mxfp4.gguf
If OPENAGENTS_GPT_OSS_MODEL_PATH is unset, the runtime defaults to:
~/models/gpt-oss/gpt-oss-20b-mxfp4.gguf
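That fallback is ordinary parameter-default resolution, sketched here; only the path string is resolved, nothing checks that the GGUF actually exists:

```shell
# Mirror the documented default when OPENAGENTS_GPT_OSS_MODEL_PATH is unset.
model_path="${OPENAGENTS_GPT_OSS_MODEL_PATH:-$HOME/models/gpt-oss/gpt-oss-20b-mxfp4.gguf}"
echo "model path: $model_path"
```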
Recommended bring-up flow:
export OPENAGENTS_GPT_OSS_BACKEND=cuda
export OPENAGENTS_GPT_OSS_MODEL_PATH=/absolute/path/to/gpt-oss-20b-mxfp4.gguf
cargo install --path .
cargo autopilot
autopilotctl local-runtime status
autopilotctl local-runtime refresh
autopilotctl gpt-oss status
autopilotctl gpt-oss warm --wait
autopilotctl wait local-runtime-ready
autopilotctl wait gpt-oss-ready
autopilotctl provider online

Useful follow-up checks:
autopilotctl gpt-oss unload --wait
autopilotctl logs --tail 100

Repeatable scripted form:
scripts/release/check-gpt-oss-nvidia-mission-control.sh

Canonical seller-lane perf harness:
scripts/release/check-gpt-oss-nvidia-seller-lane-perf.sh

That harness expects a local standalone Psionic checkout at ../psionic by
default. Override it with OPENAGENTS_PSIONIC_REPO=/absolute/path/to/psionic
if your checkout lives elsewhere.
That harness launches the desktop app with the configured CUDA GGUF, captures
the cold app-owned local-runtime preflight snapshot, runs the canonical
Psionic-only CUDA benchmark pass, warms the seller lane through autopilotctl,
and writes a machine-readable artifact at:
target/gpt-oss-nvidia-seller-lane-perf/seller-lane-perf.json
The run directory also preserves raw desktop-control snapshots, benchmark case
summaries, app logs, and a short summary.txt. Set
OPENAGENTS_GPT_OSS_NVIDIA_BASELINE_ARTIFACT=/absolute/path/to/prior/seller-lane-perf.json
to include a baseline/regression comparison section in the new artifact.
Cross-stack launch validation:
scripts/release/check-compute-launch-program.sh

The launch-program harness shells into the standalone Psionic repo for
Psionic-owned test legs. By default it expects that checkout at ../psionic;
override it with OPENAGENTS_PSIONIC_REPO=/absolute/path/to/psionic.
That launch-program harness writes a summary plus per-step logs for the desktop
control plane, Psionic sandbox and cluster lanes, validator service, kernel
authority compute flows, and optional funded/platform-specific legs. See
docs/COMPUTE_LAUNCH_PROGRAM_RUNBOOK.md.
Operational notes:
- `local-runtime refresh` always targets the active Provider Control lane, but on GPT-OSS it does not load the GGUF by itself
- `autopilotctl local-runtime status` now surfaces seller-lane execution posture (cold, warming, warm, compile_failed, cache_invalidated), scheduler posture, last compile-path temperature, execution-plan/kernel-cache occupancy, selected device inventory, last cold-compile or warm-refresh duration, typed cache invalidation reason, and the last compile-failure summary when one exists
- `gpt-oss warm` and `gpt-oss unload` act directly on the configured GGUF model
- `local-runtime refresh` re-reads OPENAGENTS_GPT_OSS_BACKEND and OPENAGENTS_GPT_OSS_MODEL_PATH; if the backend, configured model, or local artifact metadata changed, the runtime invalidates prior warm execution state and reports the typed invalidation reason through desktop control and autopilotctl
- `provider online` will still block if the backend is not cuda, the GGUF is missing, or the configured model is not loaded
- on retained Metal/CPU GPT-OSS hosts, Provider Control and autopilotctl stay truthful about runtime state but currently point you back to the GPT-OSS workbench instead of unlocking sell-compute
This uses the current default buyer wallet and creates a fresh provider account under target/headless-compute-smoke/provider:
scripts/autopilot/headless-compute-smoke.sh

Useful env overrides:
OPENAGENTS_HEADLESS_PROVIDER_BACKEND=auto|apple-fm|canned
OPENAGENTS_HEADLESS_MAX_REQUESTS=1
OPENAGENTS_HEADLESS_BUDGET_SATS=2
OPENAGENTS_HEADLESS_BUYER_HOME=/path/to/funded-home
OPENAGENTS_HEADLESS_RUN_DIR=/path/to/run-dir
OPENAGENTS_HEADLESS_PROVIDER_HOME=/path/to/provider-home
OPENAGENTS_SPARK_NETWORK=mainnet|regtest
The smoke script now performs a funding preflight with spark-wallet-cli
before it boots the relay/provider pair. If the default buyer wallet does not
have at least the requested budget on the selected Spark network, the script
fails early and tells you which HOME it inspected.
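The preflight reduces to a balance-versus-budget comparison before anything boots. A hedged sketch of the shape of that check; the balance here is a hardcoded illustrative number, where the real script reads it through spark-wallet-cli against the selected Spark network:

```shell
# Fail early when the buyer wallet cannot cover the requested budget.
balance_sats=5     # illustrative; really read from spark-wallet-cli output
budget_sats="${OPENAGENTS_HEADLESS_BUDGET_SATS:-2}"
if [ "$balance_sats" -lt "$budget_sats" ]; then
  echo "insufficient funds in HOME=$HOME: have ${balance_sats}, need ${budget_sats}" >&2
  exit 1
fi
echo "funding preflight ok: ${balance_sats} >= ${budget_sats} sats"
```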
This runs multiple paid requests from the default wallet into a fresh provider wallet, then flips the roles and spends the earned sats back the other way:
scripts/autopilot/headless-compute-roundtrip.sh

Useful env overrides:
OPENAGENTS_HEADLESS_FORWARD_COUNT=6
OPENAGENTS_HEADLESS_REVERSE_COUNT=3
OPENAGENTS_HEADLESS_INTERVAL_SECONDS=8
OPENAGENTS_HEADLESS_TIMEOUT_SECONDS=75
OPENAGENTS_HEADLESS_PROVIDER_BACKEND=canned|auto|apple-fm
OPENAGENTS_HEADLESS_BUYER_HOME=/path/to/funded-home
OPENAGENTS_HEADLESS_RUN_DIR=/path/to/run-dir
OPENAGENTS_SPARK_NETWORK=mainnet|regtest
The roundtrip smoke script defaults to the deterministic canned backend so the payment path stays
stable even on machines without Apple Foundation Models. It still uses real NIP-90 requests,
real Spark invoices, and real Lightning settlement.
The forward leg spends from the buyer wallet selected by OPENAGENTS_HEADLESS_BUYER_HOME
(or the current shell HOME if unset). The script now checks that wallet up front and
fails early if it cannot cover the requested forward leg on the selected Spark network.
OPENAGENTS_HEADLESS_REVERSE_COUNT is treated as a ceiling, not a guarantee. After the forward leg,
the script measures the actual sats burned per send from the default wallet and trims the reverse leg
to what the fresh secondary wallet can really afford under the current Lightning fee conditions.
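That trimming is simple integer arithmetic. An illustrative sketch with hardcoded measurements standing in for the script's real post-forward-leg numbers:

```shell
# Trim the requested reverse count to what the secondary wallet can afford.
requested_reverse=3          # the OPENAGENTS_HEADLESS_REVERSE_COUNT ceiling
secondary_balance_sats=5     # illustrative: measured after the forward leg
cost_per_send_sats=2         # illustrative: measured per-send burn from the forward leg
affordable=$((secondary_balance_sats / cost_per_send_sats))
executed=$requested_reverse
if [ "$affordable" -lt "$executed" ]; then
  executed=$affordable
fi
echo "reverse jobs: requested=$requested_reverse executed=$executed"
```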
The script emits:
- summary.txt: human summary
- summary.json: machine-readable request/payment report
- requested vs executed reverse job counts
- per-phase buyer/provider logs
- Spark status snapshots before, between, and after the two phases
This launches the real bundled Autopilot.app, points it at a deterministic local relay,
drives it through autopilotctl, and verifies the production shell completes the provider
side of the paid loop all the way through settlement:
scripts/release/check-v01-packaged-compute.sh

What it does:
- builds Autopilot.app, autopilotctl, autopilot-headless-compute, and spark-wallet-cli
- bundles FoundationBridge.app into Autopilot.app/Contents/Helpers
- launches the packaged app executable with isolated HOME and OPENAGENTS_AUTOPILOT_LOG_DIR
- configures the bundle against a local deterministic relay via its settings file
- brings the provider online through autopilotctl
- starts a controlled headless buyer targeted to the packaged provider
- asserts on the bundled app's latest.jsonl and per-session JSONL logs:
  - request accepted
  - request running
  - request delivered
  - provider.result_published
  - provider.payment_requested
  - provider.settlement_confirmed
Useful env overrides:
OPENAGENTS_PACKAGED_RUN_DIR=/path/to/run-dir
OPENAGENTS_PACKAGED_FUNDER_HOME=/path/to/funded-home
OPENAGENTS_PACKAGED_FUNDER_IDENTITY_PATH=/path/to/funded/identity.mnemonic
OPENAGENTS_PACKAGED_BUYER_FUNDING_SATS=50
OPENAGENTS_PACKAGED_BUDGET_SATS=2
OPENAGENTS_PACKAGED_SKIP_BUILD=1
OPENAGENTS_SPARK_NETWORK=mainnet|regtest
The packaged smoke script is intentionally app-owned verification, not a library-only harness. It proves the production shell, desktop control runtime, and file-backed logs stay in sync through the v0.1 paid compute loop.
Before generating funding invoices, the script now checks the configured funder wallet
and fails early if OPENAGENTS_PACKAGED_FUNDER_HOME / OPENAGENTS_PACKAGED_FUNDER_IDENTITY_PATH
do not point at a wallet that can cover the buyer seed amount on the selected Spark network.
The packaged .app verification flow is still the macOS Apple FM release path.
There is not yet a separate packaged GPT-OSS bundle check in this repo. For the
supported NVIDIA/CUDA lane, current operator verification is the running desktop
app plus autopilotctl, desktop-control snapshots, and the session JSONL logs.
This is the stronger packaged verification path for the current release cut:
scripts/release/check-v01-packaged-autopilotctl-roundtrip.sh

What it does:
- builds the bundled Autopilot.app
- launches both a bundled app and a separate runtime app
- drives both apps entirely through autopilotctl
- selects the managed NIP-28 main channel in both apps
- verifies bidirectional NIP-28 chat
- brings both providers online
- funds Spark wallets as needed
- runs paid buyer/seller flows in both directions
- asserts on desktop-control snapshots plus latest.jsonl / session logs for:
  - NIP-28 presence and message delivery
  - targeted NIP-90 request dispatch
  - buyer payment settlement
  - provider settlement confirmation
  - snapshot.session.shell_mode=hotbar
  - snapshot.session.dev_mode_enabled=false
Useful env overrides:
OPENAGENTS_AUTOPILOTCTL_RUN_DIR=/path/to/run-dir
OPENAGENTS_AUTOPILOTCTL_FUNDER_HOME=/path/to/funded-home
OPENAGENTS_AUTOPILOTCTL_FUNDER_IDENTITY_PATH=/path/to/funded/identity.mnemonic
OPENAGENTS_AUTOPILOTCTL_FUND_SATS=100
OPENAGENTS_AUTOPILOTCTL_BUDGET_SATS=2
OPENAGENTS_AUTOPILOTCTL_SKIP_BUILD=1
OPENAGENTS_SPARK_NETWORK=mainnet|regtest
This is the closest thing in the repo to a production-shell end-to-end test. Use it when the question is not "does the library path work?" but "does the actual app bundle work when steered programmatically?"
The roundtrip script also preflights the configured funder wallet and will stop before app launch if it cannot cover the bundle/runtime seed amounts. That keeps funding failures distinct from split-shell or desktop-control regressions.
Run a local relay:
cargo run -p autopilot-desktop --bin autopilot-headless-compute -- relay --listen 127.0.0.1:18490

Run a provider on a separate identity/wallet:
cargo run -p autopilot-desktop --bin autopilot-headless-compute -- provider \
--relay ws://127.0.0.1:18490 \
--identity-path ~/.openagents/headless-provider/identity.mnemonic \
  --backend auto

Run a buyer with the current default wallet:
cargo run -p autopilot-desktop --bin autopilot-headless-compute -- buyer \
--relay ws://127.0.0.1:18490 \
--max-settled-requests 1 \
  --fail-fast

Targeting a specific provider on a shared relay:
cargo run -p autopilot-desktop --bin autopilot-headless-compute -- buyer \
--relay wss://your-relay.example \
  --target-provider-pubkey <provider-npub-or-hex>

The desktop app bootstraps from the configured default channel, then keeps the live NIP-28 subscription set aligned with the managed-chat projection. Runtime subscription scope is all discovered managed-chat channel ids currently present in the projection, and the lane subscribes across the app's configured relay set instead of a single relay.
The primary channel is set by:
OA_DEFAULT_NIP28_RELAY_URL: relay WebSocket URL (default: wss://relay.damus.io)
OA_DEFAULT_NIP28_CHANNEL_ID: 64-char hex kind-40 event ID (default: ebf2e35092632ecb81b0f7da7d3b25b4c1b0e8e7eb98d7d766ef584e9edd68c8)
A second channel for team testing can still be seeded without touching the default:
OA_NIP28_TEAM_CHANNEL_ID: 64-char hex kind-40 event ID of the team test channel
When set, the lane worker bootstraps with both channel ids available to the projection. Both channels appear in the managed chat workspace rail, and the runtime will continue subscribing to any additional managed-chat channels the projection discovers later.
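Both channel-id variables expect a 64-character lowercase hex kind-40 event id. A quick shell sanity check before exporting one (the id below is the documented default channel id):

```shell
# Validate a NIP-28 kind-40 channel id: exactly 64 lowercase hex characters.
id="ebf2e35092632ecb81b0f7da7d3b25b4c1b0e8e7eb98d7d766ef584e9edd68c8"
case "$id" in
  *[!0-9a-f]* | "") echo "invalid channel id"; exit 1 ;;
esac
[ "${#id}" -eq 64 ] || { echo "invalid length"; exit 1; }
echo "ok: $id"
```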
Creating a team channel (one-time, manual):
# Using nak (https://github.com/fiatjaf/nak)
nak event --sec <team-nsec> --kind 40 \
--content '{"name":"oa-team-chat","about":"OpenAgents team test channel","picture":""}' \
wss://relay.damus.io
# Record the output event ID (64 hex chars); that is OA_NIP28_TEAM_CHANNEL_ID

Running the app with the team channel:
export OA_NIP28_TEAM_CHANNEL_ID=<64-hex-id>
cargo autopilot