This guide is for contributors working in the Python engine.
It documents the runtime entrypoints, module boundaries, artifact/state contracts, and the extension points you are expected to use when adding behavior.
For deeper topic-specific guides, use:
- Config and Bootstrap
- Command Surface and Routing
- UI and Interaction Architecture
- Runtime Lifecycle
- State and Artifacts
- Debug and Diagnostics
- Testing and Validation
Current control flow:
- bin/envctl
- envctl_engine.runtime.launcher_cli
- envctl_engine.runtime.cli:main
- envctl_engine.runtime.command_router.parse_route
- envctl_engine.runtime.engine_runtime.EngineRuntime
- envctl_engine.runtime.engine_runtime_dispatch.dispatch_command
- domain orchestrator or inspection handler
Key boundary decisions:
- The launcher resolves repo root and prepares the Python runtime handoff.
- Source checkouts should use bin/envctl; direct module execution from the repo root requires PYTHONPATH=python.
- runtime/cli.py owns prereq checks, local config bootstrap policy, and exit code normalization.
- EngineRuntime is the runtime facade that wires the domains together.
- Orchestrators own behavior; helper modules own reusable policy and contract logic.
runtime/cli.py does more than just call dispatch.
Before dispatching it:
- parses the initial route with enough config context to honor ENVCTL_DEFAULT_MODE
- discovers repo-local config state
- bootstraps .envctl when required
- loads EngineConfig
- reparses the route with the final environment/config view
- performs command-specific prereq checks
- normalizes exit codes to:
  - 0: success
  - 1: actionable failure
  - 2: controlled quit / interrupt
Only a small set of commands intentionally skips config bootstrap. That is why show-config, show-state, explain-startup, and the --list-* inspection commands are safe in unconfigured repos, while startup commands are not.
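The exit code normalization described above can be sketched as follows. This is a minimal illustration, not the code in runtime/cli.py: ExitCode and normalize_exit are hypothetical names, and the mapping of exceptions to codes is an assumption based only on the three documented values.

```python
import enum
from typing import Callable, Optional


class ExitCode(enum.IntEnum):
    """The three documented exit codes (enum name is hypothetical)."""
    SUCCESS = 0
    ACTIONABLE_FAILURE = 1
    CONTROLLED_QUIT = 2


def normalize_exit(run: Callable[[], Optional[int]]) -> int:
    """Run a command callable and fold its outcome into 0, 1, or 2."""
    try:
        result = run()
    except KeyboardInterrupt:
        # Assumption: an interrupt is treated as a controlled quit.
        return int(ExitCode.CONTROLLED_QUIT)
    except Exception:
        # Assumption: any other exception is an actionable failure.
        return int(ExitCode.ACTIONABLE_FAILURE)
    if result in (None, 0):
        return int(ExitCode.SUCCESS)
    if result == ExitCode.CONTROLLED_QUIT:
        return int(ExitCode.CONTROLLED_QUIT)
    # Any other non-zero value collapses to the generic failure code.
    return int(ExitCode.ACTIONABLE_FAILURE)
```

Collapsing unknown non-zero values into 1 keeps the contract small for scripts that wrap the CLI.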
EngineRuntime in python/envctl_engine/runtime/engine_runtime.py is the composition root.
It wires together:
- EngineConfig
- ProcessRunner
- ProcessProbe
- RuntimeStateRepository
- RuntimeTerminalUI
- StartupOrchestrator
- ResumeOrchestrator
- DoctorOrchestrator
- LifecycleCleanupOrchestrator
- DashboardOrchestrator
- StateActionOrchestrator
- ActionCommandOrchestrator
- debug recorder and event sinks
Design rule:
- add new behavior to a domain module or orchestrator first
- only add glue methods or aliases to EngineRuntime when the runtime facade truly needs to expose that behavior across domains
The package layout under python/envctl_engine/ is now the main architecture boundary.
- runtime/: facade, dispatch, route parsing, shared runtime helpers
- startup/: start/restart/resume preparation, execution, progress, restore
- actions/: explicit action commands such as tests, PRs, commits, analysis, worktree actions
- state/: typed state models, load/dump compatibility, runtime map, repository
- requirements/: dependency registry, adapters, orchestration, compose assets
- config/: config load/merge/bootstrap/edit/save
- debug/: debug bundle packaging, sanitization, diagnostics, doctor support
- ui/: dashboard loop, selector flows, textual integration, spinners, input
- shared/: low-level helpers such as ports, process probing, parsing, environment access, hooks
- shell/: remaining release/readiness checks during the final cutover period
- planning/: plan discovery, selection, worktree setup support
Use the grouped package paths. The flat compatibility shims remain only for migration tolerance.
runtime/command_router.py is the canonical command contract.
It owns:
- command aliases
- supported command list
- mode tokens
- boolean/value/pair/special flag binding
- env-style assignment handling
- route normalization into the Route dataclass
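To make the router responsibilities concrete, here is a small sketch of alias resolution, flag binding, env-style assignment handling, and normalization into a dataclass. Everything in it is illustrative: RouteSketch, parse_route_sketch, the alias table, and the flag names are hypothetical and do not reflect the real Route fields or the flags in command_router.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical tables; the real router owns its own alias and flag lists.
ALIASES = {"up": "start", "down": "stop"}
SUPPORTED = {"start", "stop", "resume", "show-config"}
BOOLEAN_FLAGS = {"--verbose"}
VALUE_FLAGS = {"--project"}


@dataclass(frozen=True)
class RouteSketch:
    command: str
    flags: Dict[str, object] = field(default_factory=dict)
    env: Dict[str, str] = field(default_factory=dict)
    args: Tuple[str, ...] = ()


def parse_route_sketch(argv: List[str], default_command: str = "start") -> RouteSketch:
    command = None
    flags: Dict[str, object] = {}
    env: Dict[str, str] = {}
    args: List[str] = []
    tokens = iter(argv)
    for token in tokens:
        if token in BOOLEAN_FLAGS:
            flags[token] = True
        elif token in VALUE_FLAGS:
            value = next(tokens, None)  # value flags consume the next token
            if value is None:
                raise ValueError(f"{token} requires a value")
            flags[token] = value
        elif token.startswith("--"):
            raise ValueError(f"unknown flag: {token}")
        elif "=" in token and token.split("=", 1)[0].isidentifier():
            key, value = token.split("=", 1)  # env-style KEY=value assignment
            env[key] = value
        elif command is None:
            command = ALIASES.get(token, token)
            if command not in SUPPORTED:
                raise ValueError(f"unsupported command: {token}")
        else:
            args.append(token)
    return RouteSketch(command or default_command, flags, env, tuple(args))
```

Keeping all of this in one function is what lets the rest of the runtime treat the resulting dataclass as the single command contract.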
runtime/engine_runtime_dispatch.py is intentionally small and should stay that way. It maps commands to:
- inspection helpers
- debug helpers
- lifecycle orchestrators
- startup/resume orchestrators
- state/action orchestrators
When adding a command:
- add aliases and support in command_router.py
- make sure list_supported_commands() includes it
- dispatch it from engine_runtime_dispatch.py
- add tests for parsing and dispatch
- update docs and, if relevant, parity/cutover artifacts
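The relationship between the supported command list and the dispatch layer can be sketched like this. The handlers and the DISPATCH table are hypothetical; only the idea that every supported command needs a dispatch entry, and that dispatch stays a thin mapping, comes from the steps above.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], int]


def handle_show_config(route: dict) -> int:
    # Placeholder inspection helper; a real one would print resolved config.
    return 0


def handle_start(route: dict) -> int:
    # Placeholder; a real handler would delegate to a startup orchestrator.
    return 0


# Hypothetical dispatch table: command name -> handler.
DISPATCH: Dict[str, Handler] = {
    "show-config": handle_show_config,
    "start": handle_start,
}


def list_supported_commands() -> List[str]:
    # Stand-in for the router's supported command list.
    return ["show-config", "start"]


def dispatch_command(route: dict) -> int:
    handler = DISPATCH.get(route["command"])
    if handler is None:
        return 1  # actionable failure: the command has no handler
    return handler(route)


def undispatched_commands() -> List[str]:
    """Supported commands that have no dispatch entry (should be empty)."""
    return sorted(set(list_supported_commands()) - set(DISPATCH))
```

A test asserting that undispatched_commands() is empty is a cheap way to catch a command that was added to the router but never wired into dispatch.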
The config system has three important layers:
- environment variables
- repo-local config discovery/bootstrap
- managed persistence/editing
Core files:
- config/__init__.py: parsing, defaults, alias resolution, EngineConfig
- config/persistence.py: managed .envctl block format and save logic
- config/wizard_domain.py: interactive bootstrap and edit flow
- config/command_support.py: headless and interactive config command behavior
Read Config and Bootstrap when changing discovery, bootstrap gating, or managed save semantics.
Current semantics:
- .envctl is the canonical repo-local file
- legacy config files can prefill the modern config
- the config wizard writes managed canonical keys
- older aliases remain accepted for compatibility
Developer rule:
- do not scatter ad hoc env parsing across the codebase if the setting belongs in the config model
- if the setting affects startup shape, mode/profile behavior, or persisted config UX, model it in EngineConfig and config/persistence.py
The Python runtime is scope-aware.
Important paths:
- config.runtime_dir: global envctl runtime root, usually /tmp/envctl-runtime
- config.runtime_scope_dir: active scoped runtime directory for the current repo
- runtime.runtime_root: same as runtime_scope_dir
- runtime.runtime_legacy_root: compatibility view at <runtime_dir>/python-engine
The runtime also maintains scoped locks and scoped state pointers. This is why repo isolation and cross-repo tests now matter more than they did in the shell flow.
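As an illustration of what scope-aware paths mean, the sketch below derives a per-repo directory under the global runtime root. The hashing scheme and the scope- prefix are assumptions made for this example; the guide does not specify how the real scope directory name is computed. Only the <runtime_dir>/python-engine legacy location is taken from the list above.

```python
import hashlib
from pathlib import Path


def scoped_runtime_dir(runtime_dir: Path, repo_root: Path) -> Path:
    """Return a stable per-repo directory under the global runtime root.

    Assumption: the scope id is a truncated SHA-256 of the resolved repo path.
    """
    resolved = str(repo_root.resolve())
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:12]
    return runtime_dir / f"scope-{digest}"


def legacy_runtime_root(runtime_dir: Path) -> Path:
    """Compatibility view documented as <runtime_dir>/python-engine."""
    return runtime_dir / "python-engine"
```

Whatever the real derivation is, the property tests should rely on is the same: two different repos never share a scope directory, and the same repo always maps to the same one.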
The runtime dispatch surface is intentionally divided across command families with different responsibilities.
Main families today:
- direct inspection: list-*, show-config, show-state, explain-startup
- debug helpers: debug-pack, debug-report, debug-last
- lifecycle cleanup: stop, stop-all, blast-all
- startup/resume: start, plan, restart, resume
- dashboard/config: dashboard, config
- state actions: logs, clear-logs, health, errors
- project actions: test, pr, commit, review, migrate, worktree actions
Relevant orchestrators:
- startup/StartupOrchestrator
- startup/ResumeOrchestrator
- runtime/LifecycleCleanupOrchestrator
- ui/dashboard/DashboardOrchestrator
- state/StateActionOrchestrator
- actions/ActionCommandOrchestrator
Developer rule:
- put behavior in the command family that owns the user contract instead of reusing a nearby orchestrator just because it is convenient
Typed state lives in state/models.py.
Main dataclasses:
- PortPlan: requested, assigned, final port plus source and retry count
- ServiceRecord: service process/runtime truth payload
- RequirementsResult: normalized dependency result by component id
- RunState: top-level run contract
Important RunState fields:
- run_id
- mode
- backend_mode
- services
- requirements
- pointers
- metadata
Compatibility details:
- legacy shell state can still be loaded through state/__init__.py
- legacy payloads are normalized into modern models
- legacy state is marked in metadata and treated more strictly on resume
Developer rule:
- extend the typed models first
- keep JSON payloads backward-tolerant where practical
- preserve shell-read compatibility until the cutover explicitly removes it
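A backward-tolerant loader for a typed model might look like the sketch below. RunStateSketch reuses the field names listed above, but the types, the defaults, and the unknown_fields metadata key are assumptions for illustration, not the definitions in state/models.py.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class RunStateSketch:
    run_id: str
    mode: str
    backend_mode: str = ""
    services: Dict[str, Any] = field(default_factory=dict)
    requirements: Dict[str, Any] = field(default_factory=dict)
    pointers: Dict[str, Any] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)


def load_run_state(payload: Dict[str, Any]) -> RunStateSketch:
    """Build the typed model while tolerating missing and unknown keys."""
    known = {f.name for f in fields(RunStateSketch)}
    kwargs = {key: value for key, value in payload.items() if key in known}
    extras = sorted(key for key in payload if key not in known)
    state = RunStateSketch(**kwargs)
    if extras:
        # Hypothetical convention: record ignored keys instead of failing.
        state.metadata = {**state.metadata, "unknown_fields": extras}
    return state
```

Optional fields fall back to defaults and unknown keys are recorded instead of rejected, so older and newer payloads both load; a payload missing run_id or mode still fails loudly.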
state/repository.py and runtime/engine_runtime_artifacts.py define the runtime artifact contract.
Latest pointers:
- run_state.json
- runtime_map.json
- ports_manifest.json
- error_report.json
- events.jsonl
- runtime_readiness_report.json
Per-run history:
- runs/<run-id>/run_state.json
- runs/<run-id>/runtime_map.json
- runs/<run-id>/ports_manifest.json
- runs/<run-id>/error_report.json
- runs/<run-id>/events.jsonl
- runs/<run-id>/runtime_readiness_report.json
Repository responsibilities:
- persist the latest view
- persist per-run history
- maintain scoped pointers
- optionally mirror compatibility artifacts into the legacy root
When you add an artifact:
- decide whether it is latest-only, per-run, or both
- write it through the repository/artifact layer, not ad hoc from orchestrators
- update doctor/debug/reporting surfaces if operators need to inspect it
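The latest-only, per-run, or both decision can be sketched as a single helper that all writers go through. The function names and the atomic-write strategy are assumptions for illustration; the real logic lives in state/repository.py and runtime/engine_runtime_artifacts.py. The runs/<run-id>/ layout matches the listing above.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List


def _atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON via a temp file and rename so readers never see partial data."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def save_artifact(scope_dir: Path, run_id: str, name: str, payload: dict,
                  *, latest: bool = True, per_run: bool = True) -> List[Path]:
    """Persist an artifact to the latest view, the per-run history, or both."""
    targets: List[Path] = []
    if latest:
        targets.append(scope_dir / name)
    if per_run:
        targets.append(scope_dir / "runs" / run_id / name)
    for target in targets:
        _atomic_write_json(target, payload)
    return targets
```

Routing every write through one helper is what makes it practical to add mirroring or pointer updates later without touching orchestrators.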
state/runtime_map.py builds the user-facing and tooling-facing projection.
It provides:
- projects
- port_to_service
- service_to_actual_port
- projection
The projection block is the stable place for backend/frontend URLs and statuses.
If you change service naming or readiness semantics, update:
- state/runtime_map.py
- any tests consuming URL/status projection
- any docs that instruct users to read runtime_map.json
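The shape of such a projection can be sketched as below. The four top-level keys come from the list above; the input record fields (project, kind, actual_port, status), the URL format, and the per-project key names are assumptions for illustration and may differ from what state/runtime_map.py emits.

```python
from typing import Any, Dict


def build_runtime_map(services: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Project service records into lookup tables plus per-project URLs/statuses."""
    port_to_service: Dict[int, str] = {}
    service_to_actual_port: Dict[str, int] = {}
    projection: Dict[str, Dict[str, str]] = {}
    for name in sorted(services):
        record = services[name]
        entry = projection.setdefault(record["project"], {})
        kind = record["kind"]  # assumed to be "backend" or "frontend"
        entry[f"{kind}_status"] = record.get("status", "unknown")
        port = record.get("actual_port")
        if port is None:
            continue  # no listener yet: status only, no URL
        port_to_service[port] = name
        service_to_actual_port[name] = port
        entry[f"{kind}_url"] = f"http://localhost:{port}"
    return {
        "projects": sorted(projection),
        "port_to_service": port_to_service,
        "service_to_actual_port": service_to_actual_port,
        "projection": projection,
    }
```

Note that integer port keys become strings once the map is serialized to JSON, which matters for any tool that reads the file back.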
Startup is coordinated by startup/startup_orchestrator.py, but large parts of the logic live in support modules.
Main responsibilities:
- restart pre-stop and preservation behavior
- effective mode validation
- shell-budget gate enforcement
- project discovery and selection
- port reservation
- requirements startup
- service startup
- artifact persistence
- post-start dashboard entry decision
Supporting modules to know:
- runtime/engine_runtime_startup_support.py: shared start-mode and project/port helpers
- startup/startup_execution_support.py: requirements and service execution helpers
- startup/startup_progress.py: progress/spinner policy
- startup/startup_selection_support.py: selection and restart targeting
- startup/requirements_startup_domain.py: dependency-start policy and timing events
- startup/service_bootstrap_domain.py: backend/frontend bootstrap and env projection
Developer rule:
- do not pile more branchy policy into StartupOrchestrator.execute()
- move new decisions into support/domain helpers and keep execute() mostly orchestration glue
Recent startup/resume refactors also introduced an explicit progress layer.
Important files:
- startup/startup_progress.py
- startup/resume_progress.py
- startup/progress_shared.py
Current design intent:
- orchestrators decide what is happening
- progress helpers decide how to report it
- shared spinner group code owns rich progress UI and lifecycle events
That separation matters for maintainability and for interactive diagnostics. Avoid re-embedding spinner or progress-print logic directly into orchestration branches.
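One way to picture that split is an orchestration loop that only knows about a small reporter interface. ProgressReporter, RecordingReporter, and run_steps are hypothetical names used for this sketch; they are not the classes in the progress modules listed above.

```python
from typing import Callable, Iterable, List, Protocol, Tuple


class ProgressReporter(Protocol):
    def step(self, label: str) -> None: ...
    def done(self, label: str, ok: bool) -> None: ...


class RecordingReporter:
    """Test-friendly reporter; a spinner-based one would share the same shape."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def step(self, label: str) -> None:
        self.lines.append(f"start {label}")

    def done(self, label: str, ok: bool) -> None:
        self.lines.append(f"{'ok' if ok else 'fail'} {label}")


def run_steps(steps: Iterable[Tuple[str, Callable[[], bool]]],
              progress: ProgressReporter) -> bool:
    """Orchestration decides what runs; the reporter decides how it is shown."""
    for label, action in steps:
        progress.step(label)
        ok = bool(action())
        progress.done(label, ok)
        if not ok:
            return False  # stop at the first failing step
    return True
```

Because the loop never prints, swapping the recording reporter for a rich spinner or a silent one does not change orchestration code.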
startup/resume_orchestrator.py owns the resume contract.
Resume phases:
- load latest state
- enforce cutover budgets for resume
- reconcile state against runtime truth
- optionally restore missing services
- save the reconciled state back through the repository
- optionally enter interactive dashboard
Important helpers live in startup/resume_restore_support.py.
Key difference from shell-era behavior:
- resume is truth-driven, not just pointer-driven
- missing services are surfaced explicitly
- legacy state is sanitized and treated as a stricter case
The dependency system is now registry-driven.
Core pieces:
- requirements/core/models.py: dependency metadata contracts
- requirements/core/registry.py: built-in dependency registry
- requirements/dependencies/*: dependency definitions and env projectors
- requirements/orchestrator.py: retry/failure classification and outcome contract
- requirements/*.py: native adapters for postgres, redis, supabase, n8n
Important concepts:
- DependencyDefinition describes ids, resources, enable keys, and native/projector hooks
- RequirementComponentResult is the normalized per-component payload written to state
- RequirementsOrchestrator classifies failures into retryable bind conflicts, transient probe failures, soft bootstrap failures, and hard failures
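The four failure classes can be illustrated with a small classifier. The enum, the phase names, and the string heuristics here are assumptions for the sketch; the actual rules in requirements/orchestrator.py are not described in this guide.

```python
import enum


class FailureClass(enum.Enum):
    BIND_CONFLICT = "bind_conflict"      # retryable, usually with a new port
    TRANSIENT_PROBE = "transient_probe"  # retryable after a short wait
    SOFT_BOOTSTRAP = "soft_bootstrap"    # reported, does not abort startup
    HARD = "hard"                        # aborts startup


def classify_failure(phase: str, message: str) -> FailureClass:
    """Classify a failure from its phase and error text (heuristics are illustrative)."""
    text = message.lower()
    if "address already in use" in text or "port is already allocated" in text:
        return FailureClass.BIND_CONFLICT
    if phase == "probe" and ("timed out" in text or "connection refused" in text):
        return FailureClass.TRANSIENT_PROBE
    if phase == "bootstrap":
        return FailureClass.SOFT_BOOTSTRAP
    return FailureClass.HARD


def should_retry(failure: FailureClass) -> bool:
    return failure in {FailureClass.BIND_CONFLICT, FailureClass.TRANSIENT_PROBE}
```

Separating classification from the retry decision keeps the outcome contract testable without starting any real dependency.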
When adding a new built-in dependency:
- define a DependencyDefinition
- register it in requirements/core/registry.py
- implement env projection if services need derived env vars
- implement native start/cleanup behavior if managed by Python runtime
- update config persistence and docs
- add unit and e2e coverage
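The define-and-register steps above can be sketched with a toy registry. DependencyDefinitionSketch, its fields, the ENABLE_REDIS key, and the REDIS_URL projection are all hypothetical; the real DependencyDefinition contract lives in requirements/core/models.py and may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class DependencyDefinitionSketch:
    id: str
    enable_key: str                       # config key that turns it on
    resources: Tuple[str, ...] = ()       # e.g. named ports it needs
    project_env: Optional[Callable[[int], Mapping[str, str]]] = None


_REGISTRY: Dict[str, DependencyDefinitionSketch] = {}


def register(definition: DependencyDefinitionSketch) -> DependencyDefinitionSketch:
    if definition.id in _REGISTRY:
        raise ValueError(f"dependency already registered: {definition.id}")
    _REGISTRY[definition.id] = definition
    return definition


def enabled_dependencies(config: Mapping[str, str]) -> List[DependencyDefinitionSketch]:
    """Return registered dependencies whose enable key is truthy in config."""
    truthy = {"1", "true", "yes"}
    return [d for d in _REGISTRY.values()
            if config.get(d.enable_key, "").lower() in truthy]


# Example registration with a derived env var for services.
register(DependencyDefinitionSketch(
    id="redis",
    enable_key="ENABLE_REDIS",
    resources=("port",),
    project_env=lambda port: {"REDIS_URL": f"redis://localhost:{port}"},
))
```

With a registry like this, startup code iterates over enabled definitions instead of branching on dependency names.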
The UI surface is split on purpose.
Dashboard backend selection:
- ui/backend_resolver.py
- ui/backend.py
- ui/dashboard/orchestrator.py
Selector implementation:
- ui/textual/screens/selector/*
- ui/prompt_toolkit_cursor_menu.py
Current behavior matters:
- ENVCTL_UI_BACKEND=auto defaults to the legacy dashboard backend
- ENVCTL_UI_EXPERIMENTAL_DASHBOARD=1 makes auto prefer Textual when available
- selector screens default to the Textual plan-style selector
- ENVCTL_UI_SELECTOR_IMPL=planning_style is the prompt-toolkit rollback path
Do not assume "dashboard backend" and "selector backend" are the same policy surface.
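The dashboard side of that policy can be sketched as a small resolver. The two environment variables are the ones documented above; the function name, the backend labels legacy and textual, and the pass-through of explicit values are assumptions, not the behavior of ui/backend_resolver.py.

```python
from typing import Mapping


def resolve_dashboard_backend(env: Mapping[str, str], textual_available: bool) -> str:
    """Pick a dashboard backend from env settings (labels are illustrative)."""
    requested = env.get("ENVCTL_UI_BACKEND", "auto").strip().lower() or "auto"
    if requested != "auto":
        # Assumption: an explicit choice is honored as-is.
        return requested
    experimental = env.get("ENVCTL_UI_EXPERIMENTAL_DASHBOARD", "") == "1"
    if experimental and textual_available:
        return "textual"
    return "legacy"
```

Selector resolution would be a separate function reading ENVCTL_UI_SELECTOR_IMPL, which is the point of keeping the two policy surfaces apart.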
There are two related but separate event streams:
- runtime events written to events.jsonl
- debug flight recorder events written under debug/session-*/events.debug.jsonl
Important modules:
- runtime/engine_runtime_event_support.py: recorder configuration and event emission bridge
- ui/debug_flight_recorder.py: session recorder, anomaly files, TTY artifacts
- debug/debug_contract.py: event normalization schema
- debug/debug_bundle_support.py: bundle assembly and redaction helpers
- debug/debug_bundle_diagnostics.py: summarized diagnosis
- runtime/engine_runtime_debug_support.py: debug-pack, debug-report, debug-last
Privacy model:
- command strings are hashed
- string payloads are scrubbed
- printable raw input is opt-in even in deep mode
Developer rule:
- if you add user-input-adjacent telemetry, route it through the existing redaction model
- if the new signal is needed in bundles, make sure the bundle redaction and diagnostic layers know how to handle it
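The privacy model above (hashed commands, scrubbed strings, opt-in raw input) can be sketched as a single redaction pass. The key names, the allowlist, and the placeholder format are assumptions for illustration; the real schema is defined by debug/debug_contract.py and the bundle redaction helpers.

```python
import hashlib
from typing import Any, Dict

# Hypothetical allowlist of keys whose string values are safe to keep verbatim.
SAFE_STRING_KEYS = {"event", "phase"}


def hash_command(command: str) -> str:
    digest = hashlib.sha256(command.encode("utf-8")).hexdigest()
    return "sha256:" + digest[:16]


def redact_event(event: Dict[str, Any], *, allow_raw_input: bool = False) -> Dict[str, Any]:
    """Return a copy of the event that is safe to write to a debug stream."""
    redacted: Dict[str, Any] = {}
    for key, value in event.items():
        if key == "command" and isinstance(value, str):
            redacted["command_hash"] = hash_command(value)  # never store the text
        elif key == "raw_input":
            if allow_raw_input:  # printable raw input is strictly opt-in
                redacted[key] = value
        elif isinstance(value, str) and key not in SAFE_STRING_KEYS:
            redacted[key] = f"<str:{len(value)}>"  # keep length, drop content
        else:
            redacted[key] = value
    return redacted
```

New telemetry that carries user-adjacent strings should pass through a function of this kind before it reaches either event stream.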
debug/doctor_orchestrator.py is not only a health dump. It is also a migration/cutover gate.
It combines:
- runtime path/status diagnostics
- parity manifest validation
- runtime truth reconciliation
- synthetic-state detection
- runtime readiness contract evaluation
- recent structured failure surfacing
This means doctor changes are high leverage and high risk.
If you change doctor inputs or gate semantics:
- update tests in tests/python/runtime/ and tests/python/debug/
- update contracts/python_engine_parity_manifest.json or related ledgers if the gate contract changes
- update user-facing diagnostics docs
The repository now uses a Python-native runtime readiness contract.
Main files:
- shell/release_gate.py
- runtime/runtime_readiness.py
- contracts/python_runtime_gap_report.json
- contracts/python_engine_parity_manifest.json
Practical meaning:
- release/shipability checks validate Python readiness, not shell migration budgets
- doctor and release gates use the runtime gap report plus parity manifest
- compatibility surfaces are removed only when the generated readiness contract says they are no longer blockers
Runtime truth comes from:
- shared/process_runner.py
- shared/process_probe.py
- runtime/engine_runtime_service_truth.py
- runtime/engine_runtime_state_truth.py
- runtime/engine_runtime_dashboard_truth.py
Important distinctions:
- process truth: is the PID alive
- listener truth: does the expected port have a live listener
- state truth: does saved state match process/listener reality
- dashboard truth: what should be surfaced to operators without overprobing every render
If you change service readiness behavior, you usually need to update more than one of those layers.
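The first three truth layers can be sketched with standard-library probes. This is a POSIX-only illustration (os.kill with signal 0 does not mean "check liveness" on Windows); the function names, record fields, and status labels are assumptions, not the API of shared/process_probe.py.

```python
import os
import socket
from typing import Any, Dict


def pid_alive(pid: int) -> bool:
    """Process truth: signal 0 checks existence without delivering a signal (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Listener truth: can a TCP connection be opened to the expected port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def service_truth(record: Dict[str, Any]) -> Dict[str, Any]:
    """State truth: compare a saved record against process and listener reality."""
    process_ok = pid_alive(int(record.get("pid") or 0))
    port = record.get("port")
    listener_ok = port_listening(int(port)) if port else False
    if process_ok and listener_ok:
        status = "running"
    elif process_ok:
        status = "unreachable"  # alive, but nothing is listening on the port
    else:
        status = "stale"
    return {"process": process_ok, "listener": listener_ok, "status": status}
```

Dashboard truth would typically cache results like these rather than call the probes on every render.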
Checklist for adding a command:
- Add aliases and support in runtime/command_router.py.
- Dispatch in runtime/engine_runtime_dispatch.py.
- Implement behavior in the appropriate orchestrator/domain.
- Add parser, dispatch, and behavior tests.
- Update docs/reference/commands.md and any user/developer guide sections affected.
Checklist for adding an artifact:
- Define its data contract.
- Persist it through state/repository.py or runtime/engine_runtime_artifacts.py.
- Decide whether it belongs in latest view, per-run history, or both.
- Update doctor/debug tooling if operators need it.
- Add tests for persistence and reload behavior.
Checklist for adding a built-in dependency:
- Register a DependencyDefinition.
- Implement projection/start/cleanup support.
- Wire config keys and defaults.
- Add startup/resume/cleanup tests.
- Document the dependency in user docs and configuration docs.
Checklist for adding a debug or event signal:
- Emit through the runtime/debug recorder path.
- Ensure payload is redactable.
- Decide whether it belongs in bundles, diagnostics, or both.
- Add tests for schema and redaction.
- Update the user debugging guide if operators should act on it.
This repository now has meaningful coverage for:
- parser and command routing
- startup/resume/runtime truth
- state repository and compatibility
- debug bundle generation and analysis
- selector/UI behavior
- runtime readiness and shipability gates
- BATS end-to-end parity flows
Minimum bar for non-trivial runtime changes:
- targeted Python unit tests for the affected domain
- any needed BATS coverage when behavior is externally visible or parity-sensitive
- docs updates when behavior or operator workflow changed
If you are unsure where new code belongs:
- config shape or persistence: config/
- route/flag/command semantics: runtime/command_router.py
- startup or resume behavior: startup/
- action command logic: actions/
- dependency management: requirements/
- artifact or state model: state/
- runtime health/truth/diagnostics: runtime/*truth.py, debug/, shell/
- interactive flow: ui/
If your change needs more than one of those, put the policy in the domain package and keep EngineRuntime as the composition layer.