Implement supervised runner lane maintainer

## Objective

Implement the durable Launchplane runner lane maintainer required before product repositories, including `cbusillo/odoo-tenant-cm-website`, can depend on Launchplane-managed self-hosted runner lanes.

This is the focused follow-up to #414 after PR #1234 disabled the transient registration shortcut.

## Current hard rule

No product repo agent should target a Launchplane-managed self-hosted runner lane until this issue is complete.

The disabled proof path registered `cm-website-chris-testing` and briefly made it appear online, but it started `run.sh` from inside a GitHub Actions job. Because the job owned that process, the runner went offline after job cleanup. PR #1234 removed that apply behavior. The product repo runner inventory is currently expected to be zero runners.

## Track split

Keep these tracks separate so future agents do not treat documentation or route-contract progress as runner readiness.

### Track A: cm-website route-contract work

Owner: `cbusillo/odoo-tenant-cm-website`.

This track is not blocked on runner infrastructure. It may continue on GitHub-hosted `ubuntu-latest` workflows that call Launchplane over HTTPS/OIDC. It verifies the current Odoo preview/publish/apply route contract and product/runtime records.

Track A does not create, adopt, configure, or require a Launchplane-managed self-hosted runner.

### Track B: self-hosted runner adoption

Owner: Launchplane.

This issue implements Track B. cm-website self-hosted runner adoption stays blocked until Launchplane completes the supervised maintainer proof in the live apply slice below.

Runner adoption in Launchplane reusable workflows is also Launchplane-side work, not a cm-website repo edit.

## Definition of done

`cbusillo/odoo-tenant-cm-website` has a Launchplane-managed runner lane only after all of this evidence exists:

- Launchplane desired-state record exists for `repository=cbusillo/odoo-tenant-cm-website`, `host_name=chris-testing`, `lane_name=cm-website-chris-testing`.
- The runner service is owned by systemd or an equivalent persistent supervisor, not by a GitHub Actions job process.
- The runner process runs as the expected constrained service user.
- GitHub inventory shows the lane online.
- Labels include `self-hosted`, `launchplane`, `launchplane-managed`, `chris-testing`, and `cm-website`.
- Baseline readiness passes after the service is running.
- A completed Launchplane audit record includes service state, GitHub inventory, labels, baseline evidence, and redacted provider evidence.
- A remove/restart path exists for the same managed lane and refuses unmanaged runners.

## Global slice rules

Do this as ordered PR slices. Do not skip ahead to live apply.

PRs 1-4 must be shippable without live host mutation. Their tests should assert no GitHub registration/remove token fetch, no `config.sh`, no privileged helper verb, no process spawn, and no host mutation unless explicitly mocked.

The first slice allowed to return a real `completed` maintainer audit must verify all completion gates in the definition of done. Anything less is a false green lane.

No code path may start a runner with `nohup ./run.sh`, plain `./run.sh &`, PID-file backgrounding, or any GitHub Actions job-owned process lifecycle.

## Execution path

### PR 1: Maintainer desired-state contract and planner only

Goal: define the durable desired-state model and fail-closed plan. No host mutation, no GitHub registration token fetch.

Create or modify:

- `control_plane/contracts/runner_lane_maintainer.py`
- `tests/test_runner_lane_maintainer.py`
- `docs/runner-lane-baseline.md`
- `docs/records.md`

Contracts:

- `RunnerLaneMaintainerDesiredState`
  - `repository`
  - `host_name`
  - `lane_name`
  - `registration_root`
  - `service_user`
  - `systemd_unit_name`
  - `labels`
  - `runner_version_policy`
  - `managed=true`
- `RunnerLaneMaintainerObservedState`
  - GitHub inventory lane, if any
  - local runner directory state
  - service/unit state
  - baseline readiness state
- `RunnerLaneMaintainerPlan`
  - status: `ready | blocked`
  - action: `create | adopt | reconcile | restart | remove`
  - observed shape: `absent | github_only | local_service_only | supervised_active | supervised_inactive | mismatched_labels | unknown_conflict`
  - blockers
  - next_steps
- `RunnerLaneMaintainerAuditRecord`
  - status: `planned | completed | failed`
  - desired state
  - observed pre/post state
  - redacted provider evidence
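
As an illustration only, the desired-state and plan contracts above could be modelled as frozen dataclasses with string enums. The field and enum names follow the list above; the choice of dataclasses, the tuple types, and the `__post_init__` guard are assumptions for this sketch, not the required implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class PlanStatus(str, Enum):
    READY = "ready"
    BLOCKED = "blocked"


class PlanAction(str, Enum):
    CREATE = "create"
    ADOPT = "adopt"
    RECONCILE = "reconcile"
    RESTART = "restart"
    REMOVE = "remove"


class ObservedShape(str, Enum):
    ABSENT = "absent"
    GITHUB_ONLY = "github_only"
    LOCAL_SERVICE_ONLY = "local_service_only"
    SUPERVISED_ACTIVE = "supervised_active"
    SUPERVISED_INACTIVE = "supervised_inactive"
    MISMATCHED_LABELS = "mismatched_labels"
    UNKNOWN_CONFLICT = "unknown_conflict"


@dataclass(frozen=True)
class RunnerLaneMaintainerDesiredState:
    repository: str
    host_name: str
    lane_name: str
    registration_root: str
    service_user: str
    systemd_unit_name: str
    labels: tuple[str, ...]
    runner_version_policy: str
    managed: bool = True


@dataclass(frozen=True)
class RunnerLaneMaintainerPlan:
    status: PlanStatus
    action: PlanAction | None  # assumption: a blocked plan may carry no action
    observed_shape: ObservedShape
    blockers: tuple[str, ...] = ()
    next_steps: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Fail closed: a plan that lists blockers must never be reported as ready.
        if self.blockers and self.status is PlanStatus.READY:
            raise ValueError("a plan with blockers cannot be ready")
```

Note that the plan type has no `completed` value at all, which is one way to make "planner cannot produce completed state" hold by construction.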

Fail-closed blockers:

- repository not allowlisted
- host not approved
- registration root outside allowlist
- missing `launchplane-managed` label
- unmanaged existing runner found
- duplicate lane name
- service user mismatch
- unit name mismatch
- path traversal or unsafe lane name
- local/GitHub stale state cannot be safely adopted or removed
- baseline missing or not ready for completed state
- mutate requested without idempotency/confirmation
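
A few of the blockers above can be sketched as a pure function that collects every failure instead of stopping at the first one. The allowlist contents, the `/srv/launchplane/runners` root, the lane-name pattern, and the function name `collect_blockers` are hypothetical placeholders; the real values would come from Launchplane configuration.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Hypothetical allowlists for illustration only.
ALLOWED_REPOSITORIES = frozenset({"cbusillo/odoo-tenant-cm-website"})
APPROVED_HOSTS = frozenset({"chris-testing"})
ALLOWED_ROOTS = (PurePosixPath("/srv/launchplane/runners"),)  # assumed path
REQUIRED_LABEL = "launchplane-managed"
# Assumed naming rule: lowercase letters, digits, and inner hyphens only.
SAFE_LANE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,62}[a-z0-9])?")


def collect_blockers(
    repository: str,
    host_name: str,
    lane_name: str,
    registration_root: str,
    labels: tuple[str, ...],
    *,
    mutate: bool = False,
    idempotency_key: str | None = None,
) -> list[str]:
    """Return every blocker found; an empty list means none of these checks failed."""
    blockers: list[str] = []
    if repository not in ALLOWED_REPOSITORIES:
        blockers.append("repository not allowlisted")
    if host_name not in APPROVED_HOSTS:
        blockers.append("host not approved")
    if not SAFE_LANE_NAME.fullmatch(lane_name):
        blockers.append("path traversal or unsafe lane name")
    root = PurePosixPath(registration_root)
    # PurePosixPath does not collapse "..", so traversal segments stay visible in parts.
    inside = any(root == allowed or allowed in root.parents for allowed in ALLOWED_ROOTS)
    if ".." in root.parts or not inside:
        blockers.append("registration root outside allowlist")
    if REQUIRED_LABEL not in labels:
        blockers.append("missing launchplane-managed label")
    if mutate and not idempotency_key:
        blockers.append("mutate requested without idempotency/confirmation")
    return blockers
```

Blockers that need observed state (unmanaged runner, duplicate lane, service user or unit mismatch, stale state, baseline) are left out of this sketch because they depend on inventory and host reads.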

Tests:

```bash
uv run python -m unittest tests.test_runner_lane_maintainer
uv run --extra dev ruff check control_plane/contracts/runner_lane_maintainer.py tests/test_runner_lane_maintainer.py
uv run --extra dev mypy control_plane/contracts/runner_lane_maintainer.py tests/test_runner_lane_maintainer.py
```

Acceptance:

- Dry-run never requests a GitHub token.
- Planner selects `create` for zero GitHub runners and no local managed service.
- Planner distinguishes absent, GitHub-only, local-service-only, active supervised, inactive supervised, mismatched-label, and unknown-conflict states.
- Planner returns `blocked` for an unmanaged matching lane, unsafe path, duplicate lane, or unknown conflict.
- Planner cannot produce completed state.
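
One possible way to derive the seven observed shapes from raw observations is sketched below. The inputs (a list of matching GitHub runner records plus two unit flags) and the precedence of the checks are assumptions; in particular, treating more than one matching runner as `unknown_conflict` is an illustrative choice, not something this issue specifies.

```python
from __future__ import annotations

from typing import Sequence

REQUIRED_LABELS = frozenset(
    {"self-hosted", "launchplane", "launchplane-managed", "chris-testing", "cm-website"}
)


def classify_observed_shape(
    github_runners: Sequence[dict],
    unit_exists: bool,
    unit_active: bool,
) -> str:
    """Map raw observations onto one of the seven observed shapes."""
    if len(github_runners) > 1:
        # Assumption: duplicate lane names are never resolved automatically.
        return "unknown_conflict"
    runner = github_runners[0] if github_runners else None
    if runner is None:
        return "local_service_only" if unit_exists else "absent"
    if not unit_exists:
        return "github_only"
    if not REQUIRED_LABELS <= set(runner.get("labels", ())):
        return "mismatched_labels"
    return "supervised_active" if unit_active else "supervised_inactive"
```

Keeping classification separate from action selection lets the tests assert each shape directly, without any token fetch or host read.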

### PR 2: Storage and service audit evidence

Goal: make maintainer audits durable before any live host mutation exists.

Create or modify:

- storage migration for `launchplane_runner_lane_maintainer_audits`
- `control_plane/storage/filesystem.py`
- `control_plane/storage/postgres.py`
- `control_plane/service.py`
- `tests/test_filesystem_store.py`
- `tests/test_postgres_store.py`
- `tests/test_service.py`
- `docs/service-boundary.md`
- `docs/records.md`

Route:

- `POST /v1/evidence/runner-lane-maintainer/audits`
- authz action: `runner_lane_maintainer_audit.write`
- idempotency key required

Tests:

```bash
uv run python -m unittest tests.test_filesystem_store tests.test_postgres_store tests.test_service
```

Acceptance:

- Planned, failed, and completed audit records persist.
- Filesystem keys are collision-safe and cannot path traverse.
- Service rejects unauthorized writes.
- Token strings are not persisted.
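
The filesystem-key and token acceptance points could be approached as in the sketch below. The function names, the SHA-256 key scheme, and the key-name-based redaction rule are assumptions for illustration; the actual store may use a different layout.

```python
from __future__ import annotations

import hashlib
import json

REDACTED = "[REDACTED]"


def audit_record_key(repository: str, lane_name: str, idempotency_key: str) -> str:
    """Derive a fixed-shape filename from untrusted identifiers."""
    # JSON-encoding the list keeps field boundaries, so ("a/b", "c") and ("a", "b/c") differ.
    payload = json.dumps([repository, lane_name, idempotency_key]).encode("utf-8")
    # The hex digest holds only [0-9a-f], so the key cannot contain "/" or "..".
    return hashlib.sha256(payload).hexdigest() + ".json"


def redact(value):
    """Return a copy with any value stored under a token-like key replaced."""
    if isinstance(value, dict):
        return {
            key: REDACTED if "token" in str(key).lower() else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Key-name redaction does not catch a token embedded inside a free-text string, so a real implementation would likely also avoid passing token values into audit payloads in the first place.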

### PR 3: Host service model and privileged helper contract

Goal: specify and test the systemd boundary before any systemd apply path exists. No live host mutation.

Create or modify:

- `control_plane/workflows/runner_lane_maintainer_executor.py`
- helper contract module, for example `control_plane/contracts/runner_lane_host_service.py`
- tests for service renderer/helper validation
- docs for host helper install policy

Required model:

- Use a persistent systemd system unit, for example `launchplane-runner@cm-website-chris-testing.service`.
- Unit runs as the constrained service user.
- Unit working directory is exactly `<registration_root>/<lane_name>`.
- Unit `ExecStart` points to `<runner-dir>/run.sh`.
- Unit has restart policy.
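
A unit renderer matching this model might look like the following sketch. It assumes `<runner-dir>` is the same directory as `<registration_root>/<lane_name>`, and the `Restart=always`, `RestartSec=5`, and network-ordering lines are illustrative choices rather than decided policy. The function name, the name pattern, and the example root are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumed naming rule for lane names and service users.
SAFE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,62}[a-z0-9])?")

UNIT_TEMPLATE = """\
[Unit]
Description=Launchplane runner lane {lane_name}
After=network-online.target
Wants=network-online.target

[Service]
User={service_user}
WorkingDirectory={runner_dir}
ExecStart={runner_dir}/run.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
"""


def render_unit(registration_root: str, lane_name: str, service_user: str) -> str:
    """Render unit text from validated inputs; no secrets are accepted as arguments."""
    if not SAFE_NAME.fullmatch(lane_name) or not SAFE_NAME.fullmatch(service_user):
        raise ValueError("unsafe lane name or service user")
    root = PurePosixPath(registration_root)
    if not root.is_absolute() or ".." in root.parts:
        raise ValueError("registration root must be an absolute path without '..'")
    return UNIT_TEMPLATE.format(
        lane_name=lane_name,
        service_user=service_user,
        runner_dir=root / lane_name,
    )
```

Because the renderer takes no token parameter, the rendered unit cannot embed a registration or remove token.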

Privileged boundary:

- Prefer a tiny root-owned helper with explicit verbs:
  - `install-or-update-unit`
  - `daemon-reload`
  - `enable-now`
  - `restart`
  - `stop`
  - `disable`
  - `remove-unit`
- If sudo is used, only allow the helper and fixed validated verbs.
- Do not grant arbitrary `systemctl`, arbitrary file write, or shell access.
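
The verb boundary above could be enforced with a fixed lookup table, as in this dry-run sketch. The unit naming scheme follows the example unit name in the required model; the function name `plan_helper_call`, the lane-name pattern, and the output shape are hypothetical. The sketch only plans a call and never executes anything.

```python
from __future__ import annotations

import re

# Assumed naming rule: lowercase letters, digits, and inner hyphens only.
SAFE_LANE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,62}[a-z0-9])?")

# Verbs from the helper contract mapped to fixed systemctl argument templates.
# None marks verbs that manage the unit file itself rather than calling systemctl.
VERB_ARGV = {
    "install-or-update-unit": None,
    "daemon-reload": ("systemctl", "daemon-reload"),
    "enable-now": ("systemctl", "enable", "--now", "{unit}"),
    "restart": ("systemctl", "restart", "{unit}"),
    "stop": ("systemctl", "stop", "{unit}"),
    "disable": ("systemctl", "disable", "{unit}"),
    "remove-unit": None,
}


def plan_helper_call(verb: str, lane_name: str) -> dict:
    """Validate a helper request and return a structured dry-run description."""
    if verb not in VERB_ARGV:
        raise ValueError(f"unsupported verb: {verb!r}")
    if not SAFE_LANE_NAME.fullmatch(lane_name):
        raise ValueError("unsafe lane name")
    # The caller never supplies a unit name; it is derived from the validated lane.
    unit = f"launchplane-runner@{lane_name}.service"
    template = VERB_ARGV[verb]
    argv = None if template is None else [part.format(unit=unit) for part in template]
    return {"verb": verb, "unit": unit, "argv": argv}
```

Deriving the unit name inside the helper is what rules out arbitrary unit names: the only caller-controlled inputs are a verb from the table and a lane name that passes the pattern.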

Tests:

```bash
uv run python -m unittest tests.test_runner_lane_maintainer
```

Acceptance:

- Helper rejects unsafe lane names, path traversal, mismatched user, mismatched host, roots outside allowlist, and arbitrary unit names.
- Helper exposes only explicit lane-scoped verbs.
- Helper dry-run output is structured and redacted.
- Unit renderer never embeds GitHub registration/remove tokens.
- No code path contains `nohup ./run.sh`, `./run.sh &`, or PID-file backgrounding.

### PR 4: Executor dry-run and mocked apply

Goal: implement maintainer executor behavior with injected/mocked adapters only. No live workflow dispatch yet.

Create or modify:

- `control_plane/workflows/runner_lane_maintainer_executor.py`
- `control_plane/cli_runner_lanes.py`
- `tests/test_runner_lane_maintainer.py`
- `docs/runner-lane-baseline.md`

Executor sequence:

1. Load desired state and pre-observed state.
2. Plan action.
3. If dry-run, write planned audit and stop.
4. If apply, require idempotency/confirmation.
5. Fetch GitHub registration token only when plan needs create/adopt/reconcile.
6. Run `config.sh` only inside approved root.
7. Use helper to install/update/start systemd service.
8. Read service status.
9. Re-read GitHub inventory.
10. Run/read baseline readiness.
11. Write completed audit only if all verification passes; otherwise failed audit.
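
The sequence above can be sketched with injected adapters so that tests never touch GitHub or a host. Every name here (`Adapters`, `run_maintainer`, the adapter fields, the audit dictionary shape) is hypothetical, and the sketch leaves the `remove` action out by failing closed on it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

TOKEN_ACTIONS = {"create", "adopt", "reconcile"}


@dataclass
class Adapters:
    """Injected side effects; tests pass fakes, so nothing touches a real host."""

    plan: Callable[[], dict]  # returns {"status": ..., "action": ...}
    fetch_registration_token: Callable[[], str]
    configure_runner: Callable[[str], None]  # would wrap config.sh inside the approved root
    start_service: Callable[[], None]  # would call the privileged helper
    service_ok: Callable[[], bool]  # enabled, active, expected process user
    github_ok: Callable[[], bool]  # lane online with required labels
    baseline_ok: Callable[[], bool]


def run_maintainer(
    adapters: Adapters,
    *,
    apply: bool = False,
    idempotency_key: str | None = None,
    confirmed: bool = False,
) -> dict:
    plan = adapters.plan()
    audit = {"plan": plan, "status": "planned"}
    if not apply:
        return audit  # dry-run: planned audit only, no token fetch
    allowed = plan["status"] == "ready" and plan["action"] != "remove"
    if not allowed or not (idempotency_key and confirmed):
        reason = "blocked plan, unsupported action, or missing confirmation"
        return {**audit, "status": "failed", "reason": reason}
    if plan["action"] in TOKEN_ACTIONS:
        token = adapters.fetch_registration_token()
        adapters.configure_runner(token)  # token is passed through, never stored in the audit
    adapters.start_service()
    gates = {
        "service": adapters.service_ok(),
        "github": adapters.github_ok(),
        "baseline": adapters.baseline_ok(),
    }
    # Completed only when every verification gate passes; anything else is failed.
    return {**audit, "gates": gates, "status": "completed" if all(gates.values()) else "failed"}
```

With this shape, "service active but GitHub offline fails" and the other acceptance cases reduce to swapping one fake adapter per test.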

Tests:

```bash
uv run python -m unittest tests.test_runner_lane_maintainer tests.test_runner_lane_registration
```

Acceptance:

- Apply success requires mocked service enabled/active + process user + GitHub online + labels + baseline ready.
- Service active but GitHub offline fails.
- GitHub online but service inactive fails.
- Baseline missing/not ready fails.
- Token value never appears in JSON/log payloads.
- Existing `runner-lane-registration-executor --mutate` remains failed or delegates only to this maintainer path; it must not revive the shortcut.

### PR 5: Manual workflow for maintainer dry-run only

Goal: expose a manual ops workflow that can collect dry-run evidence for cm-website.

Create or modify:

- `.github/workflows/runner-lane-maintainer.yml`
- `docs/operations.md`
- `docs/runner-lane-baseline.md`

Workflow properties:

- manual `workflow_dispatch` only
- runs on Launchplane ops lane, not product repo lane
- dry-run default
- apply inputs may exist, but live apply must fail closed until PR 6
- apply requires confirmation phrase and idempotency key
- uploads maintainer result artifact
- dry-run accepts only expected planned/blocked result shapes
- completed result is accepted only after the executor implements all completion gates

Dry-run proof command should target:

- repository: `cbusillo/odoo-tenant-cm-website`
- host: `chris-testing`
- lane: `cm-website-chris-testing`
- labels: `self-hosted`, `launchplane`, `launchplane-managed`, `chris-testing`, `cm-website`

Acceptance:

- Dry-run produces planned audit and artifact.
- No token fetch happens during dry-run.
- cm-website GitHub runner inventory remains zero runners after dry-run.

### PR 6: Live cm-website apply proof

Goal: create the first durable product runner lane only after PRs 1-5 are merged, deployed, and reviewed.

Steps:

1. Confirm `cbusillo/odoo-tenant-cm-website` runner inventory is zero.
2. Dispatch maintainer dry-run and archive artifact.
3. Dispatch maintainer apply with explicit confirmation and idempotency key.
4. Verify systemd unit enabled and active on `chris-testing`.
5. Verify process user is expected service user.
6. Verify GitHub inventory shows `cm-website-chris-testing` online.
7. Verify labels include all required labels.
8. Run baseline readiness after service is active, including Docker credential isolation and Buildx/toolchain evidence when the lane will receive build work.
9. Write completed audit only after every completion gate passes.
10. Run a tiny product-repo no-op workflow on the lane, if product routing is required.

Acceptance:

- Completed audit evidence exists.
- Remove/restart dry-runs are available for the managed lane.
- No product workflow has been changed to require the lane before audit evidence exists.
- cm-website issue is updated with the runner lane name and evidence links.

## cm-website gate

Until PR 6 completes, the cm-website agent may continue only Track A work that uses GitHub-hosted runners and Launchplane HTTPS APIs. It must not edit product workflows to require `self-hosted` or `cm-website-chris-testing`.

The cm-website agent may resume Track B self-hosted-runner-dependent work only when this issue has a completed audit artifact proving:

- lane online
- systemd service active
- labels correct
- baseline ready
- remove/restart control path present

## Validation commands

For each PR, run the focused tests above plus the repo gate appropriate to the changed files. Before merging any slice that touches service or storage, also run:

```bash
uv run python -m unittest tests.test_runner_lane_maintainer tests.test_runner_lane_registration tests.test_service tests.test_filesystem_store tests.test_postgres_store
uv run --extra dev ruff check <changed-python-files>
uv run --extra dev mypy <changed-python-files>
git diff --check
```

Before the live apply PR/dispatch, require main branch CI, Security, CodeQL, and Deploy Launchplane to pass.

## References

- Umbrella: #414
- Shortcut removal: #1234
- cm-website tracking issue: cbusillo/odoo-tenant-cm-website#6
- Docs: `docs/runner-lane-baseline.md`, especially the supervised maintainer section
- Current disabled executor: `control_plane/workflows/runner_lane_registration_executor.py`

