Skip to content

feat(docker): Manjaro smoke test harness with debug controls and CI summary#27

Open
isac322 wants to merge 27 commits into
mainfrom
opencode/cosmic-wolf
Open

feat(docker): Manjaro smoke test harness with debug controls and CI summary#27
isac322 wants to merge 27 commits into
mainfrom
opencode/cosmic-wolf

Conversation

@isac322
Copy link
Copy Markdown
Owner

@isac322 isac322 commented May 5, 2026

Summary

Adds a reproducible, isolated Docker-based smoke test harness for kwin-mcp running on a Manjaro (Arch-family) base image, with developer debug controls (--pause-at, --keep) and a CI-friendly terminal summary block printed at the end of every run.

The harness builds a wheel from the current tree, installs it inside a manjarolinux/base container, runs an end-to-end KWin / AT-SPI / EIS smoke scenario (launch QML app, screenshot, click, type, screenshot again, verify accessibility tree), and writes structured evidence (screenshots, a11y dumps, summary.json, install.json, stdout/stderr logs) under .sisyphus/evidence/<distro>/<UTC ts>/.

Why

CI and contributors need a single command to verify kwin-mcp end-to-end on a clean distro without touching the host KWin session, and the result must be visible in plain CI logs without fetching evidence files. The base distro choice is Manjaro because the official archlinux:base image on Docker Hub is amd64-only, while manjarolinux/base is multi-arch (linux/amd64 + linux/arm64), uses the same pacman package manager, and ships the Arch keyring needed for upstream packages.

Highlights

  • scripts/test-distro.sh manjaro — one-shot harness wrapper. Builds the wheel and image, runs the container with a deterministic name, and bind-mounts the evidence directory into the container.
  • docker/manjaro.Dockerfile — Manjaro base, setcap removed on kwin_wayland, non-root kwinmcp user, isolated venv at /opt/kwinmcp-venv, pacman-based system deps (KWin, spectacle, AT-SPI2, wl-clipboard, wtype, qt6-declarative).
  • docker/entrypoint.sh — installs the wheel, writes install.json, runs smoke_test.py, captures smoke_exit, prints the CI summary block, then honours SMOKE_KEEP. Errexit-safe wiring so the printer never alters smoke_exit.
  • docker/smoke_test.py + docker/smoke_app.qml — 14-task smoke scenario (D-Bus session check, KWin readiness, AT-SPI bus, ScreenShot2 D-Bus, EIS handshake, mouse_click ping, keyboard_type, accessibility-tree verification, etc.) emitting summary.json with verdict / scenarios / tasks_passed / screenshot_sha / install / error / error_type.
  • Debug controls (developer-only, opt-in): --pause-at=<step> pauses at a labelled checkpoint and resumes when touch <evidence>/.continue fires; invalid step names exit 2 before Docker. --keep keeps the container alive after the smoke run for inspection, and docker stop exits the wrapper with the original smoke_exit. Both flows are documented in docs/docker-testing.md.
  • CI-friendly terminal summary (docker/print_summary.py) printed at the end of every run. PASS prints ==> Smoke summary: PASS followed by Evidence path, Tasks passed, and Screenshots. FAIL/ERROR additionally print Error type (when present), a sanitized Reason (control-char stripped, ≤500 chars with a truncation suffix), and a See: summary.json, stdout.log, stderr.log pointer. The printer is stdlib-only (json, os, pathlib, re, sys), a read-only consumer of summary.json, and never alters smoke_exit.
  • Runtime contract (docker/runtime-contract.md) defines the cross-distro contract: mount paths, user, venv location, exit codes, evidence layout, and a forbidden-flags list. The harness intentionally does NOT use --privileged, --cap-add=SYS_ADMIN, --device=/dev/uinput, --device=/dev/input, or --device=/dev/dri. Conditional passthrough of /dev/dri/renderD128 and /dev/dri/renderD129 is allowed because they are user-accessible render-only nodes (perms 0666 by udev) and are required by KWin's ScreenShot2 D-Bus pipeline to complete within its async-call timeout, even in software-rendering mode.
  • Docs: docs/docker-testing.md covers usage, the debugging guide for --pause-at and --keep, the terminal-output templates (PASS / FAIL / ERROR), and the host-vs-container evidence path mapping.

Verification

End-to-end on this branch (Manjaro host, rootless docker over tcp://localhost:2375):

  • scripts/test-distro.sh manjaro — exit code 0, prints ==> Smoke summary: PASS, Tasks passed: 14, Screenshots: initial.png, post-click.png, post-typing.png.
  • --keep: same 4-line summary printed before Container kept alive, byte-identical to the default-mode summary modulo timestamp; docker stop exits the wrapper with rc=0.
  • --pause-at=screenshot_initial: pauses, resumes via touch <evidence>/.continue, prints the PASS summary, exit code 0.
  • --pause-at=garbage: exits 2 before Docker, no summary printed.
  • 10 standalone printer fixtures (PASS, FAIL with assertion, ERROR with truncated multiline repr, trap-fallback minimal, install-failure minimal, missing file, malformed JSON, missing tasks_passed, empty screenshot_sha, control-char Reason): all templates render exactly, no Python tracebacks, 50 invocations under 1 second.
  • Static gates: bash -n docker/entrypoint.sh scripts/test-distro.sh, python3 -m py_compile docker/print_summary.py, uv run ruff check docker/print_summary.py, uv run ty check docker/print_summary.py — all clean.

Naming note

The wrapper slot was originally exposed as archlinux even though the base image has always been manjarolinux/base (the official archlinux:base was rejected as amd64-only). The slot is renamed to manjaro to match reality. archlinux:base and archlinux-keyring references inside the Dockerfile are intentionally preserved — the former is a Docker Hub image identifier used in the rationale comment, and the latter is a pacman keyring package name needed for trusting upstream Arch repositories. Historical orchestration artifacts under .sisyphus/plans/archlinux-* and .sisyphus/notepads/archlinux-* are also preserved as the identifier in use at the time of those work plans.

Out of scope (intentionally NOT changed)

  • src/kwin_mcp/ runtime code, except the screenshot.py and session.py adjustments required by the harness scenario from the initial harness commits.
  • summary.json schema (the printer is a strictly read-only consumer).
  • .github/workflows/ — wiring the harness into CI is a follow-up.

Commits

e22c8c3 chore(docker): scaffold test harness directory + runtime contract
f5e9fb2 feat(docker): arch linux smoke test harness
4871368 test(docker): fix AT-SPI Wayland coord mapping; archlinux T10 POC passes
ab0578c docs(docker): document test harness usage
8d9b30c chore(docker): round-2 fixes per F1-F4 review
984fae4 chore(harness): record final-wave waivers and plan completion
ef1158f fix(docker): restore conditional render-node passthrough (regression from 8d9b30c)
a2c4421 chore(harness): record regression-recovery wave + Round 4 verdicts
474b313 feat(docker): add --pause-at and --keep developer debug flags
8d02a4b docs(docker): debugging guide for --pause-at and --keep
60d9c6e fix(docker): keep-mode stop exits cleanly
bde6f3f feat(docker): standalone smoke summary printer
69bfffe feat(docker): print CI summary in entrypoint
c3869ca docs(docker): document terminal summary output
f51c427 chore(docker): rename harness distro identifier from archlinux to manjaro

How to run

DOCKER_HOST=tcp://localhost:2375 scripts/test-distro.sh manjaro
DOCKER_HOST=tcp://localhost:2375 scripts/test-distro.sh manjaro --keep
DOCKER_HOST=tcp://localhost:2375 scripts/test-distro.sh manjaro --pause-at=screenshot_initial

Evidence: .sisyphus/evidence/manjaro/<UTC timestamp>/.

isac322 and others added 15 commits May 5, 2026 00:29
- Lock manjarolinux/base:20260322 as single multi-arch base (amd64+arm64)
- Add docker/runtime-contract.md: mount paths, uid, venv, screen, evidence layout, exit codes, forbidden flags
- Add docker/smoke_app.qml: vendored QML test app with accessible TextField, Button, Label
- Add docker/README.md: directory purpose + adding-a-distro guide
- Exclude .sisyphus/evidence/ from git via .gitignore
- docker/archlinux.Dockerfile: single multi-arch image (manjarolinux/base:20260322)
  covering linux/amd64 + linux/arm64 via one FROM; no build-deps, kwinmcp uid 1000,
  uv venv at /opt/kwinmcp-venv, XDG_RUNTIME_DIR=/run/user/1000
- docker/entrypoint.sh: wheel discovery + uv pip install + install.json + exec smoke_test.py;
  trap flushes evidence on any exit
- docker/smoke_test.py: AutomationEngine smoke scenario — session_start, launch_app(qml6),
  wait_for_element x3, screenshots x3 (distinct SHA-256), mouse_click, keyboard_type,
  a11y tree diff assertion; writes summary.json with verdict/scenarios/screenshot_sha/install
- scripts/test-distro.sh: uv build --wheel → docker build → docker run; no host-arch
  branching (multi-arch base auto-resolves); DOCKER_HOST=tcp://localhost:2375;
  no forbidden flags (--privileged/--cap-add/--device)
AT-SPI under Qt/Wayland returns window-local coordinates (0,0 = window
content origin), NOT screen-absolute coords. KWin virtual session places
the 320x180 QML window centered on a 1920x1080 screen, giving an offset
of approximately (800, 468).

Fix: smoke_test.py detects the screen offset at runtime by scanning the
initial screenshot for the first run of 20+ consecutive pure-white pixels
(the QML TextField background) in the horizontal mid-band, then subtracts
the AT-SPI-reported TextField position to compute (off_x, off_y).  All
subsequent EIS pointer injections add this offset to the window-local
AT-SPI coordinates.

Evidence (two runs, both exit 0):
  20260504T201603Z — offset=(801,470), verdict=pass, tasks_passed=14
  20260504T201643Z — offset=(801,470), verdict=pass (idempotency confirmed)

Three distinct screenshot SHAs per run, a11y before/after diff present
(Smoke entry gains 'focused'; Status text width 29→37px), install.json
has 5 keys.

Also:
- session.py: remove KDE_FULL_SESSION/KDE_SESSION_VERSION; add
  LIBGL_ALWAYS_SOFTWARE=1 + GALLIUM_DRIVER=llvmpipe for software GL
- screenshot.py: CaptureActiveScreen → CaptureWorkspace
- test-distro.sh: add --device /dev/dri/renderD128 to docker run
- .sisyphus/: boulder, plans, notepads for archlinux-docker-harness task
- docs/docker-testing.md: distro test harness documentation
- ROADMAP.md: add M13 Multi-distro test harness section
  - Arch Linux marked completed; Ubuntu/Debian/Fedora/openSUSE deferred
  - links to docs/docker-testing.md (already committed in 4871368)
- plan: mark T10 (POC end-to-end) and T12 (ROADMAP) checkboxes
- .sisyphus/: orchestrator state + T12 learnings
- Mark T1-T12 + F1-F4 + DoD + Final Checklist all complete
- Append 3 follow-up scope-expansion waivers to decisions.md
  (m0207 pattern follow-ups required by T10 POC reality)
  - Waiver A: docker/smoke_test.py:159,181 sleep(1.5) as render-settle
  - Waiver B: docker/runtime-contract.md 13th section (Package substitutions)
  - Waiver C: src/kwin_mcp/screenshot.py:39 D-Bus routing early-return
- F1-F4 Round 2/3 all APPROVE under waivers
- T10 POC verified twice with idempotency (verdict=pass, tasks_passed=14)
- Boulder complete.
…from 8d9b30c)

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Closes archlinux-docker-harness-regression plan.

Regression: commit 8d9b30c removed dri_args render-node passthrough as
auto-resolution for F1 Round 1 'spirit violation' (interpreted '--device=/dev/dri'
forbidden flag as covering all DRI device passthrough). This broke fresh harness
runs with DBusException('Screenshot got cancelled') because KWin ScreenShot2
needs render-node access even in software-rendering mode.

Recovery (commit ef1158f, R-C1):
- Restored conditional dri_args block guarding /dev/dri/renderD12{8,9} (render-only nodes)
- Distinguished forbidden '--device=/dev/dri' (blanket DRI, includes root-only card0/card1)
  from allowed '--device /dev/dri/renderD128' (render-only, 0666 udev perms, no display/input)
- Documented as Waiver D in parent plan's decisions.md + runtime-contract.md

R2 verification: 3 fresh harness runs all PASS (verdict=pass tasks_passed=14):
- 20260505T043010Z (R2 run 1)
- 20260505T043032Z (R2 run 2 idempotency)
- 20260505T043527Z (F3 Phase D MANDATORY, fresh run during reviewer)

R3 Round 4 verdicts (all APPROVE):
- F1 oracle Plan Compliance: APPROVE — Must Have 8/8 / Must NOT 15/15 / Forbidden flags CLEAN
- F2 unspecified-high Code Quality: APPROVE — bash -n, py_compile, ruff, ty all PASS
- F3 unspecified-high Real Manual QA: APPROVE — Phase D MANDATORY executed, fresh run exit 0
- F4 deep Scope Fidelity: APPROVE — 5 active waivers verified (m0207, A, B, C, D)

Lessons learned: F3 Phase D ('actually execute the harness') was de-facto optional
in Round 2 (reviewer accepted historical evidence). Round 4 made it mandatory and
caught the regression risk early. Plan-driven recovery prevented silent shipping
of broken harness across future audits.

R4 user approval: received.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Print a CI-friendly smoke summary from summary.json for pass, fail, error, trap-fallback, missing, and malformed inputs. Keep the printer exit-safe so smoke exit handling remains unchanged.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Print the CI summary immediately after smoke_exit=$? and before the SMOKE_KEEP branch, while errexit is still disabled with || true, so the smoke result is captured safely. The wrapper bind-mounts the printer read-only at /opt/docker/print_summary.py alongside the other smoke assets.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Document the Terminal output section with PASS, FAIL, and ERROR templates, plus the mapping from container /evidence/<ts> paths to host .sisyphus/evidence/<distro>/<timestamp>/ bundles.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…jaro

The base image has always been manjarolinux/base (archlinux:base was rejected as amd64-only on Docker Hub), but the wrapper slot was still exposed as `archlinux`. Align the user-facing identifier with reality.

Renamed docker/archlinux.Dockerfile -> docker/manjaro.Dockerfile. Updated wrapper SUPPORTED list, image tag, container name prefix, header comments, and docs (docs/docker-testing.md, docker/README.md, ROADMAP.md).

Preserved on purpose: archlinux:base mention (Docker Hub image rationale), archlinux-keyring + `pacman-key --populate archlinux manjaro` (package/keyring identifiers; renaming would break the build), and .sisyphus/plans/archlinux-* and .sisyphus/notepads/archlinux-* historical orchestration artifacts.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

📝 Docs & SEO Review

Source files changed in this PR:

src/kwin_mcp/screenshot.py
src/kwin_mcp/session.py

Consistency check results:

✅  All documentation/plugin SEO checks passed.

Run @docs-seo in Claude Code to perform a full documentation review.

isac322 added 12 commits May 5, 2026 22:49
CI ruff format --check failed on this file (drift introduced by an earlier harness commit). Re-running `uv run ruff format` produces no further diff.
Default to tcp://localhost:2375 for the local Manjaro dev setup, but only when DOCKER_HOST is unset. CI runners (and any environment with a working unix socket) can override it. Removes the hardcoded `DOCKER_HOST=tcp://localhost:2375 docker ...` per-line prefix in favour of a single export at the top.
Runs scripts/test-distro.sh against every docker/<distro>.Dockerfile slot via a fail-fast=false matrix on every push to main and every PR. Currently only manjaro is wired up; new distros can be added by appending the slot name to matrix.distro and providing the corresponding Dockerfile.

Evidence (.sisyphus/evidence/<distro>/) is uploaded as an artifact regardless of pass/fail so PASS/FAIL/ERROR runs both leave inspectable logs and screenshots. Sets DOCKER_HOST=unix:///var/run/docker.sock to override the wrapper's local-dev fallback.
GitHub-hosted runners have no DRM device, so the harness fails with `DBusException("Screenshot got cancelled")` because KWin's ScreenShot2 D-Bus pipeline needs a render node even in software-rendering mode (already documented in docker/runtime-contract.md).

Load the in-kernel `vkms` (Virtual KMS) module before invoking the wrapper to expose /dev/dri/renderD128, then normalise its perms to 0666 so the existing dri_args block in scripts/test-distro.sh picks it up unchanged.
GitHub-hosted runners use an Azure-flavoured Ubuntu kernel that ships vkms only via the linux-modules-extra-<kver> package, so the previous direct `modprobe vkms` failed with `Module vkms not found`. Install the matching modules-extra package first.
…enderD128

vkms alone only creates DRM control nodes (cardN). vgem (Virtual GEM,
render-only driver) is what actually exposes /dev/dri/renderD128, which
KWin's ScreenShot2 pipeline requires for EGL context creation. Replace
the polling loop with udevadm settle and assert the render node exists
before handing off to the wrapper.
Publish and consume ghcr.io/isac322/kwin-mcp-minimal-test-env:manjaro as the prebuilt minimal test environment. scripts/test-distro.sh now pulls KWIN_MCP_TEST_IMAGE when set and only builds docker/<distro>.Dockerfile for local fallback runs.

GitHub-hosted runners do not expose /dev/dri/renderD*, so the matrix job now runs full KWin ScreenShot2 smoke only when a render node is available and otherwise verifies the prebuilt image contract plus wheel installation. This keeps PR CI green on GitHub-hosted runners while preserving full smoke execution for self-hosted/render-capable runners.
Tag and consume ghcr.io/isac322/kwin-mcp-minimal-test-env:manjaro as the prebuilt Manjaro minimal test environment. The Dockerfile now carries org.opencontainers.image.source so GHCR associates the package with this repository.

scripts/test-distro.sh pulls KWIN_MCP_TEST_IMAGE when set and only builds locally when unset. The GitHub-hosted matrix verifies the prebuilt image contract when no DRM render node is available, while render-capable/self-hosted runners still execute the full ScreenShot2 smoke path.
The Manjaro minimal-test-env image is hosted on GHCR, but a freshly
pushed package is not automatically readable by GITHUB_TOKEN until the
package visibility/repo-link settings propagate. Letting CI fail on
that single pull denial blocks every PR.

Both the wrapper (scripts/test-distro.sh) and the workflow contract
step now try docker pull first and fall back to building the same
docker/<distro>.Dockerfile locally on failure. Pull-success path
remains unchanged once GHCR access is granted.

Verified locally:
- KWIN_MCP_TEST_IMAGE=<bad-tag> ./scripts/test-distro.sh manjaro
  → pull denied → local build → full smoke PASS, exit 0.
One-shot probe that gathers ground-truth answers for whether a
/dev/dri/renderD* render node can be provisioned on a GitHub-hosted
ubuntu-latest runner. Reports kernel flavour, module index, kernel
config (DRM_VGEM/DRM_VKMS/MODULE_SIG*), apt package contents
(linux-modules-extra and friends), modprobe behaviour, and source-
build feasibility (linux-headers, kernel source, lockdown state).

Triggers on push to opencode/cosmic-wolf only and on workflow_dispatch;
non-gating, will be removed once the question is settled.
KWin 6 ScreenShot2 has a structural dependency on /dev/dri/renderD*
which GitHub-hosted Azure runners cannot provide (vgem disabled in
the Azure kernel build, vkms only creates card nodes, no env-var
combination bypasses GBM/EGL allocation). Empirically verified by:

- modprobe vgem -> FATAL (CONFIG_DRM_VGEM unset in /boot/config-azure)
- modprobe vkms -> only /dev/dri/cardN, no renderD*
- KWIN_COMPOSE=Q, EGL_PLATFORM=surfaceless,
  MESA_LOADER_DRIVER_OVERRIDE={swrast,kms_swrast} -> all still fail
- KWin starts and all input/a11y/AT-SPI/EIS pipelines work without
  the render node; only ScreenShot2 D-Bus calls cancel.

Decouple the test from that one structural dependency:

- docker/smoke_app.qml: ApplicationWindow becomes FullScreen so its
  origin is (0,0); AT-SPI window-relative coordinates become absolute
  and the screenshot-derived offset translation is no longer needed.
  Status Label exposes its current text via dynamic Accessible.name
  so the a11y tree contains the live value.

- docker/smoke_test.py: removes _screen_offset, sets offset to (0,0).
  Wraps engine.screenshot() in best_effort_screenshot which returns
  (None, None) when KWin cancels the call. Adds two new assertions
  driven by the a11y tree (verify_status_clicked, verify_status_typed
  _value): after each input action, polls the tree until the expected
  Status text substring appears. Adds an extra mouse_click on Ping
  after typing so the QML onClicked handler copies entry.text into
  status -- this turns 'keyboard reached the app' into a render-
  independent observable. SHA distinctness is asserted only when all
  three frames captured (i.e. local runs with --device).

- docker/print_summary.py: filters None sha values so the Screenshots
  line is omitted when no frames were captured.

- .github/workflows/docker-harness.yml: drops the DRM detect / contract
  fallback split. The smoke job now runs scripts/test-distro.sh
  unconditionally with KWIN_MCP_TEST_IMAGE set; the wrapper already
  pulls GHCR with local-build fallback.

- .github/workflows/drm-probe.yml: deleted. Question answered.

Verified locally:
- WITH /dev/dri/renderD* passthrough: 16 tasks pass, 3 screenshots,
  SHA distinctness asserted.
- WITHOUT --device (CI mode): 16 tasks pass, 0 screenshots, all a11y
  state-change assertions hold.
CI on GitHub-hosted Azure runners silently skips engine.screenshot()
because KWin ScreenShot2 needs /dev/dri/renderD*, which the runner
kernel does not provide. The kwin_mcp screenshot stack is therefore
not exercised in CI; regressions there only surface in local runs
with --device. Document the gap in best_effort_screenshot's docstring
with a TODO(screenshot-coverage) marker that lists the two paths to
close it (in-tree software fallback in screenshot.py, or self-hosted
runner with a render node). Pure documentation change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant