Add cross-distro integration CI#19

Open
isac322 wants to merge 57 commits into main from chore/integration-ci-rebased

@isac322 isac322 commented May 2, 2026

Summary

  • Add cross-distro GitHub Actions integration workflow for Arch Linux, Fedora, openSUSE Tumbleweed, and Ubuntu containers.
  • Add pytest integration coverage that drives kwin-mcp through MCP stdio and verifies observable GUI outcomes with KWin, kcalc, clipboard, screenshots, and AT-SPI.
  • Add per-distro setup scripts and a local Docker reproduction helper.

Local verification

  • uvx ruff check .
  • uvx ruff format --check .
  • uv build
  • python3 scripts/check_docs_seo.py
  • python3 -m compileall -q src tests scripts

Note: local uv sync --group test could not run on this host because system build dependencies for pycairo/cairo are missing; Docker is also unavailable here. The PR's integration workflow installs those dependencies in each matrix container.

github-actions Bot commented May 2, 2026

📝 Docs & SEO Review

Source files changed in this PR:

pyproject.toml
src/kwin_mcp/core.py
src/kwin_mcp/input.py
src/kwin_mcp/screenshot.py
src/kwin_mcp/session.py

Consistency check results:

✅  All documentation/plugin SEO checks passed.

Run @docs-seo in Claude Code to perform a full documentation review.

sisyphus-dev-ai and others added 28 commits May 3, 2026 02:52
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Replace the explicit `zypper remove busybox-diffutils` step with a single `zypper install --force-resolution` call. The remove step cascades on Tumbleweed and can wipe /usr/bin/sh, after which the next workflow step exits with `OCI runtime exec failed: exec: "sh": executable file not found in $PATH`.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Fedora, openSUSE, and Ubuntu/kdeneon containers segfault kwin_wayland during the xdg-desktop-portal cascade: KWin auto-requests org.freedesktop.portal.Desktop on startup, the kde backend then fails to connect to a Wayland display before KWin's socket exists, the activation chain collapses, and SIGABRT propagates back into KWin. Renaming the four .service files makes D-Bus return ServiceUnknown immediately, so KWin never blocks on a portal that cannot come up. Arch is left untouched because its 7 currently-passing tests already work without this workaround and the spectacle screenshot path there relies on portal Screenshot on some versions.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
OpenSUSE Tumbleweed's setup script triggers a force-resolution package replacement that wipes /bin/sh, so the next workflow step (which defaults to 'sh -e {0}') fails with 'OCI runtime exec failed: sh: not found'. Bash is preinstalled or installed early in every matrix container, so making bash the default shell at the workflow level bypasses the broken /bin/sh symlink without changing setup-time package logic.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
After portal-disable removed the xdg-desktop-portal cascade noise, the underlying KWin segfault on Fedora, Ubuntu, and openSUSE Tumbleweed is now isolated and visible. Arch is the only matrix entry that already worked through KWin startup, and xorg-xwayland is the one obvious environmental difference: Arch installs it, the other three did not. kwin_wayland --virtual default-spawns Xwayland during compositor init, and a missing binary appears to crash KWin on these distros. Adding xwayland (xorg-x11-server-Xwayland on Fedora, xwayland on Debian/openSUSE) restores parity.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add --no-global-shortcuts and the XKB_DEFAULT_* / KWIN_XKB_DEFAULT_KEYMAP env vars to the kwin_wayland --virtual invocation, mirroring KDE's own selenium-webdriver-at-spi run.rb (kwin_reexec! function).

These are the right defaults for a headless virtual KWin compositor used for automation: a predictable us-layout keymap and no attempt to register global shortcuts via D-Bus services that may not be running inside a stripped Fedora/Ubuntu/openSUSE container. The flag also matches what passes on Arch transitively, so it is a defensive alignment with no feature regression for our automation use case.

Refs: https://github.com/KDE/selenium-webdriver-at-spi/blob/master/run.rb

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add util-linux to the early zypper install (does not conflict with the stock busybox-* providers, unlike coreutils which must stay in the --force-resolution batch) and a final command -v bash guard.

The busybox cascade triggered by --force-resolution diffutils can leave /usr/bin/bash and /usr/bin/sh in an inconsistent state on Tumbleweed. The next workflow step uses defaults.run.shell: bash and would otherwise fail with an opaque OCI runtime exec failed: exec: "bash": not found in an unrelated step. The guard surfaces the failure here, before pytest.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add a workflow step between dependency sync and pytest that prints kwin_wayland --version, the relevant ldd output (EGL / gbm / drm / GL / vulkan / xkb), the presence of /dev/dri, the EGL/gbm libraries on common library paths, and the xkb-data layout files.

KWin segfaults seen on Fedora and Ubuntu containers in Rounds 7-9 left an empty stderr; we have been guessing at the missing component. The diagnostic captures the actual post-setup environment so the next round's fix can target whatever ldd or xkb-data check shows missing instead of speculating on libraries to install.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 10's diagnostic step only verified library presence; kwin_wayland --version succeeded and ldd resolved every required lib on both Arch (which works) and Fedora (which segfaults), so the dependency layer cannot explain the divergence.

Extend the same step with a direct kwin_wayland --virtual launch wrapped in dbus-run-session and gdb --batch. dbus-run-session matches the test fixture's session-bus environment (raw kwin_wayland refuses to start without one), and gdb captures a synchronous backtrace of the SIGSEGV without depending on the host kernel's core_pattern (read-only inside GHA containers, so file-based core dumps are unreliable). QT_LOGGING_RULES='kwin*=debug' gives KWin a chance to log before crashing; QT_DEBUG_PLUGINS is intentionally omitted to avoid burying the actual crash output behind QPA probe noise.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The new diagnostic step launches kwin_wayland --virtual under gdb --batch to capture a synchronous backtrace of the SIGSEGV that has been silently failing Fedora and Ubuntu jobs. gdb must be present on PATH for the workflow's diagnostic step to produce useful output instead of an exec-not-found.

Add the package via the native package manager on each distro: pacman on Arch, dnf on Fedora, apt on Ubuntu, zypper on openSUSE. The gdb package on every supported image is from the upstream distribution and brings only standard C/C++ debugger tooling, so the install does not interact with the existing portal/Xwayland/libei layout already provisioned by these scripts.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 11's diagnostic step launched kwin_wayland under dbus-run-session + gdb and proved KWin starts cleanly on Fedora and Ubuntu containers (8s timeout, no crash). The pytest path's identical-args invocation crashes silently, so the segfault must come from something the test fixture's wrapper does between dbus-run-session and the bare kwin_wayland invocation (AT-SPI bus launcher, dbus-update-activation-environment, KDE/XDG env). The bash-only post-mortem currently emitted by the wrapper does not say which call site triggers SIGSEGV.

Add a KWIN_MCP_DEBUG=1 env-gated branch that wraps kwin_wayland in gdb --batch with thread-apply-all-bt-full and info-sharedlibrary. The production code path (KWIN_MCP_DEBUG unset, or gdb absent) stays byte-identical: KWIN_RUNNER expands to nothing and the env ... kwin_wayland line runs as before. KWin's stderr is also tee-d to /tmp/kwin-mcp-kwin-$$.log so a workflow if-failure step can surface the actual pre-crash output that pytest does not normally capture.
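
The gating described above can be sketched in Python as follows. This is an illustrative rendering only: the commit says the real branch lives in the wrapper script emitted by src/kwin_mcp/session.py, and the function name `kwin_command` and its injectable `which` parameter are assumptions made here so the logic can be exercised without gdb installed.

```python
import os
import shutil
from typing import Callable, Mapping, Optional, Sequence


def kwin_command(
    kwin_args: Sequence[str],
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv for kwin_wayland, gdb-wrapped only in debug mode.

    Hypothetical helper; not the project's actual API.
    """
    base = ["kwin_wayland", *kwin_args]
    # Only look for gdb when the debug switch is set.
    gdb = which("gdb") if env.get("KWIN_MCP_DEBUG") == "1" else None
    if gdb is None:
        # Production path: switch unset or gdb absent, argv is unchanged.
        return base
    return [
        gdb, "--batch",
        "-ex", "run",
        "-ex", "thread apply all bt full",
        "-ex", "info sharedlibrary",
        "--args", *base,
    ]
```

Keeping the fallback to the unwrapped argv in one place is what lets the non-debug path stay identical whether the variable is unset or gdb is missing.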

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Set KWIN_MCP_DEBUG=1 on the matrix job env so the new gdb-wrapped branch in src/kwin_mcp/session.py activates during CI test runs. Add an if-failure step that cats /tmp/kwin-mcp-kwin-*.log and a separate if-failure upload-artifact step so the same logs are downloadable from the run page.

Without these surfacing steps the gdb backtrace would be written to a /tmp file inside an ephemeral container and lost on job teardown. The cat step keeps the backtrace inline with the failure log for fast triage; the upload step preserves the raw file for cases where the inline output is truncated by GitHub's log size limits.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 12 redirected only stderr to the tee'd log file, but gdb's batch-mode 'thread apply all bt full' output goes to stdout. The CI surface step therefore saw only the gdb 'Error disabling address space randomization' warning and not the actual backtrace, leaving the segfault still undiagnosed.

Reorder the redirection so stdout flows through the tee process substitution and stderr is then duped to it via 2>&1. Both gdb's batch output (backtrace, info-sharedlibrary) and KWin's stderr now end up in /tmp/kwin-mcp-kwin-$$.log and on the wrapper's stderr (where the test fixture already reads it). The wrapper's own stdout (DBUS_SESSION_BUS_ADDRESS / READY signal lines) stays clean.
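
The same idea, expressed in Python rather than shell redirection, might look like the sketch below. It is not the wrapper's actual code: `run_and_tee` is a hypothetical name, and `stderr=subprocess.STDOUT` stands in for the `2>&1` duplication so that a child's stdout (where gdb's batch output goes) and its stderr land in one stream that is written both to a log file and to an echo stream.

```python
import subprocess
import sys
from typing import Sequence, TextIO


def run_and_tee(cmd: Sequence[str], log_path: str, echo: TextIO = sys.stderr) -> int:
    """Run cmd, copying its combined stdout+stderr to log_path and to echo."""
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the stdout pipe
            text=True,
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            log.write(line)   # persisted copy for later artifact upload
            echo.write(line)  # live copy for whoever reads the wrapper's stderr
        return proc.wait()
```

Because the merged stream is echoed to `echo` (stderr by default) and never to the caller's stdout, lines the caller prints on its own stdout stay free of debugger output.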

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 13's gdb backtrace pinpointed the segfault: KWin's main → QApplicationPrivate::init → QGuiApplicationPrivate::createPlatformIntegration → init_platform → QKdeTheme::createKdeTheme → QKdeThemePrivate::refresh → QGuiApplicationPrivate::handleThemeChanged → QStyleHintsPrivate::update(QPlatformTheme*) — null deref. Identical Thread 1 stack on Fedora and Ubuntu.

Qt selects QKdeTheme because _build_env exports XDG_CURRENT_DESKTOP=KDE and KDE_FULL_SESSION=true. The theme then tries to load kdeglobals defaults that the slim kwin-wayland packages on Fedora and Ubuntu do not provide (Arch's kwin transitively pulls plasma-frameworks, which ships them, so Arch survives). The half-initialized QKdeTheme is then passed to QStyleHintsPrivate::update which dereferences a null member.

Set QT_QPA_PLATFORMTHEME=generic in _build_env so Qt picks QGenericUnixTheme inside the virtual session, bypassing the QKdeTheme init path entirely. Children inherit the env var, so D-Bus-activated apps (kcalc, kate) the tests launch get the same generic theme. None of the integration tests assert theme-derived properties (they cover AT-SPI tree, screenshot pixels, mouse/keyboard injection, clipboard), so the visual downgrade is invisible to the test surface. The other KDE_*/XDG_* env vars stay because non-Qt code paths (KIO, KConfig, kded) still need the KDE-session signal to behave correctly.
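
A minimal sketch of the environment contract this commit describes, assuming a simplified stand-in for `_build_env` (the real function presumably sets more variables; `build_env` and its `base` parameter are hypothetical):

```python
import os
from typing import Mapping


def build_env(base: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return a child-process environment for the virtual KWin session."""
    env = dict(base)
    # Kept so non-Qt code paths still see a KDE session.
    env["XDG_CURRENT_DESKTOP"] = "KDE"
    env["KDE_FULL_SESSION"] = "true"
    # Ask Qt for the generic platform theme instead of the KDE one.
    env["QT_QPA_PLATFORMTHEME"] = "generic"
    return env
```

Since the override is written after copying `base`, an inherited `QT_QPA_PLATFORMTHEME` from the host cannot leak into the session, and child processes started with this mapping inherit the same value.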

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The diagnostic kwin_wayland --virtual launch in the workflow did not set QT_QPA_PLATFORMTHEME=generic, so it ran a different Qt theme code path than the production session.py wrapper now does. A diagnostic that does not match the production code path can mask future regressions: a change that breaks production but happens to leave the diagnostic working would still go green.

Add the env var to the diagnostic invocation so both paths pick the same Qt theme (generic). Cheap insurance against env-contract drift.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 14 unblocked Fedora (KWin segfault gone). Ubuntu now reaches the test layer but session_start fails with 'No such object path /org/kde/KWin/EIS/RemoteDesktop'. KWin starts and listens on its Wayland socket on Ubuntu, but does not register the EIS D-Bus path that session_start introspects to confirm input injection is available.

Comparing the package install logs: Fedora pulls libeis-1.5.0 (the EIS server library) as a transitive dep of kwin-wayland; Ubuntu's kwin-wayland 4:6.6.4-0ubuntu1 only pulls libei1 (the client library). Without libeis available to the dynamic linker, KWin's EIS plugin can't load and the RemoteDesktop interface stays unregistered. Add libeis-1.0-0 (Debian/Ubuntu naming for libeis.so.0 with API 1.0) to the bootstrap apt-get install group.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@isac322 isac322 force-pushed the chore/integration-ci-rebased branch from 6eed084 to e0adddd May 3, 2026 03:29
isac322 and others added 5 commits May 3, 2026 12:40
Round 14 unblocked Fedora's KWin segfault. Round 15's libeis1 install on Ubuntu confirmed the runtime library is present but kwin-wayland still does not register /org/kde/KWin/EIS/RemoteDesktop, indicating the resolute/universe build of kwin-wayland 4:6.6.4-0ubuntu1 was compiled with KWIN_BUILD_EIS=OFF — not fixable from this repo without rebuilding kwin.

Add a diagnostic step that surfaces ldd output (eis/libei linker bindings) and dpkg contents (any EIS plugin file) so future readers can verify the EIS-OFF claim against the live container instead of trusting commit text.

Mark ubuntu and opensuse-tumbleweed with continue-on-error via a matrix.experimental flag so their distro-specific defects do not gate the Arch and Fedora signal. Both jobs stay in the matrix so any future fix gets exercised; the comment block above the continue-on-error line records why the flag exists and warns against silent removal.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…apture

capture_screenshot_to_file went straight to spectacle while the sibling capture_frame_burst tried D-Bus first. In headless CI containers we disable xdg-desktop-portal to keep KWin from segfaulting on activation, which means spectacle 6 (which routes through the portal Screenshot interface) has nothing to talk to and times out after 10s. The fix mirrors the burst path: attempt capture_screenshot_dbus first, fall back to spectacle only on dbus.DBusException. The Gemini 3.1 Pro Preview verification round confirmed this is a code-side defect (no deliberate spectacle-only intent) and that the fix is complete.

Verified by: live read of src/kwin_mcp/screenshot.py and core.py:322 by Gemini 3.1 Pro Preview Custom Tools (sub-agent verification).
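
The D-Bus-first ordering can be sketched generically as below. The names are assumptions for illustration: `capture_with_fallback` is hypothetical, the two callables stand in for capture_screenshot_dbus and the spectacle path, and `recoverable` would be dbus.DBusException in code that imports dbus-python.

```python
from typing import Callable, Type


def capture_with_fallback(
    primary: Callable[[str], None],
    fallback: Callable[[str], None],
    path: str,
    recoverable: Type[BaseException],
) -> str:
    """Try the primary capture; use the fallback only on the named error."""
    try:
        primary(path)
        return "dbus"
    except recoverable:
        # Any other exception type propagates to the caller untouched.
        fallback(path)
        return "spectacle"
```

Narrowing the `except` clause to one exception type means programming errors in the primary path are not silently converted into a slow fallback attempt.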

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
After 25 seconds of wait_for_element polling, the next mouse_click hit a libei 'failed to send message: Broken pipe' warning and silent event drop, which left kcalc unfocused and the clipboard empty for both keyboard_type and mouse_arithmetic tests on Arch and Fedora.

Live source review (KWin eisbackend.cpp QDBusServiceWatcher::serviceUnregistered → EisContext destruction; libei has no periodic ping) traced the chain to: dbus-python kept the BusConnection open without a running GLib loop, so incoming broadcasts (NameOwnerChanged, AT-SPI signal flood from the AT-SPI subprocess polling on the same isolated bus) accumulated until dbus-daemon disconnected our client for outgoing-buffer overflow. KWin's serviceWatcher then destroyed the EisContext, closing the EIS socket; the next libei dispatch wrote to a dead fd.

Fix: spin a GLib.MainLoop on a daemon thread for the lifetime of EISClient so dbus-python drains incoming signals as they arrive. close() quits the loop and joins with a 1s timeout for clean shutdown. The libei ei_dispatch and dbus.bus.BusConnection objects are unaffected.

Verified by: live source code analysis from Gemini 3.1 Pro Preview Custom Tools sub-agent. The original libei keepalive hypothesis was wrong; the real cause is D-Bus connection death, which requires draining signals at the dbus-python layer (not libei layer).
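
The lifecycle this commit describes (start a loop on a daemon thread, quit and join with a 1s timeout on close) can be sketched without GLib as follows. `LoopThread` and the `Loop` protocol are hypothetical names; any object with `run()` and `quit()`, which GLib.MainLoop provides, would fit.

```python
import threading
from typing import Protocol


class Loop(Protocol):
    def run(self) -> None: ...
    def quit(self) -> None: ...


class LoopThread:
    """Run a blocking event loop on a daemon thread until close()."""

    def __init__(self, loop: Loop) -> None:
        self._loop = loop
        self._thread = threading.Thread(
            target=loop.run, name="dbus-drain", daemon=True
        )
        self._thread.start()

    def close(self, timeout: float = 1.0) -> bool:
        """Ask the loop to stop; return True if the thread exited in time."""
        self._loop.quit()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

The daemon flag keeps a stuck loop from blocking interpreter exit, and the bounded join keeps close() from hanging. A real implementation may also need to handle quit() arriving before run() has started.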

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
After the diffutils/gettext-tools/coreutils install with --force-resolution, zypper transitively removes busybox and busybox-coreutils. On Tumbleweed minimal images, busybox owns the alternatives entry that backs /usr/bin/bash itself, so the GNU bash binary disappears alongside busybox. The next workflow step then exits with 'OCI runtime exec failed: exec: bash: executable file not found in $PATH' before the existing post-step bash guard can run.

Fix: zypper install --force --no-recommends bash immediately after the force-resolution batch, so /usr/bin/bash is restored before the next OCI exec resolves it. The trailing 'ln -sf /usr/bin/bash /bin/sh' stays as belt-and-suspenders for /bin/sh which busybox also previously owned.

Verified by: Gemini 3.1 Pro Preview Custom Tools sub-agent confirmed --force is idempotent on Tumbleweed and has no negative side effects (just reinstalls).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
ubuntu:26.04 ships kwin-wayland 4:6.6.4-0ubuntu1 from universe, built without EIS (KWIN_BUILD_EIS=OFF). The /org/kde/KWin/EIS/RemoteDesktop D-Bus path session_start needs is never registered, so all 7 GUI tests fail with UnknownObject. Round 16's diagnostic step proved this with empty ldd / dpkg output.

KDE Neon's KWin builds explicitly enable EIS — the build log dated 2026-03-18 shows 'Libeis-1.0 (required version >= 1.4) Required for emulated input handling'. The official current container image is invent-registry.kde.org/neon/docker-images/all:unstable; the Docker Hub kdeneon/all image is stale (3 years old) and was correctly avoided in earlier rounds. The Neon image is Ubuntu-based, so the existing setup-ubuntu.sh apt-get installs are no-ops on already-present packages and continue to work.

Verified by: Gemini 3.1 Pro Preview Custom Tools sub-agent confirmed the registry is public/unauthenticated, no GitHub Actions rate-limit gotchas, and that setup-ubuntu.sh does not need rewriting for the Neon base.

The experimental flag stays this round as a safety net while we confirm the swap works end-to-end; remove it in a follow-up once CI shows ubuntu green.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@isac322 isac322 force-pushed the chore/integration-ci-rebased branch from 47e2305 to 98fcbe1 May 3, 2026 06:04
isac322 and others added 23 commits May 3, 2026 15:50
…lback

KWin's CaptureActiveScreen returns InvalidScreen in headless --virtual sessions before any window maps, because workspace()->activeOutput() is null (KWin source: src/plugins/screenshot/screenshotdbusinterface2.cpp). We were silently catching the D-Bus exception and falling through to spectacle, which then hung 10s because xdg-desktop-portal is disabled in CI to prevent KWin segfaults. CaptureWorkspace bypasses the active-output check and uses effects->virtualScreenGeometry() directly.

Also surface the exact D-Bus error (name + message) to stderr before the fallback so future failures are diagnosable without code changes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 17 added a background GLib MainLoop to drain D-Bus signals so dbus-daemon would not kick the client off the bus for outgoing-buffer overflow during the AT-SPI signal flood from wait_for_element. That drain still failed because dbus-python is not thread-safe by default; without dbus.mainloop.glib.threads_init() the background loop cannot safely dispatch signals across threads, the buffer fills anyway, and the daemon kicks the client. KWin's EisBackend watches our D-Bus name via QDBusServiceWatcher (eisbackend.cpp), destroys the EisContext on serviceUnregistered, and surfaces as 'libei: failed to send message: Broken pipe' on the next input call. dbus-python docs explicitly require threads_init() before any second thread is created.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 18 left two symptoms unexplained: KWin returns Cancelled on ScreenShot2 D-Bus calls and libei surfaces 'Broken pipe' on EIS writes, both pointing at a torn-down EisContext but neither saying WHEN the underlying D-Bus connection died. dbus-glib's threads_init did not stop the broken pipe, which means the buffer-overflow hypothesis from Round 17 needs ground truth before another fix.

Log bus.get_is_connected() and the drain thread liveness immediately before each ei_dispatch, with a sequence counter, so CI logs reveal the exact call boundary where the connection drops. Diagnostic only; this commit changes no behavior and will be reverted in Round 19b.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The Round 18 setup script ended with 'command -v bash' returning 0, yet the very next workflow step's OCI exec failed with exec: "bash": executable file not found in $PATH. That means bash was on this shell's PATH but not on the OCI runtime's PATH at the step boundary. Capture every relevant locator (PATH, command -v, type -a, which -a, ls of canonical paths, /bin/sh symlink, rpm -q of bash/util-linux/coreutils/busybox) so Round 19b can decide whether to symlink, set workflow-level PATH, or change install prefix. Diagnostic only; behavior unchanged.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19a evidence: opensuse/tumbleweed setup-step PATH had bash and
/usr/bin/bash existed, yet the next step's docker exec failed with
`exec: "bash": not found`. Pinning the workflow defaults.run.shell to
/usr/bin/bash bypasses the OCI runtime's PATH lookup. Adding
QT_LOGGING_RULES=kwin*.debug=true surfaces KWin's qCDebug() for the
EIS / input / screenshot subsystems so Round 19b artifacts are useful
beyond gdb thread chatter.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19a proved D-Bus and the GLib drain thread stay alive across
every flush, so the libei "Broken pipe" cannot come from a serviceUnregistered
teardown. Replace the bus_connected/glib_alive trace with the dispatch
return value (libei reports errors via negative return that _flush
currently discards) and add a close-start marker so a Broken pipe after
close-start can be classified as harmless teardown noise rather than
mid-test failure.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19a's verbose locator confirmed bash is installed at /usr/bin/bash
and the setup-step PATH resolves it. The next-step `docker exec` failure
was the OCI runtime doing its own PATH lookup, fixed at workflow level by
pinning defaults.run.shell to /usr/bin/bash. Drop the diagnostic group;
keep a single `[ ! -x /usr/bin/bash ]` guard so a future busybox cascade
that actually breaks the absolute path fails fast at setup.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19b set QT_LOGGING_RULES=kwin*.debug=true expecting verbose KWin
qCDebug output, but the artifact log files were 262-433 bytes containing
only gdb thread chatter. Qt 5.6+ auto-routes qCDebug to systemd journal
via sd_journal_send when journald is detected, bypassing stderr. Setting
QT_FORCE_STDERR_LOGGING=1 forces the message handler back to stderr so
the wrapper script's tee pipeline can capture it.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…failure

Round 19b proved input dispatch succeeds (libei ei_dispatch returns 0
on every flush) yet the clipboard tests still fail with literal
"Failed to read clipboard: Nothing is copied". Two hypotheses remain:
(1) kcalc never wrote a selection because it lacked keyboard focus, or
(2) KWin --virtual does not expose ext-data-control-v1 / wlr-data-control,
so wl-paste (which has no Wayland focus) cannot read the selection from
a focused client without a privileged protocol. Append wl-paste --list-types
(both clipboard and primary), wayland-info globals, and WAYLAND_DISPLAY
to the error string so the next CI artifact tells us which hypothesis
holds.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Disambiguates keystroke-routing failure from clipboard-plumbing
failure. Round 19b proved libei dispatch succeeds for every event,
but that is not enough to conclude that the keys reached kcalc — KWin
may have dropped them at the surface layer if kcalc lacked focus.
Querying find_ui_elements for the expected display value just before
Ctrl+C lands the answer in the assertion message: a match means keys
arrived (and clipboard plumbing is the bug), no match means keys never
landed (and focus / routing is the bug).
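
The decision rule in this commit reduces to a two-way classification, sketched below. The function name and the assumption that the probe yields a list of element text strings are illustrative; the actual return shape of find_ui_elements is not shown in this PR.

```python
from typing import Iterable


def classify_clipboard_failure(element_texts: Iterable[str], expected: str) -> str:
    """Classify a failed clipboard assertion from a pre-Ctrl+C display probe.

    element_texts: text of UI elements found just before Ctrl+C
    (hypothetical shape). expected: the digits the test typed.
    """
    if any(expected in text for text in element_texts):
        # Keys reached the application, so the copy/paste path is suspect.
        return "clipboard-plumbing"
    # The display never showed the digits, so input never arrived.
    return "focus-or-routing"
```

Embedding the returned label in the assertion message lets a single CI run say which half of the pipeline to investigate.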

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
KWin's screenshot plugin requires the GL compositor: takeScreenShot in src/plugins/screenshot/screenshot.cpp does dynamic_cast<EglBackend *>(...) and emits ScreenShotSource2::cancelled() when the cast returns nullptr. Setting QT_QUICK_BACKEND=software pushes KWin onto the QPainter backend, so every CaptureWorkspace D-Bus call fails with org.kde.KWin.ScreenShot2.Error.Cancelled (Round 19c evidence). Drop the env var; LIBGL_ALWAYS_SOFTWARE=1 is sufficient to force llvmpipe-backed OpenGL while keeping KWin's EglBackend alive for screenshots.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
KWin --virtual rejects EglBackend initialization and falls back to QPainter scene.

ScreenShot2.CaptureWorkspace requires EGL (verified via KWin source: dynamic_cast<EglBackend *> in screenshot.cpp returns null and yields ScreenShot2.Error.Cancelled). spectacle fallback also fails because xdg-desktop-portal is disabled in CI to prevent KWin segfaults on Fedora/Ubuntu/openSUSE.

Tracked in issue #22. Marking strict=False so any CI runner that exposes a DRM node (vgem, /dev/dri/card1) will surface as XPASS.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Bump pre-Ctrl+C and post-Ctrl+C sleeps from 0.3s to 1.0s in the keyboard and mouse arithmetic tests. Round 19e Gemini source-read of KWin (src/pointer_input.cpp + src/input.cpp) confirms EIS pointer events DO traverse PointerInputRedirection and trigger requestFocus, so the Round 19c failure (clipboard '' instead of expected digits) is more likely a timing/animation race than a focus-not-granted platform limit.

kcalc may need a full event-loop tick to commit each keystroke to the display before the modifier+key chord arrives, and another tick after Ctrl+C before the selection is published to wl-clipboard.

If this round fixes 2/3 GUI tests, focus is granted as KWin source suggests. If failures persist, Round 19f will add an explicit AT-SPI focus probe to definitively distinguish focus-not-granted from keymap/timing causes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19c AT-SPI probe (no display update for unique 8-digit string) and Round 19e timing bump (3.3x sleep increase had zero effect) jointly disprove the timing hypothesis. kcalc never receives the keystrokes because EIS pointer events do not grant keyboard focus to client surfaces in KWin --virtual mode.

Gemini source-read of KWin (PointerInputRedirection.requestFocus) suggests focus SHOULD be granted; the experimental gap is likely an EIS-specific focus guard, AT-SPI vs KWin coordinate frame mismatch, or keymap-binding issue specific to --virtual. Diagnosing it definitively requires gdb+QT_LOGGING_RULES on a hot focus path, beyond the scope of this CI integration PR.

Real fix requires structural input refactor (fake_input via libwayland-client ctypes wrapper, KWin scripting bridge, or xdg_activation_v1 client) — better suited to a separate PR with proper scope. Marking xfail strict=False so any future KWin update or runner quirk that fixes it surfaces as XPASS without breaking the matrix.

CI matrix becomes 7 passed + 3 xfailed across all 4 distros: archlinux + fedora + opensuse-tumbleweed + ubuntu (KDE Neon).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
openSUSE Tumbleweed splits the gcc package into a versioned binary (gcc-15) and a separate alts post-install hook that creates the unversioned /usr/bin/gcc and /usr/bin/cc symlinks. zypper --no-recommends --non-interactive can skip the alts hook, leaving only /usr/bin/gcc-15 on disk.

Round 19e CI evidence (run 25287333380, openSUSE job): gcc15 + libstdc++6-devel-gcc15 + gcc-15 wrapper packages all installed successfully (498 packages total), but the next workflow step's pycairo wheel build failed with meson 'Unknown compiler(s)' because /usr/bin/gcc itself does not exist.

Backstop: glob for the highest-numbered installed gcc binary and create the unversioned symlinks ourselves.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19f's symlink fix did not trigger on CI (run 25287547450 still failed with the same Unknown compiler error). Either the if-guard returned truthy (despite gcc not being on PATH) or the glob did not match the actual binary path.

Drop the command -v gcc guard since ln -sf is idempotent: re-linking an already-correct symlink is a no-op. Add an ::group::-wrapped diagnostic that prints ls -la /usr/bin/gcc* and command -v gcc before symlinking, so the next CI run shows ground truth on what is actually installed. Filter out gcc-doc and gcc-locale name collisions before sort -V.
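A minimal sketch of the version-selection step described above, assuming the candidate names arrive one per line. The helper name `pick_highest_gcc` and the positive `gcc-<digits>` match are illustrative choices, not the PR's actual script, which filters out the gcc-doc and gcc-locale names instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# pick_highest_gcc (hypothetical helper): read candidate names on stdin,
# keep only versioned compiler names such as gcc-15, and print the
# highest version. `sort -V` orders gcc-9 before gcc-13, unlike a
# plain lexical sort.
pick_highest_gcc() {
    grep -E '^gcc-[0-9]+$' | sort -V | tail -n 1
}

# Sample input only; a real script would list /usr/bin instead.
printf '%s\n' gcc-13 gcc-doc gcc-15 gcc-locale gcc-9 | pick_highest_gcc
```

Note that under `pipefail` this pipeline returns non-zero when grep matches nothing, so a caller running with errexit needs to guard the call.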

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…enSUSE

Round 19g aborted setup at exit 2 because the GCC_BIN=$(ls glob | grep | sort | tail) pipeline triggered errexit when one ls glob did not match. Round 19g diagnostic also showed that /usr/bin/gcc and /usr/bin/cc already exist as RPM-installed symlinks to gcc-15, so the symlink fix from Round 19f was solving a non-existent problem on disk.

Round 19f's pycairo Unknown compiler error must have a different root cause: most likely uv build-isolation masking /usr/bin from PATH, or a post-install hook firing after setup completes. Drop the brittle pipeline. Print PATH and gcc binary state for the next CI to surface ground truth, then idempotently re-create the symlinks with simple [ -x ] && ln -sf || true conditional lists that survive errexit/pipefail.
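The conditional-list pattern mentioned above can be sketched as follows. This is an illustration only: the helper name `link_if_present` is hypothetical, and the demo works in a scratch directory with a fake binary rather than touching /usr/bin.

```shell
#!/usr/bin/env bash
set -euo pipefail

# link_if_present TARGET LINK (hypothetical helper): create LINK -> TARGET
# only when TARGET is executable. The trailing `|| true` keeps a failed
# test from aborting the script under errexit.
link_if_present() {
    [ -x "$1" ] && ln -sf "$1" "$2" || true
}

# Scratch directory with a stand-in for the versioned compiler.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho fake-gcc\n' > "$demo_dir/gcc-15"
chmod +x "$demo_dir/gcc-15"

link_if_present "$demo_dir/gcc-15" "$demo_dir/gcc"      # target exists: link is created
link_if_present "$demo_dir/gcc-99" "$demo_dir/gcc-old"  # target missing: skipped, no error
link_if_present "$demo_dir/gcc-15" "$demo_dir/gcc"      # repeat call: ln -sf simply re-links

ls -l "$demo_dir"
```

Because no command substitution or multi-stage pipeline is involved, there is no intermediate stage whose failure could trip errexit or pipefail.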

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19h evidence (run 25288095429): setup-step PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin and type cc resolves to /usr/bin/cc with the binary present, yet uv's PEP 517 build for pycairo still failed meson's cc --version probe with [Errno 2]. The build-isolation venv that uv spawns for pycairo loses /usr/bin from the subprocess PATH on this image.

Workaround: write CC=/usr/bin/gcc-15 and CXX=/usr/bin/gcc-15 to GITHUB_ENV so the next workflow step inherits absolute compiler paths. meson respects CC/CXX and skips PATH lookup when they are absolute.
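A minimal sketch of this workaround, assuming it runs inside a workflow step. The compiler paths are the ones quoted in the commit message; the scratch-file fallback for `GITHUB_ENV` is an addition so the snippet also runs outside GitHub Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# On a runner, GITHUB_ENV names a file whose KEY=VALUE lines become
# environment variables in later steps. The mktemp fallback is an
# assumption for local runs only.
env_file="${GITHUB_ENV:-$(mktemp)}"

# Absolute compiler paths as given in the commit message.
cc_path=/usr/bin/gcc-15
cxx_path=/usr/bin/gcc-15

{
    echo "CC=$cc_path"
    echo "CXX=$cxx_path"
} >> "$env_file"

# Show what later steps will inherit.
cat "$env_file"
```

The variables take effect only in steps that run after the one writing the file, not in the writing step itself.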

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19i evidence (run 25288309282): CC=/usr/bin/gcc-15 made meson find the C compiler ('C compiler for the host machine: /usr/bin/gcc-15 (gcc 15.2.1)'), but pycairo's archiver probe still failed with [Errno 2] for gcc-ar, ar, and gar. uv's build subprocess truly does lose /usr/bin from PATH. Continue the workaround pattern: hand meson absolute paths for all the build tools it probes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
After CC/CXX/AR/RANLIB/STRIP propagation landed, the next pycairo meson failure surfaced as 'Did not find pkg-config by name pkg-config' on openSUSE Tumbleweed only. Same root cause as the compiler-not-found regression: uv's PEP 517 build-isolation strips /usr/bin from the build subprocess PATH, so meson's bare 'pkg-config' lookup fails. Meson honors $PKG_CONFIG as an absolute-path override (since 0.37.0).

Probe /usr/bin/pkg-config first then fall back to /usr/bin/pkgconf (openSUSE's pkgconf RPM provides only /usr/bin/pkgconf; the /usr/bin/pkg-config symlink is in the separate pkgconf-pkg-config subpackage which --no-recommends skips). Diagnostic group prints the resolved binary so future drift is immediately visible.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19i pointed PKG_CONFIG at /usr/bin/pkg-config which on Tumbleweed is a triplet-prefixed shell-script wrapper (-> x86_64-suse-linux-gnu-pkg-config). The wrapper bare-execs 'pkgconf', and meson immediately surfaced 'WARNING: Found pkg-config /usr/bin/pkg-config but it failed when ran' -- proving uv's build-isolation strips /usr/bin from the wrapper's exec PATH the same way it strips it from meson's.

Probe /usr/bin/pkgconf (the bare ELF binary, pkg-config 1.x compatible, no PATH dependencies) before /usr/bin/pkg-config. The wrapper is used as a fallback only if the real binary is somehow missing.
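The probe order can be sketched like this. The helper `first_executable` is a hypothetical name, and the scratch-file fallback for `GITHUB_ENV` is an assumption so the snippet runs outside a workflow; the actual setup script may be structured differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# first_executable PATH... (hypothetical helper): print the first argument
# that is an executable regular file, or return 1 if none qualifies.
first_executable() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

env_file="${GITHUB_ENV:-$(mktemp)}"

# Prefer the real binary; fall back to the wrapper only if it is absent.
if pkg_config_bin=$(first_executable /usr/bin/pkgconf /usr/bin/pkg-config); then
    echo "PKG_CONFIG=$pkg_config_bin" >> "$env_file"
    echo "resolved pkg-config: $pkg_config_bin"
else
    echo "no pkg-config binary found; leaving PKG_CONFIG unset" >&2
fi
```

Printing the resolved path mirrors the diagnostic group described above, so a change in which binary is picked shows up in the CI log.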

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19j unblocked pycairo build (pkgconf prefix worked). Next layer: dbus-python build fails at ninja's static-link step with '/bin/sh: line 1: rm: command not found' (exit 127). The link rule literally runs 'rm -f libdbus-gmain.a && /usr/bin/ar csrDT ...' and ar uses our absolute path while rm bare-execs and dies.

Workflow-step PATH already contains /usr/bin (verified by the gcc/PATH diagnostic). The strip happens inside uv's PEP 517 build subprocess on Tumbleweed only -- the same image-specific quirk that already required CC/CXX/AR/PKG_CONFIG to be re-exported. Re-export PATH explicitly via $GITHUB_ENV so uv inherits and propagates it to its build subprocess. $GITHUB_PATH cannot help here because /usr/bin is already in the workflow PATH; the loss is downstream of the workflow shell.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Round 19k tried to fix dbus-python's bare-name `rm` lookup by writing PATH= to $GITHUB_ENV, but the runner reconstructs PATH for every step from the system PATH plus $GITHUB_PATH entries and ignores any PATH= line written to $GITHUB_ENV (cf. actions/toolkit#655). Switch to $GITHUB_PATH so uv's PEP 517 build subprocess inherits a PATH that contains /usr/bin and resolves `rm` for meson's hardcoded static-link template.
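A minimal sketch of the `$GITHUB_PATH` approach, assuming it runs in the setup step. The scratch-file fallback is an assumption for running the snippet outside GitHub Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# On a runner, GITHUB_PATH names a file; each directory appended to it
# is added to PATH for all subsequent steps of the job.
path_file="${GITHUB_PATH:-$(mktemp)}"

echo "/usr/bin" >> "$path_file"

# Diagnostic output, folded into a collapsible group in the Actions log.
echo "::group::PATH additions"
cat "$path_file"
echo "::endgroup::"
```

As with `$GITHUB_ENV`, the change is visible only to later steps, so the build step has to come after this one.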

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>