docs(spike): screenshot + KWin scripting feasibility reports#21

Open
isac322 wants to merge 2 commits into main from overhaul/pr1-screenshot2-spike
@isac322
Owner

@isac322 isac322 commented May 5, 2026

Wave 1 of the kwin-mcp backend overhaul. Combines two pre-implementation spikes into a single docs-only PR.

ScreenShot2 virtual-session feasibility

docs/design/screenshot2-virtual-feasibility.md + scripts/screenshot2_probe.sh

Probe runs kwin_wayland --virtual with 10 env-var combinations × 3 ScreenShot2 methods, plus Phase 2 app scenarios (kcalc, kcalc + kdialog ×2 multi-window) and Phase 3 forced-compositor stress.

Conclusion: ScreenShot2 is the primary fast path in normal virtual KWin sessions on KWin 6.6.x. spectacle remains a compatibility fallback for forced-compositor failures (KWIN_COMPOSE=O / KWIN_COMPOSE=Q).

Correction note: This replaces an earlier (force-pushed away) commit that incorrectly reported 30/30 ScreenShot2 failures. See @isac322's review feedback for the original ground-truth that triggered re-running the probe.

KWin scripting feasibility on KDE Plasma 6

docs/design/kwin-scripting-feasibility.md + scripts/kwin_scripting_probe.sh

Probe verifies KWin 6.6.4 /Scripting D-Bus interface. JS templates rewritten from KDE Plasma 6 scripting docs (no kdotool GPL-3.0 copy).

Decision recorded: in-memory script loading via loadScriptFromText is unavailable on KWin 6.6.x. The downstream window backend (PR #26) uses the loadScript(tempfile_path, name) pattern instead.
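The loadScript(tempfile_path, name) pattern can be sketched as below. This is a minimal illustration, not the PR #26 implementation: the helper name stage_kwin_script is hypothetical, and the service name org.kde.KWin, the interface name org.kde.kwin.Scripting, and the gdbus command line are assumptions that should be checked against the probe output. Only the /Scripting path and the loadScript(path, name) shape come from this PR.

```python
import os
import tempfile

# Assumed D-Bus coordinates; verify them against kwin_scripting_probe.sh output.
KWIN_SERVICE = "org.kde.KWin"
SCRIPTING_PATH = "/Scripting"
SCRIPTING_IFACE = "org.kde.kwin.Scripting"


def stage_kwin_script(js_source, name):
    """Write js_source to a temp file and build a loadScript command line.

    Nothing is executed here; the caller runs the argv and removes the file.
    """
    fd, path = tempfile.mkstemp(prefix=name + "-", suffix=".js")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(js_source)
    argv = [
        "gdbus", "call", "--session",
        "--dest", KWIN_SERVICE,
        "--object-path", SCRIPTING_PATH,
        "--method", SCRIPTING_IFACE + ".loadScript",
        path, name,
    ]
    return path, argv


script_path, command = stage_kwin_script('print("hello from kwin-mcp");', "kwin_mcp_probe")
print(" ".join(command))
```

Keeping the script on disk until KWin has loaded it, then deleting it, avoids depending on loadScriptFromText at all.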

README

Architecture diagram + How It Works → Screenshot Capture both reflect "KWin ScreenShot2 D-Bus (spectacle fallback)".

Reproduction

bash scripts/screenshot2_probe.sh       # ScreenShot2 spike
bash scripts/kwin_scripting_probe.sh    # KWin scripting spike

Files

  • README.md
  • docs/design/screenshot2-virtual-feasibility.md (new)
  • docs/design/kwin-scripting-feasibility.md (new)
  • scripts/screenshot2_probe.sh (new)
  • scripts/kwin_scripting_probe.sh (new)

@github-actions

github-actions Bot commented May 5, 2026

📝 Docs & SEO Review

Source files changed in this PR:

integrations/claude-code/skills/kwin-desktop-automation/SKILL.md
integrations/opencode/plugin/skill/kwin-desktop-automation/SKILL.md

Consistency check results:

✅  All documentation/plugin SEO checks passed.

Run @docs-seo in Claude Code to perform a full documentation review.

@isac322
Owner Author

isac322 commented May 5, 2026

I think the current conclusion in this spike needs to be corrected before this PR is used as design evidence.

The current PR says that, on this host with KWin 6.6.4, all 30 ScreenShot2 attempts failed and therefore spectacle should remain the primary screenshot path. I re-ran the PR's probe and also ran additional end-to-end virtual-session experiments. Those results do not support the current conclusion.

Concrete results from the successful experiments

Environment:

KWin: 6.6.4
Session type: kwin_wayland --virtual only
Live KWin: not used
ScreenShot2 object: /org/kde/KWin/ScreenShot2
ScreenShot2 interface: org.kde.KWin.ScreenShot2
Permission env: KWIN_SCREENSHOT_NO_PERMISSION_CHECKS=1, KWIN_WAYLAND_NO_PERMISSION_CHECKS=1

1. Baseline virtual session, no app

I started a normal virtual KWin session and called ScreenShot2 directly.

Observed result:

CaptureActiveScreen: OK 1280x720, PNG size ~3653 bytes
CaptureWorkspace: OK 1280x720, PNG size ~3653 bytes
CaptureArea(0,0,1,1): OK 1x1, PNG size ~70 bytes

This proves that ScreenShot2 itself is reachable and functional in a normal empty virtual KWin session on this host.

2. Baseline virtual session with an app

I started a normal virtual KWin session, launched kcalc, then called the same ScreenShot2 methods.

Observed result:

CaptureActiveScreen: OK 1280x720, PNG size ~78655 bytes
CaptureWorkspace: OK 1280x720, PNG size ~78655 bytes
CaptureArea(0,0,1,1): OK 1x1, PNG size ~70 bytes

This proves the success is not just an empty-desktop artifact. ScreenShot2 captures the virtual workspace with an actual app window present.

3. App manipulation captured by ScreenShot2

I started virtual KWin, launched kcalc, typed 2+3=, then called CaptureActiveScreen through the existing capture_screenshot_dbus() path.

Observed result:

Output: /tmp/opencode/kwin-screenShot2-kcalc-2-plus-3.png
Size: 72979 bytes
Visual inspection: KCalc is visible and the display shows result 5

This proves ScreenShot2 can capture not only an app window, but also the result of an input/action performed inside the virtual session.

4. Full workspace capture

I called both CaptureActiveScreen and CaptureWorkspace after launching and manipulating KCalc.

Observed result:

CaptureActiveScreen: 800x600, size ~72878 bytes
CaptureWorkspace: 800x600, size ~72878 bytes
Visual inspection: the full virtual workspace is captured, with KCalc visible inside it and result 5 shown

This proves CaptureWorkspace can capture the whole virtual KWin workspace, not only a cropped app window.

5. Multiple windows in one workspace capture

I started virtual KWin, launched multiple windows, and captured the workspace with CaptureWorkspace.

Observed result:

Scenario A:
- Apps: kcalc, kcharselect, kate
- Output: /tmp/opencode/kwin-screenShot2-multiple-apps-workspace.png
- Dimensions: 1200x800
- Visual inspection: all three windows are visible, though Kate overlaps the others

Scenario B:
- Apps: kcalc, kdialog Dialog-One, kdialog Dialog-Two
- Output: /tmp/opencode/kwin-screenShot2-multiple-small-apps-workspace.png
- Dimensions: 1200x800
- Visual inspection: all three windows are visible at the same time; none are fully hidden

This proves ScreenShot2 CaptureWorkspace captures multiple visible windows together in the same virtual workspace.

6. Re-running this PR's probe

I also re-ran the probe from this PR on the same host. The current result is not 30/30 failures.

Observed result:

baseline: 3/3 OK
KWIN_PREFER_SW_QPAINTER=1: 3/3 OK
LIBGL_ALWAYS_SOFTWARE=1: 3/3 OK
MESA_LOADER_DRIVER_OVERRIDE=swrast: 3/3 OK
EGL_PLATFORM=surfaceless: 3/3 OK
LIBGL_ALWAYS_SOFTWARE=1 + MESA_LOADER_DRIVER_OVERRIDE=swrast: 3/3 OK
LIBGL_ALWAYS_SOFTWARE=1 + EGL_PLATFORM=surfaceless: 3/3 OK
KWIN_COMPOSE=O: 3/3 FAIL
KWIN_COMPOSE=Q: 3/3 FAIL
LIBGL_ALWAYS_SOFTWARE=1 + MESA_LOADER_DRIVER_OVERRIDE=swrast + KWIN_COMPOSE=O: 3/3 FAIL

So the current probe result is 21/30 OK, with failures concentrated in forced compositor settings.
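The 21/30 figure can be reproduced from the list above with a small tally. This is a sketch for the report, not part of the probe script; the grouping rule (any configuration that sets KWIN_COMPOSE counts as forced compositor) is my reading of the results.

```python
METHODS = ("CaptureActiveScreen", "CaptureWorkspace", "CaptureArea")

# Outcome per environment configuration, copied from the rerun above.
OBSERVED = {
    "baseline": True,
    "KWIN_PREFER_SW_QPAINTER=1": True,
    "LIBGL_ALWAYS_SOFTWARE=1": True,
    "MESA_LOADER_DRIVER_OVERRIDE=swrast": True,
    "EGL_PLATFORM=surfaceless": True,
    "LIBGL_ALWAYS_SOFTWARE=1 + MESA_LOADER_DRIVER_OVERRIDE=swrast": True,
    "LIBGL_ALWAYS_SOFTWARE=1 + EGL_PLATFORM=surfaceless": True,
    "KWIN_COMPOSE=O": False,
    "KWIN_COMPOSE=Q": False,
    "LIBGL_ALWAYS_SOFTWARE=1 + MESA_LOADER_DRIVER_OVERRIDE=swrast + KWIN_COMPOSE=O": False,
}


def tally(observed):
    """Count method-level OK/FAIL results, split by configuration class."""
    totals = {"normal": {"ok": 0, "fail": 0}, "forced_compositor": {"ok": 0, "fail": 0}}
    for config, ok in observed.items():
        group = "forced_compositor" if "KWIN_COMPOSE=" in config else "normal"
        totals[group]["ok" if ok else "fail"] += len(METHODS)
    return totals


totals = tally(OBSERVED)
print(totals)
```

Every failure lands in the forced_compositor group, which is the distinction the report should make explicit.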

What is wrong with the current PR design/report

1. It treats forced compositor failures as baseline virtual-session failures

The important distinction is:

  • normal kwin_wayland --virtual baseline: works
  • forced KWIN_COMPOSE=O/Q variants: fail

The report currently collapses these into one feasibility conclusion. That makes the conclusion too broad.

2. It does not validate the user-facing artifact

The probe checks whether bytes arrive from the ScreenShot2 pipe, but the report does not require converting those bytes into a PNG and validating the image. Also, the script prints "PNG written" while it is actually writing .raw files.

For this project, the important question is not only “did D-Bus return bytes?” It is:

Can kwin-mcp capture a readable screenshot of the virtual workspace that shows the app/action the agent performed?

The current probe does not answer that strongly enough.
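A first step toward validating the artifact is to check that the file really is a PNG with the expected dimensions. The sketch below uses only the standard library and reads the IHDR chunk; png_dimensions is a hypothetical helper name. It confirms the container and geometry only, so visual inspection (or an image comparison) is still needed to confirm that the app and action are visible.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from a PNG's IHDR chunk, or raise ValueError."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature or truncated header")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("not a PNG: first chunk is not IHDR")
    return struct.unpack(">II", data[16:24])


# Header of a 1280x720 RGBA PNG, built by hand for the example.
header = PNG_SIGNATURE + struct.pack(">I4sIIBBBBB", 13, b"IHDR", 1280, 720, 8, 6, 0, 0, 0)
print(png_dimensions(header))
```

A .raw dump fails this check immediately, which would have caught the mislabeled output.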

3. It misses app/action/workspace scenarios

The current matrix mostly tests raw ScreenShot2 method calls. It should include at least:

  • empty workspace capture
  • single app visible in workspace
  • app after input/action
  • multiple windows visible in the same workspace

Without those cases, it is too easy to draw the wrong design conclusion.

4. It uses the result as policy evidence for keeping spectacle primary

The data should not justify “keep spectacle primary because ScreenShot2 is infeasible.” The corrected evidence supports a different policy:

Use ScreenShot2 as the primary/fast path in normal virtual sessions, with spectacle as compatibility fallback for environments where ScreenShot2 is unavailable or cancelled.

The logs around the failure cases include EGL/Zink/portal/bus noise, so org.kde.KWin.ScreenShot2.Error.Cancelled should be treated as an environment/backend failure mode, not as evidence that ScreenShot2 is unusable in virtual sessions.
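The proposed policy can be sketched as a small wrapper. The names here (ScreenShot2Error, capture_with_fallback) are hypothetical and the two backends are passed in as callables, so this shows the control flow only and does not reflect the actual kwin-mcp code.

```python
class ScreenShot2Error(Exception):
    """Hypothetical exception carrying the D-Bus error name from KWin."""

    def __init__(self, name, message=""):
        super().__init__(name + ": " + message)
        self.name = name


def capture_with_fallback(screenshot2, spectacle):
    """Return (backend, png_bytes, fallback_reason); reason is "" on the fast path."""
    try:
        return "screenshot2", screenshot2(), ""
    except ScreenShot2Error as exc:
        # Cancelled and similar errors describe this environment, not the API,
        # so record the reason and use the compatibility path.
        return "spectacle", spectacle(), exc.name


def cancelled():
    raise ScreenShot2Error("org.kde.KWin.ScreenShot2.Error.Cancelled")


fast = capture_with_fallback(lambda: b"png-from-kwin", lambda: b"png-from-spectacle")
slow = capture_with_fallback(cancelled, lambda: b"png-from-spectacle")
print(fast)
print(slow)
```

Returning the fallback reason keeps the Cancelled cases visible in logs instead of silently hiding them behind spectacle.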

How I would change this PR

Step 1: Rewrite the feasibility doc

Replace the current “30/30 failed” table with two sections.

First section: Normal virtual-session baseline

CaptureActiveScreen: OK
CaptureWorkspace: OK
CaptureArea(0,0,1,1): OK
Single-app workspace capture: OK
Manipulated-app capture: OK
Multi-window workspace capture: OK

Second section: Forced compositor/backend variants

KWIN_COMPOSE=O: fails with ScreenShot2.Error.Cancelled
KWIN_COMPOSE=Q: fails with ScreenShot2.Error.Cancelled
LIBGL_ALWAYS_SOFTWARE + MESA_LOADER_DRIVER_OVERRIDE=swrast + KWIN_COMPOSE=O: fails with ScreenShot2.Error.Cancelled
other tested rendering envs: OK in the current run

Then change the conclusion to something like:

ScreenShot2 is feasible in normal kwin_wayland --virtual sessions on this host. Some forced compositor/backend configurations can still make KWin cancel ScreenShot2 captures. kwin-mcp should prefer ScreenShot2 for virtual sessions and retain spectacle as a fallback.

Step 2: Change the probe script to produce real PNG artifacts

The helper should convert raw BGRA data into PNG, like the production code does:

Image.frombytes("RGBA", (width, height), data, "raw", "BGRA", stride).save(output_path, "PNG")
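For reference, the Pillow call above does two things: it honors the row stride and it reorders BGRA to RGBA. The sketch below spells out the same steps with only zlib and struct, which may be handy where Pillow is not installed. It is an illustration of the conversion, not the production code path, and bgra_to_png is a hypothetical name.

```python
import struct
import zlib


def _chunk(kind, payload):
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def bgra_to_png(data, width, height, stride):
    """Encode a BGRA buffer whose rows are padded to `stride` bytes as an RGBA PNG."""
    row_bytes = width * 4
    if stride < row_bytes or len(data) < stride * (height - 1) + row_bytes:
        raise ValueError("buffer too small for the reported geometry")
    scanlines = bytearray()
    for y in range(height):
        row = bytearray(data[y * stride : y * stride + row_bytes])
        row[0::4], row[2::4] = row[2::4], row[0::4]  # swap B and R channels
        scanlines.append(0)  # PNG filter type 0 (None) for this scanline
        scanlines += row
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)  # 8-bit RGBA
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(scanlines)))
        + _chunk(b"IEND", b"")
    )


# Two BGRA pixels in one row, with 4 bytes of stride padding.
sample = bytes([1, 2, 3, 255, 10, 20, 30, 255, 0, 0, 0, 0])
png = bgra_to_png(sample, width=2, height=1, stride=12)
```

Ignoring the stride (treating the buffer as tightly packed) is the usual cause of sheared images, so the probe should always pass the stride KWin reports.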

Then each result should report:

method
status
width
height
stride
png_path
png_size
error_name/error_message if failed
kwin_still_alive=true/false

Do not print "PNG written" if the artifact is actually .raw.
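One way to keep those fields consistent across runs is a small record type that serializes to JSON. This is a sketch: ProbeResult is a hypothetical name, and the stride and path values in the example are illustrative, not measured.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProbeResult:
    method: str
    status: str  # "OK" or "FAIL"
    width: Optional[int] = None
    height: Optional[int] = None
    stride: Optional[int] = None
    png_path: Optional[str] = None
    png_size: Optional[int] = None
    error_name: Optional[str] = None
    error_message: Optional[str] = None
    kwin_still_alive: bool = True

    def to_json(self):
        """One JSON object per line keeps the probe log easy to grep and diff."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; stride and png_path are assumptions for the example.
ok = ProbeResult("CaptureWorkspace", "OK", width=1280, height=720, stride=5120,
                 png_path="/tmp/probe/baseline-CaptureWorkspace.png", png_size=3653)
failed = ProbeResult("CaptureWorkspace", "FAIL",
                     error_name="org.kde.KWin.ScreenShot2.Error.Cancelled")
print(ok.to_json())
print(failed.to_json())
```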

Step 3: Add app-level validation cases

Add a baseline scenario that does this:

1. start virtual KWin
2. launch kcalc
3. type 2+3=
4. CaptureWorkspace
5. convert to PNG
6. keep artifact path
7. record dimensions and file size

Expected result from my run:

KCalc visible, result 5 visible, PNG around 70KB depending on geometry/theme

Add a multi-window scenario:

1. start virtual KWin at 1200x800
2. launch kcalc
3. launch kdialog --title Dialog-One --msgbox "First dialog window"
4. launch kdialog --title Dialog-Two --inputbox "Second dialog window"
5. CaptureWorkspace
6. convert to PNG
7. keep artifact path

Expected result from my run:

KCalc, Dialog-One, and Dialog-Two are all visible in the same workspace PNG.
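The multi-window scenario can be written down as data before wiring it into the probe. The sketch below only builds the command lines and the permission environment listed earlier in this comment. Launching the processes, pointing the apps at the virtual session, waiting for windows to map, and calling CaptureWorkspace are left out. The function names are hypothetical.

```python
import shlex


def session_env():
    """Permission overrides used for the virtual KWin session in these experiments."""
    return {
        "KWIN_SCREENSHOT_NO_PERMISSION_CHECKS": "1",
        "KWIN_WAYLAND_NO_PERMISSION_CHECKS": "1",
    }


def multi_window_plan(width=1200, height=800):
    """Return the argv for each process in launch order (compositor first)."""
    return [
        ["kwin_wayland", "--virtual", "--width", str(width), "--height", str(height)],
        ["kcalc"],
        ["kdialog", "--title", "Dialog-One", "--msgbox", "First dialog window"],
        ["kdialog", "--title", "Dialog-Two", "--inputbox", "Second dialog window"],
    ]


plan = multi_window_plan()
for argv in plan:
    print(shlex.join(argv))
```

Keeping the plan as plain lists makes it easy to log the exact commands next to the resulting PNG path.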

Step 4: Keep forced compositor tests, but classify them as stress tests

Keep the KWIN_COMPOSE=O/Q cases, but report them separately as backend stress/failure cases. They should not be mixed into the primary feasibility decision for the normal virtual session path.

Step 5: Update README/SKILL wording accordingly

The docs should not say that screenshots are spectacle-primary because ScreenShot2 is infeasible. A more accurate wording is:

kwin-mcp prefers KWin ScreenShot2 D-Bus capture in virtual sessions and falls back to spectacle when ScreenShot2 is unavailable, cancelled, or unauthorized.

Suggested new design decision

I would change the design decision from:

Keep spectacle as primary; ScreenShot2 stays opportunistic only.

to:

Use ScreenShot2 as the primary fast path for normal virtual KWin sessions. Keep spectacle as a compatibility fallback. Treat KWIN_COMPOSE=O/Q ScreenShot2 cancellation as an environment-specific backend failure, not as a general virtual-session limitation.

Given this, I would not use the current 30/30 failure table as design input without rerunning and updating the report.

@isac322 isac322 force-pushed the overhaul/pr1-screenshot2-spike branch from 8a36e19 to c48759d Compare May 5, 2026 07:39
@isac322
Owner Author

isac322 commented May 5, 2026

Correction applied based on review feedback

Following @isac322's review feedback, I re-ran scripts/screenshot2_probe.sh and confirmed the original spike conclusion was wrong.

What was wrong

The original commit reported that 30/30 ScreenShot2 attempts fail in virtual KWin sessions and recommended keeping spectacle as primary. The review showed that the actual results were:

  • Baseline + 6 software-rendering env-var combos: all OK, real PNGs at 1280×720
  • Multi-window scenarios (kcalc, kcalc + kdialog ×2, kcalc + kcharselect + kate): all captured real content
  • Failures concentrated in KWIN_COMPOSE=O / KWIN_COMPOSE=Q forced-compositor variants

What was changed

Force-pushed overhaul/pr1-screenshot2-spike:

  • scripts/screenshot2_probe.sh rewritten to convert BGRA → PNG (matching production _capture_frame_burst_dbus) and exercise Phase 2 app scenarios (kcalc, multi-window via kdialog × 2)
  • docs/design/screenshot2-virtual-feasibility.md rewritten with the corrected conclusion: ScreenShot2 is the primary fast path in normal virtual KWin sessions; forced-compositor variants are a separate Phase 3 stress class
  • README.md reverted to original "ScreenShot2 D-Bus (spectacle fallback)" wording
  • Re-ran probe: 30 OK / 9 FAIL (Phase 1 baseline 21 OK + Phase 2 multi-window 9 OK; Phase 3 forced compositor 9 expected FAIL with Cancelled)

Cascading fixes to downstream PRs

Verification

  • All 3 affected PRs are MERGEABLE / CLEAN post-push.
  • Per-branch lint/format/ty/ci_guards/SEO all pass.
  • tests/integration/test_screenshot_backends.py (4 tests) and test_dbus_call_compat.py (6 tests) green.
  • Reproducible: bash scripts/screenshot2_probe.sh (uses kwin_wayland --virtual --no-lockscreen --width 1280 --height 720).

Sorry for the wrong conclusion in the original spike — fork-resource exhaustion and a portal/D-Bus boot race during the original run produced misleading results that I should have caught.

@isac322 isac322 force-pushed the overhaul/pr1-screenshot2-spike branch from c48759d to a245d99 Compare May 5, 2026 11:03
@isac322 isac322 changed the title docs(spike): ScreenShot2 virtual-session feasibility report docs(spike): screenshot + KWin scripting feasibility reports May 5, 2026
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@isac322 isac322 changed the base branch from main to launch/backend-overhaul May 5, 2026 13:34
Wave 1 of kwin-mcp backend overhaul. Combines two pre-implementation
spikes into one docs-only PR.

ScreenShot2 virtual-session feasibility (`docs/design/screenshot2-virtual-feasibility.md`):
- Probe runs `kwin_wayland --virtual` with 10 env-var combos x 3 ScreenShot2
  methods, plus Phase 2 app scenarios (kcalc, kcalc + kdialog x2 multi-window)
  and Phase 3 forced-compositor stress
- Conclusion: ScreenShot2 is the primary fast path in normal virtual KWin
  sessions; spectacle remains a compatibility fallback for forced-compositor
  failures (KWIN_COMPOSE=O/Q)
- Reproduction: `bash scripts/screenshot2_probe.sh`

KWin scripting feasibility on KDE Plasma 6 (`docs/design/kwin-scripting-feasibility.md`):
- Probe verifies KWin 6.6.4 /Scripting D-Bus interface, JS templates rewritten
  from KDE Plasma 6 scripting docs (no kdotool GPL-3.0 copy)
- Decision recorded: in-memory script loading is blocked on KWin 6.6.x because
  `loadScriptFromText` is unavailable; window backend will use
  `loadScript(tempfile)` pattern instead
- Reproduction: `bash scripts/kwin_scripting_probe.sh`

README arch diagram + How It Works section keep ScreenShot2 D-Bus as primary
with spectacle fallback (no spectacle-only wording).

Affected files:
- README.md
- scripts/{screenshot2_probe.sh, kwin_scripting_probe.sh}
- docs/design/{screenshot2-virtual-feasibility.md, kwin-scripting-feasibility.md}
@isac322 isac322 force-pushed the overhaul/pr1-screenshot2-spike branch from a245d99 to d2e3932 Compare May 5, 2026 13:44
Base automatically changed from launch/backend-overhaul to main May 29, 2026 02:24