agent-server: expand API and add full exercise test suite (ch1-ch7) by RicardoRdzG · Pull Request #1897 · colobot/colobot

RicardoRdzG · 2026-04-27T19:43:05Z

Summary

- Adds /objects, /get_speed, /set_speed, /get_program, key down/up actions, walk_me_to, and program-slot helpers to the agent server and AgentClient
- New test_08 covers the program API; test_09 covers all 7 Chapter 1 exercises; test_10 covers 31 exercises across Chapters 2–7 (38 levels total, all passing)
- Two CBot script bugs required test-side workarounds:
  - ExchangePost race condition (Remote Control #2, ch6/lvl3): at 16× simulation speed, the slave can read param=0 between two consecutive send() calls in the controller; fixed by sending param before order in the override script
  - CBot int/nan sentinel deadlock (Remote Control #4, ch7/lvl2): int m_type = nan is stored as 0 (not NaN) because an int cannot represent NaN and static_cast<int>(NaN) is undefined behavior in C++; comparisons against nan always yield false, deadlocking the exchange class; fixed by using -1 as a sentinel in a replacement class-based protocol deployed via a prepare hook
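The ExchangePost ordering problem described above can be illustrated with a small deterministic simulation. This is a Python stand-in, not CBot, and every name in it (`run`, the `order`/`param` dictionary, the polling model) is hypothetical; it only shows why the write order matters when a reader can poll between two writes.

```python
# Deterministic sketch of the write-ordering problem (Python stand-in, not CBot).
# A controller publishes two values with two separate writes; a slave may poll
# between them. All names here are hypothetical.

def run(write_sequence, poll_after_write):
    """Apply the writes in order; the slave polls once after write number `poll_after_write`."""
    post = {"order": 0, "param": 0}   # 0 means "nothing posted yet"
    seen = None
    for i, (key, value) in enumerate(write_sequence, start=1):
        post[key] = value
        if i == poll_after_write and post["order"] != 0:
            # The slave acts as soon as it sees an order and reads param at that moment.
            seen = (post["order"], post["param"])
    return seen

order_first = [("order", 1), ("param", 20)]   # original sequence
param_first = [("param", 20), ("order", 1)]   # workaround sequence

print(run(order_first, poll_after_write=1))  # (1, 0): order visible, param still 0
print(run(param_first, poll_after_write=1))  # None: no order yet, slave keeps waiting
print(run(param_first, poll_after_write=2))  # (1, 20): order and param both present
```

With param written first, a poll that lands between the two writes sees no order and simply waits, so the slave never acts on a stale parameter.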

Test plan

- Run `COLOBOT_AGENT_URL=http://localhost:7777 python3 -m pytest tests/agent/ -v` and verify all 38 tests in test_09/test_10 pass
- Run the full suite (tests/agent/) to confirm no regressions in test_01–08
- Build with `cmake --preset MacOS-CI && cmake --build --preset MacOS-CI` to confirm no C++ compile errors from the new agent server endpoints

🤖 Generated with Claude Code

Embeds an HTTP server (cpp-httplib, header-only, MIT) that listens on 127.0.0.1:7777 and lets AI agents drive the game UI without OS-specific automation tools.

Activated via: ./colobot -agentserver [port]

Endpoints implemented:
- GET /health: liveness check (no main thread needed)
- GET /state: widget tree for the current screen
- POST /click: click a widget by stable string ID
- POST /type: set text in an edit field directly
- POST /select: select a list item by name or index
- POST /key: inject SDL key events (F1–F12, Escape, Return…)
- GET /screenshot: current framebuffer as base64 PNG

Threading: HTTP thread queues commands + std::promise; main thread drains queue after each Render() call and fulfils the promise.

Widget IDs: stable string registry (ButtonOK, EditPlayerName, …) with evt:N fallback for unlisted controls.

Milestones validated on macOS:
- M0: /health responds immediately
- M1: /state lists EditPlayerName, ButtonOK, ListPlayers after 10s
- M2: /type + /click advances past player-select to main menu

Also adds: docs/agent-server-spec.md, docs/verification-spec.md, scripts/verify-visual.sh, CLAUDE.md, lib/cpp-httplib/httplib.h.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
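The threading model in this commit (HTTP thread enqueues a command with a promise, main thread drains the queue once per frame and fulfils it) can be sketched in Python, using `concurrent.futures.Future` as a stand-in for `std::promise`. The names `submit`, `drain`, and `handlers`, and the shape of the command payload, are illustrative assumptions, not the server's actual C++ code.

```python
import queue
import threading
from concurrent.futures import Future

commands = queue.Queue()  # filled by the HTTP thread, drained by the main thread

def submit(name, payload):
    """HTTP-thread side: enqueue a command and return a future for its result."""
    fut = Future()
    commands.put((name, payload, fut))
    return fut

def drain(handlers):
    """Main-thread side: run every queued command, e.g. once per rendered frame."""
    while True:
        try:
            name, payload, fut = commands.get_nowait()
        except queue.Empty:
            return
        try:
            fut.set_result(handlers[name](payload))
        except Exception as exc:  # report failures to the waiting thread
            fut.set_exception(exc)

def http_thread(results):
    # Blocks until the main thread has executed the command.
    results.append(submit("click", {"id": "ButtonOK"}).result(timeout=5))

# Hypothetical handler; the real server would dispatch into the game UI here.
handlers = {"click": lambda p: {"ok": True, "clicked": p["id"]}}
results = []
worker = threading.Thread(target=http_thread, args=(results,))
worker.start()
while worker.is_alive():       # stand-in for the render loop
    drain(handlers)            # corresponds to "after each Render() call"
    worker.join(timeout=0.01)
print(results)  # [{'ok': True, 'clicked': 'ButtonOK'}]
```

The point of the pattern is that UI state is only ever touched from the main thread; the HTTP thread just waits on the future.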

…rs build note

On macOS the engine renders to FBOs, so glReadPixels on the default framebuffer returns the clear color. Replace with screencapture (same approach as scripts/verify-visual.sh), bringing the game window to front first so the correct Space is captured. Adds Linux fallback using ImageMagick import or scrot.

Also document `make UpdateShaders` in CLAUDE.md: a required step in worktree builds to copy GLSL shaders into build-dev/data/shaders/ (cmake dev mode handles this automatically in a normal build).

Milestones validated on macOS: M0, M1, M2, M3 ✅

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

DetectScreen() identifies the current game screen from widget IDs: PlayerSelect, MainMenu, LevelSelect, InGameMenu, InGame, or unknown. Widget threshold ≥7 prevents false InGame during the loading window (which shows ≤6 unlabeled widgets). BuildStateJson() now includes the detected screen name in the /state response. Also adds -agentserver[=PORT] to the -help output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
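The detection rule described here can be sketched as follows. This is a Python illustration, not the C++ DetectScreen(); the marker table is an assumption (only the PlayerSelect widget IDs are named in these commit messages), and the other screens are omitted because their marker IDs are not given.

```python
INGAME_MIN_WIDGETS = 7  # threshold quoted in the commit message

# Assumed marker IDs per screen; hypothetical beyond the IDs named in the commits.
MARKERS = {
    "PlayerSelect": {"EditPlayerName", "ListPlayers"},
}

def detect_screen(widget_ids):
    """Classify a screen from its widget IDs, falling back to a widget-count rule."""
    ids = set(widget_ids)
    for screen, required in MARKERS.items():
        if required <= ids:          # all marker widgets are present
            return screen
    if len(widget_ids) >= INGAME_MIN_WIDGETS:
        return "InGame"              # many widgets, none of them a known marker
    return "unknown"                 # e.g. the loading window (at most 6 widgets)

print(detect_screen(["EditPlayerName", "ButtonOK", "ListPlayers"]))  # PlayerSelect
print(detect_screen([f"evt:{i}" for i in range(6)]))                 # unknown
print(detect_screen([f"evt:{i}" for i in range(7)]))                 # InGame
```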

Tested on Ubuntu 24.04 arm64 (Podman/libkrun): Xvfb provides the virtual display, LIBGL_ALWAYS_SOFTWARE=1 forces Mesa llvmpipe for software GL, and imagemagick's 'import -window root' captures the framebuffer. The -headless flag must NOT be used alongside -agentserver since it skips SDL window/GL context creation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- When -agentserver is active, always create the SDL video surface even if -headless is set, so the GL context exists for rendering
- Use SDL_WINDOW_HIDDEN in headless mode to avoid showing a window
- Skip SDL_GL_ACCELERATED_VISUAL=1 in headless mode so software renderers (llvmpipe) are accepted without failing context creation

Also adds .github/workflows/agent-server-tests.yml: CI job that builds Colobot, starts the agent server under Xvfb + llvmpipe, and runs all four milestones (health, state, type+click, screenshot) as assertions. Artifacts: screenshot PNG and agent server log, uploaded on every run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…Phase 2)

Screenshots are now captured directly from the OpenGL framebuffer using GetFrameBufferPixels() (glReadPixels on FBO 0) instead of shelling out to screencapture/imagemagick/scrot. This removes all platform-specific and external-tool dependencies from the screenshot path.

How it works:
- HTTP thread sets m_screenshotPending flag + promise, then waits on the future
- CApplication::Render() calls CaptureFrameIfPending() between m_engine->Render() and SDL_GL_SwapWindow(), when FBO 0 holds the final composited frame
- CaptureFrameIfPending reads RGBA pixels, flips rows (GL is bottom-up), converts RGBA→RGB, encodes PNG in-memory with libpng, fulfills the promise
- Latency: one frame (~16ms); no temp files, no shell commands

Benefits over Phase 1:
- No imagemagick, scrot, or screencapture dependency
- Identical behavior on macOS, Linux, and Windows
- Screenshot content is exactly what the GL pipeline rendered (not a screen grab)
- Works with hidden SDL windows (groundwork for future true-headless mode)

Also removes imagemagick from the GHA workflow apt-get install step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
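The pixel-conversion step of the readback (flip rows because GL returns them bottom-up, then drop the alpha channel) can be sketched in plain Python on a raw byte buffer. The function name is hypothetical and PNG encoding is left out; the commit does this in C++ with libpng.

```python
def rgba_bottom_up_to_rgb_top_down(pixels: bytes, width: int, height: int) -> bytes:
    """Reorder a bottom-up RGBA buffer into top-down RGB, as needed before PNG encoding."""
    stride = width * 4  # bytes per RGBA row
    if len(pixels) != stride * height:
        raise ValueError("pixel buffer does not match width * height * 4")
    out = bytearray()
    for row in range(height - 1, -1, -1):       # last GL row is the top of the image
        line = pixels[row * stride:(row + 1) * stride]
        for x in range(0, stride, 4):
            out += line[x:x + 3]                # keep R, G, B; drop A
    return bytes(out)

# 1x2 image as GL would return it: bottom row (red) first, top row (blue) second.
gl_buffer = bytes([255, 0, 0, 255,   0, 0, 255, 255])
print(list(rgba_bottom_up_to_rgb_top_down(gl_buffer, width=1, height=2)))
# [0, 0, 255, 255, 0, 0]  (blue on top, red below)
```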

…adback

- GET /screenshot → GL framebuffer readback (default; headless-safe, no deps)
- GET /screenshot?source=os → OS screencapture tool (validates platform behavior)

The OS path uses screencapture on macOS and import/scrot on Linux. This is the same approach as before Phase 2, now exposed as an explicit opt-in for cases where you need to validate window chrome, compositor scaling, HiDPI rendering, or other platform-specific behavior that the GL framebuffer does not capture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add tests/agent/ pytest framework with session-scoped fixtures for auto-starting Colobot (Xvfb + llvmpipe) or connecting to a running server
- AgentClient wraps all HTTP endpoints with visual regression helpers
- Tests numbered (01-04) to enforce execution order across health, state, screenshot (GL readback + snapshot diffing), and navigation
- GHA workflow updated to install pytest/requests/pillow and upload snapshots
- Spec updated to v0.4 documenting the full automated test suite and dual screenshot modes (source=gl / source=os)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
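A thin client over these endpoints might look roughly like the sketch below. It is not the PR's AgentClient: the class name, the JSON body shape (`{"id": ...}`), and the assumption that responses are JSON are all hypothetical, and it uses only the standard library rather than requests. The default URL and the COLOBOT_AGENT_URL variable are taken from the PR text.

```python
import json
import os
import urllib.request

class AgentClientSketch:
    """Hypothetical minimal wrapper; request and response shapes are assumptions."""

    def __init__(self, base_url=None):
        url = base_url or os.environ.get("COLOBOT_AGENT_URL", "http://127.0.0.1:7777")
        self.base_url = url.rstrip("/")

    def build(self, method, path, body=None):
        """Build (but do not send) a request, so it can be inspected or tested offline."""
        data = None if body is None else json.dumps(body).encode("utf-8")
        headers = {"Content-Type": "application/json"} if data is not None else {}
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    def send(self, request, timeout=5.0):
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))  # assumes a JSON response

    def health(self):
        return self.send(self.build("GET", "/health"))

    def click(self, widget_id):
        return self.send(self.build("POST", "/click", {"id": widget_id}))

# Offline inspection of the request that click("ButtonOK") would send:
req = AgentClientSketch("http://localhost:7777/").build("POST", "/click", {"id": "ButtonOK"})
print(req.get_method(), req.full_url, req.data)
```

Separating `build` from `send` keeps the request construction testable without a running game.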

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Suppress -Wold-style-cast for cpp-httplib (third-party) via GCC pragma push/pop
- Check fread() return value to satisfy -Wunused-result

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The player_select.png was captured on macOS; CI (Linux/llvmpipe) renders differently causing 4.9% pixel diff vs 3% tolerance. The assert_snapshot helper auto-saves a baseline on first run, so CI will generate its own on each run. Local runs will do the same on first use. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
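The tolerance check and the save-baseline-on-first-run behavior described here can be sketched as below. This is an assumption-laden illustration: it compares raw RGB byte buffers rather than decoded PNGs, and `diff_fraction` and `assert_snapshot_sketch` are hypothetical names, not the suite's assert_snapshot helper.

```python
import os

def diff_fraction(a: bytes, b: bytes, channels: int = 3) -> float:
    """Fraction of pixels that differ between two equally sized raw pixel buffers."""
    if len(a) != len(b) or len(a) % channels:
        raise ValueError("images must have the same size")
    total = len(a) // channels
    changed = sum(
        1 for i in range(0, len(a), channels)
        if a[i:i + channels] != b[i:i + channels]
    )
    return changed / total if total else 0.0

def assert_snapshot_sketch(pixels: bytes, baseline_path: str, tolerance: float = 0.03) -> bool:
    """Return True if a baseline was created, False if the comparison passed."""
    if not os.path.exists(baseline_path):
        with open(baseline_path, "wb") as fh:   # first run: save and accept
            fh.write(pixels)
        return True
    with open(baseline_path, "rb") as fh:
        baseline = fh.read()
    frac = diff_fraction(pixels, baseline)
    assert frac <= tolerance, f"{frac:.1%} of pixels differ (tolerance {tolerance:.0%})"
    return False
```

Under this rule a 4.9% difference fails a 3% tolerance, which is why a baseline captured on macOS cannot be reused against llvmpipe output.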

…g APIs

C++ (agent_server):
- Register widget IDs for setup screen (ButtonTabGame, ButtonTabDisplay, ButtonApply, ListLanguage), in-game console (EditConsole), and studio editor (StudioEdit, StudioCompile, StudioRun, StudioOK, StudioCancel)
- Detect SetupGame, SetupDisplay, and Studio screens in DetectScreen()
- Add /click_pos endpoint: injects SDL mouse events at interface-coord [0,1] position, enabling clicks on game objects (robots) not tied to widget IDs
- Add Backquote, arrow keys, Delete, Backspace to /key endpoint

Python:
- AgentClient.click_pos(x, y): raw position click helper
- AgentClient.console(cmd): open console → type command → execute
- test_05_languages.py: 7 tests verifying all 9 language options in Setup > Game are present, correctly ordered, and selectable
- test_06_ingame.py: exercises chapter 1 level 1 (enter level, open robot studio via click_pos, edit and compile code, enable showsoluce and winmission cheats via console, verify win screen appears)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Navigates to Exercise ch1/lv1 (WheeledShooter vs 3 spiders), opens the robot's studio via click_pos, writes a CBot program using radar/turn/fire to kill all spiders, runs it with StudioRun, and asserts the win screen (InGameMenu + ButtonAgain) appears within 30 s. Studio tests are skipped gracefully if the robot is not at the expected viewport position — adjust ROBOT_X/ROBOT_Y constants to tune. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
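Waiting for the win screen within a deadline, as this commit describes, is a polling loop. The sketch below is illustrative only: `wait_for` and `is_win_screen` are hypothetical names, and the assumed /state shape (`{"screen": ..., "widgets": [{"id": ...}]}`) is not confirmed by the PR text.

```python
import time

def wait_for(predicate, timeout=30.0, interval=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        sleep(interval)

def is_win_screen(state):
    # Assumed /state shape: {"screen": str, "widgets": [{"id": str}, ...]}
    ids = {w.get("id") for w in state.get("widgets", [])}
    return state.get("screen") == "InGameMenu" and "ButtonAgain" in ids

# Offline usage with canned states standing in for successive GET /state responses.
states = iter([
    {"screen": "InGame", "widgets": []},
    {"screen": "InGameMenu", "widgets": [{"id": "ButtonAgain"}]},
])
print(wait_for(lambda: is_win_screen(next(states)), timeout=1.0, interval=0.0))  # True
```

Injecting `clock` and `sleep` keeps the helper testable without real delays.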

…lpers

- Register all 5 setup tab buttons (Display/Graphics/Gameplay/Controls/Sound) and fix DetectScreen to use ButtonTabDisplay+ButtonTabGraphics for SetupDisplay, avoiding the >=7-widget fallback that was misclassifying setup screens as InGame
- Add navigate_to_main_menu() and navigate_to_player_select() helpers in conftest so fixtures recover from any screen state between tests
- Fix test_05 fixture to explicitly click ButtonTabGameplay after opening Setup (setup screen remembers the last active tab, so we can't assume which opens)
- Fix EXPECTED_LANGUAGES order to match the game's actual list order
- Update main_menu.png snapshot to current 640x480 resolution
- Fix winmission flow: exercise levels show a cinematic (LevelComplete + ButtonEndLevel) not InGameMenu; map EVENT_BUTTON_OK=40 and EVENT_DIALOG_OK/CANCEL correctly
- Fix CI build: suppress -Wold-style-cast in third-party httplib.h, handle fread() unused-result warning in DoScreenshotOS()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds /objects, /get_speed, /set_speed, /get_program, key down/up actions, and several program-slot helpers to the agent server and AgentClient. New tests cover the program API (test_08), all Chapter 1 exercises (test_09), and exercises from Chapters 2-7 (test_10, 38 levels total). Two script bugs required workarounds: the ExchangePost race condition in Remote Control #2 (fixed by sending param before order) and the CBot int/nan sentinel deadlock in Remote Control #4 (fixed with an int -1 sentinel via a class-based protocol). The open_studio() helper now correctly selects the newly-added program slot after ButtonAddProgram, so StudioRun is always enabled. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
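The sentinel deadlock mentioned in this commit comes down to NaN never comparing equal to anything. The sketch below shows that in Python as a stand-in for the CBot script; the function names and the "slot is empty" framing are hypothetical, and the stored value 0 is what the PR reports for an `int m_type = nan` field.

```python
import math

NAN = float("nan")
print(NAN == NAN)        # False: NaN is not equal to anything, including itself
print(math.isnan(NAN))   # True: the only reliable way to test for NaN

def is_empty_with_nan(m_type):
    """Broken emptiness check: equality against NaN can never succeed."""
    return m_type == NAN

def is_empty_with_sentinel(m_type, empty=-1):
    """Workaround: use an ordinary int that real message types never take."""
    return m_type == empty

stored = 0  # what the PR reports an int field initialised from nan actually holds
print(is_empty_with_nan(stored))      # False, so a "wait until empty" loop never ends
print(is_empty_with_sentinel(-1))     # True
print(is_empty_with_sentinel(3))      # False: slot holds a real message type
```

An integer sentinel works because it round-trips through an int field unchanged and compares equal to itself.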

RicardoRdzG · 2026-04-27T19:57:18Z

Opened by mistake — PR should target the fork, not the upstream repo.

RicardoRdzG and others added 18 commits March 8, 2026 02:17

Fix small font in Hires displays

c790fe4

Fix small mouse pointer in Hires displays

d378507

Use dynamic font texture size to avoid empty character rendering

90d3711

tests/agent: add .gitignore to exclude __pycache__

96da697

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

agent_server: fix -Werror build failures in CI

eb61593


RicardoRdzG closed this Apr 27, 2026

RicardoRdzG wants to merge 18 commits into colobot:dev from RicardoRdzG:claude/awesome-napier-90fbf9
