fix: persist XCTest runner port per device — stop orphaning xcodebuild processes #3167

Open
qwertey6 wants to merge 7 commits into mobile-dev-inc:main from ReverentPeer:pr/7-persist-xctest-port

@qwertey6
Proposed changes

Stop orphaning xcodebuild processes between Maestro runs. Each maestro test invocation was picking a random port from 7001-7128, even when an XCTest runner from a previous run was still listening on a different port. The new run couldn't find a runner on its new port, so it spawned a fresh xcodebuild + runner pair. The old runner got killed by simctl (XCTest only allows one runner per app), but the old xcodebuild process was orphaned and stayed alive forever.

Over many runs, this accumulated orphaned xcodebuild processes that held simulator resources, eventually causing connection failures, timeouts, and the symptoms reported in #1299, #2932, and various flaky-test reports.

Impact

  • Saves ~5 seconds per warm run by reusing the existing runner (the isChannelAlive() short-circuit in restartXCTestRunner finally works the way it was meant to)
  • No more orphaned xcodebuild processes piling up between runs
  • Cross-platform: file-based store, no lsof/process scanning, works on Windows
  • Builds on the runner-reuse work in #3139 (perf: 4x faster startup), which added the isChannelAlive() skip but couldn't use it because the port changed every run

Root cause

TestCommand.selectPort() returns a random port from 7001..7128 for every invocation. The previous run's runner is still listening on its old port (e.g., 7106). The new run picks a new port (e.g., 7117) and:

  1. Calls restartXCTestRunner() for port 7117
  2. isChannelAlive() checks port 7117 → returns false (old runner is on 7106)
  3. Spawns a new xcodebuild test-without-building on port 7117
  4. simctl kills the old runner (only one XCTest runner per app at a time)
  5. The old xcodebuild is now waiting for its dead runner — orphaned forever

After N runs, there are N orphaned xcodebuild processes. Verified locally by running ps aux | grep xcodebuild after a few maestro test invocations.

Fix

Persist the XCTest runner port per device to ~/.maestro/xctest-ports/<deviceId>. On the next invocation, read the saved port and probe it with isPortListening() (a short-timeout Socket.connect()). If something is listening, reuse the port — the existing isChannelAlive() check in restartXCTestRunner() then short-circuits the entire reinstall path. If nothing is listening, the saved port is stale; pick a new random port and update the file.
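
As a rough illustration of the read, probe, reuse-or-replace flow described above, here is a minimal Kotlin sketch. It is not the code in this PR: `PortStore`, `resolvePort`, and `pickNewPort` are hypothetical names, the body of `isPortListening()` is a guess at what a short-timeout `Socket.connect()` probe looks like, and the 200 ms timeout is an assumption. The sketch also saves the port as soon as it is picked, which may differ from when the real store writes the file.

```kotlin
import java.io.File
import java.io.IOException
import java.net.InetSocketAddress
import java.net.Socket

// Hypothetical stand-in for XCTestPortStore: one small text file per device ID.
class PortStore(private val dir: File) {
    private fun fileFor(deviceId: String) = File(dir, deviceId)

    // Returns the saved port, or null if the file is missing or not a valid port.
    fun load(deviceId: String): Int? =
        fileFor(deviceId).takeIf { it.isFile }
            ?.readText()?.trim()?.toIntOrNull()
            ?.takeIf { it in 1..65535 }

    fun save(deviceId: String, port: Int) {
        dir.mkdirs()
        fileFor(deviceId).writeText(port.toString())
    }
}

// True if something accepts a TCP connection on localhost:port within timeoutMs.
// The timeout value is an assumption, not taken from the PR.
fun isPortListening(port: Int, timeoutMs: Int = 200): Boolean =
    try {
        Socket().use { it.connect(InetSocketAddress("localhost", port), timeoutMs) }
        true
    } catch (e: IOException) {
        false
    }

// Reuse the saved port while a listener is still there; otherwise treat it as
// stale, pick a new port via the caller-supplied picker, and overwrite the file.
fun resolvePort(store: PortStore, deviceId: String, pickNewPort: () -> Int): Int {
    store.load(deviceId)?.let { saved ->
        if (isPortListening(saved)) return saved
    }
    return pickNewPort().also { store.save(deviceId, it) }
}
```

A caller would pass something like `PortStore(File(System.getProperty("user.home"), ".maestro/xctest-ports"))` and a picker that draws from 7001..7128. Note that a successful probe only shows that some process is listening on the port; telling an XCTest runner apart from an unrelated listener is left to the isChannelAlive() check.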

Verification

| Run | State | Time | Port | Processes |
|---|---|---|---|---|
| 1 (cold) | Fresh start | 13s | 7106 | 2 (xcodebuild + runner) |
| 2 (warm) | Reuse | 8s | 7106 (same) | 2 (no orphans) |
| 3 (warm) | Reuse | 7s | 7106 (same) | 2 (no orphans) |
| 4 (after pkill) | Recover | 11s | 7009 (new) | 2 (fresh) |

Parallel runs on two simulators each get their own port file and don't interfere.

Why not pick the same port deterministically (e.g., hash deviceId → port)?

Considered. Two issues:

  • Hash collisions across devices on the same machine
  • The user might have the port in use by something else

The persistent file is more flexible: the port is "sticky" once chosen, but each device picks an actually-available port at first start.
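
To make the collision concern concrete, here is a small sketch of what a deterministic mapping could look like. `hashedPort` is a hypothetical function, not something proposed in this PR, and using `String.hashCode()` is an assumption chosen for brevity.

```kotlin
// Hypothetical deterministic mapping: hash the device ID into the 128 ports 7001..7128.
fun hashedPort(deviceId: String): Int =
    7001 + Math.floorMod(deviceId.hashCode(), 128)

// Groups device IDs by their hashed port and keeps only ports claimed by more than one ID.
fun findCollisions(deviceIds: List<String>): Map<Int, List<String>> =
    deviceIds.groupBy(::hashedPort).filterValues { it.size > 1 }
```

With only 128 slots, any set of 129 distinct device IDs must contain a collision by the pigeonhole principle, and far smaller sets can collide by chance. The mapping also has no way to move a device off a port that another program already holds.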

Why not scan running processes to find the port (lsof)?

  • Not cross-platform (Windows has no lsof)
  • Fragile parsing
  • Maestro CLI runs in sandboxes that may restrict process inspection

Cross-platform safety

XCTestPortStore uses java.io.File only — no Unix-specific calls. Works on Windows, Linux, and macOS.

Depends on: #3166, #3165, #3141, #3140, #3139, #3138 (stacked PR chain)

Issues fixed

Builds on #3139 to make the driver-reuse optimization actually work across CLI invocations. Likely contributes to fixing #1299 and the broader class of "iOS driver hangs" reports.

qwertey6 and others added 7 commits April 4, 2026 15:37
iOS simulators share the host's localhost, causing port collisions when
multiple Maestro processes target different sims simultaneously. Session
tracking was per-platform, so two processes on different devices would
interfere with each other's sessions.

Changes:
- Per-device session tracking: SessionStore keys are now
  "{platform}_{deviceId}_{sessionId}" instead of "{platform}_{sessionId}"
- Add --driver-host-port CLI flag for explicit XCTest server port
- Auto-select available ports with isPortAvailable() check
- Refactor SessionStore from singleton to injectable class (DI)
- Add shouldCloseSession(platform, deviceId) for per-device shutdown
  instead of global activeSessions().isEmpty()
- Add cross-process file locking to KeyValueStore (~/.maestro/sessions)
- Append PID to debug log directory to prevent parallel race
- Enable useJUnitPlatform() in maestro-cli (was missing)
- Add SessionStoreTest with 8 tests covering isolation and lifecycle

Verified: 3 iOS simulators + Android emulator running simultaneously,
all passing. Both --driver-host-port (explicit) and auto-port-selection
work correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
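
The per-device session tracking in the commit above can be sketched roughly as follows. This is an in-memory illustration only: `SessionKey` and `InMemorySessionStore` are hypothetical names, and the real SessionStore is file-backed with cross-process locking, which this sketch leaves out.

```kotlin
// Hypothetical key type; asString() follows the key shape named in the commit
// message: "{platform}_{deviceId}_{sessionId}".
data class SessionKey(val platform: String, val deviceId: String, val sessionId: String) {
    fun asString() = "${platform}_${deviceId}_$sessionId"
}

// In-memory stand-in for SessionStore (no file persistence, no locking).
class InMemorySessionStore {
    private val active = mutableSetOf<SessionKey>()

    fun open(key: SessionKey) { active += key }
    fun close(key: SessionKey) { active -= key }

    // Per-device shutdown decision: only this device's sessions matter,
    // not whether any session exists anywhere on the platform.
    fun shouldCloseSession(platform: String, deviceId: String): Boolean =
        active.none { it.platform == platform && it.deviceId == deviceId }
}
```

With two simulators active, closing the last session on one device makes `shouldCloseSession` return true for that device while the other device's driver is left alone.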
- Default --reinstall-driver to false: reuse a healthy running driver
  instead of killing and reinstalling on every run (~40s saved on iOS)
- XCTestDriverClient checks isChannelAlive() before reinstalling —
  if the user explicitly passes --reinstall-driver, honor it
- Cache extracted iOS build products per-device in
  ~/.maestro/build-products/<deviceId>/ with SHA-256 hash validation:
  skips extraction when source matches cache, re-extracts on upgrade
- Reduce XCTest status check HTTP read timeout from 100s to 3s
- Remove Thread.sleep(1000) heartbeat delay hack (no longer needed
  with per-device session tracking)

Single device: ~52s → ~10-12s. Three devices parallel: ~54s → ~18s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
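
The hash-validated cache described in the commit above might look something like this sketch. `refreshCache`, the `extract` callback, and the `.source-hash` marker file name are all hypothetical; only the idea of comparing a SHA-256 of the source against a stored value comes from the commit message.

```kotlin
import java.io.File
import java.security.MessageDigest

// Lowercase hex SHA-256 of a byte array.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

// Re-extracts into cacheDir only when the source hash differs from the stored one.
// Returns true if extraction ran, false if the cache was already current.
fun refreshCache(source: ByteArray, cacheDir: File, extract: (ByteArray, File) -> Unit): Boolean {
    val marker = File(cacheDir, ".source-hash") // hypothetical marker file name
    val hash = sha256Hex(source)
    if (marker.isFile && marker.readText() == hash) return false

    cacheDir.deleteRecursively() // drop stale contents, e.g. after an upgrade
    cacheDir.mkdirs()
    extract(source, cacheDir)
    marker.writeText(hash)       // written last, so a failed extraction is retried
    return true
}
```

Writing the marker only after extraction succeeds means an interrupted run leaves no marker behind and the next run extracts again.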
- iOS XCTest runner: add isVersionMatch() to XCTestInstaller interface.
  LocalXCTestInstaller compares SHA-256 hash of build products against
  a .running-hash marker written at startup. restartXCTestRunner now
  checks both isChannelAlive() AND isVersionMatch() — stale runners
  from a previous Maestro version are replaced automatically.

- Android driver: add isDriverVersionCurrent() that hashes the bundled
  maestro-app.apk and maestro-server.apk, compares against stored hash
  in ~/.maestro/android-driver-hash. On mismatch, APKs are reinstalled
  even when reinstallDriver=false.

- App binary cache (clearAppState): getCachedAppBinary now compares
  Info.plist of cached vs installed app. Stale cache from app updates
  is detected and refreshed before reinstall. Per-device cache dirs
  (~/.maestro/app-cache/<deviceId>/) prevent parallel races.

- Add XCTestDriverClientTest (4 tests) and LocalSimulatorUtilsTest
  (3 tests) covering version mismatch, reuse, and cache behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On iOS, waitForAppToSettle has two tiers: a server-side screenshot
hash check (Tier 1, hardcoded 3000ms) and a client-side hierarchy
comparison fallback (Tier 2). The per-command waitToSettleTimeoutMs
config only controlled Tier 2, so even waitToSettleTimeoutMs: 100
would still burn up to 3 seconds in Tier 1.

Fix: use waitToSettleTimeoutMs as the total settle budget. Tier 1
runs with this timeout, and any remaining time goes to Tier 2:

  - swipe with waitToSettleTimeoutMs: 500 → capped at 500ms total
  - default (no config) → unchanged 3000ms behavior

Wikipedia e2e flow with tuned timeouts: 25s vs 53s baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
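
The shared settle budget from the commit above can be sketched as below. The `tier1` and `tier2` parameters are hypothetical stand-ins for the screenshot-hash check and the hierarchy comparison; each takes a timeout in milliseconds and returns true once the app has settled. Skipping Tier 2 when Tier 1 succeeds is an assumption based on the commit calling Tier 2 a fallback.

```kotlin
// Runs Tier 1 with the full budget, then gives whatever time is left to Tier 2.
// nowMs is injectable so the budgeting can be exercised with a fake clock.
fun waitForAppToSettle(
    budgetMs: Long = 3000, // default taken from the commit message
    tier1: (Long) -> Boolean,
    tier2: (Long) -> Boolean,
    nowMs: () -> Long = { System.nanoTime() / 1_000_000 },
): Boolean {
    val start = nowMs()
    if (tier1(budgetMs)) return true

    // Tier 2 only gets the part of the budget Tier 1 did not consume.
    val remaining = budgetMs - (nowMs() - start)
    return remaining > 0 && tier2(remaining)
}
```

For example, with a 500 ms budget and a Tier 1 attempt that gives up after 200 ms, Tier 2 is called with 300 ms rather than its own separate timeout.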
On iOS, when a React Native Pressable has accessibilityLabel set, the
child Text content is collapsed into the parent's accessibility label.
The element's title and value are empty, so Maestro's `text` attribute
was always empty for these elements — making `tapOn: "<text>"` fail to
find buttons that are clearly visible to users.

The element IS reachable via `tapOn { label: ... }` or by regex against
accessibilityText, but those are unintuitive workarounds. Users see text
on screen and expect `tapOn: "that text"` to work — that's the entire
point of the selector.

Fix: in mapViewHierarchy, fall back to element.label (the iOS
accessibility label) when both title and value are empty. accessibilityText
still uses element.label as its canonical source, so the existing
Filters.textMatches accessibilityText fallback continues to work.

This also indirectly fixes the "tap doesn't fire onPress" symptom: when
matching by accessibilityText regex, Maestro might select a parent View
wrapping the Pressable, leading to coordinate taps in the wrong area.
With text populated on the Pressable itself, normal element ranking
picks the correct deepest match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
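
The label fallback from the commit above amounts to something like this sketch. `AXElement` and `resolveText` are hypothetical names, and preferring title over value when both are present is an assumption; the commit message only states that the label is used when both are empty.

```kotlin
// Hypothetical, simplified view of an iOS accessibility element.
data class AXElement(val title: String?, val value: String?, val label: String?)

// Text for the element: title or value if either is non-empty, otherwise the
// accessibility label, otherwise an empty string.
fun resolveText(element: AXElement): String =
    listOf(element.title, element.value).firstOrNull { !it.isNullOrEmpty() }
        ?: element.label.orEmpty()
```

A Pressable whose child text was collapsed into its label, such as `AXElement(title = "", value = "", label = "Submit")`, now resolves to "Submit", so a plain text selector can match it.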
The XCTest runner sometimes binds its HTTP server to ::1 (IPv6) only.
Maestro CLI was hardcoded to connect via 127.0.0.1 (an IPv4 literal),
which cannot reach an IPv6-only socket. Result: every HTTP call fails
with "Connection refused" even though the runner is alive and curl can
reach it via localhost.

Fix: replace 127.0.0.1 with localhost in three places:
- MaestroSessionManager.defaultXctestHost
- LocalXCTestInstaller constructor default
- LocalXCTestInstaller.xcTestDriverStatusCheck (was hardcoded)

okhttp's default Dns resolver returns all addresses for localhost
(both 127.0.0.1 and ::1) and tries them in order on connection
failure, so this works regardless of which address family the
runner binds to.

This is the same root cause as mobile-dev-inc#1299 (open since July 2023).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
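
To show why the hostname works where the IPv4 literal did not, here is a hand-rolled sketch of the try-each-address behavior that the commit above attributes to okhttp. It uses only `java.net` and is not okhttp code; `connectToLocalhost` and the 500 ms timeout are hypothetical.

```kotlin
import java.io.IOException
import java.net.InetAddress
import java.net.InetSocketAddress
import java.net.Socket

// Tries every address that "localhost" resolves to (typically 127.0.0.1 and/or ::1),
// in order, and returns the first socket that connects.
fun connectToLocalhost(port: Int, timeoutMs: Int = 500): Socket {
    var lastError: IOException? = null
    for (address in InetAddress.getAllByName("localhost")) {
        val socket = Socket()
        try {
            socket.connect(InetSocketAddress(address, port), timeoutMs)
            return socket
        } catch (e: IOException) {
            socket.close() // this address family is not reachable; try the next one
            lastError = e
        }
    }
    throw lastError ?: IOException("localhost resolved to no addresses")
}
```

A client pinned to 127.0.0.1 skips this loop entirely, which is why it cannot reach a runner that is bound only to ::1.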
Each Maestro CLI invocation was picking a new random port for the
XCTest server, even when a runner from a previous run was still
listening on a different port. The new run couldn't find a runner on
its new port, so it spawned a fresh xcodebuild + runner pair. The old
runner got killed by simctl (XCTest only allows one runner per app),
but the old xcodebuild process was orphaned and stayed alive forever.

Over many runs, this accumulated orphaned xcodebuild processes that
held simulator resources, eventually causing connection failures and
timeouts.

Fix: persist the XCTest runner port to ~/.maestro/xctest-ports/<deviceId>
after a successful start. On the next invocation, read the saved port
and probe it with isPortListening(). If something is listening, reuse
the port — isChannelAlive() (one level up) will then short-circuit
the entire reinstall path. If nothing is listening, the saved port
is stale; pick a new random port and update the file.

XCTestPortStore is a small file-based per-device store. Cross-platform
(no lsof/process scanning), so works on Windows.

Verified locally with 4 sequential runs on the same simulator:
- Run 1 (cold): 13s, port 7106 saved
- Run 2 (warm, reuse): 8s, same port, 2 processes
- Run 3 (warm, reuse): 7s, same port, 2 processes
- Run 4 (after pkill): 11s, new port 7009, fresh runner

Parallel runs on two simulators get independent port files and
don't interfere with each other.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>