
Finalize split runtime cutover #225

Open
malpern wants to merge 15 commits into master from codex/duplicate-key-investigation-pr


@malpern
Owner

@malpern malpern commented Mar 9, 2026

Summary

  • finalize the macOS split-runtime cutover so KeyPath Runtime is the normal runtime path
  • separate duplicate-key investigation scaffolding from this refactor branch
  • publish a clean branch based directly on master with Rust target artifacts removed

Validation

  • swift test --filter 'RuntimeCoordinatorTests|DiagnosticsServiceTests|KeyPathCLITests|WizardNavigationEngineTests'
  • swift build --product KeyPath
  • ./Scripts/quick-deploy.sh

Notes

  • duplicate-key investigation work is preserved separately on branch codex/duplicate-key-investigation-diagnostics


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfea143956


throw NSError(domain: "KeyPathOutputBridge", code: Int(code), userInfo: [NSLocalizedDescriptionKey: "failed to bind output bridge socket: \(code)"])
}

chmod(session.socketPath, 0o666)


P1: Enforce authenticated access to output bridge socket

The companion opens each UNIX socket as world-writable (0o666), and request handling accepts .emitKey directly without checking peer credentials or requiring a prior authenticated handshake (handle(clientFD:) routes .emitKey immediately). Because the helper also creates the socket/session directories with 755 permissions (HelperService.ensureKanataOutputBridgeDirectory), any local user can discover/connect to the socket and inject privileged key events, which is a security regression.
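Here is a minimal sketch of the check this finding calls for. It assumes two things: the companion can read the peer's uid on the accepted client fd (on macOS, getpeereid(2) returns it), and the launching KeyPath process receives a per-session token through a private channel. `BridgeRequest`, `BridgeConnectionGate`, and the token handshake are hypothetical, because the companion's real request types do not appear in this PR.

```swift
/// Hypothetical stand-in for the companion's request protocol.
enum BridgeRequest: Equatable {
    case hello(sessionToken: String)
    case emitKey(usage: UInt16, isDown: Bool)
}

/// Hypothetical per-connection gate. The peer must be the expected user AND
/// must complete a hello handshake before any `.emitKey` is honored.
struct BridgeConnectionGate {
    let expectedUID: UInt32      // uid of the user who owns the session
    let sessionToken: String     // secret handed only to the launching process
    private(set) var isAuthenticated = false

    init(expectedUID: UInt32, sessionToken: String) {
        self.expectedUID = expectedUID
        self.sessionToken = sessionToken
    }

    /// `peerUID` would come from getpeereid(2) on the client socket.
    /// Returns true if the request may be processed.
    mutating func authorize(_ request: BridgeRequest, peerUID: UInt32) -> Bool {
        // Reject other local users outright, whatever the socket mode bits are.
        guard peerUID == expectedUID else { return false }
        switch request {
        case .hello(let token):
            isAuthenticated = (token == sessionToken)
            return isAuthenticated
        case .emitKey:
            // Key injection is allowed only after a successful handshake.
            return isAuthenticated
        }
    }
}
```

The uid check would work together with tighter modes. For example, the socket could be owned by the session user with mode 0o600, and the session directory could use 0o700 instead of 755. That stops other users from connecting at all. The token should not reuse the session ID, because that ID is also written to a world-readable file under /var/tmp.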


Comment on lines +380 to +383
reply(true, nil)
NSLog("[KeyPathHelper] restartKanataOutputBridgeCompanion acknowledged, scheduling async restart")

DispatchQueue.global(qos: .userInitiated).asyncAfter(deadline: .now() + 0.5) {


P1: Wait for companion restart before reporting success

This method acknowledges success before attempting the restart, then performs the actual restart asynchronously. Callers such as KanataSplitRuntimeHostService.restartCompanionAndRecoverPersistentHost() immediately query companion health and treat a negative result as a hard recovery failure, so this race can incorrectly fail recovery and tear down the split runtime even when the delayed restart would have succeeded.
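One way to close this race, shown with hypothetical names: run the restart and a bounded health poll on the helper's background queue, and call `reply` only after the outcome is known. The closures stand in for the helper's real restart and health-check calls, which are not shown here. The attempt count and poll interval are placeholders.

```swift
import Foundation

/// Hypothetical sketch: restart first, confirm health with a bounded poll, and
/// only then call the XPC-style `reply`. Intended for the helper's background
/// queue, so the blocking sleep never runs on the main thread.
func restartCompanionThenReply(
    restart: () throws -> Void,
    isHealthy: () -> Bool,
    maxAttempts: Int = 20,
    pollInterval: TimeInterval = 0.25,
    reply: (Bool, String?) -> Void
) {
    do {
        try restart()
    } catch {
        reply(false, "companion restart failed: \(error)")
        return
    }
    let attempts = max(1, maxAttempts)
    for attempt in 1...attempts {
        if isHealthy() {
            reply(true, nil)   // success is reported only after a healthy check
            return
        }
        if attempt < attempts {
            Thread.sleep(forTimeInterval: pollInterval)
        }
    }
    reply(false, "companion not healthy after \(attempts) checks")
}
```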


@claude

claude bot commented Mar 9, 2026

PR Review pending - see detailed review below

@claude

claude bot commented Mar 9, 2026

Bug 1 - Duplicate isKarabinerElementsRunning() in DiagnosticsService.getSystemDiagnostics(): Lines 382 and 427 both await this method, spawning two separate pgrep calls and appending two distinct diagnostics (warning + error) for the same condition. Capture the result once at the top and reuse it.
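Below is a minimal sketch of the capture-once shape. It assumes a simplified diagnostic model (`SystemDiagnostic` and `DiagnosticSeverity` are hypothetical) and uses a synchronous probe closure in place of the async `isKarabinerElementsRunning()`.

```swift
/// Hypothetical diagnostic model; DiagnosticsService's real types are not shown in the review.
enum DiagnosticSeverity { case warning, error }

struct SystemDiagnostic {
    let severity: DiagnosticSeverity
    let title: String
}

/// Probes Karabiner once and reports the conflict once. The probe is a
/// synchronous closure here for brevity; the real method is awaited.
func collectDiagnostics(isKarabinerElementsRunning: () -> Bool) -> [SystemDiagnostic] {
    let karabinerRunning = isKarabinerElementsRunning()   // one pgrep, not two
    var diagnostics: [SystemDiagnostic] = []
    if karabinerRunning {
        // A single entry, using the more specific error-level wording.
        diagnostics.append(SystemDiagnostic(severity: .error, title: "Karabiner Grabber Conflict"))
    }
    // Any later check that needs the same fact reads `karabinerRunning`.
    return diagnostics
}
```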

@claude

claude bot commented Mar 9, 2026

Bug 2 - Hard-coded TCP port 37001 in DiagnosticsService: fetchTcpStatusInfo() and fetchTcpHello() both construct KanataTCPClient(port: 37001) directly. Every other caller uses PreferencesService.shared.tcpServerPort. If the user changes the port in preferences, these diagnostic methods silently connect to the wrong port.

@claude

claude bot commented Mar 9, 2026

Bug 3 - currentRuntimeStatus() underreports when legacy daemon is active: RuntimeCoordinator+ServiceManagement.swift:180-186 returns .stopped whenever the split runtime host is not running, even if the legacy RecoveryDaemonService is still active. resetToDefaultConfig() in RuntimeCoordinator.swift:877 checks runtimeStatus.isRunning before triggering a TCP reload. In a half-migrated state this causes the reset to silently skip the reload. Even if the legacy daemon is out of scope going forward, the silent skip should at minimum emit a log warning.

@claude

claude bot commented Mar 9, 2026

Dead code - fetchTcpStatusInfo() has no callers: DiagnosticsService.swift:886 defines private func fetchTcpStatusInfo() with no call sites anywhere in the codebase. Should be removed.

Hot-path logging noise - buildUIState() emits .log() on every state update: RuntimeCoordinator.swift:60 logs customRules.count at INFO level. buildUIState() is called on every SwiftUI redraw, save, and layer change. This floods keypath-debug.log under normal usage. Should be demoted to .debug() or removed.

@claude

claude bot commented Mar 9, 2026

Consistency - prepareExperimentalOutputBridgeEnvironment creates a fresh KanataOutputBridgeCompanionManager each call: KanataRuntimePathCoordinator.swift:73-76 allocates KanataOutputBridgeCompanionManager(helperManager:) per invocation instead of using .shared. Every other call site in this PR uses .shared. Likely an oversight.

Consistency - activateKanataOutputBridgeSession 45-second timeout undocumented: HelperManager+RequestHandlers.swift:315 uses 45s vs. the 30s default for all other calls. A brief comment explaining why this operation needs more time would help future readers.

Stale comments - RuntimeCoordinator.swift doc block line counts are wrong: Header says +Lifecycle.swift is ~77 lines (actual: 202) and +ServiceManagement.swift is ~119 lines (actual: 199). Typo at RuntimeCoordinator.swift:311: Initialize EngineClien (missing t).

@claude
Copy link

claude bot commented Mar 9, 2026

Summary - What is solid: Removal of KanataService.swift (571 lines), ProcessCoordinator.swift (124 lines), and ProcessManager.swift (257 lines) cleans up substantial legacy complexity without apparent regressions. HelperXPCCallCompletionState lock-based completion guard correctly prevents double-resume of CheckedContinuation under timeout plus XPC races. restartKanataOutputBridgeCompanion clearConnection() retry pattern is good defensive hygiene. DEBUG test seams (testPersistentHostPID, testStartPersistentError) follow the established KeyPathTestCase pattern correctly. Validation order and PermissionOracle flow are unchanged and intact. The split-runtime-first hard break in startKanata() is a clean, intentional design decision that the code executes correctly.

@claude
Copy link

claude bot commented Mar 9, 2026

PR Review: Finalize Split Runtime Cutover

This is a substantial architectural change that commits KeyPath to the split-runtime path as the only supported runtime. The code is well-structured overall and the separation of concerns is improved. Here are my observations:


Issues Worth Addressing

1. No fallback if split runtime fails — user is left without remapping

startKanata() in RuntimeCoordinator+ServiceManagement.swift now hard-fails (returns false, sets lastError) if currentSplitRuntimeDecision() returns anything other than .useSplitRuntime. When the runtime path evaluator returns .blocked or .useLegacySystemBinary, the user has no path forward except manual intervention. The error message should more clearly direct to the installer wizard (e.g., "Open the KeyPath installer to repair the runtime configuration").

2. Race condition in startPersistentPassthruHost — may return PID 0

guard activeHostProcess?.isRunning != true else {
    return activeHostProcess?.processIdentifier ?? 0  // returns 0 if process exited between guard and this line
}

On macOS, PID 0 belongs to kernel_task, so it should never be returned as a valid host PID. This should be ?? -1 or should throw. Callers treat the returned pid_t as the PID of a running process.
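Here is a sketch of a helper that never yields PID 0. It assumes the host is tracked as an optional `Foundation.Process`, as the snippet above suggests; `runningHostPID(_:)` is a hypothetical name. Returning `nil` lets the caller fall through to launching a fresh host rather than returning a bogus PID.

```swift
import Foundation

/// Returns the PID only for a process that is actually running; never 0.
func runningHostPID(_ process: Process?) -> Int32? {
    guard let process = process, process.isRunning else { return nil }
    let pid = process.processIdentifier
    return pid > 0 ? pid : nil
}
```

At the call site, the guard would become `if let pid = runningHostPID(activeHostProcess) { return pid }`.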

3. currentRuntimeStatus() has a silent gap during cutover

func currentRuntimeStatus() async -> RuntimeStatus {
    if KanataSplitRuntimeHostService.shared.isPersistentPassthruHostRunning {
        return .running(pid: ...)
    }
    return .stopped  // doesn't account for legacy daemon still running during cutover window
}

During the legacy-to-split cutover in startKanata() there is a window where the legacy daemon has been stopped but the split runtime has not started yet. UI callers may flash a "stopped" state mid-start. Either document that callers should check isStartingKanata, or add a .starting case.
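This sketch shows the `.starting` option, with the legacy-daemon check from Bug 3 folded in. `RuntimeStatus` here is a hypothetical reshaping, not the PR's enum. The inputs are passed explicitly so the whole cutover window is visible in one place.

```swift
/// Hypothetical reshaping of the status enum with an explicit `.starting` case.
enum RuntimeStatus: Equatable {
    case running(pid: Int32)
    case starting
    case legacyDaemonRunning
    case stopped

    /// Whether callers such as resetToDefaultConfig() should treat a runtime as up.
    var isRunning: Bool {
        switch self {
        case .running, .legacyDaemonRunning: return true
        case .starting, .stopped: return false
        }
    }
}

func deriveRuntimeStatus(
    splitHostPID: Int32?,
    isStartingKanata: Bool,
    legacyDaemonRunning: Bool
) -> RuntimeStatus {
    if let pid = splitHostPID { return .running(pid: pid) }
    if isStartingKanata { return .starting }   // cutover window: neither running nor stopped
    if legacyDaemonRunning { return .legacyDaemonRunning }
    return .stopped
}
```

With this shape, a caller such as resetToDefaultConfig() could log a warning on `.starting` rather than skipping the reload silently.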

4. isOneShotProbeMode logic is duplicated in three places

RuntimeCoordinator.swift, App.swift, and AppDelegate all independently check the same four env vars. A fifth probe mode would require updating all three sites. Extract to a single static method on AppDelegate where the constants already live.
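Below is a minimal sketch of a single source of truth. It assumes each probe mode is enabled by setting its variable to "1". The four variable names are placeholders because the review does not list the real ones. `ProbeModeEnvironment` is a hypothetical home; the same code could be a static method on AppDelegate.

```swift
import Foundation

/// Hypothetical single source of truth for one-shot probe detection.
enum ProbeModeEnvironment {
    /// Placeholder names: the four real variables are not listed in the review.
    static let oneShotProbeKeys = [
        "KEYPATH_EXAMPLE_PROBE_A",
        "KEYPATH_EXAMPLE_PROBE_B",
        "KEYPATH_EXAMPLE_PROBE_C",
        "KEYPATH_EXAMPLE_PROBE_D",
    ]

    /// Assumes a probe mode is on when its variable equals "1".
    static func isOneShotProbeMode(
        environment: [String: String] = ProcessInfo.processInfo.environment
    ) -> Bool {
        oneShotProbeKeys.contains { environment[$0] == "1" }
    }
}
```

Passing the environment in as a parameter also makes the three current call sites easy to cover with one unit test.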

5. detectKanataTCPPortConflict uses locale-sensitive comparison and runs too late

localizedCaseInsensitiveContains is locale-dependent. Use .contains after .lowercased() for binary-name matching.

More importantly: this check only runs after the full 20-second readiness timeout. If an existing Kanata process holds the port, the user waits 20 seconds before seeing the tcpPortInUse error. Consider a quick port-conflict probe before the wait loop begins.
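This sketch covers both suggestions under a few assumptions. `portHolder` is a hypothetical closure that returns the name of a process already listening on the configured port; how it finds that process is out of scope. The probe runs before launch, because the new host will bind the port itself. `String.lowercased()` applies Unicode default case mapping and does not consult the user's locale.

```swift
import Foundation

enum KanataReadiness: Equatable {
    case ready
    case tcpPortInUse(heldByKanata: Bool)
    case timedOut
}

/// Locale-independent binary-name match.
func isKanataBinaryName(_ name: String) -> Bool {
    name.lowercased().contains("kanata")
}

/// Hypothetical flow: fail fast on a port conflict, then launch and poll.
func startKanataAndWait(
    portHolder: () -> String?,
    launch: () throws -> Void,
    isReady: () -> Bool,
    timeout: TimeInterval,
    pollInterval: TimeInterval
) rethrows -> KanataReadiness {
    // Probe before launch: once our own host starts, it holds the port itself.
    if let holder = portHolder() {
        return .tcpPortInUse(heldByKanata: isKanataBinaryName(holder))
    }
    try launch()
    let deadline = Date().addingTimeInterval(timeout)
    repeat {
        if isReady() { return .ready }
        Thread.sleep(forTimeInterval: pollInterval)
    } while Date() < deadline
    return .timedOut
}
```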


Design Observations

6. checkKanataInputCaptureStatus uses fragile log string matching

The new check reads Kanata's stderr log and looks for "iohiddeviceopen error" + "not permitted" + "apple internal keyboard / trackpad". This depends on Kanata log message wording that could change between Kanata versions, returns .ready optimistically when the log file does not exist, and only sees the last 64KB. Per the CLAUDE.md PermissionOracle rule, Apple APIs should take precedence over side-channel detection. If IOHIDCheckAccess or similar could determine this, it would be more robust. At minimum, add a comment explaining why log parsing is necessary here.
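If log parsing stays, this sketch makes its assumptions explicit. A missing log maps to `.unknown` instead of `.ready`, and matching is case-insensitive. `InputCaptureStatus` is hypothetical. Requiring all three markers on the same line is an assumption about how the current check combines them.

```swift
import Foundation

/// Hypothetical status type for the sketch.
enum InputCaptureStatus: Equatable { case ready, blocked, unknown }

/// Side-channel check based on Kanata's stderr wording, which may change
/// between Kanata versions; record here why no Apple API covers this case.
func inputCaptureStatus(fromLogTail tail: String?) -> InputCaptureStatus {
    // No log means "not observed", not "ready".
    guard let tail = tail else { return .unknown }
    let markers = ["iohiddeviceopen error", "not permitted", "apple internal keyboard / trackpad"]
    let blocked = tail.split(separator: "\n").contains { line in
        let lowered = line.lowercased()
        return markers.allSatisfy { lowered.contains($0) }
    }
    return blocked ? .blocked : .ready
}
```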

7. Duplicated environment setup in launchPassthruHost / startPersistentPassthruHost

Both functions build nearly identical environment dictionaries (~30 lines each). The only difference is passthruPersistEnvKey in the persistent variant. Extract a shared private helper buildPassthruEnvironment(session:includeCapture:pollMilliseconds:persist:).
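Here is a sketch of the suggested helper. The session and socket keys are the `KEYPATH_EXPERIMENTAL_OUTPUT_BRIDGE_*` names quoted elsewhere in this thread. The capture, poll, and persist keys are placeholders, with the last one standing in for `passthruPersistEnvKey`. `OutputBridgeSession` is a hypothetical type.

```swift
/// Hypothetical session value; the PR's real session type is not shown in the review.
struct OutputBridgeSession {
    let id: String
    let socketPath: String
}

enum PassthruEnv {
    // Names quoted in the review (KanataRuntimePathCoordinator constants).
    static let sessionKey = "KEYPATH_EXPERIMENTAL_OUTPUT_BRIDGE_SESSION"
    static let socketKey = "KEYPATH_EXPERIMENTAL_OUTPUT_BRIDGE_SOCKET"
    // Placeholder names for illustration only.
    static let captureKey = "KEYPATH_EXAMPLE_PASSTHRU_CAPTURE"
    static let pollKey = "KEYPATH_EXAMPLE_PASSTHRU_POLL_MS"
    static let persistKey = "KEYPATH_EXAMPLE_PASSTHRU_PERSIST"   // stands in for passthruPersistEnvKey
}

/// One builder shared by the one-shot and persistent launch paths.
func buildPassthruEnvironment(
    base: [String: String],
    session: OutputBridgeSession,
    includeCapture: Bool,
    pollMilliseconds: Int,
    persist: Bool
) -> [String: String] {
    var env = base
    env[PassthruEnv.sessionKey] = session.id
    env[PassthruEnv.socketKey] = session.socketPath
    env[PassthruEnv.captureKey] = includeCapture ? "1" : "0"
    env[PassthruEnv.pollKey] = String(pollMilliseconds)
    if persist {
        env[PassthruEnv.persistKey] = "1"   // the only persistent-only difference
    }
    return env
}
```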

8. Companion monitor catch block swallows the thrown error

In checkSplitRuntimeCompanionHealth, the catch block stops the host and synthesizes an exit event, but the thrown error from restartCompanionAndRecoverPersistentHost is never logged. Add an error log call (e.g., AppLogger.shared.error with the failure info) before the cleanup.

9. Rust keypath_kanata_bridge_run_runtime has a TOCTOU on the TCP port

The function probes TCP port availability by binding and immediately dropping a listener, then TcpServer::new binds the same port. Another process could take the port in between. Tolerable for a diagnostic function, but worth a comment documenting the race.


Positive Changes

  • Removing investigation scripts (repro-duplicate-keys.sh, run-duplicate-key-test.sh, manual-keystroke-test.sh) keeps the branch clean
  • API renaming (installAllLaunchDaemonServices to installRequiredRuntimeServices, restartUnhealthyServices to recoverRequiredRuntimeServices) — intent is much clearer
  • kanataReadinessTimeout increased from 8s to 20s with a clear explanatory comment
  • ServiceHealthChecker now iterates multiple launchctl targets — more robust detection
  • inputCaptureReady field in KanataServiceRuntimeSnapshot — health decision now accounts for actual input capture, not just process liveness + TCP responsiveness
  • Consistent #if DEBUG test seams on all new services — matches project patterns
  • KeyPathHelper now correctly depends on KeyPathCore for the new protocol surface
  • Rust unit tests in lib.rs — good FFI bridge coverage for the passthru runtime

Minor

  • stopPersistentPassthruHost() sets activeHostProcess = nil before the process has actually exited; the termination handler's PID guard mitigates a concurrent start, but worth a comment.
  • keypath CLI renamed to keypath-cli in Package.swift — verify no CI steps or external scripts reference the old name.
  • handleSplitRuntimeHostExit user message "Toggle the service again..." is vague. Something like "Use the KeyPath status menu to restart, or open Settings to diagnose" would be more actionable.

@claude

claude bot commented Mar 9, 2026

test

@claude

claude bot commented Mar 9, 2026

Review Part 1 of 3 - Bugs and Architecture findings for PR 225 Finalize Split Runtime Cutover. More details in comments 2 and 3.

@claude

claude bot commented Mar 9, 2026

BUGS AND CORRECTNESS:

(1) waitUntilExit blocks MainActor in launchPassthruHost: KanataSplitRuntimeHostService.launchPassthruHost is MainActor-isolated and calls process.waitUntilExit() at line 155. That synchronous blocking call freezes the actor for up to 8s. Fix with a detached wait.

(2) currentRuntimeStatus does not check the legacy recovery daemon: it returns .stopped even when the legacy daemon is active. As a result, callers like resetToDefaultConfig skip the TCP reload during the daemon transition. Check recoveryDaemonService.isRecoveryDaemonRunning() as a secondary condition.

(3) Variable shadowing in startKanata and restartKanata: the enum case binding shadows the function parameter reason, which produces a Swift warning. Rename the binding to evalReason.
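For item (1), this is a minimal sketch of the detached wait, assuming the host is a `Foundation.Process`. It installs a `terminationHandler` before `run()` and resumes a continuation from that handler, so the MainActor caller suspends instead of blocking. `runAndAwaitExit(_:)` is a hypothetical name. The 8s bound is not shown; one option is a racing `Task.sleep` that calls `terminate()`.

```swift
import Foundation

/// Launches `process` and suspends (rather than blocks) until it exits.
/// The handler is installed before `run()` so an instant exit cannot be missed.
func runAndAwaitExit(_ process: Process) async throws -> Int32 {
    try await withCheckedThrowingContinuation { continuation in
        process.terminationHandler = { finished in
            continuation.resume(returning: finished.terminationStatus)
        }
        do {
            try process.run()
        } catch {
            // The handler never fires for a process that failed to launch.
            process.terminationHandler = nil
            continuation.resume(throwing: error)
        }
    }
}
```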

@claude

claude bot commented Mar 9, 2026

ARCHITECTURE AND PERFORMANCE:

(4) Multiple evaluateCurrentPath() calls in the startup hot path, which CLAUDE.md lists as an anti-pattern. It is called at least 3x: in engine.inspectSystem(), via shouldUseSplitRuntimeHost(), and inside startKanata() via currentSplitRuntimeDecision(). Each call involves XPC to the helper. Cache the result at the top of performInitialization and pass it down.

(5) Env var key constants duplicated: KEYPATH_EXPERIMENTAL_OUTPUT_BRIDGE_SESSION and KEYPATH_EXPERIMENTAL_OUTPUT_BRIDGE_SOCKET are defined as private in KanataSplitRuntimeHostService lines 36-37 AND as public statics in KanataRuntimePathCoordinator lines 6-7. Remove the private copies and reference the coordinator constants.

(6) Duplicate Karabiner conflict diagnostic: getSystemDiagnostics emits a .warning at line 384 AND an .error at line 428 for the same karabinerElementsRunning condition. Merge them into one.

(7) isOneShotProbeMode logic is duplicated in RuntimeCoordinator line 159 and KeyPathApp line 27, with the same 4 env-var checks. Both should share a single source of truth.
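For item (4), here is a sketch that evaluates the path once and passes the decision down. The closures stand in for the three call sites listed in that item, and `SplitRuntimeDecision` is a simplified, hypothetical version of the evaluator's result. The same shape also removes the second evaluation when restartKanata calls startKanata.

```swift
/// Hypothetical stand-in for the evaluator's result.
enum SplitRuntimeDecision: Equatable {
    case useSplitRuntime
    case blocked(reason: String)
}

/// Evaluate once at the top of initialization and hand the result to every
/// consumer, instead of letting each one re-run the XPC-backed evaluation.
func performInitialization(
    evaluateCurrentPath: () -> SplitRuntimeDecision,
    inspectSystem: (SplitRuntimeDecision) -> Void,
    startKanata: (SplitRuntimeDecision) -> Bool
) -> Bool {
    let decision = evaluateCurrentPath()   // the only evaluation on this path
    inspectSystem(decision)
    guard decision == .useSplitRuntime else { return false }   // hard-fail, no legacy fallback
    return startKanata(decision)
}
```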

@claude

claude bot commented Mar 9, 2026

RUST CODE:

(8) aarch64-only build in build-kanata-host-bridge.sh: the script hardcodes --target aarch64-apple-darwin. This is intentional given the macOS 15 minimum, but it should be documented in the script header.

(9) Unnecessary .clone() at lib.rs:139: tcp_server_address: tcp_server_address.clone() may be movable into ValidatedArgs instead.

(10) Test bypasses the public API: lib.rs:726 reads output_rx directly through an unsafe pointer cast instead of calling keypath_kanata_bridge_passthru_try_recv_output. This breaks if the fields are reordered.

TESTING GAPS:

(11) No unit tests for the new Swift services: KanataSplitRuntimeHostService, KanataOutputBridgeCompanionManager, and KanataOutputBridgeSmokeService have zero Swift test coverage. The test seams testPersistentHostPID and testStartPersistentError are wired up but not exercised by any test.

(12) analyzeLogFile loads the entire log file into memory before tailing: String(contentsOfFile:) followed by .suffix(100) is wasteful for large logs. Use a file-handle seek-to-end instead.
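For item (12), this sketch reads only the end of the log with `FileHandle`. It assumes a byte budget can replace loading the whole file and then taking `.suffix(100)`; the caller can still split the result into lines and keep the last 100. `readLogTail(atPath:maxBytes:)` is a hypothetical name. The 64 KB default matches the window mentioned for checkKanataInputCaptureStatus.

```swift
import Foundation

/// Reads at most `maxBytes` from the end of the file without loading the
/// whole log into memory. Returns nil if the file cannot be opened.
func readLogTail(atPath path: String, maxBytes: UInt64 = 64 * 1024) -> String? {
    guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
    defer { handle.closeFile() }
    let size = handle.seekToEndOfFile()
    handle.seek(toFileOffset: size > maxBytes ? size - maxBytes : 0)
    let data = handle.readDataToEndOfFile()
    // The cut may land mid-character; lossy decoding keeps the rest usable.
    return String(decoding: data, as: UTF8.self)
}
```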

@claude

claude bot commented Mar 9, 2026

WHAT IS GOOD: Removing 5000+ lines of legacy code (ProcessManager, KanataService, HelperBackedPrivilegedOperations) meaningfully reduces the surface area. The test seams in KanataSplitRuntimeHostService follow the established project pattern. The Rust C-ABI bridge is cleanly structured, with consistent null-pointer guards and a clear error-buffer convention. Hard-failing in startKanata when the split runtime is not viable (no silent legacy fallback) is exactly right for committing to the new architecture. stopPersistentPassthruHost correctly sets expectedPersistentHostTermination = true before terminate(), which prevents false unexpected-exit alerts. The companion health monitor loop, with its isRecoveringSplitRuntimeCompanion re-entrancy guard, prevents a recovery storm if the companion stays unhealthy.

SUMMARY: Items 1 and 3 are correctness issues to address before merging. Items 4 and 5 are architectural concerns that CLAUDE.md explicitly identifies as anti-patterns. Item 11 (test coverage gap) is the most significant long-term risk.

Generated with Claude Code.

@claude

claude bot commented Mar 9, 2026

Code Review — Finalize Split Runtime Cutover

Overall this is a well-structured cutover. The removal of KanataService.swift, ProcessManager.swift, and ProcessCoordinator.swift is satisfying — dead code gone, single runtime path established. The XPC infrastructure is notably more robust than what it replaced. A few things worth discussing:


🟡 Medium — Potential double-resume in executeValueXPCCall

In HelperManager+RequestHandlers.swift, executeValueXPCCall fetches the proxy inside a nested Task, while the timeout Task runs concurrently. If getRemoteProxy throws, the errorHandler closure fires and calls completionState.tryComplete() + continuation.resume(throwing:). But the outer catch block in the same nested Task also calls completionState.tryComplete() + continuation.resume(throwing:). Both could theoretically fire for the same error, which would cause a CheckedContinuation double-resume crash.

// executeValueXPCCall — nested Task
Task {
    do {
        let proxy = try await self.getRemoteProxy { error in
            guard completionState.tryComplete() else { return }
            continuation.resume(throwing: error)   // path A
        }
        call(proxy) { result in ... }
    } catch {
        guard completionState.tryComplete() else { return }
        continuation.resume(throwing: error)       // path B — can fire alongside A
    }
}

executeXPCCall doesn't have this problem because it fetches the proxy before entering the continuation. Consider aligning executeValueXPCCall to the same structure, or verifying the errorHandler and catch paths are mutually exclusive (they likely are if getRemoteProxy throws synchronously rather than calling the errorHandler — but this is worth confirming explicitly).
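Here is a sketch of the first option, using generic stand-ins for the PR's types. The proxy is fetched before the continuation exists, so a proxy failure is an ordinary thrown error, and the XPC reply path is the only place that resumes. `CompletionLatch` plays the role of HelperXPCCallCompletionState. The timeout task and the proxy's errorHandler are omitted; both would need to share the same latch.

```swift
import Foundation

/// One-shot latch, the same idea as HelperXPCCallCompletionState in the PR.
final class CompletionLatch: @unchecked Sendable {
    private let lock = NSLock()
    private var completed = false

    func tryComplete() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if completed { return false }
        completed = true
        return true
    }
}

/// Hypothetical shape: obtain the proxy first, then enter the continuation.
func executeValueCall<Proxy, Value>(
    getProxy: () async throws -> Proxy,
    call: (Proxy, @escaping (Result<Value, Error>) -> Void) -> Void
) async throws -> Value {
    let proxy = try await getProxy()   // failure here throws normally; nothing to resume yet
    let latch = CompletionLatch()
    return try await withCheckedThrowingContinuation { continuation in
        call(proxy) { result in
            guard latch.tryComplete() else { return }   // drop late duplicates
            continuation.resume(with: result)
        }
    }
}
```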


🟡 Medium — activeXPCCalls Set may have data races

HelperManager uses activeXPCCalls.insert(name) / activeXPCCalls.remove(name) to detect concurrent calls. If HelperManager is not an actor or otherwise serialized, this Set mutation across concurrent executeXPCCall / executeValueXPCCall invocations is a data race. The warning logged is correct in spirit, but the detection mechanism itself is unsafe if not protected. Either wrap HelperManager in an actor (or @MainActor) or protect this field with a lock.
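This is a sketch of the lock option, assuming HelperManager stays a non-isolated class. `ActiveXPCCallRegistry` is a hypothetical wrapper around the existing `Set<String>`.

```swift
import Foundation

/// Lock-protected replacement for a bare `Set<String>` that concurrent XPC calls mutate.
final class ActiveXPCCallRegistry: @unchecked Sendable {
    private let lock = NSLock()
    private var active: Set<String> = []

    /// Records `name`; returns false if a call with that name was already in flight.
    func begin(_ name: String) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        return active.insert(name).inserted
    }

    func end(_ name: String) {
        lock.lock()
        defer { lock.unlock() }
        active.remove(name)
    }
}
```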


🟠 Low — shouldSoftenOutputBridgeStatusFailure silently fakes bridge status

if Self.shouldSoftenOutputBridgeStatusFailure {
    reply(.success(KanataOutputBridgeStatus(available: false, ...)))
}

During diagnostic env modes (KEYPATH_ENABLE_HOST_PASSTHRU_DIAGNOSTIC, etc.) a decode failure returns a fake "unavailable" status instead of propagating the error. This could mask real bugs where the helper is returning malformed JSON. At minimum, log the decode error payload at warn level before softening. The trimmed payload is already captured in trimmedPayload — log it.


🟡 Low — startKanata and restartKanata evaluate split-runtime decision redundantly

restartKanata evaluates currentSplitRuntimeDecision() once, then calls startKanata, which evaluates it again. Since KanataRuntimePathCoordinator.evaluateCurrentPath() is cheap (unlike SMAppService.status), this is fine for correctness, but it could lead to surprising behavior if the decision changes between the two calls during a restart. Consider passing the decision down as a parameter, or documenting that the second evaluation is authoritative.


🟡 Low — No fallback when split-runtime fails to start

Per the PR intent, the legacy recovery daemon is no longer a normal startup path. But if KanataSplitRuntimeHostService.shared.startPersistentPassthruHost throws, the user sees:

"Split runtime host failed to start: … Legacy fallback is reserved for recovery paths."

This is the right call architecturally, but it means a user whose split runtime is broken has no self-healing path from normal startup. Make sure the wizard/repair flow handles this state (i.e., InstallerEngine.inspectSystem() surfaces a split-runtime failure as a fixable issue). If it does already, a brief comment noting "wizard handles this" would help future readers.


🔵 Observation — build-kanata-host-bridge.sh hardcodes aarch64-apple-darwin

The comment explains this is intentional ("KeyPath currently packages Apple Silicon-only local builds"). That's fine, but if CI ever runs on x86_64 runners the build will fail silently or produce wrong artifacts. Worth adding an explicit check:

ARCH=$(uname -m)
if [ "$ARCH" != "arm64" ]; then
    echo "⚠️  Warning: Building for aarch64-apple-darwin on $ARCH host" >&2
fi

🔵 Observation — KanataOutputBridgeCompanionManager is @MainActor unnecessarily?

The class delegates everything to helperManager methods. If HelperManager is not @MainActor, there's no reason for KanataOutputBridgeCompanionManager to be either. The @MainActor annotation means callers must hop to the main thread to call even lightweight prepareEnvironment. If this is intentional (e.g., it's always called from UI code), it's fine — but worth a comment.


✅ Things done well

  • The HelperXPCCallCompletionState class with a lock for thread-safe one-shot completion is solid and correctly prevents double-resume in the timeout vs. XPC-callback race.
  • XPC error code handling (4097 / errSecCSReqFailed) with specific advice to the user is excellent UX for the signature mismatch edge case.
  • KanataOutputBridgeSmokeService with injectable operation closures is very testable.
  • The legacy-daemon cutover logic in startKanata (stop legacy → start split) has good logging and clean error reporting at each step.
  • Deletion of ProcessManager.swift, ProcessCoordinator.swift, HelperBackedPrivilegedOperations.swift etc. removes a lot of confusing dual-path complexity.

Summary: The double-resume risk in executeValueXPCCall and the activeXPCCalls thread safety are the items worth fixing before merge. Everything else is low-risk or observational.

@claude

claude bot commented Mar 9, 2026

Code Review — PR #225: Finalize split runtime cutover

This is a substantial, well-structured PR. The split-runtime architecture cleanly separates input capture (in-process via the bundled Kanata host) from output injection (via the privileged helper's output bridge companion). The existing patterns and anti-patterns from CLAUDE.md are followed throughout. Below are my findings, from most to least significant.


🐛 Bugs / Correctness Issues

1. stopPersistentPassthruHost() → termination handler cleanup block never runs

In KanataSplitRuntimeHostService.swift, stopPersistentPassthruHost() sets activeHostProcess = nil before the process actually exits (line 379). The termination handler installed at line 301 then checks:

if self.activeHostProcess?.processIdentifier == process.processIdentifier {
    self.activeHostProcess = nil
    self.expectedPersistentHostTermination = false  // ← Never reached
}

Because activeHostProcess is already nil when the handler fires, the comparison between nil and process.processIdentifier evaluates to false, and the cleanup block is skipped. In practice this is benign, since startPersistentPassthruHost resets expectedPersistentHostTermination = false at line 275 before the next launch. Still, the termination handler's cleanup is silently dead code. If the startup order ever changes, a stale expectedPersistentHostTermination = true flag would make the next unexpected exit look expected and suppress a legitimate error.

Suggestion: Either set activeHostProcess = nil inside the termination handler only (not in stopPersistentPassthruHost), or explicitly reset expectedPersistentHostTermination = false at the end of stopPersistentPassthruHost instead of relying on the handler.
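Below is a process-free sketch of the first suggestion, in which only the termination handler clears the active PID and the expected-exit flag. `PersistentHostBookkeeping` and its methods are hypothetical, and the real `stop()` would also send `terminate()` to the process.

```swift
/// Hypothetical, process-free model of the stop / termination bookkeeping.
final class PersistentHostBookkeeping {
    private(set) var activePID: Int32?
    private(set) var expectedTermination = false
    private(set) var unexpectedExits: [Int32] = []

    func didLaunch(pid: Int32) {
        activePID = pid
        expectedTermination = false
    }

    /// Marks the exit as expected but leaves `activePID` for the handler to clear.
    func stop() {
        guard activePID != nil else { return }
        expectedTermination = true
        // The real service would call process.terminate() here.
    }

    /// Body of the termination handler.
    func processDidTerminate(pid: Int32) {
        guard activePID == pid else { return }   // stale handler from an older host
        if !expectedTermination {
            unexpectedExits.append(pid)          // would surface an error to the user
        }
        activePID = nil
        expectedTermination = false              // always reset, so it cannot go stale
    }
}
```

Because `activePID` stays set until the handler runs, a start request made while a stop is in flight would need to wait for the old process to exit first.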


2. restartPersistentPassthruHostAfterCompanionRestart — no guarantee old process has exited

stopPersistentPassthruHost()
try await Task.sleep(for: .milliseconds(250))
return try await startPersistentPassthruHost(...)

stopPersistentPassthruHost() sends SIGTERM and immediately returns. If the old Kanata host process takes >250ms to exit (possible if it's in the middle of VHID teardown), the new host process may start while the old one still holds the keyboard device. This would likely cause an immediate IOHIDDeviceOpen error in the new host.

Suggestion: Use a ProcessExitLatch-style wait, or at minimum wait with a bounded retry loop checking activeHostProcess?.isRunning == false.
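This sketch implements the bounded loop. It assumes `isRunning` wraps `activeHostProcess?.isRunning == true`. `waitForHostExit` and its 3s default are hypothetical, and elapsed time is approximated by summing the sleep intervals.

```swift
/// Polls `isRunning` until it reports false or the budget runs out.
/// Returns true if the old host exited in time.
func waitForHostExit(
    isRunning: () -> Bool,
    timeoutNanoseconds: UInt64 = 3_000_000_000,
    pollNanoseconds: UInt64 = 50_000_000
) async -> Bool {
    var waited: UInt64 = 0
    while isRunning() {
        if waited >= timeoutNanoseconds { return false }
        try? await Task.sleep(nanoseconds: pollNanoseconds)
        waited += pollNanoseconds
    }
    return true
}
```

If this returns false, the restart path could fail loudly instead of launching a second host that races for the keyboard device.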


⚠️ Code Quality

3. Fragile string literal in init cutover logic

RuntimeCoordinator+Lifecycle.swift line 128:

if activeRuntimeTitle == "Split Runtime Host" {

This is a magic string that will silently fail if the title ever changes. Since SystemContextAdapter.swift is also modified in this PR, the title could drift.

Suggestion: Extract to a constant (e.g., KanataRuntimePathCoordinator.splitRuntimeHostTitle) shared with SystemContextAdapter.


4. Duplicate Karabiner conflict diagnostics in DiagnosticsService

getSystemDiagnostics() generates two diagnostics for the same karabinerElementsRunning condition:

  • Lines 384–397: .warning — "Karabiner-Elements Conflict"
  • Lines 429–442: .error — "Karabiner Grabber Conflict"

Both fire whenever karabiner_grabber is running. The user sees a warning and an error for the same underlying fact, which is confusing and makes the diagnostic list feel noisy.

Suggestion: Keep only the .error version (it's more specific and actionable) or consolidate into a single diagnostic.


5. Debug/diagnostic ActionDispatcher system actions belong behind #if DEBUG

handleSystem() now contains 10+ diagnostic/exercise actions (prepare-host-passthru-bridge, run-host-passthru-diagnostic, start-host-passthru, exercise-host-passthru-cycle, exercise-output-bridge-companion-restart, exercise-coordinator-split-runtime-recovery, etc.) that:

  • Write diagnostic reports to /var/tmp/keypath-host-passthru-*.txt
  • Can launch/stop the split runtime host on demand
  • Trigger soak tests and companion restarts

These are reachable in production via any Kanata push-msg keypath:// URI. If a malicious script or accidental config ever sends keypath://system/stop-host-passthru, it stops the keyboard remapping service. The reports are also written to /var/tmp, a world-writable directory, where files created under the default umask are world-readable.

Suggestion: Wrap these cases in #if DEBUG or gate them behind TestEnvironment.isRunningTests || ProcessInfo.processInfo.environment["KEYPATH_ENABLE_DIAGNOSTIC_ACTIONS"] == "1".
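Here is a sketch of the environment-gated option. The action names come from this finding, and `KEYPATH_ENABLE_DIAGNOSTIC_ACTIONS` is the variable the suggestion proposes. `isSystemActionAllowed` is hypothetical, and `isDebugBuild` would be set from `#if DEBUG` at the call site.

```swift
import Foundation

enum DiagnosticSystemActions {
    /// Actions named in the review; handleSystem() may contain more.
    static let names: Set<String> = [
        "prepare-host-passthru-bridge",
        "run-host-passthru-diagnostic",
        "start-host-passthru",
        "stop-host-passthru",
        "exercise-host-passthru-cycle",
        "exercise-output-bridge-companion-restart",
        "exercise-coordinator-split-runtime-recovery",
    ]
}

/// Returns true if a keypath://system/<action> request may run.
func isSystemActionAllowed(
    _ action: String,
    isDebugBuild: Bool,
    isRunningTests: Bool,
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    // Ordinary (non-diagnostic) actions are unaffected by the gate.
    guard DiagnosticSystemActions.names.contains(action) else { return true }
    return isDebugBuild || isRunningTests || environment["KEYPATH_ENABLE_DIAGNOSTIC_ACTIONS"] == "1"
}
```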


6. 45-second XPC timeout for activateKanataOutputBridgeSession needs user feedback

HelperManager+RequestHandlers.swift line 317:

try await executeXPCCall("activateKanataOutputBridgeSession", timeout: 45.0) { ... }

Callers in the startup path (startPersistentPassthruHost) can block the main actor for up to 45s with no user-visible progress indicator. The comment explains why (launchd kickstart on slow/CI machines), but on a hung helper this silently freezes the UI.

Suggestion: Consider showing a spinner/progress state in KanataViewModel when isStartingKanata = true, and confirm whether this 45s timeout is observed in practice on fast machines.


🔒 Minor Security Note

7. Session info persisted to a world-readable path

KanataOutputBridgeCompanionManager.bridgeEnvironmentOutputPath = "/var/tmp/keypath-host-passthru-bridge-env.txt" exposes the active session ID and Unix socket path to any local process. This is low risk (the socket itself is root-owned), but worth noting for the threat model.
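If the file has to stay, this sketch creates it readable only by its owner, assuming the writer runs as the user who consumes it. Calling open(2) with `O_CREAT | O_EXCL` and mode 0o600 sets the permissions at creation time and refuses to reuse an existing file or symlink, which matters in a shared directory such as /var/tmp. `writePrivateFile(atPath:contents:)` and `PrivateFileError` are hypothetical names.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

struct PrivateFileError: Error {
    let code: Int32   // errno from open(2)
}

/// Creates `path` with mode 0o600 in a single open(2) call, then writes `contents`.
/// Fails if anything (file or symlink) already exists at `path`.
func writePrivateFile(atPath path: String, contents: String) throws {
    let fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0o600)
    guard fd >= 0 else { throw PrivateFileError(code: errno) }
    let handle = FileHandle(fileDescriptor: fd, closeOnDealloc: true)
    handle.write(Data(contents.utf8))
}
```

Because of `O_EXCL`, a stale file left by a previous run must be removed first. Alternatively, the file can live in a per-user directory.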


✅ What's Well Done

  • ProcessExitLatch actor — correctly uses actor isolation to avoid the "check-then-act" race; the synchronous nature of the withCheckedContinuation closure means the assignment is safe.
  • KanataEventListener reconnection — the three bug fixes (cleanup on exit, post-handshake state handler, propagating poll send errors) are solid.
  • KanataRuntimePathCoordinator.evaluateCurrentPath() — pulling in outputBridgeStatus and helperReady as inputs to the evaluator is a clean pull-based pattern consistent with InstallerEngine.inspectSystem().
  • Test seams preserved: #if DEBUG nonisolated(unsafe) static var testPersistentHostPID and testStartPersistentError follow the established project pattern (matching VHIDDeviceManager.testPIDProvider).
  • Removal of ProcessCoordinator.swift and ProcessManager.swift — ~380 lines of now-dead process management code cleanly deleted.
  • Rust FFI lib.rs — null checks on all pointer parameters, error written to caller's buffer, clean Box::into_raw ownership transfer. Looks correct assuming a matching keypath_kanata_bridge_free_runtime exists in the consume path.

Summary

| Priority | Issue |
|---|---|
| 🐛 Medium | stopPersistentPassthruHost termination handler cleanup is dead code; expectedPersistentHostTermination not reliably reset |
| 🐛 Medium | 250ms sleep in companion restart doesn't wait for actual process exit |
| ⚠️ Medium | Debug system actions (stop-host-passthru, etc.) reachable in production via keypath:// URI |
| ⚠️ Low | Magic string "Split Runtime Host" in init cutover logic |
| ⚠️ Low | Duplicate Karabiner conflict diagnostics (warning + error for same condition) |
| ℹ️ Info | 45s XPC timeout has no user-visible feedback on main actor |
| ℹ️ Info | /var/tmp session env file is world-readable |

Overall the architecture is solid and the cutover approach is sound. The two correctness bugs are worth fixing before merging; the rest are polish items.

@claude

claude bot commented Mar 9, 2026

PR Review: Finalize Split Runtime Cutover (Follow-up)

Existing reviews have covered the main issues well. This review focuses on a few additional findings.


Bug: NotificationCenter observers registered without storing tokens

RuntimeCoordinator.init registers two block-based observers (lines 435 and 447) but discards the returned NSObjectProtocol tokens:

// Token discarded — cannot be removed later
NotificationCenter.default.addObserver(forName: .configAffectingPreferenceChanged, ...) { ... }
NotificationCenter.default.addObserver(forName: .splitRuntimeHostExited, ...) { ... }

The deinit at line 474 cancels splitRuntimeCompanionMonitorTask but never calls removeObserver. With the block-based addObserver API, the token must be stored and passed to removeObserver(_:) to unregister — the selector-based form does not apply here.

RuntimeCoordinator is not a singleton. It is also instantiated ad-hoc in WizardKanataServicePage.swift:561, WizardAccessibilityPage.swift:525, WizardInputMonitoringPage.swift:605, and DiagnosticSummarySection.swift:85. Each instantiation permanently registers a new .splitRuntimeHostExited handler. While [weak self] prevents crashes from deallocated instances, any live instance will call AppContextService.shared.stop() in response to the notification — so if two live instances exist simultaneously, stop() fires twice.

Fix: store the tokens and remove them in deinit:

private var notificationTokens: [NSObjectProtocol] = []

// In init:
notificationTokens.append(NotificationCenter.default.addObserver(...) { ... })

// In deinit:
notificationTokens.forEach { NotificationCenter.default.removeObserver($0) }
splitRuntimeCompanionMonitorTask?.cancel()
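A minimal, self-contained sketch of the token pattern, using hypothetical stand-ins: ObservingCoordinator plays RuntimeCoordinator, StopRecorder counts the AppContextService.shared.stop() side effect, and .demoHostExited stands in for .splitRuntimeHostExited. It uses a private NotificationCenter so the demo does not touch the default center. The handler captures the recorder rather than self, which makes it easy to see that removal (not [weak self]) is what stops it firing.

```swift
import Foundation

// Hypothetical stand-in for .splitRuntimeHostExited.
extension Notification.Name {
    static let demoHostExited = Notification.Name("demoHostExited")
}

/// Counts how often the shared side effect fires
/// (stands in for AppContextService.shared.stop()).
final class StopRecorder: @unchecked Sendable {
    var stopCalls = 0
}

/// Stand-in for RuntimeCoordinator: keeps the tokens returned by the
/// block-based addObserver API and removes them in deinit.
final class ObservingCoordinator {
    private let center: NotificationCenter
    private var notificationTokens: [NSObjectProtocol] = []

    init(center: NotificationCenter, recorder: StopRecorder) {
        self.center = center
        // The returned token is the only handle removeObserver(_:)
        // accepts for a block-based registration.
        let token = center.addObserver(forName: .demoHostExited, object: nil, queue: nil) { _ in
            recorder.stopCalls += 1
        }
        notificationTokens.append(token)
    }

    deinit {
        notificationTokens.forEach { center.removeObserver($0) }
    }
}

let center = NotificationCenter()
let recorder = StopRecorder()
var coordinator: ObservingCoordinator? = ObservingCoordinator(center: center, recorder: recorder)
withExtendedLifetime(coordinator) {
    center.post(name: .demoHostExited, object: nil)  // observer registered: handler runs
}
coordinator = nil                                    // deinit removes the observer
center.post(name: .demoHostExited, object: nil)      // nothing left to run
print(recorder.stopCalls)  // 1
```

With queue: nil the block runs synchronously on the posting thread, so the count is deterministic here.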

Minor: restartKanata evaluates decision before it is needed

restartKanata (line 155) calls currentSplitRuntimeDecision() up front, before checking isPersistentPassthruHostRunning. If the host is already running (the common restart path), splitDecision is never used — stopKanata/startKanata are called directly, and startKanata re-evaluates the decision independently at line 67. Since currentSplitRuntimeDecision() involves an XPC round-trip + Rust bridge probe, this is a non-trivial upfront cost wasted on the common path. Deferring the evaluation until after the running-check would eliminate it and also address the redundant-evaluation concern raised in prior reviews.


Confirming priority items from existing reviews

The prior reviews correctly identified the highest-priority issues. In my view the must-fix items before merge are:

  1. stopPersistentPassthruHost termination handler cleanup is dead code — expectedPersistentHostTermination can go stale and suppress legitimate error alerts
  2. 250ms sleep in companion restart does not wait for actual process exit — new host may race for the keyboard device
  3. Debug system actions (stop-host-passthru, exercise-output-bridge-companion-restart, etc.) reachable in production via keypath://system/... URI
  4. NotificationCenter observer tokens not stored (this review)

The Rust FFI null-checks, HelperXPCCallCompletionState lock-based one-shot completion, test seams via testPersistentHostPID/testStartPersistentError, and the deletion of ProcessManager, ProcessCoordinator, and HelperBackedPrivilegedOperations are all well done. The split-runtime-first architecture is clean and the cutover logic is correct.


claude bot commented Mar 10, 2026

test


claude bot commented Mar 10, 2026

Final Consolidated Review — PR 225: Finalize Split Runtime Cutover

Prior reviews in this thread have been thorough. This comment consolidates the key findings into a single clear merge-readiness verdict.

Must-Fix Before Merge

1. stopPersistentPassthruHost termination handler is dead code

The handler guards on activeHostProcess?.processIdentifier == process.processIdentifier, but stopPersistentPassthruHost already sets activeHostProcess = nil before the process exits. The left-hand side is therefore nil, the comparison nil == pid evaluates to false, and the cleanup body never runs. As a result, expectedPersistentHostTermination can remain true after a stop, causing the next unexpected exit to be silently treated as expected and suppressed.

Fix: reset expectedPersistentHostTermination = false explicitly at the end of stopPersistentPassthruHost, or move the nil-assignment inside the termination handler only.
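A sketch of the second option (cleanup moved into the termination handler), under stated assumptions: HostTracker and its methods are hypothetical, and calls are assumed to be serialized (for example on the main actor). The handler receives the PID captured at launch, so it no longer depends on activeHostProcess still being set, and it always resets the expected-termination flag.

```swift
/// Hypothetical model of the host bookkeeping in KanataSplitRuntimeHostService.
final class HostTracker {
    private(set) var activePID: Int32?
    private(set) var expectingTermination = false
    private(set) var unexpectedExits = 0

    func didLaunch(pid: Int32) {
        activePID = pid
        expectingTermination = false
    }

    /// Marks the exit as expected but leaves activePID for the handler to clear.
    func stop() {
        expectingTermination = true
        // process.terminate() would be sent here.
    }

    /// Called from the termination handler with the PID captured at launch.
    func handleTermination(pid: Int32) {
        let wasExpected = expectingTermination
        // Always reset, so a stale `true` can never hide the next crash.
        expectingTermination = false
        if activePID == pid { activePID = nil }
        if !wasExpected { unexpectedExits += 1 }  // would surface an error alert
    }
}

let tracker = HostTracker()
tracker.didLaunch(pid: 100)
tracker.stop()
tracker.handleTermination(pid: 100)   // expected exit: no alert
tracker.didLaunch(pid: 200)
tracker.handleTermination(pid: 200)   // crash: counted as unexpected
print(tracker.unexpectedExits, tracker.activePID == nil)  // 1 true
```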

2. 250ms sleep in companion restart does not guarantee old process has exited

restartPersistentPassthruHostAfterCompanionRestart sends SIGTERM via stopPersistentPassthruHost() (which returns immediately), sleeps 250ms, and then launches the new host. On a loaded system or during VHID teardown the old process may still be alive, causing the new host to fail IOHIDDeviceOpen immediately.

Fix: wait for activeHostProcess?.isRunning == false with a bounded retry, or use a ProcessExitLatch-style actor as already used elsewhere in the codebase.
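A minimal sketch of the bounded wait, assuming the caller can observe whether the old process is still running (for example through Process.isRunning). The waitForExit helper and its default poll interval are illustrative; this is not KeyPath's existing ProcessExitLatch.

```swift
import Foundation

/// Polls `isRunning` until it returns false or `timeout` seconds elapse.
/// Returns true if the process was observed to exit in time.
func waitForExit(
    timeout: TimeInterval,
    pollInterval: TimeInterval = 0.05,
    isRunning: () -> Bool
) async -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while isRunning() {
        if Date() >= deadline { return false }
        // Suspends rather than blocking the thread between checks.
        try? await Task.sleep(nanoseconds: UInt64(pollInterval * 1_000_000_000))
    }
    return true
}
```

In the restart path this would take the place of the fixed 250ms sleep: launch the new host only if waitForExit returns true, and report an error instead of racing for the device if it returns false.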

3. Debug system actions reachable in production via keypath:// URI

handleSystem() in App.swift unconditionally exposes stop-host-passthru, exercise-output-bridge-companion-restart, run-host-passthru-diagnostic, and about seven more diagnostic actions. These write output to /var/tmp/ and can stop keyboard remapping in response to any keypath://system/... deep link, including one embedded in a malicious config.

Fix: wrap these cases in #if DEBUG or gate on ProcessInfo.processInfo.environment["KEYPATH_ENABLE_DIAGNOSTIC_ACTIONS"] == "1".
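A sketch combining both suggested gates. The set lists only the three action names quoted above (the remaining diagnostic actions would be added to it), and shouldHandleSystemAction is a hypothetical helper that handleSystem() could consult before dispatching. The build flag is passed as a parameter so the release behavior can be exercised in a test.

```swift
import Foundation

// Only the action names quoted in this review; incomplete by design.
let diagnosticSystemActions: Set<String> = [
    "stop-host-passthru",
    "exercise-output-bridge-companion-restart",
    "run-host-passthru-diagnostic",
]

/// True when compiled with the DEBUG condition (typical for Debug configurations).
let buildHasDebugFlag: Bool = {
    #if DEBUG
    return true
    #else
    return false
    #endif
}()

/// Decides whether a keypath://system/<action> deep link may be handled.
func shouldHandleSystemAction(
    _ action: String,
    isDebugBuild: Bool = buildHasDebugFlag,
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    // Non-diagnostic actions are unaffected by the gate.
    guard diagnosticSystemActions.contains(action) else { return true }
    return isDebugBuild || environment["KEYPATH_ENABLE_DIAGNOSTIC_ACTIONS"] == "1"
}
```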

4. NotificationCenter block-observer tokens discarded in RuntimeCoordinator.init

Two addObserver(forName:...) { [weak self] in ... } calls discard the returned NSObjectProtocol tokens. deinit never calls removeObserver. Because RuntimeCoordinator is instantiated in multiple wizard pages, each live instance permanently registers a new .splitRuntimeHostExited handler. Two live instances on screen simultaneously will both fire AppContextService.shared.stop().

Fix: store the tokens and remove them in deinit.

Recommended to Address

  • Redundant currentSplitRuntimeDecision() on the common restart path: evaluated in restartKanata before the running check, then again in startKanata
  • Magic string "Split Runtime Host" in RuntimeCoordinator+Lifecycle.swift line 128: extract to a constant shared with SystemContextAdapter
  • Duplicate Karabiner conflict diagnostics: getSystemDiagnostics emits both a warning and an error for the same karabinerElementsRunning condition
  • Build script should fail fast on non-arm64 hosts: add a uname -m guard at the top of build-kanata-host-bridge.sh

What Is Well Done

  • Removal of ProcessManager (257 lines), ProcessCoordinator (124 lines), HelperBackedPrivilegedOperations, and KanataService (571 lines): substantial dead complexity removed cleanly
  • HelperXPCCallCompletionState lock-based one-shot guard correctly prevents double-resume under the timeout vs. XPC callback race
  • Rust FFI null-checks on all pointer parameters with a consistent error-buffer convention
  • KanataOutputBridgeSmokeService with injectable operation closures --- highly testable design
  • Hard-failing in startKanata when split runtime is not viable --- no silent legacy fallback is architecturally correct
  • Test seams testPersistentHostPID and testStartPersistentError follow the established KeyPathTestCase pattern
  • kanataReadinessTimeout increase from 8s to 20s with a clear explanatory comment

Summary

The four must-fix items are correctness and security issues that could surface during normal usage. The architecture is sound and the split-runtime-first cutover is a clean, intentional commitment to the new path. Fix those four and this is ready to merge.

Generated with Claude Code


claude bot commented Mar 10, 2026

PR Review: Finalize Split Runtime Cutover

Overall this is a well-structured architectural change. The split runtime approach is cleaner than the legacy LaunchDaemon path, and removing ProcessManager, ProcessCoordinator, HelperBackedPrivilegedOperations, etc. meaningfully reduces the maintenance surface. Below are findings organized by severity.


🔴 Critical / Bugs

1. Double refreshManagementState() in getServiceStatus() hot path (CLAUDE.md anti-pattern)

ServiceHealthChecker.getServiceStatus() calls KanataDaemonManager.shared.refreshManagementState() at the top (call 1), then delegates to checkKanataServiceRuntimeSnapshot() which calls refreshManagementState() again internally (call 2):

// ServiceHealthChecker.swift ~line 305
let kanataState = await KanataDaemonManager.shared.refreshManagementState()  // IPC call 1
...
if kanataState.isSMAppServiceManaged {
    let runtimeSnapshot = await checkKanataServiceRuntimeSnapshot()  // triggers call 2!

checkKanataServiceRuntimeSnapshot() (the no-arg overload) also calls refreshManagementState(). CLAUDE.md explicitly warns: "Never call SMAppService.status repeatedly in a hot path — it does synchronous IPC and can block for 10-30+ seconds under load." The overloaded version that accepts managementState + staleEnabledRegistration already exists; the getServiceStatus() fast path should call that version directly, passing the already-fetched state.
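A simplified sketch of the suggested call shape, with hypothetical stand-in types: FakeDaemonManager counts refreshes (each standing in for one SMAppService.status IPC), and the real overload's staleEnabledRegistration parameter is omitted. getServiceStatus() fetches the state once and hands it to the overload that accepts it.

```swift
/// Simplified stand-ins; the real overload also takes staleEnabledRegistration.
struct ManagementState {
    let isSMAppServiceManaged: Bool
}

struct RuntimeSnapshot {
    let isManaged: Bool
}

/// Fake daemon manager that counts IPC-equivalent refreshes.
final class FakeDaemonManager {
    private(set) var refreshCount = 0
    func refreshManagementState() -> ManagementState {
        refreshCount += 1  // each call stands in for one SMAppService.status IPC
        return ManagementState(isSMAppServiceManaged: true)
    }
}

struct HealthChecker {
    let manager: FakeDaemonManager

    /// Convenience overload: performs its own refresh (one IPC call).
    func checkRuntimeSnapshot() -> RuntimeSnapshot {
        checkRuntimeSnapshot(managementState: manager.refreshManagementState())
    }

    /// Reuses state the caller already fetched: no additional IPC.
    func checkRuntimeSnapshot(managementState: ManagementState) -> RuntimeSnapshot {
        RuntimeSnapshot(isManaged: managementState.isSMAppServiceManaged)
    }

    func getServiceStatus() -> RuntimeSnapshot? {
        let state = manager.refreshManagementState()  // the only refresh on this path
        guard state.isSMAppServiceManaged else { return nil }
        return checkRuntimeSnapshot(managementState: state)
    }
}

let manager = FakeDaemonManager()
let snapshot = HealthChecker(manager: manager).getServiceStatus()
print(manager.refreshCount)  // 1
```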


🟠 High Severity

2. TOCTOU race in TCP port binding (Rust bridge, lib.rs:295-311)

The code does a preflight TcpListener::bind() to check port availability, then immediately drops it before TcpServer::new() binds again. The comment even flags this: "TcpServer::new binds again below, so there is still a small TOCTOU window."

This isn't just theoretical — Kanata itself can race for the same port. The preflight bind should either be eliminated (accept the better error from TcpServer::new) or the listener should be passed into TcpServer::new rather than re-binding.

3. XPC session activation timeout is 45 seconds

// HelperManager+RequestHandlers.swift:317
try await executeXPCCall("activateKanataOutputBridgeSession", timeout: 45.0)

45 seconds is a very long UI-blocking timeout. If this call hangs on a healthy system it will appear frozen for nearly a minute. Consider a shorter timeout (10-15s) with a more actionable error message, reserving the longer timeout for CI via an environment variable override.
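A minimal sketch of the suggested override. The environment variable name KEYPATH_BRIDGE_ACTIVATION_TIMEOUT and the 15-second default are hypothetical choices within the 10-15s range suggested above; the resolved value would be passed as the timeout argument of the existing executeXPCCall call.

```swift
import Foundation

/// Resolves the activateKanataOutputBridgeSession timeout.
/// The environment variable name and default value are illustrative.
func bridgeActivationTimeout(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    defaultSeconds: TimeInterval = 15
) -> TimeInterval {
    if let raw = environment["KEYPATH_BRIDGE_ACTIVATION_TIMEOUT"],
       let seconds = TimeInterval(raw), seconds > 0 {
        return seconds  // e.g. CI could set 45 to tolerate slow runners
    }
    // Missing, unparsable, or non-positive values fall back to the default.
    return defaultSeconds
}

print(bridgeActivationTimeout(environment: [:]))  // 15.0
```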

4. restartKanata() computes split decision before it's needed

// RuntimeCoordinator+ServiceManagement.swift:155-159
let splitDecision = await currentSplitRuntimeDecision()  // async eval
if KanataSplitRuntimeHostService.shared.isPersistentPassthruHostRunning {
    let stopped = await stopKanata(...)
    guard stopped else { return false }
    return await startKanata(...)  // splitDecision unused in this branch
}

In the fast path (host already running), splitDecision is evaluated but never consulted. currentSplitRuntimeDecision() calls KanataRuntimePathCoordinator.evaluateCurrentPath() which likely does I/O. This should be deferred to the switch statement below where it's actually used.
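A synchronous sketch of the deferred evaluation (async dropped for brevity). RestartModel is a hypothetical model of RuntimeCoordinator, and decisionEvaluations counts calls that stand in for the XPC + bridge probe. On the fast path only startKanata evaluates the decision; the slow path still evaluates twice, which passing the decision into startKanata would remove.

```swift
enum SplitRuntimeDecision {
    case useSplitRuntime, useLegacySystemBinary, blocked
}

/// Hypothetical synchronous model of RuntimeCoordinator's restart flow.
final class RestartModel {
    var hostRunning: Bool
    var decision: SplitRuntimeDecision = .useSplitRuntime
    private(set) var decisionEvaluations = 0

    init(hostRunning: Bool) { self.hostRunning = hostRunning }

    /// Stands in for the XPC + bridge probe behind currentSplitRuntimeDecision().
    func currentSplitRuntimeDecision() -> SplitRuntimeDecision {
        decisionEvaluations += 1
        return decision
    }

    func stopKanata() -> Bool { hostRunning = false; return true }

    func startKanata() -> Bool {
        guard currentSplitRuntimeDecision() == .useSplitRuntime else { return false }
        hostRunning = true
        return true
    }

    func restartKanata() -> Bool {
        if hostRunning {
            // Fast path: startKanata evaluates the decision itself.
            guard stopKanata() else { return false }
            return startKanata()
        }
        // Only the not-running path consults the decision here.
        switch currentSplitRuntimeDecision() {
        case .useSplitRuntime:
            return startKanata()
        case .useLegacySystemBinary, .blocked:
            return false
        }
    }
}

let running = RestartModel(hostRunning: true)
_ = running.restartKanata()
print(running.decisionEvaluations)  // 1
```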


🟡 Medium Severity

5. passthru-output-spike is the shipped default, but "spike" implies temporary

Cargo.toml has default = [], but build-kanata-host-bridge.sh defaults BRIDGE_FEATURES to passthru-output-spike. All the corresponding Swift code paths (the companion monitor, startPersistentPassthruHost(includeCapture:)) run only via this feature. The word "spike" reads as a time-boxed investigation, yet this is now the primary runtime path. Rename to something stable like passthru-host to signal production readiness.

6. migrateFromLaunchctl() uses a hard-coded 2-second sleep

// KanataDaemonManager.swift:708
try await Task.sleep(for: .seconds(2))

Fixed sleeps are fragile — too short on slow machines, too long otherwise. Replace with a poll loop (e.g., polling getStatus() every 250ms up to 10s) that exits early once the status reaches .enabled or .requiresApproval.

7. Error message quality in handleSplitRuntimeHostExit

message += ". Toggle the service again to restart the split runtime host."

"Toggle the service" is cryptic to users who don't know what that means. Suggest: "Use the Start button in KeyPath to restart."

8. Intel Mac exclusion undocumented

build-kanata-host-bridge.sh hardcodes --target aarch64-apple-darwin. The comment acknowledges this is intentional, but there's no runtime guard that presents a meaningful error on x86_64 Macs. If the bridge .dylib/.a isn't present for x86_64, the split runtime path will silently fail or crash. Either gate at runtime or document the minimum system requirements explicitly.


🟢 Low Severity / Style

9. Rust bridge: unnecessary clone in create_passthru_runtime

// lib.rs:139
tcp_server_address: tcp_server_address.clone(),

tcp_server_address is moved into ValidatedArgs and then the clone is stored in PassthruRuntime. Since ValidatedArgs is consumed by Kanata::new_with_output_channel, you could move into args and clone only the value stored in PassthruRuntime.

10. build-kanata-runtime-library.sh not linked from build docs

The new Scripts/build-kanata-runtime-library.sh isn't mentioned in Scripts/README.md or CLAUDE.md build commands. Developers discovering the Scripts/ directory won't know when/how to invoke it.

11. KanataDaemonManager.register() catch for .requiresApproval uses string matching

reason.contains("Approval required")

String matching on error messages is fragile — if the message ever changes (localization, wording update), this silently stops working. Consider a dedicated enum case like KanataDaemonError.requiresApproval that callers can pattern-match on.
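A sketch of the typed-error alternative. The enum cases and the action strings returned by userFacingAction are hypothetical; in the real register() the mapping from the underlying failure to .requiresApproval would happen once, where the error is produced, so callers never inspect message text.

```swift
/// Hypothetical typed errors for KanataDaemonManager.register().
enum KanataDaemonError: Error, Equatable {
    case requiresApproval
    case registrationFailed(reason: String)
}

/// Callers switch on the case instead of matching "Approval required".
func userFacingAction(for error: Error) -> String {
    guard let daemonError = error as? KanataDaemonError else {
        return "show-generic-error"
    }
    switch daemonError {
    case .requiresApproval:
        return "open-login-items-settings"
    case .registrationFailed(let reason):
        return "show-error: \(reason)"
    }
}

print(userFacingAction(for: KanataDaemonError.requiresApproval))  // open-login-items-settings
```

Because the switch over KanataDaemonError is exhaustive, adding a case later produces a compile error at every call site instead of a silent mismatch.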


✅ What's Working Well

  • HelperXPCCallCompletionState elegantly prevents double-resume when both timeout and XPC callback fire
  • KanataHealthDecision enum (.healthy / .transient / .unhealthy) is a clean state model that avoids boolean proliferation
  • ServiceHealthChecker.decideKanataHealth evaluates conditions in the correct priority order (permission issue → healthy → stale → launchctl missing → transient)
  • Removing the passthru-output-spike guard in performInitialization() so the split runtime is now the unconditional preferred path is the right call
  • Companion monitor recovery (checkSplitRuntimeCompanionHealth) gracefully handles output bridge companion crashes without hard-failing the host
  • Null pointer guards on every *mut c_void parameter in the Rust bridge FFI
  • defer { isRecoveringSplitRuntimeCompanion = false } prevents recovery lock from getting stuck on error paths

Reviewed by Claude Sonnet 4.6


claude bot commented Mar 10, 2026

PR #225 Code Review: Finalize Split Runtime Cutover

This is a large PR.


claude bot commented Mar 10, 2026

PR 225 Code Review: Finalize Split Runtime Cutover

Large PR (317 files). Several blocking issues found.

CRITICAL Issues:
1. Restart race in restartKanata() - stopPersistentPassthruHost() returns before process terminates; use waitForPersistentHostExit() pattern.
2. XPC on MainActor every 5s - startSplitRuntimeCompanionMonitor() calls getKanataOutputBridgeStatus() (30s XPC) on main actor.
3. evaluateCurrentPath() called multiple times per operation - uncached, duplicated in inspectSystem.
4. checkKanataInputCaptureStatus() reads legacy daemon log path, not split runtime host path.
5. TOCTOU in PrivilegedCommandRunner.swift - file written then chmod'd separately.

High Severity:
6. assertionFailure removed from concurrent XPC detection.
7. RecoveryDaemonService calls SMAppService.status on every start/stop (hot path).
8. Bridge session files in world-readable /var/tmp.
9. bootstrapOutputBridgeCompanion blocks XPC thread with up to 2s usleep.

Medium:
10. currentRuntimeStatus() ignores recovery daemon state.
11. KanataRuntimeHost.current() uses Bundle.main.bundlePath - breaks in tests.
12. Foundation.FileManager() used instead of FileManager.default.
13. isOneShotProbeEnvironment missing hostPassthruCaptureEnvKey.
14. restartKanata logic bifurcation.
15. Silent pid == 0 guard in termination handler.
16. Double launchctl calls per health check.

Must fix before merge: Issues 1-5. Note: ADR-032 is Proposed but PR ships split runtime as production; update ADR status.

Review generated by Claude Code


claude bot commented Mar 10, 2026

PR #225 Code Review: Finalize Split Runtime Cutover

This is a large (317 files, +14,701/-5,386 lines), architecturally significant PR. The overall direction is sound, but there are several issues ranging from blocking race conditions to performance hot paths that should be addressed.


CRITICAL / Blocking Issues

1. Restart Race Condition in restartKanata()

Sources/KeyPathAppKit/Services/KanataSplitRuntimeHostService.swift

stopPersistentPassthruHost() calls process.terminate() and returns immediately. restartKanata() then calls startKanata() which calls startPersistentPassthruHost(). But activeHostProcess is still set (cleared asynchronously in the termination handler), so the guard if let activeHostProcess, activeHostProcess.isRunning returns the old PID without launching a new process.

restartPersistentPassthruHostAfterCompanionRestart() correctly calls waitForPersistentHostExit() -- restartKanata() should use the same pattern.

2. XPC on Main Actor Every 5 Seconds

Sources/KeyPathAppKit/Managers/RuntimeCoordinator+Lifecycle.swift

startSplitRuntimeCompanionMonitor() runs a Task { @MainActor in } loop calling checkSplitRuntimeCompanionHealth() every 5 seconds. That method calls getKanataOutputBridgeStatus() (XPC, 30s timeout). Per CLAUDE.md: Do not call SMAppService.status repeatedly in a hot path. The same applies to XPC. Use Task.detached for the XPC call or add result caching.

3. evaluateCurrentPath() Called Multiple Times Per Operation

Sources/KeyPathAppKit/Services/KanataRuntimePathCoordinator.swift

evaluateCurrentPath() makes two XPC calls (testHelperFunctionality + getKanataOutputBridgeStatus) and is called from startKanata(), stopKanata(), and restartKanata(). Additionally, InstallerEngine.inspectSystem() calls evaluateCurrentPath() then calls getKanataOutputBridgeStatus() again immediately -- two XPC calls where one suffices. Cache results within a single validation cycle.
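A minimal sketch of a per-cycle cache. ValidationCycle and BridgeStatus are hypothetical, and the fetch closure stands in for the getKanataOutputBridgeStatus() XPC call. A fresh cycle object per operation keeps results from going stale across operations; a concurrent async version would more likely be an actor that caches the in-flight Task.

```swift
struct BridgeStatus {
    let isHealthy: Bool
}

/// Memoizes the bridge status for the lifetime of one validation cycle.
final class ValidationCycle {
    private let fetchStatus: () -> BridgeStatus
    private var cachedStatus: BridgeStatus?

    init(fetchStatus: @escaping () -> BridgeStatus) {
        self.fetchStatus = fetchStatus
    }

    /// Performs the expensive fetch at most once per cycle.
    func bridgeStatus() -> BridgeStatus {
        if let cached = cachedStatus { return cached }
        let status = fetchStatus()
        cachedStatus = status
        return status
    }
}

var xpcCalls = 0
let cycle = ValidationCycle {
    xpcCalls += 1
    return BridgeStatus(isHealthy: true)
}
_ = cycle.bridgeStatus()  // e.g. inside evaluateCurrentPath()
_ = cycle.bridgeStatus()  // e.g. inspectSystem() reusing the result
print(xpcCalls)  // 1
```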

4. checkKanataInputCaptureStatus() Reads from Wrong Log Path

ServiceHealthChecker parses Kanata stderr for iohiddeviceopen error from KeyPathConstants.Logs.kanataStderr (the legacy daemon path). The split runtime host writes to NSTemporaryDirectory() + keypath-host-passthru-live-stderr.log. These are different paths -- this function will never detect IOHIDDeviceOpen errors from the split runtime host.

5. TOCTOU Risk in Temp Script File Creation

Sources/KeyPathCore/PrivilegedCommandRunner.swift -- file is written at 0o600, then chmod'd in a separate syscall. Any process can open the file between write and chmod, and this file is executed with admin privileges via osascript. Write with correct permissions atomically or use a process-private directory.
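One way to "write with correct permissions atomically" is to create the file with its final mode in a single open(2) call using O_CREAT | O_EXCL, sketched below. createPrivateScript is a hypothetical helper, not the current PrivilegedCommandRunner API. O_EXCL also makes creation fail if anything, including a symlink, already exists at the path, and the process umask can only remove permission bits, never add them. The choice of a process-private directory is a separate decision this sketch does not cover.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif
import Foundation

struct ScriptCreationError: Error {
    let code: Int32  // errno from the failing call
}

/// Creates a new file at `path` with `mode` in one open(2) call,
/// so there is no window where it exists with looser permissions.
func createPrivateScript(at path: String, contents: String, mode: mode_t = 0o700) throws {
    let fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode)
    guard fd >= 0 else { throw ScriptCreationError(code: errno) }
    defer { close(fd) }

    let bytes = Array(contents.utf8)
    let written = bytes.withUnsafeBytes { buffer in
        write(fd, buffer.baseAddress, buffer.count)
    }
    // A short or failed write is treated as an error in this sketch.
    guard written == bytes.count else { throw ScriptCreationError(code: errno) }
}
```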


High Severity Issues

6. assertionFailure Removed from Concurrent XPC Detection

Sources/KeyPathAppKit/Core/HelperManager+RequestHandlers.swift -- the concurrent XPC call guard previously fired assertionFailure in debug builds. This PR replaces it with a log. Concurrent XPC calls now silently proceed in debug builds, potentially hiding regressions.

7. RecoveryDaemonService.evaluateStatus() Calls SMAppService.status on Every Start/Stop

Two Task.detached blocks call SMAppService.daemon(plistName:).status synchronously -- triggered on every isRecoveryDaemonRunning() call from startKanata() and stopKanata(). Per CLAUDE.md this is prohibited in hot paths.

8. Bridge Session Files Written to World-Readable Path

/var/tmp/keypath-host-passthru-bridge-env.txt contains session IDs and socket paths for a privileged bridge. /var/tmp is world-readable. Consider /var/run or a root-owned path.

9. HelperService.bootstrapOutputBridgeCompanion Blocks XPC Thread with usleep

Up to 2 full seconds of blocking sleep (200ms + 400ms + 600ms + 800ms) on the helper XPC service thread across 5 retry attempts. Reduce retry count or sleep duration.


Medium Severity / Code Quality

10. currentRuntimeStatus() Does Not Reflect Recovery Daemon State

On mid-migration systems where the legacy recovery daemon is still running, currentRuntimeStatus() always reports .stopped. The RecoveryDaemonService state is never reflected.

11. KanataRuntimeHost.current() Uses Bundle.main.bundlePath -- Breaks in Tests

In unit tests, Bundle.main.bundlePath points to the XCTest runner. Any test calling KanataRuntimeHost.current() directly (not through the testDecision override) gets wrong binary paths. No test-seam override exists for this method.

12. Foundation.FileManager() vs FileManager.default

Many files now use Foundation.FileManager() (new instance per call). FileManager.default is the thread-safe singleton. Use Foundation.FileManager.default where namespace disambiguation is needed, not Foundation.FileManager().

13. isOneShotProbeEnvironment Missing hostPassthruCaptureEnvKey

AppDelegate.isOneShotProbeEnvironment checks 4 env keys but omits hostPassthruCaptureEnvKey. Running with KEYPATH_ENABLE_HOST_PASSTHRU_CAPTURE=1 will not enter probe mode.

14. restartKanata Logic Bifurcation

Checks isPersistentPassthruHostRunning before the splitDecision switch, creating two independent code paths that can disagree when the companion crashes mid-restart.

15. Silent guard pid > 0 in Termination Handler

handleSplitRuntimeHostExit(pid:) silently returns with no log if pid == 0, making early-exit races invisible in logs.

16. Double launchctl Calls per Health Check

KanataDaemonManager.preferredLaunchctlTargets now returns two targets, doubling subprocess overhead on every health check. Consider only falling back to the second target on failure of the first.


Test Coverage Gaps

Good additions: RuntimeCoordinatorTests, KanataRuntimeHostTests, KanataRuntimePathDecisionTests, KanataOutputBridgeSmokeServiceTests.

Gaps: checkSplitRuntimeCompanionHealth() polling loop has no tests for companion-down, recovery success, or recovery failure. evaluateKanataLaunchctlRunningState fallback path (first target fails, second succeeds) is untested. KanataRuntimePathEvaluator.decide() with requiresPrivilegedBridge: false is untested. RecoveryDaemonServiceIntegrationTests.testStatusRefresh_ShouldDetectChanges asserts XCTAssertNotEqual(status, .unknown) -- too weak; passes even if status is .failed.


CLAUDE.md Compliance

Guideline | Status
--- | ---
PermissionOracle Apple API priority | Compliant
No SMAppService.status in hot paths | Concern: RecoveryDaemonService calls it on every start/stop
Cache helper results within validation cycle | Concern: evaluateCurrentPath() uncached; inspectSystem duplicates XPC call
Don't mark Kanata healthy from timing alone | Compliant (TCP probe required)
InstallerEngine as installation facade | Generally compliant
No pgrep in tests | Compliant (KeyPathTestCase pattern used)
Validation order | Generally maintained

Recommendations

Must fix before merge:

  1. Fix restart race -- use waitForPersistentHostExit() in restartKanata()
  2. Move companion health check XPC off the main actor
  3. Cache evaluateCurrentPath() within a single operation; deduplicate getKanataOutputBridgeStatus in inspectSystem
  4. Fix checkKanataInputCaptureStatus() to read the split runtime host stderr path
  5. Fix temp script file permissions (write with correct mode atomically)

Should fix:

  • Restore assertionFailure for concurrent XPC detection in debug builds
  • Document or fix currentRuntimeStatus() on legacy/mixed systems
  • Replace Foundation.FileManager() with Foundation.FileManager.default throughout
  • Add tests for health polling and launchctl fallback paths
  • Move bridge session files to a root-owned path

Note: ADR-032 is marked Proposed but this PR ships split runtime as production. Consider updating ADR status to Accepted before merge.


Review generated by Claude Code


claude bot commented Mar 10, 2026

Follow-up Review: Additional Bugs Found

A prior Claude review (#issuecomment-4020945394 through the summary comment) covered the major architecture issues. This comment adds three bugs that were not previously flagged.


Bug: isStartingKanata is declared but never set to true

RuntimeCoordinator.swift:220 declares var isStartingKanata = false.
RuntimeCoordinator+ServiceManagement.swift:181 reads it:

func currentRuntimeStatus() async -> RuntimeStatus {
    if isStartingKanata {      // ← always false
        return .starting       // ← unreachable
    }
    ...
}

startKanata() never sets isStartingKanata = true before the async work, and nothing else in the codebase writes true to it (grep confirms zero hits). The .starting state is dead code. Either:

  • Add isStartingKanata = true at the top of startKanata() with defer { isStartingKanata = false }, or
  • Remove the flag and the .starting branch

Without this fix, UI callers that key off currentRuntimeStatus() == .starting to show a spinner will never see that state.
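A synchronous sketch of the first option. StartModel is a hypothetical model of RuntimeCoordinator, and the launch closure stands in for the split-runtime start work. The defer clears the flag on every return path; in the real async startKanata it would likewise run on return after any awaits.

```swift
enum RuntimeStatus: Equatable {
    case starting, running, stopped
}

/// Hypothetical synchronous model of the start flow.
final class StartModel {
    private(set) var isStartingKanata = false
    private(set) var isRunning = false

    func currentRuntimeStatus() -> RuntimeStatus {
        if isStartingKanata { return .starting }
        return isRunning ? .running : .stopped
    }

    /// `launch` stands in for the split-runtime start work.
    func startKanata(launch: () -> Bool) -> Bool {
        isStartingKanata = true
        defer { isStartingKanata = false }  // cleared on success and failure alike
        isRunning = launch()
        return isRunning
    }
}

let model = StartModel()
var statusDuringStart: RuntimeStatus?
_ = model.startKanata {
    statusDuringStart = model.currentRuntimeStatus()
    return true
}
print(statusDuringStart == .starting, model.currentRuntimeStatus())  // true running
```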


Bug: handleSplitRuntimeHostExit skips AppContextService.stop() when PID is 0

RuntimeCoordinator+Lifecycle.swift:186-189:

guard pid > 0
else {
    return   // ← AppContextService.shared.stop() is NOT called
}

checkSplitRuntimeCompanionHealth calls it with:

let failedPID = KanataSplitRuntimeHostService.shared.activePersistentHostPID ?? 0
KanataSplitRuntimeHostService.shared.stopPersistentPassthruHost()
await handleSplitRuntimeHostExit(pid: failedPID, ...)

If activePersistentHostPID is nil (host crashed before its PID was recorded, or was stopped from another path), failedPID == 0 and handleSplitRuntimeHostExit returns immediately. AppContextService is never stopped, leaving the app with a stale running context while the host process is dead. At minimum the guard pid > 0 branch should still call AppContextService.shared.stop() and notifyStateChanged() before returning.
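A sketch of the suggested fix. ExitHandlerModel is hypothetical, and its counters stand in for AppContextService.shared.stop() and notifyStateChanged(). Both branches share one teardown helper, and the unknown-PID branch logs instead of returning silently, which also addresses the silent-guard concern raised in an earlier review.

```swift
/// Hypothetical model of handleSplitRuntimeHostExit with the suggested fix.
final class ExitHandlerModel {
    private(set) var contextStopCalls = 0
    private(set) var stateChangeNotifications = 0
    private(set) var log: [String] = []

    func handleSplitRuntimeHostExit(pid: Int32) {
        guard pid > 0 else {
            // Unknown PID: still log and tear down instead of returning silently.
            log.append("split runtime host exited before its PID was recorded")
            tearDownRunningContext()
            return
        }
        log.append("split runtime host \(pid) exited")
        // The user-facing error message would be built here.
        tearDownRunningContext()
    }

    private func tearDownRunningContext() {
        contextStopCalls += 1          // stands in for AppContextService.shared.stop()
        stateChangeNotifications += 1  // stands in for notifyStateChanged()
    }
}

let handler = ExitHandlerModel()
handler.handleSplitRuntimeHostExit(pid: 0)
print(handler.contextStopCalls, handler.log.count)  // 1 1
```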


Bug: stopKanata() success path for legacy daemon does not clear lastError

RuntimeCoordinator+ServiceManagement.swift:

// Split runtime stop (line 133-134)
lastError = nil      // ✅ cleared
lastWarning = nil

// Legacy recovery daemon stop (line 141-142)
lastWarning = nil    // ❌ lastError NOT cleared

If the split runtime previously set lastError (e.g. a failed start attempt), then the user manually stops via the legacy path, the stale error persists in the UI even though the stop succeeded. Add lastError = nil to the success branch of the legacy stop path to match the split runtime path.


These are all self-contained fixes and independent of each other.


claude bot commented Mar 10, 2026

PR Review: Finalize split runtime cutover

This is a significant architectural milestone (14,764 additions / 5,425 deletions across 100 files). Overall the implementation is solid. Logging is comprehensive, actor isolation is correct, and XPC safety patterns are well-designed.


Strong points

HelperManager+RequestHandlers.swift - HelperXPCCallCompletionState with NSLock is the right pattern for bridging callback-based XPC to async/await. The timeout Task + completion guard correctly prevents double-resume.

lib.rs (Rust bridge) - Error handling is thorough, write_error is consistent, and cfg(feature = "passthru-output-spike") gating cleanly separates stable API from experimental code. The TOCTOU comment on the preflight TCP bind (lines 297–306) is good documentation.

ServiceHealthChecker - Caching managementState and staleEnabledRegistration within a single validation cycle (via the checkKanataServiceRuntimeSnapshot overload) correctly follows the CLAUDE.md anti-pattern about not calling SMAppService.status repeatedly.

RuntimeCoordinator+ServiceManagement.swift - The deliberate removal of legacy fallback (startKanata returns false with a clear error for .useLegacySystemBinary) is the right decision for the cutover.


Potential bugs / races

1. restartKanata evaluates splitDecision before checking host state (ServiceManagement.swift:155–178)

splitDecision is re-fetched inside startKanata, so the binding is unused on the happy path. In the error branch (.useLegacySystemBinary / .blocked), splitDecision may be stale by the time the switch runs (after stopKanata completes). The variable gives a false sense of atomicity. Consider fetching only where used.

2. startKanata VHID check is not atomic with the start (ServiceManagement.swift:60–65)

The VHID daemon health check and startPersistentPassthruHost have a window between them. This accepted TOCTOU is common in system-level code, but a comment explaining why a re-check inside startPersistentPassthruHost is not feasible would help future readers.

3. performInitialization does not auto-start Kanata when not running (Lifecycle.swift:152–160)

The cutover logic handles the "already running via wrong path" case, but if Kanata is not running at all the function logs and returns. A comment confirming this is by design (UI-driven flow) would prevent future readers from adding auto-start here and reintroducing the legacy path.


Design observations

4. keypath_kanata_bridge_run_runtime is a blocking call (lib.rs:414–491)

Kanata::event_loop does not return until the runtime exits. The C header gives no hint of this. A doc comment (or rename to keypath_kanata_bridge_run_runtime_blocking) would prevent a caller from invoking this on the main thread.

5. keypath_kanata_bridge_create_runtime ownership is opaque to the caller (lib.rs:62–89)

The returned void * is heap-allocated via Box::into_raw. Nothing in the header tells the caller they must call keypath_kanata_bridge_destroy_runtime or leak memory. A comment in the header describing the create/destroy ownership transfer would help.

6. void * opaque pointers lose type safety (keypath_kanata_host_bridge.h)

Both the regular runtime and the passthru runtime use void *. If a caller passes a passthru handle to keypath_kanata_bridge_destroy_runtime the result is undefined behavior. Opaque struct typedefs (typedef struct OpaqueKanataRuntime OpaqueKanataRuntime;) would catch this at the C level. Low urgency with one Swift caller today, but worth considering as the surface grows.

7. splitRuntimeCompanionMonitorTask polls outputBridgeStatus() every 5 seconds (Lifecycle.swift:36)

Per CLAUDE.md, be careful about IPC in hot paths. If KanataOutputBridgeCompanionManager.outputBridgeStatus() involves synchronous IPC (similar to SMAppService.status), the 5s polling interval could stretch or block. Worth verifying this call is cheap, or adding a comment confirming it.

8. executeValueXPCCall inner Task is not cancelled on timeout (RequestHandlers.swift:143)

The timeout Task guards the continuation via completionState.tryComplete(), but the inner XPC Task continues running after a timeout. For a 30s default timeout this means up to one orphaned Task per timed-out call. Acceptable for now but worth a tracking comment.
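If the tracking comment later turns into a fix, one option is structured concurrency: race the operation against a sleep in a task group and cancel whichever child loses. This is a swapped-in technique, not how HelperManager works today; withTimeout and TimeoutError are hypothetical. Caveat: the group waits for the cancelled child to finish, so this returns promptly only if the operation honors cancellation. An XPC reply bridged through a continuation would also need withTaskCancellationHandler to react.

```swift
struct TimeoutError: Error {}

/// Races `operation` against a timer; whichever finishes first wins and
/// the other child task is cancelled.
func withTimeout<T: Sendable>(
    seconds: Double,
    _ operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    return try await withThrowingTaskGroup(of: T.self, returning: T.self) { group in
        group.addTask { try await operation() }
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError()
        }
        // Runs on return or throw: cancels whichever child is still running.
        defer { group.cancelAll() }
        guard let result = try await group.next() else { throw TimeoutError() }
        return result
    }
}
```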


Minor / Nits

9. stopKanata still falls back to the legacy daemon (ServiceManagement.swift:139–150)

stopKanata tries recoveryDaemonService.stopIfRunning() when the split host is not running. This is intentional for cleanup/cutover, but since startKanata now refuses the legacy path, a user with the legacy daemon somehow still running can stop but not restart. A comment clarifying this is an intentional one-way cleanup path would prevent future confusion.

10. BRIDGE_FEATURES defaults to passthru-output-spike (build-kanata-host-bridge.sh:16)

Local developer builds get the experimental passthru feature compiled in by default. Consider whether production CI builds should explicitly override this to empty, or document that the default is intentional for local dev.


Test coverage

  • Rust tests are gated behind cfg(all(test, feature = "passthru-output-spike", target_os = "macos")) so no tests run without the feature or on non-macOS CI. The two existing tests are well-written (full create→start→send→recv→destroy cycle).
  • Swift side correctly uses TestEnvironment.isRunningTests to skip IPC-heavy paths per CLAUDE.md.
  • No new Swift tests for the split runtime lifecycle state transitions (handleSplitRuntimeHostExit, checkSplitRuntimeCompanionHealth). Given the complexity of those error-recovery paths, tests would be valuable to add.

Summary

The cutover is architecturally sound. The two highest-priority items before merging: (7) verify outputBridgeStatus() is cheap enough for 5s polling; and (4) add a doc comment to the header about keypath_kanata_bridge_run_runtime blocking. Everything else is low-priority cleanup.

Generated with Claude Code

malpern and others added 3 commits March 10, 2026 06:01
…nd caching

- Fix Cargo.toml edition (2024 → 2021)
- Add AtomicBool/Mutex to PassthruRuntime for thread-safe state
- Replace &mut casts with shared refs (interior mutability)
- Assign processing_tx before starting loop thread
- Remove TOCTOU preflight TCP bind
- Cache SMAppService.status in config reload hot path
- Wrap notifyStateChanged in MainActor.run for actor isolation
- Use forceRefresh() for fresh permission snapshot
- Await bootstrap() before startEventMonitoring()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot commented Mar 10, 2026

PR 225 Code Review — Finalize split runtime cutover

This is a substantial architectural cutover (~15K additions, ~5.5K deletions) that makes KanataSplitRuntimeHostService the primary runtime path and demotes the legacy KanataService/ProcessCoordinator/ProcessManager trio to a recovery path. The structure is clear and the test expansion is welcome. Several issues below warrant attention before merging.


Critical Bugs

1. currentRuntimeStatus() is blind to the recovery daemon

If only the recovery daemon is running (split runtime failed or has not started yet), currentRuntimeStatus() returns .stopped. Any caller using this as a liveness gate — e.g., the TCP reload in resetToDefaultConfig — will silently skip the reload while keys are actively being remapped by the daemon. The recovery daemon should be reflected in this status check.
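The Rust sketch below shows one possible shape for this check. The real code is Swift. It assumes the coordinator can cheaply observe three signals. The type and field names (`RuntimeSignals`, `RuntimeStatus::RunningRecoveryDaemon`, `is_remapping`) are hypothetical. The point is that a live recovery daemon maps to a running state. A liveness gate such as the TCP reload in resetToDefaultConfig then does not skip work while keys are being remapped.

```rust
/// Runtime states as seen by callers; variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeStatus {
    Starting,
    RunningSplitHost,
    RunningRecoveryDaemon,
    Stopped,
}

impl RuntimeStatus {
    /// True whenever some runtime is actively remapping keys.
    pub fn is_remapping(self) -> bool {
        matches!(self, RuntimeStatus::RunningSplitHost | RuntimeStatus::RunningRecoveryDaemon)
    }
}

/// Hypothetical inputs gathered by the coordinator.
pub struct RuntimeSignals {
    pub is_starting: bool,
    pub split_host_running: bool,
    pub recovery_daemon_running: bool,
}

pub fn current_runtime_status(s: &RuntimeSignals) -> RuntimeStatus {
    if s.split_host_running {
        RuntimeStatus::RunningSplitHost
    } else if s.recovery_daemon_running {
        // The daemon is still remapping keys, so this must not read as Stopped.
        RuntimeStatus::RunningRecoveryDaemon
    } else if s.is_starting {
        RuntimeStatus::Starting
    } else {
        RuntimeStatus::Stopped
    }
}
```

Callers that only need a liveness gate can use `is_remapping()` instead of comparing against a single running variant.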

2. startKanata() hard-fails on evaluator fallback with no recovery path

When KanataRuntimePathEvaluator returns .useLegacySystemBinary (helper not ready, bridge unhealthy, or dylib missing), startKanata() returns false and sets a vague lastError with no actionable UI path. On first-launch or post-update this strands the user with zero remapping. Either fall back to the recovery daemon or redirect to the installer wizard.

3. ADR-031 postcondition not enforced on split runtime path — false success possible

Per ADR-031, installer/runtime success requires verified runtime readiness (running + TCP responding). The legacy path enforces this via enforceKanataRuntimePostcondition. The split runtime startKanata() path does not — it returns true immediately after startPersistentPassthruHost() resolves, with no TCP health check. If Kanata fails to bind the TCP port or fails to load config after receiving a PID, callers get a false success.

4. splitRuntimeCompanionMonitorTask is never cancelled on host exit or user stop

In RuntimeCoordinator+Lifecycle.swift: when checkSplitRuntimeCompanionHealth() triggers recovery and calls handleSplitRuntimeHostExit, the monitor task is not cancelled — it keeps polling every 5 seconds against a stopped host. stopKanata() also does not cancel it. The task should be explicitly cancelled in both code paths.
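The sketch below makes the monitor's lifetime explicit. It is written in Rust, with a thread standing in for the Swift Task, and all names are hypothetical.

- `cancel()` is idempotent, so both the host-exit path and `stopKanata()` can call it.
- The loop also exits on its own once the health check reports that the host is gone.
- Waiting on a channel with `recv_timeout` works as a cancellable sleep, so a cancel takes effect immediately rather than at the next 5-second tick.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for splitRuntimeCompanionMonitorTask.
pub struct CompanionMonitor {
    stop_tx: Option<Sender<()>>,
    handle: Option<JoinHandle<u32>>,
}

impl CompanionMonitor {
    /// Runs `check` every `interval`. The loop ends when cancelled, when the
    /// monitor is dropped, or when `check` returns false (host has exited).
    pub fn start<F>(interval: Duration, mut check: F) -> Self
    where
        F: FnMut() -> bool + Send + 'static,
    {
        let (stop_tx, stop_rx) = mpsc::channel::<()>();
        let handle = thread::spawn(move || {
            let mut polls = 0u32;
            loop {
                // recv_timeout is a sleep that a cancel can interrupt.
                match stop_rx.recv_timeout(interval) {
                    Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
                    Err(RecvTimeoutError::Timeout) => {
                        polls += 1;
                        if !check() {
                            break; // do not keep polling a stopped host
                        }
                    }
                }
            }
            polls
        });
        CompanionMonitor { stop_tx: Some(stop_tx), handle: Some(handle) }
    }

    /// Idempotent: safe to call from both the host-exit and stop paths.
    /// Returns the number of health checks run, or None if already cancelled.
    pub fn cancel(&mut self) -> Option<u32> {
        if let Some(tx) = self.stop_tx.take() {
            let _ = tx.send(()); // the loop may already have exited on its own
        }
        self.handle.take().map(|h| h.join().expect("monitor thread panicked"))
    }
}

impl Drop for CompanionMonitor {
    fn drop(&mut self) {
        let _ = self.cancel();
    }
}
```

In Swift, the same shape is a stored `Task` that both paths `cancel()`. The loop should use the throwing `Task.sleep` so that cancellation interrupts the wait.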

5. Initialization cutover stops the legacy daemon before confirming split runtime starts

performInitialization() calls startKanata() which unconditionally calls recoveryDaemonService.stopIfRunning() before attempting to start the split host. If the split host fails to start, the user is left with zero remapping. The daemon should only be stopped after the split host is confirmed healthy.
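The Rust sketch below shows the suggested ordering; the trait and function names are hypothetical. It assumes the two runtimes can overlap briefly without conflict, for example without contending for the same TCP port. If they cannot, the handoff needs a different design, such as a port that the new host only binds after the daemon releases it.

```rust
/// Hypothetical abstraction over the split host and the recovery daemon.
pub trait Runtime {
    /// Launches the runtime and returns its PID.
    fn start(&mut self) -> Result<u32, String>;
    /// Readiness evidence, e.g. process alive and TCP responding.
    fn is_healthy(&self) -> bool;
    fn is_running(&self) -> bool;
    fn stop(&mut self);
}

/// Stops the recovery daemon only after the split host is confirmed healthy,
/// so a failed start never leaves the user with zero remapping.
pub fn cut_over(split_host: &mut dyn Runtime, recovery: &mut dyn Runtime) -> Result<u32, String> {
    // If the split host cannot even launch, the daemon is left untouched.
    let pid = split_host.start()?;
    if !split_host.is_healthy() {
        split_host.stop();
        return Err(format!("split host (pid {pid}) never became healthy"));
    }
    if recovery.is_running() {
        recovery.stop();
    }
    Ok(pid)
}
```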


Architectural Issues

6. InstallerRecipeID.restartCommServer now calls regenerateServiceConfiguration — wrong behavior

The recipe ID says restart but the implementation only regenerates plists without restarting the service. If the TCP comm server is stuck, this silently fails to fix it. Either restart the service or rename the recipe.

7. Install plan gap: VHID healthy but output bridge companion absent

installRequiredRuntimeServices is only added to the plan when vhidServicesHealthy is false. On a system where VHID is healthy but the output bridge companion was never installed, no service installation recipe is generated. Verify the install plan covers this case.

8. Hardcoded /Applications/KeyPath.app dylib path in KeyPathOutputBridge

Sources/KeyPathOutputBridge/main.swift hardcodes the dylib path, breaking ~/Applications installs and renamed bundles. The session JSON or helper protocol should carry the dylib path.

9. shouldUseSplitRuntimeHost() is dead code

Defined in RuntimeCoordinator+ServiceManagement.swift but never called — all call sites use currentSplitRuntimeDecision() directly. Remove it.


Code Quality

10. checkKanataInputCaptureStatus() heuristic is fragile

ServiceHealthChecker.swift reads the Kanata stderr log looking for "IOHIDDeviceOpen error ... not permitted". A stale entry from a prior failed launch can report a capture failure that no longer applies. Log rotation can push a genuine error line out of the read window so that it goes undetected. The readiness timeout was bumped from 8s to 20s partly for this. Please add a comment documenting the heuristic's limitations.

11. Debounce threshold hardcoded in RecoveryDaemonServiceIntegrationTests

testEvaluateStatus_WhenPIDAndTCPBothFail_ShouldReportFailed() loops refreshStatus exactly 4 times to exhaust a threshold of 3. If the threshold changes, the test silently passes without exercising the failure path. Loop until the expected state is reached with a deadline, not a hardcoded count.
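The suggested test shape is sketched below in Rust. The evaluator and its threshold semantics are assumptions: it reports Failed after more than `threshold` consecutive double failures, which matches the 4-refreshes-for-threshold-3 behavior described above.

The helper loops until the target state appears, bounded by a generous iteration cap. A threshold change therefore still exercises the failure path instead of passing vacuously. An iteration cap also keeps the test deterministic without sleeping.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DaemonState {
    Healthy,
    Degraded,
    Failed,
}

/// Hypothetical debounced evaluator: Failed only after more than `threshold`
/// consecutive refreshes in which both the PID and TCP checks fail.
pub struct StatusEvaluator {
    threshold: u32,
    consecutive_failures: u32,
}

impl StatusEvaluator {
    pub fn new(threshold: u32) -> Self {
        StatusEvaluator { threshold, consecutive_failures: 0 }
    }

    pub fn refresh(&mut self, pid_ok: bool, tcp_ok: bool) -> DaemonState {
        if pid_ok || tcp_ok {
            self.consecutive_failures = 0;
            return DaemonState::Healthy;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures > self.threshold {
            DaemonState::Failed
        } else {
            DaemonState::Degraded
        }
    }
}

/// Test helper: refresh with both checks failing until `target` is reported.
/// Returns the number of refreshes it took, or None if `max_refreshes` ran out.
pub fn refresh_until(eval: &mut StatusEvaluator, target: DaemonState, max_refreshes: u32) -> Option<u32> {
    (1..=max_refreshes).find(|_| eval.refresh(false, false) == target)
}
```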

12. Brittle PID assertion after restart in testSplitRuntimeStartStopRestartCycle

Asserting contains("PID 4242") after restart does not verify that startKanata actually updates the PID. Assert that a PID is present and non-zero rather than the specific stub fixture value.

13. NotificationTokenStore.removeAll() — observer removal outside the lock

The lock is released before NotificationCenter.default.removeObserver(token) is called. A concurrent append during teardown could add a token that is never removed.
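Two fixes are possible:

1. Call removeObserver while still holding the lock.
2. Close the store atomically so that late appends are rejected.

The Rust sketch below takes the second route. The names are hypothetical, and `remove` stands in for NotificationCenter.default.removeObserver. `remove_all` sets a `closed` flag and drains the tokens in one critical section. Once the store is closed, `append` hands the token back so the caller can remove it immediately.

```rust
use std::sync::Mutex;

/// Hypothetical analogue of NotificationTokenStore.
pub struct TokenStore<T> {
    inner: Mutex<Inner<T>>,
}

struct Inner<T> {
    tokens: Vec<T>,
    closed: bool,
}

impl<T> TokenStore<T> {
    pub fn new() -> Self {
        TokenStore { inner: Mutex::new(Inner { tokens: Vec::new(), closed: false }) }
    }

    /// Returns the token back if teardown already happened, so the caller can
    /// remove that observer right away instead of leaking it.
    pub fn append(&self, token: T) -> Result<(), T> {
        let mut inner = self.inner.lock().unwrap();
        if inner.closed {
            return Err(token);
        }
        inner.tokens.push(token);
        Ok(())
    }

    /// Marks the store closed and takes every token under one lock. Removal
    /// runs after the lock is released, but no append can slip in between
    /// because `closed` is already set.
    pub fn remove_all(&self, mut remove: impl FnMut(T)) {
        let drained = {
            let mut inner = self.inner.lock().unwrap();
            inner.closed = true;
            std::mem::take(&mut inner.tokens)
        };
        for token in drained {
            remove(token);
        }
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().tokens.len()
    }
}
```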


Minor Suggestions

  • Confirm broker.installRequiredRuntimeServices() covers output bridge companion installation — new infrastructure added in this PR.
  • restartKanata() failure logging: when stopping the recovery daemon fails, the log message is generic. Explicit call-site logging would help future bug investigations.
  • The tcpPortInUse lsof check uses contains("kanata") string matching — unlikely false positive, but worth a comment.

Overall this is well-structured work with meaningful test expansion. Issues 3 and 5 (ADR-031 postcondition bypass and the stop-before-confirm race) are the most likely to cause zero-remapping failure modes in production and should be addressed before merging.


claude bot commented Mar 10, 2026

simple test with single quotes

…tests

The 4 split-runtime tests were calling real pgrep on CI where no
Karabiner daemon exists. Add KarabinerConflictService.testDaemonRunning
static override (matching existing testPIDProvider / testPersistentHostPID
pattern) and set it in the failing tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot commented Mar 10, 2026

PR Review: Finalize Split Runtime Cutover

This is a well-scoped architectural PR promoting the split runtime as the sole normal startup path, removing the legacy process-manager layer, and cleaning up investigation scaffolding. The last commit already addressed several issues from the previous review round. The remaining issues below are ordered by severity.

CRITICAL: ADR-031 Postcondition Violation in startKanata()

File: Sources/KeyPathAppKit/Managers/RuntimeCoordinator+ServiceManagement.swift (approx. line 102)

startKanata() calls KanataSplitRuntimeHostService.shared.startPersistentPassthruHost(includeCapture: true), which returns a PID the moment Process.run() succeeds — before the child process has finished initializing, before the TCP server is bound, and before the output bridge companion is connected. It then returns true immediately.

ADR-031 states: "Installer success requires verified runtime readiness (running + TCP responding) before returning success." The PrivilegedOperationsCoordinator install/recover paths correctly call enforceKanataRuntimePostcondition(), but the everyday startKanata() codepath skips all TCP readiness verification. Any caller that interprets the Bool return as "Kanata is fully ready" gets a false positive. Config reloads triggered immediately after start will silently fail.

Fix: After startPersistentPassthruHost returns a PID, poll TCP on the configured port with a bounded timeout (matching kanataReadinessTimeout) before returning true.
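A minimal sketch of that bounded readiness poll is below, in Rust using only std::net. The function name and parameters are hypothetical. A successful connect proves only that something is listening on the port. A stricter check matching ADR-031's "TCP responding" would also send a request over Kanata's TCP protocol and wait for a reply, which is not shown here.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Succeeds once something accepts connections on `addr`, or fails after
/// `timeout`. In KeyPath, `timeout` would be kanataReadinessTimeout.
/// Returns how long readiness took.
pub fn wait_for_tcp_ready(addr: SocketAddr, timeout: Duration, poll_interval: Duration) -> Result<Duration, String> {
    let start = Instant::now();
    let deadline = start + timeout;
    loop {
        let now = Instant::now();
        if now >= deadline {
            return Err(format!("no TCP listener on {addr} after {timeout:?}"));
        }
        // Never let a single connect attempt run past the deadline.
        let attempt = (deadline - now).min(Duration::from_millis(500));
        if TcpStream::connect_timeout(&addr, attempt).is_ok() {
            return Ok(start.elapsed());
        }
        // Sleep between attempts, but not past the deadline.
        thread::sleep(poll_interval.min(deadline.saturating_duration_since(Instant::now())));
    }
}
```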

HIGH: Dangling Dylib Handle — dlclose Before PassthruRuntimeHandle Is Used

The createPassthruRuntime() code loads the dylib with dlopen and allocates a PassthruRuntime heap object inside it. It then closes the handle via defer { dlclose(handle) } before returning the KanataHostBridgePassthruRuntimeHandle. Once dlclose drops the last reference, the image can be unloaded. The returned handle is then left holding a raw pointer into code and data that may no longer be mapped. Any subsequent call to start(), sendInput(), or destroy_passthru_runtime operates on a dangling pointer. That is a live crash risk now that this is the primary runtime path.

Fix: Keep the dylib handle open (store it in KanataHostBridgePassthruRuntimeHandle) for the lifetime of the runtime.
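The ownership fix is sketched below in Rust. `Drop` stands in for dlclose and destroy_passthru_runtime, and all types are hypothetical stand-ins that only log their teardown. The handle owns the library, so the image stays loaded as long as the runtime pointer is reachable. Rust drops fields in declaration order, which guarantees the runtime is destroyed before the library is closed.

In the Swift handle, the equivalent would be to store the dlopen handle and call dlclose in deinit, after destroy_passthru_runtime.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared teardown log so the ordering is observable.
pub type Log = Rc<RefCell<Vec<&'static str>>>;

/// Stand-in for the dlopen handle; dropping it models dlclose.
pub struct Library {
    log: Log,
}

impl Drop for Library {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dlclose");
    }
}

/// Stand-in for the PassthruRuntime allocated inside the dylib.
pub struct PassthruRuntime {
    log: Log,
}

impl Drop for PassthruRuntime {
    fn drop(&mut self) {
        self.log.borrow_mut().push("destroy_passthru_runtime");
    }
}

/// Owns both. Fields drop in declaration order, so `runtime` is destroyed
/// before `_library` is closed.
pub struct PassthruRuntimeHandle {
    runtime: PassthruRuntime,
    _library: Library,
}

impl PassthruRuntimeHandle {
    pub fn open(log: Log) -> Self {
        let library = Library { log: Rc::clone(&log) }; // models dlopen
        let runtime = PassthruRuntime { log }; // models create_passthru_runtime
        // No close here: the library lives exactly as long as the handle.
        PassthruRuntimeHandle { runtime, _library: library }
    }

    pub fn runtime(&self) -> &PassthruRuntime {
        &self.runtime
    }
}
```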

HIGH: currentRuntimeStatus() Ignores the Legacy Recovery Daemon

File: RuntimeCoordinator+ServiceManagement.swift (approx. line 180)

The function checks isPersistentPassthruHostRunning but never consults the legacy recovery daemon. If stopKanata() times out on XPC and the recovery daemon remains active, currentRuntimeStatus() returns .stopped while keys are actively being remapped. Code gating on .stopped will then launch a second runtime, causing a TCP port collision and keyboard freeze.

HIGH: Thread-Unsafe activeXPCCalls Set in HelperManager+RequestHandlers.swift

File: Sources/KeyPathAppKit/Core/HelperManager+RequestHandlers.swift (approx. line 70)

activeXPCCalls is read/written with .contains / .insert / .remove without synchronization. executeValueXPCCall spawns a child Task {} (approx. line 143) that escapes the actor, so two concurrent callers can race on the Set.

Fix: Ensure the Set is @MainActor-isolated or guard access with an OSAllocatedUnfairLock.
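Whichever Swift mechanism is chosen, the invariant is the same: the membership check and the insert happen in one critical section. The Rust sketch below shows that invariant with a Mutex; the type and method names are hypothetical. `HashSet::insert` already reports whether the name was present, so there is no separate `contains()` to race with. An RAII guard removes the name on every exit path.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical analogue of activeXPCCalls.
#[derive(Clone, Default)]
pub struct ActiveCalls {
    names: Arc<Mutex<HashSet<String>>>,
}

/// Removes the call name when dropped, including on early return or panic.
pub struct CallGuard {
    names: Arc<Mutex<HashSet<String>>>,
    name: String,
}

impl Drop for CallGuard {
    fn drop(&mut self) {
        // Recover from a poisoned lock so cleanup still happens.
        let mut names = self.names.lock().unwrap_or_else(|e| e.into_inner());
        names.remove(&self.name);
    }
}

impl ActiveCalls {
    /// Check-and-insert in one lock acquisition. Returns None if a call with
    /// this name is already in flight.
    pub fn begin(&self, name: &str) -> Option<CallGuard> {
        let inserted = self.names.lock().unwrap().insert(name.to_string());
        inserted.then(|| CallGuard { names: Arc::clone(&self.names), name: name.to_string() })
    }

    pub fn is_active(&self, name: &str) -> bool {
        self.names.lock().unwrap().contains(name)
    }
}
```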

MEDIUM: CLAUDE.md Violation — SMAppService.status Called in Tight Poll Loop

File: Sources/KeyPathAppKit/Managers/KanataDaemonManager.swift (approx. line 119)

refreshManagementState() is called inside the 500ms poll loop in verifyKanataReadinessAfterInstall(). Each call does synchronous IPC to the ServiceManagement daemon. CLAUDE.md explicitly forbids calling SMAppService.status repeatedly in hot paths ("it does synchronous IPC and can block for 10-30+ seconds under load").

Fix: Cache the status at the start of the verification loop; only re-fetch when a state transition is expected.

MEDIUM: isStartingKanata Guard Does Not Protect the Full Async Span

File: RuntimeCoordinator+ServiceManagement.swift (approx. line 53)

startKanata() checks/sets isStartingKanata but suspends at await currentSplitRuntimeDecision(). During that suspension, another caller can enter and see isStartingKanata == false. The ActionDispatcher and RuntimeCoordinator.init Task block also call startPersistentPassthruHost directly, bypassing this guard entirely.
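The Rust sketch below shows a single-flight start gate; the names are hypothetical. The flag is claimed with a single compare_exchange before the first suspension point and released by a guard on every exit path. The gate only helps if every entry point goes through it, including the ActionDispatcher and init paths mentioned above, rather than calling startPersistentPassthruHost directly.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical single-flight guard for startKanata().
pub struct StartGate {
    starting: AtomicBool,
}

/// Held for the whole start span; clears the flag when dropped.
pub struct StartTicket<'a> {
    gate: &'a StartGate,
}

impl Drop for StartTicket<'_> {
    fn drop(&mut self) {
        // Runs on success, error, and early return alike.
        self.gate.starting.store(false, Ordering::Release);
    }
}

impl StartGate {
    pub const fn new() -> Self {
        StartGate { starting: AtomicBool::new(false) }
    }

    /// Claims the flag atomically; a second caller arriving mid-start gets None.
    pub fn try_begin(&self) -> Option<StartTicket<'_>> {
        self.starting
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| StartTicket { gate: self })
    }

    /// What a status query would read to report a "starting" state.
    pub fn is_starting(&self) -> bool {
        self.starting.load(Ordering::Acquire)
    }
}
```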

MEDIUM: TOCTOU TCP Bind Race in lib.rs — Confirm Fix Applied to run_runtime Path

File: Rust/KeyPathKanataHostBridge/src/lib.rs (approx. lines 447-459)

The commit message for 0937189 says "Remove TOCTOU preflight TCP bind," but the keypath_kanata_bridge_run_runtime path appears to still bind a TcpListener and immediately drop it before TcpServer::new. Please confirm the fix was applied to all code paths, not just the passthru path.

LOW: sudoExecuteCommand Should Not Be on the Public Protocol

File: Sources/KeyPathAppKit/Core/PrivilegedOperationsCoordinator.swift (approx. line 22)

sudoExecuteCommand(_ command: String, description: String) is declared on the public PrivilegedOperationsCoordinating protocol, making arbitrary shell execution part of the public API contract. CLAUDE.md notes that executeCommand was removed from HelperProtocol for security reasons. The same principle applies here — demote to a private implementation detail.

LOW: isRunning Removal Makes UI Blind to Non-Split-Host Active Runtimes

File: Sources/KeyPathAppKit/Models/KanataUIState.swift

Removing isRunning and inferring running state from activeRuntimePathTitle != nil means the UI shows a stopped indicator for any Kanata session not originating from the split host (e.g., if the split host fails and Kanata continues via the recovery daemon during cutover).

LOW: Silent Failure in Cutover Path of performInitialization()

File: Sources/KeyPathAppKit/Managers/RuntimeCoordinator+Lifecycle.swift (approx. line 143)

If startKanata() fails during split runtime cutover, the code logs a warning and returns without setting lastError. Users have no indication their session was disrupted. Should set lastError to a user-visible string, consistent with the rest of the codebase.

LOW: Triple Crate Type Wastes Build Time

File: Rust/KeyPathKanataHostBridge/Cargo.toml

crate-type = ["staticlib", "cdylib", "rlib"] makes every cargo build emit three artifacts. rustc produces them from a single invocation, but the staticlib archive and the cdylib link are each extra work. The Swift layer only uses cdylib. staticlib is unused by the build scripts, and rlib is only needed for cargo test. Consider dropping staticlib from the default targets to trim clean-build time.

Missing Test Coverage

  1. No test for startKanata() TCP-readiness contract — RuntimeCoordinatorTests does not verify startKanata() waits for TCP before returning success.
  2. checkSplitRuntimeCompanionHealth() is untested — the 5-second polling loop with async XPC calls has no test coverage.
  3. Deleted ProcessManager.swift / ProcessCoordinator.swift (381 lines) — no test updates to confirm behavioral equivalence of the new split-runtime paths.
  4. Run cargo audit on the committed Cargo.lock — the 2013-line lockfile for kanata-state-machine's transitive deps should be checked against known CVEs before merging.

Summary

| Severity | Issue |
|---|---|
| Critical | startKanata() returns true before TCP is ready (ADR-031 violation) |
| High | Dylib closed before PassthruRuntimeHandle is used: dangling pointer crash |
| High | currentRuntimeStatus() ignores legacy daemon: false stopped state |
| High | activeXPCCalls Set race across concurrent Tasks |
| Medium | SMAppService.status in tight poll loop (CLAUDE.md violation) |
| Medium | isStartingKanata guard bypassed at async suspension points |
| Medium | TOCTOU TCP bind race: confirm fixed in run_runtime path |
| Low | sudoExecuteCommand on public protocol |
| Low | isRunning removal blinds UI to non-split-host runtimes |
| Low | Silent error in cutover: should set lastError |
| Low | Triple crate type wastes build time |

The dangling dylib handle (High) and ADR-031 TCP postcondition (Critical) are the two blocking issues before merge. The rest can be addressed in follow-up PRs.


claude bot commented Mar 10, 2026

PR Review test - please ignore


claude bot commented Mar 10, 2026

hello test2


claude bot commented Mar 10, 2026

AGENTS.md

These instructions apply to all Codex sessions in this repository.

Web Search Tooling

  • Always use web_search_request for any web lookups.
  • Never call the deprecated tools.web_search tool. Treat it as unavailable.
  • Do not pass the CLI flag --search. Rely on the feature flag or config instead.

Rationale: older CLI/tooling may still expose tools.web_search, which prints a deprecation warning. Enforcing web_search_request avoids the warning and keeps behavior consistent.

General

  • Keep diffs minimal and focused. Preserve directory layout and script entry points.
  • Update docs/tests when behavior or commands change.

Architecture & Patterns

InstallerEngine Façade

  • Use InstallerEngine for ANY system modification task:
    • Installation: InstallerEngine().run(intent: .install, using: broker)
    • Repair/Fix: InstallerEngine().run(intent: .repair, using: broker)
    • Uninstall: InstallerEngine().uninstall(...)
    • System Check: InstallerEngine().inspectSystem()
  • Do NOT use:
    • KanataManager for installation/repair (it's for runtime coordination only).
    • LaunchDaemonInstaller directly (it's an internal implementation detail).
    • WizardAutoFixer directly (superseded by Engine).

Permissions

  • Always use PermissionOracle.shared for permission checks.
  • Never call IOHIDCheckAccess directly outside of PermissionOracle.
  • Never check TCC database directly outside of PermissionOracle.
  • Exception: PermissionOracle itself uses IOHIDCheckAccess and TCC as its
    internal implementation (see ADR-001 and ADR-016). Apple APIs are
    authoritative; TCC is a fallback for .unknown results only.

Keyboard Visualization

  • Geometry follows selected PhysicalLayout (user-selected layout ID).
  • Labels follow selected LogicalKeymap (user-selected keymap).
  • Do not add a UI toggle for this; treat it as a single consistent rule.

Service Lifecycle Invariants

  • Mutating installer actions must be postcondition-verified before returning success.
    • Any action that can stop/restart/re-register Kanata must verify runtime readiness (running + TCP responding) or explicit pending-approval state before reporting success.
  • Stale SMAppService recovery bypasses generic install throttle.
    • If state is .enabled but launchd cannot load/run the daemon, recovery install/register logic must run even inside the normal throttle window.
  • Registration is not liveness.
    • Treat SMAppService.status == .enabled as registration metadata only; never infer runtime health from it without process + TCP evidence.

Testing

  • Mock Time: Do not use Thread.sleep. Use Date overrides or mock clocks.
  • Environment: Use KEYPATH_USE_INSTALLER_ENGINE=1 (default now) for tests.

Accessibility (CRITICAL)

  • ALL interactive UI elements MUST have .accessibilityIdentifier()
  • Required for: Button, Toggle, Picker, and custom interactive components
  • Enforcement: Pre-commit hook + CI check (currently warning only)
  • Verification: Run python3 Scripts/check-accessibility.py before committing
  • See: ACCESSIBILITY_COVERAGE.md for complete reference
  • Rationale: Enables automation (Peekaboo, XCUITest) and ensures testability

Available External Tools

These CLI tools are available for agents to use on-demand. Do not add them as MCPs - just call them directly when needed.

Poltergeist (Auto-Deploy)

Watches source files, auto-builds, deploys to /Applications, and restarts. Install: brew install steipete/tap/poltergeist

| Command | Purpose |
|---|---|
| poltergeist start | Start watching and auto-deploying (~2s per change) |
| poltergeist status | Check build status |
| poltergeist logs | View build output |
| poltergeist stop | Stop watching |
| poltergeist wait keypath | Block until build completes |

Workflow tip: Run poltergeist start at session start. After any Swift file edit, the app automatically runs ./Scripts/quick-deploy.sh, deploys to /Applications, and restarts. No manual steps needed.

Peekaboo (UI Automation)

macOS screenshots and GUI automation. Install: brew install steipete/tap/peekaboo

| Command | Purpose |
|---|---|
| peekaboo see "prompt" | Screenshot + AI analysis |
| peekaboo click --element-id X | Click by element ID |
| peekaboo type "text" | Enter text |
| peekaboo scroll up/down | Scroll |
| peekaboo app launch Safari | App control |
| peekaboo window maximize | Window management |
| peekaboo menu "File > Save" | Menu interaction |

KeyPath (Keyboard Control)

Trigger via URL scheme:

open "keypath://layer/vim"           # Switch layer
open "keypath://launch/Safari"       # Launch app
open "keypath://window/left"         # Snap window
open "keypath://fakekey/nav-mode/tap" # Trigger virtual key

Composing Tools

# Example: AI-driven development workflow
poltergeist start                    # Ensure auto-rebuild is running
# ... make code changes ...
poltergeist wait keypath             # Wait for build to complete
peekaboo see "Is the KeyPath app running?" # Check UI state
open "keypath://layer/vim"           # Trigger keyboard action
peekaboo type "Hello world"          # Type into focused app
peekaboo hotkey "cmd+s"              # Save

Recommended agent workflow:

  1. poltergeist start at session start (runs ./Scripts/quick-deploy.sh for incremental builds)
  2. Make code edits
  3. poltergeist wait keypath before testing
  4. Use Peekaboo for UI verification
  5. Use KeyPath URL scheme for keyboard actions

See docs/LLM_VISION_UI_AUTOMATION.md for detailed architecture.


claude bot commented Mar 10, 2026

REPLACE_BODY_PLACEHOLDER


claude bot commented Mar 10, 2026

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project will follow Semantic
Versioning once public release tags are established.

[Unreleased]

Added

  • Release-governance docs: SECURITY.md, CODE_OF_CONDUCT.md, and this
    changelog.
  • CI/release-readiness tracking issues labeled as release-blocker in Linear.
  • Mapper/overlay support for optional per-key shifted output customization:
    Shift + key can now send a separate output from tap/default output for
    global keystroke mappings.

Changed

  • Documentation and CI policy are being aligned for open source release quality
    gates.
  • CI now enforces coverage non-regression for the narrow baseline lane:
    KeyPathErrorTests + PermissionOracleTests with an initial floor of
    0.29% TOTAL line coverage.
  • Shifted-output editing is intentionally constrained to global keystroke
    mappings and is disabled for app-specific mappings, system actions, URLs, and
    advanced hold/combo/tap-dance behaviors.

[0.0.0-internal]

Notes

  • Pre-public-release baseline. Historical internal changes prior to OSS launch
    are tracked in git history and project documentation.


claude bot commented Mar 10, 2026

PR Review test complete - please ignore this comment


claude bot commented Mar 10, 2026

KeyPath Action URI System

KeyPath supports a URI-based action system that enables:

  1. Kanata integration - Trigger KeyPath actions from keyboard shortcuts via push-msg
  2. Deep linking - External tools (Raycast, LeaderKey, Alfred) can invoke KeyPath actions
  3. Extensibility - Add new action types without protocol changes

Syntax Formats

KeyPath accepts two equivalent syntaxes:

| Context | Format | Example |
|---|---|---|
| Kanata config | Shorthand (colon) | launch:obsidian |
| Deep links | Full URI | keypath://launch/Obsidian |

Shorthand Syntax (Recommended for Configs)

[action]:[target][:[subpath]][?query=params]
  • Use lowercase - resolves to Title Case in UI
  • Colons separate action, target, and subpaths
  • Query params use standard ?key=value syntax
(push-msg "launch:obsidian")           ;; → launches "Obsidian"
(push-msg "layer:nav:activate")        ;; → layer "nav", subpath "activate"
(push-msg "notify:?title=Saved")       ;; → notification with title

Full URI Syntax (For Deep Links)

keypath://[action]/[target][/subpath...][?query=params]

Used by external tools (Terminal, Raycast, Alfred):

open "keypath://launch/Obsidian"
open "keypath://notify?title=Hello&body=World"
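The Rust sketch below parses both syntaxes. `ActionUri` is a hypothetical stand-in for KeyPathActionURI, not its actual definition. The parser splits off the query string, then splits the path on '/' for keypath:// URIs and on ':' for shorthand.

It has three known gaps:

- It does not percent-decode targets.
- It does not title-case targets.
- It does not handle open:{url}, whose target can itself contain ':' and '?'.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of an action string.
#[derive(Debug, PartialEq)]
pub struct ActionUri {
    pub action: String,
    pub target: Option<String>,
    pub subpath: Vec<String>,
    pub query: HashMap<String, String>,
}

/// Accepts "action:target[:subpath...][?k=v&...]" and
/// "keypath://action/target[/subpath...][?k=v&...]".
pub fn parse_action(input: &str) -> Option<ActionUri> {
    // The query string is shared by both syntaxes, so split it off first.
    let (path, query_str) = input.split_once('?').unwrap_or((input, ""));
    let segments: Vec<&str> = match path.strip_prefix("keypath://") {
        Some(rest) => rest.split('/').filter(|s| !s.is_empty()).collect(),
        None => path.split(':').filter(|s| !s.is_empty()).collect(),
    };
    let (action, rest) = segments.split_first()?;
    let query: HashMap<String, String> = query_str
        .split('&')
        .filter(|pair| !pair.is_empty())
        .map(|pair| {
            let (k, v) = pair.split_once('=').unwrap_or((pair, ""));
            (k.to_string(), v.to_string())
        })
        .collect();
    Some(ActionUri {
        action: action.to_string(),
        target: rest.first().map(|t| t.to_string()),
        subpath: rest.iter().skip(1).map(|s| s.to_string()).collect(),
        query,
    })
}
```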

Supported Actions

launch:{app} / keypath://launch/{app}

Launch an application by name or bundle identifier.

Kanata config (shorthand):

(push-msg "launch:obsidian")
(push-msg "launch:terminal")
(push-msg "launch:visual studio code")

Deep link (full URI):

open "keypath://launch/Obsidian"
open "keypath://launch/com.apple.Terminal"

Resolution order:

  1. Bundle identifier lookup
  2. Application name in /Applications/
  3. Application name in /System/Applications/
  4. Application name in ~/Applications/

Case handling: Lowercase input (obsidian) resolves to Title Case (Obsidian) for display and lookup.
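The Rust sketch below covers the case handling and the filesystem part of the resolution order; the helper names are hypothetical. Step 1, bundle-identifier lookup, relies on macOS APIs and is not shown. Capitalizing the first letter of each word is an assumed reading of "Title Case". Names with internal capitals (for example, iTerm) would not round-trip through this transform.

```rust
use std::path::PathBuf;

/// Capitalizes the first letter of each space-separated word:
/// "visual studio code" becomes "Visual Studio Code".
pub fn title_case(name: &str) -> String {
    name.split(' ')
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Candidate bundle paths for steps 2-4, in the documented order.
pub fn candidate_app_paths(name: &str, home: &str) -> Vec<PathBuf> {
    let bundle = format!("{}.app", title_case(name));
    let dirs = [
        "/Applications".to_string(),
        "/System/Applications".to_string(),
        format!("{home}/Applications"),
    ];
    dirs.iter().map(|dir| PathBuf::from(dir).join(&bundle)).collect()
}
```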

layer:{name} / keypath://layer/{name}

Signal a layer change (for UI feedback, logging, or custom handlers).

Kanata config (shorthand):

(defalias
  nav (multi (push-msg "layer:nav") (layer-switch nav))
  vim (multi (push-msg "layer:vim") (layer-switch vim))
)

Optional subpaths:

  • :activate / /activate - Layer was activated
  • :deactivate / /deactivate - Layer was deactivated

rule:{id} / keypath://rule/{id}[/fired]

Signal that a rule was triggered (for analytics, feedback, or debugging).

Kanata config (shorthand):

(defalias
  caps-escape (multi (push-msg "rule:caps-escape:fired") esc)
)

notify: / keypath://notify

Show a system notification.

Kanata config (shorthand):

(push-msg "notify:?title=Saved&body=Document saved successfully")
(push-msg "notify:?title=Layer&body=Navigation mode&sound=Pop")

Query parameters:

| Parameter | Description | Default |
|---|---|---|
| title | Notification title | "KeyPath" |
| body | Notification body | "" |
| sound | macOS sound name | (none) |

open:{url} / keypath://open/{url}

Open a URL in the default browser.

Kanata config (shorthand):

(push-msg "open:github.com")
(push-msg "open:https://docs.keypath.app")

Notes:

  • URLs without a scheme get https:// prepended
  • URL-encoded characters are decoded automatically

fakekey:{name} / keypath://fakekey/{name}[/{action}]

Trigger a Kanata virtual key (defined via defvirtualkeys or deffakekeys).

Kanata config (shorthand):

;; Define virtual keys in your Kanata config
(defvirtualkeys
  email-sig (macro H e l l o spc W o r l d)
  toggle-mode (layer-toggle special)
)

;; Trigger from KeyPath
(push-msg "fakekey:email-sig")           ;; tap (default)
(push-msg "fakekey:toggle-mode:press")
(push-msg "fakekey:toggle-mode:release")

Deep link (for external tools):

open "keypath://fakekey/email-sig/tap"

Actions:

| Action | Description |
|---|---|
| tap | Press and immediately release (default) |
| press | Press and hold |
| release | Release a held key |
| toggle | Toggle between pressed and released |

Use cases:

  • Trigger macros from external tools (Raycast, Alfred, deep links)
  • Execute complex key sequences via simple URL
  • Remote-control Kanata layers and modes

Example: Email signature via Raycast

#!/bin/bash
# @raycast.title Insert Email Signature
# @raycast.mode silent
open "keypath://fakekey/email-sig/tap"

Kanata Configuration

Naming Conventions

Use full application names in launch aliases, not abbreviations:

| ✅ Recommended | ❌ Avoid |
|---|---|
| launch-obsidian | launch-obs |
| launch-terminal | launch-term |
| launch-safari | launch-saf |
| launch-slack | launch-slk |
| launch-visual-studio-code | launch-vscode |

Why?

  • Self-documenting: Anyone reading the config immediately knows which app launches
  • Unambiguous: launch-obs could mean Obsidian or OBS Studio
  • Consistent: Matches the launch:{full-app-name} pattern

Basic Usage

Add push-msg to any alias:

(defalias
  ;; Launch app (use full app name in alias)
  launch-obsidian (push-msg "launch:obsidian")

  ;; Notify on layer switch
  nav (multi (push-msg "layer:nav") (layer-switch nav))

  ;; Track rule usage (optional - consider if you really need it)
  caps-escape (multi (push-msg "rule:caps-escape:fired") esc)
)

Complete Example

(defcfg
  process-unmapped-keys yes
)

(defsrc caps a s d f)

(defalias
  ;; Caps Lock → Escape (no tracking needed for basic functionality)
  caps-escape esc

  ;; Quick app launchers (use full app name in alias, not abbreviations)
  launch-obsidian (push-msg "launch:obsidian")
  launch-slack (push-msg "launch:slack")

  ;; Layer with notification
  to-nav (multi
    (push-msg "layer:nav:activate")
    (push-msg "notify:?title=Nav Mode&sound=Tink")
    (layer-switch nav)
  )
)

(deflayer base
  @caps-escape @launch-obsidian @launch-slack d @to-nav
)

(deflayer nav
  @caps-escape left down up right
)

Deep Linking from External Tools

Terminal / Shell

open "keypath://launch/Obsidian"
open "keypath://notify?title=Hello&body=World"

Raycast

Create a Raycast script command:

#!/bin/bash
# @raycast.title Launch Obsidian via KeyPath
# @raycast.mode silent

open "keypath://launch/Obsidian"

Alfred

Create a workflow with "Open URL" action:

  • URL: keypath://launch/{query}

Keyboard Maestro

Use "Open URL" action with keypath:// URLs.

AppleScript

do shell script "open 'keypath://launch/Obsidian'"

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Entry Points                            │
├──────────────────────┬──────────────────────────────────────┤
│   Kanata (push-msg)  │  External (open keypath://...)       │
│         │            │              │                        │
│         ▼            │              ▼                        │
│  TCP MessagePush     │      URL Scheme Handler               │
│         │            │              │                        │
│         ▼            │              ▼                        │
│  KanataEventListener │      AppDelegate.application(_:open:)│
│         │            │              │                        │
└─────────┴────────────┴──────────────┴────────────────────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ KeyPathActionURI │  (Parses keypath:// URIs)
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ ActionDispatcher │  (Routes to handlers)
              └────────┬────────┘
                       │
     ┌─────────┬───────┼───────┬─────────┐
     ▼         ▼       ▼       ▼         ▼
 handleLaunch handleNotify handleOpen handleFakeKey
                                          │
                                          ▼
                                   ┌──────────────┐
                                   │KanataTCPClient│
                                   │ ActOnFakeKey │
                                   └──────────────┘

Custom Handlers

Subscribe to ActionDispatcher callbacks for custom handling:

// In your code
ActionDispatcher.shared.onLayerAction = { layerName in
    // Custom layer change handling
    print("Layer changed to: \(layerName)")
}

ActionDispatcher.shared.onRuleAction = { ruleId, path in
    // Custom rule tracking
    Analytics.track("rule_fired", properties: ["rule": ruleId])
}

ActionDispatcher.shared.onError = { message in
    // Show error toast/dialog
    showToast(message)
}

Error Handling

| Scenario | Behavior |
|---|---|
| Unknown action type | Logs warning, calls onError callback |
| Missing target | Logs warning, calls onError callback |
| App not found | Attempts all search paths, then calls onError |
| Invalid URL format | Logs warning, calls onError callback |

Security Considerations

  • Launch action: Only launches apps from standard macOS app directories
  • Open action: Opens URLs in user's default browser (sandboxed)
  • No file system access: Actions cannot read/write arbitrary files
  • No shell execution: cmd action is not supported (use Kanata's native cmd for that)

Virtual Keys Inspector

KeyPath includes a built-in inspector for viewing and testing virtual keys defined in your configuration.

Accessing the Inspector

  1. Open KeyPath Settings (⌘,)
  2. Go to the General tab
  3. Scroll down to the Virtual Keys section

Features

| Feature | Description |
|---|---|
| Key List | Shows all keys from defvirtualkeys and deffakekeys blocks |
| Copy URL | Copy the deep link URL to clipboard for use in Raycast, Alfred, etc. |
| Test Button | Trigger the virtual key immediately to verify it works |
| Refresh | Re-parse the config file to pick up changes |

Requirements

  • Virtual keys must be defined in your Kanata config using defvirtualkeys or deffakekeys
  • Kanata service must be running to test keys (TCP connection required)
  • The inspector is read-only; edit your config file to add/modify keys
  • Feature flag SIMULATOR_AND_VIRTUAL_KEYS_ENABLED must be enabled to access the inspector and trigger virtual keys

Troubleshooting

| Issue | Solution |
|---|---|
| "No Virtual Keys Defined" | Add defvirtualkeys or deffakekeys blocks to your config |
| "Network error" on test | Ensure Kanata service is running |
| Key not triggering | Verify the key name matches exactly (case-sensitive) |
| Changes not showing | Click Refresh or reopen Settings |

Future Extensions

Planned action types (not yet implemented):

  • keypath://clipboard/copy?text=... - Copy text to clipboard
  • keypath://sound/play?name=... - Play a sound
  • keypath://window/focus?app=... - Focus specific window
  • keypath://shortcut/run?name=... - Run macOS Shortcut


claude bot commented Mar 10, 2026

PR Review: Finalize Split Runtime Cutover - See full review at #225


claude bot commented Mar 10, 2026

PR Review: Finalize Split Runtime Cutover

This is a large, well-structured refactor. The overall direction is sound: making the split runtime the primary path and reserving legacy daemon paths for recovery only. Below is a mix of bugs, concerns, and observations across the key changed areas.

Bugs / Correctness Issues

lib.rs - TOCTOU race in keypath_kanata_bridge_run_runtime

The function binds a TcpListener to verify the port, drops it immediately, then lets TcpServer::new bind again. Another process can steal the port in that window. The probe-then-bind pattern does not protect against anything here -- it introduces the race. Either skip the probe and let TcpServer::new return an error, or hold the listener and hand it to the server.
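The Rust sketch below shows the "hold the listener" option. `TcpServer` here is a hypothetical stand-in, and binding to 127.0.0.1 is also an assumption. It has not been established whether Kanata's real TcpServer::new can accept a pre-bound listener. If it cannot, the other option applies: drop the probe and propagate TcpServer::new's own bind error.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical server built from an already-bound listener, so the port is
/// held continuously from bind() onward and cannot be stolen in between.
pub struct TcpServer {
    listener: TcpListener,
}

impl TcpServer {
    pub fn from_listener(listener: TcpListener) -> Self {
        TcpServer { listener }
    }

    pub fn port(&self) -> io::Result<u16> {
        Ok(self.listener.local_addr()?.port())
    }
}

/// Binds exactly once. If the port is taken, the bind error surfaces here:
/// the same information the probe was after, without the race window.
pub fn start_tcp_server(port: u16) -> io::Result<TcpServer> {
    let listener = TcpListener::bind(("127.0.0.1", port))?;
    Ok(TcpServer::from_listener(listener))
}
```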

lib.rs - C++ mangled symbol link

The bridge links by a hardcoded C++ name-mangled symbol (__Z9init_sinkv). It will silently break if the linked library is rebuilt with a different ABI, compiler, or optimization settings. Consider exporting the sink initializer with extern C on the C++ side and linking the stable symbol instead.

write_error - silent truncation

When an error string exceeds the buffer, the caller receives a partial message with no indication that it was cut. Consider marking truncation (for example, by ending the message with "..."), or adding a debug assertion that callers pass a sufficiently large buffer.
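A sketch of the first option, assuming write_error copies a UTF-8 message into a fixed-size, NUL-terminated buffer. The name write_error_truncating and its slice-based signature are hypothetical; an FFI wrapper would build the slice from the raw pointer and length it receives.

```rust
/// Hypothetical sketch: copy `msg` into `buf` as a NUL-terminated C string.
/// If it does not fit, keep as much as possible, end it with "...", and
/// return true so the caller knows the message was truncated.
fn write_error_truncating(buf: &mut [u8], msg: &str) -> bool {
    const MARKER: &[u8] = b"...";
    if buf.is_empty() {
        // No room even for the terminator.
        return !msg.is_empty();
    }
    let capacity = buf.len() - 1; // reserve one byte for the NUL terminator
    let bytes = msg.as_bytes();
    if bytes.len() <= capacity {
        buf[..bytes.len()].copy_from_slice(bytes);
        buf[bytes.len()] = 0;
        return false;
    }
    // Too long: leave room for the marker, then back off to a UTF-8
    // character boundary so the kept prefix is still valid text.
    let mut cut = capacity.saturating_sub(MARKER.len());
    while !msg.is_char_boundary(cut) {
        cut -= 1;
    }
    buf[..cut].copy_from_slice(&bytes[..cut]);
    // With a very small buffer, write only as much of the marker as fits.
    let marker_len = MARKER.len().min(capacity - cut);
    buf[cut..cut + marker_len].copy_from_slice(&MARKER[..marker_len]);
    buf[cut + marker_len] = 0;
    true
}
```

Returning a flag as well as writing the marker lets the caller log or assert on truncation without parsing the message.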

RuntimeCoordinator+ServiceManagement.swift - confirm isStartingKanata lifecycle

currentRuntimeStatus() returns .starting when isStartingKanata is set. Verify that startKanata() actually sets and clears this flag around the async start path. If the flag is only set from a separate call site, the .starting state may never be returned and the UI will appear stuck in .stopped during startup.

Architectural / Design Concerns

ServiceHealthChecker - redundant launchctl calls within a validation cycle

CLAUDE.md explicitly warns against calling SMAppService.status repeatedly in a hot path, and the same risk applies to launchctl print. The no-argument overload of checkKanataServiceRuntimeSnapshot() calls KanataDaemonManager.shared.refreshManagementState() and then delegates to the two-argument overload, which calls evaluateKanataLaunchctlRunningState again. If a validation cycle also calls isServiceLoaded and isServiceHealthy, management state ends up being fetched three or four times. A per-cycle cache struct would address this, consistent with the intent of ADR-026.
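A per-cycle cache could look roughly like this. It is written in Rust to match the other examples here, although the real code is Swift, and ValidationCycleCache, LaunchctlState and simulate_cycle are hypothetical names. The idea carries over to either language: create one cache per validation cycle, fetch launchctl state at most once, and drop the cache when the cycle ends.

```rust
use std::cell::Cell;

/// Hypothetical summary of what `launchctl print` reports for the service.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum LaunchctlState {
    Running,
    NotRunning,
    Unknown,
}

/// Lives for exactly one validation cycle, then is dropped.
struct ValidationCycleCache {
    launchctl_state: Option<LaunchctlState>,
}

impl ValidationCycleCache {
    fn new() -> Self {
        Self { launchctl_state: None }
    }

    /// Runs `fetch` only on the first call in this cycle; later calls reuse the result.
    fn launchctl_state(&mut self, fetch: impl FnOnce() -> LaunchctlState) -> LaunchctlState {
        *self.launchctl_state.get_or_insert_with(fetch)
    }
}

/// Simulates a cycle in which two checks ("loaded" and "healthy") both need
/// launchctl state. Returns how many times the expensive fetch actually ran.
fn simulate_cycle() -> u32 {
    let fetches = Cell::new(0u32);
    let fetch = || {
        fetches.set(fetches.get() + 1);
        LaunchctlState::Running
    };
    let mut cache = ValidationCycleCache::new();
    let loaded = cache.launchctl_state(fetch) != LaunchctlState::Unknown;
    let healthy = cache.launchctl_state(fetch) == LaunchctlState::Running;
    assert!(loaded && healthy);
    fetches.get()
}
```

Scoping the cache to a single cycle avoids serving stale state across cycles while still collapsing the repeated launchctl calls within one.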

decideKanataHealth - inputCaptureReady masks staleRegistration diagnosis

If Kanata logs an IOHIDDeviceOpen error and the registration is stale, the user sees kanata-cannot-open-built-in-keyboard when the real actionable issue is stale-enabled-registration. Consider checking staleEnabledRegistration first -- a stale registration cannot capture input anyway, so surfacing it first is more useful for recovery.
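The suggested ordering, sketched in Rust with hypothetical types. The real decideKanataHealth is Swift and weighs more inputs than the two shown here; only the precedence between these two checks is the point.

```rust
/// Hypothetical diagnosis codes mirroring the identifiers quoted in the review.
#[derive(Debug, PartialEq)]
enum HealthDiagnosis {
    Healthy,
    StaleEnabledRegistration,  // "stale-enabled-registration"
    CannotOpenBuiltInKeyboard, // "kanata-cannot-open-built-in-keyboard"
}

/// Hypothetical subset of the inputs decideKanataHealth considers.
struct HealthInputs {
    stale_enabled_registration: bool,
    input_capture_ready: bool,
}

fn decide_health(inputs: &HealthInputs) -> HealthDiagnosis {
    // A stale registration cannot capture input anyway, so report it first:
    // it is the root cause the user can act on.
    if inputs.stale_enabled_registration {
        return HealthDiagnosis::StaleEnabledRegistration;
    }
    // Only when the registration is not stale does an input-capture failure
    // (e.g. an IOHIDDeviceOpen error in the log) become the diagnosis.
    if !inputs.input_capture_ready {
        return HealthDiagnosis::CannotOpenBuiltInKeyboard;
    }
    HealthDiagnosis::Healthy
}
```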

checkKanataInputCaptureStatus - fragile string matching on stderr

The health decision depends on specific substrings in Kanata's stderr log. This breaks silently if Kanata changes its log format, and the check reads up to 64 KB on every health-check invocation. The inputCaptureStatusOverride test seam in DEBUG mode is good; consider adding a comment tying the matched strings to the Kanata version range in which they were validated.

build-kanata-host-bridge.sh - missing MACOSX_DEPLOYMENT_TARGET

build-kanata-runtime-library.sh sets MACOSX_DEPLOYMENT_TARGET=11.0 but the bridge build script does not. Without it, the Rust toolchain may emit symbols requiring newer OS versions, causing linker warnings or errors on the Swift side. Mirror the deployment target from the other script.

executeValueXPCCall - continuation asymmetry with executeXPCCall

In executeXPCCall, the proxy is fetched before entering the continuation. In executeValueXPCCall, the proxy is fetched inside a nested Task within the continuation. If the timeout fires and resumes the continuation while the proxy Task is calling completionState.tryComplete(), the guard prevents a double resume, but the proxy Task's error is silently swallowed. Prefer the executeXPCCall pattern: fetch the proxy before entering the continuation and make the XPC call synchronously inside it.

Positive Observations

  • The companion health monitor (checkSplitRuntimeCompanionHealth + startSplitRuntimeCompanionMonitor) is cleanly structured, with an isRecoveringSplitRuntimeCompanion guard against re-entrant recovery.
  • HelperXPCCallCompletionState with NSLock is correct and avoids the common Bool race.
  • KanataHealthDecision.transient for TCP warmup after restart is a well-named pattern that avoids false-positive unhealthy alerts during restart windows.
  • Deleting ProcessManager.swift and ProcessCoordinator.swift is the right move.
  • The DEBUG test seams (runtimeSnapshotOverride, recentlyRestartedOverride, inputCaptureStatusOverride) provide good coverage hooks without polluting production paths.

Minor / Polish

  • Cargo.toml declares crate-type with staticlib, cdylib, and rlib. Producing all three from the same crate increases build time; the build script only uses the static library, so the cdylib entry can be dropped unless it is needed elsewhere.
  • Confirm that the External/kanata submodule pointer bump is intentional and points to the correct upstream commit for this bridge version.
  • isOneShotProbeMode correctly skips MainAppStateController.shared.configure(with: manager) but still initializes TypingSoundsManager.shared, a singleton that may do unnecessary work in probe mode.

Summary

Main items to address before merge: the TOCTOU port race in lib.rs, the missing MACOSX_DEPLOYMENT_TARGET in the bridge build script, and the executeValueXPCCall continuation asymmetry. The decideKanataHealth ordering and redundant launchctl call concerns are lower urgency but worth tracking to avoid reintroducing the slow-path issues the ADRs documented.
