From f508bfaaf8a96f765c4c2bf1c3d5a22880a43a2f Mon Sep 17 00:00:00 2001 From: Ulzii Otgonbaatar Date: Mon, 23 Feb 2026 15:01:58 -0700 Subject: [PATCH 1/2] Add plan for humanizing all computer interaction endpoints Covers click, type, press key, scroll, and drag mouse with performance-first algorithms (zero additional xdotool process spawns). Includes the existing Bezier curve mouse movement as reference. Co-authored-by: Cursor --- plans/humanize-computer-endpoints.md | 210 +++++++++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 plans/humanize-computer-endpoints.md diff --git a/plans/humanize-computer-endpoints.md b/plans/humanize-computer-endpoints.md new file mode 100644 index 00000000..2eb8962c --- /dev/null +++ b/plans/humanize-computer-endpoints.md @@ -0,0 +1,210 @@ +# Humanize All Computer Interaction Endpoints + +> Add human-like behavior to all computer interaction API endpoints using fast, pre-computed algorithms that add zero additional xdotool process spawns. + +## Performance-First Design Principle + +**The bottleneck is xdotool process spawns** (fork+exec per call), not Go-side computation. Every algorithm below is designed around two rules: + +1. **One xdotool call per API request** -- pre-compute all timing in Go and bake it into a single chained xdotool command with inline `sleep` directives. This is the same pattern already used by `doDragMouse` (see lines 911-951 of `computer.go`). +2. **O(1) or O(n) math only** -- uniform random (`rand.Intn`), simple easing polynomials (2-3 multiplies), no lookup tables, no transcendental functions beyond what `mousetrajectory.go` already uses. + +```mermaid +flowchart LR + Go["Go: pre-compute timing array O(n)"] --> Args["Build xdotool arg slice"] + Args --> OneExec["Single fork+exec"] + OneExec --> Done["Done"] +``` + + + +### Existing proof this works + +`doDragMouse` already chains `mousemove_relative dx dy sleep 0.050 mousemove_relative dx dy sleep 0.050 ...` in a single xdotool invocation. Every strategy below follows this exact pattern. + +--- + +## 0. Move Mouse -- Bezier Curve Trajectory (Already Implemented) + +**Status:** Complete. This is the reference implementation that all other endpoints follow. + +**Cost:** N xdotool calls (one `mousemove_relative` per trajectory point) with Go-side sleeps. Typically 5-80 steps depending on distance. + +**Algorithm:** Bezier curve with randomized control points, distortion, and easing. Ported from Camoufox/HumanCursor. + +- **Bezier curve**: 2 random internal knots within an 80px-padded bounding box around start/end. Bernstein polynomial evaluation produces smooth curved path. O(n) computation. +- **Distortion**: 50% chance per interior point to apply Gaussian jitter (mean=1, stdev=1 via Box-Muller transform). Adds micro-imperfections. +- **Easing**: `easeOutQuad(t) = -t*(t-2)` -- cursor decelerates as it approaches the target, matching natural human behavior. +- **Point count**: Auto-computed from path length (`pathLength^0.25 * 20`), clamped to [5, 80]. Override via `Options.MaxPoints`. +- **Per-step timing**: ~10ms default step delay with +/-2ms uniform jitter. When `duration_ms` is specified, delay is computed as `duration_ms / numSteps`. +- **Screen clamping**: Trajectory points clamped to screen bounds to prevent X11 delta accumulation errors. + +**Key files:** + +- `[server/lib/mousetrajectory/mousetrajectory.go](kernel-images/server/lib/mousetrajectory/mousetrajectory.go)` -- Bezier curve generation (~230 lines) +- `[server/cmd/api/api/computer.go](kernel-images/server/cmd/api/api/computer.go)` lines 104-206 -- `doMoveMouseSmooth` integration + +**API (existing):** `MoveMouseRequest` has `smooth: boolean` (default `true`) and optional `duration_ms` (50-5000ms). + +**Implementation in `doMoveMouseSmooth`:** + +1. Get current mouse position via `xdotool getmouselocation` +2. Generate Bezier trajectory: `mousetrajectory.NewHumanizeMouseTrajectoryWithOptions(fromX, fromY, toX, toY, opts)` +3. Clamp points to screen bounds +4. For each point: `xdotool mousemove_relative -- dx dy`, then `sleepWithContext` with jittered delay +5. Modifier keys held via `keydown`/`keyup` wrapper + +**Note:** This endpoint uses per-step Go-side sleeps (not xdotool inline `sleep`) because the trajectory includes screen-clamping logic that adjusts deltas at runtime. The other endpoints below use inline `sleep` since their timing can be fully pre-computed. + +--- + +## Shared Library: `server/lib/humanize/humanize.go` + +Tiny utility package (no external deps, no data structures) providing: + +```go +// UniformJitter returns a random duration in [base-jitter, base+jitter], clamped to min. +func UniformJitter(rng *rand.Rand, baseMs, jitterMs, minMs int) time.Duration + +// EaseOutQuad computes t*(2-t) for t in [0,1]. Two multiplies. +func EaseOutQuad(t float64) float64 + +// SmoothStepDelay maps position i/n through a smoothstep curve to produce +// a delay in [fastMs, slowMs]. Used for scroll and drag easing. +// smoothstep(t) = 3t^2 - 2t^3. Three multiplies. +func SmoothStepDelay(i, n, slowMs, fastMs int) time.Duration + +// FormatSleepArg formats a duration as a string suitable for xdotool's +// inline sleep command (e.g. "0.085"). Avoids fmt.Sprintf per call. +func FormatSleepArg(d time.Duration) string +``` + +All functions are pure, allocate nothing, and cost a few arithmetic ops each. Tested with table-driven tests and deterministic seeds. + +--- + +## 1. Click Mouse -- Single-Call Down/Sleep/Up + +**Cost:** 1 xdotool call (same as current). Pre-computation: 1-2 `rand.Intn` calls. + +**Algorithm:** Replace `click` with `mousedown sleep mouseup ` in the same xdotool arg slice. No separate process spawns. + +- **Dwell time**: `UniformJitter(rng, 90, 30, 50)` -> range [60, 120]ms. This matches measured human click dwell without needing lognormal sampling. +- **Micro-drift**: Append `mousemove_relative ` between mousedown and mouseup, where dx/dy are `rand.Intn(3)-1` (range [-1, 1] pixels). Trivially cheap. +- **Multi-click**: For `num_clicks > 1`, loop and insert inter-click gaps via `UniformJitter(rng, 100, 30, 60)` -> [70, 130]ms. + +**Single xdotool call example:** + +``` +xdotool mousemove 500 300 mousedown 1 sleep 0.085 mousemove_relative -- 1 0 mouseup 1 +``` + +**API change:** Add `smooth: boolean` (default `true`) to `ClickMouseRequest`. + +--- + +## 2. Type Text -- Chunked Type with Inter-Word Pauses + +**Cost:** 1 xdotool call (same as current). Pre-computation: O(words) random samples. + +**Algorithm:** Instead of per-character keysym mapping (which is complex and fragile for Unicode), split text by whitespace/punctuation into chunks and chain `xdotool type --delay "chunk" sleep ` commands. + +- **Intra-word delay**: Per-chunk, pick `rand.Intn(70) + 50` -> [50, 120]ms. Varies per chunk to simulate burst-pause rhythm. +- **Inter-word pause**: Between chunks, insert `sleep` with `UniformJitter(rng, 140, 60, 60)` -> [80, 200]ms. Longer pauses at sentence boundaries (after `.!?`): multiply by 1.5x. +- **No bigram tables**: The per-word delay variation is sufficient for convincing humanization. Bigram-level precision adds complexity with diminishing returns for bot detection evasion. + +**Single xdotool call example:** + +``` +xdotool type --delay 80 -- "Hello" sleep 0.150 type --delay 65 -- " world" sleep 0.300 type --delay 95 -- ". How" sleep 0.120 type --delay 70 -- " are" sleep 0.140 type --delay 85 -- " you?" +``` + +**API change:** Add `smooth: boolean` (default `false`) to `TypeTextRequest`. When `smooth=true`, the existing `delay` field is ignored. + +**Why this is fast:** We never leave the `xdotool type` mechanism (which handles Unicode, XKB keymaps, etc. internally). We just break it into chunks with sleeps between them. One fork+exec total. + +--- + +## 3. Press Key -- Dwell via Inline Sleep + +**Cost:** 1 xdotool call (same as current). Pre-computation: 1 `rand.Intn` call. + +**Algorithm:** Replace `key ` with `keydown sleep keyup `. + +- **Tap dwell**: `UniformJitter(rng, 95, 30, 50)` -> [65, 125]ms. +- **Modifier stagger**: When `hold_keys` are present, insert a small `sleep 0.025` between each `keydown` for modifiers, then the primary key sequence. Release in reverse order with the same stagger. This costs zero extra xdotool calls -- it's all in the same arg slice. + +**Single xdotool call example (Ctrl+C):** + +``` +xdotool keydown ctrl sleep 0.030 keydown c sleep 0.095 keyup c sleep 0.025 keyup ctrl +``` + +**API change:** Add `smooth: boolean` (default `false`) to `PressKeyRequest`. + +--- + +## 4. Scroll -- Eased Tick Intervals in One Call + +**Cost:** 1 xdotool call (same as current). Pre-computation: O(ticks) easing function evaluations (3 multiplies each). + +**Algorithm:** Replace `click --repeat N --delay 0 ` with N individual `click ` commands separated by pre-computed `sleep` values following a **smoothstep easing curve**. + +- **Easing**: `SmoothStepDelay(i, N, slowMs=80, fastMs=15)` for each tick i. The smoothstep `3t^2 - 2t^3` creates natural momentum: slow start, fast middle, slow end. +- **Jitter**: Add `rand.Intn(10) - 5` ms to each delay. Trivially cheap. +- **Small scrolls (1-3 ticks)**: Skip easing, use uniform delay of `rand.Intn(40) + 30` ms. + +**Single xdotool call example (5 ticks down):** + +``` +xdotool mousemove 500 300 click 5 sleep 0.075 click 5 sleep 0.035 click 5 sleep 0.018 click 5 sleep 0.040 click 5 +``` + +**API change:** Add `smooth: boolean` (default `false`) to `ScrollRequest`. + +**Why not per-tick Go-side sleeps?** That would require N separate xdotool calls (N fork+execs). Inline `sleep` achieves the same timing in one process. + +--- + +## 5. Drag Mouse -- Bezier Path + Eased Delays + +**Cost:** Same as current (1-3 xdotool calls for the 3 phases). Pre-computation: Bezier generation (already proven fast in `mousetrajectory.go`). + +**Algorithm:** When `smooth=true`, auto-generate the drag path using the existing `mousetrajectory.HumanizeMouseTrajectory` Bezier library, then apply eased step delays (instead of the current fixed `step_delay_ms`). + +- **Path generation**: `mousetrajectory.NewHumanizeMouseTrajectoryWithOptions(startX, startY, endX, endY, opts)` -- already O(n) with Bernstein polynomial evaluation. Proven fast. +- **Eased step delays**: Replace the fixed `stepDelaySeconds` in the Phase 2 xdotool chain with per-step delays from `SmoothStepDelay`. Slow at start (pickup) and end (placement), fast in middle. These are already baked into the single xdotool arg slice, so zero extra process spawns. +- **Jitter**: Same `rand.Intn(5) - 2` ms pattern already used by `doMoveMouseSmooth`. + +**API change:** Add `smooth: boolean` (default `false`) to `DragMouseRequest`. When `smooth=true` and `path` has exactly 2 points (start + end), the server generates a Bezier curve between them and replaces `path` with the generated waypoints. + +**No new `start`/`end` fields needed** -- the caller simply provides `path: [[startX, startY], [endX, endY]]` and the server expands it. + +--- + +## Computational Cost Summary + + +| Endpoint | xdotool calls | Pre-computation | Algorithm | +| ------------- | ---------------- | --------------------------------- | ------------------------------------ | +| `move_mouse` | O(points) (done) | O(points) Bezier + Box-Muller | Bezier curve + easeOutQuad + jitter | +| `click_mouse` | 1 (same) | 1-2x `rand.Intn` | Uniform random dwell | +| `type_text` | 1 (same) | O(words) `rand.Intn` | Chunked type + inter-word sleep | +| `press_key` | 1 (same) | 1x `rand.Intn` | Inline keydown/sleep/keyup | +| `scroll` | 1 (same) | O(ticks) smoothstep (3 muls each) | Eased inter-tick sleep | +| `drag_mouse` | 1-3 (same) | O(points) Bezier (existing) | Bezier path + smoothstep step delays | + + +No additional process spawns. No heap allocations beyond the existing xdotool arg slice. No lookup tables. Every random sample is a single `rand.Intn` or `rand.Float64` call. + +--- + +## Files to Create/Modify + +- **Modify:** `[server/openapi.yaml](kernel-images/server/openapi.yaml)` -- Add `smooth` boolean to 5 request schemas +- **Modify:** `[server/cmd/api/api/computer.go](kernel-images/server/cmd/api/api/computer.go)` -- Add humanized code paths (branching on `smooth` flag) +- **Create:** `server/lib/humanize/humanize.go` -- Shared primitives (~50 lines) +- **Create:** `server/lib/humanize/humanize_test.go` -- Table-driven tests +- **Regenerate:** OpenAPI-generated types (run code generation after schema changes) + +No separate per-endpoint library packages needed. The shared `humanize` package plus the existing `mousetrajectory` package cover everything. \ No newline at end of file From 627054d49098fa49c10f380af55ba18cb7827e56 Mon Sep 17 00:00:00 2001 From: Ulzii Otgonbaatar Date: Tue, 24 Mar 2026 13:55:55 -0600 Subject: [PATCH 2/2] Address review feedback on humanize plan - Type text: use O(words) separate xdotool calls with Go-side sleeps instead of single-call chaining, since xdotool type consumes rest of argv and can't be chained with sleep. - Type text: keep trailing delimiters (space, punctuation) with the preceding chunk so pauses happen after word boundaries. - Scroll: use bounded total duration (default 200ms) instead of fixed per-tick slowMs/fastMs, so large tick counts don't block input. Made-with: Cursor --- plans/humanize-computer-endpoints.md | 34 ++++++++++++++++++---------- 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/plans/humanize-computer-endpoints.md b/plans/humanize-computer-endpoints.md index 2eb8962c..c0f1177b 100644 --- a/plans/humanize-computer-endpoints.md +++ b/plans/humanize-computer-endpoints.md @@ -105,23 +105,32 @@ xdotool mousemove 500 300 mousedown 1 sleep 0.085 mousemove_relative -- 1 0 mous ## 2. Type Text -- Chunked Type with Inter-Word Pauses -**Cost:** 1 xdotool call (same as current). Pre-computation: O(words) random samples. +**Cost:** O(words) xdotool calls with Go-side sleeps. Pre-computation: O(words) random samples. -**Algorithm:** Instead of per-character keysym mapping (which is complex and fragile for Unicode), split text by whitespace/punctuation into chunks and chain `xdotool type --delay "chunk" sleep ` commands. +**Algorithm:** Split text into word chunks (keeping trailing whitespace/punctuation with each chunk), then issue one `xdotool type --delay -- ""` call per chunk with Go-side `sleepWithContext` pauses between them. This follows the same pattern as `doMoveMouseSmooth` (O(n) calls with Go-side sleeps). +- **Chunking**: Split at word boundaries, keeping trailing delimiters (spaces, punctuation) attached to the preceding chunk. `"Hello world. How are you?"` becomes `["Hello ", "world. ", "How ", "are ", "you?"]`. This ensures pauses happen *after* typing the delimiter, matching natural rhythm. - **Intra-word delay**: Per-chunk, pick `rand.Intn(70) + 50` -> [50, 120]ms. Varies per chunk to simulate burst-pause rhythm. -- **Inter-word pause**: Between chunks, insert `sleep` with `UniformJitter(rng, 140, 60, 60)` -> [80, 200]ms. Longer pauses at sentence boundaries (after `.!?`): multiply by 1.5x. +- **Inter-word pause**: Between chunks, Go-side sleep with `UniformJitter(rng, 140, 60, 60)` -> [80, 200]ms. Longer pauses at sentence boundaries (chunk ends with `.!?`): multiply by 1.5x. - **No bigram tables**: The per-word delay variation is sufficient for convincing humanization. Bigram-level precision adds complexity with diminishing returns for bot detection evasion. -**Single xdotool call example:** +**Execution sequence example (`"Hello world. How are you?"`):** ``` -xdotool type --delay 80 -- "Hello" sleep 0.150 type --delay 65 -- " world" sleep 0.300 type --delay 95 -- ". How" sleep 0.120 type --delay 70 -- " are" sleep 0.140 type --delay 85 -- " you?" +xdotool type --delay 80 -- "Hello " # fork+exec 1 + [Go sleep 150ms] +xdotool type --delay 65 -- "world. " # fork+exec 2 + [Go sleep 300ms] # sentence boundary: 1.5x pause +xdotool type --delay 95 -- "How " # fork+exec 3 + [Go sleep 120ms] +xdotool type --delay 70 -- "are " # fork+exec 4 + [Go sleep 140ms] +xdotool type --delay 85 -- "you?" # fork+exec 5 ``` **API change:** Add `smooth: boolean` (default `false`) to `TypeTextRequest`. When `smooth=true`, the existing `delay` field is ignored. -**Why this is fast:** We never leave the `xdotool type` mechanism (which handles Unicode, XKB keymaps, etc. internally). We just break it into chunks with sleeps between them. One fork+exec total. +**Why O(words) calls, not 1?** `xdotool type` consumes the rest of argv as text to type, so it cannot be chained with `sleep` or other commands in a single invocation. O(words) fork+execs (typically 5-15 for a sentence) is acceptable -- the inter-word pauses (80-300ms) dwarf the ~1-2ms fork+exec overhead. --- @@ -150,14 +159,15 @@ xdotool keydown ctrl sleep 0.030 keydown c sleep 0.095 keyup c sleep 0.025 keyup **Algorithm:** Replace `click --repeat N --delay 0 ` with N individual `click ` commands separated by pre-computed `sleep` values following a **smoothstep easing curve**. -- **Easing**: `SmoothStepDelay(i, N, slowMs=80, fastMs=15)` for each tick i. The smoothstep `3t^2 - 2t^3` creates natural momentum: slow start, fast middle, slow end. -- **Jitter**: Add `rand.Intn(10) - 5` ms to each delay. Trivially cheap. -- **Small scrolls (1-3 ticks)**: Skip easing, use uniform delay of `rand.Intn(40) + 30` ms. +- **Bounded total duration**: Target a fixed total scroll time regardless of tick count. Default `totalMs = 200` (capped so large scrolls don't block input). Per-tick delay = `totalMs / N`, then shaped by the easing curve. +- **Easing**: `SmoothStepDelay(i, N, slowMs, fastMs)` where `slowMs` and `fastMs` are derived from `totalMs / N`. The smoothstep `3t^2 - 2t^3` creates natural momentum: slow start, fast middle, slow end. Edge delays are ~2x the center delay. +- **Jitter**: Add `rand.Intn(6) - 3` ms to each delay. Trivially cheap. +- **Small scrolls (1-3 ticks)**: Skip easing, use uniform delay of `rand.Intn(20) + 10` ms. -**Single xdotool call example (5 ticks down):** +**Single xdotool call example (5 ticks down, totalMs=200, avg 40ms/tick):** ``` -xdotool mousemove 500 300 click 5 sleep 0.075 click 5 sleep 0.035 click 5 sleep 0.018 click 5 sleep 0.040 click 5 +xdotool mousemove 500 300 click 5 sleep 0.055 click 5 sleep 0.030 click 5 sleep 0.025 click 5 sleep 0.032 click 5 ``` **API change:** Add `smooth: boolean` (default `false`) to `ScrollRequest`. @@ -189,7 +199,7 @@ xdotool mousemove 500 300 click 5 sleep 0.075 click 5 sleep 0.035 click 5 sleep | ------------- | ---------------- | --------------------------------- | ------------------------------------ | | `move_mouse` | O(points) (done) | O(points) Bezier + Box-Muller | Bezier curve + easeOutQuad + jitter | | `click_mouse` | 1 (same) | 1-2x `rand.Intn` | Uniform random dwell | -| `type_text` | 1 (same) | O(words) `rand.Intn` | Chunked type + inter-word sleep | +| `type_text` | O(words) | O(words) `rand.Intn` | Per-word type + Go-side sleep | | `press_key` | 1 (same) | 1x `rand.Intn` | Inline keydown/sleep/keyup | | `scroll` | 1 (same) | O(ticks) smoothstep (3 muls each) | Eased inter-tick sleep | | `drag_mouse` | 1-3 (same) | O(points) Bezier (existing) | Bezier path + smoothstep step delays |