fix: improve editable-element detection and increase safe operation timeout #85
eduardofcr wants to merge 1 commit into
Thanks for the report. I investigated the `semanticAction` searchbox case on current
The proposed editable-role fallback changes the wrong layer:
The timeout change also removes the documented/tested headroom between the 25s upstream operation clamp and the 35s wrapper watchdog.
Closing this PR as stale/incorrect for the current code. If there is still a reproducible `semanticAction` fill miss on current
[FIX] Improve editable-element detection and increase safe operation timeout
Summary
Two small fixes that significantly improve the reliability of `semanticAction` `fill` with role locators and prevent premature timeouts in `qa` mode on real-world sites.

Changes

1. `extensions/agent-browser/lib/results/editable-ref-evidence.ts`

Problem: `getEditableRefEvidence()` relies solely on upstream snapshot metadata flags (`editable`, `contentEditable`, etc.) and snapshot text patterns to determine whether an element is editable. The upstream `agent-browser` does not always include these flags for standard `<input>` elements (e.g., `<input type="search" aria-label="Search">`), causing `editableEvidence` to return `undefined`. This blocks `semanticAction` `fill` with `locator: "role"` from resolving to a direct `fill @eN` command, so the call falls back to the less reliable "Rich input recovery" (keyboard typing) path.

Fix: Add a fallback that checks the element's ARIA role against a set of roles that are inherently editable in the browser accessibility tree: `searchbox`, `textbox`, `combobox`, and `spinbutton`. When the role matches and no prior evidence contradicts editability, `editableEvidence` returns `true`.

2. `extensions/agent-browser/lib/process.ts`

Problem: `SAFE_AGENT_BROWSER_OPERATION_TIMEOUT_MS` is set to 25,000 ms. This constant is used to clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` in the child process environment, capping the upstream `wait --load` and `wait --fn` default timeout. On slower sites or pages with many assets, the `qa` mode's `wait --load domcontentloaded` step can time out, causing the entire batch to fail. The upstream default (`DEFAULT_AGENT_BROWSER_PROCESS_TIMEOUT_MS = 35_000`) is already 35s, so clamping to 25s is unnecessarily restrictive.

Fix: Increase `SAFE_AGENT_BROWSER_OPERATION_TIMEOUT_MS` from 25,000 to 35,000, aligning it with the default process timeout.

Testing
Fix 1: `editable-ref-evidence.ts`

Before the fix, this call would fail with "Rich input recovery", suggesting keyboard typing instead of filling directly:

    { "semanticAction": { "action": "fill", "locator": "role", "role": "searchbox", "name": "Search", "text": "riverpod" } }

After the fix, it successfully resolves to `fill @e3 "riverpod"` and fills the searchbox on https://pub.dev.

Fix 2: `process.ts`

Before the fix, this `qa` call would time out with `failureCategory: "timeout"` at step 5 (`wait --load domcontentloaded`):

    { "qa": { "url": "https://pub.dev/packages/riverpod", "expectedText": "riverpod 3.3.2", "screenshotPath": "/tmp/riverpod-qa.png" }, "timeoutMs": 15000 }

After the fix (or with the increased default), the same call passes, with the screenshot verified on disk, when given an adequate timeout.
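For readers who want the role fallback from change 1 in concrete terms, here is a minimal sketch. It is not the code in `editable-ref-evidence.ts`: the `SnapshotRef` shape, the `EDITABLE_ROLES` set name, and the function name `editableEvidenceWithRoleFallback` are assumptions made for illustration. Only the four role names and the "explicit evidence wins" rule come from the description above.

```typescript
// Hypothetical shape of one snapshot ref; the real upstream metadata may differ.
interface SnapshotRef {
  role?: string;
  editable?: boolean;
  contentEditable?: boolean;
}

// Roles the PR description treats as inherently editable.
const EDITABLE_ROLES: ReadonlySet<string> = new Set([
  "searchbox",
  "textbox",
  "combobox",
  "spinbutton",
]);

// Returns true or false when there is evidence, undefined when there is none.
function editableEvidenceWithRoleFallback(ref: SnapshotRef): boolean | undefined {
  // Collect only the flags that upstream actually supplied.
  const flags = [ref.editable, ref.contentEditable].filter(
    (flag): flag is boolean => flag !== undefined,
  );

  // Explicit metadata always wins over the role fallback.
  if (flags.some((flag) => flag)) return true;
  if (flags.length > 0) return false; // only explicit "not editable" flags present

  // Fallback: no flags at all, so decide from the ARIA role.
  if (ref.role !== undefined && EDITABLE_ROLES.has(ref.role.toLowerCase())) {
    return true;
  }
  return undefined;
}
```

With this shape, a ref that carries only `role: "searchbox"` yields `true`, while a ref explicitly marked `editable: false` stays `false` regardless of its role.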
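The timeout change in change 2 is easiest to reason about as arithmetic on two limits: the ceiling applied to the upstream operation timeout and the outer limit on the whole process. The sketch below is illustrative only. The helper names `clampOperationTimeout` and `headroomMs` are hypothetical, and it assumes the 35s value acts as the outer limit, as the maintainer comment on this PR describes.

```typescript
// Values from the PR discussion; the variable names here are illustrative.
const currentCeilingMs = 25_000; // current SAFE_AGENT_BROWSER_OPERATION_TIMEOUT_MS
const proposedCeilingMs = 35_000; // value proposed in this PR
const outerLimitMs = 35_000; // assumed outer (process/watchdog) limit

// Clamp a requested upstream timeout to the safe operation ceiling.
function clampOperationTimeout(requestedMs: number, ceilingMs: number): number {
  const sanitized = Math.max(0, Math.floor(requestedMs));
  return Math.min(sanitized, ceilingMs);
}

// Time left between the operation ceiling and the outer limit.
function headroomMs(ceilingMs: number, limitMs: number): number {
  return limitMs - ceilingMs;
}

const headroomBefore = headroomMs(currentCeilingMs, outerLimitMs);
const headroomAfter = headroomMs(proposedCeilingMs, outerLimitMs);
console.log({ headroomBefore, headroomAfter });
```

Under these assumptions the current values leave 10,000 ms between the two limits, and the proposed value leaves none, which is the trade-off to weigh against the longer `wait --load` budget.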
Related
Closes the reliability gap between `semanticAction` mode (which has pre-execution ref resolution) and `job`/`qa` modes (which depend on upstream `find` commands or adequate timeouts).

Checklist: