✨ feat: agent-browser parity for tap browser (phase 1) #24
Merged
get text/html/value/attr/title/url/count/box/styles and is visible/enabled/checked, accepting CSS selectors or @en refs.
dblclick, focus, check/uncheck, scrollintoview, upload, drag, mouse move/down/up/wheel, keyboard type/insert, keydown/keyup.
wait now supports plain durations, --text substring, --url glob, --load load|domcontentloaded|networkidle, --fn JS polling, and --state visible|hidden|attached|detached. Also replaces the open --wait-selector 2s-sleep stub with a real visibility wait.
find role/text/label/placeholder/alt/title/testid/first/last/nth with click/fill/type/hover/focus/check/uncheck/text actions. role resolves via the accessibility tree; others via injected JS to backendNodeID.
storage local/session get/set/clear via Runtime.evaluate, and state save/load in Playwright storageState-compatible JSON (cookies with full attributes + current-origin localStorage).
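As an illustration of the storageState-compatible file mentioned above, here is a minimal Go sketch. The JSON field names follow Playwright's storageState layout as I understand it (a `cookies` array plus an `origins` array of `localStorage` entries); the Go type names and the helpers `encodeState`, `decodeState`, `saveState`, and `mismatchedOrigins` are hypothetical and are not taken from tap's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Cookie mirrors one entry of the storageState "cookies" array.
type Cookie struct {
	Name     string  `json:"name"`
	Value    string  `json:"value"`
	Domain   string  `json:"domain"`
	Path     string  `json:"path"`
	Expires  float64 `json:"expires"` // Unix seconds; -1 marks a session cookie
	HTTPOnly bool    `json:"httpOnly"`
	Secure   bool    `json:"secure"`
	SameSite string  `json:"sameSite"` // "Strict", "Lax", or "None"
}

// StorageItem is one localStorage key/value pair.
type StorageItem struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// Origin groups localStorage entries under the origin they belong to.
type Origin struct {
	Origin       string        `json:"origin"`
	LocalStorage []StorageItem `json:"localStorage"`
}

// StorageState is the top-level document.
type StorageState struct {
	Cookies []Cookie `json:"cookies"`
	Origins []Origin `json:"origins"`
}

// encodeState marshals the state, emitting [] rather than null for empty lists.
func encodeState(st StorageState) ([]byte, error) {
	if st.Cookies == nil {
		st.Cookies = []Cookie{}
	}
	if st.Origins == nil {
		st.Origins = []Origin{}
	}
	return json.MarshalIndent(st, "", "  ")
}

// decodeState parses a storageState document.
func decodeState(data []byte) (StorageState, error) {
	var st StorageState
	err := json.Unmarshal(data, &st)
	return st, err
}

// saveState writes the file with mode 0600. The mode only applies when the
// file is created and is still subject to the process umask.
func saveState(path string, st StorageState) error {
	data, err := encodeState(st)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// mismatchedOrigins lists saved origins that differ from the current one,
// which is the situation in which a loader might warn.
func mismatchedOrigins(st StorageState, current string) []string {
	var out []string
	for _, o := range st.Origins {
		if o.Origin != current {
			out = append(out, o.Origin)
		}
	}
	return out
}

func main() {
	st := StorageState{
		Cookies: []Cookie{{Name: "sid", Value: "abc", Domain: "example.com", Path: "/", Expires: -1, SameSite: "Lax"}},
		Origins: []Origin{{Origin: "https://example.com", LocalStorage: []StorageItem{{Name: "theme", Value: "dark"}}}},
	}
	dir, err := os.MkdirTemp("", "state-demo-")
	if err != nil {
		fmt.Println("tempdir:", err)
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")
	if err := saveState(path, st); err != nil {
		fmt.Println("save:", err)
		return
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	loaded, err := decodeState(data)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println("cookies:", len(loaded.Cookies))
	fmt.Println("mismatched origins:", mismatchedOrigins(loaded, "https://other.test"))
}
```

Because only the current origin's localStorage is captured, a file produced this way would normally contain a single `origins` entry.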
set viewport/device/geo/offline/headers/media/useragent/clear. CDP overrides are per-connection and tap has no daemon, so settings persist on TabRecord and re-apply in resolveTarget on every invocation.
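The persist-and-re-apply idea described above can be sketched roughly as follows. This is an illustration under assumptions, not tap's code: the `Settings` fields, the shape of `TabRecord`, the `Session` interface, and the `loadRecord` and `reapply` helpers are all hypothetical stand-ins for whatever `resolveTarget` actually does over CDP.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Viewport is a persisted size override (hypothetical shape).
type Viewport struct {
	Width  int `json:"width"`
	Height int `json:"height"`
}

// Settings holds overrides that must survive across CLI invocations.
type Settings struct {
	Viewport  *Viewport         `json:"viewport,omitempty"`
	UserAgent string            `json:"userAgent,omitempty"`
	Offline   bool              `json:"offline,omitempty"`
	Headers   map[string]string `json:"headers,omitempty"`
}

// TabRecord is a hypothetical on-disk record for one tab.
type TabRecord struct {
	TargetID string   `json:"targetId"`
	Settings Settings `json:"settings"`
}

// Session abstracts the per-connection override calls. In a real tool these
// would be CDP commands; here they are an assumed interface.
type Session interface {
	SetViewport(w, h int) error
	SetUserAgent(ua string) error
	SetOffline(offline bool) error
	SetHeaders(h map[string]string) error
}

// loadRecord decodes a persisted TabRecord.
func loadRecord(data []byte) (TabRecord, error) {
	var rec TabRecord
	err := json.Unmarshal(data, &rec)
	return rec, err
}

// reapply replays every stored override on a fresh connection, since
// per-connection overrides are lost when the previous process exits.
func reapply(s Session, st Settings) error {
	if st.Viewport != nil {
		if err := s.SetViewport(st.Viewport.Width, st.Viewport.Height); err != nil {
			return err
		}
	}
	if st.UserAgent != "" {
		if err := s.SetUserAgent(st.UserAgent); err != nil {
			return err
		}
	}
	if st.Offline {
		if err := s.SetOffline(true); err != nil {
			return err
		}
	}
	if len(st.Headers) > 0 {
		if err := s.SetHeaders(st.Headers); err != nil {
			return err
		}
	}
	return nil
}

// recordingSession is a test double that logs each call.
type recordingSession struct{ calls []string }

func (r *recordingSession) SetViewport(w, h int) error {
	r.calls = append(r.calls, fmt.Sprintf("viewport %dx%d", w, h))
	return nil
}
func (r *recordingSession) SetUserAgent(ua string) error {
	r.calls = append(r.calls, "useragent "+ua)
	return nil
}
func (r *recordingSession) SetOffline(off bool) error {
	r.calls = append(r.calls, fmt.Sprintf("offline %t", off))
	return nil
}
func (r *recordingSession) SetHeaders(h map[string]string) error {
	r.calls = append(r.calls, fmt.Sprintf("headers %d", len(h)))
	return nil
}

func main() {
	// Invocation 1: the setting is stored on the record.
	rec := TabRecord{TargetID: "T1", Settings: Settings{Viewport: &Viewport{Width: 800, Height: 600}}}
	data, err := json.Marshal(rec)
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	// Invocation 2: a new process loads the record and replays the overrides.
	loaded, err := loadRecord(data)
	if err != nil {
		fmt.Println("load:", err)
		return
	}
	sess := &recordingSession{}
	if err := reapply(sess, loaded.Settings); err != nil {
		fmt.Println("reapply:", err)
		return
	}
	fmt.Println(sess.calls)
}
```

The trade-off in this design is a few extra round trips per invocation in exchange for not needing a long-lived daemon to hold connection state.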
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ❌ Deployment failed | tap | 456086a | Jun 12 2026, 04:08 AM |
- find alt/title now reject fill/type instead of silently passing an empty value
- network intercept help documents that it blocks until interrupted
- click error message mentions @en refs (it always supported them)
- SKILL.md find nth synopsis includes the required <n> argument
- canonical ArgsUsage <selector|@en> and two standard missing-arg error strings (with/without @en support)
- fill in missing Descriptions/ArgsUsage (mouse move, cookies, snapshot, screenshot, pdf, dialog, forms, scroll, storage set/clear)
- keypress help no longer claims per-character typing; points to keyboard type
- top-level quick start shows the snapshot → @en workflow
- hover now accepts @en refs like the other element commands
Hidden 'tap docs' command walks the cli.Command tree and emits a deterministic Markdown reference. 'mise run docs' writes docs/cli.md and TestDocsDrift fails CI when the committed file is stale, so help text stays the single source of truth.
- docs/cli.md generated via 'mise run docs', linked from README and docs/browser.md
- correct the --wait-selector wording (waits for visibility, 30s default; no subprocess involved)
- clarify --wait-js applies to fetch/site, not tap browser
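The generate-then-compare approach behind 'tap docs' and TestDocsDrift can be sketched as below. This is a simplified illustration: `Command` is a hypothetical stand-in for the real cli.Command type, and sorting subcommands by name is one assumed way to make the output deterministic, not necessarily what tap does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Command is a hypothetical, trimmed-down command node.
type Command struct {
	Name        string
	Usage       string
	ArgsUsage   string
	Subcommands []*Command
}

// renderDocs walks the tree and returns a Markdown reference.
func renderDocs(root *Command) string {
	var b strings.Builder
	walk(&b, root, nil)
	return b.String()
}

func walk(b *strings.Builder, c *Command, parents []string) {
	path := append(append([]string{}, parents...), c.Name)
	level := len(path)
	if level > 6 {
		level = 6 // Markdown has at most six heading levels
	}
	fmt.Fprintf(b, "%s %s\n\n", strings.Repeat("#", level), strings.Join(path, " "))
	if c.Usage != "" {
		fmt.Fprintf(b, "%s\n\n", c.Usage)
	}
	if c.ArgsUsage != "" {
		fmt.Fprintf(b, "Arguments: %s\n\n", c.ArgsUsage)
	}
	// Sort a copy so output does not depend on registration order
	// and the caller's slice is left untouched.
	subs := append([]*Command(nil), c.Subcommands...)
	sort.Slice(subs, func(i, j int) bool { return subs[i].Name < subs[j].Name })
	for _, s := range subs {
		walk(b, s, path)
	}
}

// isStale reports whether the committed text differs from a fresh render,
// which is the condition a drift test would fail on.
func isStale(committed string, root *Command) bool {
	return committed != renderDocs(root)
}

func main() {
	root := &Command{
		Name:  "tap",
		Usage: "Example root command",
		Subcommands: []*Command{
			{Name: "wait", Usage: "Wait for a condition"},
			{Name: "find", Usage: "Locate an element", ArgsUsage: "<kind> <value> <action>"},
		},
	}
	doc := renderDocs(root)
	fmt.Print(doc)
	fmt.Println("stale:", isStale(doc, root))
}
```

A drift test built this way only needs to read the committed file and fail when `isStale` returns true, which keeps the help strings as the single source of truth.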
Follow-up commits from the post-merge audit:
Summary
Brings `tap browser` toward feature parity with agent-browser, implemented as six independent workstreams:

- `get`/`is` queries: `get text/html/value/attr/title/url/count/box/styles`, `is visible/enabled/checked`; all accept CSS selectors or `@eN` snapshot refs
- `dblclick`, `focus`, `check`/`uncheck` (idempotent, React-compatible events), `scrollintoview`, `upload`, `drag`, `mouse move/down/up/wheel`, `keyboard type/insert`, `keydown`/`keyup`
- `wait`: plain durations, `--text`, `--url` glob, `--load load|domcontentloaded|networkidle`, `--fn` JS poll, `--state visible|hidden|attached|detached`; also fixes the `open --wait-selector` 2s-sleep stub
- `find`: `role/text/label/placeholder/alt/title/testid/first/last/nth` × `click/fill/type/hover/focus/check/uncheck/text`; role resolves via the accessibility tree, others via injected JS → backendNodeID
- `storage local|session` get/set/clear; `state save|load` in Playwright storageState-compatible JSON (0600 perms)
- `set`: `viewport/device/geo/offline/headers/media/useragent/clear`. CDP overrides are per-connection and tap has no daemon, so settings persist on `TabRecord` and re-apply in `resolveTarget` on every invocation

Deliberately deferred (phase 2+)

Streaming/dashboard, chat, React DevTools/vitals, HAR/trace/video recording, auth vault, clipboard, iframe frame switching, batch execution; these need a persistent daemon or are separate product surfaces.

Verification

- `mise run lint`: 0 issues; `mise run test`: all packages pass
- `find label "Email" fill` → `find role button click --name Submit` → `wait --text` → storage set/get → `state save` → `set viewport 800 600` confirmed persisting across separate CLI invocations (`window.innerWidth` = 800 in a fresh process)

Known limitations

- `wait --state hidden` and `detached` both map to `chromedp.WaitNotPresent` (CDP has no built-in hidden-but-present waiter)
- `state save` captures localStorage for the current tab's origin only (no daemon to enumerate origins); `load` warns on origin mismatch
- `networkidle` uses a resource-count-stability heuristic (500ms quiet window)
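To make the resource-count-stability heuristic concrete, here is a rough Go sketch. It assumes the heuristic polls some resource counter and treats the page as idle once the count has not changed for a full quiet window. The `waitStable` and `simulate` functions, the polling interval, and the fake counter are all hypothetical, and the demo uses a 100ms window rather than 500ms to keep it fast.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitStable polls count every interval and returns nil once the value has
// stayed unchanged for at least quiet. It returns ctx.Err() if the context
// ends first.
func waitStable(ctx context.Context, count func() (int, error), quiet, interval time.Duration) error {
	last, err := count()
	if err != nil {
		return err
	}
	stableSince := time.Now()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if time.Since(stableSince) >= quiet {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
		n, err := count()
		if err != nil {
			return err
		}
		if n != last {
			// New activity: restart the quiet window.
			last = n
			stableSince = time.Now()
		}
	}
}

// simulate runs waitStable against a fake counter that grows on each of the
// first `changes` polls and then stays flat. It returns how many polls ran.
func simulate(changes int) (int, error) {
	polls := 0
	counter := func() (int, error) {
		polls++
		if polls <= changes {
			return polls, nil
		}
		return changes, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()
	err := waitStable(ctx, counter, 100*time.Millisecond, 5*time.Millisecond)
	return polls, err
}

func main() {
	polls, err := simulate(3)
	fmt.Println("settling page: polls =", polls, "err =", err)

	_, err = simulate(1 << 30) // the counter never settles, so the wait times out
	fmt.Println("busy page: err =", err)
}
```

A heuristic like this can report idle too early on pages with long gaps between requests, so the quiet window is a trade-off between latency and false positives.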