Jido Browser

Browser automation for Jido AI agents.

Overview

Jido.Browser is organized around three simple lanes, plus an optional pooling layer:

  • web_fetch/2 for stateless HTTP-first retrieval
  • fetch_rich/2 for agent-friendly retrieval with optional browser fallback
  • start_session/1 and end_session/1 for browser-backed workflows
  • Jido.Browser.Pool plus start_session(pool: ...) as an optional acceleration layer

agent-browser remains the default adapter and supports warm pools. The Web adapter also supports warm pools when you want browser-backed sessions with lower cold-start overhead. Vibium remains available but has no warm-pool support. Lightpanda is available as an optional, limited adapter for lightweight DOM and JavaScript automation, with warm-pool support for prestarted CDP sessions.

The Hex package and OTP app remain jido_browser, while the public Elixir namespace is Jido.Browser.*.

Installation

Add the dependency:

def deps do
  [
    {:jido_browser, "~> 2.0"}
  ]
end

Install the default browser backend:

mix jido_browser.install

That installs the pinned agent-browser binary for the current platform and runs agent-browser install to provision the browser runtime.

Recommended Alias Setup

defp aliases do
  [
    setup: ["deps.get", "jido_browser.install --if-missing"],
    test: ["jido_browser.install --if-missing", "test"]
  ]
end

Installing Specific Backends

mix jido_browser.install agent_browser
mix jido_browser.install vibium
mix jido_browser.install web
mix jido_browser.install lightpanda

Lightpanda support uses optional dependencies. Add them to applications that select Jido.Browser.Adapters.Lightpanda:

def deps do
  [
    {:jido_browser, "~> 2.0"},
    {:light_cdp, "~> 0.2.1"},
    {:lightpanda_ex, "~> 0.1.0"}
  ]
end
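
Selecting the Lightpanda adapter might then look like the fragment below. This is a sketch that assumes the adapter is chosen through the same :adapter key shown in the Configuration section; the file path is only the usual Mix convention.

```elixir
# config/config.exs
import Config

# Assumption: Lightpanda is selected with the same :adapter key
# that the default AgentBrowser configuration uses.
config :jido_browser,
  adapter: Jido.Browser.Adapters.Lightpanda
```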

Quick Start

{:ok, session} = Jido.Browser.start_session()

{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
{:ok, session, snapshot} = Jido.Browser.snapshot(session)

snapshot["snapshot"] || snapshot[:snapshot]

{:ok, session, _} = Jido.Browser.click(session, "@e1")
{:ok, _session, %{content: markdown}} = Jido.Browser.extract_content(session, format: :markdown)

:ok = Jido.Browser.end_session(session)

Selectors remain supported, but ref-based interaction is the preferred 2.0 flow:

  1. snapshot
  2. act on @eN refs
  3. re-snapshot
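
The three steps above can be sketched as one helper. RefFlowSketch and its pick_ref callback are hypothetical names, not part of jido_browser; the sketch assumes snapshot/1 and click/2 return the tuples shown in the Quick Start. The browser module is passed as an argument only so that a stand-in can be substituted in tests.

```elixir
defmodule RefFlowSketch do
  # Hypothetical helper, not part of jido_browser.
  # `browser` defaults to Jido.Browser; any module with the same
  # snapshot/1 and click/2 return shapes can be substituted.

  # pick_ref receives the snapshot text and returns a ref such as "@e1".
  def step(session, pick_ref, browser \\ Jido.Browser) when is_function(pick_ref, 1) do
    with {:ok, session, snap} <- browser.snapshot(session),
         ref = pick_ref.(text(snap)),
         {:ok, session, _result} <- browser.click(session, ref),
         {:ok, session, fresh} <- browser.snapshot(session) do
      # Return the fresh snapshot so the caller never acts on stale refs.
      {:ok, session, text(fresh)}
    end
  end

  # The Quick Start reads the snapshot under either a string or an atom key.
  defp text(snapshot), do: snapshot["snapshot"] || snapshot[:snapshot]
end
```

Re-snapshotting after every action matters because an action can change the page, so refs from an earlier snapshot may no longer point at the same elements.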

Stateless Web Fetch

{:ok, result} =
  Jido.Browser.web_fetch(
    "https://example.com/docs",
    format: :markdown,
    allowed_domains: ["example.com"],
    focus_terms: ["API", "authentication"],
    citations: true
  )

result.content
result.passages
result.metadata # present when extraction returns document metadata

web_fetch/2 keeps HTML handling native for selector extraction and markdown conversion, and uses extractous_ex for fetched binary documents such as PDFs, Word, Excel, PowerPoint, OpenDocument, EPUB, and common email formats. Binary document responses may also include result.metadata when extraction returns document metadata.

Req is the default HTTP backend. jido_browser also includes a vendored BrowseyHttp-backed backend when you want a browser-imitating HTTP path for pages that do not require JavaScript execution. Select it globally or per request:

config :jido_browser, :web_fetch,
  backend: Jido.Browser.WebFetch.Backends.Browsey,
  browsey: [
    browser: :chrome,
    timeout: 30_000
  ]

{:ok, result} =
  Jido.Browser.web_fetch(
    "https://example.com/docs",
    format: :markdown,
    backend: :browsey,
    browsey: [browser: :safari]
  )

BrowseyHttp still does not execute JavaScript. Sites that require a rendered browser should use a browser session instead. Egress also matters: datacenter IP ranges, CI traffic, or too many requests from one IP can still trigger challenges even with browser-like HTTP fingerprints. web_fetch/2 passes backend-specific :req and :browsey keyword options from config and runtime opts so applications can supply transport settings without coupling jido_browser to a proxy provider.
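
As an illustration of supplying transport settings per call, the sketch below adds an HTTP proxy under the :req key. FetchOptsSketch and the proxy host are hypothetical; the tuple follows Req's :connect_options proxy format, and the sketch assumes jido_browser hands the :req keyword list to Req unchanged.

```elixir
defmodule FetchOptsSketch do
  # Hypothetical helper, not part of jido_browser.
  # Adds a proxy under req: [connect_options: [...]] while keeping
  # any :req options that are already present.
  def with_http_proxy(opts, host, port) when is_binary(host) and is_integer(port) do
    proxy = {:http, host, port, []}

    Keyword.update(opts, :req, [connect_options: [proxy: proxy]], fn req ->
      Keyword.update(req, :connect_options, [proxy: proxy], &Keyword.put(&1, :proxy, proxy))
    end)
  end
end

# "proxy.internal.example" is a placeholder host.
opts = FetchOptsSketch.with_http_proxy([format: :markdown], "proxy.internal.example", 8080)

# With jido_browser installed, the options would be passed straight through:
# {:ok, result} = Jido.Browser.web_fetch("https://example.com/docs", opts)
```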

BrowseyHttp is vendored from s3cur3/browsey_http under its MIT license because it is not currently published on Hex. The vendored copy keeps jido_browser Hex-publishable; if BrowseyHttp is released on Hex, this project should replace the vendored copy with the upstream Hex dependency.

Agent-Friendly Rich Fetch

Use fetch_rich/2 when an agent needs one retrieval tool that starts with cheap HTTP/document extraction and can fall back to a browser only when explicitly allowed:

{:ok, result} =
  Jido.Browser.fetch_rich(
    "https://example.com/protected-docs",
    http_backends: [:req, :browsey],
    browser_fallback: true,
    pool: :default,
    citations: true
  )

result.retrieval_path # :web_fetch, :browsey, or :browser
result.blocked?
result.content

fetch_rich/2 returns the same core result shape as web_fetch/2 and adds retrieval_path, fallback_reason, and blocked?. web_fetch/2 remains stateless and never uses pools.
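
A caller might branch on those extra fields as in the sketch below. RichFetchTriage is a hypothetical module, and the sketch assumes that blocked? is a boolean and that retrieval_path takes the values listed in the comment above; the source does not spell out the exact semantics of fallback_reason.

```elixir
defmodule RichFetchTriage do
  # Hypothetical helper, not part of jido_browser.
  # Works on any map or struct exposing the fields named above.

  # Assumption: blocked? is true when the retrieval was challenged or denied.
  def triage(%{blocked?: true} = result) do
    {:blocked, Map.get(result, :fallback_reason)}
  end

  # Content that came from a rendered browser session.
  def triage(%{retrieval_path: :browser, content: content}) do
    {:ok, :rendered, content}
  end

  # Content that came from one of the HTTP paths.
  def triage(%{retrieval_path: path, content: content}) do
    {:ok, {:http, path}, content}
  end
end
```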

State Persistence

state_path = Path.expand("tmp/browser-state.json")
File.mkdir_p!(Path.dirname(state_path))

{:ok, session} = Jido.Browser.start_session()
{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
{:ok, session, _} = Jido.Browser.save_state(session, state_path)
:ok = Jido.Browser.end_session(session)

{:ok, restored} = Jido.Browser.start_session()
{:ok, restored, _} = Jido.Browser.load_state(restored, state_path)
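
On a first run the state file does not exist yet. A minimal sketch of a guard around the restore step follows; StateRestoreSketch is a hypothetical helper, and it assumes load_state/2 returns a non-matching value (for example an error tuple) on failure.

```elixir
defmodule StateRestoreSketch do
  # Hypothetical helper, not part of jido_browser.
  # `browser` defaults to Jido.Browser and can be replaced in tests.
  def start_with_state(state_path, browser \\ Jido.Browser) do
    with {:ok, session} <- browser.start_session() do
      if File.exists?(state_path) do
        case browser.load_state(session, state_path) do
          {:ok, session, _result} ->
            {:ok, session}

          error ->
            # Do not leak the browser session when the restore fails.
            browser.end_session(session)
            error
        end
      else
        # Nothing saved yet: continue with a fresh session.
        {:ok, session}
      end
    end
  end
end
```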

Tab Workflow

{:ok, session} = Jido.Browser.start_session()
{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
{:ok, session, _} = Jido.Browser.new_tab(session, "https://example.org")
{:ok, session, tabs} = Jido.Browser.list_tabs(session)
{:ok, session, _} = Jido.Browser.switch_tab(session, 1)
{:ok, session, _} = Jido.Browser.close_tab(session, 1)

Warm Session Pools

Warm pools are explicit and optional. They speed up browser-backed workflows, while web_fetch/2 stays stateless and never uses pools.

For OTP applications, prefer adding a named pool to your supervision tree:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      {Jido.Browser.Pool,
       name: :default,
       size: 2,
       headless: true,
       startup_timeout: 60_000}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end

Then check out pooled sessions by name:

{:ok, session} =
  Jido.Browser.start_session(
    pool: :default,
    checkout_timeout: 5_000
  )

{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
:ok = Jido.Browser.end_session(session)

Use start_pool/1 for scripts, tests, or ad hoc startup:

{:ok, _pool} =
  Jido.Browser.start_pool(
    name: :default,
    size: 2,
    headless: true
  )

{:ok, session} =
  Jido.Browser.start_session(
    pool: :default,
    checkout_timeout: 5_000
  )

{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
:ok = Jido.Browser.end_session(session)
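
Because a checked-out worker is only released by end_session/1, it can help to wrap the checkout so the release also happens when the work raises. PooledSessionSketch is a hypothetical helper; it assumes the session value returned by start_session/1 remains a valid argument to end_session/1 after later calls have returned updated sessions.

```elixir
defmodule PooledSessionSketch do
  # Hypothetical helper, not part of jido_browser.
  # Runs `fun` with a checked-out session and always ends the session,
  # even when `fun` raises.
  def with_session(opts, fun, browser \\ Jido.Browser) when is_function(fun, 1) do
    with {:ok, session} <- browser.start_session(opts) do
      try do
        fun.(session)
      after
        # Assumption: the original session handle is still accepted here.
        browser.end_session(session)
      end
    end
  end
end

# Usage with jido_browser installed and a pool named :default running:
# PooledSessionSketch.with_session([pool: :default, checkout_timeout: 5_000], fn session ->
#   Jido.Browser.navigate(session, "https://example.com")
# end)
```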

Warm pools are currently supported by Jido.Browser.Adapters.AgentBrowser, Jido.Browser.Adapters.Lightpanda, and Jido.Browser.Adapters.Web.

  • AgentBrowser pools keep full warm daemon-backed sessions ready for checkout.
  • Lightpanda pools keep prestarted Lightpanda/CDP sessions ready for checkout.
  • Web pools keep reserved warmed profiles ready for checkout.
  • lifecycle: :ephemeral is the default: end_session/1 recycles the checked-out worker and warms a replacement in the background.
  • lifecycle: :persistent returns healthy workers to the pool after normal end_session/1; owner crashes, failed health checks, max_uses, and max_age_ms still recycle workers.

Inspect a pool with:

{:ok, status} = Jido.Browser.pool_status(:default)

status.ready
status.leased
status.lifecycle

For the Web adapter, pooled sessions are still browser sessions, not HTTP fetches. Use web_fetch/2 when you want the simplest request/response API without browser state.

Persistent pools can preserve browser profile continuity, cookies, storage, and session history for application-managed workflows. They do not guarantee access through bot filters; egress, traffic rate, target-site policy, and user-provided state remain application concerns.

Plugin Setup

defmodule MyBrowsingAgent do
  use Jido.Agent,
    name: "browser_agent",
    plugins: [
      {Jido.Browser.Plugin,
       [
         adapter: Jido.Browser.Adapters.AgentBrowser,
         pool: :default,
         checkout_timeout: 5_000,
         headless: true,
         timeout: 30_000
       ]}
    ]
end

Configuration

config :jido_browser,
  adapter: Jido.Browser.Adapters.AgentBrowser

config :jido_browser, :agent_browser,
  binary_path: "/usr/local/bin/agent-browser",
  headed: false

Other adapters can still be configured explicitly:

config :jido_browser, :vibium,
  binary_path: "/path/to/vibium"

config :jido_browser, :web,
  binary_path: "/usr/local/bin/web",
  profile: "default"

config :jido_browser, :lightpanda,
  binary_path: "/usr/local/bin/lightpanda",
  disable_telemetry: true

Optional web fetch settings:

config :jido_browser, :web_fetch,
  backend: Jido.Browser.WebFetch.Backends.Req,
  cache_ttl_ms: 300_000,
  req: [
    connect_options: [
      timeout: 10_000
    ]
  ],
  extractous: [
    pdf: [extract_annotation_text: true],
    office: [include_headers_and_footers: true]
  ]

Configured req, browsey, and extractous options are merged with any per-call options passed to Jido.Browser.web_fetch/2.

Backends

AgentBrowser (Default)

  • native snapshot support with refs
  • supervised daemon per session
  • optional warm session pools with explicit checkout
  • direct JSON IPC from Elixir
  • built-in state save/load and tab management support

Lightpanda (Limited)

  • optional adapter backed by light_cdp
  • supports session lifecycle, navigation, click, type, PNG screenshots, content extraction, and JavaScript evaluation
  • supports warm pools for prestarted Lightpanda/CDP sessions
  • uses lightpanda_ex for pinned Lightpanda binary installation
  • disables Lightpanda telemetry by default with LIGHTPANDA_DISABLE_TELEMETRY=true
  • does not provide AgentBrowser-native refs, state persistence, tab management, or console capture

Vibium (Legacy)

  • retained for transitional compatibility
  • feature-frozen in 2.0

Web (Legacy)

  • retained for transitional compatibility
  • feature-frozen in 2.0

Public API

Core operations:

  • start_pool/1
  • stop_pool/1
  • start_session/1
  • end_session/1
  • navigate/3
  • click/3
  • type/4
  • screenshot/2
  • extract_content/2
  • web_fetch/2
  • evaluate/3

Agent-browser-native operations:

  • snapshot/2
  • wait_for_selector/3
  • wait_for_navigation/2
  • query/3
  • get_text/3
  • get_attribute/4
  • is_visible/3
  • save_state/3
  • load_state/3
  • list_tabs/2
  • new_tab/3
  • switch_tab/3
  • close_tab/3
  • console/2
  • errors/2

Available Actions

Session

  • StartSession
  • EndSession
  • GetStatus
  • SaveState
  • LoadState

Navigation

  • Navigate
  • Back
  • Forward
  • Reload
  • GetUrl
  • GetTitle

Interaction

  • Click
  • Type
  • Hover
  • Focus
  • Scroll
  • SelectOption

Waiting and Queries

  • Wait
  • WaitForSelector
  • WaitForNavigation
  • Query
  • GetText
  • GetAttribute
  • IsVisible

Content and Diagnostics

  • Snapshot
  • Screenshot
  • ExtractContent
  • Console
  • Errors

Tabs

  • ListTabs
  • NewTab
  • SwitchTab
  • CloseTab

Advanced and Composite

  • Evaluate
  • ReadPage
  • SnapshotUrl
  • SearchWeb
  • WebFetch

Using With Jido Agents

defmodule MyBrowsingAgent do
  use Jido.Agent,
    name: "web_browser",
    description: "An agent that can browse the web",
    plugins: [{Jido.Browser.Plugin, [headless: true]}]
end

Jido.Browser.Plugin now exposes 38 browser actions, including snapshot/refs workflows, browser state actions, diagnostics, tab management, and stateless web fetch.

License

Apache-2.0 - See LICENSE for details.