agent-screenshot

Two tools for giving AI agents eyes. Automated web screenshots with Vision-optimised tiling, plus desktop screen capture.

Blog post: Why these tools exist and how I use them

# Screenshot a web page (tiled for AI Vision models)
python screenshot.py https://example.com --full-page

# Capture your desktop screen
python grab.py

What's in the box

Tool	What it does
`screenshot.py`	Playwright-based web screenshots. Full-page captures are automatically tiled into 1072x1072 chunks optimised for Claude Vision and other multimodal models.
`grab.py`	Desktop screen capture using Python's `mss` library. 14 region presets (halves, thirds, quadrants) for targeting specific parts of your display.

Install

git clone https://github.com/rodbland2021/agent-screenshot.git
cd agent-screenshot
pip install -r requirements.txt
playwright install chromium

Verify it works:

python screenshot.py https://example.com
# Should print a file path like /tmp/screenshots/example-com_1234567890.jpg

Screenshot tool

Takes automated screenshots of any URL using headless Chromium.

# Basic screenshot
python screenshot.py https://example.com

# Mobile viewport (375x812)
python screenshot.py https://example.com --mobile

# Full page, tiled for AI Vision models
python screenshot.py https://example.com --full-page

# Dismiss cookie banners and popups
python screenshot.py https://example.com --dismiss-popups --wait-until load --wait 3000

# Custom auth header
python screenshot.py https://my-app.example.com --header "Authorization=Bearer mytoken"

# Screenshot a specific element
python screenshot.py https://example.com --selector ".hero-section"

Why 1072x1072 tiles?

Claude Vision and similar multimodal models have a maximum effective resolution. A 1072x15000 pixel full-page screenshot is too large to process in detail — text blurs, UI elements become unreadable. Tiling the page into 1072x1072 chunks lets the model read every label, button, and table cell.

The tool handles this automatically with --full-page. A long page produces 4-8 tiles, each saved as a separate file.

Options

Flag	Default	Description
`--full-page`	off	Capture entire page, tile into 1072x1072 chunks
`--mobile`	off	Mobile viewport (375x812)
`--dismiss-popups`	off	Auto-close cookie banners, geo redirects, email popups
`--selector`	—	CSS selector to screenshot a specific element
`--wait N`	0	Extra wait in ms after page load (for SPAs)
`--wait-until`	networkidle	`load`, `domcontentloaded`, or `networkidle`
`--width`	1072	Viewport width
`--height`	1072	Viewport height
`--quality`	85	JPEG quality (1-100)
`--png`	off	Output PNG instead of JPEG
`--out`	/tmp/screenshots	Output directory
`--max-height`	15000	Truncate pages taller than this
`--header`	—	Custom HTTP header as `KEY=VALUE` (repeatable)

Grab tool

Captures the physical desktop screen. Useful for sharing visual context that isn't a web page — design mockups, spreadsheets, error dialogs, anything on your monitor.

Requires a display. grab.py needs X11 or Wayland (Linux), a desktop session (macOS/Windows), or a virtual framebuffer (Xvfb). It will not work on headless servers, Docker containers, or CI runners without a display server.

# Full screen
python grab.py

# Left half of screen
python grab.py left

# Top-right quadrant
python grab.py top-right

# Custom output path
python grab.py --out /tmp/my-capture.jpg

Regions

full, left, right, top, bottom, top-left, top-right, bottom-left, bottom-right, left-third, center-third, right-third, left-two-thirds, right-two-thirds

Options

Flag	Default	Description
`--out`	/tmp/screen.jpg	Output file path
`--quality`	85	JPEG quality (1-100)
`--monitor`	1	Monitor number for multi-display setups

MCP server (native agent integration)

The MCP server gives AI agents direct tool access — no shell commands needed. The agent calls screenshot or grab as native tools.

Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "agent-screenshot": {
      "command": "python3",
      "args": ["/path/to/agent-screenshot/mcp_server.py"]
    }
  }
}

Restart Claude Code. The agent now has screenshot and grab tools available natively.

OpenClaw / mcporter

Add to your MCP config (e.g. ~/.mcporter/mcporter.json):

{
  "mcpServers": {
    "agent-screenshot": {
      "command": "python3",
      "args": ["/path/to/agent-screenshot/mcp_server.py"]
    }
  }
}

MCP tool reference

Tool	Parameters	Description
`screenshot`	`url`, `full_page`, `mobile`, `dismiss_popups`, `selector`, `wait_ms`, `quality`, `headers`	Screenshot a URL, optionally tiled for Vision
`grab`	`region`, `output_path`, `quality`, `monitor`	Capture desktop screen

Additional dependency

The MCP server requires the mcp package in addition to the base requirements:

pip install mcp

Add to your agent's instructions

For agents that don't support MCP (or as a fallback), add to your agent's instructions:

Claude Code (CLAUDE.md):

After any visual/UI change, verify with a screenshot before reporting done:
python /path/to/screenshot.py <url> --full-page
Read the resulting screenshots and check for visual regressions.

Cursor (.cursorrules), Aider (.aider.conf.yml), OpenClaw (agent system prompt) — same instruction, adapted to your config format.

Requirements

Python 3.10+ on Linux, macOS, or Windows (including WSL)
screenshot.py: Playwright + Chromium. Works on any platform including headless servers, Docker, and CI.
grab.py: mss + Pillow. Requires a physical or virtual display (X11, Wayland, macOS desktop, or Windows).
mcp_server.py (optional): mcp package. For native tool integration with Claude Code and OpenClaw.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.claude		.claude
.github/workflows		.github/workflows
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
VERSION		VERSION
grab.py		grab.py
mcp_server.py		mcp_server.py
requirements.txt		requirements.txt
screenshot.py		screenshot.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

agent-screenshot

What's in the box

Install

Screenshot tool

Why 1072x1072 tiles?

Options

Grab tool

Regions

Options

MCP server (native agent integration)

Claude Code

OpenClaw / mcporter

MCP tool reference

Additional dependency

Add to your agent's instructions

Requirements

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

agent-screenshot

What's in the box

Install

Screenshot tool

Why 1072x1072 tiles?

Options

Grab tool

Regions

Options

MCP server (native agent integration)

Claude Code

OpenClaw / mcporter

MCP tool reference

Additional dependency

Add to your agent's instructions

Requirements

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 1

Languages

Packages