Two tools for giving AI agents eyes. Automated web screenshots with Vision-optimised tiling, plus desktop screen capture.
Blog post: Why these tools exist and how I use them
# Screenshot a web page (tiled for AI Vision models)
python screenshot.py https://example.com --full-page
# Capture your desktop screen
python grab.py

| Tool | What it does |
|---|---|
| screenshot.py | Playwright-based web screenshots. Full-page captures are automatically tiled into 1072x1072 chunks optimised for Claude Vision and other multimodal models. |
| grab.py | Desktop screen capture using Python's mss library. 14 region presets (halves, thirds, quadrants) for targeting specific parts of your display. |
git clone https://github.com/rodbland2021/agent-screenshot.git
cd agent-screenshot
pip install -r requirements.txt
playwright install chromium

Verify it works:
python screenshot.py https://example.com
# Should print a file path like /tmp/screenshots/example-com_1234567890.jpg

Takes automated screenshots of any URL using headless Chromium.
# Basic screenshot
python screenshot.py https://example.com
# Mobile viewport (375x812)
python screenshot.py https://example.com --mobile
# Full page, tiled for AI Vision models
python screenshot.py https://example.com --full-page
# Dismiss cookie banners and popups
python screenshot.py https://example.com --dismiss-popups --wait-until load --wait 3000
# Custom auth header
python screenshot.py https://my-app.example.com --header "Authorization=Bearer mytoken"
# Screenshot a specific element
python screenshot.py https://example.com --selector ".hero-section"

Claude Vision and similar multimodal models have a maximum effective resolution. A 1072x15000 pixel full-page screenshot is too large to process in detail: text blurs, UI elements become unreadable. Tiling the page into 1072x1072 chunks lets the model read every label, button, and table cell.
The tool handles this automatically with --full-page. A long page produces 4-8 tiles, each saved as a separate file.
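The tiling arithmetic can be sketched in a few lines. This is an illustration only, not the code in screenshot.py: the function name `tile_boxes` is hypothetical, and letting the last tile be shorter than 1072 pixels (rather than padding it) is an assumption.

```python
def tile_boxes(page_width, page_height, tile=1072, max_height=15000):
    """Return (left, top, right, bottom) crop boxes covering the page top to bottom.

    Hypothetical helper: screenshot.py may split pages differently.
    """
    height = min(page_height, max_height)  # mirrors the --max-height truncation
    width = min(page_width, tile)
    boxes = []
    top = 0
    while top < height:
        bottom = min(top + tile, height)  # assumption: last tile may be shorter
        boxes.append((0, top, width, bottom))
        top = bottom
    return boxes


boxes = tile_boxes(1072, 5000)
print(len(boxes))  # 5
print(boxes[-1])   # (0, 4288, 1072, 5000)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the same boxes could be applied to a single tall capture.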
| Flag | Default | Description |
|---|---|---|
| --full-page | off | Capture entire page, tile into 1072x1072 chunks |
| --mobile | off | Mobile viewport (375x812) |
| --dismiss-popups | off | Auto-close cookie banners, geo redirects, email popups |
| --selector | none | CSS selector to screenshot a specific element |
| --wait N | 0 | Extra wait in ms after page load (for SPAs) |
| --wait-until | networkidle | load, domcontentloaded, or networkidle |
| --width | 1072 | Viewport width |
| --height | 1072 | Viewport height |
| --quality | 85 | JPEG quality (1-100) |
| --png | off | Output PNG instead of JPEG |
| --out | /tmp/screenshots | Output directory |
| --max-height | 15000 | Truncate pages taller than this |
| --header | none | Custom HTTP header as KEY=VALUE (repeatable) |
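If you drive the tool from your own Python script rather than a shell, a thin wrapper around the flags above might look like the sketch below. The helper names `build_command` and `capture` are hypothetical, and reading one output path per line of stdout is an assumption based on the verification step, which only says the tool prints a file path.

```python
import subprocess
import sys


def build_command(url, script="screenshot.py", full_page=False, mobile=False,
                  dismiss_popups=False, selector=None, wait_ms=0,
                  out_dir=None, headers=None):
    """Translate keyword options into a screenshot.py argument list."""
    cmd = [sys.executable, script, url]
    if full_page:
        cmd.append("--full-page")
    if mobile:
        cmd.append("--mobile")
    if dismiss_popups:
        cmd.append("--dismiss-popups")
    if selector:
        cmd += ["--selector", selector]
    if wait_ms:
        cmd += ["--wait", str(wait_ms)]
    if out_dir:
        cmd += ["--out", out_dir]
    for key, value in (headers or {}).items():
        cmd += ["--header", f"{key}={value}"]  # --header is repeatable
    return cmd


def capture(url, **options):
    """Run screenshot.py and return the non-empty lines it printed.

    Assumption: the tool prints the saved file path(s) to stdout.
    """
    result = subprocess.run(build_command(url, **options),
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


print(build_command("https://example.com", full_page=True, wait_ms=3000))
```

Passing the arguments as a list avoids shell quoting problems with selectors and header values that contain spaces.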
Captures the physical desktop screen. Useful for sharing visual context that isn't a web page — design mockups, spreadsheets, error dialogs, anything on your monitor.
Requires a display. grab.py needs X11 or Wayland (Linux), a desktop session (macOS/Windows), or a virtual framebuffer (Xvfb). It will not work on headless servers, Docker containers, or CI runners without a display server.
# Full screen
python grab.py
# Left half of screen
python grab.py left
# Top-right quadrant
python grab.py top-right
# Custom output path
python grab.py --out /tmp/my-capture.jpg

Region presets: full, left, right, top, bottom, top-left, top-right, bottom-left, bottom-right, left-third, center-third, right-third, left-two-thirds, right-two-thirds
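One plausible way to read the preset names is as fractions of the screen. The sketch below is an assumption about the geometry, not the table grab.py actually uses; `PRESETS` and `region_box` are hypothetical names.

```python
# Each preset as fractions of the screen: (x, y, width, height).
# Assumed geometry inferred from the preset names, not copied from grab.py.
PRESETS = {
    "full": (0, 0, 1, 1),
    "left": (0, 0, 1 / 2, 1),
    "right": (1 / 2, 0, 1 / 2, 1),
    "top": (0, 0, 1, 1 / 2),
    "bottom": (0, 1 / 2, 1, 1 / 2),
    "top-left": (0, 0, 1 / 2, 1 / 2),
    "top-right": (1 / 2, 0, 1 / 2, 1 / 2),
    "bottom-left": (0, 1 / 2, 1 / 2, 1 / 2),
    "bottom-right": (1 / 2, 1 / 2, 1 / 2, 1 / 2),
    "left-third": (0, 0, 1 / 3, 1),
    "center-third": (1 / 3, 0, 1 / 3, 1),
    "right-third": (2 / 3, 0, 1 / 3, 1),
    "left-two-thirds": (0, 0, 2 / 3, 1),
    "right-two-thirds": (1 / 3, 0, 2 / 3, 1),
}


def region_box(name, screen_width, screen_height):
    """Convert a preset name into a pixel box for a screen of the given size."""
    fx, fy, fw, fh = PRESETS[name]
    return {
        "left": round(fx * screen_width),
        "top": round(fy * screen_height),
        "width": round(fw * screen_width),
        "height": round(fh * screen_height),
    }


print(region_box("top-right", 1920, 1080))
# {'left': 960, 'top': 0, 'width': 960, 'height': 540}
```

The returned dictionary uses the left/top/width/height keys that the mss library accepts as a capture region, so it could be passed to an mss `grab` call after adding the chosen monitor's own offset.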
| Flag | Default | Description |
|---|---|---|
| --out | /tmp/screen.jpg | Output file path |
| --quality | 85 | JPEG quality (1-100) |
| --monitor | 1 | Monitor number for multi-display setups |
The MCP server gives AI agents direct tool access — no shell commands needed. The agent calls screenshot or grab as native tools.
Add to ~/.claude.json:
{
  "mcpServers": {
    "agent-screenshot": {
      "command": "python3",
      "args": ["/path/to/agent-screenshot/mcp_server.py"]
    }
  }
}

Restart Claude Code. The agent now has screenshot and grab tools available natively.
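If the config file already lists other MCP servers, hand-editing it risks clobbering them. A small merge script is one way around that; the sketch below assumes the file is plain JSON with a top-level `mcpServers` object as shown above, and `add_mcp_server` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, server_script, name="agent-screenshot"):
    """Add or update one entry under "mcpServers", keeping everything else."""
    path = Path(config_path).expanduser()
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "python3", "args": [str(server_script)]}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (edit the script path first; it is left elided here on purpose):
# add_mcp_server("~/.claude.json", "/path/to/agent-screenshot/mcp_server.py")
```

Back up the config file before running anything that rewrites it.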
Add to your MCP config (e.g. ~/.mcporter/mcporter.json):
{
  "mcpServers": {
    "agent-screenshot": {
      "command": "python3",
      "args": ["/path/to/agent-screenshot/mcp_server.py"]
    }
  }
}

| Tool | Parameters | Description |
|---|---|---|
| screenshot | url, full_page, mobile, dismiss_popups, selector, wait_ms, quality, headers | Screenshot a URL, optionally tiled for Vision |
| grab | region, output_path, quality, monitor | Capture desktop screen |
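For orientation, this is roughly what a client sends when an agent invokes one of these tools. MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`; the argument names come from the table above, but their types (a boolean for full_page, for example) are assumptions, and `tool_call` is a hypothetical helper.

```python
import json


def tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request for an MCP tools/call invocation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed argument types: url as a string, full_page as a boolean.
request = tool_call("screenshot", {"url": "https://example.com", "full_page": True})
print(json.dumps(request, indent=2))
```

In practice the MCP client builds this message for you; the sketch only shows how the tool name and parameters map onto the request.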
The MCP server requires the mcp package in addition to the base requirements:
pip install mcp

For agents that don't support MCP (or as a fallback), add to your agent's instructions:
Claude Code (CLAUDE.md):
After any visual/UI change, verify with a screenshot before reporting done:
python /path/to/screenshot.py <url> --full-page
Read the resulting screenshots and check for visual regressions.

Cursor (.cursorrules), Aider (.aider.conf.yml), OpenClaw (agent system prompt): same instruction, adapted to your config format.
- Python 3.10+ on Linux, macOS, or Windows (including WSL)
- screenshot.py: Playwright + Chromium. Works on any platform including headless servers, Docker, and CI.
- grab.py: mss + Pillow. Requires a physical or virtual display (X11, Wayland, macOS desktop, or Windows).
- mcp_server.py (optional): mcp package. For native tool integration with Claude Code and OpenClaw.
MIT