catchapage is an automated page capture toolkit that crawls a curated list of URLs, renders each page in multiple device profiles, and saves both the rendered HTML and a full-page screenshot for every variation. It is built with Bun and Playwright to provide fast, scriptable captures ideal for visual reviews, archival workflows, and regression tracking.
- Multiple device personas (desktop, tablet, mobile) with customisable viewports, locales, and user agents.
- Resilient navigation that retries with fallback strategies and waits for meaningful content before capturing.
- Full artifact bundle per URL: HTML snapshot, PNG screenshot, and synchronized console logs.
- Isolated run folders with timestamped names for easy comparison between capture sessions.
- Single configuration module (
src/config.ts) to adjust timeouts, DNS rules, and device descriptors. - Extensible object-oriented design:
PageCaptureRunner,DeviceContextFactory, andFileLoggerexpose clear extension points for custom behaviour. - Parallel execution.
- Visual regression baselines for QA teams that need multi-device screenshots on demand.
- Competitive research or content archiving where HTML copies are retained alongside imagery.
- Monitoring marketing or landing pages to ensure deployments render consistently over time.
- Internal documentation that benefits from reproducible, high-fidelity page captures without manual browser work.
index.ts: Entry point that wires logging, loads URLs fromlinks.txt, and orchestrates the capture run.src/pageCaptureRunner.ts: Core runner that prepares run directories, drives Playwright, and captures each device variant.src/deviceContextFactory.ts: Builds browser context options for desktop, tablet, and mobile personas, falling back when descriptors are missing.src/loadUrlsFromFile.ts: Normalises and validates URLs sourced fromlinks.txt.src/fileLogger.ts: Streams console output tooutput/<run>/console.logonce the run folder is ready.src/captureOutcome.ts: Lightweight value object representing success or failure of a capture.src/joinPath.ts: Normalises path segments for consistent file system output.links.txt: Input list of URLs (one per line) that will be captured.output/: Generated artifacts grouped by timestamped run folders.
- Orchestrator —
index.tsbootstraps the session, configures logging hooks, and delegates to the runner. - Runner —
PageCaptureRunnercoordinates folder preparation, navigation retries, and per-device capture. - Device providers —
DeviceContextFactorydelivers browser context options and handles descriptor fallbacks gracefully. - I/O utilities —
loadUrlsFromFile,joinPath, andFileLoggerisolate file system, path, and logging responsibilities. - Outcome channel —
CaptureOutcomeencapsulates success or failure metadata for downstream processing.
The design follows OOP principles: every class has a single responsibility, collaboration happens through explicit interfaces, and cross-cutting concerns (logging, observers) are injected rather than hard-coded.
- Bun 1.1 or newer.
- Playwright browser binaries. Install once with
bunx playwright install.
The project uses Bun for dependency management and scripting; avoid substituting Node.js tooling.
bun install-
Prepare URLs
Editlinks.txtand place one URL per line. Bare domains are acceptable; they are normalised tohttps://. -
Run the capture
bun start
or execute the entry point directly:
bun run index.ts
-
Inspect results
- Artifacts are written to
output/<timestamp>/. - Each URL generates a slugged subfolder containing:
page.desktop.html,page.tablet.html,page.mobile.htmlpage.desktop.png,page.tablet.png,page.mobile.png
console.logstores combined stdout/stderr output for the run.
- Artifacts are written to
Run folders are created once per execution and stamped using the en-GB date format (DD-MM-YYYY-HH-MM-SS).
The generated run directory resembles the following structure:
output/
└── 21-02-2025-09-30-12/
├── console.log
├── example-com/
│ ├── page.desktop.html
│ ├── page.desktop.png
│ ├── page.tablet.html
│ ├── page.tablet.png
│ ├── page.mobile.html
│ └── page.mobile.png
└── sametcc-me/
├── page.desktop.html
├── page.desktop.png
├── page.tablet.html
├── page.tablet.png
├── page.mobile.html
└── page.mobile.png
console.log contains every console message emitted during the run (including Bun-side warnings and errors), formatted with timestamps and severity labels.
src/config.ts centralises every tunable. It builds a zod schema that reads from Bun.env, validates types, and materialises a frozen RuntimeConfiguration instance on first import. Each value has a default, so the app starts with no .env, while malformed input fails fast with clear error messages.
- Copy
.env.exampleto.env. - Override any variables you need. Bun automatically loads
.envforbun start,bun run, andbunxscripts, so no extra flags are required. - Re-run
bun start. If a value cannot be parsed, the configuration loader explains which variable is invalid and why.
Alternatively, export variables in your shell or CI job—RuntimeConfiguration always reads from Bun.env.
- Numeric fields must contain valid numbers; otherwise
zodthrows with the original string. - Boolean fields accept only
"true"or"false". - Comma-separated lists (for example
CHROMIUM_HOST_RESOLVER_RULES) are split, trimmed, and filtered for empties. *_COLOR_SCHEMEacceptsdark,light,no-preference,null, orundefined.- Device descriptor names are trimmed. When a descriptor is missing,
DeviceContextFactorylogs a one-time warning and falls back to the manual viewport and screen dimensions.
- Device profiles: Swap Playwright descriptor names (e.g.,
DESKTOP_DEVICE_DESCRIPTOR) or customise viewports, scale factors, and user agents. - Locales & timezones: Adjust locale/timezone constants to emulate regional behaviour.
- Timeouts: Modify navigation, idle waits, and content readiness thresholds to suit slower sites.
- Networking: Supply host resolver rules or custom DNS servers when captures run in isolated environments.
- Air-gapped environments: Populate
CHROMIUM_DNS_SERVERSorCHROMIUM_HOST_RESOLVER_RULESso Chromium resolves domains via internal infrastructure. - High-latency sites: Increase
PRIMARY_NAVIGATION_TIMEOUT_MS,FALLBACK_NAVIGATION_TIMEOUT_MS, orCONTENT_READY_TIMEOUT_MS. - Minimal waiting: Reduce
CAPTURE_STABILIZATION_DELAY_MSwhen pages are lightweight and deterministic. - Light mode captures: Switch
*_COLOR_SCHEMEto"light"to simulate alternative themes.
The tables below show how environment variables map to the exported configuration and the defaults applied when no override is provided. Refer to .env.example for the full strings (for example, user-agent headers).
| Environment variable | Config property | Default | Notes |
|---|---|---|---|
DEFAULT_OUTPUT_DIR |
config.DEFAULT_OUTPUT_DIR |
output |
Root directory for timestamped runs. |
LINKS_FILE |
config.LINKS_FILE |
links.txt |
File containing URLs to capture. |
PARALLEL_CAPTURE_ENABLED |
config.PARALLEL_CAPTURE_ENABLED |
true |
Enables multi-page concurrency. |
| Environment variable | Config property | Default | Notes |
|---|---|---|---|
PRIMARY_NAVIGATION_TIMEOUT_MS |
config.PRIMARY_NAVIGATION_TIMEOUT_MS |
45000 |
Timeout when waiting for networkidle. |
FALLBACK_NAVIGATION_TIMEOUT_MS |
config.FALLBACK_NAVIGATION_TIMEOUT_MS |
60000 |
Timeout for the domcontentloaded fallback strategy. |
POST_NAVIGATION_IDLE_MS |
config.POST_NAVIGATION_IDLE_MS |
1000 |
Delay before stability checks run. |
CAPTURE_STABILIZATION_DELAY_MS |
config.CAPTURE_STABILIZATION_DELAY_MS |
2000 |
Additional wait before screenshots are taken. |
CONTENT_READY_TIMEOUT_MS |
config.CONTENT_READY_TIMEOUT_MS |
10000 |
Max time to wait for meaningful DOM content. |
| Environment variable | Config property | Default value | Notes |
|---|---|---|---|
DESKTOP_DEVICE_DESCRIPTOR |
config.DESKTOP_DEVICE_DESCRIPTOR |
Desktop Chrome |
Playwright preset name. |
DESKTOP_VIEWPORT_WIDTH |
config.DESKTOP_VIEWPORT.width |
1280 |
Pixels. |
DESKTOP_VIEWPORT_HEIGHT |
config.DESKTOP_VIEWPORT.height |
720 |
Pixels. |
DESKTOP_SCREEN_WIDTH |
config.DESKTOP_SCREEN.width |
1920 |
Reported window.screen.width. |
DESKTOP_SCREEN_HEIGHT |
config.DESKTOP_SCREEN.height |
1080 |
Reported window.screen.height. |
DESKTOP_DEVICE_SCALE_FACTOR |
config.DESKTOP_DEVICE_SCALE_FACTOR |
1.5 |
Pixels per device-independent pixel. |
DESKTOP_LOCALE |
config.DESKTOP_LOCALE |
en-US |
Locale passed via context options. |
DESKTOP_TIMEZONE_ID |
config.DESKTOP_TIMEZONE_ID |
Europe/Istanbul |
IANA timezone identifier. |
DESKTOP_COLOR_SCHEME |
config.DESKTOP_COLOR_SCHEME |
dark |
Media emulation for prefers-color-scheme. |
DESKTOP_USER_AGENT |
config.DESKTOP_USER_AGENT |
Chrome on Windows 10 UA | Full string in .env.example. |
| Environment variable | Config property | Default value | Notes |
|---|---|---|---|
MOBILE_DEVICE_DESCRIPTOR |
config.MOBILE_DEVICE_DESCRIPTOR |
Pixel 5 |
Playwright preset name. |
MOBILE_VIEWPORT_WIDTH |
config.MOBILE_VIEWPORT.width |
390 |
Pixels. |
MOBILE_VIEWPORT_HEIGHT |
config.MOBILE_VIEWPORT.height |
844 |
Pixels. |
MOBILE_SCREEN_WIDTH |
config.MOBILE_SCREEN.width |
1080 |
Reported window.screen.width. |
MOBILE_SCREEN_HEIGHT |
config.MOBILE_SCREEN.height |
2340 |
Reported window.screen.height. |
MOBILE_DEVICE_SCALE_FACTOR |
config.MOBILE_DEVICE_SCALE_FACTOR |
3 |
Pixels per device-independent pixel. |
MOBILE_LOCALE |
config.MOBILE_LOCALE |
en-US |
Locale passed via context options. |
MOBILE_TIMEZONE_ID |
config.MOBILE_TIMEZONE_ID |
America/Los_Angeles |
IANA timezone identifier. |
MOBILE_COLOR_SCHEME |
config.MOBILE_COLOR_SCHEME |
dark |
Media emulation for prefers-color-scheme. |
DEFAULT_MOBILE_USER_AGENT |
config.DEFAULT_MOBILE_USER_AGENT |
Chrome on Android UA | Full string in .env.example. |
| Environment variable | Config property | Default value | Notes |
|---|---|---|---|
TABLET_DEVICE_DESCRIPTOR |
config.TABLET_DEVICE_DESCRIPTOR |
iPad (gen 7) |
Playwright preset name. |
TABLET_VIEWPORT_WIDTH |
config.TABLET_VIEWPORT.width |
1024 |
Pixels. |
TABLET_VIEWPORT_HEIGHT |
config.TABLET_VIEWPORT.height |
1366 |
Pixels. |
TABLET_SCREEN_WIDTH |
config.TABLET_SCREEN.width |
1620 |
Reported window.screen.width. |
TABLET_SCREEN_HEIGHT |
config.TABLET_SCREEN.height |
2160 |
Reported window.screen.height. |
TABLET_DEVICE_SCALE_FACTOR |
config.TABLET_DEVICE_SCALE_FACTOR |
2 |
Pixels per device-independent pixel. |
TABLET_LOCALE |
config.TABLET_LOCALE |
en-US |
Locale passed via context options. |
TABLET_TIMEZONE_ID |
config.TABLET_TIMEZONE_ID |
America/Los_Angeles |
IANA timezone identifier. |
TABLET_COLOR_SCHEME |
config.TABLET_COLOR_SCHEME |
dark |
Media emulation for prefers-color-scheme. |
DEFAULT_TABLET_USER_AGENT |
config.DEFAULT_TABLET_USER_AGENT |
Safari on iPad UA | Full string in .env.example. |
| Environment variable | Config property | Default value | Notes |
|---|---|---|---|
CHROMIUM_HOST_RESOLVER_RULES |
config.CHROMIUM_HOST_RESOLVER_RULES |
(empty list) | Comma-separated host resolver rules. |
CHROMIUM_USE_CUSTOM_DNS |
config.CHROMIUM_USE_CUSTOM_DNS |
false |
Enables custom DNS routing when true. |
CHROMIUM_DNS_SERVERS |
config.CHROMIUM_DNS_SERVERS |
["94.140.14.14","94.140.14.15"] |
Comma-separated DNS servers applied when enabled. |
index.tsloads and validates URLs, sets up theFileLogger, and hands control toPageCaptureRunner.PageCaptureRunnerprepares a run folder, then for each link:- Creates a unique directory name.
- Captures desktop, tablet, and mobile variants via
captureVariant. - Uses
navigateWithFallbackto retry navigation (networkidle→domcontentloaded) and waits for DOM stability before saving HTML and PNG artifacts.
- After processing all URLs, a summary report prints to the console and the process exits with a non-zero status if any capture failed.
- Bootstrap: Console methods are wrapped so logs are buffered until the run folder is available.
- Preparation: Timestamped output folder is created, observers attach, and directories for each URL are provisioned.
- Navigation: Primary and fallback strategies attempt to load content reliably while validating HTTP status codes.
- Capture: HTML content and full-page screenshots are produced for desktop, tablet, and mobile contexts.
- Reporting:
CaptureOutcomeaggregates statuses, the runner prints a summary, andprocess.exitCodeis set when failures occur.
bun start— Run the capture pipeline.bun lint— Lint the project with Biome.bun format— Format sources in-place.
- Custom artefacts: Subclass
PageCaptureRunneror wrapcaptureVariantto add PDF exports or additional device profiles. ThewithContexthelper provides an isolated Playwright context ready for further actions (network tracing, accessibility snapshots, etc.). - Augmented logging: Extend
FileLoggerto stream logs to external systems (e.g., S3, HTTP endpoints) or enrich entries with run metadata. - Dynamic URL sources: Replace
loadUrlsFromFilewith a drop-in function that fetches URLs from APIs, databases, or CI artifacts before invoking the runner. - CI integration: Combine with cron or CI pipelines to schedule captures and archive the
outputdirectory as a build artifact.
- Console interception:
FileLoggertakes overconsole.log,console.warn, andconsole.error, emitting identical lines to stdout and toconsole.logon disk. - Buffered writes: Messages are buffered until the run folder exists, ensuring no output is lost during bootstrap.
- Structured formatting: Each entry contains timestamp, severity, and serialized payloads so log processors can parse them easily.
- Observer pattern:
RunFolderLoggingObserverdemonstrates how additional observers (metrics, notifications) can be attached without modifyingPageCaptureRunner.
-
Can I capture only a subset of devices?
Yes. Remove the unnecessarycaptureVariantcalls insidecapturePageor guard them behind feature flags. -
How do I inject authentication headers or cookies?
ExtendcaptureVariantto setpage.context().addCookies(...)or configure extra HTTP headers before callingnavigateWithFallback. -
What is required to run behind a proxy?
UpdatebuildChromiumLaunchOptionsto append--proxy-server=and, if needed, configurecontext.setExtraHTTPHeadersfor authentication. -
Is parallel execution safe?
Yes. Each run writes to its own timestamped directory and opens isolated Playwright contexts, so concurrent processes do not collide.
Happy capturing!