Evidence-bounded website audit workflow for internal prospecting, digital presence analysis, and brand-development intelligence.
website-auditor accepts a public website URL, captures authorized public evidence, produces deterministic audit findings and category scores, then optionally uses a bounded Gemini synthesis layer to translate accepted evidence into internal prospect intelligence.
The core architectural rule is simple:
The deterministic audit engine creates audit truth. The LLM may synthesize accepted evidence, but it cannot invent findings, scores, metrics, traffic claims, revenue claims, or audit facts.
This repository is public for portfolio and reference purposes. The deployed Vercel app is private and no public demo is currently exposed.
Website Audit Agent is an internal audit system that evaluates public websites through a controlled workflow:
- Capture public website evidence.
- Extract technical, content, SEO, UX, and brand-relevant signals.
- Generate deterministic findings and category scores.
- Label evidence as measured, observed, or inferred.
- Assemble a structured audit report.
- Optionally use a bounded Gemini agent to translate accepted findings into prospecting intelligence.
The project is not a chatbot that gives website opinions. It is a bounded audit workflow where evidence capture, scoring logic, report generation, and LLM synthesis are separated by design.
This project demonstrates that I can:
- Design bounded AI workflows where the model is not the source of truth.
- Build full-stack internal tools with intake, storage, workers, reporting, and access control.
- Combine browser automation, deterministic scoring, database persistence, and LLM synthesis.
- Structure AI systems around evidence, traceability, validation, and human review.
- Translate a messy business-development problem into a repeatable decision-support system.
- Build AI-assisted tools that are useful without becoming uncontrolled autonomous agents.
Most AI audit tools blur three things that should stay separate:
- Measured evidence — what the system actually captured.
- Deterministic findings — what rules can safely conclude from that evidence.
- Strategic synthesis — how those findings may translate into business-development opportunities.
This project separates those layers.
It is not a system where an LLM “looks at a website” and invents conclusions. It is a controlled audit pipeline with evidence capture, deterministic scoring, persistence, worker execution, access control, and a constrained LLM synthesis layer.
The purpose is to create reliable internal prospect intelligence without turning model interpretation into fake measurement.
- Accepts a public website URL through an internal intake flow.
- Validates the URL before network activity.
- Creates an `audit_run` record in Postgres.
- Enqueues an `audit.run` job through `pg-boss`.
- Runs an event-driven worker route inside the Vercel app.
- Captures homepage evidence with a browser-first strategy.
- Falls back to authorized public static evidence when rendering is blocked or unavailable.
- Stores page snapshots and page evidence.
- Tracks capture fidelity so reports communicate evidence quality.
- Produces deterministic findings and category scores.
- Labels claims as `Measured`, `Observed`, or `Inferred`.
- Excludes unsupported categories when capture fidelity is too low.
- Prevents inferred claims from being presented as measured facts.
- Assembles report-ready audit narratives.
- Communicates scope, caveats, and evidence quality.
- Separates captured evidence from interpretation.
- Produces structured internal audit outputs.
- Uses a bounded Gemini agent to create internal prospect intelligence.
- Allows the LLM to summarize implications from accepted evidence.
- Prevents the LLM from inventing findings, changing scores, or fabricating metrics.
- Validates agent output with a strict schema before persistence.
- Protects the deployed app behind internal access controls.
- Requires signed session cookies for protected routes.
- Uses a separate `WORKER_SECRET` guard for the worker process route.
This project is intentionally scoped.
It is not:
- a public SaaS product
- a generic website crawler
- an anti-bot bypass system
- a Lighthouse replacement
- a full SEO or accessibility scanner
- a fully autonomous AI auditor
- a system where the LLM decides audit truth
- a tool for scanning private, authenticated, or restricted pages
The system only works with authorized public website evidence.
The deployed Vercel app is private, so no public live demo is currently exposed.
Suggested review materials:
- Audit intake screen
- Audit report screen
- Example generated recommendation
- Capture fidelity badge
- Workflow diagram
- Example prospect intelligence output
Recommended local folder for visual documentation:

```text
docs/screenshots/
  audit-intake.png
  audit-report.png
  workflow-diagram.png
  prospect-intelligence.png
```

```mermaid
flowchart TD
  A["Internal user enters domain"] --> B["submitDomainAction()"]
  B --> C["Create audit_run in Postgres"]
  C --> D["Enqueue audit.run job via pg-boss"]
  D --> E["Trigger /api/worker/process"]
  E --> F["Capture pipeline"]
  F --> G["Browser-first homepage capture"]
  F --> H["Static public fallback"]
  G --> I["page_snapshots + page_evidence"]
  H --> I
  I --> J["Deterministic audit engine"]
  J --> K["Findings + category scores"]
  K --> L["Report assembly"]
  K --> M["Optional Prospect Audit Agent"]
  M --> N["prospect_intelligence"]
  L --> O["Internal audit report"]
  N --> O
```
The project is built around a strict separation between deterministic audit logic and LLM synthesis.
| Layer | Owns | Cannot do |
|---|---|---|
| Capture pipeline | Public page capture, browser/static evidence, snapshots | Bypass anti-bot systems or access private pages |
| Deterministic audit engine | Findings, scores, category evaluation, evidence labels | Invent evidence not captured by the system |
| Report assembly | Report structure, category notes, scope notes, risk language | Present inferred claims as measured facts |
| Prospect Audit Agent | Strategic synthesis from accepted evidence | Accept/reject findings, change scores, invent metrics, invent revenue/traffic claims |
The LLM layer is downstream. It reads accepted evidence; it does not create the audit record.
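A minimal sketch of the deterministic layer, assuming a simplified evidence record. The `PageEvidence` fields, rule IDs, categories, and penalty weights are hypothetical, not the repository's actual rules. What it illustrates is the boundary in the table above: findings and scores are pure functions of captured evidence, so the same snapshot always yields the same audit record, and the synthesis layer only ever reads the result.

```typescript
// Hypothetical evidence record, rules, and weights; the real page_evidence columns may differ.
interface PageEvidence {
  title: string | null;
  metaDescription: string | null;
  h1Count: number;
  httpStatus: number;
}

type EvidenceLabel = "Measured" | "Observed" | "Inferred";
type Category = "seo" | "technical" | "content";

interface Finding {
  ruleId: string;
  category: Category;
  label: EvidenceLabel;
  message: string;
  penalty: number; // points deducted from the category score
}

// A rule reads captured evidence only and returns a finding or null.
type Rule = (e: PageEvidence) => Finding | null;

const rules: Rule[] = [
  (e) =>
    !e.title?.trim()
      ? { ruleId: "seo.missing-title", category: "seo", label: "Measured",
          message: "No <title> text was captured.", penalty: 20 }
      : null,
  (e) =>
    e.metaDescription === null
      ? { ruleId: "seo.missing-meta-description", category: "seo", label: "Measured",
          message: "No meta description was captured.", penalty: 10 }
      : null,
  (e) =>
    e.h1Count !== 1
      ? { ruleId: "content.h1-count", category: "content", label: "Measured",
          message: `Expected exactly one <h1>, found ${e.h1Count}.`, penalty: 5 }
      : null,
  (e) =>
    e.httpStatus >= 400
      ? { ruleId: "technical.http-error", category: "technical", label: "Measured",
          message: `Homepage returned HTTP ${e.httpStatus}.`, penalty: 50 }
      : null,
];

// Apply every rule and keep the findings that fired.
function runRules(e: PageEvidence): Finding[] {
  return rules.map((rule) => rule(e)).filter((f): f is Finding => f !== null);
}

// Each category starts at 100 and loses its findings' penalties, clamped at 0.
function scoreCategories(findings: readonly Finding[]): Record<Category, number> {
  const scores: Record<Category, number> = { seo: 100, technical: 100, content: 100 };
  for (const f of findings) {
    scores[f.category] = Math.max(0, scores[f.category] - f.penalty);
  }
  return scores;
}
```

Because no rule ever sees model output, re-running the rules on a stored snapshot reproduces the same findings and scores.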
Every finding carries an evidence posture.
| Label | Meaning |
|---|---|
| `Measured` | Directly measured from captured evidence, markup, HTTP response, or stored snapshot data |
| `Observed` | Supported by captured website evidence but not necessarily numeric |
| `Inferred` | Strategic interpretation based on accepted evidence; never presented as measured fact |
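The `Inferred` rule can be enforced at render time instead of being left to convention. In this sketch the `Claim` type, the two report sections, and `renderClaim` are hypothetical names; the idea is that the label travels with every claim and the renderer refuses to place an `Inferred` claim in a fact section rather than silently printing it.

```typescript
type EvidenceLabel = "Measured" | "Observed" | "Inferred";

interface Claim {
  label: EvidenceLabel;
  text: string;
}

// Hypothetical report sections: "facts" may hold evidence-backed claims only.
type ReportSection = "facts" | "interpretation";

const ALLOWED_LABELS: Record<ReportSection, readonly EvidenceLabel[]> = {
  facts: ["Measured", "Observed"],
  interpretation: ["Measured", "Observed", "Inferred"],
};

// Renders a claim with its visible label, or throws if the label is not allowed in that section.
function renderClaim(claim: Claim, section: ReportSection): string {
  if (!ALLOWED_LABELS[section].includes(claim.label)) {
    throw new Error(`${claim.label} claim cannot appear in the "${section}" section`);
  }
  return `[${claim.label}] ${claim.text}`;
}
```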
This avoids a common failure mode in AI audit systems: turning model interpretation into fake measurement.
The report communicates how reliable the captured evidence was.
| Capture status | Report badge | Meaning |
|---|---|---|
| `rendered_browser` + `complete` | Rendered audit | Browser capture completed successfully |
| `rendered_browser` + `partial_complete` | Mixed capture | Browser evidence exists but is incomplete |
| `static_public` | Static fallback | Static public evidence was used instead of full rendering |
| `secondary_static` | Partial/static | Secondary or limited static evidence was used |
Static-only and secondary-static reports intentionally exclude visual, mobile, and above-the-fold scoring.
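A sketch of how the capture status could drive both the badge and the excluded categories, assuming the status values from the table and hypothetical category names (`visual`, `mobile`, `above_the_fold`). The source only specifies exclusions for the static modes, so a partial browser capture excludes nothing here; defaulting a missing completeness value to the more conservative badge is this sketch's own choice.

```typescript
type CaptureMethod = "rendered_browser" | "static_public" | "secondary_static";
type Completeness = "complete" | "partial_complete";
type Badge = "Rendered audit" | "Mixed capture" | "Static fallback" | "Partial/static";

// Hypothetical names for the render-dependent scoring categories.
const RENDER_ONLY = ["visual", "mobile", "above_the_fold"] as const;
type RenderOnlyCategory = (typeof RENDER_ONLY)[number];

interface CaptureFidelity {
  badge: Badge;
  excludedCategories: readonly RenderOnlyCategory[];
}

function captureFidelity(
  method: CaptureMethod,
  completeness: Completeness = "partial_complete", // conservative default
): CaptureFidelity {
  switch (method) {
    case "rendered_browser":
      // Browser evidence exists; only the badge reflects whether it is complete.
      return {
        badge: completeness === "complete" ? "Rendered audit" : "Mixed capture",
        excludedCategories: [],
      };
    case "static_public":
      return { badge: "Static fallback", excludedCategories: RENDER_ONLY };
    case "secondary_static":
      return { badge: "Partial/static", excludedCategories: RENDER_ONLY };
    default: {
      const unreachable: never = method;
      throw new Error(`unknown capture method: ${String(unreachable)}`);
    }
  }
}

// A category is scored only if the capture fidelity does not exclude it.
function shouldScore(category: string, fidelity: CaptureFidelity): boolean {
  return !(fidelity.excludedCategories as readonly string[]).includes(category);
}
```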
The Prospect Audit Agent is a bounded LLM synthesis layer.
Its job is to transform accepted audit evidence into internal prospecting intelligence. It is designed for business-development interpretation, not audit authority.
| Area | Contract |
|---|---|
| Agent type | Bounded LLM synthesis agent |
| Model layer | Gemini |
| Input | Accepted findings, evidence labels, category scores, capture fidelity, report context |
| Output | Structured prospect intelligence |
| Allowed behavior | Interpret accepted evidence, summarize implications, identify business-development angles |
| Forbidden behavior | Invent findings, alter scores, fabricate traffic/revenue/conversion metrics, claim visual evidence without rendered capture |
| Validation | Strict schema validation before persistence |
The deterministic engine answers:
What did we find?
The synthesis layer answers:
Why might this matter to a prospect?
Those are different jobs.
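A sketch of the agent boundary, assuming hypothetical input and output shapes (`AgentInput`, `ProspectAngle`, `findingIds`, `mentionsVisualEvidence`). The repository uses Zod for this; the hand-written validator below keeps the example dependency-free and checks three contract rules: no keys beyond the schema (so the agent cannot return scores), every angle cites accepted finding IDs, and no visual claims without a rendered capture. Detecting invented metrics inside free text would need further checks not shown here.

```typescript
type EvidenceLabel = "Measured" | "Observed" | "Inferred";
type CaptureMethod = "rendered_browser" | "static_public" | "secondary_static";

// Hypothetical agent input, built only from findings the deterministic engine accepted.
interface AgentInput {
  captureMethod: CaptureMethod;
  findings: readonly { id: string; label: EvidenceLabel; summary: string }[];
  categoryScores: Readonly<Record<string, number>>;
}

// Hypothetical shape of one angle in a prospect_intelligence record.
interface ProspectAngle {
  findingIds: string[];
  angle: string;
  mentionsVisualEvidence: boolean;
}

const ALLOWED_KEYS = new Set(["summary", "angles"]);

// Returns contract violations; an empty array means the output may be persisted.
function validateAgentOutput(raw: unknown, input: AgentInput): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return ["output must be an object"];
  const out = raw as Record<string, unknown>;
  const errors: string[] = [];

  // Strict shape: extra keys (for example a "scores" field) are rejected outright.
  for (const key of Object.keys(out)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected key: ${key}`);
  }
  if (typeof out["summary"] !== "string") errors.push("summary must be a string");
  if (!Array.isArray(out["angles"])) return [...errors, "angles must be an array"];

  const accepted = new Set(input.findings.map((f) => f.id));
  (out["angles"] as Partial<ProspectAngle>[]).forEach((a, i) => {
    const ids = a?.findingIds;
    if (!Array.isArray(ids) || ids.length === 0) {
      errors.push(`angles[${i}] cites no accepted findings`);
    } else {
      for (const id of ids) {
        if (!accepted.has(id)) errors.push(`angles[${i}] cites unknown finding ${String(id)}`);
      }
    }
    if (typeof a?.angle !== "string") errors.push(`angles[${i}].angle must be a string`);
    if (typeof a?.mentionsVisualEvidence !== "boolean") {
      errors.push(`angles[${i}].mentionsVisualEvidence must be a boolean`);
    } else if (a.mentionsVisualEvidence && input.captureMethod !== "rendered_browser") {
      errors.push(`angles[${i}] claims visual evidence without a rendered capture`);
    }
  });
  return errors;
}
```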
The project includes several controls designed to keep the workflow bounded.
- Public URL validation before network activity.
- SSRF-oriented guards.
- Redirect and final URL validation.
- Browser-first capture with static fallback.
- No anti-bot bypass behavior.
- No authenticated/private page scanning.
- LLM receives accepted evidence only.
- Strict prompt boundaries.
- Strict JSON/Zod output validation.
- No authority to create audit truth.
- No invented metrics, revenue claims, traffic estimates, or visual claims without browser evidence.
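A sketch of pre-network URL validation, assuming a policy of http(s) only, default ports, no embedded credentials, and no loopback, private, link-local, or internal hosts. The rules are illustrative, not the repository's exact guard. It only inspects the URL string; a complete SSRF defense also resolves DNS and checks the resolved addresses, and applies the same check to every redirect `Location` and to the final URL.

```typescript
import { isIP } from "node:net";

// Illustrative IPv4 block list: "this" network, private, loopback, CGNAT, link-local.
function isBlockedIPv4(address: string): boolean {
  const parts = address.split(".").map(Number);
  const a = parts[0] ?? -1;
  const b = parts[1] ?? -1;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Throws unless the input looks like a public http(s) website URL; returns the parsed URL.
function validatePublicUrl(input: string): URL {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    throw new Error("not a valid absolute URL");
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("only http and https are allowed");
  }
  if (url.username !== "" || url.password !== "") {
    throw new Error("embedded credentials are not allowed");
  }
  if (url.port !== "") {
    throw new Error("non-default ports are not allowed");
  }

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local") || host.endsWith(".internal")) {
    throw new Error("internal hostnames are not allowed");
  }
  // WHATWG URL keeps brackets around IPv6 hosts and normalizes alternate IPv4 notations.
  const bare = host.replace(/^\[|\]$/g, "");
  const version = isIP(bare);
  if (version === 4 && isBlockedIPv4(bare)) throw new Error("private or reserved IPv4 address");
  if (version === 6) throw new Error("IPv6 literals are rejected in this sketch");
  return url;
}
```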
The public repository does not mean the deployed app is public.
The Vercel deployment is protected by an internal login flow. Protected routes require a signed session cookie. The worker process route uses a separate WORKER_SECRET header check.
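A sketch of the two guards described above, using Node's `node:crypto` (`createHmac`, `timingSafeEqual`). The cookie format (`internal:<expiresAtMs>.<base64url HMAC-SHA256>`), the expiry scheme, and all function names are assumptions for illustration; the repository's cookie layout may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC-SHA256 over the payload, encoded as base64url (which never contains ".").
function sign(payload: string, secret: string): string {
  const mac = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${mac}`;
}

// Constant-time string comparison; unequal lengths return false without comparing bytes.
function safeEqual(a: string, b: string): boolean {
  const left = Buffer.from(a);
  const right = Buffer.from(b);
  return left.length === right.length && timingSafeEqual(left, right);
}

// Hypothetical session payload: "internal:<expiresAtMs>".
function issueSession(secret: string, ttlMs: number, now: number = Date.now()): string {
  return sign(`internal:${now + ttlMs}`, secret);
}

function verifySession(cookie: string, secret: string, now: number = Date.now()): boolean {
  const dot = cookie.lastIndexOf(".");
  if (dot <= 0) return false;
  const payload = cookie.slice(0, dot);
  // Recompute the signature and compare in constant time before trusting the payload.
  if (!safeEqual(sign(payload, secret), cookie)) return false;
  const expiresAt = Number(payload.split(":")[1]);
  return Number.isFinite(expiresAt) && now < expiresAt;
}

// Worker route guard: the shared secret header is compared in constant time.
function isAuthorizedWorker(headerValue: string | null, workerSecret: string): boolean {
  return headerValue !== null && safeEqual(headerValue, workerSecret);
}
```

A worker route handler could pass the incoming header value and the configured `WORKER_SECRET` to `isAuthorizedWorker`; which header name carries the secret is not specified here.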
- Worker processing runs inside the Vercel app project.
- Manual worker drain exists only as an emergency recovery action.
- Migrations are applied manually, not automatically during deploy.
- Secrets are documented in `.env.example` with placeholders only.
| Layer | Technology |
|---|---|
| App framework | Next.js App Router |
| Language | TypeScript |
| Runtime | Node.js |
| Hosting | Vercel |
| Database | Postgres |
| Job queue | pg-boss |
| Browser capture | Playwright Core + @sparticuz/chromium |
| LLM synthesis | Gemini |
| Validation | Zod |
| Testing | Vitest |
| CI | GitHub Actions |
```text
src/
app/                 Next.js App Router pages, layouts, route handlers
components/          UI components for intake, dashboard, and reports
lib/                 Shared types, env validation, scoring helpers
server/              Orchestration, capture, scoring, report assembly
server/agents/       Prospect Audit Agent prompt, schema, runner
db/                  Raw pg client and audit repositories
worker/              Legacy Playwright package, not a production dependency
migrations/          Reversible SQL migrations
tests/               Unit, integration, and security tests
docs/agentic/        Architecture and prompt governance documentation
public/              Static assets
.github/workflows/   CI and manual worker-drain workflows
```

Start here:

- `workflow.yaml`: audit pipeline and system logic
- `src/server/`: orchestration, capture, scoring, and report assembly
- `src/server/agents/`: Prospect Audit Agent prompt, schema, and runner
- `src/app/`: application routes, protected pages, and route handlers
- `tests/`: audit logic, reporting, integration, and security tests
- `.github/workflows/`: CI and manual worker recovery workflows
- `docs/agentic/`: architecture and prompt governance documentation
This repo is best reviewed as an AI workflow architecture project, not just as a website audit app.
- Node.js
- npm
- Postgres database
- Gemini API key, if running synthesis locally
```bash
cp .env.example .env.local
npm install
npm run migrate:up:local
npm run dev
```

Local app: http://localhost:3000

In local development, the access gate is open when `INTERNAL_ACCESS_COOKIE_SECRET` is not set.
All variables are documented in `.env.example` with placeholder values only.

| Variable | Description |
|---|---|
| `DATABASE_URL` | Postgres connection string |
| `WORKER_SECRET` | Auth header for `/api/worker/process`; minimum 16 characters |
| `AUDIT_API_KEY` | Auth for report enrichment routes; minimum 16 characters |
| `INTERNAL_ACCESS_PASSWORD` | Password for `/internal-login`; minimum 8 characters |
| `INTERNAL_ACCESS_COOKIE_SECRET` | HMAC signing key for session cookie; minimum 32 characters |
| `GEMINI_API_KEY` | Gemini API key for Prospect Audit Agent synthesis |

| Variable | Description |
|---|---|
| `GEMINI_MODEL` | Defaults to `gemini-2.5-flash` |
| `STORAGE_PROVIDER` | `local` or `vercel_blob` |
| `BLOB_READ_WRITE_TOKEN` | Required when using Vercel Blob |
| `BROWSER_DRIVER` | `playwright` or `browser_use` |
| `APP_URL` | App base URL |
| `NEXT_PUBLIC_APP_URL` | Public app base URL |
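A sketch of startup environment validation using the minimum lengths from the first table and the `gemini-2.5-flash` default from the second. The `parseEnv` helper and the assumption that `STORAGE_PROVIDER` falls back to `local` are hypothetical. It applies production rules, whereas local development leaves the access gate open when `INTERNAL_ACCESS_COOKIE_SECRET` is unset; `GEMINI_API_KEY` is not checked because synthesis is optional.

```typescript
// Minimum lengths from the variable table; the parsing helper itself is hypothetical.
const MIN_LENGTHS = {
  WORKER_SECRET: 16,
  AUDIT_API_KEY: 16,
  INTERNAL_ACCESS_PASSWORD: 8,
  INTERNAL_ACCESS_COOKIE_SECRET: 32,
} as const;

interface AppEnv {
  DATABASE_URL: string;
  WORKER_SECRET: string;
  AUDIT_API_KEY: string;
  INTERNAL_ACCESS_PASSWORD: string;
  INTERNAL_ACCESS_COOKIE_SECRET: string;
  GEMINI_MODEL: string;
  STORAGE_PROVIDER: "local" | "vercel_blob";
  BLOB_READ_WRITE_TOKEN?: string;
}

// Collects every problem first, then throws once so all misconfigurations are reported together.
function parseEnv(env: Record<string, string | undefined>): AppEnv {
  const errors: string[] = [];

  if (!env.DATABASE_URL) errors.push("DATABASE_URL is required");
  for (const [name, min] of Object.entries(MIN_LENGTHS)) {
    const value = env[name];
    if (!value) errors.push(`${name} is required`);
    else if (value.length < min) errors.push(`${name} must be at least ${min} characters`);
  }

  // Assumption: storage falls back to "local" when unset.
  const storage = env.STORAGE_PROVIDER ?? "local";
  if (storage !== "local" && storage !== "vercel_blob") {
    errors.push("STORAGE_PROVIDER must be local or vercel_blob");
  } else if (storage === "vercel_blob" && !env.BLOB_READ_WRITE_TOKEN) {
    errors.push("BLOB_READ_WRITE_TOKEN is required when STORAGE_PROVIDER=vercel_blob");
  }

  if (errors.length > 0) throw new Error(errors.join("; "));

  // Every value below was checked above, so the non-null assertions are safe.
  return {
    DATABASE_URL: env.DATABASE_URL!,
    WORKER_SECRET: env.WORKER_SECRET!,
    AUDIT_API_KEY: env.AUDIT_API_KEY!,
    INTERNAL_ACCESS_PASSWORD: env.INTERNAL_ACCESS_PASSWORD!,
    INTERNAL_ACCESS_COOKIE_SECRET: env.INTERNAL_ACCESS_COOKIE_SECRET!,
    GEMINI_MODEL: env.GEMINI_MODEL ?? "gemini-2.5-flash",
    STORAGE_PROVIDER: storage as AppEnv["STORAGE_PROVIDER"],
    BLOB_READ_WRITE_TOKEN: env.BLOB_READ_WRITE_TOKEN,
  };
}
```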
Generate a cookie secret:

```bash
openssl rand -base64 32
```

| Command | Purpose |
|---|---|
| `npm run dev` | Start Next.js dev server |
| `npm run build` | Production build |
| `npm run lint` | Run ESLint |
| `npm run typecheck` | TypeScript check with no emit |
| `npm test` | Run Vitest unit tests |
| `npm run test:coverage` | Run tests with coverage |
| `npm run test:integration` | Run integration tests |
| `npm run migrate:up:local` | Apply local migrations from `.env.local` |
| `npm run migrate:down:local` | Roll back local migrations from `.env.local` |
| `npm run migrate:up:vercel:prod` | Pull Vercel production env and apply migrations |
The repository includes tests for audit logic, scoring, security-sensitive behavior, reporting, integrations, and agent-related constraints.
CI runs:

```bash
npm run lint
npm run typecheck
npm test
npm run build
```

The aim is not only to verify that the app builds but to keep the audit workflow bounded, typed, and resistant to common failure modes.
Deployment is Vercel-only.
Audit processing runs inside the same app project. No external worker host is required.
Migrations do not run automatically on deploy. Apply them manually:
```bash
npm run migrate:up:vercel:prod
```

The manual worker-drain workflow exists only for emergency recovery of stuck jobs. It is triggered through `workflow_dispatch` and is not scheduled.
| Route | Guard |
|---|---|
| `/intake` | Signed session cookie |
| `/audits` | Signed session cookie |
| `/report/:path*` | Signed session cookie |
| `/api/audits/:path*` | Signed session cookie |
| `/api/reports/:path*` | Signed session cookie |
| `/api/worker/:path*` | Signed session cookie |
| `/api/worker/process` | `WORKER_SECRET` header; cookie exempt |
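The route table can be read as a small lookup, sketched here in plain TypeScript rather than as Next.js middleware. `guardFor` and the `Guard` values are hypothetical names. Two details matter: `/api/worker/process` is checked before the `/api/worker/:path*` prefix because it is cookie-exempt, and `:path*` is treated as matching the base path and anything below it. The public paths mirror the public routes list; sending unlisted paths to the session guard (fail closed) is this sketch's own choice.

```typescript
type Guard = "public" | "session_cookie" | "worker_secret";

const PUBLIC_EXACT = new Set(["/", "/internal-login", "/internal-logout", "/favicon.ico", "/robots.txt", "/sitemap.xml"]);
const SESSION_EXACT = new Set(["/intake", "/audits"]);
// Bases written as "/base/:path*" in the table: the base itself and anything below it.
const SESSION_PREFIXES = ["/report", "/api/audits", "/api/reports", "/api/worker"];

function isUnder(pathname: string, base: string): boolean {
  return pathname === base || pathname.startsWith(`${base}/`);
}

function guardFor(pathname: string): Guard {
  // Must run before the /api/worker prefix check: this route is cookie-exempt.
  if (pathname === "/api/worker/process") return "worker_secret";
  if (PUBLIC_EXACT.has(pathname) || pathname.startsWith("/_next/")) return "public";
  if (SESSION_EXACT.has(pathname) || SESSION_PREFIXES.some((base) => isUnder(pathname, base))) {
    return "session_cookie";
  }
  // Sketch choice: unlisted paths fail closed and also require the session cookie.
  return "session_cookie";
}
```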
Public routes:

- `/`
- `/internal-login`
- `/internal-logout`
- `/_next/*`
- `/favicon.ico`
- `/robots.txt`
- `/sitemap.xml`

Known limitations:

- The deployed app is private; no public demo is currently exposed.
- Production private artifact storage through Vercel Blob still needs access-control validation.
- Static-only and secondary-static reports intentionally exclude visual, mobile, and above-the-fold scoring.
- Prospect Intelligence is internal prospecting guidance, not audit truth.
- End-to-end operational smoke validation on a live Vercel deployment is still pending.
- The system is not designed to audit authenticated pages, private pages, or protected environments.
- The current implementation is an internal audit/prospecting workflow, not a generalized public scanning platform.
This project demonstrates:
- hybrid workflow-agent architecture
- deterministic truth boundaries around LLM synthesis
- scoped AI agent behavior
- evidence-backed reporting
- private internal tooling
- worker-based execution
- access-controlled deployment
- TypeScript-first product engineering
- reliability-oriented AI system design
- product thinking applied to brand, marketing, and business-development workflows
The main point of the project is not that it uses AI.
The point is that it shows how to wrap AI inside a controlled workflow where evidence, validation, permissions, and human interpretation remain separated.
License: MIT