feat: exhaustive-by-default scans with a design-judgment layer by gabelul · Pull Request #16 · gabelul/pixelslop

gabelul · 2026-06-09T23:46:23Z

The full direction change: Pixelslop stops optimizing for "fewer defensible measured findings only" and becomes exhaustive by default, with a labeled judgment layer on top of the measured backbone. Goal: be good enough that you never run a second design tool. Four commits.

1. Exhaustive by default

deep and thorough now default to true, because Pixelslop is usually driven by an AI agent that won't remember the flags — the default has to be the thorough one. thorough shows low-confidence findings tagged instead of hiding them. --fast is the opt-out.

2. Measured vs judgment layers

Every finding carries kind: "measured" (default, evidence-backed) or "judgment" (subjective). The report keeps them in separate labeled sections so an opinion never reads as a measured fact. The /20 stays measured-only.

3. The design-director pass

A 7th evaluator that looks at the screenshots and opines like a design director: does this read as AI-generated, is the composition generic, what's the missed opportunity, where does it overload the user. It emits judgment findings only, never a score. The guard against vague "make it pop" noise is a mandatory second pass where it argues against its own findings, drops what it can't defend or what a measured evaluator already caught, respects intentional bold design, and tags confidence. This is what beats the alternative instead of just matching it: a deep measured backbone plus the subjective read, not the subjective half alone.

4. Project-specific personas

The 8 built-in personas were the only lens, and the documented custom-persona discovery was never actually wired. Now `personas write` and `personas list` manage validated custom personas, and the orchestrator generates 1-2 from the project's audience/brand so the persona findings fit your users.

Tests

1000 passing, zero dependencies. New: report-layers, design-director contract, personas-tool, plus updated default + evaluator-count tests.

Regular merge, not squash, so release-please keeps each feat entry.

Pixelslop is usually driven by an AI agent, and an agent won't remember to pass --thorough or --deep any more than a person will. So the default behaviour was the only behaviour, and the default was the minimal one: low-confidence findings hidden, shallow collection. That made scans miss things. Flip deep and thorough to default true. thorough now shows lower-confidence findings tagged with their confidence instead of hiding them; deep doubles the collection budgets for more evidence. The opt-out is --fast, which turns both back off for a quick high-confidence-only pass. Personas already defaulted to all. SKILL.md and the settings docs explain the exhaustive-by-default posture and the --fast escape hatch.
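A minimal sketch of how the flipped defaults could resolve, assuming flags arrive as an array of strings. `resolveScanOptions`, `visibleFindings`, the budget names, the base numbers, and the `"high"` confidence value are all hypothetical; the PR only states that deep and thorough default to true, that deep doubles the collection budgets, and that --fast turns both off.

```javascript
// Hypothetical base budgets; the real values are not given in this PR.
const BASE_BUDGETS = { screenshots: 4, elementsSampled: 50 };

// Resolve scan options from CLI-style flags (function name is an assumption).
function resolveScanOptions(flags = []) {
  const fast = flags.includes("--fast");
  // Exhaustive by default: both stay on unless --fast is passed.
  const deep = !fast;
  const thorough = !fast;
  // deep doubles every collection budget.
  const budgets = {};
  for (const [name, value] of Object.entries(BASE_BUDGETS)) {
    budgets[name] = deep ? value * 2 : value;
  }
  return { deep, thorough, budgets };
}

// With thorough on, lower-confidence findings stay in the output (tagged by
// their confidence field); without it only high-confidence ones are kept.
function visibleFindings(findings, options) {
  return options.thorough
    ? findings
    : findings.filter((f) => f.confidence === "high");
}
```

The point of the shape is that an agent calling with no flags at all gets the exhaustive pass, and only an explicit --fast narrows it.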

Findings now carry a kind: "measured" (the default, evidence-backed) or "judgment" (a subjective read). The HTML report keeps them in separate labeled sections so an opinion never reads as a measured fact, and judgment findings show their confidence inline. A scan with only measured findings looks exactly as before — the judgment layer only appears when there is something in it. This is the report foundation for the design-director pass. The /20 score stays measured-only; judgment is additive coverage, not a score input.
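The split can be sketched as below, assuming findings are plain objects and the report is built from a list of sections. `kindOf`, `buildReportSections`, and the "Measured findings" title are hypothetical names; only the `kind` values and the "Design judgment" label come from this PR.

```javascript
// A finding without an explicit kind is treated as measured (the default).
function kindOf(finding) {
  return finding.kind === "judgment" ? "judgment" : "measured";
}

// Group findings into labeled report sections (function name is an assumption).
function buildReportSections(findings) {
  const measured = findings.filter((f) => kindOf(f) === "measured");
  const judgment = findings.filter((f) => kindOf(f) === "judgment");
  const sections = [{ title: "Measured findings", findings: measured }];
  // The judgment layer only appears when there is something in it, so a
  // measured-only scan produces the same single section as before.
  if (judgment.length > 0) {
    sections.push({ title: "Design judgment", findings: judgment });
  }
  return sections;
}
```

Under this sketch, anything that feeds the /20 would read only from the measured section, which keeps judgment additive rather than a score input.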

The six measured evaluators score what's measurable. None of them can say whether a page is actually any good, which is the thing a designer catches by eye and the reason a measured-only scan misses things. The design-director is a seventh evaluator that looks at the screenshots and opines: does this read as AI-generated, is the composition generic, what's the missed opportunity, where does the page make the user think too hard. It emits judgment findings only and never touches the /20 — the score stays measured. The guard against turning into vague "make it pop" noise is a mandatory second pass: it argues against each of its own findings, drops the ones it can't defend or that a measured evaluator already caught, respects intentional bold design, and tags what survives with a confidence. The orchestrator spawns it alongside the six and routes its findings to the report's Design judgment layer.
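The second pass is presumably carried out by the evaluator itself rather than by code, but its drop rules can be expressed as a filter. This is a sketch only: `secondPass` and the candidate fields (`topic`, `note`, `defensible`, `intentional`, `confidence`) are hypothetical and stand in for whatever the design-director's self-critique actually produces.

```javascript
// Apply the second-pass rules to candidate judgment findings.
// measuredTopics: topics a measured evaluator already reported (assumed shape).
function secondPass(candidates, measuredTopics) {
  const alreadyCaught = new Set(measuredTopics);
  return candidates
    // Drop what the director could not defend when arguing against itself.
    .filter((c) => c.defensible === true)
    // Drop what a measured evaluator already caught.
    .filter((c) => !alreadyCaught.has(c.topic))
    // Respect intentional bold design: a deliberate choice is not a finding.
    .filter((c) => c.intentional !== true)
    // Tag what survives with a confidence; judgment only, never a score.
    .map((c) => ({
      kind: "judgment",
      topic: c.topic,
      note: c.note,
      confidence: c.confidence ?? "low",
    }));
}
```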

Pixelslop shipped 8 generic personas, and the docs claimed custom ones in .pixelslop/personas/ were auto-discovered — but nothing actually loaded them. So every project got the same generic lens, and a wedding-planner site was never tested by "the bride three weeks out." Adds a personas tool group: `personas write` validates a persona (required fields, slug-only id, no built-in collision, no path traversal) and saves it to .pixelslop/personas/; `personas list` returns built-ins plus custom. The orchestrator now generates 1-2 personas from the project's audience and brand and evaluates them alongside the built-ins, leading with the project-specific one when it surfaces a real audience issue.
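The four checks that `personas write` performs could look roughly like this. It is a sketch under assumptions: the required field names, the built-in ids, and the function name `validatePersona` are hypothetical, since the PR names the rules but not the schema.

```javascript
// Hypothetical built-in ids and required fields; the real lists are not in this PR.
const BUILT_IN_IDS = new Set(["first-time-visitor", "screen-reader-user"]);
const REQUIRED_FIELDS = ["id", "name", "description"];

// Lowercase letters, digits, and single hyphens between them. No "/", "\" or
// "." can match, so a valid id cannot escape .pixelslop/personas/.
const SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the persona may be saved.
function validatePersona(persona) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof persona[field] !== "string" || persona[field].trim() === "") {
      errors.push(`missing required field: ${field}`);
    }
  }
  const id = typeof persona.id === "string" ? persona.id : "";
  if (id !== "" && !SLUG.test(id)) {
    errors.push("id must be a slug (this also rules out path traversal)");
  }
  if (BUILT_IN_IDS.has(id)) {
    errors.push("id collides with a built-in persona");
  }
  return errors;
}
```

Making the id slug-only means the traversal check needs no separate path logic: an id such as `../outside` fails the pattern before any file path is built.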

…drift An agent invoking /pixelslop reads SKILL.md, and SKILL.md was advertising almost none of what Pixelslop can do — the description was three releases stale, the args list was missing --fast and --deep, and personas, the design-director, trends, and tokens went unmentioned. Capabilities nobody knows about may as well not exist. Rewrites the frontmatter description and args to match reality, and adds a single canonical "Capabilities & Options" menu near the top of SKILL.md — the first thing an agent reads — that also tells it to surface the relevant option to the user when a scan finishes. The durable part is the guard: skill-discoverability.test.js pulls the setting keys straight from SETTING_DEFS and fails the build if SKILL.md doesn't mention each one, plus curated checks for every flag, command, and capability. Add a feature without advertising it and the build breaks. That's the mechanism that keeps this from rotting again.
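The core of that guard is small enough to sketch. This assumes `SETTING_DEFS` is an object keyed by setting name and that "mentions" means a plain substring match; `missingFromSkill` is a hypothetical helper, and the real skill-discoverability.test.js may check more than this.

```javascript
// Return every setting key that the SKILL.md text never mentions.
function missingFromSkill(settingDefs, skillText) {
  return Object.keys(settingDefs).filter((key) => !skillText.includes(key));
}

// Example with made-up inputs: "personas" is defined but not advertised.
const exampleDefs = { deep: {}, thorough: {}, personas: {} };
const exampleSkill = "Options: deep (more evidence), thorough (show tagged low-confidence findings).";
const missing = missingFromSkill(exampleDefs, exampleSkill);

// A test built on this would fail the build whenever the list is non-empty.
if (missing.length > 0) {
  console.log(`SKILL.md does not mention: ${missing.join(", ")}`);
}
```

Because the keys are read from the definitions rather than listed by hand in the test, adding a setting without documenting it is enough to trip the check.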

Knowing the options isn't the same as knowing the best one. An agent could read the capabilities menu and still just run defaults, or open with a wall of settings questions. Neither is advice. The skill now carries an advisory playbook: infer the user's intent — a quick look, a pre-launch review, a CI run, tracking progress — and lead with a recommendation plus the one tradeoff, only asking when there's a real fork. The user shouldn't need to know --fast or --deep exist; translating intent into flags is the agent's job. The drift guard now also asserts the advisory section stays.

…Code SKILL.md told the agent to use AskUserQuestion in ~14 places, but that's a Claude Code tool. Codex CLI has no choice-prompt popup (it's an open request upstream), and the installer only rewrites paths — so a Codex-installed skill was asking the agent to use a tool that doesn't exist there. Adds an "Asking the user" protocol at the top: the AskUserQuestion blocks are the question content, and each harness renders them its own way. Claude Code uses the tool; Codex and others present a numbered menu and wait for a reply; non-interactive runs skip the question and use the default. One SKILL.md works everywhere, no per-harness rewriting. Drift-guarded.
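The protocol lives in SKILL.md as instructions to the agent, not as code, but its three branches can be written out as a sketch. Everything here is hypothetical: the `askUser` function, the harness labels, and the question shape. In particular, the object returned for Claude Code only names the tool and does not reflect the real AskUserQuestion input schema.

```javascript
// Decide how one question block is presented, depending on the harness.
function askUser(question, harness) {
  // Non-interactive runs skip the question and use the default.
  if (harness === "non-interactive") {
    return { type: "answer", value: question.default };
  }
  // Claude Code has the tool, so the question content is handed to it.
  if (harness === "claude-code") {
    return { type: "tool", tool: "AskUserQuestion", content: question };
  }
  // Codex and others: present a numbered menu and wait for a reply.
  const menu = question.options
    .map((option, index) => `${index + 1}. ${option}`)
    .join("\n");
  return { type: "text", text: `${question.prompt}\n${menu}` };
}
```

The question content is the same in all three cases; only the rendering changes, which is what lets one SKILL.md serve every harness.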

Gabi added 4 commits June 10, 2026 01:45

gabelul changed the title ~~feat: make scans exhaustive by default~~ feat: exhaustive-by-default scans with a design-judgment layer Jun 10, 2026

Gabi added 3 commits June 10, 2026 02:36

gabelul merged commit 7c8d0b3 into main Jun 10, 2026
3 checks passed

gabelul deleted the feat/exhaustive-by-default branch June 10, 2026 09:15

gabelul mentioned this pull request Jun 10, 2026

fix: make agent spawning work under Codex (inline fallback + native TOML) #18

Merged

gabelul merged 7 commits into main from feat/exhaustive-by-default
