Merged
140 changes: 140 additions & 0 deletions .claude/skills/ui-qa/SKILL.md
@@ -0,0 +1,140 @@
---
name: ui-qa
description: Use when the user says "ui qa", "visual QA the frontend", "polish pass on the UI", "run the UI personas", or after changes that affect rendering. Agent-driven QA of the web UI against the live demo stack — screenshot matrix critique plus task-driven persona walkthroughs.
---

# UI QA

Agent-driven QA of the Hassette web UI against the live demo stack. Two modes that catch
different bug classes: **screens** (screenshot matrix + design-rule critique — visual
defects) and **personas** (task-driven walkthroughs — navigation dead ends and missing
affordances). The CSS guard scripts in `tools/` already catch structural drift
mechanically; this skill covers what only rendering and usage can reveal.

Use when the user says: "ui qa", "visual QA the frontend", "polish pass on the UI",
"run the UI personas", "review the UI", or after any change that affects rendering.

## Arguments

- empty or `all` — both modes, full scope
- `screens [pages...]` — screenshot critique only, optionally scoped to pages
- `personas [names...]` — walkthroughs only, optionally scoped to personas
- a description of a change ("the logs table at mobile"): infer the scope by picking the
affected pages/viewports and choosing personas from the selection table in `references/personas.md`

## Phase 1: Environment

Read `references/harness.md` and follow it: start the demo stack in the background, wait
for `DEMO_READY=true`, then **wait ~2 minutes more** for failure/activity data before
capturing anything. Get a tmpdir via `get-skill-tmpdir ui-qa`.

Skip the wait only when the stack is already running from earlier in the session.

## Phase 2a: Screens mode

1. Capture the matrix (scoped — full matrix only for explicit full audits):

```bash
uv run python tools/frontend/ui_qa_capture.py --base-url $DEMO_FRONTEND_URL --output-dir $TMPDIR/shots [--pages ...] [--viewports ...]
```

2. Dispatch one **Sonnet** analysis subagent per page (all viewports/themes of that page
to one agent, so it can compare breakpoints). The prompt names the screenshot files,
tells the agent to Read them, and includes the paths to `frontend/DESIGN_RULES.md` and
`frontend/src/tokens.css` as the standard to judge against — findings must cite a
rule or token, not taste. Enforce with `schema`:

```json
{
  "type": "object",
  "properties": {
    "page": {"type": "string"},
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "viewport": {"type": "string"},
          "theme": {"type": "string"},
          "severity": {"type": "string", "enum": ["broken", "degraded", "polish"]},
          "description": {"type": "string"},
          "design_rule": {"type": "string"},
          "suggestion": {"type": "string"}
        },
        "required": ["viewport", "theme", "severity", "description", "design_rule", "suggestion"]
      }
    }
  },
  "required": ["page", "findings"]
}
```

Severity: **broken** = content unusable (cropped, overlapping, unreadable);
**degraded** = works but violates a stated design rule; **polish** = defensible but
improvable. An agent reporting zero findings for a page is a valid result — do not
prompt for a minimum count (that manufactures findings).
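Each analysis agent's report can be sanity-checked before collating. The sketch below uses only the standard library. It assumes the agent's reply arrives as a JSON string matching the schema above. The helper name `check_screens_report` is hypothetical, and a `schema`-enforced dispatch may already reject most malformed output. An empty `findings` list passes, which matches the zero-findings rule.

```python
import json

SEVERITIES = {"broken", "degraded", "polish"}
FINDING_KEYS = {"viewport", "theme", "severity", "description", "design_rule", "suggestion"}


def check_screens_report(raw: str) -> list[str]:
    """Return the problems found in one screens-mode report; [] means it is usable."""
    report = json.loads(raw)
    problems = []
    if not isinstance(report.get("page"), str):
        problems.append("missing or non-string 'page'")
    findings = report.get("findings")
    if not isinstance(findings, list):
        return problems + ["'findings' must be a list (an empty list is fine)"]
    for i, finding in enumerate(findings):
        missing = FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"finding {i}: missing {sorted(missing)}")
        if finding.get("severity") not in SEVERITIES:
            problems.append(f"finding {i}: unknown severity {finding.get('severity')!r}")
        # A finding that cites no rule or token is taste, which is out of scope.
        if not str(finding.get("design_rule", "")).strip():
            problems.append(f"finding {i}: empty design_rule")
    return problems
```

A stricter variant could check that `design_rule` names a real section of `DESIGN_RULES.md` or a real token in `tokens.css`. This sketch only rejects empty citations.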

Screenshots are the sweep, not a wall: when a finding hinges on behavior a static
image can't show (does truncated text expand on tap? does this region scroll?), the
analysis agent should load the live page via Playwright and check — include
`DEMO_FRONTEND_URL` in its prompt. Same shared-browser constraint as personas:
agents that go interactive must run sequentially.

## Phase 2b: Personas mode

1. Read `references/personas.md`; select personas per its table (or the user's scope).
2. Dispatch personas **sequentially, not in parallel** — they share the one Playwright
MCP browser, and parallel agents fight over it. Each subagent prompt contains: the
persona block verbatim, `DEMO_FRONTEND_URL`, the instruction to set the viewport
first and stay in character, and a hard cap (~25 browser actions) so a stuck persona
reports "stuck" instead of wandering. Enforce with `schema`:

```json
{
  "type": "object",
  "properties": {
    "persona": {"type": "string"},
    "verdict": {"type": "string", "enum": ["completed", "completed-with-friction", "stuck", "abandoned"]},
    "path": {"type": "array", "items": {"type": "string"}},
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": {"type": "string"},
          "attempted": {"type": "string"},
          "friction": {
            "type": "string",
            "enum": ["dead-end", "cant-find", "cropped-content", "tap-target", "lost-context", "unexplained-term", "misleading-label", "no-feedback"]
          },
          "description": {"type": "string"},
          "suggestion": {"type": "string"}
        },
        "required": ["url", "attempted", "friction", "description", "suggestion"]
      }
    },
    "summary": {"type": "string"}
  },
  "required": ["persona", "verdict", "path", "findings", "summary"]
}
```

`attempted` is mandatory by design: a finding without the action it blocked is an
opinion, and opinions are out of scope.
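The `attempted` rule is mechanical enough to enforce after the fact. The sketch below assumes the persona agent's reply is a JSON string matching the schema above. `drop_opinions` is a hypothetical helper. Setting rejected findings aside, rather than asking the agent to repair them, is this sketch's own choice.

```python
import json

VERDICTS = {"completed", "completed-with-friction", "stuck", "abandoned"}
FRICTIONS = {
    "dead-end", "cant-find", "cropped-content", "tap-target",
    "lost-context", "unexplained-term", "misleading-label", "no-feedback",
}


def drop_opinions(raw: str) -> tuple[dict, list[dict]]:
    """Split a persona report into (report with only actionable findings, rejected findings).

    A finding is kept only when it names the action it blocked (`attempted`)
    and uses one of the schema's friction categories.
    """
    report = json.loads(raw)
    if report.get("verdict") not in VERDICTS:
        raise ValueError(f"unknown verdict {report.get('verdict')!r}")
    kept, rejected = [], []
    for finding in report.get("findings", []):
        names_action = bool(str(finding.get("attempted", "")).strip())
        if names_action and finding.get("friction") in FRICTIONS:
            kept.append(finding)
        else:
            rejected.append(finding)
    return {**report, "findings": kept}, rejected
```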

## Phase 3: Collate and present

Merge findings; when screens and personas flag the same spot, say so (highest
confidence). Lead with `broken`/`stuck`. Cross-reference open `area:ui` issues
(`gh-issue list`) — mark findings that are already filed instead of re-reporting them.
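The sketch below shows the merge and ordering. It rests on one loudly flagged assumption. Screens reports are keyed by page name, while persona findings carry a URL. The hypothetical `page_of` therefore maps a URL to its first path segment and treats that as the page. Matching at page level is only a coarse stand-in for "the same spot". The `area:ui` issue cross-check is left out.

```python
from urllib.parse import urlparse


def page_of(url: str) -> str:
    """ASSUMPTION: a persona URL's first path segment names the screens page.

    "/apps/demo_stimulator/overview" -> "apps"; the root path maps to the
    hypothetical page name "home".
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "home"


def collate(screens: list[dict], personas: list[dict]) -> list[dict]:
    """Flatten screens and persona reports into one list, most urgent first."""
    screen_pages = {r["page"] for r in screens if r["findings"]}
    persona_pages = {page_of(f["url"]) for r in personas for f in r["findings"]}
    both = screen_pages & persona_pages  # flagged by both modes: highest confidence
    rows = []
    for r in screens:
        for f in r["findings"]:
            rows.append({**f, "source": "screens", "page": r["page"],
                         "urgent": f["severity"] == "broken",
                         "both_modes": r["page"] in both})
    for r in personas:
        for f in r["findings"]:
            page = page_of(f["url"])
            rows.append({**f, "source": f"persona:{r['persona']}", "page": page,
                         "urgent": r["verdict"] == "stuck",
                         "both_modes": page in both})
    # Lead with broken/stuck, then double-flagged pages; the sort is stable otherwise.
    rows.sort(key=lambda row: (not row["urgent"], not row["both_modes"]))
    return rows
```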

End with verdict tables (`—` = not run):

| Page | broken | degraded | polish |
|------|--------|----------|--------|

| Persona | Verdict | Findings |
|---------|---------|----------|

Then ask the user: fix the quick wins inline, file issues for the rest, or both.
Tear down the demo stack when done (orphaned stacks thrash the machine).
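The two verdict tables can be rendered directly from the collected reports. The sketch below assumes `all_pages` and `all_personas` (hypothetical parameters) list everything in scope. Any page or persona without a report gets the not-run marker.

```python
from collections import Counter

NOT_RUN = "\u2014"  # the em dash the tables use for "not run"
SEVERITIES = ("broken", "degraded", "polish")


def page_table(screens: list[dict], all_pages: list[str]) -> str:
    counts = {r["page"]: Counter(f["severity"] for f in r["findings"]) for r in screens}
    lines = ["| Page | broken | degraded | polish |", "|------|--------|----------|--------|"]
    for page in all_pages:
        if page in counts:
            cells = [str(counts[page][s]) for s in SEVERITIES]  # Counter gives 0 for absent keys
        else:
            cells = [NOT_RUN] * len(SEVERITIES)
        lines.append(f"| {page} | {' | '.join(cells)} |")
    return "\n".join(lines)


def persona_table(personas: list[dict], all_personas: list[str]) -> str:
    reports = {r["persona"]: r for r in personas}
    lines = ["| Persona | Verdict | Findings |", "|---------|---------|----------|"]
    for name in all_personas:
        r = reports.get(name)
        verdict = r["verdict"] if r else NOT_RUN
        found = str(len(r["findings"])) if r else NOT_RUN
        lines.append(f"| {name} | {verdict} | {found} |")
    return "\n".join(lines)
```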

## Design decisions

**Why a live demo instead of mocks?** Mock data renders idealized states; the demo's
example apps produce real tracebacks, real timing data, and a deliberately failing job.
Realism is load-bearing: a persona chasing fake data reports fake friction.

**Why personas are tasks, not page lists.** Per-page review can't see between pages —
and the costliest UI bugs (hidden pages, dead ends, lost context) live between pages.

**Why findings must cite a design rule or an attempted action.** LLM reviewers
hallucinate taste-based findings under pressure to produce output. Anchoring every
finding to `DESIGN_RULES.md`/`tokens.css` (screens) or a blocked action (personas) makes
findings checkable and keeps "I'd have used more padding" out of the report.

**Why sequential personas.** One shared Playwright MCP browser. Three sequential
personas cost ~15 minutes; debugging two agents interleaving navigation costs more.
83 changes: 83 additions & 0 deletions .claude/skills/ui-qa/references/harness.md
@@ -0,0 +1,83 @@
# UI QA Harness — Demo Stack Lifecycle

Everything in this skill runs against the demo environment, not mocks. The demo gives
agents real behavior: a live HA container, the example apps generating activity and a
deliberate failure (`demo_stimulator.sensor_health_check`), and a Vite dev server with
hot reload so CSS/TSX edits apply without rebuilds.

## Starting

```bash
# From the repo root (requires Docker; takes 60–90s)
uv run python scripts/hassette_demo.py
```

Run it in the background and poll its output for readiness. It prints machine-parseable
lines when up:

```text
DEMO_HA_URL=http://localhost:NNNNN
DEMO_HASSETTE_URL=http://localhost:NNNNN
DEMO_FRONTEND_URL=http://localhost:NNNNN
DEMO_HASSETTE_LOG=/tmp/hassette-demo-XXXX/hassette.log
DEMO_VITE_LOG=/tmp/hassette-demo-XXXX/vite.log
DEMO_READY=true
```

Use `DEMO_FRONTEND_URL` for all browser work. `DEMO_HASSETTE_URL` is the REST API
(useful for `/api/health`, app start/stop).
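The background start, readiness poll, and teardown can all be scripted. This sketch assumes only the `KEY=value` output format shown above and the teardown behavior described under Gotchas. The function names, the log-file approach, and the 180 s and 60 s limits are illustrative choices, not part of `hassette_demo.py`.

```python
import signal
import subprocess
import time
from pathlib import Path

CONTAINER_PREFIX = "hassette-demo-ha-"


def parse_demo_lines(lines: list[str]) -> dict[str, str]:
    """Collect the DEMO_*=value lines the demo script prints."""
    info = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith("DEMO_"):
            info[key] = value
    return info


def start_demo(cmd: list[str], log_path: Path, timeout: float = 180.0, poll: float = 1.0):
    """Start the demo in the background and poll its log until DEMO_READY=true.

    Output goes to a file rather than a pipe, so a chatty process never
    blocks on a full pipe buffer. Returns (process, parsed DEMO_* values).
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        exited = proc.poll() is not None  # check first so the read below sees all output
        info = parse_demo_lines(log_path.read_text().splitlines())
        if info.get("DEMO_READY") == "true":
            return proc, info
        if exited:
            raise RuntimeError(f"demo exited with code {proc.returncode} before DEMO_READY")
        time.sleep(poll)
    proc.terminate()
    proc.wait()
    raise TimeoutError(f"no DEMO_READY=true within {timeout}s; see {log_path}")


def stop_demo(proc: subprocess.Popen, grace: float = 60.0) -> int:
    """SIGTERM lets the script clean up Vite, hassette and the HA container."""
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # skips the script's cleanup; check for leftover containers
        return proc.wait()


def leftover_containers(names_output: str) -> list[str]:
    """Filter `docker ps -a --format '{{.Names}}'` output down to demo HA containers."""
    return [n for n in names_output.split() if n.startswith(CONTAINER_PREFIX)]
```

For example, run `start_demo(["uv", "run", "python", "scripts/hassette_demo.py"], Path(tmpdir) / "demo.log")` from the repo root. The script's output may be block-buffered when redirected to a file. If so, setting `PYTHONUNBUFFERED=1` in its environment may be needed for the lines to appear promptly. Any names `leftover_containers` returns can go to `docker rm -f`.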

## Gotchas (each of these has burned a session)

- **Stale telemetry**: `.demo-data/` persists between runs. If the dashboard shows
hours-old errors or inflated counts, stop the stack, `rm -rf .demo-data`, restart.
- **Editing app source requires a stack restart.** Reloading a failed app via
`POST /api/apps/{key}/reload` re-runs the cached module — the traceback will show the
*new* source lines while executing the *old* code (#1005). Do not chase that ghost;
restart the whole demo script.
- **Failure data takes ~2 minutes.** `demo_stimulator`'s failing job needs a few cycles
before error spotlights, sparklines, and log volume look representative. Don't
screenshot or dispatch personas immediately at `DEMO_READY`.
- **Theme is localStorage**, key `hassette:theme`, value `"light"` or `"dark"`
(JSON-encoded string — the quotes are part of the value).
- **Teardown**: SIGTERM the script and it cleans up Vite, hassette, and the HA
container. If a `hassette-demo-ha-*` container survives, `docker rm -f` it.
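The theme gotcha above involves double encoding, which is easy to get wrong by hand. The small sketch below builds the JavaScript that sets the theme. `theme_init_script` is a hypothetical helper, and it relies only on the key and value format stated above.

```python
import json

THEME_KEY = "hassette:theme"


def theme_init_script(theme: str) -> str:
    """Return JavaScript that stores `theme` under the UI's localStorage key.

    The stored value is itself JSON, so json.dumps runs twice: once to make
    the value '"dark"' (quotes included), once to embed that value as a JS
    string literal.
    """
    if theme not in ("light", "dark"):
        raise ValueError(f"theme must be 'light' or 'dark', got {theme!r}")
    stored_value = json.dumps(theme)
    return f"localStorage.setItem({json.dumps(THEME_KEY)}, {json.dumps(stored_value)});"
```

Playwright's Python API offers `page.add_init_script()`. Passing a snippet like this to it before the first navigation should make the page render in that theme from the start. In an already-open browser, evaluating the snippet and then reloading should have the same effect.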

## Screenshot matrix

```bash
uv run python tools/frontend/ui_qa_capture.py --base-url $DEMO_FRONTEND_URL --output-dir $TMPDIR/shots
```

Captures pages × viewports (320/375/768/900/1280) × themes. Filter with `--pages`,
`--viewports`, `--themes` when the change under review is scoped (e.g. only
`--pages logs --viewports 320 375` for a mobile logs fix). A full matrix is ~70 images —
scope it unless the request is a full audit.
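Here is the arithmetic behind "scope it". Five viewports times two themes (light and dark) is 10 shots per page, so ~70 images implies roughly seven pages. That page count is inferred from the numbers above, not taken from the tool. The sketch below only enumerates combinations; it does not reimplement `ui_qa_capture.py`.

```python
from itertools import product

VIEWPORTS = (320, 375, 768, 900, 1280)  # floor, phone, two breakpoints, desktop
THEMES = ("light", "dark")


def capture_matrix(pages, viewports=VIEWPORTS, themes=THEMES):
    """List every (page, viewport, theme) combination a capture run would cover."""
    return list(product(pages, viewports, themes))
```

The mobile logs example, `capture_matrix(["logs"], viewports=(320, 375))`, gives 4 shots (both themes), compared with ~70 for the full run.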

The breakpoints matter: 768px and 900px are the responsive boundaries
(`frontend/DESIGN_RULES.md`), 320px is the floor, 375px is the standard phone, 1280px is
desktop.

## Project context for analysis agents

Feed analysis subagents these sources rather than letting them invent design opinions:

| Source | What it defines |
|--------|-----------------|
| `frontend/DESIGN_RULES.md` | Responsive rules, table behavior, density, hierarchy |
| `frontend/src/tokens.css` | All design tokens — anything not derived from these is a finding |
| `design/interface-design/` | Design system specification |

## Verification battery (after any fix)

```bash
# Subshell keeps the cd from leaking: the tools/ paths below are relative to the repo root
(cd frontend && npx tsc --noEmit && npm run lint && npx prettier --check 'src/**/*.{ts,tsx,css}' && npx vitest run)
uv run python tools/frontend/check_global_css_allowlist.py
uv run python tools/frontend/check_dead_global_css.py
uv run python tools/frontend/check_css_module_globals.py
uv run python tools/frontend/check_undefined_css_refs.py
timeout 580 uv run pytest -m e2e -n auto
```

Two e2e drawer-backdrop tests are flaky under `-n auto` (#1006) — rerun failures in
isolation before treating them as regressions.
75 changes: 75 additions & 0 deletions .claude/skills/ui-qa/references/personas.md
@@ -0,0 +1,75 @@
# UI QA Personas

Each persona is a *task*, not a page list. The agent gets a goal and discovers the UI the
way a real user would — that's what surfaces navigation dead ends and missing
affordances that per-page screenshot critique cannot. (Both headline findings of the
first UI polish session — mobile column cropping and the hidden diagnostics page — were
persona-findable and not guard-script-findable.)

Personas drive the live demo UI through Playwright. They report friction, not opinions:
every finding must name the action they attempted and what blocked or slowed it.

---

## Morgan — the 2am phone responder

**Who**: Runs Hassette at home. An automation misbehaved while they were asleep; they're
checking from their phone, in bed, annoyed.

**Viewport**: 375×812. Never resize. Touch-style interaction — no hover.

**Task**: "Something woke you up that shouldn't have. Find out which automation is
failing, what the error is, and when it last fired. You'll fix the code tomorrow — right
now you just want to know what broke and whether you can stop it from your phone."

**Friction lens**: Anything that requires horizontal scrolling, tap targets that are easy
to miss, content cropped or truncated past comprehension, information that takes more
than ~3 taps to reach, actions (stop/reload) that aren't reachable on mobile.

---

## Riley — the new user on day one

**Who**: Just installed Hassette following the getting-started guide, copied the example
apps, opened the web UI for the first time. No mental model of the page structure yet.

**Viewport**: 1280×800.

**Task**: "You just started Hassette. Confirm everything is healthy — the framework
itself and your apps. Then figure out what each running app actually does and when it
last did anything. You don't know what any page is called; explore."

**Friction lens**: Unexplained jargon (handlers? invocations? listeners?), pages or data
whose purpose isn't self-evident, empty/zero states that read as broken, anything where
Riley can't tell whether what they see is good news or bad news.

---

## Devon — the power user mid-debug

**Who**: Has run Hassette for months, writes their own apps, comfortable with logs and
tracebacks. One handler is failing and Devon wants the full picture before opening an
editor.

**Viewport**: 1280×800. Uses keyboard shortcuts when offered.

**Task**: "`sensor_health_check` in demo_stimulator is failing. Build the complete story:
the exception and traceback, how often and since when it fails, whether other handlers in
the app are affected, and the log lines around a recent failure. Move between related
views — handler detail, app detail, logs — and note every place where the next hop is
missing or makes you re-enter context (re-filter, re-search, re-navigate)."

**Friction lens**: Dead ends between related data, lost filter/context on navigation,
missing links from an entity to its logs/executions, information that exists somewhere
but isn't linked from where you'd look for it.

---

## Persona selection

| Change under review | Personas |
|---------------------|----------|
| Mobile/responsive work | Morgan |
| New pages, navigation, information architecture | Riley + Devon |
| Error display, telemetry, log views | Devon |
| Full audit / "how's the UI?" | All three |
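A change may span several rows of this table. One reasonable reading is to take the union of their personas. The sketch below uses short hypothetical keys in place of the rows. The resulting list could double as the order for dispatching them one at a time.

```python
# Hypothetical short keys for the rows of the selection table.
PERSONA_SELECTION = {
    "mobile": ["Morgan"],
    "navigation": ["Riley", "Devon"],
    "errors-telemetry-logs": ["Devon"],
    "full-audit": ["Morgan", "Riley", "Devon"],
}


def select_personas(change_kinds: list[str]) -> list[str]:
    """Union of the personas for each kind of change, deduplicated, first-seen order."""
    chosen: list[str] = []
    for kind in change_kinds:
        for persona in PERSONA_SELECTION[kind]:
            if persona not in chosen:
                chosen.append(persona)
    return chosen
```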
Binary file modified docs/_static/web_ui_app_detail_code.png
Binary file modified docs/_static/web_ui_app_detail_config.png
Binary file modified docs/_static/web_ui_app_detail_handlers.png
Binary file modified docs/_static/web_ui_app_detail_overview.png
Binary file modified docs/_static/web_ui_apps.png
Binary file modified docs/_static/web_ui_config.png
Binary file modified docs/_static/web_ui_detail_command_palette.png
Binary file modified docs/_static/web_ui_detail_log_drawer.png
Binary file modified docs/_static/web_ui_detail_sidebar.png
Binary file modified docs/_static/web_ui_detail_status_bar.png
Binary file added docs/_static/web_ui_diagnostics.png
Binary file modified docs/_static/web_ui_handlers.png
Binary file modified docs/_static/web_ui_logs.png
51 changes: 51 additions & 0 deletions docs/pages/web-ui/diagnostics.md
@@ -0,0 +1,51 @@
# Check Framework Health

The Diagnostics page (sidebar > diagnostics) answers one question: is the framework
itself healthy? It covers Hassette's internal services, startup issues, and telemetry
pipeline health — the layer below your apps.

![Diagnostics page](../../_static/web_ui_diagnostics.png)

## Stats strip

The strip at the top summarizes the page in four numbers:

| Cell | Meaning |
|------|---------|
| services | Total internal services registered |
| running | Services currently in the `running` state — green when all are running, amber otherwise |
| boot issues | Problems detected during startup — red when non-zero |
| drops | Telemetry records dropped across all categories — amber when non-zero |

## Services

The services panel lists every internal service (Bus, Scheduler, Api, DatabaseService,
and the rest) as a compact grid. A healthy service shows only its name and a green dot —
status text appears when there is something to say.

Services that are not running sort to the top and span the full row, showing their status,
readiness phase, and — for a service in cooldown after repeated failures — when the
supervisor will retry. A failed service with a captured exception gets a "show exception"
toggle that expands the full traceback inline.

Service states update live over the WebSocket connection. When the connection drops, a
`stale` badge appears next to the panel heading and the data reflects the last known state.

## Boot issues

The boot issues panel appears only when startup produced warnings or errors — a missing
app directory, an app that failed to import, a config problem. Issues sort errors-first,
each with a label and detail text. A clean startup renders no panel; the stats strip's
zero is the confirmation.

## Telemetry health

The telemetry panel appears when the telemetry pipeline is degraded or has dropped
records. Drop counters are broken out by cause: buffer overflow, failed writes, drops
during shutdown, and error-handler failures. A degraded banner means writes may be
failing or the database is unavailable — some historical data may be missing.

## Related pages

- [Web UI Overview](index.md) — layout, navigation, and alert banners
- [Configure Health Checks](health-endpoints.md) — the REST endpoints behind this page
1 change: 1 addition & 0 deletions docs/pages/web-ui/index.md
@@ -76,4 +76,5 @@ The **command palette** opens with Ctrl+K or Cmd+K. It jumps to pages, apps, han
- **[Debug a Failing Handler](debug-handler.md)**: find why a handler is not firing or is throwing errors
- **[Read and Filter Logs](logs.md)**: search, filter, and stream logs in real time
- **[Inspect Configuration and Code](inspect-config-code.md)**: view global and per-app config, read app source
- **[Check Framework Health](diagnostics.md)**: confirm internal services are running, see boot issues and telemetry drops
- **[Configure Health Checks](health-endpoints.md)**: choose the right endpoint for restart automation, traffic routing, or monitoring
17 changes: 6 additions & 11 deletions docs/screenshots.yml
@@ -84,6 +84,12 @@
height: 1656
wait: 2000

- url: "http://localhost:{port}/diagnostics"
output: docs/_static/web_ui_diagnostics.png
width: 1400
height: 900
wait: 2000

# demo_stimulator has active handlers, errors, and timing data — more informative than motion_lights
- url: "http://localhost:{port}/apps/demo_stimulator/overview"
output: docs/_static/web_ui_app_detail_overview.png
@@ -140,17 +146,6 @@
wait_for: "document.querySelector('[data-testid=\"cmd-palette\"]')"
javascript: "document.dispatchEvent(new KeyboardEvent('keydown', {key: 'k', ctrlKey: true, bubbles: true}))"

# Taller viewport so the popover (anchored to the bottom-right grid button) has room
- url: "http://localhost:{port}/logs"
output: docs/_static/web_ui_detail_column_picker.png
selector: "[data-testid='column-picker-popover']"
padding: 8
width: 1400
height: 1100
wait: 2000
wait_for: "document.querySelector('[data-testid=\"column-picker-popover\"]')"
javascript: "document.querySelector('[data-testid=\"column-picker\"]')?.click()"

- url: "http://localhost:{port}/apps/demo_stimulator/overview"
output: docs/_static/web_ui_detail_error_spotlight.png
selector: "[data-testid='overview-error-spotlight']"