NodeJSmith · Jun 11, 2026
diff --git a/.claude/skills/ui-qa/SKILL.md b/.claude/skills/ui-qa/SKILL.md
@@ -0,0 +1,140 @@
+---
+name: ui-qa
+description: Use when the user says "ui qa", "visual QA the frontend", "polish pass on the UI", "run the UI personas", or after changes that affect rendering. Agent-driven QA of the web UI against the live demo stack — screenshot matrix critique plus task-driven persona walkthroughs.
+---
+
+# UI QA
+
+Agent-driven QA of the Hassette web UI against the live demo stack. Two modes that catch
+different bug classes: **screens** (screenshot matrix + design-rule critique — visual
+defects) and **personas** (task-driven walkthroughs — navigation dead ends and missing
+affordances). The CSS guard scripts in `tools/` already catch structural drift
+mechanically; this skill covers what only rendering and usage can reveal.
+
+Use when the user says: "ui qa", "visual QA the frontend", "polish pass on the UI",
+"run the UI personas", "review the UI", or after any change that affects rendering.
+
+## Arguments
+
+- empty or `all` — both modes, full scope
+- `screens [pages...]` — screenshot critique only, optionally scoped to pages
+- `personas [names...]` — walkthroughs only, optionally scoped to personas
+- a description of a change ("the logs table at mobile") — infer the scope: pick
+  affected pages/viewports, then choose personas from the selection table in `references/personas.md`
+
+## Phase 1: Environment
+
+Read `references/harness.md` and follow it: start the demo stack in the background, wait
+for `DEMO_READY=true`, then **wait ~2 minutes more** for failure/activity data before
+capturing anything. Get a tmpdir via `get-skill-tmpdir ui-qa`.
+
+Skip the wait only when the stack is already running from earlier in the session.
+
+## Phase 2a: Screens mode
+
+1. Capture the matrix (scoped — full matrix only for explicit full audits):
+
+   ```bash
+   uv run python tools/frontend/ui_qa_capture.py --base-url "$DEMO_FRONTEND_URL" --output-dir "$TMPDIR/shots" [--pages ...] [--viewports ...]
+   ```
+
+2. Dispatch one **Sonnet** analysis subagent per page (all viewports/themes of that page
+   to one agent, so it can compare breakpoints). The prompt names the screenshot files,
+   tells the agent to Read them, and includes the paths to `frontend/DESIGN_RULES.md` and
+   `frontend/src/tokens.css` as the standard to judge against — findings must cite a
+   rule or token, not taste. Enforce with `schema`:
+
+   ```json
+   {
+     "type": "object",
+     "properties": {
+       "page": {"type": "string"},
+       "findings": {"type": "array", "items": {"type": "object", "properties": {
+         "viewport": {"type": "string"},
+         "theme": {"type": "string"},
+         "severity": {"type": "string", "enum": ["broken", "degraded", "polish"]},
+         "description": {"type": "string"},
+         "design_rule": {"type": "string"},
+         "suggestion": {"type": "string"}
+       }, "required": ["viewport", "theme", "severity", "description", "design_rule", "suggestion"]}}
+     },
+     "required": ["page", "findings"]
+   }
+   ```
+
+   Severity: **broken** = content unusable (cropped, overlapping, unreadable);
+   **degraded** = works but violates a stated design rule; **polish** = defensible but
+   improvable. An agent reporting zero findings for a page is a valid result — do not
+   prompt for a minimum count (that manufactures findings).
+
+   Screenshots are the sweep, not a wall: when a finding hinges on behavior a static
+   image can't show (does truncated text expand on tap? does this region scroll?), the
+   analysis agent should load the live page via Playwright and check — include
+   `DEMO_FRONTEND_URL` in its prompt. Same shared-browser constraint as personas:
+   agents that go interactive must run sequentially.
+
+## Phase 2b: Personas mode
+
+1. Read `references/personas.md`; select personas per its table (or the user's scope).
+2. Dispatch personas **sequentially, not in parallel** — they share the one Playwright
+   MCP browser, and parallel agents fight over it. Each subagent prompt contains: the
+   persona block verbatim, `DEMO_FRONTEND_URL`, the instruction to set the viewport
+   first and stay in character, and a hard cap (~25 browser actions) so a stuck persona
+   reports "stuck" instead of wandering. Enforce with `schema`:
+
+   ```json
+   {
+     "type": "object",
+     "properties": {
+       "persona": {"type": "string"},
+       "verdict": {"type": "string", "enum": ["completed", "completed-with-friction", "stuck", "abandoned"]},
+       "path": {"type": "array", "items": {"type": "string"}},
+       "findings": {"type": "array", "items": {"type": "object", "properties": {
+         "url": {"type": "string"},
+         "attempted": {"type": "string"},
+         "friction": {"type": "string", "enum": ["dead-end", "cant-find", "cropped-content", "tap-target", "lost-context", "unexplained-term", "misleading-label", "no-feedback"]},
+         "description": {"type": "string"},
+         "suggestion": {"type": "string"}
+       }, "required": ["url", "attempted", "friction", "description", "suggestion"]}},
+       "summary": {"type": "string"}
+     },
+     "required": ["persona", "verdict", "path", "findings", "summary"]
+   }
+   ```
+
+   `attempted` is mandatory by design: a finding without the action it blocked is an
+   opinion, and opinions are out of scope.
+
+## Phase 3: Collate and present
+
+Merge findings; when screens and personas flag the same spot, say so (highest
+confidence). Lead with `broken`/`stuck`. Cross-reference open `area:ui` issues
+(`gh-issue list`) — mark findings that are already filed instead of re-reporting them.
+
+End with verdict tables (`—` = not run):
+
+| Page | broken | degraded | polish |
+|------|--------|----------|--------|
+
+| Persona | Verdict | Findings |
+|---------|---------|----------|
+
+Then ask the user: fix the quick wins inline, file issues for the rest, or both.
+Tear down the demo stack when done (orphaned stacks thrash the machine).
+
+## Design decisions
+
+**Why a live demo instead of mocks?** Mock data renders idealized states; the demo's
+example apps produce real tracebacks, real timing data, and a deliberately failing job.
+Realism is load-bearing: a persona chasing fake data reports fake friction.
+
+**Why personas are tasks, not page lists.** Per-page review can't see between pages —
+and the costliest UI bugs (hidden pages, dead ends, lost context) live between pages.
+
+**Why findings must cite a design rule or an attempted action.** LLM reviewers
+hallucinate taste-based findings under pressure to produce output. Anchoring every
+finding to `DESIGN_RULES.md`/`tokens.css` (screens) or a blocked action (personas) makes
+findings checkable and keeps "I'd have used more padding" out of the report.
+
+**Why sequential personas.** One shared Playwright MCP browser. Three sequential
+personas cost ~15 minutes; debugging two agents interleaving navigation costs more.
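Note on Phase 3 of `SKILL.md`: the collate step can be sketched in a few lines of Python. This is a minimal illustration, not part of the PR. It assumes each analysis subagent returned an object shaped like the screens schema (`page` plus `findings` with a `severity` field). The function name `page_verdict_table` and the ordering rule (most `broken` first, then `degraded`, then `polish`) are assumptions made for the example.

```python
from collections import Counter

# Severity levels, in the column order of the verdict table in SKILL.md.
SEVERITIES = ("broken", "degraded", "polish")


def page_verdict_table(results):
    """Render the Page verdict table from per-page results.

    `results` is a list of dicts shaped like the screens schema. Only the
    `page` name and each finding's `severity` are read here.
    """
    lines = [
        "| Page | broken | degraded | polish |",
        "|------|--------|----------|--------|",
    ]
    rows = []
    for result in results:
        counts = Counter(finding["severity"] for finding in result["findings"])
        rows.append((result["page"], [counts.get(s, 0) for s in SEVERITIES]))
    # Assumed ordering: lead with the pages that have the most broken findings.
    rows.sort(key=lambda row: tuple(-n for n in row[1]))
    for page, counts in rows:
        lines.append(f"| {page} | " + " | ".join(str(n) for n in counts) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"page": "logs", "findings": [{"severity": "polish"}]},
        {"page": "apps", "findings": [{"severity": "broken"}, {"severity": "degraded"}]},
        {"page": "config", "findings": []},
    ]
    print(page_verdict_table(sample))
```

A page with zero findings still gets a row of zeros, which matches the rule that an empty result is valid.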
diff --git a/.claude/skills/ui-qa/references/harness.md b/.claude/skills/ui-qa/references/harness.md
@@ -0,0 +1,83 @@
+# UI QA Harness — Demo Stack Lifecycle
+
+Everything in this skill runs against the demo environment, not mocks. The demo gives
+agents real behavior: a live HA container, the example apps generating activity and a
+deliberate failure (`demo_stimulator.sensor_health_check`), and a Vite dev server with
+hot reload so CSS/TSX edits apply without rebuilds.
+
+## Starting
+
+```bash
+# From the repo root (requires Docker; takes 60–90s)
+uv run python scripts/hassette_demo.py
+```
+
+Run it in the background and poll its output for readiness. It prints machine-parseable
+lines when up:
+
+```text
+DEMO_HA_URL=http://localhost:NNNNN
+DEMO_HASSETTE_URL=http://localhost:NNNNN
+DEMO_FRONTEND_URL=http://localhost:NNNNN
+DEMO_HASSETTE_LOG=/tmp/hassette-demo-XXXX/hassette.log
+DEMO_VITE_LOG=/tmp/hassette-demo-XXXX/vite.log
+DEMO_READY=true
+```
+
+Use `DEMO_FRONTEND_URL` for all browser work. `DEMO_HASSETTE_URL` is the REST API
+(useful for `/api/health`, app start/stop).
+
+## Gotchas (each of these has burned a session)
+
+- **Stale telemetry**: `.demo-data/` persists between runs. If the dashboard shows
+  hours-old errors or inflated counts, stop the stack, `rm -rf .demo-data`, restart.
+- **Editing app source requires a stack restart.** Reloading a failed app via
+  `POST /api/apps/{key}/reload` re-runs the cached module — the traceback will show the
+  *new* source lines while executing the *old* code (#1005). Do not chase that ghost;
+  restart the whole demo script.
+- **Failure data takes ~2 minutes.** `demo_stimulator`'s failing job needs a few cycles
+  before error spotlights, sparklines, and log volume look representative. Don't
+  screenshot or dispatch personas immediately at `DEMO_READY`.
+- **Theme lives in localStorage**: key `hassette:theme`, value `"light"` or `"dark"`
+  (JSON-encoded string — the quotes are part of the value).
+- **Teardown**: SIGTERM the script and it cleans up Vite, hassette, and the HA
+  container. If a `hassette-demo-ha-*` container survives, `docker rm -f` it.
+
+## Screenshot matrix
+
+```bash
+uv run python tools/frontend/ui_qa_capture.py --base-url "$DEMO_FRONTEND_URL" --output-dir "$TMPDIR/shots"
+```
+
+Captures pages × viewports (320/375/768/900/1280) × themes. Filter with `--pages`,
+`--viewports`, `--themes` when the change under review is scoped (e.g. only
+`--pages logs --viewports 320 375` for a mobile logs fix). A full matrix is ~70 images —
+scope it unless the request is a full audit.
+
+The breakpoints matter: 768px and 900px are the responsive boundaries
+(`frontend/DESIGN_RULES.md`), 320px is the floor, 375px is the standard phone, 1280px is
+desktop.
+
+## Project context for analysis agents
+
+Feed analysis subagents these sources rather than letting them invent design opinions:
+
+| Source | What it defines |
+|--------|-----------------|
+| `frontend/DESIGN_RULES.md` | Responsive rules, table behavior, density, hierarchy |
+| `frontend/src/tokens.css` | All design tokens — anything not derived from these is a finding |
+| `design/interface-design/` | Design system specification |
+
+## Verification battery (after any fix)
+
+```bash
+cd frontend && npx tsc --noEmit && npm run lint && npx prettier --check 'src/**/*.{ts,tsx,css}' && npx vitest run
+uv run python tools/frontend/check_global_css_allowlist.py
+uv run python tools/frontend/check_dead_global_css.py
+uv run python tools/frontend/check_css_module_globals.py
+uv run python tools/frontend/check_undefined_css_refs.py
+timeout 580 uv run pytest -m e2e -n auto
+```
+
+Two e2e drawer-backdrop tests are flaky under `-n auto` (#1006) — rerun failures in
+isolation before treating them as regressions.
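Note on the "Starting" section of `harness.md`: polling for readiness amounts to scanning the script's output for the `DEMO_*` lines. The sketch below is illustrative only. The helper names `parse_demo_lines` and `is_ready` are hypothetical, and the port numbers in the sample are placeholders, since the real ports are assigned at startup.

```python
def parse_demo_lines(output):
    """Collect DEMO_* KEY=VALUE lines from the demo script's output into a dict."""
    values = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("DEMO_") or "=" not in line:
            continue  # ordinary log output, not a machine-parseable line
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, _, value = line.partition("=")
        values[key] = value
    return values


def is_ready(values):
    """The stack is up once DEMO_READY=true has been printed."""
    return values.get("DEMO_READY") == "true"


if __name__ == "__main__":
    # Placeholder ports: the real values come from the running script.
    sample = (
        "Starting containers...\n"
        "DEMO_FRONTEND_URL=http://localhost:5173\n"
        "DEMO_READY=true\n"
    )
    values = parse_demo_lines(sample)
    print(values["DEMO_FRONTEND_URL"], is_ready(values))
```

Readiness here only means the stack is up. The extra wait of about two minutes for failure data still applies before capturing or dispatching personas.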
diff --git a/.claude/skills/ui-qa/references/personas.md b/.claude/skills/ui-qa/references/personas.md
@@ -0,0 +1,75 @@
+# UI QA Personas
+
+Each persona is a *task*, not a page list. The agent gets a goal and discovers the UI the
+way a real user would — that's what surfaces navigation dead ends and missing
+affordances that per-page screenshot critique cannot. (Both headline findings of the
+first UI polish session — mobile column cropping and the hidden diagnostics page — were
+persona-findable and not guard-script-findable.)
+
+Personas drive the live demo UI through Playwright. They report friction, not opinions:
+every finding must name the action they attempted and what blocked or slowed it.
+
+---
+
+## Morgan — the 2am phone responder
+
+**Who**: Runs Hassette at home. An automation misbehaved while they were asleep; they're
+checking from their phone, in bed, annoyed.
+
+**Viewport**: 375×812. Never resize. Touch-style interaction — no hover.
+
+**Task**: "Something woke you up that shouldn't have. Find out which automation is
+failing, what the error is, and when it last fired. You'll fix the code tomorrow — right
+now you just want to know what broke and whether you can stop it from your phone."
+
+**Friction lens**: Anything that requires horizontal scrolling, tap targets that are easy
+to miss, content cropped or truncated past comprehension, information that takes more
+than ~3 taps to reach, actions (stop/reload) that aren't reachable on mobile.
+
+---
+
+## Riley — the new user on day one
+
+**Who**: Just installed Hassette following the getting-started guide, copied the example
+apps, opened the web UI for the first time. No mental model of the page structure yet.
+
+**Viewport**: 1280×800.
+
+**Task**: "You just started Hassette. Confirm everything is healthy — the framework
+itself and your apps. Then figure out what each running app actually does and when it
+last did anything. You don't know what any page is called; explore."
+
+**Friction lens**: Unexplained jargon (handlers? invocations? listeners?), pages or data
+whose purpose isn't self-evident, empty/zero states that read as broken, anything where
+Riley can't tell whether what they see is good news or bad news.
+
+---
+
+## Devon — the power user mid-debug
+
+**Who**: Has run Hassette for months, writes their own apps, comfortable with logs and
+tracebacks. One handler is failing and Devon wants the full picture before opening an
+editor.
+
+**Viewport**: 1280×800. Uses keyboard shortcuts when offered.
+
+**Task**: "`sensor_health_check` in demo_stimulator is failing. Build the complete story:
+the exception and traceback, how often and since when it fails, whether other handlers in
+the app are affected, and the log lines around a recent failure. Move between related
+views — handler detail, app detail, logs — and note every place where the next hop is
+missing or makes you re-enter context (re-filter, re-search, re-navigate)."
+
+**Friction lens**: Dead ends between related data, lost filter/context on navigation,
+missing links from an entity to its logs/executions, information that exists somewhere
+but isn't linked from where you'd look for it.
+
+---
+
+## Persona selection
+
+| Change under review | Personas |
+|---------------------|----------|
+| Mobile/responsive work | Morgan |
+| New pages, navigation, information architecture | Riley + Devon |
+| Error display, telemetry, log views | Devon |
+| Full audit / "how's the UI?" | All three |
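Note on the persona selection table in `personas.md`: the table reads naturally as a lookup. The sketch below is one possible encoding, not something the PR ships. The short keys (`mobile`, `navigation`, `telemetry`, `full-audit`) are hypothetical labels for the four table rows, and taking the union when a change spans several rows is an assumption.

```python
# Hypothetical short labels for the rows of the persona selection table.
PERSONAS_BY_CHANGE = {
    "mobile": ["Morgan"],                          # Mobile/responsive work
    "navigation": ["Riley", "Devon"],              # New pages, navigation, IA
    "telemetry": ["Devon"],                        # Error display, telemetry, log views
    "full-audit": ["Morgan", "Riley", "Devon"],    # Full audit
}


def select_personas(change_kinds):
    """Return the personas to run for the given kinds of change.

    Assumption: a change that matches several rows runs the union of their
    personas, listed in the order the personas appear in the document.
    """
    order = PERSONAS_BY_CHANGE["full-audit"]
    chosen = set()
    for kind in change_kinds:
        chosen.update(PERSONAS_BY_CHANGE[kind])
    return [name for name in order if name in chosen]


if __name__ == "__main__":
    print(select_personas(["mobile", "telemetry"]))
```

Because personas run sequentially against one shared browser, the returned list can double as the dispatch order.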
diff --git a/docs/_static/web_ui_app_detail_code.png b/docs/_static/web_ui_app_detail_code.png
diff --git a/docs/_static/web_ui_app_detail_config.png b/docs/_static/web_ui_app_detail_config.png
diff --git a/docs/_static/web_ui_app_detail_handlers.png b/docs/_static/web_ui_app_detail_handlers.png
diff --git a/docs/_static/web_ui_app_detail_overview.png b/docs/_static/web_ui_app_detail_overview.png
diff --git a/docs/_static/web_ui_apps.png b/docs/_static/web_ui_apps.png
diff --git a/docs/_static/web_ui_config.png b/docs/_static/web_ui_config.png
diff --git a/docs/_static/web_ui_detail_command_palette.png b/docs/_static/web_ui_detail_command_palette.png
diff --git a/docs/_static/web_ui_detail_log_drawer.png b/docs/_static/web_ui_detail_log_drawer.png
diff --git a/docs/_static/web_ui_detail_sidebar.png b/docs/_static/web_ui_detail_sidebar.png
diff --git a/docs/_static/web_ui_detail_status_bar.png b/docs/_static/web_ui_detail_status_bar.png
diff --git a/docs/_static/web_ui_diagnostics.png b/docs/_static/web_ui_diagnostics.png
diff --git a/docs/_static/web_ui_handlers.png b/docs/_static/web_ui_handlers.png
diff --git a/docs/_static/web_ui_logs.png b/docs/_static/web_ui_logs.png
diff --git a/docs/pages/web-ui/diagnostics.md b/docs/pages/web-ui/diagnostics.md
@@ -0,0 +1,51 @@
+# Check Framework Health
+
+The Diagnostics page (sidebar > diagnostics) answers one question: is the framework
+itself healthy? It covers Hassette's internal services, startup issues, and telemetry
+pipeline health — the layer below your apps.
+
+![Diagnostics page](../../_static/web_ui_diagnostics.png)
+
+## Stats strip
+
+The strip at the top summarizes the page in four numbers:
+
+| Cell | Meaning |
+|------|---------|
+| services | Total internal services registered |
+| running | Services currently in the `running` state — green when all are running, amber otherwise |
+| boot issues | Problems detected during startup — red when non-zero |
+| drops | Telemetry records dropped across all categories — amber when non-zero |
+
+## Services
+
+The services panel lists every internal service (Bus, Scheduler, Api, DatabaseService,
+and the rest) as a compact grid. A healthy service shows only its name and a green dot —
+status text appears when there is something to say.
+
+Services that are not running sort to the top and span the full row, showing their status,
+readiness phase, and — for a service in cooldown after repeated failures — when the
+supervisor will retry. A failed service with a captured exception gets a "show exception"
+toggle that expands the full traceback inline.
+
+Service states update live over the WebSocket connection. When the connection drops, a
+`stale` badge appears next to the panel heading and the data reflects the last known state.
+
+## Boot issues
+
+The boot issues panel appears only when startup produced warnings or errors — a missing
+app directory, an app that failed to import, a config problem. Issues sort errors-first,
+each with a label and detail text. A clean startup renders no panel; the stats strip's
+zero is the confirmation.
+
+## Telemetry health
+
+The telemetry panel appears when the telemetry pipeline is degraded or has dropped
+records. Drop counters are broken out by cause: buffer overflow, failed writes, drops
+during shutdown, and error-handler failures. A degraded banner means writes may be
+failing or the database is unavailable — some historical data may be missing.
+
+## Related pages
+
+- [Web UI Overview](index.md) — layout, navigation, and alert banners
+- [Configure Health Checks](health-endpoints.md) — the REST endpoints behind this page
diff --git a/docs/pages/web-ui/index.md b/docs/pages/web-ui/index.md
@@ -76,4 +76,5 @@ The **command palette** opens with Ctrl+K or Cmd+K. It jumps to pages, apps, han
 - **[Debug a Failing Handler](debug-handler.md)**: find why a handler is not firing or is throwing errors
 - **[Read and Filter Logs](logs.md)**: search, filter, and stream logs in real time
 - **[Inspect Configuration and Code](inspect-config-code.md)**: view global and per-app config, read app source
+- **[Check Framework Health](diagnostics.md)**: confirm internal services are running, see boot issues and telemetry drops
 - **[Configure Health Checks](health-endpoints.md)**: choose the right endpoint for restart automation, traffic routing, or monitoring
diff --git a/docs/screenshots.yml b/docs/screenshots.yml
@@ -84,6 +84,12 @@
   height: 1656
   wait: 2000
 
+- url: "http://localhost:{port}/diagnostics"
+  output: docs/_static/web_ui_diagnostics.png
+  width: 1400
+  height: 900
+  wait: 2000
+
 # demo_stimulator has active handlers, errors, and timing data — more informative than motion_lights
 - url: "http://localhost:{port}/apps/demo_stimulator/overview"
   output: docs/_static/web_ui_app_detail_overview.png
@@ -140,17 +146,6 @@
   wait_for: "document.querySelector('[data-testid=\"cmd-palette\"]')"
   javascript: "document.dispatchEvent(new KeyboardEvent('keydown', {key: 'k', ctrlKey: true, bubbles: true}))"
 
-# Taller viewport so the popover (anchored to the bottom-right grid button) has room
-- url: "http://localhost:{port}/logs"
-  output: docs/_static/web_ui_detail_column_picker.png
-  selector: "[data-testid='column-picker-popover']"
-  padding: 8
-  width: 1400
-  height: 1100
-  wait: 2000
-  wait_for: "document.querySelector('[data-testid=\"column-picker-popover\"]')"
-  javascript: "document.querySelector('[data-testid=\"column-picker\"]')?.click()"
-
 - url: "http://localhost:{port}/apps/demo_stimulator/overview"
   output: docs/_static/web_ui_detail_error_spotlight.png
   selector: "[data-testid='overview-error-spotlight']"