diff --git a/README.md b/README.md
index 2ee0933..3ec95a4 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ AI skills for test management workflows with [Testomat.io](https://testomat.io).
 | Skill                  | Description                                                                                  |
 | ---------------------- | -------------------------------------------------------------------------------------------- |
 | `generate-test-cases`  | Generate test cases and checklists from requirements, tickets, or feature descriptions       |
+| `create-test-cases`    | Deep, feature-first manual test-case authoring with UI exploration, AC baselines, self-review gates, and worked examples |
 | `improve-test-cases`   | Analyze and improve existing markdown test cases for clarity                                 |
 | `find-duplicate-cases` | Find duplicate, near-duplicate, and overlapping test cases                                   |
 | `sync-cases`           | Synchronize Markdown test scenarios between local project and Testomat.io                    |
diff --git a/plugins/test-management/skills/create-test-cases b/plugins/test-management/skills/create-test-cases
new file mode 120000
index 0000000..ee54280
--- /dev/null
+++ b/plugins/test-management/skills/create-test-cases
@@ -0,0 +1 @@
+../../../skills/create-test-cases
\ No newline at end of file
diff --git a/skills/create-test-cases/SKILL.md b/skills/create-test-cases/SKILL.md
new file mode 100644
index 0000000..13f57f1
--- /dev/null
+++ b/skills/create-test-cases/SKILL.md
@@ -0,0 +1,174 @@
+---
+name: create-test-cases
+description: Create manual test cases as local MD files (does not publish). MUST USE when asked to "create test cases", "cover feature with tests", "write test cases", generate checklist, or expand test coverage. Two-phase flow: feature phase runs once (gathers shared UI + AC baseline + destructuring map), sub-feature phase runs per suite (generates deep-coverage test cases). After generation, user publishes with a separate skill (e.g. `sync-cases`, or the local `/publish-test-cases-batch`).
+license: MIT
+metadata:
+  author: Testomat.io
+allowed-tools: Read Grep Glob Write Bash Agent ToolSearch mcp__playwright__* mcp__testomatio__* mcp__github__*
+effort: max
+---
+
+# CREATE-TEST-CASES SKILL
+
+Creates manual test cases as local MD files. **Does not publish** — publishing is a separate skill (`/publish-test-cases-batch` — local for now, not yet part of this repo; alternatively use [`sync-cases`](../sync-cases/SKILL.md)) invoked after MD is approved.
+
+**Argument:** $ARGUMENTS — Testomat suite URL or feature name
+
+**References:** [intake-questionnaire.md](intake-questionnaire.md) | [artifacts.md](references/artifacts.md) | [testing-strategy.md](references/testing-strategy.md) | [test-case-format.md](references/test-case-format.md) | [self-review-checks.md](references/self-review-checks.md) | [product-context.md](references/product-context.md) | [destructuring.md](references/destructuring.md) (feature-phase procedure) | [troubleshooting.md](references/troubleshooting.md)
+
+**Step contract template** — every step file declares this block so you can see at a glance what must exist before, what is produced, and whether it's safe to re-run:
+- **Precondition:** artifacts that must exist and validate (refers to IDs in [artifacts.md](references/artifacts.md))
+- **Input:** values consumed from earlier artifacts / user
+- **Output:** artifacts written (with ID)
+- **Postcondition:** invariant that must hold after the step completes
+- **Idempotent:** yes / no — can this step be re-run safely?
+- **Retry policy:** what to do on transient failure; when to escalate
+
+---
+
+## Hierarchy Model — Feature-First
+
+| Concept | Testomat Entity | Example | Branch |
+|---------|----------------|---------|--------|
+| **Feature** (main functionality) | Folder | Manual Test Execution | — |
+| **Sub-feature** (part of feature) | Suite | Environment Configuration | `tc/{feature}/{suite-slug}` |
+| **Sub-feature section** (mandatory when 2+ natural sections of ≥3 tests exist) | Nested suite | CRUD Operations (inside Environment Configuration) | same branch |
+| **Test case** | Test | Select custom environment for a run | inside suite branch |
+
+**Feature-first architecture:** heavy gather-info runs ONCE per feature (Step 1 Feature Phase) and produces shared artifacts (`_ac-baseline.md`, `_shared-ui.md`, `_existing-steps.md`, `destructuring.md`). Each sub-feature then runs in its own conversation as a thin slice (Step 2) + generation (Step 3) + report (Step 4). This eliminates the AC/UI duplication that previously happened across sub-features.
+
+**Key rules:**
+- **Folder = feature.** One folder per feature. The folder is created at publish time by your publisher of choice (e.g. local `/publish-test-cases-batch`, or [`sync-cases`](../sync-cases/SKILL.md)) — this skill only lays out the local `test-cases/{feature}/` directory.
+- **Suite = sub-feature.** Each suite is a self-contained area of the feature with **full, deep coverage** at the depth chosen by Q2. See § Suite Depth Expectation.
+- **Nested suites** are used when a sub-feature has multiple logical sections. Full rule: [self-review-checks.md § 8](references/self-review-checks.md#8-sub-suite-distribution).
+- **Two-phase flow (always):** the feature phase (Step 1) runs once and HARD STOPs. Each sub-feature then runs the sub-feature phase (Steps 2-4) in a separate conversation. There is no "single-suite mode" — even a feature with one sub-feature runs the feature phase first, because the `_shared-ui.md` and `_ac-baseline.md` artifacts carry value even for a single suite.
+- **MD only — publishing is a separate skill.** This skill never pushes to Testomat. After Step 4 the user runs the publisher (local `/publish-test-cases-batch` — not yet part of this repo; or [`sync-cases`](../sync-cases/SKILL.md)) with the saved path.
+- **Whole-feature publish:** when all sub-feature checkboxes in `destructuring.md` flip to `[x]`, the user runs `/publish-test-cases-batch test-cases/{feature}/` (or equivalent) to push everything (one branch per suite: `tc/{feature}/{suite-slug}`).
+- **Style carryover.** After the FIRST sub-feature is approved, Step 4 writes `test-cases/{F}/_style.md`. Subsequent sub-features read it FIRST in Step 3 Phase 2 so the whole feature stays stylistically consistent. See [references/artifacts.md § A-style](references/artifacts.md).
+
+### Sub-suite Distribution Rule
+
+Applied at end of Phase 1 Grouping pass — fully automatic, no user prompt. **Full rule, matrix, anti-patterns, and "what counts as a natural section":** [self-review-checks.md § 8](references/self-review-checks.md#8-sub-suite-distribution) (single source of truth).
+
+Short version: if **≥ 2 natural sections of ≥ 3 tests each** exist (after auto-merging undersized sections) → MUST be nested (directory of MD files). Otherwise flat. Total test count is not part of the decision.
+
+### Suite Depth Expectation
+
+Each sub-feature must receive full, deep coverage — not a thin slice. The feature phase (Step 1) provides `_ac-baseline.md` and `_shared-ui.md`. Each sub-feature phase adds its AC delta (Step 2.1), UI delta (Step 2.2), and generates tests (Step 3) that satisfy:
+
+- **AC coverage:** every AC in the combined pool — baseline ACs applicable to this sub-feature ∪ delta ACs — has ≥ 1 test citing it via `source:` (Gate 1, blocking)
+- **Cross-cutting coverage:** every concern in destructuring.md that affects this sub-feature has ≥ 1 dedicated test (Gate 1b, blocking)
+- **UI element coverage:** ≥ 80% cataloged elements (across shared + delta) appear in action steps (not only preconditions)
+- **Scenario balance:** negative ≥ 20%, boundary ≥ 10%, happy ≤ 50% — thresholds are **shapes**, not quotas
+- **Q2 shape:** smoke = happy paths only; balanced = + negatives + key edges; regression = + boundaries + state transitions + role combos
+
+**Test count is driven by coverage, not by a range.** Do NOT pad to hit a number. Do NOT cap to stay under a number. If a balanced sub-feature naturally produces 8 tests or 40 tests — both are correct as long as the points above are satisfied. Q2 picks the **shape** of coverage, not the size.
+
+---
+
+## Checklist & Dispatch Table
+
+Every step lives in its own file under `steps/`. SKILL.md is a router: Resume Detector below determines the entry point; then load the step file for the current step only.
+
+| # | Step | File | Key output | Phase |
+|---|---|---|---|---|
+| ☐ | **Resume Detector** (below) | this file | `$RESUME_FROM` step pointer | — |
+| 0 | Intake Questionnaire | [steps/00-intake.md](steps/00-intake.md) | A1 intake.md | feature |
+| 1 | Feature Phase (heavy gather once) | [steps/01-feature-phase.md](steps/01-feature-phase.md) | A-shared-ui + A-base + A-steps + A2 | feature |
+| 2 | Sub-feature Slice (thin) | [steps/10-sub-feature-slice.md](steps/10-sub-feature-slice.md) | A3 + A4 + A5 | sub-feature |
+| 3 | Generate & Approve Test Cases | [steps/20-generate.md](steps/20-generate.md) | A6 test-cases.md | sub-feature |
+| 4 | Final Report & Handoff | [steps/40-report.md](steps/40-report.md) | report printed; A2 progress updated; A-style captured on first approval; publish handoff | sub-feature |
+
+**HARD STOPs:** Step 1 ends with HARD STOP (feature phase → new conversation for first sub-feature). Step 4 ends with HARD STOP (sub-feature done → new conversation for next sub-feature).
+
+**Publishing is NOT part of this skill.** After Step 4, run your publisher — local `/publish-test-cases-batch {path}` (separate skill — not yet part of this repo) or [`sync-cases`](../sync-cases/SKILL.md).
+
+**Loading rule:** Read only the step file for the step you're currently executing, plus any referenced `references/*.md` files needed by its contract. Do NOT eagerly Read all step files at start of run.
+
+**Section-map loading (mandatory for large refs):** `references/testing-strategy.md`, `references/self-review-checks.md`, and `references/artifacts.md` each begin with a `## Section map` table listing the **START/END anchor patterns** (regex) for every section. When you need ONE section (e.g. "§ 2.3 balance thresholds" or "Gate 11"):
+
+```bash
+awk '/^START_PATTERN/,/^END_PATTERN/' path/to/ref.md
+```
+
+Copy the two patterns from the section map, paste into the awk one-liner, run via Bash. This is the primary context-window savings in Step 2. **Never Read the whole file** when one section is enough. Anchors are immune to line-number drift — the section map stays valid as long as section headers keep their text.
+
+---
+
+## Pre-step: Resume Detector
+
+Runs BEFORE Step 0 on every invocation. Single authoritative check replacing scattered "if exists skip" mentions elsewhere.
+
+### Contract
+- **Precondition:** none
+- **Input:** `$ARGUMENTS` (feature and optionally sub-feature); current working directory has `test-cases/` root
+- **Output:** detector table printed to user; `$RESUME_FROM` step pointer in working memory
+- **Postcondition:** user has seen which artifacts exist + chosen resume / restart / change path
+- **Idempotent:** yes — pure read-only probe
+- **Retry policy:** none — read errors bubble up as "artifact unreadable"
+
+### Procedure
+
+1. Derive `F` (feature-slug) and, if known from $ARGUMENTS, `S` (suite-slug from destructuring map)
+2. Run the cheap validation commands from [artifacts.md](references/artifacts.md) for each artifact. **Do NOT Read the bodies** — only `head -20` or grep-based checks
+3. Compute state via this decision tree (top-down, first match wins):
+
+| State | Trigger | Route |
+|---|---|---|
+| `empty` | A1 missing | Step 0 intake |
+| `feature-partial` | A1 valid but any of A-shared-ui / A-base / A2 missing or invalid | Step 1 (feature phase) — resume at first missing substep |
+| `feature-done, no sub-feature picked` | All feature artifacts valid, no `S` given and no unchecked sub-feature selected | Ask user which sub-feature to start (show map) |
+| `feature-done, all sub-features [x]` | A2 has all rows `[x]` | Suggest `/publish-test-cases-batch test-cases/{F}/` |
+| `sub-feature-partial` | `S` resolved; some of A3 / A4 / A5 missing or invalid | Step 2 — resume at first missing substep |
+| `sub-feature-ready` | A3-A5 valid, A6 missing | Step 3 (generate) |
+| `sub-feature-done` | A6 exists and valid | Ask: regenerate / next sub-feature / stop |
+
+4. Emit a **two-section** table to the user — feature-level + sub-feature-level:
+
+```
+Resume Detector — test-cases/{F} (sub-feature: {S or "—"})
+
+Feature level:
+| Artifact                  | Exists | Valid |
+|---------------------------|--------|-------|
+| A1 intake.md              |   Y    |   Y   |
+| A-shared-ui _shared-ui.md |   Y    |   Y   |
+| A-base _ac-baseline.md    |   Y    |   Y   |
+| A-steps _existing-steps.md|   Y    |   Y   |
+| A2 destructuring.md       |   Y    |   Y   |
+| A-style _style.md         |   -    |   -   |
+
+Sub-feature level (S={S}):
+| Artifact              | Exists | Valid |
+|-----------------------|--------|-------|
+| A3 {S}-ac-delta.md    |   Y    |   Y   |
+| A4 {S}-ui-delta.md    |   Y    |   Y   |
+| A5 {S}-scope.md       |   Y    |   Y   |
+| A6 {S}-test-cases.md  |   N    |   -   |
+| A7 {S}-review.md      |   -    |   -   |
+
+Suggested resume point: Step 3 (generate)
+  • feature + sub-feature slice complete — A6 missing
+```
+
+5. Ask user (one line):
+```
+[Y] Resume from Step {N}   [R] Restart from Step 0   [P] Pick a different sub-feature   [B] Abort
+```
+
+6. Route:
+   - `Y` → Read the step file for Step {N} and continue
+   - `R` → DANGEROUS. Ask user what scope of reset:
+     - `sub-feature` → delete A3-A7 under `test-cases/{F}/` for the current `S` only (keeps feature artifacts)
+     - `feature` → delete everything under `test-cases/{F}/` (asks explicit confirmation naming the path)
+     Only run `rm` after explicit confirmation.
+   - `P` → ask which sub-feature from A2 map, set new `S`, re-run detector
+   - `B` → exit skill
+
+**Auto-advance rule:** if feature is complete, `S` is unambiguously determined (argument matches a sub-feature in A2), and sub-feature-ready state applies → skip confirmation prompt and auto-resume at Step 3. Log: `Auto-resumed at Step 3 — baselines + delta + scope present.`
+
+---
+
+## Error Handling
+
+On any error (Playwright, push, MCP, partial run resume, blocking conditions) → see [troubleshooting.md](references/troubleshooting.md). It consolidates recoverable/blocking error matrix + Playwright browser traps + nested-suite parser gotchas. Load on demand, not eagerly.
diff --git a/skills/create-test-cases/intake-examples.md b/skills/create-test-cases/intake-examples.md
new file mode 100644
index 0000000..30ab83c
--- /dev/null
+++ b/skills/create-test-cases/intake-examples.md
@@ -0,0 +1,56 @@
+# Intake Questionnaire — Example Filled Intakes
+
+Reference examples for [intake-questionnaire.md](intake-questionnaire.md) Part 3 template.
+
+---
+
+## Example A — Minimal (all defaults)
+
+User argument: `"покрити labels"`
+User answered: `"use defaults"`
+
+```markdown
+# Intake: labels
+
+**Date:** 2026-04-10
+
+## Answers
+
+| # | Question | Answer |
+|---|----------|--------|
+| Q1 | Feature / scope | labels (from argument) |
+| Q1.1 | Feature location | Settings → Labels & Fields |
+| Q1.2 | Sub-feature hint (optional) | pick after map |
+| Q2 | Coverage depth | balanced |
+| Q3 | Sources | UI only |
+| Q4 | Special focus | none |
+
+## Defaults applied
+All questions (user said "use defaults"). Step 1 will produce the sub-feature map; user picks `S` from it before Step 2.
+```
+
+---
+
+## Example B — Full interview
+
+User argument: `"cover the new label-filter feature"`
+
+```markdown
+# Intake: label-filter
+
+**Date:** 2026-04-10
+
+## Answers
+
+| # | Question | Answer |
+|---|----------|--------|
+| Q1 | Feature / scope | new label-filter in tests list |
+| Q1.1 | Feature location | Tests page → top filter bar (next to Tags filter) |
+| Q1.2 | Sub-feature hint (optional) | label-filter (verified against Step 1 map before Step 2) |
+| Q2 | Coverage depth | balanced |
+| Q3 | Sources | https://github.com/testomatio/docs/pull/412, https://figma.com/file/abc123/label-filter |
+| Q4 | Special focus | validation, edge cases |
+
+## Defaults applied
+none
+```
diff --git a/skills/create-test-cases/intake-questionnaire.md b/skills/create-test-cases/intake-questionnaire.md
new file mode 100644
index 0000000..c545da1
--- /dev/null
+++ b/skills/create-test-cases/intake-questionnaire.md
@@ -0,0 +1,177 @@
+# Intake Questionnaire
+
+Front-loaded interview that replaces scattered gates in Steps 2-3. Runs as Step 0 of the skill, before any gathering.
+
+**See also:** [intake-examples.md](intake-examples.md) — filled-in intake samples to learn the expected shape before running.
+
+**Purpose:**
+- Collect all preferences in one sitting so the skill runs without interruption until Step 3
+- Build an audit trail in `test-cases/{feature-slug}/intake.md`
+- Let the skill adapt its behavior based on user answers (sources, depth, focus, language)
+
+**Language rule:** questions are in English, user may answer in any language (English / Ukrainian / mix). Skill interprets natural answers.
+
+---
+
+## Part 1: Flow Rules
+
+### Core rules
+- Run once per invocation, before Step 1, after reading argument + memory
+- Ask **one question at a time** — wait for answer, then ask next
+- If a previous answer makes a later question unnecessary → skip silently
+- `"use defaults"` / `"за замовчуванням"` / `"just go"` → skip all, use defaults
+- `"not sure"` / `"не знаю"` → take the default silently, do NOT re-prompt
+
+### Smart-skip (fill defaults silently)
+
+| Skip | When |
+|---|---|
+| Q1 | Argument is a Testomat suite URL or clear feature name |
+| Q1.1 | Argument is an app URL, or memory has `project_{feature}_*.md` with a path |
+| Q1.2 | Argument names a specific sub-feature (e.g., "Manual Test Execution → Environment Configuration") — record as preferred starting point; still verified against map after Step 1 |
+| Q3 | User's argument already contains resource links |
+
+### Re-run handling
+If `test-cases/{feature-slug}/intake.md` exists, show summary and ask:
+
+```
+I found previous intake for {feature-name}:
+
+  Feature:   {Q1}     Coverage:  {Q2}
+  Focus:     {Q4}     Sources:   {Q3 links or "UI only"}
+  Saved:     {file mtime}
+
+Use these answers?  [Y] Reuse  [N] Start fresh  [E] Edit specific
+```
+
+---
+
+## Part 2: Questions
+
+### Q1: Feature / scope
+**Type:** free text
+**Default:** from argument if present
+**Prompt:**
+```
+Q1. What feature or scope are we covering?
+    (feature name, Testomat suite URL, or short description)
+```
+
+---
+
+### Q1.1: Feature location in the product
+**Type:** free text | **Default:** none — always ask unless argument is a URL
+**Prompt:**
+```
+Q1.1. Where in the product is this feature located?
+      (menu path, URL, or short navigation hint — e.g.
+      "Settings → Labels & Fields", "/projects/{slug}/runs", "right sidebar of test detail page")
+```
+Accept any form: menu path, URL, screen name. Multiple screens → comma-separated. Unknown → start from dashboard.
+
+---
+
+### Q1.2: Sub-feature (suite) hint (optional)
+**Type:** free text | **Default:** `pick after map` — Step 1 (feature phase) will produce a sub-feature map and you can choose from it
+**Prompt:**
+```
+Q1.2. Do you have a specific sub-feature in mind for the FIRST run? (optional)
+      (I'll run the feature phase first and produce a sub-feature map + shared artifacts.
+      After the map is approved, you pick ONE sub-feature per conversation — example:
+      "Environment Configuration".)
+
+      Answer with the sub-feature name, or "pick after map" / "не знаю" to decide later.
+```
+- `pick after map` / `suggest` / `не знаю` / empty → proceed to Step 1 (feature phase) and let the map drive the choice
+- A specific sub-feature name → recorded as a preferred starting point; still verified against the Step 1 map before Step 2 begins
+- Feature-first is always on — the feature phase produces `_shared-ui.md` + `_ac-baseline.md` + destructuring map even when the user knows which sub-feature they want, because those artifacts are reused by every subsequent sub-feature conversation
+
+---
+
+### Q2: Coverage depth
+**Type:** single choice | **Default:** `balanced`
+**Prompt:**
+```
+Q2. Coverage depth?
+    [1] Smoke — only happy paths, skip negatives and edge cases
+    [2] Balanced — happy + negatives + key edge cases, skip rare boundaries (default)
+    [3] Regression — everything incl. boundaries, state transitions, role combos
+    [4] Custom — tell me specifics
+
+    Q2 picks the SHAPE of coverage, not a target count. Each skill run covers ONE sub-feature in depth.
+```
+
+On `[4]` → ask: *"Describe what to include/exclude in 1-2 sentences."*
+
+Test count is driven by what's needed to fully cover the sub-feature (all ACs, scenario balance thresholds, UI elements in action steps) — there is no target range. If balanced naturally produces 8 tests or 40 tests, both are correct. Focus on what's included/excluded, not on hitting a count.
+
+---
+
+### Q3: Sources of truth
+
+**Type:** single open question
+**Default:** UI exploration only (Playwright MCP)
+**Prompt:**
+```
+Q3. Any docs or specs I should use as input?
+    Paste links (GitHub docs, Jira, Figma, Confluence — comma-separated), or "no" for UI-only.
+    UI exploration via Playwright is always included unless you say "no UI".
+```
+
+**Rules:**
+- UI exploration (Playwright) is always on by default — user must explicitly say "no UI" to disable
+- Record each provided link in the intake file
+- Fetch each link in Step 1.1 before AC extraction
+- If user provides no links and says "no UI" → prompt: *"I need at least one source. Can I at least explore the UI?"*
+
+---
+
+### Q4: Special focus
+**Type:** multi-select
+**Default:** none (balanced output)
+**Prompt:**
+```
+Q4. Any special focus areas? (pick 0 or more)
+    [1] Security & permissions — role checks, unauthorized access, data leaks
+    [2] Validation & error messages — exact texts, boundary values, required fields
+    [3] Edge cases & race conditions — double-clicks, concurrent actions, network drops
+    [4] Data persistence — survives refresh, re-login, navigate away and back
+    [5] Bulk operations — multi-select, bulk actions, large selections
+    [6] UI / UX — visual states, empty states, loading states
+    [7] None — balanced (default)
+
+    Answer with comma-separated numbers (e.g. "1, 2") or "none".
+```
+
+Multi-select is fine — multiple focus areas add emphasis to the generated checklist without excluding anything else.
+
+**Slug rules** (for suite directory naming): lowercase, spaces → hyphens, strip special chars.
+
+---
+
+## Part 3: Output Template
+
+Save to `test-cases/{feature-slug}/intake.md`:
+
+```markdown
+# Intake: {feature-name}
+
+**Date:** YYYY-MM-DD
+
+## Answers
+
+| # | Question | Answer |
+|---|----------|--------|
+| Q1 | Feature / scope | {free text} |
+| Q1.1 | Feature location | {menu path / URL / screen name(s)} |
+| Q1.2 | Sub-feature hint (optional) | {sub-feature name or "pick after map"} |
+| Q2 | Coverage depth | smoke (happy only) / balanced (+ negatives + key edges) / regression (+ boundaries + state + roles) / custom: "{text}" |
+| Q3 | Sources | {links or "UI only"} |
+| Q4 | Special focus | {list or "none"} |
+
+## Defaults applied
+{List defaults applied silently, or "none"}
+```
+
+Examples: [intake-examples.md](intake-examples.md)
+
diff --git a/skills/create-test-cases/references/artifacts.md b/skills/create-test-cases/references/artifacts.md
new file mode 100644
index 0000000..fd85ba5
--- /dev/null
+++ b/skills/create-test-cases/references/artifacts.md
@@ -0,0 +1,750 @@
+# Artifact State Machine
+
+Canonical source of truth for every file the skill produces and consumes. Every step in SKILL.md references this file instead of restating schemas.
+
+**Loading rule:** loaded by Step 0 Resume Detector once per run. Other steps may Read only the frontmatter of an artifact for cheap validation — do NOT re-Read this file per step.
+
+## Section map (anchor-based — stable across edits)
+
+Extract ONE section via a single `awk` call — never Read the whole file:
+
+```bash
+awk '/^START_PATTERN/,/^END_PATTERN/' .claude/skills/create-test-cases/references/artifacts.md
+```
+
+| Section | START pattern | END pattern (exclusive) |
+|---|---|---|
+| Artifact list (feature-level + sub-feature-level) | `^## Artifact list` | `^## A1 ` |
+| A1 intake.md (feature) | `^## A1 ` | `^## A2 ` |
+| A2 destructuring.md (feature) | `^## A2 ` | `^## A-base ` |
+| A-base _ac-baseline.md (feature) | `^## A-base ` | `^## A-shared-ui ` |
+| A-shared-ui _shared-ui.md (feature) | `^## A-shared-ui ` | `^## A-steps ` |
+| A-steps _existing-steps.md (feature) | `^## A-steps ` | `^## A3 ` |
+| A3 {S}-ac-delta.md (sub-feature) | `^## A3 ` | `^## A4 ` |
+| A4 {S}-ui-delta.md (sub-feature) | `^## A4 ` | `^## A5 ` |
+| A5 {S}-scope.md (sub-feature) | `^## A5 ` | `^## A6 ` |
+| A6 {S}-test-cases.md / {S}/*.md (sub-feature) | `^## A6 ` | `^## A7 ` |
+| A7 {S}-review.md (sub-feature) | `^## A7 ` | `^## A-style ` |
+| A-style _style.md (feature, captured after first approval) | `^## A-style ` | `^## State transitions` |
+| State transitions | `^## State transitions` | `^## Phase 3a gate-check` |
+| Phase 3a gate-check recipes (intro + vars) | `^## Phase 3a gate-check` | `^### Gate 1 ` |
+| Gate 1 AC coverage | `^### Gate 1 ` | `^### Gate 2 ` |
+| Gate 2 Priority pyramid | `^### Gate 2 ` | `^### Gate 3 ` |
+| Gate 3 Automation classification | `^### Gate 3 ` | `^### Gate 4 ` |
+| Gate 4 Manual-only rate | `^### Gate 4 ` | `^### Gate 5 ` |
+| Gate 5 Forbidden metadata | `^### Gate 5 ` | `^### Gate 6 ` |
+| Gate 6 Scope compliance | `^### Gate 6 ` | `^### Gate 7 ` |
+| Gate 7 Role names | `^### Gate 7 ` | `^### Gate 8 ` |
+| Gate 8 Min step count | `^### Gate 8 ` | `^### Gate 9 ` |
+| Gate 9 Forbidden `id:` | `^### Gate 9 ` | `^### Gate 10 ` |
+| Gate 10 Sub-suite distribution | `^### Gate 10 ` | `^### Gate 11 ` |
+| Gate 11 Impl leakage | `^### Gate 11 ` | `^### Gate 12 ` |
+| Gate 12 Parameterized `${col}` | `^### Gate 12 ` | `^## Resume Detector` |
+| Resume Detector input | `^## Resume Detector` | end-of-file (use `awk '/^## Resume Detector/{p=1} p'`) |
+
+---
+
+## Artifact list
+
+All paths are relative to repo root. `{F}` = feature-slug, `{S}` = suite-slug.
+
+**Feature-level** (`_` prefix = shared across sub-features). Produced once by the Feature Phase (Step 1). Consumed by every Sub-feature Phase run in the same feature.
+
+| # | Path | Producer | Consumers | Required? | Lifecycle |
+|---|------|----------|-----------|-----------|-----------|
+| A1 | `test-cases/{F}/intake.md` | Step 0 | all later steps | **Yes** | Read-only after Step 0. Re-written only on explicit user "start fresh" |
+| A2 | `test-cases/{F}/destructuring.md` | Step 1.1 (feature phase, ui-explorer mode: feature-baseline) | Step 2 (picks sub-feature row), Step 3 Phase 0 (cross-cutting), Step 4 (toggle `[x]`) | **Yes** | **Mutable** — Step 4 toggles `- [ ]` → `- [x]` per approved sub-feature |
+| A-base | `test-cases/{F}/_ac-baseline.md` | Step 1.2 (docs fetch) | Step 2.1 (filter applicable), Step 3 Phase 1 (checklist), test-case-reviewer Gate 1 | **Yes** | Read-only after Step 1.2. Can be manually expanded by user before proceeding |
+| A-shared-ui | `test-cases/{F}/_shared-ui.md` | Step 1.1 (ui-explorer mode: feature-baseline) | Step 2.2 (delta exploration baseline), Step 3 Phase 3b (ui-validator), Gate 11 impl-leak check | **Yes** | Read-only after Step 1.1. May be manually edited if user spots missing shared element |
+| A-steps | `test-cases/{F}/_existing-steps.md` | Step 1.3 (Testomat MCP) | Step 3 Phase 2 (test generation — reuses existing steps) | No (optional; skipped if Testomat MCP unavailable) | Read-only after Step 1.3 |
+| A-style | `test-cases/{F}/_style.md` | Step 4 (FIRST approved sub-feature only) | Step 3 Phase 2 of every SUBSEQUENT sub-feature in the same feature | No (only exists after at least 1 sub-feature approved) | Written once; NOT overwritten on subsequent approvals (user-editable) |
+
+**Sub-feature-level** — produced per sub-feature. Written under `test-cases/{F}/` with `{S}-` prefix (flat) or under a nested directory `test-cases/{F}/{S}/` (nested layout).
+
+| # | Path | Producer | Consumers | Required? | Lifecycle |
+|---|------|----------|-----------|-----------|-----------|
+| A3 | `test-cases/{F}/{S}-ac-delta.md` | Step 2.1 | Step 3 Phase 1 (checklist), Step 3 Phase 3a Gate 1 (combined with A-base) | **Yes** | Read-only after Step 2.1 |
+| A4 | `test-cases/{F}/{S}-ui-delta.md` | Step 2.2 (ui-explorer mode: sub-feature-delta) | Step 3 Phase 1/2, Step 3 Phase 3b (ui-validator, combined with A-shared-ui) | **Yes** | May be **appended** by gap-focused ui-explorer re-invocation (Step 2.x retry) |
+| A5 | `test-cases/{F}/{S}-scope.md` | Step 2.3 | Step 3 Phase 3a (scope compliance), test-case-reviewer subagent | **Yes** | Read-only after Step 2.3 |
+| A6 | `test-cases/{F}/{S}-test-cases.md` **OR** `test-cases/{F}/{S}/*.md` | Step 3 Phase 2 | Step 3 Phase 3a (mechanical), Phase 3b (ui-validator), Step 4, handoff to `/publish-test-cases-batch` | **Yes** | **Mutable** — Phase 3a auto-fix, Phase 3b in-place fixes, Phase 4 iteration |
+| A7 | `test-cases/{F}/{S}-review.md` | test-case-reviewer subagent (suites ≥15) | Phase 4 presentation | Only for ≥15-test suites | Read-only after subagent run |
+
+---
+
+## A1 — intake.md
+
+**Schema:** see [intake-questionnaire.md § Part 3 Output Template](../intake-questionnaire.md).
+
+**Cheap validation** (used by Resume Detector):
+```bash
+grep -q '^| Q1 |' test-cases/{F}/intake.md && \
+grep -q '^| Q1.2 |' test-cases/{F}/intake.md && \
+grep -q '^| Q2 |' test-cases/{F}/intake.md
+```
+All three must match. If any missing → intake is incomplete, re-run Step 0 with "Edit specific".
+
+**Sub-feature hint:** Q1.2 may contain a named starting sub-feature or `pick after map`. Either way, Step 1 (feature phase) runs first and Step 2 resolves `S` against the approved map.
+
+---
+
+## A2 — destructuring.md
+
+**Schema:** see [destructuring.md § Procedure step 4 & step 9](destructuring.md).
+
+**Producer:** Step 1.1 (feature phase). `ui-explorer` invoked with `mode: feature-baseline` traverses the feature holistically and returns BOTH `_shared-ui.md` (A-shared-ui) AND the sub-feature map written here. Cross-cutting concerns section is added in Step 1.4.
+
+**Cheap validation:**
+```bash
+grep -q '^## Cross-cutting concerns' test-cases/{F}/destructuring.md && \
+grep -q '^- \[[x ]\]' test-cases/{F}/destructuring.md
+```
+
+**Mutability:** Step 4 toggles `- [ ]` → `- [x]` for the just-completed sub-feature. Never edit anything else.
+
+**Terminal state:** all lines match `^- \[x\]` → user can run `/publish-test-cases-batch test-cases/{F}/` to push the whole feature.
+
+---
+
+## A-base — _ac-baseline.md (feature-level)
+
+**Purpose:** flat list of ACs extracted once from product docs/help center for the entire feature. Each sub-feature later references applicable AC IDs without re-extracting. Replaces per-sub-feature `{S}-ac-list.md` full body with a shared source.
+
+**Path:** `test-cases/{F}/_ac-baseline.md`
+
+**Producer:** Step 1.2 (feature phase, after UI baseline exists). Docs are fetched with WebFetch; inferred ACs come from product knowledge if docs are thin.
+
+**Consumers:** Step 2.1 (filter baseline ACs applicable to a sub-feature), Step 3 Phase 1 checklist, Gate 1 AC coverage (combined with `{S}-ac-delta.md`).
+
+**Schema (frontmatter required):**
+```yaml
+---
+feature: {feature-slug}
+source: docs | inferred | mixed
+sources:
+  - {url or doc path}
+  - ...
+ac_count: N
+generated_at: {ISO8601}
+---
+
+AC-1: {verifiable requirement}
+AC-2: ...
+```
+
+Body is a flat list of `AC-N: ...` lines. IDs are **feature-global** — `AC-17` in baseline is the same AC-17 for every sub-feature. `UNCLEAR` items keep `AC-N: UNCLEAR — {question}`.
+
+**Cheap validation:**
+```bash
+head -10 test-cases/{F}/_ac-baseline.md | grep -q '^ac_count:' && \
+  actual=$(grep -cE '^AC-[0-9]+:' test-cases/{F}/_ac-baseline.md) && \
+  [ "$actual" -ge 1 ]
+```
+
+**Mutability:** read-only after Step 1.2. User may manually append/clarify ACs before Step 1.5 approval. Never written by any sub-feature phase.
+
+---
+
+## A-shared-ui — _shared-ui.md (feature-level)
+
+**Purpose:** catalog of UI elements shared across the feature (top nav, breadcrumbs, sidebar, feature-wide toolbars, common modals, bulk-action bars, global toasts). Explored once in Step 1.1; each sub-feature delta references but never re-catalogs these.
+
+**Path:** `test-cases/{F}/_shared-ui.md`
+
+**Producer:** Step 1.1 (feature phase). `ui-explorer` subagent invoked with `mode: feature-baseline` — walks the feature's main surfaces at a breadth-first level, identifies candidate sub-features, catalogs shared chrome.
+
+**Consumers:** Step 2.2 (ui-explorer `sub-feature-delta` mode receives this path as `shared_catalog_path` to avoid re-cataloging), Step 3 Phase 3b ui-validator (combined with `{S}-ui-delta.md`), Gate 11 implementation-leak check.
+
+**Schema:** same format as [ui-catalog-format.md](ui-catalog-format.md), with feature-baseline frontmatter:
+
+```yaml
+---
+feature: {F}
+scope: feature-baseline
+entry_url: {url}
+explored_at: {ISO8601}
+explored_by: ui-explorer
+mode: feature-baseline
+shared_surfaces: N        # count of surfaces cataloged (nav, sidebar, modals, etc.)
+candidate_sub_features: M # count of sub-features identified for destructuring
+gaps: count
+---
+```
+
+**Cheap validation:**
+```bash
+head -20 test-cases/{F}/_shared-ui.md | \
+  awk '/^mode: feature-baseline/{m=1} /^shared_surfaces: [1-9]/{s=1} END{exit !(m && s)}'
+```
+
+**Mutability:** read-only after Step 1.1. User may manually add a missed shared element before Step 1.5 approval.
+
+---
+
+## A-steps — _existing-steps.md (feature-level)
+
+**Purpose:** reusable step phrases already defined in Testomat.io for this feature's suite(s) — so Step 3 Phase 2 can prefer matching wording over inventing new phrasings and avoid step-library churn.
+
+**Path:** `test-cases/{F}/_existing-steps.md`
+
+**Producer:** Step 1.3 (feature phase). Uses Testomat MCP (`mcp__testomatio__steps_list` / `steps_search`) to fetch steps tagged or linked to this feature. Optional — skip if MCP unavailable or feature has no prior coverage.
+
+**Consumers:** Step 3 Phase 2 (test generation — reuses existing step phrases when semantically equivalent).
+
+**Schema:**
+```markdown
+---
+feature: {F}
+fetched_at: {ISO8601}
+step_count: N
+source: testomatio-mcp
+---
+
+## Existing step phrases
+
+- I log in as ${role}
+- I open Settings → Labels
+- ...
+```
+
+**Cheap validation:**
+```bash
+test ! -f test-cases/{F}/_existing-steps.md || \
+  (head -10 test-cases/{F}/_existing-steps.md | grep -q '^step_count:')
+```
+(Artifact is optional — validation passes if file absent OR if present with frontmatter.)
+
+**Mutability:** read-only after Step 1.3. Stale after long periods — can be regenerated manually via Testomat MCP if needed.
+
+---
+
+## A3 — {S}-ac-delta.md (sub-feature)
+
+**Purpose:** sub-feature-specific AC layer. Instead of re-listing all feature ACs, records (a) which baseline ACs apply to this sub-feature and (b) additional ACs that emerged from sub-feature-specific exploration or docs.
+
+**Path:** `test-cases/{F}/{S}-ac-delta.md`
+
+**Producer:** Step 2.1 (sub-feature slice). Reads `_ac-baseline.md`, filters applicable IDs, adds delta.
+
+**Consumers:** Step 3 Phase 1 checklist (merged with baseline), Phase 3a Gate 1 AC coverage (combined pool = `baseline_acs_applicable` + delta `AC-*`).
+
+**Schema (frontmatter required):**
+```yaml
+---
+feature: {feature-slug}
+suite: {suite-slug}
+references: _ac-baseline.md
+baseline_acs_applicable: [AC-3, AC-7, AC-12]   # IDs from _ac-baseline.md that this sub-feature must cover
+delta_ac_count: N                              # count of NEW ACs below (ac-delta-*)
+source: inferred | docs | mixed
+---
+
+## Baseline ACs applicable to {S}
+- AC-3 (baseline)
+- AC-7 (baseline)
+- AC-12 (baseline)
+
+## Delta ACs (sub-feature-specific)
+
+ac-delta-1: {new verifiable requirement found only in this sub-feature}
+ac-delta-2: ...
+```
+
+Delta AC IDs use `ac-delta-N` prefix to keep them visually distinct from feature-global `AC-N`. If a sub-feature has zero delta, `delta_ac_count: 0` and the Delta ACs section is empty but the heading remains.
+
+**Cheap validation:**
+```bash
+head -15 test-cases/{F}/{S}-ac-delta.md | grep -q '^references: _ac-baseline.md' && \
+  head -15 test-cases/{F}/{S}-ac-delta.md | grep -qE '^baseline_acs_applicable: \[' && \
+  grep -qE '^## Baseline ACs applicable' test-cases/{F}/{S}-ac-delta.md
+```
+At least one baseline AC must be applicable (sub-features that touch zero baseline ACs suggest destructuring error — flag to user).
+
+**Mutability:** read-only after Step 2.1.
+
+---
+
+## A4 — {S}-ui-delta.md (sub-feature)
+
+**Purpose:** UI elements **specific** to this sub-feature that are NOT in `_shared-ui.md`. Avoids re-cataloging feature-wide chrome (nav, sidebar, common modals).
+
+**Path:** `test-cases/{F}/{S}-ui-delta.md`
+
+**Producer:** Step 2.2 (sub-feature slice). `ui-explorer` invoked with `mode: sub-feature-delta` and `shared_catalog_path: test-cases/{F}/_shared-ui.md`. Subagent reads the shared catalog first, then explores this sub-feature's unique surfaces and writes ONLY the delta.
+
+**Consumers:** Step 3 Phase 1/2 (generation references both shared + delta), Phase 3b ui-validator (combined view), Gate 11 impl-leak check.
+
+**Schema:** same format as [ui-catalog-format.md](ui-catalog-format.md), with delta frontmatter:
+
+```yaml
+---
+feature: {F}
+suite: {S}
+scope: sub-feature-delta
+references: _shared-ui.md
+entry_url: {sub-feature url}
+explored_at: {ISO8601}
+explored_by: ui-explorer
+mode: sub-feature-delta
+delta_elements: N         # NEW elements added here (excluding anything in _shared-ui.md)
+verified_flows: K         # sub-feature-specific flows
+gaps: count
+---
+```
+
+Body contains ONLY elements not already in `_shared-ui.md`. If an element visible in this sub-feature also appears in shared (e.g., global Save toolbar), it is NOT repeated — consumers read both files.
+
+**Cheap validation:**
+```bash
+head -20 test-cases/{F}/{S}-ui-delta.md | \
+  awk '/^mode: sub-feature-delta/{m=1} /^references: _shared-ui.md/{r=1} /^delta_elements: [0-9]+/{d=1} END{exit !(m && r && d)}'
+```
+`delta_elements: 0` is allowed (sub-feature has no unique elements, lives entirely on shared surfaces) but rare — flag to user for confirmation when this happens.
+
+**Mutability:** `ui-explorer` may append a "Gap-focused follow-up" section on retry. Frontmatter `gaps` count is updated in place.
+
+---
+
+## A5 — {S}-scope.md
+
+**Schema:**
+```markdown
+# Scope: {feature} / {suite}
+
+**Date:** YYYY-MM-DD
+
+## In Scope
+- {item} — covers AC-N
+- ...
+
+## Out of Scope
+- {item} — reason: {justification}
+- ...
+
+## Unclear ACs
+- AC-N: {question} — {resolution or "deferred"}
+
+## Sources Used
+- {source link or "UI catalog only"}
+```
+
+**Cheap validation:**
+```bash
+for h in '^## In Scope' '^## Out of Scope' '^## Unclear ACs' '^## Sources Used'; do
+  grep -q "$h" test-cases/{F}/{S}-scope.md || { echo "missing: $h"; exit 1; }
+done
+```
+All four headings must exist.
+
+---
+
+## A6 — {S}-test-cases.md / {S}/*.md
+
+**Schema:** see [test-case-format.md](test-case-format.md).
+
+**Flat vs nested decision:** single file if Phase 1 Grouping wrote `Output: flat`; directory if `Output: nested`. Mismatch = Phase 3a violation.
+
+**Cheap validation:**
+```bash
+# flat
+test -f "test-cases/{F}/{S}-test-cases.md" && \
+  grep -cE '^## ' "test-cases/{F}/{S}-test-cases.md"
+# nested
+test -d "test-cases/{F}/{S}" && \
+  find "test-cases/{F}/{S}" -name '*.md' -type f | wc -l
+```
+
+Test count must equal number of `^## ` lines (minus 1 for `## Steps` inside each test — grep for `^## (?!Steps)` is more precise; use `grep -cE '^## ' file | awk '{print $1 - N_steps_headings}'`).
+
+See [Phase 3a gate-check recipes](#phase-3a-gate-check-recipes) below for the canonical counting commands.
+
+**Mutability:**
+- Phase 3a mechanical: test-case-reviewer auto-fixes (scope compliance, role names, step quality)
+- Phase 3b ui-validator: in-place fixes for element names, toast text, step mechanics
+- Phase 4: regeneration on user reject (overwrites)
+
+---
+
+## A7 — {S}-review.md
+
+Produced by `test-case-reviewer` subagent for suites ≥15 tests. Free-form violations report.
+
+**Cheap validation:**
+```bash
+test -f test-cases/{F}/{S}-review.md && \
+  grep -q '^violations:' test-cases/{F}/{S}-review.md
+```
+
+---
+
+## A8 — {S}-validation-log.md (sub-feature)
+
+**Purpose:** audit trail written by the `ui-validator` subagent during Step 3 Phase 3b. Captures which tests were walked in the real UI, which `_Expected_:` mismatches were fixed, and which gaps (unresolvable element names, missing UI surfaces) remain. **Not published to Testomat.io.**
+
+**Path:** `test-cases/{F}/{S}-validation-log.md` — sub-feature-level aggregate. For nested suites, ALL section files (e.g. `dialog-lifecycle.md`, `form-fields.md`) share a single log file; each validation run appends a new `## Section: {section_label}` block.
+
+**Why separate from A6 test MD:** Testomat.io's markdown parser treats top-level `##` headings as suite/test boundaries. Appending a `## Validation Log` section inside the test MD creates a phantom test or breaks the nested-suite structure. Regression detected 2026-04-20: 20 test files had `## Validation Log` leaked from Phase 3b and had to be stripped retroactively.
+
+**Schema:**
+```markdown
+# Validation Log: {suite_slug}
+
+Aggregated audit trail written by ui-validator subagent during Step 3 Phase 3b. Not published to Testomat.io.
+
+## Section: {section_label}
+
+**Validated by:** ui-validator (subagent)
+**Validated at:** {ISO8601}
+**Test MD:** {relative path}
+**Tests walked:** {test titles}
+**Mismatches fixed:** {count}
+**Gaps:** {count} — {one-line per gap}
+
+{Optional free-form notes: actual UI strings verified, discrepancies with catalog, etc.}
+```
+
+**Producer:** `ui-validator` subagent (Step 3 Phase 3b). Creates file on first validation; appends new `## Section:` block on subsequent validations within the same sub-feature.
+
+**Consumer:** humans reviewing what changed; Step 4 report (can reference `Mismatches fixed: N` in the final sub-feature report); future audits / retrospectives.
+
+**Cheap validation:**
+```bash
+test -f test-cases/{F}/{S}-validation-log.md && \
+  grep -qE '^(# Validation Log|## Section:)' test-cases/{F}/{S}-validation-log.md
+```
+
+**Mutability:** skill appends; user may manually prune old entries. Never deleted by the skill automatically.
+
+**Edge cases:**
+- **Zero mismatches, zero gaps** — still write a section entry with `Mismatches fixed: 0 | Gaps: 0` so the audit trail records that validation happened. Silence would be ambiguous ("didn't run" vs "ran clean").
+- **Sub-feature is flat (single MD file)** — `section_label` equals the suite slug itself; file name still follows `{S}-validation-log.md`.
+
+---
+
+## A-style — _style.md (feature-level)
+
+**Purpose:** capture style patterns from the FIRST approved sub-feature of a feature so every subsequent sub-feature stays stylistically consistent (preconditions phrasing, step granularity, voice, title pattern, tags, automation defaults). Prevents drift between sub-feature #1 and #5 when each is generated in a separate conversation (context-isolated).
+
+**Path:** `test-cases/{F}/_style.md` — `_` prefix signals feature-level shared artifact (not suite-scoped).
+
+**Schema:** see template in [steps/40-report.md § step 3 Style capture](../steps/40-report.md).
+
+**Producer:** Step 4, only on the FIRST sub-feature to flip from `[ ]` to `[x]` in destructuring.md. On subsequent approvals Step 4 checks existence and **skips write** — the user may have edited the file manually and the skill must not clobber their edits.
+
+**Consumer:** Step 3 Phase 2 of every subsequent sub-feature in the same feature. The Phase 2 header instructs: "if `_style.md` exists → Read it FIRST, match conventions exactly".
+
+**Cheap validation:**
+```bash
+test -f test-cases/{F}/_style.md && \
+  grep -qE '^## (Preconditions|Steps|Titles|Tags)' test-cases/{F}/_style.md
+```
+At least one of the four section headers must be present. If validation fails → treat as absent (do not read); Step 3 of the next approved sub-feature will overwrite.
+
+**Mutability:** write-once by the skill; user-editable thereafter. If the user wants to reset the style (e.g., they realize after #2 that #1's style was wrong), they manually delete `_style.md` → next approval re-captures from the new reference.
+
+**Edge cases:**
+- **Single-sub-feature feature (only one row in destructuring.md):** file is still created on first approval — harmless, since there's no "next" sub-feature to reuse it; it simply documents style for future additions
+- **Style contradicts current sub-feature's needs:** generator should still match `_style.md`; if a Phase 3 gate fails because of the style constraint → surface to user with option "accept one-off deviation for this sub-feature" vs "update `_style.md` going forward"
+
+---
+
+## State transitions
+
+```
+FEATURE PHASE (one conversation, HARD STOP at end)
+
+  (empty) ──Step 0──▶ A1 intake
+                       │
+                       ├─Step 1.1─▶ A-shared-ui + A2 (map)   (ui-explorer: feature-baseline)
+                       ├─Step 1.2─▶ A-base (_ac-baseline.md)  (docs fetch)
+                       ├─Step 1.3─▶ A-steps (optional)        (Testomat MCP)
+                       └─Step 1.4─▶ A2 cross-cutting section added
+                       [HARD STOP — user approves map + baselines]
+
+SUB-FEATURE PHASE (one conversation per sub-feature, repeats N times)
+
+  A1 + A2 + A-base + A-shared-ui (+ A-steps)
+                       │
+                       ├─Step 2.1─▶ A3 {S}-ac-delta (references A-base)
+                       ├─Step 2.2─▶ A4 {S}-ui-delta (ui-explorer: sub-feature-delta, references A-shared-ui)
+                       └─Step 2.3─▶ A5 {S}-scope
+                                    │
+                                    └─Step 3 Phase 2─▶ A6 test-cases
+                                                       │
+                                                       ├─Phase 3a/3b (mutates A6, may produce A7)
+                                                       │
+                                                       └─Step 4─▶ toggles A2 [x]; on FIRST approval writes A-style (_style.md)
+                                                                  [HARD STOP — next sub-feature starts fresh conversation]
+```
+
+Publishing (branch creation, push, label application, batch) is owned by `/publish-test-cases-batch` — outside this state machine.
+
+---
+
+## Phase 3a gate-check recipes
+
+Every check below exits 0 on pass, non-zero on fail. SKILL.md Phase 3a runs these as a batch — any non-zero exit blocks Phase 4 and routes back to regeneration.
+
+**Convention in this section:** bash vars `F` = feature-slug, `S` = suite-slug, `MD` = A6 path (flat file **or** concatenation of nested files).
+
+### Set up variables for a run
+
+```bash
+F="manual-tests-execution"
+S="time-tracking"
+# Resolve A6 path
+if [ -f "test-cases/$F/$S-test-cases.md" ]; then
+  MD="test-cases/$F/$S-test-cases.md"
+  MDS=("$MD")
+elif [ -d "test-cases/$F/$S" ]; then
+  MDS=(test-cases/$F/$S/*.md)
+  MD="${MDS[0]}"  # primary for header checks
+else
+  echo "A6 missing"; exit 1
+fi
+```
+
+### Gate 1 — Every AC has ≥1 test
+
+AC pool = (baseline ACs applicable to this sub-feature) ∪ (delta ACs from `{S}-ac-delta.md`).
+
+```bash
+# Applicable baseline ACs come from the frontmatter list `baseline_acs_applicable: [AC-3, AC-7, ...]`
+base_acs=$(awk '/^baseline_acs_applicable:/{
+  sub(/^baseline_acs_applicable: *\[/, ""); sub(/\].*$/, "");
+  gsub(/,/, " "); gsub(/ +/, " "); print; exit
+}' "test-cases/$F/$S-ac-delta.md")
+delta_acs=$(grep -oE '^ac-delta-[0-9]+' "test-cases/$F/$S-ac-delta.md" | sort -u)
+ac_ids="$base_acs $delta_acs"
+missing=0
+for ac in $ac_ids; do
+  [ -z "$ac" ] && continue
+  if ! grep -qE "^source:.*\b$ac\b" "${MDS[@]}"; then
+    echo "uncovered: $ac"
+    missing=$((missing+1))
+  fi
+done
+[ $missing -eq 0 ]
+```
+
+### Gate 2 — Priority pyramid in bounds
+
+```bash
+total=$(grep -cE '^priority:' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+crit=$(grep -cE '^priority: critical' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+high=$(grep -cE '^priority: high' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+norm=$(grep -cE '^priority: normal' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+low=$(grep -cE '^priority: low' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+# Advisory for <15 tests, blocking for ≥15
+if [ $total -ge 15 ]; then
+  awk -v c=$crit -v h=$high -v n=$norm -v l=$low -v t=$total 'BEGIN{
+    if (c/t > 0.15) { print "critical > 15%"; exit 1 }
+    if (h/t > 0.35) { print "high > 35%"; exit 1 }
+    if (n/t < 0.35) { print "normal < 35%"; exit 1 }
+    if (l/t < 0.05) { print "low < 5%"; exit 1 }
+  }'
+fi
+```
+
+### Gate 3 — Automation classification present on every test
+
+```bash
+tests=$(grep -cE '^## ' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+# subtract "## Steps" headings
+steps=$(grep -cE '^## Steps$' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+real_tests=$((tests - steps))
+auto=$(grep -cE '^automation: (candidate|deferred|manual-only)' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+[ "$auto" -eq "$real_tests" ]
+```
+
+### Gate 4 — Manual-only rate ≤30%
+
+```bash
+manual=$(grep -cE '^automation: manual-only' "${MDS[@]}" | awk -F: '{s+=$NF} END{print s}')
+awk -v m=$manual -v t=$real_tests 'BEGIN{exit !(t==0 || m/t <= 0.30)}'
+```
+
+### Gate 5 — No forbidden metadata fields (break push)
+
+```bash
+! grep -qE '^(tags|labels):' "${MDS[@]}"
+```
+
+### Gate 6 — Scope compliance (no OUT OF SCOPE items leaked in)
+
+```bash
+# OUT OF SCOPE item lines look like "- {item} — reason: ..."
+oos=$(awk '/^## Out of Scope/{flag=1;next} /^## /{flag=0} flag && /^- /{sub(/ *— *reason.*/,""); sub(/^- /,""); print}' \
+        "test-cases/$F/$S-scope.md")
+leak=0
+while IFS= read -r item; do
+  [ -z "$item" ] && continue
+  if grep -qiF "$item" "${MDS[@]}"; then
+    echo "scope leak: $item"
+    leak=$((leak+1))
+  fi
+done <<< "$oos"
+[ $leak -eq 0 ]
+```
+
+This is a text-match heuristic — if OUT OF SCOPE items have descriptive names that naturally appear in prose ("Dashboard"), the check over-flags. In that case the reviewer must manually confirm. Gate 6 is advisory only when the suite has short out-of-scope labels — blocking otherwise. See SKILL.md Phase 3a for the escalation hook.
+
+### Gate 7 — Role names match product-context.md § User Roles
+
+```bash
+! grep -E '^\*\*Preconditions:\*\*.*(mainUser|qaUser|managerUser|readonlyUser)' "${MDS[@]}"
+```
+
+### Gate 8 — Minimum step count per standalone test (blocking if >20%)
+
+```bash
+# Count steps per test: for each "## Title" block, count "^- " bullets under "## Steps"
+# Short script: awk pass that tracks current test and step count
+short=$(awk '
+  /^## / && !/^## Steps$/ { if (test && count < 3) short++; test=$0; count=0; next }
+  /^## Steps$/ { in_steps=1; next }
+  /^## / { in_steps=0; next }
+  in_steps && /^[-*] / { count++ }
+  END { if (test && count < 3) short++; print short+0 }
+' "${MDS[@]}")
+awk -v s=$short -v t=$real_tests 'BEGIN{exit !(t==0 || s/t <= 0.20)}'
+```
+
+### Gate 9 — Forbidden `id:` field (server-generated only)
+
+```bash
+! grep -qE '^id: @[ST]' "${MDS[@]}"
+```
+
+### Gate 10 — Sub-suite distribution consistency
+
+If the Phase 1 header says `Output: nested` → A6 must be a directory. If `Output: flat` → A6 must be a single file. See [self-review-checks.md § 8](self-review-checks.md#8-sub-suite-distribution).
+
+```bash
+header=$(head -5 "$MD" | grep -oE '^Output: (nested|flat)' | awk '{print $2}')
+case "$header" in
+  nested) [ -d "test-cases/$F/$S" ] ;;
+  flat)   [ -f "test-cases/$F/$S-test-cases.md" ] ;;
+  *)      echo "missing Phase 1 Output: header"; exit 1 ;;
+esac
+```
+
+### Gate 11 — No implementation leakage in Step / Expected text
+
+Forbidden in Steps and `_Expected_:` lines: CSS class names, MDI icon class names, `data-test-id`, `aria-describedby`, `aria-labelledby`, raw `<script>`/`<img onerror>`/`<iframe onload>` payloads, inline AC refs (`(AC-N)` or bare `AC-N` outside the `source:` frontmatter). Full rationale + rewrite table: [test-case-format.md § Anti-patterns in test case bodies](test-case-format.md).
+
+```bash
+# Target lines: "- step line" or lines starting "  _Expected_:" (incl. bullet sub-items under _Expected_).
+# Skip frontmatter (`source: AC-...` is correct there).
+leak_lines=$(awk '
+  /^<!-- test/,/^-->$/ { next }                            # skip test frontmatter blocks
+  /^(  - |- |  _Expected_:|_Expected_:)/ {
+    if ($0 ~ /\b(bg-|text-|ring-|hover:|focus:)[a-z0-9-]+/) { print FILENAME":"NR": css-class: "$0; flag=1 }
+    if ($0 ~ /\bmd-icon-[a-z-]+/)                          { print FILENAME":"NR": icon-class: "$0; flag=1 }
+    if ($0 ~ /\.(keyboard-shortcut-[a-z-]+|[a-z-]+-icon)\b/) { print FILENAME":"NR": css-selector: "$0; flag=1 }
+    if ($0 ~ /\b(data-test-id|aria-describedby|aria-labelledby)\b/) { print FILENAME":"NR": selector-attr: "$0; flag=1 }
+    if ($0 ~ /\(AC-[0-9]+\)|\bAC-[0-9]+\b/)                { print FILENAME":"NR": inline-AC-ref: "$0; flag=1 }
+    if ($0 ~ /<script[ >]|<img[^>]*onerror|<iframe[^>]*onload/) { print FILENAME":"NR": raw-exploit-payload: "$0; flag=1 }
+  }
+' "${MDS[@]}")
+[ -z "$leak_lines" ] || { echo "$leak_lines"; echo "Gate-11 FAIL"; exit 1; }
+```
+
+### Gate 12 — Parameterized tests reference `${col}` in title
+
+Any test with an `<!-- example -->` table MUST reference at least one column name via `${col_name}` in its title — otherwise every row renders with an identical title in Testomat.io. Full rule: [test-case-format.md § Parameterized title rule](test-case-format.md).
+
+```bash
+# For each MD: walk tests in order; if a test block contains <!-- example -->, its title must have ${...}
+for f in "${MDS[@]}"; do
+  awk -v file="$f" '
+    /^## / && !/^## Steps$/ { title=$0; has_example=0 }
+    /<!-- example/ { has_example=1 }
+    /^## / && !/^## Steps$/ && NR>1 {
+      # When we hit the next test heading, verify the previous one
+      if (prev_has_example && prev_title !~ /\$\{[a-zA-Z_]+\}/) {
+        print file": parameterized title missing ${col}: "prev_title; flag=1
+      }
+      prev_title=title; prev_has_example=has_example
+      next
+    }
+    { prev_title=title; prev_has_example=has_example }
+    END {
+      if (prev_has_example && prev_title !~ /\$\{[a-zA-Z_]+\}/) {
+        print file": parameterized title missing ${col}: "prev_title; flag=1
+      }
+      exit flag
+    }
+  ' "$f" || { echo "Gate-12 FAIL"; exit 1; }
+done
+```
+
+### Gate 13 — No process-metadata sections leaked into test MD
+
+Test MDs must contain ONLY test cases (H1 suite title + test H2s + `## Steps` + `## Preconditions`). **Process metadata** — validation logs, reviewer notes, TODOs, placeholders — must NOT appear at top level, because Testomat.io's markdown parser treats `## Foo` as either a suite boundary or a test title, which creates ghost tests or breaks nested-suite structure.
+
+Forbidden H2 headings in test MDs:
+- `## Validation Log` (belongs in `{S}-validation-log.md` — see A8)
+- `## Review Log` / `## Self-Review` / `## Gap Log` (belongs in `{S}-review.md` — see A7)
+- `## Notes` / `## Comments` / `## TODO` / `## FIXME` (stray dev annotations)
+- Any `## Section:` block outside a dedicated log file
+
+Also forbidden: placeholder tokens (`[TBD]`, `[UNDEFINED]`, `[NEEDS REVIEW]`, `PLACEHOLDER`) and stray `TODO:` / `FIXME:` / `XXX:` lines.
+
+Regression origin: 2026-04-20 — 20 test files had `## Validation Log` appended by the ui-validator subagent; required retroactive cleanup + separate A8 artifact.
+
+```bash
+# Build MDS array the normal way (see § Set up variables for a run)
+forbidden_h2='^## (Validation Log|Review Log|Self-Review|Gap Log|Notes|Comments|TODO|FIXME|Section:)'
+placeholder_tokens='\[TBD\]|\[UNDEFINED\]|\[NEEDS\b|\bPLACEHOLDER\b'
+dev_markers='^[[:space:]]*(TODO|FIXME|XXX):'
+
+viol=$(grep -nEH "$forbidden_h2" "${MDS[@]}" 2>/dev/null)
+viol+=$(grep -nEH "$placeholder_tokens" "${MDS[@]}" 2>/dev/null)
+viol+=$(grep -nEH "$dev_markers" "${MDS[@]}" 2>/dev/null)
+
+if [ -n "$viol" ]; then
+  echo "Gate-13 FAIL — process metadata / placeholders leaked into test MD:"
+  echo "$viol"
+  exit 1
+fi
+```
+
+**Auto-fix policy:** Gate 13 is **not** auto-fixable by the skill. Each hit points to a subagent (usually ui-validator or test-case-reviewer) writing to the wrong file. Route: strip offending section from the test MD + re-check the producing subagent's output path + re-run the phase that produced it.
+
+---
+
+## Resume Detector input
+
+The Step 0 Resume Detector reads each artifact path above, runs the cheap validation, and emits a two-section table (feature state + sub-feature state):
+
+**Feature-level state:**
+
+| Artifact | Exists | Valid | Action |
+|---|---|---|---|
+| A1 intake | Y/N | Y/N | reuse / re-run Step 0 |
+| A-shared-ui _shared-ui.md | Y/N | Y/N | if N → resume Step 1.1 |
+| A-base _ac-baseline.md | Y/N | Y/N | if N → resume Step 1.2 |
+| A-steps _existing-steps.md | Y/N/- | Y/N/- | optional; - = skipped |
+| A2 destructuring | Y/N | Y/N | if N → resume Step 1.4 |
+| A-style _style.md | Y/N/- | Y/N/- | - = no sub-feature approved yet |
+
+**Sub-feature-level state** (for the current `{S}` in focus):
+
+| Artifact | Exists | Valid | Action |
+|---|---|---|---|
+| A3 {S}-ac-delta | Y/N | Y/N | - |
+| A4 {S}-ui-delta | Y/N | Y/N | - |
+| A5 {S}-scope | Y/N | Y/N | - |
+| A6 {S}-test-cases | Y/N | Y/N | - |
+| A7 {S}-review | Y/N/- | Y/N/- | - = suite < 15 tests |
+
+**State decision tree:**
+- No A1 → Step 0
+- A1 only → Step 1 feature phase
+- A1 + partial feature artifacts → resume first missing feature substep
+- All feature artifacts + no sub-feature picked → ask user which sub-feature to start
+- Sub-feature partial (some of A3-A5) → resume Step 2 at first missing
+- A3-A5 complete, A6 missing → Step 3
+- A6 exists → Phase 3/4; on done, ask "next sub-feature? regenerate? stop?"
+
+See SKILL.md § Step 0 Resume Detector for the runbook.
diff --git a/skills/create-test-cases/references/destructuring.md b/skills/create-test-cases/references/destructuring.md
new file mode 100644
index 0000000..4c32500
--- /dev/null
+++ b/skills/create-test-cases/references/destructuring.md
@@ -0,0 +1,127 @@
+# Feature Destructuring — reference procedure for Step 1 (Feature Phase)
+
+**Used by:** [steps/01-feature-phase.md](../steps/01-feature-phase.md) substeps 1.1 (sub-feature map) and 1.4 (cross-cutting concerns).
+
+**Purpose:** explore the feature UI, identify ALL logical sub-features, record cross-cutting concerns, and produce an approved `destructuring.md` map. The map is a feature-level artifact (A2) consumed by every sub-feature conversation — each sub-feature is then covered in a separate conversation to keep context clean.
+
+**Always on:** under the feature-first flow there is no "single-suite mode". Even a feature with one sub-feature runs Step 1 so the `_shared-ui.md`, `_ac-baseline.md`, and destructuring map exist as shared baselines.
+
+---
+
+## Procedure
+
+1. **Login & Navigate** — use Playwright MCP (invoked via the `ui-explorer` subagent in `mode: feature-baseline`). Login with `TESTOMATIO_EMAIL` / `TESTOMATIO_PASSWORD`. Navigate to the Q1.1 entry point URL.
+
+2. **Explore the feature surface** — systematically catalog all distinct functional areas:
+   - Identify tabs, sections, panels, dialogs, settings groups
+   - Click through each top-level interactive area
+   - Use `browser_evaluate` for DOM inspection and `browser_snapshot` for accessibility tree when needed
+
+3. **Apply decomposition criteria** — an area warrants its own suite when it meets **2+ of these**:
+   - Has a **form with 3+ fields** (inputs, dropdowns, toggles)
+   - Has its own **CRUD lifecycle** (create, read, update, delete)
+   - Has **3+ distinct states or modes** (e.g., pending/in-progress/finished)
+   - Has **its own entry point or navigation** (separate tab, panel, dialog)
+   - Has **interactions with 2+ other entities** (e.g., environment selection affects run + report)
+   - If an area meets only 1 criterion → it's likely a section within a larger suite, not its own suite
+   - If an area meets 4+ → it's deep, flag as `estimated tests: 30+`
+
+4. **Draft sub-feature map with scope boundaries** — each sub-feature gets explicit ownership:
+   ```
+   Feature: {feature-name}
+   Entry point: {Q1.1 URL}
+
+   Sub-features identified:
+
+   1. **{Sub-feature Name}** — {1-line description}
+      Owns: {what this suite tests — specific actions, UI areas, states}
+      Does NOT own: {adjacent functionality tested elsewhere}
+      Key elements: {main buttons/forms/controls}
+      Estimated tests: {10-15 / 20-30 / 30+}
+      Criteria met: {which decomposition criteria from step 3}
+
+   2. **{Sub-feature Name}** — {1-line description}
+      Owns: ...
+      Does NOT own: ...
+      ...
+   ```
+
+   **Overlap rules:**
+   - If two sub-features share a UI element → one OWNS it (tests CRUD), the other USES it (precondition only)
+   - If a workflow spans two sub-features → the one that initiates the action owns the test
+   - Flag any unresolved overlaps for user decision in step 5
+
+5. **Identify cross-cutting concerns (mandatory)** — after drafting the sub-feature map, analyze which elements **change behavior across multiple sub-features** when activated. These are **structural modifiers** — not simple fields, but mode switches that create variant flows throughout the feature.
+
+   **How to identify:**
+   - Review each sub-feature's key elements for **multi-select fields, mode toggles, or entity multipliers**
+   - For each candidate, ask: "If this is set to a non-default value, does it change the behavior of 2+ sub-features?"
+   - If yes → it's a cross-cutting concern
+
+   **Common structural modifiers:**
+   - Multi-environment selection (creates N parallel runs → affects creation, execution, detail, list, archive)
+   - Multi-user assignment (changes who sees/executes what → affects creation, execution, detail)
+   - Checklist mode toggle (changes execution UX → affects creation, execution)
+   - Bulk operations mode (multi-select + batch action → affects execution, detail, list)
+
+   **Document each cross-cutting concern:**
+   ```
+   Cross-cutting concerns:
+
+   A. **{Concern Name}** — {what it does}
+      Trigger: {what activates it — e.g., "selecting 2+ environments in run creation form"}
+      Affects: #{sub-feature-1} ({how}), #{sub-feature-2} ({how}), ...
+      Must-test scenarios: {key variant scenarios that each affected sub-feature must include}
+   ```
+
+   **Enforcement rule:** During test generation for each sub-feature (Step 3), the cross-cutting concerns list is a **mandatory input**. For each concern that lists the current sub-feature in "Affects" → at least 1 dedicated test must exist. This is checked in Phase 3 quality checks (self-review-checks.md § 1b Cross-cutting coverage).
+
+6. **Recommend execution order** — propose order with rationale:
+   - **Core first:** sub-features with the main workflow (create, execute, complete)
+   - **Variations next:** sub-features that modify or extend the core (configuration, filtering, bulk)
+   - **Edge last:** sub-features that handle exceptions (error recovery, permissions, archiving)
+   - Rationale: earlier suites establish patterns and preconditions that later suites build on
+
+7. **Present to user for approval (Step 1.5 HARD STOP)** — numbered list with recommended order + cross-cutting concerns. Options: confirm and start #1, edit the list/order, pick specific sub-feature to start with.
+
+8. **Save to file:** `test-cases/{feature-slug}/destructuring.md` with the approved map + cross-cutting concerns section + timestamp + user's chosen order.
+
+9. **Track & hand off:** each sub-feature is covered in a **separate skill invocation** (separate conversation) to avoid context overflow. The map file tracks progress:
+   ```markdown
+   - [x] Sub-feature 1 — completed 2026-04-14
+   - [ ] Sub-feature 2
+   - [ ] Sub-feature 3
+   ```
+   On next invocation, the Resume Detector reads `destructuring.md`, sees which sub-features are done, and proposes the next unchecked one. Intake answers (Q2-Q4) carry over — only `S` changes per suite. Reuse the feature directory under `test-cases/{feature-slug}/` — do NOT recreate it.
+
+---
+
+## Publishing
+
+This skill never publishes. Once all sub-features are `[x]`, the user runs:
+
+```
+/publish-test-cases-batch test-cases/{feature-slug}/
+```
+
+That skill handles batch publish (one branch per suite, `tc/{feature}/{suite-slug}`).
+
+---
+
+## File structure under feature-first flow
+
+```
+test-cases/{feature-slug}/
+  intake.md                     ← A1 (Step 0)
+  destructuring.md              ← A2 (Step 1.1 + 1.4)
+  _ac-baseline.md               ← A-base (Step 1.2)
+  _shared-ui.md                 ← A-shared-ui (Step 1.1, ui-explorer feature-baseline)
+  _existing-steps.md            ← A-steps (Step 1.3)
+  _style.md                     ← A-style (Step 4, first approval)
+  {suite-1-slug}-ac-delta.md    ← A3 (Step 2.1)
+  {suite-1-slug}-ui-delta.md    ← A4 (Step 2.2, ui-explorer sub-feature-delta)
+  {suite-1-slug}-scope.md       ← A5 (Step 2.4)
+  {suite-1-slug}-test-cases.md  ← A6 (Step 3, flat) — OR {suite-1-slug}/*.md for nested
+  {suite-1-slug}-review.md      ← A7 (test-case-reviewer, ≥15 tests)
+  ...
+```
diff --git a/skills/create-test-cases/references/product-context.md b/skills/create-test-cases/references/product-context.md
new file mode 100644
index 0000000..98d7c3c
--- /dev/null
+++ b/skills/create-test-cases/references/product-context.md
@@ -0,0 +1,54 @@
+# Testomat.io — Product Context
+
+Roles, entities, and relationships for test case design. Read this in Step 1.
+
+---
+
+## User Roles & Permissions
+
+| Role | Code name | What they CAN do | What they CANNOT do |
+|------|-----------|-----------------|---------------------|
+| **Owner** | `mainUser` | Everything — full CRUD on all entities, manage users, project settings, billing | — |
+| **Manager** | `managerUser` | Manage runs, plans, users. Create/edit/delete most entities | Change billing, delete project |
+| **QA** | `qaUser` | Execute tests, create runs, edit test cases, manage own content | Delete other users' content, change project settings |
+| **Read-only** | `readonlyUser` | View all content, search, navigate, view history | Create, edit, delete anything. UI shows `Read-only` badge |
+
+### Role-based testing rules
+
+**Every sub-feature MUST include role-based tests when the feature has write actions:**
+1. At minimum: test the happy path with `owner` + verify `read-only user` CANNOT perform write actions
+2. If the feature has delete/manage operations: also test `QA` (often cannot delete others' content)
+3. Format in preconditions: `"User is logged in as read-only user."` — never use code names
+
+**Common role-based scenarios to always consider:**
+- Read-only user sees the page but create/edit/delete buttons are hidden or disabled
+- QA user can create but cannot delete another user's entity
+- Manager can manage team content but not project-level settings
+- Switching roles mid-session (login as different user) — permissions update correctly
+
+---
+
+## Key Entities
+
+| Entity | Lives in | Key operations | Notes |
+|--------|----------|---------------|-------|
+| **Tests** | Tests page, inside suites | CRUD, label, tag, assign, attach files, view history | Can be manual or automated |
+| **Suites** | Tests page | CRUD, copy, convert to/from folder, nest | Container for tests |
+| **Folders** | Tests page | CRUD, convert to suite | Organizational grouping above suites |
+| **Plans** | Plans page | CRUD, select tests, hide/unhide | Define what to run |
+| **Runs** | Runs page | Create (manual/auto), launch, finish, relaunch, archive, purge | Execution instances |
+| **Run Groups** | Runs page | CRUD, multi-environment | Group related runs |
+| **Labels** | Settings → Labels & Fields | CRUD, assign to tests (single + bulk), color | Protected: `test_label` |
+| **Tags** | Inline in test titles | Apply, remove, search by | `@smoke`, `@basic`, etc. |
+| **Steps** | Steps page | Search, reuse in tests | Shared step library |
+| **Requirements** | Requirements page | CRUD, link to tests | Traceability |
+| **Templates** | Settings → Templates | CRUD, set defaults | Protected defaults exist |
+| **Branches** | Branches page | CRUD, switch, merge | Paid plan feature |
+| **Projects** | Dashboard | Create, delete, switch | Workspace container |
+
+### Entity relationships (important for cross-feature tests)
+- **Labels → Tests**: deleting a label removes it from all tests that had it
+- **Plans → Runs**: a run can be created from a plan (inherits test selection)
+- **Suites → Tests**: deleting a suite deletes all tests inside
+- **Tags → Tests**: tags live in test titles, searchable globally
+- **Branches → Everything**: branching creates a parallel version of tests/suites
diff --git a/skills/create-test-cases/references/self-review-checks.md b/skills/create-test-cases/references/self-review-checks.md
new file mode 100644
index 0000000..af2b23b
--- /dev/null
+++ b/skills/create-test-cases/references/self-review-checks.md
@@ -0,0 +1,321 @@
+# Coverage Self-Review — Check Definitions
+
+Detailed definitions for the 9 check groups used in [SKILL.md Step 2 Phase 3](../SKILL.md). Each group is **blocking** — fix during generation before presenting to user.
+
+**Loading guidance:** This file is **reference detail, not a runbook**. SKILL.md Phase 3 has the running check table. Only load specific sections of this file (via anchor links from the Phase 3 table) when you hit a borderline case or need to clarify mechanics. Do NOT eagerly Read the full file — it wastes parent context.
+
+**Small suite exception (< 15 tests):** percentage thresholds in checks #3 (balance) and #6 (E2E) become **advisory** — report violations but do not block. With 10 tests, one test off makes 10% swing, so strict thresholds are impractical. Still aim for the targets.
+
+**Quick pre-check** (run before the 6 mechanical checks):
+- Every test starts with a user action (not "Verify..." / "Check...")
+- Every step has `_Expected_:` with observable result
+- All UI element names match the catalog exactly
+- No 3+ tests with identical setup that should be consolidated or parameterized
+
+## Section map (anchor-based — stable across edits)
+
+Extract ONE section via a single `awk` call — never Read the whole file:
+
+```bash
+awk '/^START_PATTERN/,/^END_PATTERN/' .claude/skills/create-test-cases/references/self-review-checks.md
+```
+
+| § | Topic | START pattern | END pattern (exclusive) |
+|---|---|---|---|
+| 1 | Coverage (UI / AC / inferred AC) | `^## 1\. Coverage` | `^## 1b\. ` |
+| 1b | Cross-Cutting Coverage | `^## 1b\. Cross-Cutting` | `^## 2\. ` |
+| 2 | Scope Compliance | `^## 2\. Scope Compliance` | `^## 3\. ` |
+| 3 | Balance (scenario + semantic + priority) | `^## 3\. Balance` | `^## 4\. ` |
+| 4 | Step Quality | `^## 4\. Step Quality` | `^## 5\. ` |
+| 5 | UI Reality Check (Playwright) | `^## 5\. UI Reality` | `^## 6\. ` |
+| 6 | E2E + Parameterization | `^## 6\. E2E` | `^## 6b\. ` |
+| 6b | Implementation Leakage | `^## 6b\. Implementation` | `^## 7\. ` |
+| 7 | Automation Readiness | `^## 7\. Automation` | `^## 8\. ` |
+| 8 | Sub-suite Distribution | `^## 8\. Sub-suite` | `^## Output Format` |
+| — | Output Format (report template) | `^## Output Format` | end-of-file (use `awk '/^## Output Format/{p=1} p'`) |
+
+---
+
+## 1. Coverage
+
+### UI element coverage (from Step 1.3 catalog)
+- Count buttons/toggles/inputs/tabs in catalog
+- Count how many are referenced in at least one test step
+- Flag unused elements: *"Buttons referenced: 12/14 (unused: **'Import'**, **'Export'**)"*
+- Flag toasts not asserted: *"Toasts asserted: 5/7 (missing: `"Label has been updated"`, `"Label deletion failed"`)"*
+
+### AC coverage (from Step 1.1 AC list)
+- For each AC, count linked tests via `source` field
+- Flag uncovered: *"AC-4 NOT COVERED"*
+- Flag over-covered (>5 tests per AC) as possible redundancy
+
+### Independent AC validation (when AC list is `(inferred)`)
+When ACs were built from UI exploration (not from docs/specs), cross-check against [product-context.md](product-context.md) entity operations:
+- For each entity involved in the feature, verify the AC list covers: **Create, Read, Update, Delete** (where applicable)
+- Check entity relationships: if entity A affects entity B (e.g., deleting a label removes it from tests), verify an AC exists for the cross-entity behavior
+- Compare with role-based rules: verify ACs exist for permission-denied scenarios (at minimum: read-only user)
+- Flag gaps: *"Entity 'Run' has no AC for Update operation"*, *"No cross-entity AC for Plan→Run relationship"*
+- This reduces circular dependency — product-context.md provides an independent checklist that the UI exploration may have missed
+
+---
+
+## 1b. Cross-Cutting Coverage (from destructuring.md)
+
+`destructuring.md` is always produced by Step 1 (feature phase) under the feature-first flow, so this check is always on. If the file is missing → treat as a Step 1 precondition violation, not a skip.
+
+Read the `Cross-cutting concerns` section from `destructuring.md`. For each concern that lists the **current sub-feature** in its "Affects" field:
+- Verify at least 1 test exists that exercises the variant flow (not just mentions the element in preconditions)
+- The test must **activate the modifier** (e.g., select 2+ environments) and verify the **changed behavior** in this sub-feature's scope
+
+**Flag format:** *"Cross-cutting concern '{name}' affects this sub-feature but has 0 dedicated tests"*
+
+**Blocking:** any affected concern with 0 tests → add at least 1 test before proceeding.
+
+**Example:** If concern "Multi-environment" says `Affects: #2 Run Execution (N parallel runs, env context during execution)` and the current suite is Run Execution → at least 1 test must select 2+ environments and verify environment-specific execution behavior.
+
+---
+
+## 2. Scope Compliance (from Step 1.4)
+
+- Verify no test falls under OUT OF SCOPE items
+- If any do → flag and ask user before proceeding
+
+---
+
+## 3. Balance
+
+### Scenario type balance
+- **Category definitions and thresholds:** see [testing-strategy.md § 2.3](testing-strategy.md) — single source of truth. Do not restate here.
+- **Mechanics:** count tests per category (happy / negative / boundary / state / role), compute %, compare to thresholds in § 2.3
+- Flag skew: *"80% happy path — under-testing error cases"*
+
+### Semantic balance validation (anti-gaming)
+Classify tests by **content**, not by tag — verify each category is genuine:
+- **Negative** = test where the user action **fails, is denied, or produces an error**. If the test ends with a success outcome → it's NOT negative regardless of tag
+- **Boundary** = test where input is **at or beyond a limit** (0, 1, max, max+1, empty, whitespace-only, special chars). If the input is a normal valid value → it's NOT boundary
+- **Role-based** = test where the **same action is performed by a different role** with a different expected outcome. If the role doesn't affect the outcome → it's NOT role-based
+- Flag misclassified: *"t.12 tagged @negative but ends with success toast — reclassify as happy path"*
+- Re-check thresholds after reclassification
+
+### Priority distribution ([testing-strategy.md § 8](testing-strategy.md))
+- **Healthy pyramid:** `critical` ≤ 15%, `high` ≤ 35%, `normal` ≥ 35%, `low` ≥ 5%
+- Flag inverted pyramid or high overload
+- All 4 thresholds are blocking — fix before proceeding
+
+---
+
+## 4. Step Quality
+
+### Atomic step check
+- Grep for step lines containing ` and `, ` then `, ` after which ` — compound steps
+- Flag each for splitting
+
+### Observable expected result check
+- Grep for `_Expected_:` lines matching vague patterns (full banned list — see [testing-strategy.md § 3.3](testing-strategy.md)):
+  `is saved`, `works`, `is correct`, `should work`, `updates`, `approximately`, `correctly`, `appropriately`, `properly`, `as expected`, `works fine`
+- Flag each for rewording to observable state (toast text, element visibility, counter value, specific user avatar on a named test)
+- **Blocking:** any match → fix before proceeding
+
+### Test independence
+- Flag tests whose Preconditions reference "previous test" or "after test X"
+
+### Minimum step count ([testing-strategy.md § 1.3](testing-strategy.md))
+- Count standalone tests with < 3 steps
+- Flag each: *"t.5 has 1 step — add navigation context or merge with t.4"*
+- **Blocking:** if >20% of standalone tests have < 3 steps → fix before proceeding
+
+### Precondition role names ([product-context.md § User Roles](product-context.md))
+- Scan all `**Preconditions:**` lines for role names
+- Verify every role matches the "Role" column in product-context.md (`owner`, `manager`, `QA`, `read-only user`)
+- Flag internal code names: `mainUser`, `qaUser`, `managerUser`, `readonlyUser`
+- Flag missing role: *"t.3 precondition has no role specified"*
+- **Blocking:** any internal code name found → replace before proceeding
+
+---
+
+## 5. UI Reality Check (Playwright)
+
+Pick 3 tests representing different categories: 1 happy-path, 1 negative, 1 boundary or complex flow. Open browser via Playwright MCP and execute each test step-by-step literally.
+
+For each step:
+- Perform the exact action described
+- Compare `_Expected_:` with what actually happens in the UI
+- Flag mismatches: wrong toast text, missing dialog, element not visible, wrong navigation
+
+**Common catches:**
+- Toast text is different from what was written (e.g., `"Run saved"` vs actual `"Run has been saved"`)
+- A confirmation dialog appears that the test doesn't mention
+- An element is only visible after scrolling or expanding a section
+- Navigation goes to a different URL than expected
+- A loading state blocks the next action (missing wait step)
+
+**Blocking:** if any test has >1 step mismatch → fix all mismatches in MD, then re-verify the fixed steps.
+
+---
+
+## 6. E2E + Parameterization
+
+### E2E format compliance ([testing-strategy.md § 1.3](testing-strategy.md))
+- Scan every test title: if it starts with a noun/adjective describing UI state → NOT E2E
+- Flag titles matching: `"X is visible..."`, `"X is active..."`, `"X shows..."` (without prior user action)
+- Group tests by entry point + first 2 steps: if 3+ share them → flag as consolidation candidates
+- **Blocking:** if >20% of tests fail E2E check → fix first
+
+### Cross-screen outcome verification ([testing-strategy.md § 1.6](testing-strategy.md))
+- For each test whose `source` references an AC with outcome verbs (`applied`, `distributed`, `assigned`, `saved`, `created`, `deleted`, `updated`, `launched`): scan the steps
+- If the last step is a button click (Launch / Save / Submit / Delete / Assign / Finish) without a follow-up step that inspects the resulting screen (`execution page`, `detail page`, `list`, `tree`) → flag
+- Flag format: *"t.X source references 'distributed' but steps stop at 'Click Launch' — add step that inspects execution page with concrete match (specific test + specific avatar)"*
+- Exception: test title explicitly says it's about modal behavior (`"...button text updates"`, `"...strategy selector disappears"`) — declare the modal-only scope in the title
+- **Blocking:** any outcome test that stops at the action button → fix before proceeding
+
+### Parameterization check ([testing-strategy.md § 5.5](testing-strategy.md))
+- Group tests by identical first 2 steps
+- If 3+ in a group differ only by input data → flag as parameterization candidates
+- **Blocking:** if parameterization candidates exist → consolidate before proceeding
+
+### Parameterized title check ([test-case-format.md § Parameterized title rule](test-case-format.md))
+- For every test that has an `<!-- example -->` block with parameter columns, scan the test title for `${col_name}` syntax referencing at least one column from that example block
+- Flag tests whose title is missing all `${col}` placeholders: *"t.7 has example block with columns [sub_status, hotkey] but title uses no `${}` — each row would render with an identical title"*
+- **Blocking:** any parameterized test without `${col}` in the title → fix before proceeding
+
+---
+
+## 6b. Implementation Leakage
+
+**Blocking** — anything on this list must be stripped from test titles, Steps, and `_Expected_:` before proceeding. These belong in `ui-elements.md` (LLM cross-check context) only, never in the generated test case body. Full anti-pattern table: [test-case-format.md § Anti-patterns in test case bodies](test-case-format.md).
+
+Run these greps across the generated MD file(s) and flag every hit:
+
+| Grep pattern | Leaks |
+|---|---|
+| `\b(bg-\|text-\|ring-\|hover:\|focus:)[a-z0-9-]+` | Tailwind-style CSS class names |
+| `\bmd-icon-[a-z-]+` | MDI icon class names |
+| `\.[a-z-]+-icon\|\.keyboard-shortcut-[a-z-]+` | Class-selector leak |
+| `data-test-id\|aria-describedby\|aria-labelledby` | Selector / automation-flavored attributes |
+| `\(AC-[0-9]+\)` or `\bAC-[0-9]+\b` inside Steps or `_Expected_:` lines | Inline AC refs (they belong in `source:` frontmatter only) |
+| `<script[ >]\|<img[^>]*onerror\|<iframe[^>]*onload` inside Steps or `_Expected_:` | Raw exploit payloads pasted into tester instructions |
+
+**How to fix:** replace with semantic descriptions (tooltip text, visual state prose, role/position). See the mapping table in `test-case-format.md`.
+
+**Note:** AC refs in the `<!-- test -->` frontmatter `source:` field are **correct** — the grep must target Step/Expected lines only, not the frontmatter.
+
+---
+
+## 7. Automation Readiness
+
+- Count tests by `automation` field: candidate, deferred, manual-only
+- Flag missing classification
+- Report distribution: *"Automation: candidate 45 (55%), deferred 25 (31%), manual-only 11 (14%)"*
+- If >30% manual-only → flag for review
+
+---
+
+## 8. Sub-suite Distribution
+
+**Single source of truth for the rule. All other mentions in SKILL.md / test-case-format.md / subagents must be stubs that link here.**
+
+**Why this exists:** testers reported flat suites are unnavigable — even at 6-10 tests, mixed CRUD + Permissions + Edge Cases without structure forces them to scan the whole list. The trigger is whether **logical sections exist**, not how many tests there are.
+
+### Decision matrix (fully automatic, no user prompt)
+
+| Natural sections found | Section sizes (after auto-merge) | Output |
+|---|---|---|
+| ≥ 2 sections | every section ≥ 3 tests | **Nested** — directory with one MD file per section (**MANDATORY**) |
+| ≥ 2 sections | one or more sections < 3 tests | **Auto-merge** undersized into nearest semantic neighbour, re-apply matrix |
+| 0-1 sections | — | **Flat** — single MD file |
+
+**No user questions on borderline cases.** Auto-merge silently (e.g. "Edge Cases: 2" + "Boundaries: 4" → "Boundaries & Edge Cases: 6"). Phase 4 approval gate is the user's chance to push back. Total test count is **NOT** part of the decision.
+
+### What counts as a "natural section"
+
+A coherent slice of the sub-feature that a tester would search for as one unit. Section boundaries must come from AC list / UI catalog, not arbitrary splits. Common patterns:
+
+- **CRUD** — Create, Read, Update, Delete of the primary entity
+- **Search & Filter** — query, facet, sort
+- **Permissions** — role-based access variants
+- **Validation** — form field errors, constraint violations
+- **Edge Cases** — off-nominal data, race conditions
+- **Multi-environment / Multi-user** — structural modifiers from destructuring cross-cutting concerns
+- **Workflow stage** — Draft / Active / Finished / Archived
+
+### Mechanics
+
+1. **Identify natural sections** by scanning:
+   - AC list themes — group ACs that describe the same surface (e.g. AC-1..AC-5 about creation = "CRUD" section)
+   - UI catalog screens — different screens often = different sections
+   - Test titles — verbs/nouns cluster (e.g. "Filter by...", "Search for..." → Search & Filter)
+2. **Count per section:** how many tests fall into each
+3. **Auto-merge undersized sections** (< 3 tests) into the closest semantically related section — silently, no user prompt
+4. **Apply matrix above** — fully automatic
+
+### Why a directory of MD files (not one MD with multiple `<!-- suite -->` blocks)
+
+The `check-tests` parser uses a single `currentSuite` variable — multiple `<!-- suite -->` blocks in one file overwrite each other (real parser bug). A directory becomes a parent suite in Testomat; each file inside becomes a child suite nested under it. This is the **only** working approach for nesting.
+
+### What to flag
+
+- *"Suite has 8 tests in flat MD but groups into 2 natural sections (CRUD: 5, Permissions: 3) — must be nested."*
+- *"Suite has 22 tests in flat MD but groups into 4 natural sections (CRUD: 8, Search: 6, Permissions: 5, Boundaries: 3) — must be nested."*
+- *"Suite is nested with 4 sections but 'Edge Cases' has only 2 tests — auto-merge into Boundaries during regeneration."*
+
+### Anti-patterns
+
+- Splitting alphabetically (`A-M`, `N-Z`) — sections must be semantic, not lexical
+- Splitting by priority or automation status — those are tag dimensions, not suite structure
+- Forcing 5 sections when 3 fit naturally — symmetry over clarity is wrong
+- Asking the user on borderline cases — auto-merge undersized sections instead
+
+### Blocking
+
+- Any suite with ≥ 2 natural sections of ≥ 3 tests in flat layout → blocking, must be re-laid-out as nested before publishing
+- Any nested layout with a section < 3 tests → blocking, must auto-merge during regeneration
+
+### How this links to Phase 1
+
+Phase 1 Grouping pass writes a 1-line decision header at the top of the checklist:
+```
+Output: nested | sections: 3 (CRUD: 5, Permissions: 4, Validation: 3)
+```
+or
+```
+Output: flat | reason: only 1 natural section
+```
+Phase 3 cross-checks this header against the actual MD layout produced in Phase 2 — mismatch is a violation.
+
+---
+
+## Output Format
+
+```
+Coverage Self-Review:
+
+UI elements: 12/14 buttons, 3/3 toggles, 5/7 toasts
+  ! Unused: **'Import'**, **'Export'**
+AC coverage: 8/10 ACs covered
+  ! AC-4 NOT COVERED
+
+Cross-cutting: 2/2 concerns covered (Multi-env: t.15, Multi-user: t.8)
+  — or —
+Cross-cutting: 1/2 concerns covered
+  ! "Multi-environment" affects this sub-feature but has 0 dedicated tests
+
+Scope compliance: OK
+
+Balance: happy 45%, negative 25%, boundary 15%, state 10%, role 5% — OK
+Priority: critical 2, high 7, normal 9, low 2 — OK
+
+Step quality: atomic 20/20, observable 19/20, independent 20/20, min-steps 18/20
+  ! t.14 "Label is saved" → rewrite as observable
+  ! t.12 has 1 step — merge with t.11
+
+E2E: 18/20 OK
+  ! t.3 "Creator is auto-assigned" → merge into t.1
+Parameterization: OK
+
+Automation: candidate 15 (75%), deferred 3 (15%), manual-only 2 (10%) — OK
+
+Suggestions:
+  - Add test for AC-4
+  - Rewrite t.14 expected result
+  - Consolidate t.3 into t.1
+```
diff --git a/skills/create-test-cases/references/test-case-format.md b/skills/create-test-cases/references/test-case-format.md
new file mode 100644
index 0000000..2f58ae6
--- /dev/null
+++ b/skills/create-test-cases/references/test-case-format.md
@@ -0,0 +1,322 @@
+# Testomat.io Classical Tests Markdown Format
+
+Native format for import into Testomat.io. **NEVER generate IDs** (`id: @S...` / `id: @T...`) — they are server-generated.
+
+---
+
+## Hierarchy: Folder → Suite → Test
+
+| Level | Testomat Entity | MD Representation | Example |
+|-------|----------------|-------------------|---------|
+| Feature | Folder | `# Folder Title` (parent: null via API) | Manual Test Execution |
+| Sub-feature | Suite | `# Suite Title` (parent: folder via API) | Environment Configuration |
+| Test case | Test | `## Test Title` (inside suite) | Select custom environment for a run |
+
+**Per-suite MD file:** each file contains ONE suite (sub-feature) with its tests. The folder (feature) is created via API during publishing — it does not appear in the MD file.
+
+**File naming:**
+- **Flat suite:** `test-cases/{feature-slug}/{suite-slug}-test-cases.md`
+- **Nested suites:** `test-cases/{feature-slug}/{suite-slug}/{section-slug}.md` (one file per section — see § Nested Suites below)
+
+---
+
+## Document Structure
+
+```
+<!-- suite
+key: value
+-->
+# Suite Title
+
+Suite description (optional)
+
+<!-- test
+key: value
+-->
+## Test Title
+
+Test description (optional)
+
+## Steps
+
+- Step action
+  _Expected_: Observable result
+```
+
+---
+
+## Suite Metadata (`<!-- suite ... -->`)
+
+| Field | Description |
+|-------|-------------|
+| `emoji` | Emoji for the suite (e.g. `🔐`) |
+| `assignee` | User email |
+
+**Do NOT include `tags:` or `labels:` in suite metadata — they break `check-tests push` import.**
+
+## Test Metadata (`<!-- test ... -->`)
+
+| Field | Description | Parsed by push? |
+|-------|-------------|-----------------|
+| `type` | `manual` or `automated` | No (ignored) |
+| `priority` | `low`, `normal`, `important`, `high`, `critical` | **Yes** |
+| `assignee` | User email | No (ignored) |
+| `source` | Comma-separated AC refs (e.g. `AC-1, AC-3`) or `exploratory`. Required by create-test-cases skill. | No (skill-only) |
+| `automation` | `candidate` (can automate now), `deferred` (needs infrastructure first), or `manual-only` (keep manual). Required by create-test-cases skill. | No (skill-only) |
+| `automation-note` | Short reason for the classification (e.g. `"straightforward form fill"`, `"requires multi-user setup"`, `"visual comparison needed"`). Required when `automation` is `deferred` or `manual-only`. | No (skill-only) |
+
+**Do NOT include `tags:` or `labels:` in test metadata — they break `check-tests push` import.**
+Tags go in the test title: `## Test title @tag1 @tag2`. Labels are applied after push via API v2.
+
+### Example with source and automation fields
+
+```markdown
+<!-- test
+type: manual
+priority: high
+source: AC-2, AC-3
+automation: candidate
+-->
+## Cannot create label with duplicate name
+
+<!-- test
+type: manual
+priority: normal
+source: AC-51
+automation: manual-only
+automation-note: requires 2 concurrent users to verify distribution
+-->
+## Verify auto-assign distributes tests between assigned users
+```
+
+---
+
+## Steps Format
+
+```markdown
+## Steps
+
+- Navigate to the login page
+  _Expected_: Login form is displayed
+- Enter valid credentials and click **'Login'**
+  _Expected_: User is redirected to the dashboard
+```
+
+Rules:
+- Steps under `## Steps` heading
+- Bulleted list (`-` or `*`)
+- `_Expected_:` for expected results (italic)
+- Expected results should be specific and verifiable
+
+### Multi-bullet Expected (preferred when there are multiple observable results)
+
+When a single step has more than one observable outcome, break the `_Expected_:` into indented sub-bullets — one checkable assertion per bullet. A wall of comma-separated facts is hard to read and harder to verify.
+
+```markdown
+- Click the **'PASSED'** button
+  _Expected_:
+  - Button shows active state
+  - Header counter "Passed 1" increments; "Pending N-1" decrements
+  - Three sub-status chips appear: "Expected behaviour", "Management decision", "Minor issue"
+  - Attachment zone becomes visible below the Result message textbox
+  - Result message textbox is visible with placeholder "Result message"
+```
+
+One-liner Expected is fine when there is truly one observation. As soon as a second "and …" sneaks in, switch to bullets.
+
+---
+
+## Anti-patterns in test case bodies
+
+Never leak implementation details into `_Expected_`, Steps, or titles. These come from the `ui-elements.md` catalog and belong there **only** as LLM cross-check context for the ui-validator subagent — they are NEVER copied into published test cases.
+
+| Forbidden in TC body | Why | Use instead |
+|---|---|---|
+| CSS class names (`bg-green-500`, `text-green-800`, `ring-green-500`, `.keyboard-shortcut-icon`) | Manual testers don't read CSS; tests become fragile to redesign | Describe the visual state in prose: "Button shows active state (filled green background with dark text)" |
+| Icon class names (`md-icon-timer`, `md-icon-dots-horizontal`, `md-icon-file-tree-outline`) | Same — implementation detail | Reference icon by its role or tooltip: "timer icon", "extra-options menu", "tree view toggle" |
+| `data-test-id`, `aria-describedby`, `<li>` tag names, selectors | Locator-flavored; for automation, not manual | Describe the element semantically: "tooltip popper", "history list row" |
+| Inline AC refs (`(AC-4)`, `URL pattern asserts AC-10`) | AC traceability belongs in `<!-- test -->` frontmatter `source:` only | Keep `source: AC-4, AC-10` in the test header; strip inline mentions |
+| Raw XSS / SQLi payloads (`<script>alert('xss')</script>`, `<img src=x onerror=alert(1)>`) | Noise in a manual checklist; testers can't visually parse escaping from raw markup | Describe in prose: "a message containing HTML/script markup (e.g. a `<script>` tag, an `<img>` tag with `onerror`, and a `<b>` tag)" |
+
+If the `ui-elements.md` catalog is the **only** way to identify an element, treat that as a signal to describe it by proximity / label / tooltip instead.
+
+---
+
+## Parameterized title rule
+
+If a test has an `<!-- example -->` block with parameter columns, the title **MUST** reference at least one column using `${col_name}` syntax. Otherwise every row renders with the same title in Testomat.io and becomes indistinguishable.
+
+```markdown
+## Mark test as PASSED with each ${sub_status} and result message  ← correct
+
+<!-- example -->
+
+| sub_status |
+|---|
+| Expected behaviour |
+| Management decision |
+| Minor issue |
+```
+
+Anti-pattern (do not do):
+
+```markdown
+## Mark test as PASSED with each sub-status and result message     ← title has no ${}
+```
+
+Multi-column example blocks: at least one `${col}` placeholder in the title; referencing two is fine when both are meaningful (e.g. `## Hotkey marks test as ${status} via ${hotkey}`).
+
+---
+
+## E2E Format (mandatory)
+
+Every test = complete user journey: `Navigate → Act → Verify → (next action → Verify) → Final outcome`. No standalone static UI checks. If 3+ items share the same entry point → consolidate into one test.
+
+**Full rules, anti-patterns, and examples:** [testing-strategy.md § 1.3](testing-strategy.md)
+
+---
+
+## Parameterized Tests
+
+```markdown
+<!-- test
+type: manual
+priority: high
+-->
+## User can login as ${role}
+
+## Steps
+
+- Login as ${role} user
+  _Expected_: Dashboard is displayed
+
+<!-- example -->
+
+| role |
+| --- |
+| admin |
+| user |
+```
+
+---
+
+## Full Example
+
+```markdown
+<!-- suite
+emoji: 🔐
+-->
+# Login Functionality
+
+<!-- test
+priority: high
+source: AC-1, AC-2
+automation: candidate
+-->
+## Successful login with valid credentials @smoke @critical
+
+A user should be able to log in with valid credentials.
+
+## Steps
+
+- Navigate to the login page
+  _Expected_: Login form is displayed with username and password fields
+- Enter a valid username and password
+  _Expected_: Credentials are entered without errors
+- Click the **'Login'** button
+  _Expected_: User is redirected to the dashboard
+
+<!-- test
+priority: normal
+source: AC-3
+automation: candidate
+-->
+## Login with invalid password shows error @negative
+
+## Steps
+
+- Navigate to the login page
+  _Expected_: Login form is displayed
+- Enter valid username and invalid password
+  _Expected_: Credentials are entered
+- Click the **'Login'** button
+  _Expected_: Error message "Invalid email or password" is displayed. User stays on login page
+```
+
+---
+
+## Tags
+
+- **Always in title:** `@tag` format (e.g. `## Test @smoke @regression`)
+- **NEVER in `<!-- test -->` header** — `tags:` field breaks `check-tests push` import (server 422 error)
+- Tags in titles must NOT include suite/test IDs
+
+## Nested Suites (file format)
+
+**When to nest** (decision matrix, auto-merge, anti-patterns, "what counts as a natural section"): [self-review-checks.md § 8](self-review-checks.md#8-sub-suite-distribution) — single source of truth. This section only covers **how to lay the files out** once the decision is "nested".
+
+**Only working approach:** directory structure. Multiple `<!-- suite -->` blocks in one file do not nest (parser bug).
+
+### File structure
+
+```
+test-cases/{feature-slug}/{suite-slug}/
+├── {section-1-slug}.md      # Child suite 1
+├── {section-2-slug}.md      # Child suite 2
+└── {section-3-slug}.md      # Child suite 3
+```
+
+### Each section file — standard format
+
+Each file is a normal single-suite MD file with `<!-- suite -->` + `# heading` + tests:
+
+```markdown
+<!-- suite -->
+# Runs List
+
+<!-- test
+priority: high
+source: AC-1
+automation: candidate
+-->
+## Filter runs by Manual tab and verify run entries @smoke
+
+## Steps
+
+- Navigate to the Runs page
+  _Expected_: Runs list is displayed with tabs
+- Click the **'Manual'** tab
+  _Expected_: Only manual runs are shown in the list
+```
+
+### How it maps to Testomat.io
+
+| File system | Testomat entity |
+|-------------|-----------------|
+| `{suite-slug}/` directory | Parent suite (folder) |
+| `{section-slug}.md` file | Child suite nested inside parent |
+| `## Test` inside file | Test inside child suite |
+
+### Publishing
+
+Publishing is owned by `/publish-test-cases-batch`. It copies the nested-suite directory into a temp git dir that mirrors this structure:
+
+```
+/tmp/tc-publish-{suite-slug}/
+└── {suite-slug}/
+    ├── section-1.md
+    └── section-2.md
+```
+
+`check-tests push` creates: parent suite `{suite-slug}` → child suites `Section 1`, `Section 2` nested inside.
+
+### Quality checks apply per parent
+
+All coverage thresholds (scenario balance, priority pyramid, AC coverage) are calculated across **all section files combined** — the parent suite is the unit of coverage, not individual sections.
+
+---
+
+## Labels
+
+- **NEVER in `<!-- test -->` or `<!-- suite -->` header** — `labels:` field breaks `check-tests push` import (server 422 error)
+- Labels are applied after publishing via API v2 `link` parameter (owned by `/publish-test-cases-batch` — see its `references/api-reference.md`)
diff --git a/skills/create-test-cases/references/testing-strategy.md b/skills/create-test-cases/references/testing-strategy.md
new file mode 100644
index 0000000..54141f5
--- /dev/null
+++ b/skills/create-test-cases/references/testing-strategy.md
@@ -0,0 +1,452 @@
+# Testing Strategy & Reference Data
+
+Reference guide for designing high-quality manual test cases. Covers what to test, how to write steps, and all reference data (tags, labels, roles).
+
+## Section map (anchor-based — stable across edits)
+
+Extract ONE section via a single `awk` call — never Read the whole file:
+
+```bash
+awk '/^START_PATTERN/,/^END_PATTERN/' .claude/skills/create-test-cases/references/testing-strategy.md
+```
+
+| § | Topic | START pattern | END pattern (exclusive) |
+|---|---|---|---|
+| 1 | Test Design Principles (1.1-1.6) | `^## 1\. Test Design` | `^## 2\. ` |
+| 1.3 | E2E Format (mandatory) | `^### 1\.3 E2E Format` | `^### 1\.4 ` |
+| 1.6 | Cross-Screen Outcome Verification | `^### 1\.6 Cross-Screen` | `^## 2\. ` |
+| 2 | Coverage Approach | `^## 2\. Coverage` | `^## 3\. ` |
+| 2.3 | Scenario Balance Thresholds | `^### 2\.3 Scenario Balance` | `^### 2\.4 ` |
+| 3 | Step Writing Rules | `^## 3\. Step Writing` | `^## 4\. ` |
+| 4 | Naming Conventions | `^## 4\. Naming` | `^## 5\. ` |
+| 5 | Scenario Design Patterns (5.1-5.8) | `^## 5\. Scenario Design` | `^## 6\. ` |
+| 5.5 | Parameterized Pattern | `^### 5\.5 Parameterized` | `^### 5\.6 ` |
+| 5.6 | Negative Test Pattern | `^### 5\.6 Negative` | `^### 5\.7 ` |
+| 6 | Common Pitfalls | `^## 6\. Common Pitfalls` | `^## 8\. ` |
+| 8 | Priority Matrix | `^## 8\. Priority Matrix` | `^$` (end of file — use `,0` or drop END pattern) |
+
+**Last-section trick:** for sections that extend to end-of-file, use `awk '/^## 8\. /{p=1} p'` instead of the range pattern.
+
+---
+
+## 1. Test Design Principles
+
+### 1.1 One test = one scenario
+
+Each test case verifies **one specific behavior**. If a test title contains "and" — consider splitting.
+
+- **Good:** "Finish run updates status to Passed when all tests passed"
+- **Bad:** "Finish run and check status and verify report"
+
+### 1.2 Independence
+
+Each test must be executable **independently** — no dependency on another test's outcome.
+
+- **Data preconditions** ("At least one label exists") go in the **Preconditions** section — specify how to create or reference setup
+- **State preconditions** ("User is on the Labels page") become the first steps of the test
+- If test B logically follows test A (e.g., "Edit label" after "Create label") — test B must include creation in its own preconditions or first steps. Never assume test A ran first
+- Shared test data (e.g., a label used by multiple tests) → describe in preconditions of each test that uses it
+
+### 1.3 E2E Format (mandatory)
+
+This project is an **E2E automation** project. Every test case must be a complete **end-to-end scenario** — a user journey from entry point to verifiable outcome. Tests describe what a user **does**, not what the UI **looks like**.
+
+**The pattern:** Navigate → Act → Verify → (next action → Verify) → Final outcome
+
+**Rules:**
+- Every test MUST start with a user navigation or action, never with "Verify..." or "Check..."
+- Every test MUST end with a verifiable outcome (toast, page state, data visible)
+- Static UI checks ("X is visible by default", "Y is active by default") are NOT standalone tests — they become verification steps inside a larger E2E scenario
+- If 3+ tests in a suite all verify different elements on the same screen after the same action → **consolidate** them into one E2E scenario with multiple verification steps
+- Each test must answer: **"What does the user DO, and what HAPPENS as a result?"**
+- **Minimum 3 steps per test** — every standalone test must have at least 3 steps (navigate → act → verify). A test with 1-2 steps is either too granular (merge with related test) or missing navigation context (add entry-point steps). Exception: individual rows in a parameterized test may have fewer steps since navigation is shared
+
+**Good (E2E):**
+```
+## Verify New Manual Run form default state after opening from Runs page
+
+Steps:
+- Navigate to the **'Runs'** page
+  _Expected_: Runs page is displayed with the **'Manual Run'** button visible
+- Click the **'Manual Run'** button
+  _Expected_: New Manual Run sidebar opens with **'New Manual Run'** header and `manual` badge
+- Inspect the form controls
+  _Expected_: **'Title'** input, **'Rungroup'** dropdown, **'Environment'** list, **'Description'** textarea and **'Run as checklist'** toggle are all visible. Creator is shown as `as manager`. **'All tests'** tab is active with read-only tree. **'Launch'**, **'Save'**, **'Cancel'** buttons are visible at the bottom
+```
+
+**Bad (isolated UI state checks — NOT E2E):**
+```
+## All configuration controls are visible by default          ← no user action
+## Form opens as a sidebar with header and manual badge       ← no journey
+## Creator is auto-assigned as manager by default             ← static assertion
+## All tests tab is active by default                         ← static assertion
+## Launch, Save and Cancel buttons are visible at the bottom  ← static assertion
+```
+These 5 separate "tests" should be **one E2E scenario** (as shown in the Good example above).
+
+**Consolidation heuristic:** If removing one test from a group doesn't change the entry point, setup, or first N steps of the remaining tests — they belong together as one scenario.
+
+See also § 1.6 — an E2E test must follow through to the **resulting screen** when the action has a downstream effect.
+
+### 1.4 Reproducibility
+
+Steps must be precise enough that **any team member** can execute them and get the same result. No "check that everything works" — specify what exactly to check.
+
+### 1.5 Real UI language
+
+Use **exact UI element names** as they appear in the app — bold with `**`. Never paraphrase button text.
+
+- **Good:** Click the **'Run Automated as Manual'** toggle
+- **Bad:** Enable the automated tests toggle
+
+### 1.6 Cross-Screen Outcome Verification (mandatory)
+
+When an action produces an effect **visible on another screen** — Launch → execution page, Save → detail page, Delete → list view, Assign → test tree — the test MUST continue to that screen and verify the effect with a **concrete, observable** check (specific element + specific value).
+
+Stopping at the modal/form after clicking the action button is only acceptable when the test is **explicitly about the modal's internal behavior** — e.g., "Strategy button text updates when toggling options". Such tests should be the minority; make the modal-only scope obvious in the title.
+
+**Concrete verification** means naming the element AND the value you expect to see — not vague descriptions.
+
+**Bad (outcome test that stops at the modal):**
+```
+## Launch run with Randomly Distribute strategy and verify distribution
+
+- Click 'Launch' button
+  _Expected_: Execution page opens. Tests are approximately equally distributed
+              among team members.
+```
+The test claims to verify distribution but the result is vague (`approximately`) and never names a specific test or user. Distribution could be 100/0 and still "pass" this check.
+
+**Good (outcome test that follows through with concrete checks):**
+```
+## Launch run with Randomly Distribute strategy and verify distribution
+
+- Click 'Launch' button
+  _Expected_: Execution page opens with test tree visible and all tests in Pending status
+- Inspect the test tree
+  _Expected_: Every assigned non-manager user has at least 1 test. Test 'Login flow'
+              shows the avatar of one of the assigned users. The run creator
+              (manager) is NOT shown as assignee on any test
+- Navigate to the run detail page
+  _Expected_: 'Assigned to' metadata row shows all 3 users (creator + 2 selected)
+              with avatars and names
+```
+
+**Rules:**
+- If the test's `source` references an AC with outcome verbs (`applied`, `distributed`, `assigned`, `saved`, `created`, `deleted`, `updated`) → the test MUST continue past the action button to the resulting screen
+- Verification must name the element AND the value — ban `approximately`, `correctly`, `appropriately`, `properly`, `as expected`, `works fine` (see § 3.3)
+- When the outcome is visible on two screens (e.g., execution page + run detail), pick the stronger one or verify both — vague coverage is the problem, not scope
+
+---
+
+## 2. Coverage Approach
+
+### 2.1 Test Types
+
+| Category | Description | Example |
+|----------|-------------|---------|
+| **Happy path** | Main flow, expected input, successful outcome | Create a run with all tests, finish it |
+| **Alternative path** | Valid but non-default flow | Create a run with test plan instead of all tests |
+| **Negative** | Invalid input, error handling | Launch run without selecting any tests |
+| **Boundary** | Edge values: 0, 1, max, empty | Run with 1000+ tests, empty title |
+| **State transitions** | Moving between states | Pending → Passed → re-mark as Failed |
+| **Permissions/roles** | Different user roles | Read-only user cannot finish a run |
+| **Data persistence** | Data survives refresh/navigation | Reload page — run status preserved |
+| **Bulk operations** | Multi-select + batch action | Select 5 tests → bulk delete → all removed |
+| **Cross-feature** | Action in feature A affects feature B | Delete label → tests with that label update |
+| **Recovery** | Behavior after errors/interruptions | Refresh during save, back button after submit |
+
+### 2.2 Mandatory Coverage Checklist
+
+- [ ] All entry points to the feature
+- [ ] All tabs/modes
+- [ ] All interactive elements from the UI catalog (buttons, links, menus)
+- [ ] All form fields: valid, empty, boundary values
+- [ ] All dropdown/menu items exercised at least once
+- [ ] All toggle states: on → off, off → on
+- [ ] Success messages / toasts — exact text verified
+- [ ] Error messages — most common ones
+- [ ] Loading states and empty states
+- [ ] Data persistence — refresh page, navigate away and back
+- [ ] Bulk operations — if the feature supports multi-select
+- [ ] Search/filter — if the feature has search or filter controls
+- [ ] Role-based access — at minimum: owner (happy) + read-only (denied). See [product-context.md](product-context.md)
+- [ ] Post-action effects verified on the **downstream screen** (not just the modal/form) — see § 1.6
+
+### 2.3 Scenario Balance Thresholds (mandatory)
+
+The generated test suite MUST meet these minimum thresholds:
+
+| Type | Minimum % | If below |
+|------|-----------|----------|
+| Negative | ≥ 20% | Add error/validation/permission-denied scenarios |
+| Boundary | ≥ 10% | Add min/max/empty/overflow/special-char tests |
+| Happy path | ≤ 50% | Too many happy paths — convert some to parameterized |
+
+These thresholds are checked in Step 2 Phase 3 (built-in quality checks). Violations are **blocking** — fix during generation before presenting to user.
+
+### 2.4 What NOT to Test (in manual tests)
+
+- API internals (use API tests)
+- CSS pixel-perfect layout (use visual regression)
+- Performance benchmarks (use load tests)
+- Third-party integrations deeply (mock boundary, verify connection)
+
+---
+
+## 3. Step Writing Rules
+
+### 3.1 Atomic Steps
+
+Each step = **one user action** + **one expected result**.
+
+```
+- Click the **'Finish Run'** button
+  _Expected_: Confirmation dialog appears asking "Are you sure?"
+```
+
+**Bad (compound):** "Click Finish Run and confirm the dialog"
+
+### 3.2 Expected Result Formatting
+
+- Single outcome → plain text
+- **Multiple checkpoints** → list as sentences separated by periods: `First thing. Second thing. Third thing`
+- Do NOT number checkpoints (1. 2. 3.) — Testomat renders them poorly
+
+### 3.3 Observable Expected Results
+
+| Type | Good | Bad |
+|------|------|-----|
+| Visual | "Toast 'Run saved' appears" | "Run is saved" |
+| Counter | "Passed counter shows '5'" | "Counter updates" |
+| Navigation | "Run execution page is displayed with title 'My Run'" | "Page opens" |
+| State | "Toggle changes to active (blue)" | "Toggle is enabled" |
+| Absence | "Delete button is not visible" | "Feature is disabled" |
+| List/table | "List shows 5 items sorted by date" | "List is correct" |
+| Form state | "**'Save'** button becomes enabled" | "Form is valid" |
+| URL/redirect | "URL changes to `/runs/{id}`" | "Navigated to run page" |
+| Outcome on next screen | "Execution page: Test 'Login flow' shows avatar of User A" | "Tests are approximately distributed" |
+
+**Banned vague words.** The following are never acceptable in `_Expected_:` — they are not observable, cannot be verified objectively:
+
+`approximately`, `correctly`, `appropriately`, `properly`, `as expected`, `works fine`, `is saved`, `is correct`, `works`, `should work`, `updates`
+
+Replace every occurrence with a concrete check: element + value. See § 1.6 for the full cross-screen rule.
+
+### 3.4 Step Format
+
+- Bullet list (`-` or `*`), not numbered
+- `_Expected_:` after each step (italic — Testomat import format)
+- Use exact UI element names (bold) from the UI element catalog
+- Include waits for loading states
+- Check `steps_search` for reusable shared steps
+
+### 3.5 Preconditions
+
+Two categories — always include both:
+
+**Data preconditions** — what must exist before the test starts:
+- "At least one label exists in the project."
+- "A test plan with 3+ tests exists."
+
+**State preconditions** — who the user is and where they are:
+- "User is logged in as owner." (always specify exact role, never just "user")
+- "User is on the Runs page."
+
+**Rules:**
+- If data must exist → specify how to create it or state it as given
+- Always specify exact role name from [product-context.md](product-context.md): `owner`, `manager`, `QA`, `read-only user`
+- Default role: `owner` — use when role doesn't matter for the test
+- Role-based tests: write separate preconditions per role
+
+**Good:** "User is logged in as owner. At least one test plan with 3+ tests exists in the project."
+**Bad:** "User is ready to test." / "User is logged in." (which role?)
+
+---
+
+## 4. Naming Conventions
+
+Descriptive phrases, no "TC-NNN" prefix. Patterns:
+- `{action} {object} {context}` — `Login with invalid credential`
+- `{object} is {state} when {condition}` — `Folder is searchable after renaming`
+- Parameterized: `${variable}` syntax — `User can create ${project type} project`
+
+Structure: `Feature Folder (@domain-tag) → Sub-feature Suite → Tests` (see SKILL.md Hierarchy Model)
+
+---
+
+## 5. Scenario Design Patterns
+
+### 5.1 CRUD Pattern
+
+For any entity (run, plan, suite, test), cover the full lifecycle:
+
+1. **Create** — with minimum fields, with all fields
+2. **Read** — view details, list view, search/filter
+3. **Update** — edit title, change status, modify fields
+4. **Delete** — single delete, bulk delete, undo if available
+
+### 5.2 State Machine Pattern
+
+Map transitions and test each valid + at least one invalid:
+
+```
+[Saved] → Launch → [In Progress] → Finish → [Finished]
+                                  → Abort → [Aborted]
+[Finished] → Relaunch → [In Progress]
+```
+
+### 5.3 Role-Based Pattern
+
+| Action | Owner | Manager | QA | Read-only |
+|--------|-------|---------|-----|-----------|
+| Create run | yes | yes | yes | no |
+| Finish run | yes | yes | yes | no |
+| Delete run | yes | yes | no | no |
+| View run | yes | yes | yes | yes |
+
+### 5.4 Multi-Entity Interaction Pattern
+
+When a feature involves **multiple related entities** (e.g., run group with environments, plan with test selections, label assigned to tests):
+
+1. **Create** the parent with N children/related items
+2. **Modify** children — verify parent reflects changes
+3. **Remove** a child — verify parent updates (count, display, behavior)
+4. **Delete** parent — verify children are handled (cascade delete? orphaned? error?)
+5. **Cross-feature** — action on entity A affects entity B (e.g., delete label → tests with that label update)
+
+See entity relationships in [product-context.md](product-context.md).
+
+### 5.5 Parameterized Pattern (mandatory for identical flows)
+
+**Rule:** If 3+ tests share the **same entry point and identical steps** and differ ONLY in input data or user role → they **MUST** be a single parameterized test with a data table. This is critical for automation efficiency — each separate test = a new browser session with login + navigation overhead.
+
+Use when: identical flow, different data/roles. Don't use when: different step logic or different expected results per parameter.
+
+**Mandatory parameterization triggers:**
+- 3+ tests filling the same form field with different values (e.g., title: empty, short, max-length, special chars, unicode)
+- 3+ tests performing the same action with different user roles
+- 3+ tests selecting different options from the same dropdown/tab and verifying similar outcomes
+
+**Before (5 separate tests = 5 browser sessions):**
+```
+Test: Create run with empty title          — 2 steps
+Test: Create run with short valid title    — 2 steps
+Test: Create run with max-length title     — 2 steps
+Test: Create run with special characters   — 2 steps
+Test: Create run with Unicode title        — 2 steps
+```
+
+**After (1 parameterized test = 1 browser session):**
+
+```markdown
+<!-- test
+priority: high
+source: AC-5
+automation: candidate
+-->
+## User can create run from ${source} page @runs @smoke
+
+**Preconditions:** User is logged in as owner.
+
+## Steps
+
+- Navigate to **${source}** page
+  _Expected_: ${source} page is displayed
+- Click the **'Manual Run'** button
+  _Expected_: New Manual Run sidebar is displayed
+
+<!-- example -->
+
+| source |
+|--------|
+| Runs |
+| Tests |
+| Plans |
+```
+
+### 5.6 Negative Test Pattern
+
+The strategy requires ≥ 20% negatives (§ 2.3). Systematically identify them by category:
+
+| Category | What to test | Example |
+|----------|-------------|---------|
+| **Validation** | Empty, too long, invalid chars, duplicate, special chars | Create label with 256+ chars → error |
+| **Permission** | Wrong role attempts action | Read-only user clicks **'Create'** → button not visible or action denied |
+| **State** | Action on deleted/archived/locked entity | Edit a label that was just deleted by another user |
+| **Dependency** | Delete entity that's used elsewhere | Delete label assigned to 10 tests → label removed from all tests |
+| **Constraint** | Exceed limits, violate business rules | Create 2 labels with same name → "Name already exists" error |
+| **Input boundary** | 0, 1, max, max+1, empty string, whitespace-only | Title with only spaces → treated as empty |
+
+**Rules:**
+- Every happy-path CRUD action should have at least one corresponding negative
+- Validation negatives are the easiest source — check each form field for empty + invalid
+- Permission negatives: at minimum test read-only user for each write action (see [product-context.md](product-context.md))
+
+### 5.7 Bulk Operations Pattern
+
+If the feature supports multi-select or batch actions:
+
+1. **Select** — select all, select some, deselect all, select via checkbox vs shift-click
+2. **Act** — bulk delete, bulk label, bulk status change, bulk move
+3. **Mixed validity** — bulk action on items where some succeed and some fail (e.g., bulk delete including a protected item)
+4. **Scale** — large selection (50+ items) — does the action complete? is there a progress indicator?
+5. **Cancel/undo** — cancel mid-bulk-action, undo if available
+
+**Key test:** select multiple items → perform bulk action → verify ALL items updated (not just the first/last).
+
+### 5.8 Search & Filter Pattern
+
+If the feature has search input or filter controls:
+
+1. **Text search** — exact match, partial match, case sensitivity, special chars, empty query
+2. **Filter by category** — single filter, combine 2+ filters, filter with 0 results
+3. **Clear/reset** — clear search field, reset all filters, verify full list returns
+4. **Persistence** — do filters survive page refresh? navigate away and back?
+5. **No results** — empty state displayed, helpful message, clear action available
+
+**Key test:** apply filter → verify results match → clear filter → verify full list restored.
+
+---
+
+## 6. Common Pitfalls
+
+| Pitfall | Fix |
+|---------|-----|
+| Vague steps ("Check the page") | Specify exact element and expected state |
+| Missing loading waits | Add "Wait for..." steps for async operations |
+| Hardcoded data ("Click on 'My Project'") | Use preconditions or describe how to create data |
+| Testing implementation ("Verify API returns 200") | Focus on UI-observable behavior |
+| Duplicate coverage | Search existing tests before writing new ones |
+| Overly long tests (20+ steps) | Split into focused scenarios |
+
+E2E, parameterization, balance, and priority pitfalls are enforced by their respective sections (§ 1.3, § 5.5, § 2.3, § 8) and checked mechanically in [self-review-checks.md](self-review-checks.md).
+
+---
+
+## 8. Priority Matrix
+
+Every test gets a priority using **impact × frequency**. Do not assign by gut feeling — use the matrix.
+
+|                  | High frequency (used daily) | Low frequency (used rarely) |
+|------------------|-----------------------------|------------------------------|
+| **High impact**  | `critical`                  | `high`                       |
+| **Low impact**   | `high`                      | `normal` / `low`             |
+
+**Definitions:**
+
+| Priority | When | Examples |
+|----------|------|----------|
+| `critical` | Core user flow, used daily, breaks business if broken | Login, create run, finish run, save plan |
+| `high` | Important flow used weekly, or critical edge case | Edit plan, filter tests, role-based access |
+| `normal` | Secondary flow, used occasionally | Bulk rename, export CSV, reorder folders |
+| `low` | Rare edge case, unlikely to trigger | Cancel during save, network flake recovery |
+
+**Rules:**
+- Smoke test suites = only `critical` + `high`
+- If >40% of a suite is `critical` — the criteria are too loose, recalibrate
+- **If >40% of a suite is `high` — the criteria are too loose, downgrade weekly-use and secondary flows to `normal`**
+- **Healthy pyramid:** `critical` ≤ 15%, `high` ≤ 35%, `normal` ≥ 35%, `low` ≥ 5%. Violations are flagged in Step 2 Phase 3
+- Negative / boundary tests inherit priority from the positive flow they test, minus one step (e.g., happy path is `critical` → negative is `high`)
+- New tests default to `normal` if unclear, let reviewer upgrade
+- Tag `@smoke` on every `critical` test + `high` happy-path tests — these form the smoke suite for CI
diff --git a/skills/create-test-cases/references/troubleshooting.md b/skills/create-test-cases/references/troubleshooting.md
new file mode 100644
index 0000000..7d65db4
--- /dev/null
+++ b/skills/create-test-cases/references/troubleshooting.md
@@ -0,0 +1,65 @@
+# Troubleshooting
+
+Consolidates skill-wide error handling and browser traps. Loaded on demand when an error occurs — do NOT eagerly Read.
+
+**Cross-references:**
+- Publishing-specific failures (`--sync`, Import Lock, `TESTOMATIO_PREPEND_DIR`, label 404): owned by the publisher skill (local `/publish-test-cases-batch` — not yet part of this repo; or [`sync-cases`](../../sync-cases/SKILL.md)). This skill never publishes.
+- Self-review threshold failures: [self-review-checks.md](self-review-checks.md)
+
+---
+
+## Error Handling (SKILL.md Steps 0-3)
+
+### Recoverable
+
+| Trigger | Recovery |
+|---|---|
+| **No docs found** (Step 1.1) | Proceed with UI catalog + user prompt as AC source. Label inferred ACs as `(inferred)` |
+| **No GitHub MCP** | Ask user for docs link or paste the spec |
+| **Playwright login fails** (Step 1.3 / UI reality check) | Retry once with fresh browser. Still fails → ask user to verify `$TESTOMATIO_EMAIL` / `$TESTOMATIO_PASSWORD` |
+| **Playwright navigation / timeout** | Retry with longer timeout. Page doesn't exist → ask user for correct URL |
+| **Testomat MCP errors** (Step 1.2) | Fall back to API v2 curl commands — see Testomat.io API docs or the [`sync-cases`](../../sync-cases/SKILL.md) skill |
+| **Partial run** (conversation broke mid-skill) | Resume Detector handles it — see SKILL.md § Pre-step: Resume Detector |
+
+### Blocking
+
+| Trigger | Action |
+|---|---|
+| **Feature not in UI** | Ask user for correct location — do not guess or fabricate |
+| **User rejects tests** (Step 2 Phase 4) | Return to Step 2, regenerate per feedback |
+
+---
+
+## Browser Traps (Playwright MCP)
+
+Apply these workarounds preemptively in both `ui-explorer` and `ui-validator` subagents, and in any inline Playwright fallback.
+
+### 1. Icon-only toolbar buttons
+Use `aria-describedby` → popper content to identify. Clicking by icon position alone is fragile.
+
+### 2. Delete button proximity
+Delete button is adjacent to the extra-menu button. **Always verify button identity before clicking** — grab `outerHTML` or `aria-label` first.
+
+### 3. Detail panel blocks tree clicks
+The right-side detail panel can overlay tree elements. Fixes:
+- Press `Escape` to close the detail panel before clicking tree elements, OR
+- Use `browser_evaluate` to find and click elements programmatically (bypasses pointer events)
+
+### 4. Intercepted clicks on link buttons
+Some link-styled buttons (e.g. "Manual Run") get intercepted by overlapping elements. Error message: `intercepted pointer events`.
+- If click fails → navigate directly to the target URL instead of clicking.
+
+### 5. Import progress banner blocks UI verification
+After a push, the import banner can block subsequent clicks.
+- Dismiss via: `browser_evaluate(() => document.querySelector('.import-progress')?.remove())`
+
+### 6. Project mismatch
+The `ui-explorer` and `ui-validator` subagents enforce the user-configured exploration project (e.g. `$TESTOMATIO_EXPLORATION_PROJECT`) for exploration/validation. **Never silently switch projects** — if `entry_url` points elsewhere, append a warning to the 1-line summary and let the parent decide.
+
+---
+
+## Nested Suite Parser Gotchas
+
+- Multiple `<!-- suite -->` blocks in ONE file **do not nest** — parser overwrites `currentSuite`, creates duplicates
+- `## Steps` heading can be misread as a test title if preceded by another heading
+- Directory structure is the only working approach — tested 2026-04-15
diff --git a/skills/create-test-cases/references/ui-catalog-format.md b/skills/create-test-cases/references/ui-catalog-format.md
new file mode 100644
index 0000000..8bcd614
--- /dev/null
+++ b/skills/create-test-cases/references/ui-catalog-format.md
@@ -0,0 +1,87 @@
+# UI Element Catalog Format
+
+Under the feature-first flow there are TWO catalogs per feature, both produced by the `ui-explorer` subagent:
+
+| File | Scope | Producer | ui-explorer mode |
+|---|---|---|---|
+| `test-cases/{F}/_shared-ui.md` | **Feature-baseline:** shared chrome reused across every sub-feature (top nav, breadcrumbs, sidebar, shared modals, bulk bars, global toasts) | Step 1.1 | `feature-baseline` |
+| `test-cases/{F}/{S}-ui-delta.md` | **Sub-feature delta:** ONLY elements specific to this sub-feature that are NOT already in `_shared-ui.md` | Step 2.2 | `sub-feature-delta` |
+
+Step 3 checklist, Phase 3 element-name check, and ui-validator all read the combined pool (shared ∪ delta). Never duplicate a shared element into the delta file — treat `_shared-ui.md` as read-only once Step 1 is approved.
+
+Both catalogs follow the same section schema below. The delta file additionally carries `references: _shared-ui.md` and `scope: sub-feature-delta` in its frontmatter so the reviewer can confirm it layers on top of the baseline (see [artifacts.md § A-shared-ui and § A4](artifacts.md)).
+
+> **Scope reminder — LLM cross-check context only.**
+> CSS class names, icon class names (`md-icon-*`), `data-test-id`, `aria-describedby`, raw HTML tags, and any other selector-flavored detail captured here exist so the ui-validator subagent and the generator can identify elements during exploration and validation. **They MUST NEVER be copied into generated test cases** (titles, Steps, `_Expected_:` lines). Test cases describe elements semantically — by label, tooltip, role, or visual state. See [test-case-format.md § Anti-patterns in test case bodies](test-case-format.md) for the full rule + rewrite table.
+
+## Schema
+
+```markdown
+# UI Element Catalog: {Feature Name}
+
+**Last updated:** YYYY-MM-DD
+**Env:** prod
+**Collected by:** Playwright MCP
+
+## Screen: {Screen Name}
+
+**URL:** /projects/{project-slug}/{path}
+**Entry points:** from {where} via {action}
+
+### Buttons
+- **'Create Run'** — primary, opens New Manual Run sidebar
+- **'Multi-Select'** — toolbar icon button (tooltip via aria-describedby)
+- **'Delete'** — danger, requires confirmation dialog
+
+### Toggles
+- **'Run Automated as Manual'** — default off, affects test type in new run
+- **'Include archived'** — default off
+
+### Inputs
+- **Title** input — required, max 255 chars, placeholder "Run title"
+- **Description** textarea — optional, markdown supported
+
+### Tabs
+- **All**, **Passed**, **Failed**, **Skipped**, **Blocked** — on Run execution page
+
+### Dropdowns
+- **Assignee** — lists project members
+- **Priority** — low / normal / high / critical
+
+### Toasts
+- Success: `"Run has been saved"` (after Save)
+- Success: `"Run has been finished"` (after Finish Run)
+- Error: `"Title can't be blank"` (validation)
+- Error: `"Failed to save run"` (server error)
+
+### Modals / Dialogs
+- **Delete confirmation** — "Are you sure you want to delete this run?"
+- **Unsaved changes** — "You have unsaved changes. Leave anyway?"
+
+### Empty states
+- List page, no runs: message `"No runs yet. Create your first one."`
+
+### Loading states
+- List page: skeleton rows while fetching
+- Run execution: spinner on test status change
+
+## Happy-path sequence
+
+1. Navigate to **Runs** page
+2. Click **'Manual Run'** button
+3. Fill **Title** input with `"Smoke run"`
+4. (optional) Select **Plan** from dropdown
+5. Click **'Save'** button
+6. Verify toast `"Run has been saved"` appears
+7. Verify new run is visible in runs list
+```
+
+## Rules
+
+- **Every UI element referenced in a test step MUST exist in either catalog (shared ∪ delta)** — enforced by Step 3 Phase 3 element-name check
+- **Toast text in quotes** — tests assert the exact string, not paraphrase
+- **Icon-only buttons** — always grab tooltip via `aria-describedby`, never guess label from context
+- **One screen per `## Screen:` section** — if a feature spans 5 screens, the shared catalog has 5 sections; sub-feature deltas add screens only when the sub-feature introduces a screen not in the shared pool
+- **Depth-first exploration** — every interactive element (button, dropdown, filter trigger, menu) must be clicked to catalog what opens behind it. Panels, sidebars, modals, dropdown options — all cataloged as nested sub-sections under the parent element
+- **Reuse existing catalog** — Step 2.2 (ui-explorer `sub-feature-delta`) always reads `_shared-ui.md` first and writes ONLY new elements. Do NOT re-list shared chrome in the delta file. If a shared element turned out to be wrong → fix it in `_shared-ui.md`, not in the delta
+- Store any feature-wide rules (e.g., "all destructive actions require confirmation") under a `## Conventions` section at the bottom of `_shared-ui.md`
diff --git a/skills/create-test-cases/steps/00-intake.md b/skills/create-test-cases/steps/00-intake.md
new file mode 100644
index 0000000..216a2b0
--- /dev/null
+++ b/skills/create-test-cases/steps/00-intake.md
@@ -0,0 +1,41 @@
+# Step 0: Intake Questionnaire
+
+## Contract
+- **Precondition:** Resume Detector has run; `$TESTOMATIO_EMAIL` + `$TESTOMATIO_PASSWORD` env vars set
+- **Input:** $ARGUMENTS, memory, optional existing A1
+- **Output:** A1 (`test-cases/{F}/intake.md`)
+- **Postcondition:** A1 cheap-validation passes (Q1, Q1.2, Q2 rows present); `F` resolved; `S` optional (may be `pick after map` — resolved after Step 1)
+- **Idempotent:** yes — safe to re-run; "start fresh" overwrites, "edit specific" preserves unchanged rows
+- **Retry policy:** none (interactive)
+
+---
+
+Run the intake interview from [../intake-questionnaire.md](../intake-questionnaire.md) before any gathering. See [../intake-examples.md](../intake-examples.md) for filled templates.
+
+## Preflight (before Q1)
+
+Verify required env vars are set. Run once:
+```bash
+echo "email=${TESTOMATIO_EMAIL:?missing TESTOMATIO_EMAIL} pw=${TESTOMATIO_PASSWORD:+SET}"
+```
+If the shell errors on missing var → STOP and ask user to export both vars before continuing. Do NOT proceed — ui-explorer (Step 1.1 feature-baseline) will fail otherwise, wasting the work done in Step 0.
+
+## Procedure
+
+1. Read the argument and check memory for prior context
+2. If `test-cases/{feature-slug}/intake.md` exists → ask: reuse / start over / edit specific answers
+3. Run questions with smart-skip (skip what's already knowable; accept `"use defaults"` / `"за замовчуванням"` shortcut)
+4. **Q1.2:** Ask if the user has a specific sub-feature in mind for the first run (optional). If yes → record as preferred starting point. If no / `не знаю` / `pick after map` → record as `pick after map`. Either way, Step 1 (feature phase) runs first and produces the sub-feature map; the final `S` is chosen from the approved map before Step 2 starts.
+5. Save answers to `test-cases/{feature-slug}/intake.md` (template: intake-questionnaire.md Part 3)
+
+## How answers feed into later steps
+
+| Answer | Used in | Effect |
+|---|---|---|
+| Q1.1 feature location | Step 1.1 | Where ui-explorer (feature-baseline) starts its traversal |
+| Q1.2 sub-feature hint | Step 1.4 / Step 2 | Preferred starting sub-feature after the map is approved; scopes Step 2 slice once chosen |
+| Q2 coverage depth | Step 3 | Test case depth and count (smoke / balanced / regression) |
+| Q3 sources | Step 1.2 | Which docs/specs to fetch for `_ac-baseline.md` (UI exploration is always on unless user says "no UI") |
+| Q4 focus | Step 3 | Extra emphasis tests (security, validation, edge, etc.) |
+
+**Do NOT proceed to Step 1 without a saved intake file.**
diff --git a/skills/create-test-cases/steps/01-feature-phase.md b/skills/create-test-cases/steps/01-feature-phase.md
new file mode 100644
index 0000000..b76068b
--- /dev/null
+++ b/skills/create-test-cases/steps/01-feature-phase.md
@@ -0,0 +1,153 @@
+# Step 1: Feature Phase — gather once, share across every sub-feature
+
+## Contract
+- **Precondition:** A1 valid (intake complete)
+- **Input:** Q1.1 entry URL, Q3 sources, Q4 focus areas from A1
+- **Output (all feature-level):**
+  - A-shared-ui (`test-cases/{F}/_shared-ui.md`)
+  - A-base (`test-cases/{F}/_ac-baseline.md`)
+  - A-steps (`test-cases/{F}/_existing-steps.md`) — optional, skipped if Testomat MCP unavailable
+  - A2 (`test-cases/{F}/destructuring.md`) with sub-feature map + cross-cutting concerns + `- [ ]` progress tracker
+- **Postcondition:** all cheap-validations pass; user has approved the map + baselines; HARD STOP fires
+- **Idempotent:** yes — existing artifacts are reused; missing ones are produced
+- **Retry policy:** `ui-explorer` subagent failure → 1 retry, then escalate to user. Docs-fetch failure → fall back to user-provided docs or `(inferred)` label
+
+---
+
+## Purpose
+
+Heavy gather-info runs ONCE per feature. Each sub-feature phase (Step 2) then consumes the shared artifacts and adds only a thin delta. This eliminates the duplication cost that previously happened when generating test cases across multiple sub-features of a complex product.
+
+**Product knowledge — JIT load only:** Do **NOT** eagerly Read [../references/product-context.md](../references/product-context.md) here. Read it on demand, only when one of these triggers fires:
+- 1.2 produces an AC list with `(inferred)` items → load for cross-check
+- 1.4 cross-cutting analysis needs role/entity matrix
+- Step 3 Phase 2 hits a role-based precondition → load canonical role names
+
+## Substep dependency map
+
+| # | Input | Output | Tool / Owner | Depends on |
+|---|---|---|---|---|
+| 1.1 | Q1.1 entry URL | A-shared-ui + candidate sub-feature list | **ui-explorer** subagent, `mode: feature-baseline` | — |
+| 1.2 | Q3 sources, feature scope | A-base (`_ac-baseline.md`) | WebFetch / GitHub MCP / parent | — (parallel with 1.1) |
+| 1.3 | feature scope | A-steps (`_existing-steps.md`) | Testomat MCP | — (parallel with 1.1) |
+| 1.4 | candidate sub-features (1.1) + A-base (1.2) | A2 destructuring.md with sub-feature table + Cross-cutting concerns section | parent (inline) | 1.1, 1.2 |
+| 1.5 | A-shared-ui + A-base + A-steps + A2 | user-approved map | parent (inline, single presentation) | 1.1–1.4 |
+
+Run 1.1 ‖ 1.2 ‖ 1.3 in parallel when possible, then 1.4 → 1.5 strictly sequentially.
+
+---
+
+## 1.1 Feature-level UI baseline — `ui-explorer` subagent (mode: feature-baseline)
+
+Invoke via the Agent tool with `subagent_type: ui-explorer`. The subagent traverses the feature at a breadth-first level — it does NOT dive into every form/modal. It catalogs **shared surfaces** (top nav, breadcrumbs, sidebar, feature-wide toolbars, common modals, bulk-action bars, global toasts) and identifies candidate sub-features for the destructuring map.
+
+**Prompt must include:**
+- `feature_slug` (from A1)
+- `entry_url` or `entry_path` (from Q1.1)
+- `scope_summary` (2-3 sentences from intake Q1/Q4)
+- `mode: feature-baseline`
+- `output_catalog_path: test-cases/{F}/_shared-ui.md`
+- `output_subfeatures_hint: return a list of candidate sub-features with 1-line description each (for destructuring map construction in 1.4)`
+
+**Subagent contract:** writes `_shared-ui.md` per [../references/ui-catalog-format.md](../references/ui-catalog-format.md) with feature-baseline frontmatter (see [../references/artifacts.md § A-shared-ui](../references/artifacts.md)). Returns ONE line:
+
+```
+Feature baseline <written|reused> to <path>: <N> shared surfaces, <M> candidate sub-features, gaps=<count>
+```
+
+**After subagent returns:**
+1. Parse the 1-line summary
+2. Run cheap validation on the frontmatter (`awk` over `head -20`)
+3. Do NOT Read the full catalog — Step 2 sub-features will each consume it via the `sub-feature-delta` ui-explorer mode
+
+**Fallback:** if the subagent fails twice (timeout, tool error), STOP and escalate. Do NOT run inline Playwright.
+
+---
+
+## 1.2 Acceptance Criteria baseline (`_ac-baseline.md`)
+
+Fetch feature-level ACs from docs once. Source in priority order:
+1. Q3-provided docs URLs → `WebFetch`
+2. `testomatio/docs` repo via GitHub MCP (if available)
+3. User-pasted spec text
+4. Inferred from A-shared-ui + intake prompt → label `source: inferred`
+
+**Extract a structured AC list** (not raw prose). Parse into `AC-1: {one verifiable requirement}`. One AC = one behavior. Vague docs → `AC-N: UNCLEAR — {question}` (surfaced in 1.5).
+
+**IDs are feature-global** — AC-17 in this file is the same AC-17 referenced by every sub-feature's delta.
+
+Save as A-base per [../references/artifacts.md § A-base](../references/artifacts.md) at `test-cases/{F}/_ac-baseline.md`. Cheap-validation must pass before 1.4 begins.
+
+---
+
+## 1.3 Existing steps from Testomat MCP (optional)
+
+Run `mcp__testomatio__steps_list` / `steps_search` to fetch reusable step phrases already defined for this feature's suites. Save as A-steps per [../references/artifacts.md § A-steps](../references/artifacts.md) at `test-cases/{F}/_existing-steps.md`.
+
+Purpose: Step 3 Phase 2 will prefer matching existing wording over inventing new phrasings, reducing step-library churn.
+
+**Skip condition:** Testomat MCP unavailable OR feature has no prior coverage → omit the file. Downstream consumers treat absence as "no hints — generate freely".
+
+---
+
+## 1.4 Destructuring map + cross-cutting concerns → A2
+
+Use [../references/destructuring.md](../references/destructuring.md) procedure steps 3–7 (decomposition criteria, scope boundaries, cross-cutting analysis, execution order). Input is the `candidate_sub_features` list returned by 1.1 plus the A-base AC list for coverage reasoning.
+
+Save to `test-cases/{F}/destructuring.md` with:
+- Sub-feature table (ownership, does-not-own, key elements, estimated tests)
+- `## Cross-cutting concerns` section
+- `- [ ]` progress tracker (one line per sub-feature)
+
+Cheap-validation must pass (see [../references/artifacts.md § A2](../references/artifacts.md)).
+
+---
+
+## 1.5 User approval (single presentation)
+
+**ONE user stop** — present all four outputs together so the user can approve or edit with a single review pass:
+
+```
+Feature phase complete. Produced:
+
+📁 test-cases/{F}/
+  ├─ _shared-ui.md        — {N} shared surfaces cataloged
+  ├─ _ac-baseline.md      — {M} ACs ({K} UNCLEAR, needs your input)
+  ├─ _existing-steps.md   — {S} reusable step phrases [or: SKIPPED]
+  └─ destructuring.md     — {P} sub-features, {X} cross-cutting concerns
+
+UNCLEAR ACs requiring clarification:
+- AC-{N}: {question}
+...
+
+Sub-feature execution order (recommended):
+1. {Sub-feature A} — {est. tests}
+2. {Sub-feature B} — {est. tests}
+...
+
+Cross-cutting concerns:
+A. {concern} — affects #1, #3, #4
+B. {concern} — affects #2, #5
+...
+
+Options:
+  (a) Approve — I'll HARD STOP here so you can open a new conversation for the first sub-feature
+  (b) Edit the map / cross-cutting / AC baseline — tell me what to change
+  (c) Resolve UNCLEAR ACs inline — I'll update _ac-baseline.md
+```
+
+On approval → HARD STOP.
+
+---
+
+## HARD STOP at end of this step
+
+After user approves, print the handoff message and stop. Do NOT proceed to Step 2 in the same conversation — the feature phase has consumed significant context; the sub-feature phase needs a fresh conversation to avoid mid-generation compact.
+
+```
+✅ Feature phase saved at test-cases/{F}/.
+⛔ Stopping here to protect context. Open a NEW conversation window and run:
+   /create-test-cases with #1 {First Sub-feature Name}
+
+Total sub-features: {N}. Batch publish (/publish-test-cases-batch) fires when all are marked [x].
+```
diff --git a/skills/create-test-cases/steps/10-sub-feature-slice.md b/skills/create-test-cases/steps/10-sub-feature-slice.md
new file mode 100644
index 0000000..79a34a6
--- /dev/null
+++ b/skills/create-test-cases/steps/10-sub-feature-slice.md
@@ -0,0 +1,129 @@
+# Step 2: Sub-feature Slice — thin layer on top of feature baselines
+
+## Contract
+- **Precondition:** A1 valid; feature phase complete (A-shared-ui, A-base, A2 all present and valid); `F` and `S` resolved from A2 sub-feature map
+- **Input:** `_ac-baseline.md`, `_shared-ui.md`, `_existing-steps.md` (if exists), A2 sub-feature row (ownership + does-not-own + key elements), intake Q4 focus areas
+- **Output:** A3 (`{S}-ac-delta.md`), A4 (`{S}-ui-delta.md`), A5 (`{S}-scope.md`)
+- **Postcondition:** all three cheap-validations pass; AC↔UI cross-validation (using combined baseline + delta) has 0 unresolved gaps or gaps explicitly accepted in 2.4
+- **Idempotent:** yes — existing delta artifacts are reused on spot-check, regenerated otherwise
+- **Retry policy:** `ui-explorer` subagent failure → 1 retry then escalate; baseline files missing → escalate to user to re-run Step 1
+
+---
+
+## Purpose
+
+The sub-feature phase is **thin**. Heavy exploration and AC extraction happened once in Step 1 and live in the feature-level `_*.md` files. This step only captures what is **specific to the current sub-feature** and builds the scope contract that Step 3 will generate tests against.
+
+**Context invariant:** every sub-feature phase runs in its own conversation (HARD STOP in Step 4 after previous sub-feature). Keeping Step 2 thin preserves context for Step 3 generation.
+
+## Substep dependency map
+
+| # | Input | Output | Tool / Owner | Depends on |
+|---|---|---|---|---|
+| 2.1 | A-base, A2 sub-feature row, Q4 | A3 `{S}-ac-delta.md` | parent (inline) | — |
+| 2.2 | A-shared-ui, A3 (2.1), A2 sub-feature row | A4 `{S}-ui-delta.md` | **ui-explorer** subagent, `mode: sub-feature-delta` | 2.1 |
+| 2.3 | A3 + A4 + A-shared-ui + A-base | combined cross-validation report | parent (inline) | 2.1, 2.2 |
+| 2.4 | A3 + A4 + A2 + cross-validation | A5 `{S}-scope.md` | parent (inline, single user gate) | 2.3 |
+
+2.1 runs first (AC delta drives UI focus). 2.2 then 2.3 then 2.4 strictly sequentially.
+
+---
+
+## 2.1 AC delta — `{S}-ac-delta.md`
+
+**Read `_ac-baseline.md`** once (entire file — it's small, flat AC list).
+
+**Filter applicable baseline ACs:** for each `AC-N` in the baseline, decide whether this sub-feature must cover it. Use the A2 sub-feature row:
+- `Owns:` → strong signal AC applies
+- `Does NOT own:` → strong signal AC belongs to another sub-feature
+- `Key elements:` → if AC mentions one of these elements, AC applies
+- Cross-cutting concerns from A2: if the current sub-feature is listed under "Affects" → the concern's must-test scenarios add ACs
+
+**Identify delta ACs:** requirements that the baseline did NOT capture but that emerge from either:
+- sub-feature-specific docs (deeper links not fetched at feature level)
+- UI observations during 2.2 (iterate — may need to append here after ui-delta)
+- Q4 focus areas specific to this sub-feature
+
+Write A3 per [../references/artifacts.md § A3](../references/artifacts.md) at `test-cases/{F}/{S}-ac-delta.md`:
+
+```yaml
+---
+feature: {F}
+suite: {S}
+references: _ac-baseline.md
+baseline_acs_applicable: [AC-3, AC-7, AC-12]
+delta_ac_count: N
+source: inferred | docs | mixed
+---
+
+## Baseline ACs applicable to {S}
+- AC-3 (baseline)
+- AC-7 (baseline)
+- AC-12 (baseline)
+
+## Delta ACs (sub-feature-specific)
+
+ac-delta-1: {new requirement}
+ac-delta-2: ...
+```
+
+Cheap-validation must pass before 2.2.
+
+**Edge case:** sub-feature has zero applicable baseline ACs → either (a) destructuring map is wrong, or (b) this sub-feature is truly orthogonal. Flag to user BEFORE continuing — don't silently generate only from delta.
+
+---
+
+## 2.2 UI delta — `ui-explorer` subagent, mode: sub-feature-delta
+
+Invoke via Agent tool with `subagent_type: ui-explorer`. The subagent receives the shared catalog path and writes ONLY elements specific to this sub-feature — it does not re-catalog shared chrome.
+
+**Prompt must include:**
+- `feature_slug`, `suite_slug`
+- `entry_url` — sub-feature-specific entry (from A2 sub-feature row) or Q1.1 if no sub-entry
+- `scope_summary` (from A2 sub-feature `Owns:` + `Does NOT own:`)
+- `mode: sub-feature-delta`
+- `shared_catalog_path: test-cases/{F}/_shared-ui.md` — subagent reads this first to know what NOT to re-catalog
+- `ac_list_path: test-cases/{F}/{S}-ac-delta.md` — for AC-focused exploration and self-check
+- `output_catalog_path: test-cases/{F}/{S}-ui-delta.md`
+
+**Reuse:** If `{S}-ui-delta.md` already exists, pass `existing_catalog_path` — subagent spot-checks 3-5 delta elements and reuses or regenerates.
+
+**Subagent contract:** writes delta per [../references/artifacts.md § A4](../references/artifacts.md). Returns ONE line:
+
+```
+Delta <written|reused> to <path>: <N> delta elements, <K> verified flows, gaps=<count>
+```
+
+**After subagent returns:**
+1. Parse the 1-line summary; if `gaps > 0` → flag for 2.3 cross-validation
+2. Run cheap validation on frontmatter (`head -20` + `awk`): `mode: sub-feature-delta` + `references: _shared-ui.md` + `delta_elements` present
+3. `delta_elements: 0` is allowed but rare — confirm with user
+
+**Fallback:** subagent fails twice → STOP and escalate. Do NOT run inline Playwright.
+
+---
+
+## 2.3 Cross-validation (combined baseline + delta)
+
+Work out of both AC sources and both UI sources:
+- AC pool: `baseline_acs_applicable` (from A3 frontmatter, resolved against A-base body) ∪ delta ACs (from A3 body)
+- UI pool: A-shared-ui body ∪ A4 delta body
+
+Steps:
+1. **AC → UI:** for each AC in the combined pool, verify ≥1 element in the combined UI pool supports it. Missing support → either catalog is incomplete or AC is unreachable in UI (out-of-scope candidate).
+2. **UI → AC:** for each delta UI element (A4), verify it maps to ≥1 AC. Unmapped delta elements → add an inferred delta AC or mark out-of-scope.
+3. **Shared-UI elements** are not individually checked here — the feature phase already validated their coverage at the baseline level. But if an AC in the current sub-feature's applicable set has no coverage in either shared or delta UI → the shared catalog is incomplete; escalate to user to extend `_shared-ui.md` or to re-explore.
+
+Report gaps before 2.4. If >0 AC→UI gaps that cannot be resolved by re-invoking `ui-explorer` once (appending to A4) → surface to user in 2.4 with resolution options.
+
+---
+
+## 2.4 Scope contract + single user gate → A5
+
+**ONE user presentation** combining: applicable baseline ACs, delta ACs, UNCLEAR ACs, IN SCOPE (each maps to AC), OUT OF SCOPE (each with justification), sources summary.
+
+Ask user to confirm or edit. **On approval** — save A5 per [../references/artifacts.md § A5](../references/artifacts.md) at `test-cases/{F}/{S}-scope.md` with sections `## In Scope`, `## Out of Scope`, `## Unclear ACs`, `## Sources Used`.
+
+Step 3 Phase 3a scope-compliance gate reads this file; `test-case-reviewer` subagent does the same.
+
+On user approval → proceed directly to Step 3 (no HARD STOP here — Step 3 generation is the same conversation).
diff --git a/skills/create-test-cases/steps/20-generate.md b/skills/create-test-cases/steps/20-generate.md
new file mode 100644
index 0000000..7417f9c
--- /dev/null
+++ b/skills/create-test-cases/steps/20-generate.md
@@ -0,0 +1,215 @@
+# Step 3: Generate Test Cases
+
+## Contract
+- **Precondition:** A1 intake; feature-level `_ac-baseline.md`, `_shared-ui.md`, A2 destructuring map all valid; sub-feature-level A3 `{S}-ac-delta.md`, A4 `{S}-ui-delta.md`, A5 `{S}-scope.md` all valid
+- **Input:** all artifacts above + `_existing-steps.md` (if present) + `_style.md` (if present, read FIRST in Phase 2) + product-context.md (JIT)
+- **Output:** A6 (single MD or directory per Phase 1 Output decision); optional A7 from reviewer subagent for ≥15-test suites
+- **Postcondition:** all Phase 3a bash gates exit 0; Phase 3b ui-validator returns `mismatches fixed, gaps=0`; Phase 4 user approved option 1
+- **Idempotent:** yes — regeneration on Phase 4 reject overwrites A6; Phase 3 fixes mutate A6 in place
+- **Retry policy:** Phase 4 reject → regenerate (max 3 iterations, then escalate per § Iteration limit)
+
+---
+
+Combines checklist, test case generation, and quality checks into one step.
+
+**Output files:**
+- **Flat suite (default):** single file `test-cases/{feature-slug}/{suite-slug}-test-cases.md`
+- **Nested suites:** directory `test-cases/{feature-slug}/{suite-slug}/` with one MD file per section (e.g., `runs-list.md`, `run-detail-view.md`). See [../references/test-case-format.md](../references/test-case-format.md) § Nested Suites for format.
+
+**Format:** [../references/test-case-format.md](../references/test-case-format.md) | **Step rules:** [../references/testing-strategy.md § 3](../references/testing-strategy.md) | **Priority:** [../references/testing-strategy.md § 8](../references/testing-strategy.md) | **Roles:** [../references/product-context.md § User Roles](../references/product-context.md)
+
+## Phase 0: Cross-Cutting Concerns Input
+
+Read `test-cases/{F}/destructuring.md` § Cross-cutting concerns. For each concern that lists the current sub-feature in "Affects":
+- Add its "Must-test scenarios" to the checklist as mandatory items
+- These become dedicated test cases in Phase 2, not just steps inside other tests
+
+Cross-cutting analysis is done once at the feature level (Step 1.4) and consumed by every sub-feature. Destructuring is the default flow — no mode switch.
+
+## Phase 1: Checklist
+
+Generate a categorized checklist from combined inputs:
+- **ACs** = `_ac-baseline.md` filtered by A3 `baseline_acs_applicable` + A3 delta ACs
+- **UI** = `_shared-ui.md` (feature-level) + `{S}-ui-delta.md` (sub-feature-specific)
+- **Reusable steps** = `_existing-steps.md` (if present)
+- **Cross-cutting must-test scenarios** from Phase 0
+
+Shape driven by Q2 (coverage depth). Q4 focus areas add emphasis.
+
+Apply three consolidation passes in order — **definitions and rules live in [../references/testing-strategy.md](../references/testing-strategy.md), do not restate here**:
+1. **E2E pass** — see [../references/testing-strategy.md § 1.3](../references/testing-strategy.md)
+2. **Parameterization pass** — see [../references/testing-strategy.md § 5.5](../references/testing-strategy.md)
+3. **Grouping pass (mandatory, fully automatic)** — assign every checklist item to a section. Sections come from AC list themes and UI catalog screens (CRUD, Search & Filter, Permissions, Validation, Edge Cases, Multi-environment, etc. — see SKILL.md § Sub-suite Distribution Rule for what counts as a natural section). Then apply the decision matrix:
+   - Count sections + tests-per-section
+   - If any section has < 3 tests → **auto-merge** it into the closest semantically related section (do NOT ask user); re-evaluate
+   - Pick output mode per matrix: **nested** (≥ 2 sections of ≥ 3 tests after merging) or **flat** (otherwise)
+   - Record the decision in a 1-line note at the top of the checklist: `Output: nested | sections: 3 (CRUD: 5, Permissions: 4, Validation: 3)` or `Output: flat | reason: only 1 natural section`
+   - Phase 2 reads this header and generates the correct file layout — flat MD or directory of section MDs
+
+## Phase 2: Full Test Cases
+
+**Style carryover (read FIRST).** Before writing any test, check whether `test-cases/{F}/_style.md` exists:
+- **Exists** → Read it. Match its conventions exactly — preconditions phrasing, step granularity, voice, element references, title pattern, parameterization rule, tag usage. The file was written after the first sub-feature of this feature was approved by the user; honoring it keeps the whole feature stylistically consistent.
+- **Does not exist** → this is the first sub-feature of the feature. Proceed; Step 4 will capture style after user approval so the next sub-feature inherits it.
+
+**Reusable step phrases.** If `_existing-steps.md` exists, prefer matching its phrasings when semantically equivalent rather than inventing new wording. Reduces step-library churn on publish.
+
+Convert checklist into detailed test cases with metadata:
+- `priority` via impact×frequency matrix, `source` (AC traceability), `automation` classification, `automation-note`
+- Preconditions (exact role + data requirements)
+- Steps with `_Expected_:` after each action
+- Tags in title (`@smoke`, `@negative`, etc.)
+
+**Hard rules:**
+- **NEVER generate IDs** — no `id: @S...` / `id: @T...`, server assigns them
+- **E2E format, min-3-steps, parameterization** — enforce per [../references/testing-strategy.md § 1.3](../references/testing-strategy.md) (E2E + min steps) and [§ 5.5](../references/testing-strategy.md) (parameterization). Do not restate the rules here.
+- Splitting/merging during step writing is allowed when it improves quality
+- Emphasize Q4 focus areas, write in English
+
+## Phase 3: Built-in Quality Checks
+
+Split into 3a (mechanical checks on MD) + 3b (UI reality check in browser). Both MUST run before Phase 4. Thresholds are blocking for suites ≥15 tests, advisory for smaller suites (see self-review-checks.md intro).
+
+### Phase 3a: Mechanical checks (LLM table)
+
+Row execution rule depends on suite size:
+
+| Suite size | How to run 3a |
+|---|---|
+| **< 15 tests** | Execute each row inline during generation |
+| **≥ 15 tests** | **Mandatory delegation** to `test-case-reviewer` subagent — running inline compacts the parent context and LLM starts missing cross-test connections mid-review (observed regression). Pass the test-cases MD path; subagent returns 1-line `Reviewed <path>: <V> violations, fixed=<F>` and writes a violations report. See [../../agents/test-case-reviewer.md](../../../agents/test-case-reviewer.md) |
+
+| Check | Threshold | Detail § | Action if failing |
+|---|---|---|---|
+| AC coverage | All ACs covered; if `(inferred)` → cross-check with product-context.md | [§ 1](../references/self-review-checks.md#1-coverage) | Add missing tests |
+| Cross-cutting coverage | Each affected concern from destructuring.md has ≥1 dedicated test | [§ 1b](../references/self-review-checks.md#1b-cross-cutting-coverage-from-destructuringmd) | Add tests for uncovered concerns |
+| UI element coverage | ≥80% elements in action steps (not just preconditions) | [§ 1](../references/self-review-checks.md#1-coverage) | Add tests or flag unused elements |
+| Scope compliance | No tests under OUT OF SCOPE items from Step 2.4 (A5) | [§ 2](../references/self-review-checks.md#2-scope-compliance-from-step-14) | STOP and ask user |
+| Scenario balance | per [../references/testing-strategy.md § 2.3](../references/testing-strategy.md) | [§ 3](../references/self-review-checks.md#3-balance) | Rebalance before finalizing |
+| Semantic balance | Each negative/boundary/role test is genuine by content, not just tag | [§ 3](../references/self-review-checks.md#3-balance) | Reclassify, then re-check thresholds |
+| Priority pyramid | critical ≤15%, high ≤35%, normal ≥35%, low ≥5% | [§ 3](../references/self-review-checks.md#3-balance) | Recalibrate priorities |
+| Step quality | Atomic, observable, independent, ≥3 steps | [§ 4](../references/self-review-checks.md#4-step-quality) | Fix inline |
+| E2E + parameterization | 100% E2E, 0 unmerged candidates | [§ 6](../references/self-review-checks.md#6-e2e--parameterization) | Consolidate |
+| **Sub-suite distribution** | If 2+ natural sections of ≥3 tests exist → MUST be nested (regardless of total count) | [§ 8](../references/self-review-checks.md#8-sub-suite-distribution) | Re-run Phase 1 Grouping pass with auto-merge of undersized sections |
+| Automation readiness | 100% classified, ≤30% manual-only | [§ 7](../references/self-review-checks.md#7-automation-readiness) | Add fields |
+
+**Loading rule:** Read self-review-checks.md only when (a) a check row's mechanic is unclear from the table, or (b) you hit a borderline case. Use anchor links to load just the section, never the whole file.
+
+**Borderline escalation:** if a Scope-compliance check hits a test whose AC mapping is ambiguous ("maybe in scope, maybe out") → STOP and ask user. Do not silently include or silently drop — the Step 2.4 scope contract (A5) is the source of truth; update it if needed.
+
+### Phase 3a — deterministic Bash gate checks
+
+After (or instead of, for ≥15-test suites via subagent) the mechanical table above, run the Bash gate batch. These gates are **blocking regardless of suite size** — they fail fast on issues that the LLM self-review regularly misses.
+
+**Full gate definitions + commands:** [../references/artifacts.md § Phase 3a gate-check recipes](../references/artifacts.md#phase-3a-gate-check-recipes). The skill invokes them via a single block:
+
+```bash
+# SET F and S before running
+F="{feature-slug}"
+S="{suite-slug}"
+# Resolve A6 path (flat vs nested)
+if [ -f "test-cases/$F/$S-test-cases.md" ]; then
+  MDS=("test-cases/$F/$S-test-cases.md")
+elif [ -d "test-cases/$F/$S" ]; then
+  MDS=(test-cases/$F/$S/*.md)
+else
+  echo "GATE-FAIL: A6 not found"; exit 1
+fi
+MD="${MDS[0]}"
+
+# Run gates — each exits non-zero on violation
+fail=0
+# Gate 1: AC coverage — applies to baseline ACs applicable to this sub-feature + delta ACs
+# See artifacts.md § Gate 1 for full script. Inline version:
+bash -c '
+  set -e
+  base_acs=$(awk "/^baseline_acs_applicable:/{sub(/^baseline_acs_applicable: *\[/, \"\"); sub(/\].*$/, \"\"); gsub(/,/, \" \"); gsub(/ +/, \" \"); print; exit}" "test-cases/'"$F"'/'"$S"'-ac-delta.md")
+  delta_acs=$(grep -oE "^ac-delta-[0-9]+" "test-cases/'"$F"'/'"$S"'-ac-delta.md" | sort -u)
+  for ac in $base_acs $delta_acs; do
+    [ -z "$ac" ] && continue
+    grep -qE "^source:.*\b$ac\b" '"${MDS[@]}"' || { echo "GATE-1 uncovered: $ac"; exit 1; }
+  done
+' || fail=$((fail+1))
+
+# Gate 5: no forbidden metadata
+grep -qE '^(tags|labels):' "${MDS[@]}" && { echo "GATE-5 forbidden tags/labels: header fields"; fail=$((fail+1)); }
+
+# Gate 9: no id field
+grep -qE '^id: @[ST]' "${MDS[@]}" && { echo "GATE-9 forbidden id field"; fail=$((fail+1)); }
+
+# Gate 11: no impl leakage (CSS / icon class / data-test-id / inline AC / raw exploit payload)
+# See artifacts.md § Gate 11 for full awk script
+bash -c 'set -e; SCRIPT=$(sed -n "/^### Gate 11/,/^### Gate 12/p" .claude/skills/create-test-cases/references/artifacts.md | sed -n "/```bash/,/```/p" | sed "1d;\$d"); eval "$SCRIPT"' || fail=$((fail+1))
+
+# Gate 12: parameterized title references ${col}
+bash -c 'set -e; SCRIPT=$(sed -n "/^### Gate 12/,/^### Gate 13/p" .claude/skills/create-test-cases/references/artifacts.md | sed -n "/```bash/,/```/p" | sed "1d;\$d"); eval "$SCRIPT"' || fail=$((fail+1))
+
+# Gate 13: no process-metadata sections (## Validation Log, ## Review Log, TODO, placeholders) leaked into test MD
+bash -c 'set -e; SCRIPT=$(sed -n "/^### Gate 13/,/^---/p" .claude/skills/create-test-cases/references/artifacts.md | sed -n "/```bash/,/```/p" | sed "1d;\$d"); eval "$SCRIPT"' || fail=$((fail+1))
+
+# (Gates 2, 3, 4, 6, 7, 8, 10 — see artifacts.md for full commands; run the same pattern)
+
+if [ $fail -gt 0 ]; then
+  echo "Phase 3a gates: $fail violations — routing back to regeneration"
+  exit 1
+fi
+echo "Phase 3a gates: PASS"
+```
+
+**Routing on gate failure:**
+- **Gates 1, 3, 4, 6, 7, 8 (content violations)** → regenerate the flagged tests in place, re-run gates (max 2 auto-regeneration attempts, then escalate to user)
+- **Gate 2 (priority pyramid)** → recalibrate priorities (ask LLM to downgrade lowest-impact critical/high to normal), re-run gate
+- **Gates 5, 9 (forbidden fields)** → strip the fields from A6 in place, no regeneration needed
+- **Gate 10 (layout mismatch)** → re-run Phase 1 Grouping pass with the correct Output mode, then regenerate Phase 2
+- **Gate 11 (impl leakage)** → strip CSS classes / icon classes / `data-test-id` / `aria-describedby` / inline AC refs / raw exploit payloads from the flagged Steps and `_Expected_:` lines in place. Rewrite the affected lines semantically (label, tooltip, role, visual state). No test-count change. Re-run gate. See [../references/test-case-format.md § Anti-patterns in test case bodies](../references/test-case-format.md) for rewrite patterns.
+- **Gate 12 (parameterized title missing `${col}`)** → rewrite the flagged test title to reference at least one column from its `<!-- example -->` header via `${col_name}` (e.g. `## Mark test as PASSED with each sub-status` → `## Mark test as PASSED with each ${sub_status}`). No Steps change. Re-run gate.
+- **Gate 13 (process-metadata leak)** → strip the offending H2 block / placeholder / dev-marker from the test MD. Investigate the producing subagent's output path: `ui-validator` must write to `{S}-validation-log.md` (A8), `test-case-reviewer` must write to `{S}-review.md` (A7). If the subagent definition still points to the wrong path → fix the agent definition, not just the symptom. **Not auto-fixable** by the skill. Re-run gate after manual cleanup. Regression origin: 2026-04-20 — 20 files leaked `## Validation Log` from ui-validator.
+
+**Do NOT proceed to Phase 3b or Phase 4 while any gate fails.** If user explicitly overrides a gate (e.g. accepts a <15-test skew) → log the override in A6 as a comment line `<!-- gate-override: Gate-2 priority pyramid waived by user 2026-04-16 -->`.
+
+### Phase 3b: UI reality check — always via `ui-validator` subagent
+
+Element names and step mechanics cannot be verified from the MD alone — they must be walked through the real UI. Same isolation pattern as ui-explorer: Playwright dumps stay in the subagent, parent sees only a 1-line summary.
+
+| Check | Threshold | Detail § | Action if failing |
+|---|---|---|---|
+| **UI reality check** | Walk 2-3 tests step-by-step in browser via `ui-validator` subagent. Covers element-name accuracy, step mechanics, toast text | [§ 5](../references/self-review-checks.md#5-ui-reality-check-playwright) | Fix steps, toasts, element names that don't match real UI |
+
+Invoke via the Agent tool with `subagent_type: ui-validator`. Pass test-cases MD path + UI catalog path. Subagent walks 2-3 tests in Playwright, edits the MD in-place to fix mismatches, returns: `Validated <path>: <K> tests walked, <M> mismatches fixed, gaps=<count>`. See [../../agents/ui-validator.md](../../../agents/ui-validator.md).
+
+**Never run 3b inline in the parent** — Playwright snapshots will compact the conversation mid-generation.
+
+## Phase 4: Present to User
+
+Show: test count, priority distribution, scenario balance, AC/UI coverage, automation split. Then unified menu:
+
+```
+1. 👍 Approve — save MD, hand off to /publish-test-cases-batch
+2. ➖ Less detail — drop boundary/rare edge cases
+3. ➕ More detail — add boundaries, state transitions, role checks
+4. ✏️ Edit — rename / add / remove specific tests
+```
+
+**On option 1 (Approve):** save A6, go directly to Step 4 (final report + publish handoff). This skill never pushes — the Step 4 handoff line tells the user which `/publish-test-cases-batch` invocation to run next.
+
+**Iteration limit:** Phase 4 is the **single approval gate** (no separate presentation phase). If the user rejects / requests changes **3 times in a row** → STOP the loop and ask: "We've iterated 3 times — is there a scope or strategy issue I'm missing? Options: (a) revisit Step 2.4 scope contract (A5), (b) change Q2 coverage depth, (c) accept current state and hand off." Do not silently keep regenerating.
+
+## Automation Classification
+
+Every test MUST include `automation` and optionally `automation-note` metadata per [../references/test-case-format.md](../references/test-case-format.md).
+
+| Classification | When | Examples |
+|---|---|---|
+| `candidate` | Single-user flow, standard UI, deterministic | Form fill + submit, CRUD, click + verify toast |
+| `deferred` | Automatable but needs infrastructure | Multi-user, email verification, specific file formats |
+| `manual-only` | Not cost-effective or needs human judgment | Visual layout, subjective UX, complex drag-and-drop |
+
+Default to `candidate` when unsure. `automation-note` required for `deferred` and `manual-only`.
+
+## Traceability
+
+Every test links to at least one AC via `source` metadata: `source: AC-2, AC-3`. No AC match → `source: exploratory`. Multiple exploratory = likely scope drift.
+
+## Matching Existing Style
+
+If the argument (Q1) is a Testomat suite URL, or Q3 sources include links to an existing suite: match writing style, detail level, title patterns, language. Do NOT copy feature-specific prefixes or outdated element names.
diff --git a/skills/create-test-cases/steps/40-report.md b/skills/create-test-cases/steps/40-report.md
new file mode 100644
index 0000000..68b5037
--- /dev/null
+++ b/skills/create-test-cases/steps/40-report.md
@@ -0,0 +1,90 @@
+# Step 4: Final Report & Handoff
+
+## Contract
+- **Precondition:** Step 3 approved (Phase 4 option 1). A6 saved to disk. A2 `destructuring.md` exists (always, under feature-first flow)
+- **Input:** A6 path, A1 intake, A2 destructuring
+- **Output:** final report printed; A2 progress tracker toggled to `[x]` for the current sub-feature; A-style captured on first approval; publish handoff line
+- **Postcondition:** user knows where MD lives and which `/publish-test-cases-batch` invocation to run next; HARD STOP before next sub-feature
+- **Idempotent:** yes — read-only aside from A2 checkbox mutation (re-sets same checkbox) and one-time `_style.md` creation
+- **Retry policy:** none (no external calls)
+
+---
+
+This skill never publishes. Step 4 is a local report + handoff to the publishing skill.
+
+## Procedure
+
+1. **Print the sub-feature report** (prefaced with sub-feature name):
+   ```
+   ✅ Test cases saved — {feature-slug} / {suite-slug}
+
+   Layout:      {flat | nested, N section files}
+   Path:        test-cases/{F}/{S}-test-cases.md        (flat)
+                test-cases/{F}/{S}/                     (nested)
+   Test count:  {N}
+   Priority:    critical X · high Y · normal Z · low W
+   AC coverage: {M}/{total} ACs cited ({pct}%)  — baseline applicable: {B}, delta: {D}
+   Automation:  candidate A · deferred B · manual-only C
+   ```
+
+2. **Toggle A2 progress tracker.** Update `test-cases/{F}/destructuring.md`: change the current sub-feature's row from `- [ ]` to `- [x] #{n} {name} — {N} tests, {layout}, ({date})`.
+
+3. **Style capture (FIRST approval only).** If `test-cases/{F}/_style.md` does NOT exist yet → this is the first approved sub-feature of the feature. Extract style patterns from the just-approved A6 and write `_style.md`. If it already exists → skip; do NOT overwrite (user may have manually edited it).
+
+   Template:
+   ```markdown
+   # Feature Style Guide — {F}
+
+   Detected from: {first-sub-feature-slug} (approved {YYYY-MM-DD})
+   Reference A6: test-cases/{F}/{first-suite}-test-cases.md  (or directory if nested)
+
+   ## Preconditions
+   - Role phrasing: "{copy an actual line from A6, e.g. 'User is logged in as QA user.'}"
+   - Project / data state phrasing: "{copy an actual line from A6}"
+   - Count of precondition bullets per test: {typical range observed}
+
+   ## Steps
+   - Granularity: {one action + _Expected_ per step | combined action+expected | other}
+   - Voice: {imperative ("Click the ...") | descriptive | mixed}
+   - Element references: {by visible label | by ARIA role | by tooltip | mixed}
+   - `_Expected_:` placement: {after every step | only on final step | other}
+
+   ## Titles
+   - Pattern: "{e.g. Verb + entity + modifier}" — {active voice | descriptive}
+   - Tags position: {in title as `@tag` | in metadata only}
+   - Parameterization: {used when N+ variants; format "${col}" inside title}
+
+   ## Tags
+   - `@smoke`: {criterion used in this feature}
+   - `@negative`: {criterion}
+   - Other tags observed: {list or "none"}
+
+   ## Automation defaults observed
+   - Most common `automation:` value: {candidate | deferred | manual-only}
+   - `automation-note` style: {concise | detailed | when-needed}
+   ```
+
+   Extract each `{...}` placeholder by **reading actual content** from A6 — do not invent patterns. If a category shows no clear pattern (e.g., mixed step granularity), write `mixed — see reference A6`. File written once, feature-scoped, `_` prefix signals feature-level shared artifact.
+
+4. **HARD STOP — return control to user.** Print exactly one of the two blocks below depending on remaining sub-features:
+
+   **If remaining sub-features > 0:**
+   ```
+   ✅ Sub-feature done. MD saved at {path}.
+   ⛔ Stopping here to protect context. Open a NEW conversation window and run:
+      /create-test-cases with #{next_number} {Next Sub-feature Name}
+
+   Remaining: {N-1} sub-features. Batch publish fires when all are marked [x]:
+      /publish-test-cases-batch test-cases/{F}/
+   ```
+
+   **If this was the LAST sub-feature (all rows now `[x]`):**
+   ```
+   ✅ All sub-features done for {feature-slug}. MD saved at {path}.
+   ⛔ Stopping here to return control. Next step — publish the whole feature:
+      /publish-test-cases-batch test-cases/{F}/
+
+   (Auto-detects whole-feature batch mode from the directory path.)
+   ```
+
+5. Do NOT proceed to generate the next sub-feature in the same conversation — context will compact before batch publishing.