diff --git a/.claude/commands/analyze-ci-results.md b/.claude/commands/analyze-ci-results.md new file mode 100644 index 000000000..1c6574b23 --- /dev/null +++ b/.claude/commands/analyze-ci-results.md @@ -0,0 +1,280 @@ +--- +name: analyze-ci-results +description: Analyze OpenShift CI (Prow) test results from a gcsweb URL - identifies infra vs test/code failures and correlates with git commits +parameters: + - name: ci-url + description: > + The gcsweb URL for a CI run. Can be any level of the artifact tree: + - Job root: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/ + - Test artifacts: .../{RUN_ID}/artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/ + - Prow UI: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID} + required: true + - name: focus + description: "Optional: focus analysis on specific test file or area (e.g., 'regression', '01.incidents', 'filtering')" + required: false +--- + +# Analyze OpenShift CI Test Results + +Fetch, parse, and classify failures from an OpenShift CI (Prow) test run. This skill is designed to be the **first step** in an agentic test iteration workflow — it produces a structured diagnosis that the orchestrator can act on. + +## Instructions + +### Step 1: Normalize the URL + +The user may provide a URL at any level. 
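The Prow UI and gcsweb forms point at the same GCS path and differ only in the host prefix. A minimal, hypothetical helper (the function name is illustrative; the prefix strings are copied from the URL forms listed above) sketches the conversion:

```python
# Hypothetical sketch of the URL normalization this command performs.
# Prefix strings come straight from the URL forms in the parameter docs.
PROW_PREFIX = "https://prow.ci.openshift.org/view/gs/"
GCSWEB_PREFIX = "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"


def normalize_ci_url(url: str) -> str:
    """Swap the Prow UI prefix for the gcsweb prefix and ensure a trailing slash."""
    if url.startswith(PROW_PREFIX):
        url = GCSWEB_PREFIX + url[len(PROW_PREFIX):]
    if not url.endswith("/"):
        url += "/"
    return url
```
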
Normalize it to the **job root**: + +``` +https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/ +``` + +If the user provides a Prow UI URL (`prow.ci.openshift.org/view/gs/...`), convert it: +- Replace `https://prow.ci.openshift.org/view/gs/` with `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/` +- Append trailing `/` if missing + +Derive these base paths: +- **Job root**: `{normalized_url}` +- **Test artifacts root**: `{normalized_url}artifacts/e2e-incidents/monitoring-plugin-tests-incidents-ui/` +- **Screenshots root**: `{test_artifacts_root}artifacts/screenshots/` +- **Videos root**: `{test_artifacts_root}artifacts/videos/` + +### Step 2: Fetch Job Metadata (parallel) + +Fetch these files from the **job root** using WebFetch: + +| File | What to extract | +|------|----------------| +| `started.json` | `timestamp`, `pull` (PR number), `repos` (commit SHAs) | +| `finished.json` | `passed` (bool), `result` ("SUCCESS"/"FAILURE"), `revision` (PR HEAD SHA) | +| `prowjob.json` | PR title, PR author, PR branch, base branch, base SHA, PR SHA, job name, cluster, duration | + +From `started.json` `repos` field, extract: +- **Base commit**: the SHA after `main:` (before the comma) +- **PR commit**: the SHA after `{PR_NUMBER}:` + +Present a summary: +``` +CI Run Summary: + PR: #{PR_NUMBER} - {PR_TITLE} + Author: {AUTHOR} + Branch: {PR_BRANCH} -> {BASE_BRANCH} + PR commit: {PR_SHA} (short: first 7 chars) + Base commit: {BASE_SHA} (short: first 7 chars) + Result: PASSED / FAILED + Duration: {DURATION} + Job: {JOB_NAME} +``` + +### Step 3: Fetch and Parse Test Results + +Fetch `{test_artifacts_root}build-log.txt` using WebFetch. + +#### Cypress Output Format + +The build log contains Cypress console output. 
Parse these sections: + +**Per-spec results block** — appears after each spec file runs: +``` + (Results) + + ┌──────────────────────────────────────────────────────────┐ + │ Tests: N │ + │ Passing: N │ + │ Failing: N │ + │ Pending: N │ + │ Skipped: N │ + │ Screenshots: N │ + │ Video: true │ + │ Duration: X minutes, Y seconds │ + │ Spec Ran: {spec-file-name}.cy.ts │ + └──────────────────────────────────────────────────────────┘ +``` + +**Final summary table** — appears at the very end: +``` + (Run Finished) + + ┌──────────────────────────────────────────────────────────┐ + │ Spec Tests Passing Failing Pending │ + ├──────────────────────────────────────────────────────────┤ + │ ✓ spec-file.cy.ts 5 5 0 0 │ + │ ✗ other-spec.cy.ts 3 1 2 0 │ + └──────────────────────────────────────────────────────────┘ +``` + +**Failure details** — appear inline during test execution: +``` + 1) Suite Name + "before all" hook for "test description": + ErrorType: error message + > detailed error + at stack trace... + + N failing +``` + +Or for test-level (not hook) failures: +``` + 1) Suite Name + test description: + AssertionError: Timed out retrying after Nms: Expected to find element: .selector +``` + +Extract per-spec: +- Spec file name +- Pass/fail/skip counts +- For failures: test name, error type, error message, whether it was in a hook + +### Step 4: Fetch Failure Screenshots + +For each failing spec, navigate to `{screenshots_root}{spec-file-name}/` and list available screenshots. + +**Screenshot naming convention:** +``` +{Suite Name} -- {Test Title} -- before all hook (failed).png +{Suite Name} -- {Test Title} (failed).png +``` + +Fetch each screenshot URL and **read it using the Read tool** (multimodal) to understand the visual state at failure time. Describe what you see: +- What page/view is shown? +- Are there error dialogs, loading spinners, empty states? +- Is the expected UI element visible? If not, what's in its place? 
+- Are there console errors visible in the browser? + +### Step 5: Classify Each Failure + +For every failing test, classify it into one of these categories: + +#### Infrastructure Failures (not actionable by test code changes) + +| Classification | Indicators | +|---------------|------------| +| `INFRA_CLUSTER` | Certificate expired, API server unreachable, node not ready, cluster version mismatch | +| `INFRA_OPERATOR` | COO/CMO installation timeout, operator pod not running, CRD not found | +| `INFRA_PLUGIN` | Plugin deployment unavailable, dynamic plugin chunk loading error, console not accessible | +| `INFRA_AUTH` | Login failed, kubeconfig invalid, RBAC permission denied (for expected operations) | +| `INFRA_CI` | Pod eviction, OOM killed, timeout at infrastructure level (not test timeout) | + +**Key signals for infra issues:** +- Errors in `before all` hooks related to cluster setup +- Certificate/TLS errors +- `oc` command failures with connection errors +- Element `.co-clusterserviceversion-install__heading` not found (operator install UI) +- Errors mentioning pod names, namespaces, or k8s resources +- `e is not a function` or similar JS errors from the console application itself (not test code) + +#### Test/Code Failures (actionable) + +| Classification | Indicators | +|---------------|------------| +| `TEST_BUG` | Wrong selector, incorrect assertion logic, race condition / timing issue, test assumes wrong state | +| `FIXTURE_ISSUE` | Mock data doesn't match expected structure, missing alerts/incidents in fixture, edge case not covered | +| `PAGE_OBJECT_GAP` | Page object method missing, selector outdated, doesn't match current DOM | +| `MOCK_ISSUE` | cy.intercept not matching the actual API call, response shape incorrect, query parameter mismatch | +| `CODE_REGRESSION` | Test was passing before, UI behavior genuinely changed — the source code has a bug | + +**Key signals for test/code issues:** +- `AssertionError: Timed out retrying` on 
application-specific selectors (not infra selectors) +- `Expected X to equal Y` where the assertion logic is wrong +- Failures only in specific test scenarios, not across the board +- Screenshot shows the UI rendered correctly but test expected something different + +### Step 6: Correlate with Git Commits + +Using the PR commit SHA and base commit SHA from Step 2: + +1. **Check local git history**: Run `git log {base_sha}..{pr_sha} --oneline` to see what changed in the PR +2. **Identify relevant changes**: Run `git diff {base_sha}..{pr_sha} --stat` to see which files were modified +3. **For CODE_REGRESSION failures**: Check if the failing component's source code was modified in the PR +4. **For TEST_BUG failures**: Check if the test itself was modified in the PR (new test might have a bug) + +Present the correlation: +``` +Commit correlation for {test_name}: + PR modified: src/components/incidents/IncidentChart.tsx (+45, -12) + Test file: cypress/e2e/incidents/01.incidents.cy.ts (unchanged) + Verdict: CODE_REGRESSION - chart rendering changed but test expectations not updated +``` + +Or: +``` +Commit correlation for {test_name}: + PR modified: cypress/e2e/incidents/regression/01.reg_filtering.cy.ts (+30, -5) + Source code: src/components/incidents/ (unchanged) + Verdict: TEST_BUG - new test code has incorrect assertion +``` + +### Step 7: Produce Structured Report + +Output a structured report with this format: + +``` +# CI Analysis Report + +## Run: PR #{PR} - {TITLE} +- Commit: {SHORT_SHA} by {AUTHOR} +- Branch: {BRANCH} +- Result: {RESULT} +- Duration: {DURATION} + +## Summary +- Total specs: N +- Passed: N +- Failed: N (M infra, K test/code) + +## Infrastructure Issues (not actionable via test changes) + +### INFRA_CLUSTER: Certificate expired +- Affected: ALL tests (cascade failure) +- Detail: x509 certificate expired at {timestamp} +- Action needed: Cluster certificate renewal (outside test scope) + +## Test/Code Issues (actionable) + +### TEST_BUG: Selector 
timeout in filtering test +- Spec: regression/01.reg_filtering.cy.ts +- Test: "should filter incidents by severity" +- Error: Timed out retrying after 80000ms: Expected to find element: [data-test="severity-filter"] +- Screenshot: [description of what screenshot shows] +- Commit correlation: Test file was modified in this PR (+30 lines) +- Suggested fix: Update selector to match current DOM structure + +### CODE_REGRESSION: Chart not rendering after component refactor +- Spec: regression/02.reg_ui_charts_comprehensive.cy.ts +- Test: "should display incident bars in chart" +- Error: Expected 5 bars, found 0 +- Screenshot: Chart area is empty, no error messages visible +- Commit correlation: src/components/incidents/IncidentChart.tsx was refactored +- Suggested fix: Investigate chart rendering logic in the refactored component + +## Flakiness Indicators +- If a test failed with a timing-related error but similar tests in the same suite passed, + flag it as potentially flaky +- If the error message contains "Timed out retrying" on an element that should exist, + it may be a race condition rather than a missing element + +## Recommendations +- List prioritized next steps +- For infra issues: what needs to happen before tests can run +- For test/code issues: which fixes to attempt first (quick wins vs complex) +- Whether local reproduction is recommended +``` + +### Step 8: If `focus` parameter is provided + +Filter the analysis to only the relevant tests. For example: +- `focus=regression` -> only analyze `regression/*.cy.ts` specs +- `focus=filtering` -> only analyze tests with "filter" in their name +- `focus=01.incidents` -> only analyze `01.incidents.cy.ts` + +Still fetch all metadata and provide the full context, but limit detailed diagnosis to the focused area. + +## Notes for the Orchestrator + +When this skill is used as the first step of `/iterate-incident-tests`: + +1. **If all failures are INFRA_***: Report to user and STOP. No test changes will help. +2. 
**If mixed INFRA_* and TEST/CODE**: Report infra issues to user, proceed with test/code fixes only. +3. **If all failures are TEST/CODE**: Proceed with the full iteration loop. +4. **The commit correlation** tells the orchestrator whether to focus on fixing tests or investigating source code changes. +5. **Screenshots** give the Diagnosis Agent a head start — it can reference the CI screenshot analysis instead of reproducing the failure locally first. diff --git a/.claude/commands/cypress/scripts/notify-slack.py b/.claude/commands/cypress/scripts/notify-slack.py new file mode 100644 index 000000000..49e3c6bfd --- /dev/null +++ b/.claude/commands/cypress/scripts/notify-slack.py @@ -0,0 +1,305 @@ +#!/usr/bin/env python3 +"""Send Slack notifications for agentic test iteration loops. + +Supports two modes based on environment variables: + +Option A (Webhook — one-way): + SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..." + +Option B (Bot with thread replies — two-way): + SLACK_BOT_TOKEN="xoxb-..." + SLACK_CHANNEL_ID="C0123456789" + +If neither is set, prints the message to stdout and exits cleanly. 
+
+Usage:
+    # Send a notification (both modes)
+    python3 notify-slack.py send <event_type> <message> [options]
+
+    # Wait for thread reply (Option B only)
+    python3 notify-slack.py wait <message_ts> [--timeout 600]
+
+Event types:
+    fix_applied, ci_started, ci_complete, ci_failed,
+    review_needed, iteration_done, flaky_found, blocked
+
+Options:
+    --pr          PR number (adds link to message)
+    --branch      Branch name
+    --url         CI run URL
+    --thread-ts   Reply in a thread (Option B)
+    --timeout     Review window timeout for 'wait' command (default: 600)
+"""
+
+import argparse
+import json
+import os
+import subprocess
+import sys
+import time
+import urllib.request
+import urllib.error
+
+
+EMOJI = {
+    "fix_applied": ":wrench:",
+    "ci_started": ":hourglass_flowing_sand:",
+    "ci_complete": ":white_check_mark:",
+    "ci_failed": ":x:",
+    "review_needed": ":eyes:",
+    "iteration_done": ":checkered_flag:",
+    "flaky_found": ":warning:",
+    "blocked": ":octagonal_sign:",
+}
+
+
+def build_blocks(event_type, message, pr=None, branch=None, url=None):
+    """Build Slack Block Kit blocks for the notification."""
+    emoji = EMOJI.get(event_type, ":robot_face:")
+
+    blocks = [
+        {
+            "type": "section",
+            "text": {
+                "type": "mrkdwn",
+                "text": f"{emoji} *Agent: {event_type.replace('_', ' ').title()}*",
+            },
+        },
+        {
+            "type": "section",
+            "text": {"type": "mrkdwn", "text": message},
+        },
+    ]
+
+    context_parts = []
+    if pr:
+        # Repo is assumed to match DEFAULT_REPO in review-github.py
+        context_parts.append(
+            f"<https://github.com/openshift/monitoring-plugin/pull/{pr}|PR #{pr}>"
+        )
+    if branch:
+        context_parts.append(f"Branch: `{branch}`")
+    if url:
+        context_parts.append(f"<{url}|CI Run>")
+
+    if context_parts:
+        blocks.append(
+            {
+                "type": "context",
+                "elements": [
+                    {"type": "mrkdwn", "text": " | ".join(context_parts)}
+                ],
+            },
+        )
+
+    return blocks
+
+
+def send_webhook(webhook_url, blocks):
+    """Option A: Send via incoming webhook."""
+    payload = json.dumps({"blocks": blocks}).encode("utf-8")
+
+    req = urllib.request.Request(
+        webhook_url,
+        data=payload,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+    try:
+        with 
urllib.request.urlopen(req) as resp: + return {"ok": True, "status": resp.status} + except urllib.error.HTTPError as e: + print(f"Webhook failed: HTTP {e.code} — {e.read().decode()}", file=sys.stderr) + return {"ok": False, "error": str(e)} + + +def slack_api(token, method, payload): + """Call a Slack Web API method.""" + url = f"https://slack.com/api/{method}" + data = json.dumps(payload).encode("utf-8") + + req = urllib.request.Request( + url, + data=data, + headers={ + "Content-Type": "application/json; charset=utf-8", + "Authorization": f"Bearer {token}", + }, + method="POST", + ) + + try: + with urllib.request.urlopen(req) as resp: + return json.loads(resp.read().decode()) + except urllib.error.HTTPError as e: + body = e.read().decode() + print(f"Slack API {method} failed: HTTP {e.code} — {body}", file=sys.stderr) + return {"ok": False, "error": str(e)} + + +def send_bot(token, channel, blocks, thread_ts=None): + """Option B: Send via bot token.""" + payload = { + "channel": channel, + "blocks": blocks, + } + if thread_ts: + payload["thread_ts"] = thread_ts + + result = slack_api(token, "chat.postMessage", payload) + + if result.get("ok"): + ts = result.get("ts", "") + print(f"MESSAGE_TS={ts}") + return {"ok": True, "ts": ts} + else: + print(f"Bot send failed: {result.get('error')}", file=sys.stderr) + return {"ok": False, "error": result.get("error")} + + +def wait_for_reply(token, channel, message_ts, timeout=600, poll_interval=30): + """Option B: Poll for thread replies within a review window. + + Returns the latest user reply text, or None if no reply within timeout. 
+ Output format: + REPLY= + NO_REPLY + """ + # Get bot's own user ID to filter out its own messages + auth_result = slack_api(token, "auth.test", {}) + bot_user_id = auth_result.get("user_id", "") + + deadline = time.time() + timeout + seen_messages = set() + + # Seed with the original message to ignore it + seen_messages.add(message_ts) + + print(f"Waiting up to {timeout}s for reply in thread {message_ts}...", flush=True) + + while time.time() < deadline: + result = slack_api( + token, + "conversations.replies", + {"channel": channel, "ts": message_ts}, + ) + + if result.get("ok"): + messages = result.get("messages", []) + for msg in messages: + msg_ts = msg.get("ts", "") + user = msg.get("user", "") + + if msg_ts in seen_messages: + continue + seen_messages.add(msg_ts) + + # Skip bot's own messages + if user == bot_user_id: + continue + + # Found a user reply + reply_text = msg.get("text", "") + print(f"REPLY={reply_text}") + return reply_text + + remaining = int(deadline - time.time()) + if remaining > 0: + print( + f"No reply yet, {remaining}s remaining...", + file=sys.stderr, + flush=True, + ) + + time.sleep(min(poll_interval, max(1, remaining))) + + print("NO_REPLY") + return None + + +def cmd_send(args): + """Handle the 'send' subcommand.""" + webhook_url = os.environ.get("SLACK_WEBHOOK_URL", "") + bot_token = os.environ.get("SLACK_BOT_TOKEN", "") + channel_id = os.environ.get("SLACK_CHANNEL_ID", "") + + blocks = build_blocks( + args.event_type, args.message, pr=args.pr, branch=args.branch, url=args.url + ) + + # Option B: Bot token takes priority (supports two-way) + if bot_token and channel_id: + result = send_bot(bot_token, channel_id, blocks, thread_ts=args.thread_ts) + return 0 if result.get("ok") else 1 + + # Option A: Webhook (one-way) + if webhook_url: + result = send_webhook(webhook_url, blocks) + return 0 if result.get("ok") else 1 + + # No Slack configured — print to stdout and exit cleanly + emoji = EMOJI.get(args.event_type, "") + 
print(f"[slack-skip] {emoji} {args.event_type}: {args.message}") + return 0 + + +def cmd_wait(args): + """Handle the 'wait' subcommand.""" + bot_token = os.environ.get("SLACK_BOT_TOKEN", "") + channel_id = os.environ.get("SLACK_CHANNEL_ID", "") + + if not bot_token or not channel_id: + print( + "NO_REPLY (Option B not configured — SLACK_BOT_TOKEN and SLACK_CHANNEL_ID required)" + ) + return 0 + + reply = wait_for_reply( + bot_token, channel_id, args.message_ts, timeout=args.timeout + ) + return 0 + + +def main(): + parser = argparse.ArgumentParser( + description="Slack notifications for agentic test iteration" + ) + subparsers = parser.add_subparsers(dest="command", required=True) + + # 'send' subcommand + send_parser = subparsers.add_parser("send", help="Send a notification") + send_parser.add_argument( + "event_type", + choices=list(EMOJI.keys()), + help="Event type", + ) + send_parser.add_argument("message", help="Message text (Slack mrkdwn supported)") + send_parser.add_argument("--pr", help="PR number") + send_parser.add_argument("--branch", help="Branch name") + send_parser.add_argument("--url", help="CI run URL") + send_parser.add_argument( + "--thread-ts", help="Thread timestamp to reply in (Option B)" + ) + + # 'wait' subcommand + wait_parser = subparsers.add_parser( + "wait", help="Wait for thread reply (Option B only)" + ) + wait_parser.add_argument("message_ts", help="Message timestamp to watch") + wait_parser.add_argument( + "--timeout", + type=int, + default=600, + help="Seconds to wait for reply (default: 600)", + ) + + args = parser.parse_args() + + if args.command == "send": + return cmd_send(args) + elif args.command == "wait": + return cmd_wait(args) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.claude/commands/cypress/scripts/poll-ci-status.py b/.claude/commands/cypress/scripts/poll-ci-status.py new file mode 100644 index 000000000..226399074 --- /dev/null +++ b/.claude/commands/cypress/scripts/poll-ci-status.py @@ -0,0 +1,92 
@@
+#!/usr/bin/env python3
+"""Poll OpenShift CI (Prow) job status for a PR until completion.
+
+Usage:
+    python3 poll-ci-status.py <pr_number> [job_substring] [max_attempts] [interval_seconds]
+
+Arguments:
+    pr_number          GitHub PR number to poll
+    job_substring      Substring to match in job name (default: e2e-incidents)
+    max_attempts       Maximum polling attempts (default: 30)
+    interval_seconds   Sleep between polls in seconds (default: 300)
+
+Output on completion:
+    CI_COMPLETE state=SUCCESS url=<link>
+    CI_COMPLETE state=FAILURE url=<link>
+    CI_TIMEOUT (if max_attempts reached)
+
+Requires: gh CLI authenticated with access to the repo.
+"""
+
+import subprocess
+import json
+import time
+import sys
+
+
+def poll(pr, job_substring="e2e-incidents", max_attempts=30, interval=300):
+    for attempt in range(max_attempts):
+        result = subprocess.run(
+            ["gh", "pr", "checks", pr, "--json", "name,state,link"],
+            capture_output=True,
+            text=True,
+        )
+
+        if result.returncode != 0:
+            print(
+                f"gh pr checks failed (attempt {attempt + 1}/{max_attempts}): {result.stderr.strip()}",
+                flush=True,
+            )
+            time.sleep(interval)
+            continue
+
+        try:
+            checks = json.loads(result.stdout)
+        except json.JSONDecodeError:
+            print(
+                f"Invalid JSON from gh pr checks (attempt {attempt + 1}/{max_attempts})",
+                flush=True,
+            )
+            time.sleep(interval)
+            continue
+
+        found = False
+        for check in checks:
+            if job_substring in check.get("name", ""):
+                found = True
+                state = check["state"]
+                url = check.get("link", "")
+
+                if state in ("SUCCESS", "FAILURE"):
+                    print(f"CI_COMPLETE state={state} url={url}")
+                    return 0
+
+                print(
+                    f"CI_PENDING state={state}, attempt {attempt + 1}/{max_attempts}, sleeping {interval}s...",
+                    flush=True,
+                )
+                break
+
+        if not found:
+            print(
+                f"Job '{job_substring}' not found yet, attempt {attempt + 1}/{max_attempts}, sleeping {interval}s...",
+                flush=True,
+            )
+
+        time.sleep(interval)
+
+    print("CI_TIMEOUT")
+    return 1
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print(f"Usage: {sys.argv[0]} <pr_number> [job_substring] [max_attempts] [interval_seconds]")
+        sys.exit(2)
+
+    pr = sys.argv[1]
+    job = sys.argv[2] if len(sys.argv) > 2 else "e2e-incidents"
+    attempts = int(sys.argv[3]) if len(sys.argv) > 3 else 30
+    interval = int(sys.argv[4]) if len(sys.argv) > 4 else 300
+
+    sys.exit(poll(pr, job, attempts, interval))
diff --git a/.claude/commands/cypress/scripts/review-github.py b/.claude/commands/cypress/scripts/review-github.py
new file mode 100644
index 000000000..57877c103
--- /dev/null
+++ b/.claude/commands/cypress/scripts/review-github.py
@@ -0,0 +1,232 @@
+#!/usr/bin/env python3
+"""GitHub PR comment-based review flow for agentic test iteration.
+
+Posts fix details as PR comments and polls for author replies within a
+timed review window. Designed to work alongside Slack webhook notifications
+(one-way) — GitHub PR comments provide the two-way interaction channel.
+
+Usage:
+    # Post a review comment on a PR
+    python3 review-github.py post <pr> <message> [--repo owner/repo]
+
+    # Wait for author reply within a review window
+    python3 review-github.py wait <pr> <since_timestamp> [--timeout 600] [--repo owner/repo]
+
+Output formats:
+    post: COMMENT_ID=<comment_id> COMMENT_TIME=<created_at>
+    wait: REPLY=<text>  (author replied)
+          NO_REPLY      (timeout reached, no author reply)
+
+Requires: gh CLI authenticated with comment access to the target repo.
+
+Security: Author filtering is enforced deterministically in code —
+the PR author's login is fetched via API and only comments from that
+user are considered. This is not instruction-based filtering.
+""" + +import argparse +import json +import subprocess +import sys +import time +from datetime import datetime, timezone + + +DEFAULT_REPO = "openshift/monitoring-plugin" +MAGIC_PREFIX = "/agent" + + +def gh_api(endpoint, method="GET", body=None, repo=None): + """Call GitHub API via gh CLI.""" + cmd = ["gh", "api"] + if repo: + endpoint = endpoint.replace("{repo}", repo) + if method != "GET": + cmd.extend(["--method", method]) + if body: + for key, value in body.items(): + cmd.extend(["-f", f"{key}={value}"]) + cmd.append(endpoint) + + result = subprocess.run(cmd, capture_output=True, text=True) + if result.returncode != 0: + print(f"gh api failed: {result.stderr.strip()}", file=sys.stderr) + return None + + if not result.stdout.strip(): + return {} + + try: + return json.loads(result.stdout) + except json.JSONDecodeError: + print(f"Invalid JSON from gh api: {result.stdout[:200]}", file=sys.stderr) + return None + + +def get_pr_author(pr, repo): + """Fetch the PR author's login.""" + data = gh_api(f"repos/{repo}/pulls/{pr}") + if data and "user" in data: + return data["user"]["login"] + return None + + +def post_comment(pr, message, repo): + """Post a comment on a PR. Returns (comment_id, created_at).""" + data = gh_api( + f"repos/{repo}/issues/{pr}/comments", + method="POST", + body={"body": message}, + ) + if data and "id" in data: + comment_id = data["id"] + created_at = data.get("created_at", "") + print(f"COMMENT_ID={comment_id}") + print(f"COMMENT_TIME={created_at}") + return comment_id, created_at + + print("Failed to post comment", file=sys.stderr) + return None, None + + +def wait_for_author_reply(pr, since_timestamp, repo, timeout=600, poll_interval=30): + """Poll PR comments for a reply from the PR author. + + Only considers comments that: + 1. Were posted AFTER since_timestamp (time-scoped) + 2. Were authored by the PR author (deterministic .user.login check) + 3. 
Optionally start with the magic prefix /agent (if present, stripped from reply) + + Args: + pr: PR number + since_timestamp: ISO 8601 timestamp — only comments after this are considered + repo: owner/repo string + timeout: seconds to wait before giving up + poll_interval: seconds between polls + + Returns: + Reply text if found, None otherwise. + """ + # Fetch PR author login — deterministic, code-enforced filter + pr_author = get_pr_author(pr, repo) + if not pr_author: + print("Could not determine PR author. Proceeding without review.", file=sys.stderr) + print("NO_REPLY") + return None + + print(f"Waiting up to {timeout}s for reply from @{pr_author} on PR #{pr}...", flush=True) + + deadline = time.time() + timeout + seen_ids = set() + + while time.time() < deadline: + # Fetch comments created after since_timestamp + comments = gh_api( + f"repos/{repo}/issues/{pr}/comments?since={since_timestamp}&per_page=50" + ) + + if comments is None: + remaining = int(deadline - time.time()) + if remaining > 0: + print(f"API error, retrying in {poll_interval}s ({remaining}s remaining)...", + file=sys.stderr, flush=True) + time.sleep(min(poll_interval, max(1, remaining))) + continue + + for comment in comments: + comment_id = comment.get("id") + if comment_id in seen_ids: + continue + seen_ids.add(comment_id) + + # Deterministic author filter — code-enforced, not instruction-based + commenter = comment.get("user", {}).get("login", "") + if commenter != pr_author: + continue + + body = comment.get("body", "").strip() + + # If magic prefix is used, strip it; otherwise accept any author comment + if body.startswith(MAGIC_PREFIX): + body = body[len(MAGIC_PREFIX):].strip() + + if body: + print(f"REPLY={body}") + return body + + remaining = int(deadline - time.time()) + if remaining > 0: + print( + f"No reply yet from @{pr_author}, {remaining}s remaining...", + file=sys.stderr, + flush=True, + ) + time.sleep(min(poll_interval, max(1, remaining))) + + print("NO_REPLY") + return None + 
+ +def format_fix_comment(message): + """Wrap the agent's message in a standard comment format.""" + return ( + "### Agent: Fix Applied\n\n" + f"{message}\n\n" + "---\n" + f"*Reply to this comment (or prefix with `{MAGIC_PREFIX}`) to provide feedback. " + "The agent will incorporate your input before pushing, or proceed automatically " + "after the review window expires.*" + ) + + +def cmd_post(args): + """Handle the 'post' subcommand.""" + formatted = format_fix_comment(args.message) + comment_id, created_at = post_comment(args.pr, formatted, args.repo) + return 0 if comment_id else 1 + + +def cmd_wait(args): + """Handle the 'wait' subcommand.""" + wait_for_author_reply( + args.pr, args.since, args.repo, timeout=args.timeout + ) + return 0 + + +def main(): + parser = argparse.ArgumentParser( + description="GitHub PR comment-based review for agentic test iteration" + ) + parser.add_argument( + "--repo", default=DEFAULT_REPO, + help=f"GitHub repo (default: {DEFAULT_REPO})" + ) + subparsers = parser.add_subparsers(dest="command", required=True) + + # 'post' subcommand + post_parser = subparsers.add_parser("post", help="Post a review comment on a PR") + post_parser.add_argument("pr", help="PR number") + post_parser.add_argument("message", help="Comment body (markdown supported)") + + # 'wait' subcommand + wait_parser = subparsers.add_parser( + "wait", help="Wait for author reply on a PR" + ) + wait_parser.add_argument("pr", help="PR number") + wait_parser.add_argument("since", help="ISO 8601 timestamp — only consider comments after this") + wait_parser.add_argument( + "--timeout", type=int, default=600, + help="Seconds to wait for reply (default: 600)" + ) + + args = parser.parse_args() + + if args.command == "post": + return cmd_post(args) + elif args.command == "wait": + return cmd_wait(args) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.claude/commands/diagnose-test-failure.md b/.claude/commands/diagnose-test-failure.md new file mode 100644 index 
000000000..6c8185e49 --- /dev/null +++ b/.claude/commands/diagnose-test-failure.md @@ -0,0 +1,167 @@ +--- +name: diagnose-test-failure +description: Diagnose a Cypress test failure using error output, screenshots, and codebase analysis +parameters: + - name: test-name + description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)" + required: true + - name: spec-file + description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)" + required: true + - name: error-message + description: "The error message from the test failure" + required: true + - name: screenshot-path + description: "Absolute path to the failure screenshot (will be read with multimodal vision)" + required: false + - name: stack-trace + description: "The error stack trace (estack from mochawesome)" + required: false + - name: ci-context + description: "Optional context from /analyze-ci-results (commit correlation, infra status)" + required: false +--- + +# Diagnose Test Failure + +Analyze a Cypress test failure to determine root cause and recommend a fix. This skill is used by the `/iterate-incident-tests` orchestrator but can also be invoked standalone. + +## Diagnosis Protocol + +**IMPORTANT**: Follow this order. Visual evidence first, then code analysis. + +### Step 1: Read the Screenshot (if available) + +If `screenshot-path` is provided, read it using the Read tool (multimodal). + +Describe what you see: +- What page/view is displayed? +- Is the expected UI element visible? If not, what's in its place? +- Are there error dialogs, loading spinners, empty states, or overlays? +- Is the page fully loaded or still loading? +- Are there any browser console errors visible? +- Does the layout look correct (no overlapping elements, correct positioning)? + +This visual context often reveals the root cause faster than reading code. + +### Step 2: Read the Test Code + +Read the spec file at `spec-file`. 
Find the failing test by matching `test-name`. + +Identify: +- What the test is trying to do (user actions + assertions) +- Which page object methods it calls +- Which fixture it loads (look at `before`/`beforeEach` hooks) +- The specific assertion or command that failed +- Whether the failure is in a `before all` hook (affects all tests in suite) or a specific `it()` block + +### Step 3: Read the Page Object + +Read `web/cypress/views/incidents-page.ts`. + +For each page object method used by the failing test: +- Check the selector — does it match current DOM conventions? +- Check for hardcoded waits vs proper Cypress chaining +- Look for methods that might be missing or outdated + +### Step 4: Read the Fixture (if applicable) + +If the test uses `cy.mockIncidentFixture('...')`, read the fixture YAML file. + +Check: +- Does the fixture have the incidents/alerts the test expects? +- Are severities, states, components, timelines correct? +- Are there edge cases (empty arrays, missing fields, zero-duration timelines)? + +### Step 5: Read the Mock Layer (if relevant) + +If the error suggests an API/intercept issue, read relevant files in `cypress/support/incidents_prometheus_query_mocks/`: +- `prometheus-mocks.ts` — intercept setup and route matching +- `mock-generators.ts` — response data generation +- `types.ts` — type definitions for fixtures + +Check: +- Does the intercept URL pattern match the actual API call? +- Is the response shape what the UI code expects? +- Are query parameters (group_id, alertname, severity) handled correctly? + +### Step 6: Cross-reference with Error + +Now combine visual evidence + code analysis + error message to determine root cause. 
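As a rough illustration, the message-matching part of this cross-referencing could be sketched as a lookup over the error signatures discussed in this skill (the function name, the category granularity, and the exact regexes are assumptions; real diagnosis also weighs the screenshot and code analysis, not just the message):

```python
import re

# Illustrative only: a few error signatures from this skill mapped to the
# classifications used in the diagnosis output. Unknown messages fall through.
ERROR_PATTERNS = [
    (r"x509: certificate|Unable to connect", "INFRA_ISSUE"),
    (r"cy\.intercept\(\) matched no requests", "MOCK_ISSUE"),
    (r"detached from the DOM", "TEST_BUG"),
    (r"Timed out retrying .*Expected to find element", "TEST_BUG"),
]


def classify_error(message: str) -> str:
    """Return a first-pass classification for an error message."""
    for pattern, classification in ERROR_PATTERNS:
        if re.search(pattern, message):
            return classification
    return "UNCLASSIFIED"
```
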
+ +**Common patterns:** + +| Error Pattern | Likely Cause | +|--------------|--------------| +| `Timed out retrying after Nms: Expected to find element: .selector` | Selector wrong, element not rendered, or page not loaded | +| `Expected N to equal M` (counts) | Fixture doesn't have enough data, or filter state is wrong | +| `expected true to be false` / vice versa | Assertion logic inverted | +| `Cannot read properties of undefined` | Page object method returns wrong element, or DOM structure changed | +| `cy.intercept() matched no requests` | Mock intercept URL doesn't match actual API call | +| `Timed out retrying` on `.should('be.visible')` | Element exists but hidden (z-index, opacity, overflow, display:none) | +| `before all hook` failure | Setup issue — fixture load, navigation, or login failed | +| `detached from the DOM` | Element re-rendered between find and action — needs `.should('exist')` guard | +| `e is not a function` / runtime JS error | Application code bug, not test issue | +| `x509: certificate` / `Unable to connect` | Infrastructure issue | + +### Step 7: Classify and Recommend + +Output your diagnosis in this exact format: + +``` +## Diagnosis + +**Classification**: TEST_BUG | FIXTURE_ISSUE | PAGE_OBJECT_GAP | MOCK_ISSUE | REAL_REGRESSION | INFRA_ISSUE + +**Confidence**: HIGH | MEDIUM | LOW + +**Root Cause**: +[1-3 sentence explanation of what's wrong and why] + +**Evidence**: +- Screenshot: [what the screenshot showed] +- Error: [what the error message tells us] +- Code: [what the code analysis revealed] + +**Recommended Fix**: +- File: [path to file that needs editing] +- Change: [specific description of what to change] +- [If multiple files need changing, list each] + +**Risk Assessment**: +- Will this fix affect other tests? [yes/no and why] +- Could this mask a real bug? 
[yes/no and why] + +**Alternative Hypotheses**: +- [If confidence is MEDIUM or LOW, list other possible causes] +``` + +## Classification Reference + +### Auto-fixable (proceed with Fix Agent) + +| Classification | Description | Examples | +|---------------|-------------|----------| +| `TEST_BUG` | Test code is wrong | Wrong selector, incorrect assertion value, missing wait, wrong test order dependency | +| `FIXTURE_ISSUE` | Test data is wrong | Missing incident in fixture, wrong severity, timeline doesn't cover test's time window | +| `PAGE_OBJECT_GAP` | Page object needs update | Selector targets old class name, method missing for new UI element, method returns wrong element | +| `MOCK_ISSUE` | API mock is wrong | Intercept URL pattern outdated, response missing required field, query filter not handled | + +### Not auto-fixable (report to user) + +| Classification | Description | Examples | +|---------------|-------------|----------| +| `REAL_REGRESSION` | UI code has a bug | Component doesn't render, wrong data displayed, broken interaction | +| `INFRA_ISSUE` | Environment problem | Cluster down, cert expired, operator not installed, console unreachable | + +### Distinguishing TEST_BUG from REAL_REGRESSION + +This is the hardest classification. Use these heuristics: + +1. **Was the test ever passing?** If it's a new test, lean toward `TEST_BUG`. If it was passing before, check what changed. +2. **Does the screenshot show the UI working correctly but the test expecting something different?** → `TEST_BUG` +3. **Does the screenshot show the UI broken (empty state, error, wrong data)?** → Likely `REAL_REGRESSION` +4. **Do other tests in the same suite pass?** If yes, the infra/app is fine → `TEST_BUG` or `FIXTURE_ISSUE` +5. **If CI context is available**: Check if the source code was modified in the PR. 
Modified source + broken test = likely `REAL_REGRESSION` + +When in doubt, classify as `REAL_REGRESSION` — it's safer to report a false positive to the user than to silently "fix" a test that was correctly catching a bug. diff --git a/.claude/commands/iterate-ci-flaky.md b/.claude/commands/iterate-ci-flaky.md new file mode 100644 index 000000000..9f99418da --- /dev/null +++ b/.claude/commands/iterate-ci-flaky.md @@ -0,0 +1,416 @@ +--- +name: iterate-ci-flaky +description: Iterate on flaky Cypress tests against OpenShift CI presubmit jobs — push fixes, trigger CI, analyze results, repeat +parameters: + - name: pr + description: "PR number to iterate on (e.g., 857)" + required: true + - name: max-iterations + description: "Maximum fix-push-wait cycles (default: 3)" + required: false + - name: confirm-runs + description: "Number of green CI runs required to declare stable (default: 2)" + required: false + - name: job + description: "Prow job name to target (default: pull-ci-openshift-monitoring-plugin-main-e2e-incidents)" + required: false + - name: focus + description: "Optional: focus analysis on specific test area (e.g., 'regression', 'filtering')" + required: false + - name: review-window + description: "Seconds to wait for user feedback after posting fix to Slack before pushing (default: 0 = no wait). Requires Option B Slack setup." + required: false +--- + +# Iterate CI Flaky Tests + +Fix flaky Cypress tests by iterating against real OpenShift CI presubmit jobs. Pushes fixes, triggers CI, waits for results, analyzes failures, and repeats until stable. + +## Prerequisites + +### 1. GitHub CLI Authentication + +```bash +gh auth status +``` + +Must be logged in with comment access to `openshift/monitoring-plugin` (for `/test` comments to trigger Prow CI). + +**Recommended auth method**: `gh auth login --web` (OAuth via browser). This uses your GitHub user's existing org permissions — no PAT scope management needed. 
Revocable anytime at GitHub → Settings → Applications. + +**Why not a PAT?** +- Fine-grained PATs can only scope repos you own — you can't add `openshift/monitoring-plugin` as a contributor. +- Classic PATs with `public_repo` scope work but grant broader access than needed. +- OAuth via `--web` uses the GitHub CLI OAuth app which requests only the permissions it needs and inherits your org membership. + +**Push access**: Git push to your fork uses SSH (`origin` remote) — this is independent of the `gh` token. + +**Fallback**: If the token lacks upstream comment permissions, the agent will report the blocker and ask you to post the `/test` comment manually on the PR page. + +### 2. Permissions + +Required in `.claude/settings.local.json`: + +```json +{ + "permissions": { + "allow": [ + "Bash(gh auth:*)", + "Bash(gh api:*)", + "Bash(gh pr:*)", + "Bash(git push:*)", + "Bash(git add:*)", + "Bash(git commit:*)", + "Bash(git status:*)", + "Bash(git diff:*)", + "Bash(git log:*)", + "Bash(git rev-parse:*)", + "Bash(git -C:*)", + "Bash(git checkout:*)", + "Bash(git fetch:*)", + "Bash(python3:*)", + "Bash(find screenshots:*)", + "Bash(find cypress/screenshots:*)", + "Bash(find cypress/videos:*)", + "WebFetch(domain:gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com)" + ] + } +} +``` + +### 3. Notifications & Review (optional) + +Notifications and review are optional — if not configured, the script prints to stdout and the loop continues normally. + +**Slack Notifications (one-way):** +```bash +export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..." +``` +Setup: Slack → Apps → Incoming Webhooks → create webhook for your channel. 5 minutes. +Provides one-way status notifications at key events (ci_started, ci_failed, fix_applied, etc.). + +**GitHub PR Comment Review (two-way):** + +The `review-window` parameter enables a two-way review flow using GitHub PR comments. When a fix is ready: + +1. Agent posts fix details as a PR comment (via `review-github.py post`) +2. 
Agent also sends a Slack webhook notification (if configured) +3. Agent waits `review-window` seconds for a reply from the **PR author only** +4. If the author replies on the PR — agent reads the feedback and adjusts the fix +5. If no reply within the window — agent proceeds autonomously + +**Security**: Author filtering is **code-enforced** in `review-github.py` — only comments where `.user.login` matches the PR author are considered. This is deterministic, not instruction-based. + +**How to reply**: Post a regular comment on the PR. The agent only reads comments from the PR author posted after the agent's notification. Optionally prefix with `/agent` for clarity. + +No additional setup needed beyond `gh auth` (Step 1) — the same token used for `/test` comments is used for posting and reading review comments. + +Both Slack webhook URL and review-window can be set in `cypress/export-env.sh` or `~/.zshrc`. + +### 4. Unsigned Commits + +Same as `/iterate-incident-tests` — all commits use `--no-gpg-sign`. They live on a PR branch and are squash-merged by the user. + +## Instructions + +**IMPORTANT — Autonomous Execution Rules:** +- **Never chain commands** with `&&` or `|` — use separate Bash calls for each operation. Compound commands and pipes trigger security prompts that block autonomous execution. +- **Never combine `cd` with other commands** — `cd && git` triggers an unskippable security prompt. +- When you need to process command output (e.g., parse JSON), capture it with a Bash call first, then process it in a second call or read the output directly. 
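The capture-then-process rule above can be sketched in Python: one Bash call saves `gh pr view {pr} --json statusCheckRollup > pr.json` (the filename is illustrative), a second step extracts what is needed without pipes. This is a sketch, not part of the workflow's scripts; the field names (`name`/`conclusion` for check runs, `context`/`state` for commit statuses) follow the shapes `gh` returns under `statusCheckRollup`:

```python
import json

def check_state(path: str, job_substring: str) -> str:
    """Return the conclusion/state of the first check whose name contains
    job_substring, read from a saved `gh pr view --json statusCheckRollup`
    dump. Returns NOT_FOUND if no check matches."""
    with open(path) as f:
        rollup = json.load(f).get("statusCheckRollup", [])
    for check in rollup:
        # Check runs carry "name"/"conclusion"; commit statuses carry
        # "context"/"state" -- handle both shapes.
        name = check.get("name") or check.get("context") or ""
        if job_substring in name:
            return check.get("conclusion") or check.get("state") or "PENDING"
    return "NOT_FOUND"
```

Because the JSON is read from a file produced by a prior call, no `gh ... | jq` pipeline is needed, and no security prompt is triggered.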
+ +### Step 1: Gather PR Context + +Fetch PR metadata: +```bash +gh pr view {pr} --json headRefName,headRefOid,baseRefName,number,title,url,author,statusCheckRollup +``` + +Extract: +- **Branch**: `headRefName` +- **HEAD SHA**: `headRefOid` +- **Check runs**: from `statusCheckRollup`, find the job matching `{job}` (default: `pull-ci-openshift-monitoring-plugin-main-e2e-incidents`) + +Check out the PR branch locally: +```bash +git fetch origin {headRefName} +``` +```bash +git checkout {headRefName} +``` + +Present summary: +``` +PR #{pr}: {title} +Branch: {headRefName} +HEAD: {short_sha} +CI job: {job} +Latest run status: {SUCCESS|FAILURE|PENDING|none} +``` + +### Step 2: Determine Current CI State + +From the status check rollup, determine the state of the target job: + +- **SUCCESS**: Skip to Step 5 (flakiness confirmation — was it truly stable?) +- **FAILURE**: Proceed to Step 3 (analyze the failure) +- **PENDING / IN_PROGRESS**: Skip to Step 4 (wait for it) +- **No run found**: Trigger one in Step 3 + +### Step 3: Trigger CI Run (if needed) + +If there's no recent run, or a fix was just pushed: + +```bash +gh api repos/openshift/monitoring-plugin/issues/{pr}/comments -f body="/test e2e-incidents" +``` + +**IMPORTANT**: The `/test` command uses the **short alias** (`e2e-incidents`), not the full Prow job name. Using the full name will fail with "specified target(s) for /test were not found." + +Note: If you just pushed a commit in Step 6, the push automatically triggers Prow — you can skip the `/test` comment. Only use `/test` for: +- Retriggering without code changes (flakiness retry) +- The initial run if none exists + +After triggering, notify and proceed to Step 4: +```bash +python3 .claude/commands/cypress/scripts/notify-slack.py send ci_started "CI triggered for PR #{pr}. Polling for results (~2h)." 
--pr {pr} --branch {headRefName} +``` + +### Step 4: Wait for CI Completion + +Use the polling script at `.claude/commands/cypress/scripts/poll-ci-status.py`: + +```bash +python3 .claude/commands/cypress/scripts/poll-ci-status.py {pr} +``` + +Arguments: `{pr} [job_substring] [max_attempts] [interval_seconds]` +- Default job substring: `e2e-incidents` +- Default max attempts: 30 (150 minutes at 5-minute intervals) +- Default interval: 300 seconds + +Run this with `run_in_background: true` and a timeout of 9000000ms (150 minutes). + +When the background task completes, parse the output line starting with `CI_COMPLETE`: +- Extract `state` (SUCCESS or FAILURE) +- Extract `url` (Prow URL for the run) + +### Step 5: Analyze CI Results + +Convert the Prow URL to a gcsweb URL: +- Replace `https://prow.ci.openshift.org/view/gs/` with `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/` + +Run `/analyze-ci-results` (or follow its instructions inline): + +1. Fetch `started.json`, `finished.json`, `prowjob.json` for metadata +2. Fetch `build-log.txt` from the test artifacts path +3. List and fetch failure screenshots +4. Classify each failure + +**Classification outcomes:** + +| Classification | Action | +|---------------|--------| +| `INFRA_*` | Report to user. Optionally retrigger with `/retest` (Step 3). Do NOT attempt code fixes. | +| `TEST_BUG` | Diagnose and fix locally (Step 6) | +| `FIXTURE_ISSUE` | Diagnose and fix locally (Step 6) | +| `PAGE_OBJECT_GAP` | Diagnose and fix locally (Step 6) | +| `MOCK_ISSUE` | Diagnose and fix locally (Step 6) | +| `CODE_REGRESSION` | Report to user and **STOP** | + +Notify after analysis: + +If failures: +```bash +python3 .claude/commands/cypress/scripts/notify-slack.py send ci_failed "{N} failures found: {test_names}. Diagnosing..." --pr {pr} --branch {headRefName} --url {ci_url} +``` + +If all green: +```bash +python3 .claude/commands/cypress/scripts/notify-slack.py send ci_complete "All tests passed. 
Starting flakiness confirmation." --pr {pr} --branch {headRefName} --url {ci_url} +``` + +If `CODE_REGRESSION` or `INFRA_*` blocks the loop: +```bash +python3 .claude/commands/cypress/scripts/notify-slack.py send blocked "{classification}: {description}. Agent stopped — needs human input." --pr {pr} --branch {headRefName} +``` + +If **all green** (SUCCESS): Proceed to Step 7 (flakiness confirmation). + +### Step 6: Fix and Push + +For each fixable failure: + +1. **Diagnose** using `/diagnose-test-failure` (read screenshots, test code, fixtures, page object) +2. **Fix** — edit the relevant files. Same constraints as `/iterate-incident-tests`: + - May edit: `cypress/e2e/incidents/**`, `cypress/fixtures/incident-scenarios/**`, `cypress/views/incidents-page.ts`, `cypress/support/incidents_prometheus_query_mocks/**` + - Must NOT edit: `src/**`, non-incident tests, cypress config +3. **Validate locally** (optional but recommended if cluster is accessible): + ```bash + source cypress/export-env.sh && npx cypress run --spec "{SPEC}" --env grep="{TEST_NAME}" + ``` +4. **Commit**: + ```bash + git add {files} + ``` + ```bash + git commit --no-gpg-sign -m "fix(tests): {summary} + + CI run: {prow_url} + Classifications: {list} + + Co-Authored-By: Claude Opus 4.6 " + ``` + +5. 
**Notify and review window** (before pushing): + + **a) Slack notification** (one-way, if configured): + ```bash + python3 .claude/commands/cypress/scripts/notify-slack.py send fix_applied "*What changed:*\n• {file}: {change_description}\n\n*Why:* {diagnosis_summary}\n*Classification:* {classification} (confidence: {confidence})\n\n`git diff HEAD~1` on branch `{headRefName}`" --pr {pr} --branch {headRefName} + ``` + + **b) GitHub PR review comment** (two-way, if `review-window` > 0): + + Post fix details as a PR comment: + ```bash + python3 .claude/commands/cypress/scripts/review-github.py post {pr} "**What changed:**\n• {file}: {change_description}\n\n**Why:** {diagnosis_summary}\n**Classification:** {classification} (confidence: {confidence})\n\n\`git diff HEAD~1\` on branch \`{headRefName}\`" + ``` + + Capture `COMMENT_TIME` from the output, then wait for author reply: + ```bash + python3 .claude/commands/cypress/scripts/review-github.py wait {pr} {COMMENT_TIME} --timeout {review-window} + ``` + + Parse the output: + - `REPLY=`: PR author provided feedback. Read the reply text and adjust the fix accordingly. This may mean: + - Reverting the commit (`git reset --soft HEAD~1`), applying the user's suggestion, and re-committing + - Or making an additional commit on top with the adjustment + - `NO_REPLY`: No feedback within the window. Proceed with push. + + **Note**: The `wait` command only considers comments from the PR author (`.user.login` match, code-enforced). Comments from other users or bots are ignored. + +6. **Push**: + ```bash + git push origin {headRefName} + ``` + +The push automatically triggers a new Prow run. Go to **Step 4** (wait for CI). + +Track iteration count. If `current_iteration >= max-iterations`: Report remaining failures and **STOP**. + +### Step 7: Flakiness Confirmation + +A single green CI run doesn't prove stability. Trigger `confirm-runs` additional runs (default: 2) to confirm. + +For each confirmation run: + +1. 
Trigger via `/test` comment (no code changes): + ```bash + gh api repos/openshift/monitoring-plugin/issues/{pr}/comments -f body="/test e2e-incidents" + ``` + +2. Wait for completion (Step 4) + +3. Analyze results (Step 5) + +4. If failures found: + - If same test fails across runs → likely a real bug, diagnose and fix (Step 6) + - If different tests fail across runs → environment-dependent flakiness, harder to fix + - Report flakiness pattern to user + +Track results across all runs: +``` +Stability Report: + Run 1 (fix iteration): {SHA} — PASSED + Run 2 (confirm #1): {SHA} — PASSED + Run 3 (confirm #2): {SHA} — PASSED (or FAILED: test X) +``` + +### Step 8: Final Report + +``` +# CI Flaky Test Iteration Report + +## PR: #{pr} - {title} +## Branch: {headRefName} +## Iterations: {N} + +## Timeline +1. [SHA] Initial state — CI FAILURE + - {N} failures: {test names} +2. [SHA] fix(tests): {summary} — pushed, CI triggered +3. [SHA] CI result: PASSED +4. Confirmation run 1: PASSED +5. Confirmation run 2: PASSED + +## Fixes Applied +1. [commit] fix(tests): {summary} + - {file}: {change} + CI run: {prow_url} + +## Stability Assessment +- Tests stable: {N}/{total} (passed all runs) +- Tests flaky: {N} (intermittent failures) +- Tests broken: {N} (failed every run) + +## Flaky Test Details (if any) +- "test name": passed 2/3 runs + Failure pattern: {timing issue / element not found / etc.} + Fix attempted: {yes/no} + +## Remaining Issues +- {any unresolved items} + +## Recommendations +- {merge / needs more investigation / etc.} +``` + +After generating the report, send the final notification: +```bash +python3 .claude/commands/cypress/scripts/notify-slack.py send iteration_done "Iteration complete: {passed}/{total} passed, {flaky} flaky, {iterations} cycles.\n\n{short_summary}" --pr {pr} --branch {headRefName} +``` + +### Step 9: Update Stability Ledger + +After the final report, update `web/cypress/reports/test-stability.md`. 
+ +Read the file and update both sections: + +**1. Current Status table** — for each test in this run: +- If test already in table: update pass rate, update trend +- If test is new: add a row +- Pass rate = total passes / total runs across all recorded iterations +- Trend: compare last 3 runs — improving / stable / degrading + +**2. Run History log** — append a new row: +``` +| {next_number} | {YYYY-MM-DD} | ci | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} | +``` + +**3. Machine-readable data** — update the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END` with the new run data. + +Commit: +```bash +git add web/cypress/reports/test-stability.md +``` +```bash +git commit --no-gpg-sign -m "docs: update test stability ledger — {passed}/{total} passed, {flaky} flaky (CI)" +``` + +## Error Handling + +- **Push rejected** (branch protection, force push required): Report to user. Do NOT force push. +- **`/test` comment ignored by Prow**: User may lack `ok-to-test` permission. Check if the label exists on the PR: `gh pr view {pr} --json labels`. +- **CI timeout** (>150 min): Report timeout, check if the job is stuck. Suggest manual inspection. +- **Multiple CI jobs running**: Only track the latest run. Use the `detailsUrl` from the most recent check run. +- **Merge conflicts after push**: Report to user. The PR branch may need rebasing — do NOT rebase automatically. +- **Rate limiting on gh api**: GitHub allows 5000 requests/hour for authenticated users. Polling every 5 min = 12/hour, well within limits. 
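The pass-rate and trend rules from Step 9 can be sketched as below. This is a sketch under two assumptions: per-run results are recorded as `"pass"`/`"fail"` strings (as in the ledger's machine-readable block), and "compare last 3 runs" is read as comparing the last three runs against the three before them:

```python
def pass_rate(results: list[str]) -> float:
    """Pass rate = total passes / total runs across all recorded iterations."""
    return results.count("pass") / len(results) if results else 0.0

def trend(results: list[str]) -> str:
    """Compare the pass rate of the last 3 runs with the runs just before
    them: improving / stable / degrading."""
    if len(results) <= 3:
        return "stable"  # not enough history to call a trend
    recent = pass_rate(results[-3:])
    prior = pass_rate(results[-6:-3])
    if recent > prior:
        return "improving"
    if recent < prior:
        return "degrading"
    return "stable"
```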
+ +## Guardrails + +- **Never force-push** — always additive commits +- **Never push to main** — only to the PR branch +- **Never edit source code** (`src/`) — only test infrastructure +- **Never close or merge the PR** — that's the user's decision +- **Max 3 `/test` comments per hour** — avoid spamming the PR +- **Always include the CI run URL** in commit messages for traceability +- **Stop on CODE_REGRESSION** — if the UI is genuinely broken, that's not a flaky test diff --git a/.claude/commands/iterate-incident-tests.md b/.claude/commands/iterate-incident-tests.md new file mode 100644 index 000000000..246848a9a --- /dev/null +++ b/.claude/commands/iterate-incident-tests.md @@ -0,0 +1,465 @@ +--- +name: iterate-incident-tests +description: Autonomously run, diagnose, fix, and verify incident detection Cypress tests with flakiness probing +parameters: + - name: target + description: > + What to test. Options: + - "all" — all incident tests (excluding @e2e-real) + - "regression" — only regression/ directory tests + - a specific spec file path (e.g., "cypress/e2e/incidents/01.incidents.cy.ts") + - a grep pattern for a specific test (e.g., "should filter by severity") + required: true + - name: max-iterations + description: "Maximum fix-and-retry cycles (default: 3)" + required: false + - name: ci-url + description: "Optional: gcsweb or Prow URL for CI results to use as starting context (triggers /analyze-ci-results first)" + required: false + - name: flakiness-runs + description: "Number of flakiness probe runs (default: 3). Set to 0 to skip flakiness probing" + required: false + - name: skip-branch + description: "If 'true', work on current branch instead of creating a new one (default: false)" + required: false +--- + +# Iterate Incident Tests + +Autonomous test iteration loop: run tests, diagnose failures, apply fixes, verify, and probe for flakiness. + +## Prerequisites + +### 1. 
Cypress Environment + +Run `/cypress-setup` first to ensure `web/cypress/export-env.sh` exists with cluster credentials. + +### 2. Permissions + +This skill runs autonomously and needs pre-approved permissions in `.claude/settings.local.json` to avoid interactive approval prompts blocking the loop. Required permissions: + +```json +{ + "permissions": { + "allow": [ + "Bash(git stash:*)", + "Bash(git checkout:*)", + "Bash(git checkout -b:*)", + "Bash(git branch:*)", + "Bash(git add:*)", + "Bash(git commit:*)", + "Bash(git status:*)", + "Bash(git diff:*)", + "Bash(git log:*)", + "Bash(rm -f screenshots/cypress_report_*.json:*)", + "Bash(rm -f screenshots/merged-report.json:*)", + "Bash(rm -rf cypress/screenshots/*:*)", + "Bash(rm -rf cypress/videos/*:*)", + "Bash(npx cypress run:*)", + "Bash(npx mochawesome-merge:*)", + "Bash(source cypress/export-env.sh:*)", + "Bash(cd /home/drajnoha/Code/monitoring-plugin:*)", + "Bash(find /home/drajnoha/Code/monitoring-plugin/web/cypress:*)", + "Bash(ls:*)" + ] + } +} +``` + +The `rm` permissions are scoped to test artifact directories only (mochawesome reports, screenshots, videos) — these are regenerated every run. + +### 3. Unsigned Commits + +All commits in this workflow use `--no-gpg-sign` to avoid GPG passphrase prompts blocking the loop. These unsigned commits live on a working branch and are intended to be **squash-merged** by the user with their own signature when approved. Never push unsigned commits directly to main. + +If using CI analysis, also add to `web/.claude/settings.local.json`: +```json +"WebFetch(domain:gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com)" +``` + +## Instructions + +Execute the following steps in order. This is the main orchestrator — it coordinates sub-agents and manages the iteration loop. + +### Step 0: CI Context (optional) + +If `ci-url` is provided, run `/analyze-ci-results` first to get CI failure context. 
+ +Capture the CI analysis output: +- If **all failures are INFRA_***: Report the infrastructure issues to the user and **STOP**. No test changes will help. +- If **mixed infra + test/code**: Note the infra issues for the user, but proceed with the test/code failures only. +- If **all test/code**: Proceed. Use the CI diagnosis (commit correlation, screenshots) as context for the local iteration. + +Store the CI analysis as `ci_context` for later reference by diagnosis agents. + +### Step 1: Branch Setup + +First, check the current branch: +```bash +git rev-parse --abbrev-ref HEAD +``` + +**Decision logic:** +- If `skip-branch` is "true": Stay on the current branch, skip to Step 2. +- If already on a `test/incident-robustness-*` branch: Stay on it, skip to Step 2. +- If on any other non-main working branch (e.g., `agentic-test-iteration`, a feature branch): Ask the user whether to create a child branch or work on the current one. +- If on `main`: Create a new branch. + +To create a branch (only when needed): +```bash +git checkout -b test/incident-robustness-$(date +%Y-%m-%d) +``` + +If that branch name already exists, append a suffix: `-2`, `-3`, etc. + +**IMPORTANT**: Do NOT combine `cd` and `git` in the same command — compound `cd && git` commands trigger a security approval prompt that blocks autonomous execution. Always use separate Bash calls, or set the working directory before running git. 
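The suffix rule above (`-2`, `-3`, etc.) can be pinned down with a small helper. A sketch only; `next_branch_name` is an illustrative name, and the agent would feed it the output of `git branch --list 'test/incident-robustness-*'`:

```python
def next_branch_name(base: str, existing: list[str]) -> str:
    """Return base unchanged if it is free, otherwise the first free
    suffixed variant: base-2, base-3, ..."""
    if base not in existing:
        return base
    n = 2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"
```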
+ +### Step 2: Resolve Target + +Based on the `target` parameter, determine the Cypress run command: + +| Target | Spec | Grep Tags | +|--------|------|-----------| +| `all` | `cypress/e2e/incidents/**/*.cy.ts` | `@incidents --@e2e-real --@flaky --@demo` | +| `regression` | `cypress/e2e/incidents/regression/**/*.cy.ts` | `@incidents --@e2e-real --@flaky` | +| specific file | `cypress/e2e/incidents/{target}` | (none) | +| grep pattern | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grep="{target}"`) | + +### Step 3: Clean Previous Results + +**IMPORTANT**: Never chain commands with `&&`. Use separate Bash calls for each operation — compound commands trigger security prompts that block autonomous execution. + +From the `web/` directory: +```bash +rm -f screenshots/cypress_report_*.json +``` +```bash +rm -f screenshots/merged-report.json +``` +```bash +rm -rf cypress/screenshots/* +``` +```bash +rm -rf cypress/videos/* +``` + +### Step 4: Run Tests + +Execute Cypress inline (NOT in a separate terminal). From the `web/` directory: + +```bash +source cypress/export-env.sh && npx cypress run --spec "{SPEC}" {GREP_ARGS} +``` + +Note: `source && npx` is one logical operation (env setup + run) and is acceptable as a single command. + +**IMPORTANT**: This command may take several minutes. Use a timeout of 600000ms (10 minutes). + +Capture the exit code: +- `0` = all passed +- non-zero = failures occurred + +### Step 5: Parse Results + +Merge mochawesome reports and parse. 
From the `web/` directory: + +```bash +npx mochawesome-merge screenshots/cypress_report_*.json -o screenshots/merged-report.json +``` + +Read `screenshots/merged-report.json` and extract: + +For each test: +``` +{ + spec_file: string, // from results[].fullFile + suite: string, // from suites[].title + test_name: string, // from tests[].title + full_title: string, // from tests[].fullTitle + state: "passed" | "failed" | "skipped", + error_message: string, // from tests[].err.message (if failed) + stack_trace: string, // from tests[].err.estack (if failed) + duration_ms: number // from tests[].duration +} +``` + +Build a failure list and a pass list. + +**Note**: Mochawesome JSON has nested suites. Walk the tree recursively: +``` +results[] -> suites[] -> tests[] + -> suites[] -> tests[] (nested suites) +``` + +### Step 6: Identify Screenshots + +For each failure, find the corresponding screenshot: + +```bash +find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f +``` + +Match screenshots to failures using the naming convention: +``` +{Suite Name} -- {Test Title} (failed).png +{Suite Name} -- {Test Title} -- before all hook (failed).png +``` + +### Step 7: Diagnosis Loop + +**If no failures** (exit code 0): Skip to Step 10 (flakiness probe). + +**If failures exist**: For each failing test, spawn a **Diagnosis Agent** (Explore-type sub-agent). + +Use the `/diagnose-test-failure` skill prompt. Provide: +- `test-name`: the full title +- `spec-file`: the spec file path +- `error-message`: the error message +- `screenshot-path`: absolute path to the failure screenshot +- `stack-trace`: the error stack trace +- `ci-context`: any relevant context from Step 0 + +**Parallelization**: If failures are in **different spec files**, spawn diagnosis agents in parallel. If they're in the **same spec file**, diagnose sequentially (they may share root causes like a broken `before all` hook). 
+ +**Before-all hook failures**: If a `before all` hook failed, all tests in that suite were skipped. Diagnose only the hook failure — fixing it will unblock all skipped tests. + +Collect all diagnoses. Separate into: +- **Fixable**: `TEST_BUG`, `FIXTURE_ISSUE`, `PAGE_OBJECT_GAP`, `MOCK_ISSUE` +- **Blocking**: `REAL_REGRESSION`, `INFRA_ISSUE` + +If any **blocking** issues found: Report them to the user. Continue fixing the fixable issues. + +### Step 8: Fix Loop + +For each fixable failure, spawn a **Fix Agent** (general-purpose sub-agent). + +Provide the Fix Agent with: +1. The full diagnosis from Step 7 +2. The test file content (read it) +3. The page object content (read `cypress/views/incidents-page.ts`) +4. The fixture content (if relevant) +5. These constraints: + +``` +## Fix Constraints + +You may ONLY edit files in these paths: +- web/cypress/e2e/incidents/**/*.cy.ts (test files) +- web/cypress/fixtures/incident-scenarios/*.yaml (fixtures) +- web/cypress/views/incidents-page.ts (page object) +- web/cypress/support/incidents_prometheus_query_mocks/** (mock layer) + +You must NOT edit: +- web/src/** (source code — that's Phase 2) +- Non-incident test files +- Cypress config or support infrastructure +- Any file outside the web/ directory + +## Fix Guidelines + +- Prefer the minimal change that fixes the issue +- Don't refactor surrounding code — only fix the failing test +- If adding a wait/timeout, prefer Cypress retry-ability (.should()) over cy.wait() +- If fixing a selector, check that the new selector exists in the current DOM + by reading the relevant React component in src/ (read-only, don't edit) +- If fixing a fixture, validate it against the fixture schema + (run /validate-incident-fixtures mentally or reference the schema) +- If adding a page object method, follow existing naming conventions +``` + +After the Fix Agent returns, verify the fix makes sense: +- Does the edit address the diagnosed root cause? +- Could the edit break other tests? 
+- Is it the minimal change needed? + +If the fix looks wrong, re-diagnose with additional context. + +### Step 9: Validate Fixes + +After applying fixes, re-run **only the previously failing tests**: + +From the `web/` directory: +```bash +source cypress/export-env.sh && npx cypress run --spec "{SPEC}" --env grep="{FAILING_TEST_NAME}" +``` + +For each test: +- **Now passes**: Stage the fix files with `git add` +- **Still fails**: Re-diagnose (increment retry counter). Max 2 retries per test. +- **After 2 retries still failing**: Mark as `UNRESOLVED` and report to user + +### Step 10: Commit Batch + +After all fixable failures are addressed (or max retries reached): + +Stage and commit as separate commands (never chain `cd && git`): +```bash +git add {files} +``` +```bash +git commit --no-gpg-sign -m "{message}" +``` + +Commit message format: +``` +fix(tests): {summary} + +- {file}: {change} +- {file}: {change} + +Classifications: N TEST_BUG, N FIXTURE_ISSUE, N PAGE_OBJECT_GAP, N MOCK_ISSUE +Unresolved: N (if any) + +Co-Authored-By: Claude Opus 4.6 +``` + +Track commit count. If commit count reaches **5**: Notify the user that the review threshold has been reached and ask whether to continue or pause for review. + +### Step 11: Iterate + +If there were failures and `current_iteration < max-iterations`: +- Increment iteration counter +- Go back to **Step 3** (clean results and re-run) + +This catches cascading fixes — e.g., fixing a `before all` hook unblocks skipped tests that may have their own issues. + +If all tests pass: Proceed to Step 12. + +### Step 12: Flakiness Probe + +Run the full target test suite `flakiness-runs` times (default: 3), even if everything is green. + +For each run: +1. Clean previous results (Step 3) +2. Run tests (Step 4) +3. Parse results (Step 5) +4. 
Record per-test pass/fail

After all runs, compute flakiness:

```
Flakiness Report:
  Total tests: N
  Stable (all runs passed): N
  Flaky (some runs failed): N
  Broken (all runs failed): N

  Flaky tests:
  - "test name" — passed 2/3 runs
    Error on failure: {error message}
  - "test name" — passed 1/3 runs
    Error on failure: {error message}
```

For each **flaky** test:
- Diagnose it using `/diagnose-test-failure` with the context that it's intermittent
- Common flaky patterns: race conditions, animation timing, network mock timing, DOM detach/reattach
- Apply fix if confident (add `.should('exist')` guards, use `{ timeout: N }`, avoid `.eq(N)` on dynamic lists)
- Re-run flakiness probe on just the fixed tests to verify

### Step 13: Final Report

Output a summary:

```
# Iteration Complete

## Branch: test/incident-robustness-YYYY-MM-DD
## Commits: N
## Iterations: N

## Results
- Tests run: N
- Passing: N
- Fixed in this session: N
- Unresolved: N (details below)
- Flaky (stabilized): N
- Flaky (remaining): N

## Fixes Applied
1. [commit-sha] fix(tests): {summary}
   - {test name}: {what was fixed}

2. [commit-sha] fix(tests): {summary}
   - {test name}: {what was fixed}

## Unresolved Issues
- "test name": REAL_REGRESSION — {description}. Source file X was modified in PR #N.
- "test name": UNRESOLVED after 2 retries — {last error}

## Remaining Flakiness
- "test name": 2/3 passed — timing issue in chart rendering, needs investigation

## Recommendations
- [Next steps for unresolved issues]
- [Whether to merge current fixes or wait]
```

### Step 14: Update Stability Ledger

After the final report, update `web/cypress/reports/test-stability.md`.

Read the file and update the following sections:

**1. Current Status table** — for each test in this run:
- If test already in table: update pass rate (rolling average across all recorded runs), update trend
- If test is new: add a row
- Pass rate = total passes / total runs across all recorded iterations
- Trend: compare last 3 runs — improving / stable / degrading

**2. 
Run History log** — append a new row: +``` +| {next_number} | {YYYY-MM-DD} | local | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} | +``` + +**3. Machine-readable data** — update the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END`: +```json +{ + "tests": { + "test full title": { + "results": ["pass", "pass", "fail", "pass"], + "last_failure_reason": "Timed out...", + "last_failure_date": "2026-03-23", + "fixed_by": "abc1234" + } + }, + "runs": [ + { + "date": "2026-03-23", + "type": "local", + "branch": "test/incident-robustness-2026-03-23", + "total": 15, + "passed": 15, + "failed": 0, + "flaky": 0, + "commit": "abc1234" + } + ] +} +``` + +Commit the ledger update together with the final batch of fixes if any, or as a standalone commit: +```bash +git add web/cypress/reports/test-stability.md +``` +```bash +git commit --no-gpg-sign -m "docs: update test stability ledger — {passed}/{total} passed, {flaky} flaky" +``` + +### Error Handling + +- **Cypress crashes** (not just test failures): Check if it's an OOM issue (`--max-old-space-size`), a missing dependency, or a config problem. Report to user. +- **No `export-env.sh`**: Remind user to run `/cypress-setup` first. +- **No mochawesome reports generated**: Check if the reporter config is correct. Fall back to parsing Cypress console output. +- **Git conflicts**: If the working branch has conflicts with changes, report to user and stop. +- **Sub-agent failure**: If a Diagnosis or Fix agent fails, log the error and skip that test. Don't let one broken agent block the whole loop. 
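The Step 14 merge into the machine-readable block can be sketched as follows. This is a hypothetical helper (`record_run` and `pass_rate` are illustrative names); reading the block out of `test-stability.md` and re-embedding it between the `STABILITY_DATA_START`/`STABILITY_DATA_END` markers is left out.

```python
# Hypothetical helper for the Step 14 ledger update. `data` has the shape
# of the JSON block shown above ("tests" and "runs" keys).

def record_run(data: dict, run: dict, results: dict) -> dict:
    """Merge one run into the ledger. `results` maps a full test title
    to "pass" or "fail"; `run` is one entry for the "runs" list."""
    for title, outcome in results.items():
        entry = data["tests"].setdefault(title, {"results": []})
        entry["results"].append(outcome)
        if outcome == "fail":
            entry["last_failure_date"] = run.get("date")
    data["runs"].append(run)
    return data

def pass_rate(data: dict, title: str) -> float:
    """Rolling pass rate across all recorded runs, as used in the
    Current Status table."""
    results = data["tests"][title]["results"]
    return results.count("pass") / len(results)
```

The trend column could be derived the same way by comparing the last three entries of each `results` list.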
+ +### Guardrails + +- **Never edit source code** (`src/`) in Phase 1 +- **Never disable a test** — if a test can't be fixed, mark it as unresolved, don't add `.skip()` +- **Never add `@flaky` tag** to a test — that's a human decision +- **Never change test assertions to match wrong behavior** — if the UI is wrong, it's a REAL_REGRESSION +- **Max 2 retries per test** to avoid infinite loops +- **Max 5 commits before pausing** for user review +- **Always run flakiness probe** before declaring success diff --git a/docs/agentic-test-iteration-ideas.md b/docs/agentic-test-iteration-ideas.md new file mode 100644 index 000000000..3101473bd --- /dev/null +++ b/docs/agentic-test-iteration-ideas.md @@ -0,0 +1,464 @@ +# Agentic Test Iteration — Ideas & Future Improvements + +Ideas and potential enhancements for the agentic test iteration system. These are not committed plans — they're options to explore when the core workflow is stable. + +## Authentication: GitHub App for CI Triggering + +**Problem**: The CI iteration skill (`/iterate-ci-flaky`) needs to comment `/test` on upstream PRs to trigger Prow. Current options (PATs, OAuth) are tied to a personal GitHub account. + +**Idea**: Create a dedicated GitHub App installed on `openshift/monitoring-plugin`. + +### How it would work + +1. Create a GitHub App with minimal permissions: `Issues: Write`, `Pull requests: Read`, `Checks: Read` +2. An org admin approves installation on `openshift/monitoring-plugin` +3. The app authenticates via a private key (`.pem` file) → short-lived installation tokens (1h expiry, auto-rotated) +4. 
Comments appear as `my-ci-bot[bot]` instead of a personal user + +### Tradeoffs vs OAuth + +| Aspect | OAuth (`gh auth login --web`) | GitHub App | +|--------|-------------------------------|------------| +| Setup effort | Minimal | Moderate (create app, org admin approval) | +| Tied to a person | Yes | No — bot identity | +| Survives user leaving org | No | Yes | +| Token management | Manual refresh | Automatic (1h expiry from private key) | +| Audit trail | Personal user | Dedicated bot account | +| Team sharing | Each person needs own auth | One app, anyone's agent can use it | + +### When to pursue + +- When multiple team members want to use the CI iteration skill +- When you want a persistent bot identity for test automation comments +- When you want to remove personal account dependency + +### Blocker + +Requires an `openshift` org admin to approve the app installation. + +--- + +## CI Iteration: Fully Automated Job Triggering + +**Problem**: Currently the CI loop requires either a `/test` comment (needs upstream write access) or a `git push` (triggers automatically). The push path works but creates noise commits. + +**Ideas**: +- **Empty commits**: `git commit --allow-empty -m "retrigger CI"` — triggers Prow without code changes, but pollutes history +- **Prow API**: Prow may have a direct API for retriggering jobs without GitHub comments — investigate `https://prow.ci.openshift.org/` endpoints +- **GitHub Actions bridge**: A lightweight GitHub Action on the fork that comments `/test` on the upstream PR when triggered via `workflow_dispatch` + +--- + +## Parallel CI Runs for Flakiness Detection + +**Problem**: Flakiness probing requires N sequential CI runs (~2h each). 3 runs = 6 hours. + +**Idea**: Open N temporary PRs from the same branch, each triggers its own CI run in parallel. Collect all results, then close the temporary PRs. + +**Tradeoff**: Consumes N times the CI resources. May not be acceptable for shared CI infrastructure. 
+ +**Alternative**: Ask if Prow supports multiple runs of the same job on the same PR — some CI systems allow this. + +--- + +## Local Mock Tests + CI Real Tests as Two-Phase Validation + +**Problem**: Local iteration is fast but uses mocked data. CI uses real clusters but is slow (~2h). + +**Idea**: Formalize a two-phase approach: +1. **Phase A** (`/iterate-incident-tests`): Fast local iteration with mocks — fix all mock-testable issues +2. **Phase B** (`/iterate-ci-flaky`): Push to CI — catch environment-specific flakiness + +The orchestrator could automatically transition from Phase A to Phase B when local tests are green. + +--- + +## Agent Fork with Deploy Key + +**Problem**: The agent creates unsigned commits on the user's working branch. Push access, GPG signing, and branch management all create friction. + +**Idea**: A dedicated fork (`monitoring-plugin-agent` or similar) with: +- A passwordless deploy key for push access +- No GPG signing requirement +- Agent creates PRs from the fork to the upstream repo +- User reviews and merges — clean separation of human vs agent work + +**Benefits**: +- No unsigned commits in the user's fork +- Agent can push freely without SSH key access to user's account +- Clear audit trail: all agent work comes from the agent fork +- Multiple agents (different team members) can share the same fork + +--- + +## Screenshot Diffing for Visual Regression + +**Problem**: The diagnosis agent reads failure screenshots to understand UI state, but has no reference for "what it should look like." + +**Idea**: Capture baseline screenshots from passing tests and store them. On failure, the agent can compare the failure screenshot against the baseline to identify visual differences. + +**Implementation**: Cypress has plugins for visual regression testing (`cypress-image-snapshot`). The agent could: +1. Generate baselines from a known-good run +2. On failure, diff the failure screenshot against baseline +3. 
Highlight visual changes to speed up diagnosis + +--- + +## Test Stability Ledger + +**Status**: Partially implemented. Ledger file created at `web/cypress/reports/test-stability.md`. Update step added to `/iterate-incident-tests` (Step 14). Still needs to be wired into `/iterate-ci-flaky`. + +**Problem**: Flakiness data is ephemeral — it exists in the agent's report from one run and is lost. Next time the agent runs, it has no memory of previous results. + +**Design**: A markdown file with embedded machine-readable JSON, updated by both skills after each run. + +**Location**: `web/cypress/reports/test-stability.md` — committed to the working branch, travels with the fixes. + +**Contents**: +- Human-readable table: per-test pass rate, trend, last failure reason, fix commit +- Run history log: date, type (local/CI), branch, pass/fail counts +- Machine-readable JSON block for programmatic parsing by the agent + +**Agent behavior**: +- Reads the ledger at the start of each run to prioritize — "this test was flaky in last 3 runs, focus here" +- Updates the ledger after each run with new results +- Commits the ledger update alongside fixes + +--- + +## Slack Notifications for Long-Running Loops + +**Status**: Implemented. Slack webhook notifications (Option A) integrated into `/iterate-ci-flaky`. GitHub PR comment-based review flow implemented as the two-way interaction channel (`review-github.py`). Option B (Slack bot with thread replies) documented but deprioritized due to internal setup complexity. + +### The Problem + +The CI iteration loop (`/iterate-ci-flaky`) runs for hours — each CI run takes ~2h, and the loop may do 3-5 fix-push-wait cycles. 
During that time: + +- The user has no visibility into what the agent decided to fix or how +- By the time the loop finishes, multiple commits may have been pushed with no chance to course-correct +- A wrong fix in cycle 1 wastes 2+ hours of CI time before the agent discovers it didn't work +- The user may have domain context ("that test is flaky because of animation timing, not the selector") that would save cycles + +The core tension: **autonomy vs oversight**. The agent should run independently, but the user needs the ability to intervene at natural pause points. + +### Natural Pause Points + +The CI loop has built-in pauses where user input is most valuable: + +``` +Push fix ──→ [PAUSE: fix_applied] ──→ CI runs (~2h) ──→ [PAUSE: ci_complete] ──→ Analyze ──→ ... +``` + +1. **After fix, before CI runs** (`fix_applied`): The agent committed a fix and is about to push (or just pushed). This is the highest-value notification — the user can review the approach and say "redo" before a 2-hour CI cycle starts. + +2. **After CI completes** (`ci_complete`): Results are in. The agent is about to diagnose. User might have context about known issues. + +3. **When blocked** (`blocked`): Agent can't continue — needs human decision. + +### Review Window + +For the `fix_applied` event, the agent could optionally **wait before pushing**, giving the user a time window to respond: + +``` +Agent: "I'm about to push this fix. Waiting 10 minutes for feedback before proceeding." + [Shows diff summary in Slack] + +User (within 10 min): "Don't change the selector, the issue is timing. Add a cy.wait(500) instead." + +Agent: Reverts fix, applies user's suggestion, pushes that instead. +``` + +Or if no response within the window, the agent proceeds autonomously. + +Configuration: `review-window=10m` parameter on `/iterate-ci-flaky`. Set to `0` for fully autonomous (no waiting). 
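A minimal sketch of turning the `review-window` value into seconds. Only the `10m` and `0` forms appear above; the `s` suffix and bare-number handling are assumptions.

```python
# Hypothetical parser for the review-window parameter ("10m", "30s", "0").

def parse_review_window(value: str) -> int:
    """Return the review window in seconds; 0 disables waiting."""
    value = value.strip().lower()
    if value.endswith("m"):
        return int(value[:-1]) * 60
    if value.endswith("s"):
        return int(value[:-1])
    return int(value)  # bare number: treat as seconds
```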
+ +### Notification Content — What Makes Each Message Actionable + +**`fix_applied`** — the most important notification: +``` +:wrench: Agent: Fix Applied + +*What changed:* +• `cypress/views/incidents-page.ts:45` — selector `.severity-filter` → `[data-test="severity-filter"]` +• `cypress/e2e/incidents/regression/01.reg_filtering.cy.ts:78` — added `.should('exist')` guard before click + +*Why:* Screenshot showed the filter dropdown existed but had a different class. The `data-test` attribute is stable across builds. + +*Classification:* PAGE_OBJECT_GAP (confidence: HIGH) + +*Diff:* `git diff HEAD~1` on branch `test/incident-robustness-2026-03-24` + +*Next:* CI will trigger automatically on push. Reply in the agent session to change approach. + +PR #860 | Branch: test/incident-robustness-2026-03-24 +``` + +The key: show **what** changed, **why** the agent chose that fix, and **how confident** it is. This lets the user quickly decide "looks good, let it run" vs "wrong approach, let me intervene." + +**`ci_complete`** — actionable status: +``` +:white_check_mark: Agent: CI Complete — PASSED (run 2/5) + +*Results:* 15/15 tests passed in 1h 47m +*Flakiness probe:* 2 of 5 confirmation runs complete, all green so far + +*Next:* Triggering confirmation run 3. No action needed. + +PR #860 | Branch: test/incident-robustness-2026-03-24 | CI Run +``` + +Or on failure: +``` +:x: Agent: CI Complete — FAILED (iteration 2/3) + +*Results:* 13/15 passed, 2 failed +*Failures:* +• "should filter by severity" — Timed out on `[data-test="severity-chip"]` (same as last run) +• "should display chart bars" — new failure, `Expected 5 bars, found 0` + +*Assessment:* +• severity filter: same fix didn't work, will try different approach +• chart bars: new failure — possibly caused by previous fix (will investigate) + +*Next:* Diagnosing and fixing. Will notify before pushing. 
+ +PR #860 | Branch: test/incident-robustness-2026-03-24 | CI Run +``` + +**`blocked`** — requires user action: +``` +:octagonal_sign: Agent: Blocked — REAL_REGRESSION + +*Test:* "should display incident bars in chart" +*Issue:* Chart component renders empty. Screenshot shows the chart area with no bars, no error, no loading state. +*Commit correlation:* `src/components/incidents/IncidentChart.tsx` was modified in this PR (+45, -12) + +*This is not a test issue* — the chart rendering logic appears broken. Agent cannot fix source code in Phase 1. + +*Action needed:* Investigate the chart component refactor. Agent will stop iterating on this test. + +PR #860 | Branch: test/incident-robustness-2026-03-24 +``` + +### Implementation Options + +**Option A: Slack Incoming Webhook** (recommended starting point) +- Setup: Slack → Apps → Incoming Webhooks → create webhook for your channel. 5 minutes. +- Set `SLACK_WEBHOOK_URL` in `export-env.sh` or `~/.zshrc` +- Agent posts via `curl` in a standalone `notify-slack.py` script +- Messages formatted with Slack Block Kit (sections, context, code blocks) +- Pro: No Slack app, no server, no OAuth. Just a URL. +- Con: One-way — user sees notifications but must respond in the Claude Code session, not in Slack + +**Option B: Slack Bot with thread-based interaction** (no callback server needed) +- Create a Slack App with bot token (`chat:write`, `channels:history`) +- Agent posts messages to a channel, capturing the message `ts` (timestamp/ID) +- Before proceeding at pause points, agent **reads thread replies** via `conversations.replies` API +- If user replied in the Slack thread → agent reads the reply and adjusts +- If no reply within the review window → agent proceeds + +``` +Agent posts: "Fix applied. Reply in this thread to change approach. Proceeding in 10 min." 
User replies: "Use data-test attributes instead of class selectors"
Agent reads: conversations.replies → sees user feedback → adjusts fix
```

- Pro: Two-way interaction without a callback server. User stays in Slack.
- Con: Needs a Slack App (not just a webhook). Polling for replies adds complexity. Bot token needs to be stored securely.

**Implementation sketch for Option B:**
```python
import os
import time

from slack_sdk import WebClient  # requires the slack_sdk package

slack_client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def wait_for_feedback(blocks, channel, bot_user_id, review_window_seconds):
    # Post notification and get message timestamp
    response = slack_client.chat_postMessage(channel=channel, blocks=blocks)
    message_ts = response["ts"]

    # Wait for review window, polling for replies
    deadline = time.time() + review_window_seconds
    while time.time() < deadline:
        replies = slack_client.conversations_replies(channel=channel, ts=message_ts)
        user_replies = [r for r in replies["messages"] if r.get("user") != bot_user_id]
        if user_replies:
            return user_replies[-1]["text"]  # Return latest user feedback
        time.sleep(30)

    return None  # No feedback, proceed autonomously
```

**Option C: Claude Code hooks → Slack bridge**
- Configure a Claude Code hook that fires on `git commit` or specific tool calls
- The hook runs a shell script that posts to Slack
- Pro: Zero changes to the skills — hooks are external
- Con: Less control over notification content and timing. Can't implement review windows. Hooks are local config, not portable.

**Option D: GitHub PR comments as notification channel**
- Instead of Slack, the agent posts status updates as PR comments
- User replies directly on the PR
- Agent reads PR comments via `gh api` before proceeding
- Pro: No Slack setup at all. Everything stays in GitHub. Natural for code review context.
- Con: Noisier PR history. Not real-time (no push notifications unless GitHub notifications are configured).

### Recommended Progression

1. **Start with Option A** — get visibility. User monitors passively, intervenes in Claude Code session when needed.
2. 
**Upgrade to Option B** when the review window pattern proves valuable — adds two-way interaction within Slack. +3. **Option D** is a good alternative if you prefer keeping everything in GitHub — especially for team use where the PR is the natural communication hub. + +### Configuration + +```bash +# Option A: Webhook only (one-way) +export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..." + +# Option B: Bot with thread interaction (two-way) +export SLACK_BOT_TOKEN="xoxb-..." +export SLACK_CHANNEL_ID="C0123456789" +export SLACK_REVIEW_WINDOW="600" # seconds to wait for feedback (0 = no wait) +``` + +### Skill Integration Points + +Where notifications fire in each skill: + +**`/iterate-ci-flaky`:** +- Step 3: `ci_started` — after `/test` comment or push +- Step 5: `ci_complete` — after CI analysis +- Step 6: `fix_applied` — after committing fix, before push (with optional review window) +- Step 7: `flaky_found` — when flakiness detected in confirmation runs +- Step 8: `iteration_done` — final summary +- Any step: `blocked` — on REAL_REGRESSION, INFRA_ISSUE, auth failure + +**`/iterate-incident-tests`:** +- Step 10: `fix_applied` — after committing batch (less critical since local runs are fast) +- Step 12: `flaky_found` — during flakiness probe +- Step 13: `iteration_done` — final summary +- Any step: `blocked` — on REAL_REGRESSION + +--- + +## Cloud Execution: Long-Running Autonomous Agent + +**Problem**: The current setup requires a local machine with an active Claude Code CLI session. Long CI polling (~2h per run) causes session timeouts, and the user must keep a terminal open. 
+ +### Option 1: Claude Code Headless Mode (simplest) + +Run Claude Code non-interactively without a TTY: + +```bash +claude --print --dangerously-skip-permissions \ + -p "/iterate-ci-flaky pr=860 confirm-runs=5" +``` + +- `--print` / `-p`: non-interactive, outputs result and exits +- `--dangerously-skip-permissions`: skips all approval prompts (use only in sandboxed environments) +- Can run in `tmux`, `nohup`, GitHub Actions, or any CI runner +- Uses the same tools, skills, and CLAUDE.md as interactive mode +- Limitation: single-shot execution — runs the prompt and exits + +**Deployment**: `nohup claude --print ... > output.log 2>&1 &` on any machine, or in a GitHub Actions runner. + +### Option 2: Claude Agent SDK (most flexible) + +The Agent SDK (`@anthropic-ai/claude-code`) is a Node.js/TypeScript library that embeds Claude Code as a programmable agent: + +```typescript +import { Claude } from "@anthropic-ai/claude-code"; + +const claude = new Claude({ + dangerouslySkipPermissions: true, +}); + +const result = await claude.message({ + prompt: "/iterate-ci-flaky pr=860 confirm-runs=5", + workingDirectory: "/path/to/monitoring-plugin", +}); + +// Post result as PR comment +await octokit.issues.createComment({ + owner: "openshift", repo: "monitoring-plugin", + issue_number: 860, body: result.text, +}); +``` + +#### SDK vs CLI comparison + +| Aspect | CLI (`claude`) | Agent SDK | +|--------|---------------|-----------| +| Runtime | Terminal process | Node.js library | +| Lifecycle | Single session, exits | Embed in any long-lived process | +| Event-driven | No | Yes — webhooks, timers, PR events | +| Permissions | Interactive prompts or skip-all | Programmatic control | +| Tools | Built-in (Read, Write, Bash, etc.) | Same built-in + custom tools | +| State | Session-scoped | Persistent (DB, files, etc.) 
| +| Deployment | Local terminal | Anywhere Node.js runs | + +#### Requirements to port current skills + +- Node.js runtime with `@anthropic-ai/claude-code` +- `ANTHROPIC_API_KEY` environment variable +- `gh` CLI authenticated (or GitHub App token for comment access) +- Git + SSH for pushing to fork +- The repo cloned in the agent's working directory +- All skill files (`.claude/commands/`) present in the clone + +#### What stays the same + +- Skills (`.md` files) — the SDK reads them from `.claude/commands/` +- Polling script (`poll-ci-status.py`) — SDK runs Bash the same way +- `/diagnose-test-failure`, `/analyze-ci-results` — all work as-is +- File editing, git operations, Cypress execution — identical + +#### What changes + +- No permission prompts — `dangerouslySkipPermissions` in a sandboxed container +- State between runs — persist to file or DB instead of ephemeral session +- Triggering — webhook handler calls the SDK instead of user typing a command +- Error recovery — the wrapping process can catch failures and retry + +### Option 3: GitHub Actions Workflow (cloud, event-driven) + +A GitHub Actions workflow that runs the agent on PR events: + +```yaml +name: Flaky Test Iteration +on: + issue_comment: + types: [created] + +jobs: + iterate: + if: contains(github.event.comment.body, '/run-flaky-iteration') + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-node@v4 + - name: Install Claude Code + run: npm install -g @anthropic-ai/claude-code + - name: Run iteration + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + GH_TOKEN: ${{ secrets.GH_TOKEN }} + run: | + claude --print --dangerously-skip-permissions \ + -p "/iterate-ci-flaky pr=${{ github.event.issue.number }} confirm-runs=3" + - name: Post results + run: gh pr comment ${{ github.event.issue.number }} --body-file output.md +``` + +**Flow**: +1. User comments `/run-flaky-iteration` on a PR +2. GitHub Actions triggers the workflow +3. 
Claude Code runs in headless mode on the Actions runner +4. Agent executes the full iteration loop (trigger CI, wait, analyze, fix, push) +5. Results posted back as a PR comment + +**Considerations**: +- GitHub Actions runners have a 6h timeout — enough for 2-3 CI runs +- Needs `ANTHROPIC_API_KEY` and `GH_TOKEN` as repository secrets +- Runner needs SSH key for git push (or use `GH_TOKEN` with HTTPS) +- Cost: API tokens consumed + GitHub Actions minutes + +### Recommendation + +1. **Start with headless mode** (`tmux` + `--print`) to validate the flow works without interactive prompts +2. **Move to GitHub Actions** for true cloud execution — event-driven, no local machine needed +3. **Agent SDK** when you want a custom orchestrator with richer state management, error recovery, or Slack integration beyond what the skills provide diff --git a/docs/agentic-test-iteration.md b/docs/agentic-test-iteration.md new file mode 100644 index 000000000..9887d8403 --- /dev/null +++ b/docs/agentic-test-iteration.md @@ -0,0 +1,258 @@ +# Agentic Test Iteration Architecture + +Autonomous multi-agent system for iterating on Cypress test robustness, with visual feedback (screenshots + videos), CI result ingestion, and flakiness detection. 

## Goals

| Phase | Objective |
|-------|-----------|
| **Phase 1** (current) | Make incident detection tests robust — fix selectors, timing, fixtures, page object gaps |
| **Phase 2** (future) | Refactor frontend code using tests as a behavioral contract / safety net |

## Architecture Overview

```
User: /iterate-incident-tests target=regression max-iterations=3

Coordinator (main Claude Code session)
  |
  |-- [CI Analysis] /analyze-ci-results (optional first step)
  |     Fetches CI artifacts, classifies infra vs test/code failures
  |     Correlates failures with git commits for context
  |     If all INFRA -> report to user and STOP
  |
  |-- Create branch: test/incident-robustness-{YYYY-MM-DD}
  |
  |-- [Runner] Cypress headless via Bash (inline, not separate terminal)
  |     Sources export-env.sh, produces mochawesome JSON + screenshots + videos
  |
  |-- [Parser] Extract failures from mochawesome JSON reports
  |     Per failure: test name, error message, stack trace, screenshot path, video path
  |
  |-- For each failure (parallelizable):
  |     |
  |     |-- [Diagnosis Agent] (Explore-type sub-agent)
  |     |     Reads: screenshot (multimodal) + error + test code + fixture + page object
  |     |     Returns: root cause classification + recommended fix
  |     |
  |     |-- [Fix Agent] (general-purpose sub-agent)
  |     |     Makes targeted edits based on diagnosis
  |     |     Returns: diff summary
  |     |
  |     |-- [Validation] Re-run the specific failing test
  |           Pass -> stage fix
  |           Fail -> re-diagnose (max 2 retries per test)
  |
  |-- Commit batch of related fixes
  |-- Repeat if failures remain (up to max-iterations)
  |-- [Flakiness Probe] Run full suite 3x even if green
  |-- Report final state to user
```

## Agent Roles

### 1. Coordinator (main session)

Owns the iteration loop, branch management, and commit strategy.
+ +Responsibilities: +- Create and manage the working branch +- Run Cypress tests inline via Bash +- Parse mochawesome JSON reports +- Dispatch Diagnosis and Fix agents +- Track cumulative pass/fail state across iterations +- Commit fixes in batches (threshold: **5 commits** before notifying user) +- Run flakiness probes (multiple runs even when green) +- Decide when to stop: all green + flakiness probe passed, max iterations, or needs human input + +### 2. Diagnosis Agent (Explore-type sub-agent) + +Input per failure: +- Error message and stack trace from mochawesome JSON +- Screenshot path (read with multimodal Read tool) +- Video path (reference for user, not directly parseable by agent) +- Test file path + relevant line numbers +- Associated fixture YAML +- Page object methods used + +Output — one of these classifications: + +| Classification | Description | Action | +|---------------|-------------|--------| +| `TEST_BUG` | Wrong selector, incorrect assertion, timing/race condition | Auto-fix | +| `FIXTURE_ISSUE` | Missing data, wrong structure, edge case not covered | Auto-fix | +| `PAGE_OBJECT_GAP` | Missing method, stale selector, outdated DOM reference | Auto-fix | +| `MOCK_ISSUE` | Intercept not matching, response shape wrong | Auto-fix | +| `REAL_REGRESSION` | Actual UI/code bug — not a test problem | **STOP and report to user** | +| `INFRA_ISSUE` | Cluster down, cert expired, operator not installed | **STOP and report to user** | + +The agent should **read the screenshot first** before looking at code — visual state often reveals the root cause faster than stack traces. + +### 3. 
Fix Agent (general-purpose sub-agent)

Input:
- Diagnosis classification and details
- Specific file references and what to change

Scope — may only edit:
- `cypress/e2e/incidents/**/*.cy.ts` (test files)
- `cypress/fixtures/incident-scenarios/*.yaml` (fixtures)
- `cypress/views/incidents-page.ts` (page object)
- `cypress/support/incidents_prometheus_query_mocks/**` (mock layer)

Must NOT edit:
- Source code (`src/`) — that's Phase 2
- Non-incident test files
- Cypress config or support infrastructure

### 4. Validation Agent

Re-runs the specific failing test after a fix is applied:
```bash
source cypress/export-env.sh && npx cypress run --env grep="{TEST_NAME}" --spec "{SPEC}"
```

Reports pass/fail. If still failing, feeds back to Diagnosis Agent (max 2 retries per test).

## Flakiness Detection

Even if the first run is all green, the coordinator runs a **flakiness probe**:

1. Run the full incident test suite 3 times consecutively
2. Track per-test results across runs
3. Flag any test that fails in any run as `FLAKY`
4. For flaky tests: attempt to diagnose the timing/race condition and fix
5. Report flakiness statistics: `test_name: 2/3 passed` etc.

This catches intermittent failures that a single run would miss.

## CI Result Ingestion

CI analysis is handled by the dedicated `/analyze-ci-results` skill (`.claude/commands/analyze-ci-results.md`).

The skill fetches artifacts from OpenShift CI (Prow) runs on GCS, classifies failures as infrastructure vs test/code issues, reads failure screenshots with multimodal vision, and correlates failures with the git commits that triggered them.
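The commit-correlation step can be sketched as follows. `changed_files` shells out to git; `correlate` uses a naive substring match on module names, which is an assumption about how failure stack traces reference source files.

```python
import subprocess

def changed_files(base_sha: str, pr_sha: str) -> list[str]:
    """Files changed between the base commit and the PR head."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_sha}..{pr_sha}"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def correlate(failure_stack: str, files: list[str]) -> list[str]:
    """Changed files whose module name appears in the failure's stack trace."""
    hits = []
    for path in files:
        module = path.rsplit("/", 1)[-1].split(".")[0]  # e.g. "IncidentChart"
        if module and module in failure_stack:
            hits.append(path)
    return hits
```

Any hit is a signal to classify the failure as a possible `CODE_REGRESSION` rather than a test bug.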
+ +### Key Capabilities + +- **URL normalization**: Accepts gcsweb or Prow UI URLs at any level of the artifact tree +- **Structured metadata**: Extracts PR number, author, branch, commit SHAs from `started.json` / `finished.json` / `prowjob.json` +- **Build log parsing**: Parses Cypress console output from `build-log.txt` for per-spec pass/fail/skip counts and error details +- **Visual diagnosis**: Fetches and reads failure screenshots (multimodal) to understand UI state at failure time +- **Failure classification**: Categorizes each failure as `INFRA_*` (cluster, operator, plugin, auth, CI) or test/code (`TEST_BUG`, `FIXTURE_ISSUE`, `PAGE_OBJECT_GAP`, `MOCK_ISSUE`, `CODE_REGRESSION`) +- **Commit correlation**: Maps failures to specific file changes in the PR using `git diff {base}..{pr_head}` + +### Integration with Orchestrator + +The orchestrator uses `/analyze-ci-results` as an optional first step: + +1. If all failures are `INFRA_*` -> report to user and STOP (no test changes will help) +2. If mixed -> report infra issues, proceed with test/code fixes only +3. If all test/code -> proceed with full iteration loop +4. Commit correlation tells the orchestrator whether to fix tests or investigate source changes +5. 
CI screenshots give the Diagnosis Agent a head start before local reproduction + +### Usage + +``` +/analyze-ci-results ci-url=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../{RUN_ID}/ +/analyze-ci-results ci-url=https://prow.ci.openshift.org/view/gs/.../{RUN_ID} focus=regression +``` + +## Commit Strategy + +- **Branch naming**: `test/incident-robustness-YYYY-MM-DD` +- **Commit granularity**: Group related fixes (e.g., "fix 3 selector issues in filtering tests") +- **Review threshold**: Notify user after **5 commits** for review +- **Never force-push**; always additive commits +- User merges when ready, or continues iteration + +## Test Execution (Inline) + +Tests run inline via Bash, not in a separate terminal: + +```bash +cd web && source cypress/export-env.sh && \ + npx cypress run \ + --spec "cypress/e2e/incidents/regression/**/*.cy.ts" \ + --env grepTags="@incidents --@e2e-real --@flaky" \ + --reporter ./node_modules/cypress-multi-reporters \ + --reporter-options configFile=reporter-config.json +``` + +Results are collected from: +- **Exit code**: 0 = all passed, non-zero = failures +- **Mochawesome JSON**: `screenshots/cypress_report_*.json` — per-test details +- **Screenshots**: `cypress/screenshots/{spec}/` — failure screenshots +- **Videos**: `cypress/videos/{spec}.mp4` — kept on failure + +### Report Parsing + +Mochawesome JSON structure (per report file): +```json +{ + "stats": { "passes": N, "failures": N, "skipped": N }, + "results": [{ + "suites": [{ + "title": "Suite Name", + "tests": [{ + "title": "test description", + "fullTitle": "Suite -- test description", + "state": "failed|passed|skipped", + "err": { + "message": "error description", + "estack": "full stack trace" + } + }] + }] + }] +} +``` + +Use `npx mochawesome-merge screenshots/cypress_report_*.json > merged-report.json` to combine per-spec reports. 
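The parser step over the merged report can be sketched as follows, assuming the simplified shape shown above (`extract_failures` is an illustrative name; real mochawesome reports can nest suites, which the recursive walk handles).

```python
def extract_failures(report: dict) -> list[dict]:
    """Collect failed tests from a merged mochawesome report."""
    failures = []

    def walk(suite: dict) -> None:
        for test in suite.get("tests", []):
            if test.get("state") == "failed":
                err = test.get("err", {})
                failures.append({
                    "fullTitle": test.get("fullTitle", ""),
                    "message": err.get("message", ""),
                    "stack": err.get("estack", ""),
                })
        for child in suite.get("suites", []):  # suites can nest
            walk(child)

    for result in report.get("results", []):
        for suite in result.get("suites", []):
            walk(suite)
    return failures
```

Each returned entry is what the coordinator hands to a Diagnosis Agent, alongside the matching screenshot and video paths.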
+ +## Skills + +| Skill | Purpose | Invoked by | +|-------|---------|------------| +| `/iterate-incident-tests` | Main orchestrator — local iteration loop, dispatches agents, manages commits | User | +| `/iterate-ci-flaky` | CI-based iteration — push fixes, trigger Prow jobs, wait, analyze, repeat | User | +| `/diagnose-test-failure` | Classifies a single test failure using screenshots + code analysis | Orchestrator (as sub-agent prompt) | +| `/analyze-ci-results` | Fetches and analyzes OpenShift CI artifacts, classifies infra vs test/code | User or orchestrator | + +Skills are defined in `.claude/commands/` and can be invoked as slash commands. + +## Existing Infrastructure Leveraged + +| Asset | How the agent uses it | +|-------|----------------------| +| mochawesome JSON reporter | Structured test results parsing | +| `@cypress/grep` | Run individual tests by name or tag | +| `export-env.sh` | Source env vars for inline execution | +| YAML fixture system | Edit fixtures to fix data issues | +| Page object (`incidents-page.ts`) | Fix selectors and add missing methods | +| Mock layer (`incidents_prometheus_query_mocks/`) | Fix intercept patterns | +| `/generate-incident-fixture` skill | Generate new fixtures when needed | +| `/validate-incident-fixtures` skill | Validate fixture edits | + +## Phase 2: Frontend Refactoring (Future) + +### Concept + +Tests become the behavioral contract. The agent refactors frontend code while using the test suite as a safety net. 
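The safety-net idea reduces to a small decision rule, assuming the validator can tell whether observable behavior was preserved. The `classifyAfterRefactor` helper and its `behaviorPreserved` input below are purely hypothetical, sketched to make the contract concrete:

```typescript
// Hypothetical sketch of the Contract Validator's core decision after a refactor.
// "behaviorPreserved" stands in for a real check that the same user-visible
// behavior still holds against the refactored code.
type Verdict = 'PASS' | 'CODE_REGRESSION' | 'TEST_TOO_COUPLED';

function classifyAfterRefactor(testPassed: boolean, behaviorPreserved: boolean): Verdict {
  if (testPassed) return 'PASS';
  // Test failed but behavior is intact: the test asserted implementation
  // details and should be adapted, rather than the refactor reverted.
  return behaviorPreserved ? 'TEST_TOO_COUPLED' : 'CODE_REGRESSION';
}
```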
+ +### Additional Agent Roles + +| Agent | Role | +|-------|------| +| **Refactor Planner** | Analyzes frontend code, proposes refactoring steps | +| **Refactor Agent** | Makes code changes (replaces Fix Agent) | +| **Contract Validator** | Runs tests, classifies failures as regression vs test-coupling | +| **Test Adapter** | Updates tests that assert implementation details instead of behavior | + +### Key Principle + +If a test breaks due to refactoring but behavior is preserved, the test needs updating — it was too coupled to implementation. Phase 1 (robustness) reduces this coupling, making Phase 2 more effective. + +### Additional Classification + +The Diagnosis Agent gains `TEST_TOO_COUPLED` — the test asserts implementation details (specific DOM structure, internal state) rather than observable behavior. The Test Adapter agent rewrites it to be implementation-agnostic. diff --git a/web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts b/web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts index 8ad39e5ad..1bdce1008 100644 --- a/web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts +++ b/web/cypress/e2e/incidents/regression/02.reg_ui_tooltip_boundary_times.cy.ts @@ -103,7 +103,7 @@ describe('Regression: Mixed Severity Interval Boundary Times', { tags: ['@incide incidentsPage.setDays('1 day'); incidentsPage.elements.incidentsChartContainer().should('be.visible'); incidentsPage.elements.incidentsChartBarsGroups().should('have.length', 1); - cy.pause(); + cy.log('2.2 Consecutive interval boundaries: End of segment 1 should equal Start of segment 2'); incidentsPage.hoverOverIncidentBarSegment(0, 0); @@ -122,7 +122,7 @@ describe('Regression: Mixed Severity Interval Boundary Times', { tags: ['@incide ).to.equal(firstEnd); }); }); - cy.pause(); + cy.log('2.3 Incident tooltip Start vs alert tooltip Start vs alerts table Start'); incidentsPage.hoverOverIncidentBarSegment(0, 0); @@ -158,7 +158,7 @@ 
describe('Regression: Mixed Severity Interval Boundary Times', { tags: ['@incide }); }); }); - cy.pause(); + cy.log('Expected failure: Incident tooltip Start times are 5 minutes off (OU-1221)'); }); diff --git a/web/cypress/reports/test-stability.md b/web/cypress/reports/test-stability.md new file mode 100644 index 000000000..a3cd4f485 --- /dev/null +++ b/web/cypress/reports/test-stability.md @@ -0,0 +1,34 @@ +# Test Stability Ledger + +Tracks incident detection test stability across local and CI iteration runs. Updated automatically by `/iterate-incident-tests` and `/iterate-ci-flaky`. + +## How to Read + +- **Pass rate**: percentage across all recorded runs (local + CI combined) +- **Trend**: direction over last 3 runs +- **Last failure**: most recent failure reason and which run it occurred in +- **Fixed by**: commit that resolved the issue (if applicable) + +## Current Status + +| Test | Pass Rate | Trend | Runs | Last Failure | Fixed By | +|------|-----------|-------|------|-------------|----------| +| _No data yet — run `/iterate-incident-tests` or `/iterate-ci-flaky` to populate_ | | | | | | + +## Run History + +### Run Log + +| # | Date | Type | Branch | Tests | Passed | Failed | Flaky | Commit | +|---|------|------|--------|-------|--------|--------|-------|--------| +| _No runs recorded yet_ | | | | | | | | | + + diff --git a/web/package.json b/web/package.json index c66a43344..55264ebe6 100644 --- a/web/package.json +++ b/web/package.json @@ -38,8 +38,8 @@ "test-cypress-coo": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@coo --@flaky --@demo'", "test-cypress-coo-bvt": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@coo+@smoke --@demo'", "test-cypress-virtualization": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@virtualization --@flaky --@demo'", - 
"test-cypress-incidents": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@incidents --@flaky --@demo'", - "test-cypress-incidents-e2e": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@incidents+@e2e-real --@flaky --@demo'", + "test-cypress-incidents": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@incidents --@flaky --@demo --@xfail'", + "test-cypress-incidents-e2e": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@incidents+@e2e-real --@flaky --@demo --@xfail'", "test-cypress-smoke": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@smoke --@flaky --@demo'", "test-cypress-fast": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@smoke --@slow --@demo --@flaky'", "test-cypress-perses-dev": "node --max-old-space-size=4096 ./node_modules/.bin/cypress run --browser chrome --headless --env grepTags='@perses-dev --@demo'",