diff --git a/README.md b/README.md
index 8d09d42..1bfbe7b 100644
--- a/README.md
+++ b/README.md
@@ -257,6 +257,15 @@ a `.scm` import query — no other changes needed.
 
 ---
 
+## Experiments archive
+
+Nine rounds of paired A/B agent comparisons drove the rule library
+to its current state. The full archive — comparison reports, agent
+deliverables, run logs, and analysis charts — lives at
+[`experiments/`](experiments/README.md).
+
+---
+
 ## License
 
 MIT — see [`LICENSE`](LICENSE).
diff --git a/README.zh-TW.md b/README.zh-TW.md
index 6d5059e..15984ad 100644
--- a/README.zh-TW.md
+++ b/README.zh-TW.md
@@ -241,6 +241,13 @@ import query。
 
 ---
 
+## 實驗資料
+
+9 輪 paired A/B agent 對比實驗推動了規則庫的演化。完整資料包（分析報告、agent 交付、run logs、分析圖表）在
+[`experiments/`](experiments/README.md)。
+
+---
+
 ## 授權
 
 MIT —— 見 [`LICENSE`](LICENSE)。
diff --git a/experiments/31flashlite-amb-b b/experiments/31flashlite-amb-b
deleted file mode 160000
index 3c7bd61..0000000
--- a/experiments/31flashlite-amb-b
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 3c7bd6132df55a86b447d35a1789323385e93144
diff --git a/experiments/31flashlite-amb-b/README.md b/experiments/31flashlite-amb-b/README.md
new file mode 100644
index 0000000..aace7d8
--- /dev/null
+++ b/experiments/31flashlite-amb-b/README.md
@@ -0,0 +1,17 @@
+# URL Shortener REST API
+
+A minimal in-memory URL shortener REST API.
+
+## Requirements
+- Python 3.12+
+- Dependencies listed in requirements.txt
+
+## Setup
+1. pip install -r requirements.txt
+2. python app.py
+
+## Endpoints
+- POST /shorten {"url": "..."}
+- GET /<code> 302 redirect
+- DELETE /<code>
+- GET /list
diff --git a/experiments/31flashlite-amb-b/app.py b/experiments/31flashlite-amb-b/app.py
new file mode 100644
index 0000000..0c853c0
--- /dev/null
+++ b/experiments/31flashlite-amb-b/app.py
@@ -0,0 +1,40 @@
+from flask import Flask, jsonify, request, redirect
+import secrets
+
+app = Flask(__name__)
+storage = {}
+
+@app.route('/shorten', methods=['POST'])
+def shorten():
+    data = request.get_json()
+    if not data or 'url' not in data:
+        return jsonify({"error": "Missing URL"}), 400
+    url = data['url']
+    if not (url.startswith('http://') or url.startswith('https://')):
+        return jsonify({"error": "Invalid URL"}), 400
+    
+    code = secrets.token_urlsafe(6)[:6]
+    storage[code] = url
+    return jsonify({"code": code, "short_url": f"http://127.0.0.1:8080/{code}"}), 201
+
+@app.route('/<code_>', methods=['GET'])
+def redirect_to(code_):
+    url = storage.get(code_)
+    if not url:
+        return jsonify({"error": "Not found"}), 404
+    return redirect(url)
+
+@app.route('/<code_>', methods=['DELETE'])
+def delete(code_):
+    if code_ not in storage:
+        return jsonify({"error": "Not found"}), 404
+    del storage[code_]
+    return '', 204
+
+@app.route('/list', methods=['GET'])
+def list_entries():
+    entries = [{"code": k, "url": v} for k, v in storage.items()]
+    return jsonify({"entries": entries})
+
+if __name__ == '__main__':
+    app.run(port=8080)
diff --git a/experiments/31flashlite-amb-b/docs/superpowers/plans/2026-05-05-url-shortener-implementation.md b/experiments/31flashlite-amb-b/docs/superpowers/plans/2026-05-05-url-shortener-implementation.md
new file mode 100644
index 0000000..254d8c5
--- /dev/null
+++ b/experiments/31flashlite-amb-b/docs/superpowers/plans/2026-05-05-url-shortener-implementation.md
@@ -0,0 +1,223 @@
+# URL Shortener Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Build a minimal in-memory URL shortener REST API.
+
+**Architecture:** Use Flask for the REST API, a simple dictionary for storage, and `secrets` for URL generation.
+
+**Tech Stack:** Python 3.12, Flask, pytest.
+
+---
+
+### Task 1: Setup Flask App and Storage
+
+**Files:**
+- Create: `app.py`
+- Test: `tests.py`
+
+- [ ] **Step 1: Create initial `app.py`**
+
+```python
+from flask import Flask, jsonify, request
+
+app = Flask(__name__)
+storage = {}
+
+@app.route('/shorten', methods=['POST'])
+def shorten():
+    return jsonify({"message": "OK"}), 201
+
+if __name__ == '__main__':
+    app.run(port=8080)
+```
+
+- [ ] **Step 2: Create initial `tests.py`**
+
+```python
+import pytest
+from app import app
+
+@pytest.fixture
+def client():
+    return app.test_client()
+
+def test_shorten_endpoint(client):
+    response = client.post('/shorten', json={"url": "https://google.com"})
+    assert response.status_code == 201
+```
+
+- [ ] **Step 3: Verify app starts and tests pass**
+
+Run: `python3 -m pytest tests.py -v`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add app.py tests.py
+git commit -m "feat: setup basic flask app and test fixture"
+```
+
+### Task 2: Implement URL Shortening Logic
+
+**Files:**
+- Modify: `app.py`
+- Modify: `tests.py`
+
+- [ ] **Step 1: Update `tests.py` for validation and response**
+
+```python
+def test_shorten_validation(client):
+    response = client.post('/shorten', json={"url": "invalid"})
+    assert response.status_code == 400
+
+def test_shorten_success(client):
+    response = client.post('/shorten', json={"url": "https://google.com"})
+    assert response.status_code == 201
+    assert 'code' in response.get_json()
+    assert 'short_url' in response.get_json()
+```
+
+- [ ] **Step 2: Update `app.py` with validation and code generation**
+
+```python
+from flask import Flask, jsonify, request
+import secrets
+
+app = Flask(__name__)
+storage = {}
+
+@app.route('/shorten', methods=['POST'])
+def shorten():
+    data = request.get_json()
+    url = data.get('url', '')
+    if not (url.startswith('http://') or url.startswith('https://')):
+        return jsonify({"error": "Invalid URL"}), 400
+    
+    code = secrets.token_urlsafe(6)[:6]
+    storage[code] = url
+    return jsonify({"code": code, "short_url": f"http://127.0.0.1:8080/{code}"}), 201
+
+if __name__ == '__main__':
+    app.run(port=8080)
+```
+
+- [ ] **Step 3: Run tests to verify**
+
+Run: `python3 -m pytest tests.py -v`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add app.py tests.py
+git commit -m "feat: implement shorten logic"
+```
+
+### Task 3: Implement Redirect (GET /<code>)
+
+**Files:**
+- Modify: `app.py`
+- Modify: `tests.py`
+
+- [ ] **Step 1: Add redirect test**
+
+```python
+def test_redirect(client):
+    # Setup
+    client.post('/shorten', json={"url": "https://google.com"})
+    # Get code
+    res = client.get('/list')
+    code = res.get_json()['entries'][0]['code']
+    
+    # Test
+    response = client.get(f'/{code}')
+    assert response.status_code == 302
+    assert response.location == "https://google.com"
+
+def test_redirect_not_found(client):
+    response = client.get('/nonexistent')
+    assert response.status_code == 404
+```
+
+- [ ] **Step 2: Add redirect logic**
+
+```python
+@app.route('/<code_>', methods=['GET'])
+def redirect_to(code_):
+    url = storage.get(code_)
+    if not url:
+        return jsonify({"error": "Not found"}), 404
+    from flask import redirect
+    return redirect(url)
+```
+
+- [ ] **Step 3: Run tests to verify**
+
+Run: `python3 -m pytest tests.py -v`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add app.py tests.py
+git commit -m "feat: implement redirect"
+```
+
+### Task 4: Implement Delete and List
+
+**Files:**
+- Modify: `app.py`
+- Modify: `tests.py`
+
+- [ ] **Step 1: Add delete and list tests**
+
+```python
+def test_list(client):
+    client.post('/shorten', json={"url": "https://a.com"})
+    response = client.get('/list')
+    assert response.status_code == 200
+    assert len(response.get_json()['entries']) >= 1
+
+def test_delete(client):
+    # Setup
+    client.post('/shorten', json={"url": "https://b.com"})
+    res = client.get('/list')
+    code = res.get_json()['entries'][0]['code']
+    
+    # Test
+    response = client.delete(f'/{code}')
+    assert response.status_code == 204
+    
+    # Verify deletion
+    assert client.get(f'/{code}').status_code == 404
+```
+
+- [ ] **Step 2: Add delete and list logic**
+
+```python
+@app.route('/<code_>', methods=['DELETE'])
+def delete(code_):
+    if code_ not in storage:
+        return jsonify({"error": "Not found"}), 404
+    del storage[code_]
+    return '', 204
+
+@app.route('/list', methods=['GET'])
+def list_entries():
+    entries = [{"code": k, "url": v} for k, v in storage.items()]
+    return jsonify({"entries": entries})
+```
+
+- [ ] **Step 3: Run tests to verify**
+
+Run: `python3 -m pytest tests.py -v`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add app.py tests.py
+git commit -m "feat: implement delete and list"
+```
diff --git a/experiments/31flashlite-amb-b/docs/superpowers/specs/2026-05-05-url-shortener-design.md b/experiments/31flashlite-amb-b/docs/superpowers/specs/2026-05-05-url-shortener-design.md
new file mode 100644
index 0000000..78b9eb9
--- /dev/null
+++ b/experiments/31flashlite-amb-b/docs/superpowers/specs/2026-05-05-url-shortener-design.md
@@ -0,0 +1,29 @@
+# Design: URL Shortener REST API
+
+## Architecture
+- Framework: Flask
+- Storage: Python `dict` in-memory.
+- Random Code Generation: `secrets.token_urlsafe(6)` truncated to 6 chars, for unguessable URL-safe codes (may include `-` and `_`).
+
+## Endpoints
+- `POST /shorten`:
+    - Input: `{"url": "..."}`
+    - Logic: Validate URL format (must start with `http://` or `https://`). Generate 6-char code. Store mapping `code -> url`.
+    - Returns: `{"code": "...", "short_url": "..."}`
+- `GET /<code>`:
+    - Logic: Look up code. If found, 302 redirect to original URL. If not found, 404.
+- `DELETE /<code>`:
+    - Logic: Look up code. If found, delete. Returns 204. If not found, 404.
+- `GET /list`:
+    - Logic: Iterate dict.
+    - Returns: `{"entries": [{"code": "...", "url": "..."}, ...]}`
+
+## Storage
+- `storage = {}` (global/module-level dict)
+
+## Error Handling
+- Invalid URL (POST): 400
+- Not found (GET/DELETE): 404
+
+## Testing
+- Using `pytest` with `Flask`'s `test_client`.
diff --git a/experiments/31flashlite-amb-b/requirements.txt b/experiments/31flashlite-amb-b/requirements.txt
new file mode 100644
index 0000000..4649a3c
--- /dev/null
+++ b/experiments/31flashlite-amb-b/requirements.txt
@@ -0,0 +1,2 @@
+flask
+pytest
diff --git a/experiments/31flashlite-amb-b/tests.py b/experiments/31flashlite-amb-b/tests.py
new file mode 100644
index 0000000..773ca81
--- /dev/null
+++ b/experiments/31flashlite-amb-b/tests.py
@@ -0,0 +1,32 @@
+import pytest
+from app import app
+
+@pytest.fixture
+def client():
+    return app.test_client()
+
+def test_shorten_flow(client):
+    # Shorten
+    res = client.post('/shorten', json={"url": "https://google.com"})
+    assert res.status_code == 201
+    code = res.get_json()['code']
+    
+    # List
+    res = client.get('/list')
+    assert len(res.get_json()['entries']) == 1
+    
+    # Redirect
+    res = client.get(f'/{code}')
+    assert res.status_code == 302
+    
+    # Delete
+    res = client.delete(f'/{code}')
+    assert res.status_code == 204
+    
+    # Verify gone
+    res = client.get(f'/{code}')
+    assert res.status_code == 404
+
+def test_invalid_url(client):
+    res = client.post('/shorten', json={"url": "ftp://bad.com"})
+    assert res.status_code == 400
diff --git a/experiments/README.md b/experiments/README.md
index 39be74a..484419f 100644
--- a/experiments/README.md
+++ b/experiments/README.md
@@ -3,65 +3,228 @@
 Empirical validation runs for the [Aegis](../README.md) judgment-free
 LLM-agent fact layer. Nine rounds of paired A/B comparisons across
 Anthropic Haiku / Sonnet, OpenAI GPT-5 / Codex / GPT-5.4-mini, and
-Google Gemini 2.5/3 Flash family on four task shapes:
+Google Gemini 2.5/3 Flash family on four task shapes.
 
-- **Plan A** — ambiguous-spec greenfield (URL shortener)
-- **Plan B** — brownfield with planted SEC bugs (Python `auth.py`)
-- **Plan C** — multi-module refactor (notifications feature)
-- **Round 9** — Go + Java brownfield (multi-language SEC dispatch validation)
+Each round runs the **same task twice with the same model**: variant
+**A** without the Aegis MCP server, variant **B** with it.
 
-Each round runs the same task twice with the same model: variant **A**
-without the Aegis MCP server, variant **B** with it. The deliverables
-get committed alongside the agent's `run.log` (where available) so the
-behaviour is reproducible.
+---
+
+## Overview chart
+
+```
+Total round directories: 52  (26 paired × 2 variants)
+
+Rounds by task shape:
+                  ┌──────────────────────────────┐
+  Plan A          │██████████████ 14 dirs (7 pairs)
+  ambiguous spec  │
+                  ├──────────────────────────────┤
+  Plan B          │████████████████ 16 dirs (8 pairs)
+  brownfield      │
+                  ├──────────────────────────────┤
+  Plan C          │████████████████ 16 dirs (8 pairs)
+  multi-module    │
+                  ├──────────────────────────────┤
+  Initial         │██████ 6 dirs (3 pairs)
+  (Round 1-2)     │
+                  └──────────────────────────────┘
+
+Models tested: 11
+  Haiku · Sonnet · GPT-5.2 · GPT-5.3-codex · GPT-5.4 (codex)
+  GPT-5.4-mini · Gemini 2.5 Flash · 2.5 Flash-Lite
+  3 Flash · 3.1 Flash-Lite · (Gemma family — skipped due to API errors)
+```
+
+---
+
+## Chart 1 — Plan B brownfield: 0/3 → 3/3 fix rate
+
+The headline result. Each starting `auth.py` has **3 planted SEC bugs**:
+`md5` password hash, timing-unsafe `==` compare, weak RNG for session
+token. Cells show **bugs remaining** out of 3 (lower = better, 0 = all
+fixed). The right column shows how many bugs Aegis flagged during the
+run; the agent often fixed those, and sometimes others as well.
+
+```
+Model              │ A (no aegis)     │ B (with aegis)   │ Δ      │ aegis hits
+                   │ bugs left  /3    │ bugs left  /3    │        │
+───────────────────┼──────────────────┼──────────────────┼────────┼────────
+Haiku              │ 1 ▓░░            │ 1 ▓░░            │  0     │ 1*
+Sonnet             │ 3 ▓▓▓            │ 1 ▓░░            │ +2     │ 3
+Gemini 2.5 Flash   │ 3 ▓▓▓            │ 3 ▓▓▓            │  0     │ 2*
+GPT-5.2            │ 3 ▓▓▓            │ 0 ░░░            │ +3     │ 3
+GPT-5.3-codex      │ 3 ▓▓▓            │ 0 ░░░            │ +3     │ 3
+GPT-5.4-mini       │ 3 ▓▓▓            │ 0 ░░░            │ +3     │ 3
+GPT-5.4-mini Go    │ 3 ▓▓▓            │ 2 ▓▓░            │ +1     │ 2** (md5 missed)
+GPT-5.4-mini Java  │ 2 ▓▓░            │ 0 ░░░            │ +2     │ 1** (only SEC012 fired)
+                   │                  │                  │        │
+───────────────────┴──────────────────┴──────────────────┴────────┴────────
+
+  *  = Plan B was a re-run after rules were tightened — original Haiku
+       run was on a slimmer rule set.
+  ** = Aegis ran with a coverage gap (SEC009 multi-language, SEC010
+       Java enclosing-context), since fixed in PR #12 and PR #11 respectively.
+```
+
+**Two of the most striking observations:**
+
+1. **GPT-5.4-mini Java** — Aegis only flagged 1 out of 3 bugs (SEC012
+   timing-unsafe), but the agent fixed all 3. The remaining md5 →
+   SHA-256 and `new Random()` → `SecureRandom` were **self-driven**
+   once Aegis put the agent into security-review mode. (The "one
+   finding triggers the cascade" mechanism.)
+2. **GPT-5.2** — Went well beyond the prompt: replaced md5 with
+   OWASP-recommended **PBKDF2-HMAC-SHA256 + 16-byte salt + 210k
+   iterations** instead of plain SHA-256. Aegis only said "md5 is
+   weak"; the agent decided what to escalate to.
+
+---
+
+## Chart 2 — Plan C multi-module: anti-paralysis ritual
+
+Plan C (add a `notifications` feature to a 5-module Python project)
+has **clean starting code** — no planted bugs. Aegis can't show off
+its rule library here, but a third ROI mechanism surfaced anyway.
+
+```
+Model                  │ A (no aegis)       │ B (with aegis)
+                       │                    │
+codex (GPT-5.4)        │ ✓ 5 tests pass     │ ✓ 5 tests pass
+Gemini 2.5 Flash       │ ✓ 5 tests pass     │ ✓ 5 tests pass
+Gemini 2.5 Flash-Lite  │ ✓ 4 tests pass     │ ✓ 4 tests pass
+Gemini 3.1 Flash-Lite  │ ✗ task abandoned   │ ✗ task abandoned   ← preview-mode planning loop, both stuck
+Gemini 3 Flash         │ ✓ 5 tests pass     │ ✓ 5 tests pass
+GPT-5.2                │ ✓ 5 tests pass     │ ✓ 5 tests pass
+GPT-5.3-codex          │ ✓ 4 tests pass     │ ✓ 4 tests pass
+GPT-5.4-mini           │ ✗ task abandoned   │ ✓ 5 tests pass     ← THIS PAIR is the finding
+                       │ (no notifications.py    
+                       │  no tests.py;          
+                       │  24k tokens spent      
+                       │  on design proposals)  
+                       
+Cycle introductions   0 in 16/16 runs  (clean architecture is self-stabilizing)
+Public symbol breaks  0 in 16/16 runs  (no agent removed an existing public name)
+```
+
+**Key data point**: **GPT-5.4-mini's A variant abandoned the task**
+— spent 24,051 tokens describing two design alternatives and asking
+"approve this design?" without ever writing code. The B variant of
+the **same model** completed the task because the prompt's `REQUIRED
+workflow: run aegis_validate.py after every .py file you write` made
+file-writing mandatory. Even though Aegis surfaced 0 security
+findings on the (clean) starting code, the **ritual itself** kept
+the weak model action-oriented.
+
+---
+
+## Chart 3 — The three Aegis ROI mechanisms
+
+Discovered empirically across the 9 rounds. Mechanisms 1 and 2 were
+designed in; mechanism 3 emerged from the data.
+
+```mermaid
+flowchart TD
+    Trigger[Agent edits a file] --> MCP[validate_file MCP call]
+    MCP --> Findings{Findings emitted?}
+
+    Findings -->|Security: SEC009/010/012/etc.| ROI1[Mechanism 1: rule-hit → fix<br/>Plan B: 0/3 → 3/3 across 3 models]
+    Findings -->|Workspace: cycle / symbol_removed| ROI2[Mechanism 2: structural guardrail<br/>0/14 hits — dead weight on clean code<br/>but would catch real cycles]
+    Findings -->|nothing — empty result| ROI3[Mechanism 3: anti-paralysis ritual<br/>Forced write-then-validate cycle<br/>prevents weak-model planning loops]
+
+    ROI1 --> Cascade[Often triggers cascade:<br/>1 finding → agent rewrites whole file<br/>g52: md5 → PBKDF2+salt+210k iter]
+    ROI3 --> Saved[Plan C: g54mini-mc-a abandoned<br/>g54mini-mc-b completed same task]
+
+    style ROI1 fill:#d4edda,stroke:#155724,color:#000
+    style ROI2 fill:#fff3cd,stroke:#856404,color:#000
+    style ROI3 fill:#d1ecf1,stroke:#0c5460,color:#000
+```
+
+| Mechanism | Trigger | Evidence | Designed in? |
+|:---:|---|---|:---:|
+| **1. Rule-hit → fix** | brownfield + planted SEC bug | Plan B: 3 models went from 0/3 to 3/3 bugs fixed | ✅ |
+| **2. Structural guardrail** | cycle / public_symbol_removed | 0/14 hits (clean code = silent) | ✅ |
+| **3. Anti-paralysis ritual** | weak model + any task | Plan C g54mini A abandoned vs B completed | ❌ emergent |
+
+---
+
+## Chart 4 — Direct lineage: experiment finding → Aegis code change
+
+Every recent SEC PR has a specific experiment trigger. The
+dogfooding loop in action:
+
+```
+Round 8 codex Plan A
+     │
+     ▼ FP discovered: SEC010 fires on `secrets.choice` (the SECURE choice)
+     │ — agent spent a turn "fixing" already-secure code
+     │
+     ▼─────────────────────────►  PR #9: secrets./os.urandom/crypto. allowlist
+                                  +2 regression tests
+
+
+Round 9 Go brownfield
+     │
+     ▼ FN discovered: SEC009 doesn't fire on Go `md5.Sum(...)`
+     │ — agent kept md5 because aegis didn't surface it
+     │
+     ▼─────────────────────────►  PR #12: SEC009 language-aware dispatch
+                                  +8 multi-language tests
+                                  + enclosing_security_context
+                                    function-name check
+
+
+Round 9 Java brownfield
+     │
+     ▼ FN discovered: SEC010 inner-block `break` hides
+     │ `int idx = new Random().nextInt(...)` inside
+     │ `generateSessionToken()`
+     │
+     ▼─────────────────────────►  PR #11: enclosing_token_context
+                                  walks past inner blocks +
+                                  reads function name
+                                  +3 regression tests
+```
+
+| Round | Discovered | Fixed in |
+|---|---|---|
+| Round 8 codex Plan A | SEC010 false-positive on `secrets.choice` | PR #9 |
+| Round 9 Go / Java | SEC009 multi-language coverage = 0 | PR #12 |
+| Round 9 Java | SEC010 inner-block `break` hides production case | PR #11 |
+| Plan A 32+ runs | SEC010 needles too narrow (URL shorteners) | PR #6 |
+| Plan A entropy bypass | SEC002 misses placeholder-shaped strings | PR #6 |
+| Plan B 6 runs | "What aegis is NOT" missing in README | PR #6 |
+
+PRs #6 through #12 (the post-experiment SEC coverage and B-class rule
+batches) all trace back to specific findings in this archive.
+
+---
 
 ## Files
 
 - [`comparison-report.md`](comparison-report.md) — the 1199-line
-  rolling analysis. Round 1 → Round 9 commentary, including the
-  three Aegis ROI mechanisms surfaced from the data:
-  1. **Rule-hit → fix** (brownfield Plan B: 0/3 → 3/3 across 3 models)
-  2. **Structural guardrail** (cycle / public_symbol_removed —
-     dead weight on clean architectures, 0/14 hits)
-  3. **Anti-paralysis ritual** (weak models complete tasks they
-     would otherwise abandon; Round 8 Plan C surfaced this)
+  rolling Round 1 → Round 9 analysis
 - `starting-code/` — Plan B Python brownfield fixture (3 planted SEC bugs)
 - `starting-go/` — Round 9 Go brownfield fixture
 - `starting-java/` — Round 9 Java brownfield fixture
 - `starting-multi/` — Plan C 5-module fixture
 - `prompt-*.txt` — the prompts handed to each agent. `-a.txt` is the
   no-aegis variant; `-b.txt` adds the `REQUIRED workflow: run
-  aegis_validate.py after every write` ritual instruction.
+  aegis_validate.py after every write` ritual instruction
 - `aegis_validate.py` — Python wrapper around `aegis-mcp` stdio
-  JSON-RPC. The agents run it after every file write.
-- `eval_round_*.sh` — analysis scripts that compare each round's
-  before/after state against the planted bugs.
+  JSON-RPC. The agents run it after every file write
+- `eval_round_*.sh` — analysis scripts
 - `<model>-<task>-<variant>/` — one directory per agent run.
-  Contains the agent's deliverables plus `run.log` for codex-driven
-  rounds. Naming convention:
+  Naming convention:
   - models: `haiku` / `sonnet` / `flash` (Gemini 2.5) /
     `25flash` / `25fl` (Gemini 2.5 Flash-Lite) / `3flash` /
-    `31flashlite` / `codex` (GPT-5.4) / `g52` (GPT-5.2) /
-    `g53codex` (GPT-5.3-codex) / `g54mini` (GPT-5.4-mini)
-  - tasks: `amb` (Plan A) / `bf` (Plan B) / `mc` (Plan C
-    multi-module) / `bf-go` / `bf-java` (Round 9)
+    `31flashlite` (Gemini 3.1 Flash-Lite Preview) / `codex`
+    (GPT-5.4) / `g52` (GPT-5.2) / `g53codex` (GPT-5.3-codex) /
+    `g54mini` (GPT-5.4-mini)
+  - tasks: `amb` (Plan A) / `bf` (Plan B Python) / `mc` (Plan C
+    multi-module) / `bf-go` (Round 9 Go) / `bf-java` (Round 9 Java)
   - variants: `a` (no Aegis) / `b` (with Aegis MCP)
 
-## Findings that drove rule changes back into Aegis
-
-The dogfooding loop: every round caught at least one Aegis
-false-positive or false-negative that became a code change.
-
-| Round | Discovered | Fixed in |
-|---|---|---|
-| Round 8 codex | SEC010 false-positive on `secrets.choice` | aegis PR #9 |
-| Round 9 Go/Java | SEC009 multi-language coverage = 0 | aegis PR #12 |
-| Round 9 Java | SEC010 inner-block `break` hides production case | aegis PR #11 |
-
-PR #6 — #12 (the post-experiment SEC coverage and B-class rule
-batches) all traced back to specific false-positives or false-
-negatives surfaced in this archive.
-
 ## Reproducing a run
 
 ```bash