From 4ce927982897ed253771b7c46ccea999fb37ad62 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Amaury=20Lev=C3=A9?= <evangelink@users.noreply.github.com>
Date: Fri, 19 Jun 2026 12:02:29 +0200
Subject: [PATCH 1/2] Direct strategy must still run the Step 7 pre-completion
 gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Direct strategy correctly skips the research/plan/implement sub-agents
for small single-file tasks, but the wording let agents also skip the
Step 7 pre-completion gate (test-gap-analysis + assertion-quality +
scenario coverage) — treating a single-file task that enumerates specific
behaviors as 'trivially small'.

This is the dominant failure mode observed on behavior-enumerating tasks:
the agent writes one test file directly and finishes with no
assertion-strength or scenario-coverage check, producing weak assertions
(mutation survivors) and missing required edge/negative cases.

Clarify in the generator's Step 2 (the strategy table, the "Default to
Direct" guidance, and the "All strategies MUST execute Steps 6-9" rule)
and in the code-testing-agent SKILL.md that Direct trades away only the
sub-agents, never the gate, and that a request naming a specific symbol or
enumerating scenarios is not 'trivially small' and must run the gate.
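
The intended control flow per strategy can be sketched as follows. This is
a minimal illustration, not code from the plugin. `Strategy` and
`steps_for` are hypothetical names. The sketch assumes Steps 1-2 always run
first. It also models Iterative with a fixed `passes` count instead of
re-evaluating coverage after each pass.

```python
from enum import Enum


class Strategy(Enum):
    DIRECT = "direct"
    SINGLE_PASS = "single_pass"
    ITERATIVE = "iterative"


SUB_AGENT_STEPS = [3, 4, 5]   # research, plan, implement sub-agents
VALIDATION_STEPS = [6, 7, 8]  # Step 7 is the pre-completion gate
REPORT_STEP = 9


def steps_for(strategy: Strategy, passes: int = 1) -> list[int]:
    """Hypothetical helper: ordered steps a strategy executes."""
    if strategy is Strategy.DIRECT:
        # Direct drops only the sub-agents; validation and the gate stay.
        return [1, 2, *VALIDATION_STEPS, REPORT_STEP]
    if strategy is Strategy.SINGLE_PASS:
        # Single pass runs the Research -> Plan -> Implement cycle once.
        passes = 1
    cycle = SUB_AGENT_STEPS + VALIDATION_STEPS
    # Iterative repeats Steps 3-8; every strategy ends with reporting.
    return [1, 2, *(cycle * passes), REPORT_STEP]
```

The property this commit enforces is that `7 in steps_for(Strategy.DIRECT)`
holds while Steps 3-5 are absent.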

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 plugins/dotnet-test/agents/code-testing-generator.agent.md | 6 +++---
 plugins/dotnet-test/skills/code-testing-agent/SKILL.md     | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/plugins/dotnet-test/agents/code-testing-generator.agent.md b/plugins/dotnet-test/agents/code-testing-generator.agent.md
index 48caba90d3..e4565d19ee 100644
--- a/plugins/dotnet-test/agents/code-testing-generator.agent.md
+++ b/plugins/dotnet-test/agents/code-testing-generator.agent.md
@@ -35,11 +35,11 @@ Based on the request scope, pick exactly one strategy and follow it:
 
 | Strategy | When to use | What to do |
 | ---------- | ------------- | ------------ |
-| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting. |
+| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting — **Direct skips only the sub-agents, never the Step 7 pre-completion gate** (run it whenever the request names a specific symbol or enumerates behaviors to verify; such a request is not "trivially small"). |
 | **Single pass** | A moderate scope (couple projects or modules) that a single Research → Plan → Implement cycle can cover | Execute Steps 3-8 once, then proceed to Step 9. |
 | **Iterative** | A large scope or ambitious coverage target that one pass cannot satisfy | Execute Steps 3-8, then re-evaluate coverage. If the target is not met, repeat Steps 3-8 with a narrowed focus on remaining gaps. Use unique names for each iteration's `.testagent/` documents (e.g., `research-2.md`, `plan-2.md`) so earlier results are not overwritten. Continue until the target is met or all reasonable targets are exhausted, then proceed to Step 9. |
 
-**Default to Direct** unless the request explicitly mentions multiple files, modules, or an entire project. Most test generation requests — including "generate tests for function X", "add tests covering these scenarios", and "write unit tests for this class" — should use Direct strategy. The full Research → Plan → Implement pipeline is only needed when the scope spans multiple unrelated source files.
+**Default to Direct** unless the request explicitly mentions multiple files, modules, or an entire project. Most test generation requests — including "generate tests for function X", "add tests covering these scenarios", and "write unit tests for this class" — should use Direct strategy. The full Research → Plan → Implement pipeline is only needed when the scope spans multiple unrelated source files. **Choosing Direct trades away only the sub-agent pipeline (Steps 3-5); it never trades away the Step 7 pre-completion gate.** When a request enumerates specific behaviors/scenarios (e.g., "add 1 test for each of these scenarios"), treat that list as the spec: target the exact symbol named, cover every enumerated scenario, and run the Step 7 gate before reporting completion.
 
 **Strategy decision examples:**
 
@@ -52,7 +52,7 @@ Based on the request scope, pick exactly one strategy and follow it:
 | "Generate comprehensive tests for my ASP.NET app" | Single pass | If the app has fewer than 10 controllers/services/files in scope, one R→P→I cycle should cover it |
 | "Generate comprehensive tests for my large ASP.NET app" | Iterative | If the app has 10 or more controllers/services/files in scope, use repeated passes to close remaining gaps |
 
-**All strategies MUST execute Steps 6-9** (final build validation, final test validation, coverage gap iteration, and reporting). These steps are never skipped.
+**All strategies MUST execute Steps 6-9** (final build validation, final test validation, coverage gap iteration, and reporting), and the Step 7 pre-completion gate within them. These steps are never skipped — including for Direct.
 
 ### Step 3: Research Phase
 
diff --git a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
index eb85bbf6ad..3dd8b82450 100644
--- a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
+++ b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
@@ -151,7 +151,7 @@ The generator picks a strategy based on request scope:
 
 | User Request | Strategy | Why |
 |---|---|---|
-| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents |
+| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents (but still run the pre-completion gate before finishing) |
 | "Add unit tests for my billing project" | **Single pass** | Moderate scope — one Research → Plan → Implement cycle covers it |
 | "Achieve 80% coverage across the entire solution" | **Iterative** | Large scope — multiple R→P→I cycles, each narrowing remaining gaps |
 

From c6c3a1dc1e88260d7e32150bf761a428bb14f74e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Amaury=20Lev=C3=A9?= <evangelink@users.noreply.github.com>
Date: Fri, 19 Jun 2026 13:38:33 +0200
Subject: [PATCH 2/2] Address review: align Direct gate trigger with Step 7
 threshold; clarify gate in SKILL.md

- Step 2 Direct cell no longer introduces a separate 'names a specific
  symbol' gate trigger that contradicted Step 7. It now defers to Step 7's
  own threshold (>=5 tests, or any enumerated behaviors/scenarios).
- SKILL.md now names the gate and where it is defined: the generator's
  Step 7 (test-gap-analysis + assertion-quality).
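
The trigger in the revised Direct cell reads as a simple predicate. This is
a minimal sketch. It assumes the request has already been reduced to a
planned test count and a list of enumerated scenarios. `TestRequest` and
`must_run_gate` are hypothetical names. The agent itself reasons over
natural-language requests, and Step 7 may define conditions beyond the
paraphrase restated in the Direct cell.

```python
from dataclasses import dataclass, field

GATE_MIN_TESTS = 5  # paraphrased Step 7 threshold: >=5 tests is non-trivial


@dataclass
class TestRequest:
    """Hypothetical parsed view of a test-generation request."""
    planned_tests: int
    enumerated_scenarios: list[str] = field(default_factory=list)


def must_run_gate(request: TestRequest) -> bool:
    """True when the Step 7 gate (test-gap-analysis + assertion-quality) runs.

    Strategy is intentionally not an input: Direct uses the same threshold
    as every other strategy instead of a trigger of its own.
    """
    return (
        request.planned_tests >= GATE_MIN_TESTS
        or len(request.enumerated_scenarios) > 0
    )
```

Keeping strategy out of the signature prevents Direct from carrying a
separate trigger that drifts from Step 7's threshold.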

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 plugins/dotnet-test/agents/code-testing-generator.agent.md | 2 +-
 plugins/dotnet-test/skills/code-testing-agent/SKILL.md     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/plugins/dotnet-test/agents/code-testing-generator.agent.md b/plugins/dotnet-test/agents/code-testing-generator.agent.md
index e4565d19ee..4a25abe31a 100644
--- a/plugins/dotnet-test/agents/code-testing-generator.agent.md
+++ b/plugins/dotnet-test/agents/code-testing-generator.agent.md
@@ -35,7 +35,7 @@ Based on the request scope, pick exactly one strategy and follow it:
 
 | Strategy | When to use | What to do |
 | ---------- | ------------- | ------------ |
-| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting — **Direct skips only the sub-agents, never the Step 7 pre-completion gate** (run it whenever the request names a specific symbol or enumerates behaviors to verify; such a request is not "trivially small"). |
+| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting — **Direct skips only the sub-agents, never the Step 7 pre-completion gate** (which still runs per its own threshold in Step 7 — i.e. for any non-trivial addition: ≥5 tests, or any request that enumerates behaviors/scenarios to verify). |
 | **Single pass** | A moderate scope (couple projects or modules) that a single Research → Plan → Implement cycle can cover | Execute Steps 3-8 once, then proceed to Step 9. |
 | **Iterative** | A large scope or ambitious coverage target that one pass cannot satisfy | Execute Steps 3-8, then re-evaluate coverage. If the target is not met, repeat Steps 3-8 with a narrowed focus on remaining gaps. Use unique names for each iteration's `.testagent/` documents (e.g., `research-2.md`, `plan-2.md`) so earlier results are not overwritten. Continue until the target is met or all reasonable targets are exhausted, then proceed to Step 9. |
 
diff --git a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
index 3dd8b82450..487ecacde9 100644
--- a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
+++ b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
@@ -151,7 +151,7 @@ The generator picks a strategy based on request scope:
 
 | User Request | Strategy | Why |
 |---|---|---|
-| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents (but still run the pre-completion gate before finishing) |
+| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents (but still run the generator's Step 7 pre-completion gate — `test-gap-analysis` + `assertion-quality` — before finishing) |
 | "Add unit tests for my billing project" | **Single pass** | Moderate scope — one Research → Plan → Implement cycle covers it |
 | "Achieve 80% coverage across the entire solution" | **Iterative** | Large scope — multiple R→P→I cycles, each narrowing remaining gaps |