From 4ce927982897ed253771b7c46ccea999fb37ad62 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Amaury=20Lev=C3=A9?=
Date: Fri, 19 Jun 2026 12:02:29 +0200
Subject: [PATCH 1/2] Direct strategy must still run the Step 7 pre-completion gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Direct strategy correctly skips the research/plan/implement
sub-agents for small single-file tasks, but the wording let agents also
skip the Step 7 pre-completion gate (test-gap-analysis +
assertion-quality + scenario coverage) — treating a single-file task
that enumerates specific behaviors as 'trivially small'.

This is the dominant failure mode observed on behavior-enumerating
tasks: the agent writes one test file directly and finishes with no
assertion-strength or scenario-coverage check, producing weak assertions
(mutation survivors) and missing required edge/negative cases.

Clarify in both the generator Step 2 strategy table and the
code-testing-agent SKILL.md that Direct trades away only the sub-agents,
never the gate, and that a request naming a specific symbol or
enumerating scenarios is not 'trivially small' and must run the gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 plugins/dotnet-test/agents/code-testing-generator.agent.md | 6 +++---
 plugins/dotnet-test/skills/code-testing-agent/SKILL.md     | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/plugins/dotnet-test/agents/code-testing-generator.agent.md b/plugins/dotnet-test/agents/code-testing-generator.agent.md
index 48caba90d3..e4565d19ee 100644
--- a/plugins/dotnet-test/agents/code-testing-generator.agent.md
+++ b/plugins/dotnet-test/agents/code-testing-generator.agent.md
@@ -35,11 +35,11 @@ Based on the request scope, pick exactly one strategy and follow it:
 
 | Strategy | When to use | What to do |
 | ---------- | ------------- | ------------ |
-| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting. |
+| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting — **Direct skips only the sub-agents, never the Step 7 pre-completion gate** (run it whenever the request names a specific symbol or enumerates behaviors to verify; such a request is not "trivially small"). |
 | **Single pass** | A moderate scope (couple projects or modules) that a single Research → Plan → Implement cycle can cover | Execute Steps 3-8 once, then proceed to Step 9. |
 | **Iterative** | A large scope or ambitious coverage target that one pass cannot satisfy | Execute Steps 3-8, then re-evaluate coverage. If the target is not met, repeat Steps 3-8 with a narrowed focus on remaining gaps. Use unique names for each iteration's `.testagent/` documents (e.g., `research-2.md`, `plan-2.md`) so earlier results are not overwritten. Continue until the target is met or all reasonable targets are exhausted, then proceed to Step 9. |
 
-**Default to Direct** unless the request explicitly mentions multiple files, modules, or an entire project. Most test generation requests — including "generate tests for function X", "add tests covering these scenarios", and "write unit tests for this class" — should use Direct strategy. The full Research → Plan → Implement pipeline is only needed when the scope spans multiple unrelated source files.
+**Default to Direct** unless the request explicitly mentions multiple files, modules, or an entire project. Most test generation requests — including "generate tests for function X", "add tests covering these scenarios", and "write unit tests for this class" — should use Direct strategy. The full Research → Plan → Implement pipeline is only needed when the scope spans multiple unrelated source files. **Choosing Direct trades away only the sub-agent pipeline (Steps 3-5); it never trades away the Step 7 pre-completion gate.** When a request enumerates specific behaviors/scenarios (e.g., "add 1 test for each of these scenarios"), treat that list as the spec: target the exact symbol named, cover every enumerated scenario, and run the Step 7 gate before reporting completion.
 
 **Strategy decision examples:**
 
@@ -52,7 +52,7 @@ Based on the request scope, pick exactly one strategy and follow it:
 | "Generate comprehensive tests for my ASP.NET app" | Single pass | If the app has fewer than 10 controllers/services/files in scope, one R→P→I cycle should cover it |
 | "Generate comprehensive tests for my large ASP.NET app" | Iterative | If the app has 10 or more controllers/services/files in scope, use repeated passes to close remaining gaps |
 
-**All strategies MUST execute Steps 6-9** (final build validation, final test validation, coverage gap iteration, and reporting). These steps are never skipped.
+**All strategies MUST execute Steps 6-9** (final build validation, final test validation, coverage gap iteration, and reporting), and the Step 7 pre-completion gate within them. These steps are never skipped — including for Direct.
 
 ### Step 3: Research Phase
 
diff --git a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
index eb85bbf6ad..3dd8b82450 100644
--- a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
+++ b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
@@ -151,7 +151,7 @@ The generator picks a strategy based on request scope:
 
 | User Request | Strategy | Why |
 |---|---|---|
-| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents |
+| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents (but still run the pre-completion gate before finishing) |
 | "Add unit tests for my billing project" | **Single pass** | Moderate scope — one Research → Plan → Implement cycle covers it |
 | "Achieve 80% coverage across the entire solution" | **Iterative** | Large scope — multiple R→P→I cycles, each narrowing remaining gaps |
 

From c6c3a1dc1e88260d7e32150bf761a428bb14f74e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Amaury=20Lev=C3=A9?=
Date: Fri, 19 Jun 2026 13:38:33 +0200
Subject: [PATCH 2/2] Address review: align Direct gate trigger with Step 7 threshold; clarify gate in SKILL.md

- Step 2 Direct cell no longer introduces a separate 'names a specific
  symbol' gate trigger that contradicted Step 7. It now defers to Step 7's
  own threshold (>=5 tests, or any enumerated behaviors/scenarios).
- SKILL.md now names what/where the gate is: the generator's Step 7
  (test-gap-analysis + assertion-quality).

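For reviewers, the gate trigger described above can be sketched as a small predicate. This is only an illustration under stated assumptions: `Request`, its fields, `GATE_MIN_TESTS`, and `must_run_step7_gate` are hypothetical names that do not exist in the plugin, and the agent applies the rule from prose instructions rather than from code.

```python
from dataclasses import dataclass

# Hypothetical constant: mirrors the ">=5 tests" threshold quoted above.
GATE_MIN_TESTS = 5


@dataclass(frozen=True)
class Request:
    """Illustrative summary of a test-generation request (assumed fields)."""

    tests_added: int            # number of tests the agent wrote
    enumerates_scenarios: bool  # request lists behaviors/scenarios to verify


def must_run_step7_gate(request: Request) -> bool:
    """Return True when the Step 7 pre-completion gate should run.

    The chosen strategy (Direct, Single pass, Iterative) is deliberately not
    an input: Direct skips only the sub-agents, never the gate.
    """
    return request.tests_added >= GATE_MIN_TESTS or request.enumerates_scenarios
```

Under this sketch a Direct request that enumerates scenarios runs the gate even when it adds a single test, while a two-test addition with no enumerated scenarios falls below the threshold.
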
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 plugins/dotnet-test/agents/code-testing-generator.agent.md | 2 +-
 plugins/dotnet-test/skills/code-testing-agent/SKILL.md     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/plugins/dotnet-test/agents/code-testing-generator.agent.md b/plugins/dotnet-test/agents/code-testing-generator.agent.md
index e4565d19ee..4a25abe31a 100644
--- a/plugins/dotnet-test/agents/code-testing-generator.agent.md
+++ b/plugins/dotnet-test/agents/code-testing-generator.agent.md
@@ -35,7 +35,7 @@ Based on the request scope, pick exactly one strategy and follow it:
 
 | Strategy | When to use | What to do |
 | ---------- | ------------- | ------------ |
-| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting — **Direct skips only the sub-agents, never the Step 7 pre-completion gate** (run it whenever the request names a specific symbol or enumerates behaviors to verify; such a request is not "trivially small"). |
+| **Direct** | A small, self-contained request (e.g., tests for a single function or class) that you can complete without sub-agents | Follow the codebase conventions on test file structure, naming, style, and testing approaches. Reuse existing test projects and test files when possible — if the code under test already has tests, add new tests to the same file or test project. Only create a new test file when no canonical file is named or discoverable for the symbol under test. Write the tests immediately. **Run them right away** — if any test fails, read the production code, fix the assertion, and re-run before writing more tests. Skip Steps 3-5 (research, plan, implement sub-agents). Then proceed to Steps 6-9 for validation and reporting — **Direct skips only the sub-agents, never the Step 7 pre-completion gate** (which still runs per its own threshold in Step 7 — i.e. for any non-trivial addition: ≥5 tests, or any request that enumerates behaviors/scenarios to verify). |
 | **Single pass** | A moderate scope (couple projects or modules) that a single Research → Plan → Implement cycle can cover | Execute Steps 3-8 once, then proceed to Step 9. |
 | **Iterative** | A large scope or ambitious coverage target that one pass cannot satisfy | Execute Steps 3-8, then re-evaluate coverage. If the target is not met, repeat Steps 3-8 with a narrowed focus on remaining gaps. Use unique names for each iteration's `.testagent/` documents (e.g., `research-2.md`, `plan-2.md`) so earlier results are not overwritten. Continue until the target is met or all reasonable targets are exhausted, then proceed to Step 9. |
 
diff --git a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
index 3dd8b82450..487ecacde9 100644
--- a/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
+++ b/plugins/dotnet-test/skills/code-testing-agent/SKILL.md
@@ -151,7 +151,7 @@ The generator picks a strategy based on request scope:
 
 | User Request | Strategy | Why |
 |---|---|---|
-| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents (but still run the pre-completion gate before finishing) |
+| "Generate tests for `src/services/UserService.ts`" | **Direct** | Single file, small scope — write tests immediately, skip sub-agents (but still run the generator's Step 7 pre-completion gate — `test-gap-analysis` + `assertion-quality` — before finishing) |
 | "Add unit tests for my billing project" | **Single pass** | Moderate scope — one Research → Plan → Implement cycle covers it |
 | "Achieve 80% coverage across the entire solution" | **Iterative** | Large scope — multiple R→P→I cycles, each narrowing remaining gaps |