Add prompt-scenario coverage check to code-testing-generator gate by Evangelink · Pull Request #789 · dotnet/skills

Evangelink · 2026-06-18T13:56:27Z

Motivation

While comparing two MSBench sweatlas-tw-unit runs that used the same model (claude-opus-4.6) and the same dotnet-test code-testing-* agents — differing only in the CLI harness — I traced every instance where one run passed and the other failed. The failures clustered into two patterns that the current pre-completion gate does not explicitly catch, even though the agent enumerated the scenarios from the prompt:

- Tested an adjacent function instead of the one named in the objective. E.g. the objective asked for in6_mactoifaceid, but the generated tests targeted the sibling in6_mac_eui64; another asked for the rewrap line-buffer behavior, but the tests landed in a neighboring module. The requested behavior was left uncovered (rubric/mutation failure).
- Covered a single representative case where the wording implied a range. E.g. "rewrapping when dimensions stay the same or change in height" was tested with one height change only; an objective requiring an octal escape as the first character after a prefix was tested with the escape appearing somewhere in the middle.

The existing gate items (test-gap-analysis pseudo-mutation + assertion-quality) verify that assertions are strong, but not that the generated tests cover the exact scenarios the prompt asked for. These are complementary.

Change

Adds a third item to the Step 7 pre-completion gate in code-testing-generator.agent.md: a prompt-scenario coverage check that, when the prompt enumerates behaviors/scenarios, requires the agent to:

- Target the exact function/feature named in the objective (preferring the canonical existing test file over a new, narrower one) rather than a look-alike sibling.
- Cover the full range each scenario's wording implies (e.g. "same or changed", "wider or narrower") instead of a single representative case.
- Honor positional/structural qualifiers literally (e.g. "first character after the prefix", "filename containing a literal space").
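The gate item itself is a self-review instruction in the agent prompt, not code. As a rough illustration only, the bookkeeping it asks for can be modeled like this: each enumerated scenario records the exact target and every variant its wording implies, and a variant counts as covered only when some test exercises that exact target and that exact variant. The `Scenario`/`GeneratedTest` types and the `coverage_gaps` helper are hypothetical; the example inputs reuse the failure cases from the Motivation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A behavior the prompt enumerates (hypothetical model, not the agent's format)."""
    target: str                # exact function/feature named in the objective
    variants: tuple[str, ...]  # every case the wording implies, incl. positional qualifiers


@dataclass(frozen=True)
class GeneratedTest:
    """A generated test, described by what it actually exercises."""
    name: str
    target: str   # function/feature the test calls
    variant: str  # which implied case it covers


def coverage_gaps(scenarios, tests):
    """Return (target, variant) pairs that no generated test covers.

    A test on a look-alike sibling does not count, and one representative
    case does not stand in for the rest of a range.
    """
    covered = {(t.target, t.variant) for t in tests}
    return [
        (s.target, v)
        for s in scenarios
        for v in s.variants
        if (s.target, v) not in covered
    ]


scenarios = [
    Scenario("in6_mactoifaceid", ("maps MAC to interface ID",)),
    Scenario("rewrap", ("same dimensions", "height changes")),
    Scenario("octal escape", ("escape is first char after prefix",)),
]
tests = [
    # Exercises the sibling, not the named function.
    GeneratedTest("test_eui64", "in6_mac_eui64", "maps MAC to interface ID"),
    # Covers only one variant of the implied range.
    GeneratedTest("test_rewrap_taller", "rewrap", "height changes"),
    # Ignores the positional qualifier.
    GeneratedTest("test_escape_mid", "octal escape", "escape mid-string"),
]

for target, variant in coverage_gaps(scenarios, tests):
    print(f"uncovered: {target} / {variant}")
```

Each of the three inputs reproduces one failure mode, so all three scenarios are reported as uncovered; adding tests for the named target and the missing variants would empty the list.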

Docs-only change to one agent file (+5 lines). markdownlint-cli2 passes with 0 errors; file is well under the 30,000-char agent-prompt limit.

The pre-completion gate already verifies assertion strength (pseudo-mutation and assertion-depth checks), but two recurring failure modes still slip through when the prompt enumerates specific behaviors:

- Testing an *adjacent* function/helper instead of the exact feature named in the objective, leaving the requested behavior uncovered.
- Covering only a single representative case when the scenario wording implies multiple variations or pins a condition to a specific position or structure.

Add a third gate item that maps each enumerated scenario to a dedicated test, requires targeting the exact named function (preferring the canonical existing test file), and requires honoring range/positional qualifiers literally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Updates the dotnet-test code-testing generator agent documentation to add an explicit “prompt-scenario coverage” item to the Step 7 pre-completion gate, aiming to ensure generated tests cover the exact behaviors/scenarios enumerated by the prompt (not adjacent/sibling functionality or only a single representative case).

Changes:

Add a third Step 7 pre-completion gate item: prompt-scenario coverage check.
Document guidance to (1) target the exact named function/feature, (2) cover implied scenario ranges, and (3) honor positional/structural qualifiers literally.

File	Description
plugins/dotnet-test/agents/code-testing-generator.agent.md	Adds a new Step 7 gate item describing prompt-scenario coverage expectations for generated tests.

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 2

- Remove benchmark-specific symbol names from the target-the-named-function bullet to avoid overfitting; phrase it generically.
- Fix the gate intro that said 'The two skills below' now that there are three numbered items (the third is a prompt self-review, not a skill).
- Update Step 8 and Rule 11 so re-running the gate includes the new prompt-scenario coverage check, not just test-gap-analysis + assertion-quality.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evangelink · 2026-06-18T14:18:46Z

/evaluate

github-actions · 2026-06-18T14:19:20Z

⏭️ No skills to evaluate: no changed skills with tests were found in this PR.

Copilot AI review requested due to automatic review settings June 18, 2026 13:56

Copilot started reviewing on behalf of Evangelink June 18, 2026 13:57

Copilot AI reviewed Jun 18, 2026

Comment thread plugins/dotnet-test/agents/code-testing-generator.agent.md

Comment thread plugins/dotnet-test/agents/code-testing-generator.agent.md

Evangelink enabled auto-merge (squash) June 18, 2026 14:18

YuliiaKovalova approved these changes Jun 18, 2026

Evangelink merged commit 5d717db into dotnet:main Jun 18, 2026
37 checks passed

Evangelink mentioned this pull request Jun 19, 2026

Direct strategy must still run the Step 7 pre-completion gate #793 (open)

Evangelink merged 2 commits into dotnet:main from Evangelink:evangelink/strengthen-codegen-scenario-coverage

3 participants