
Add prompt-scenario coverage check to code-testing-generator gate #789

Merged
Evangelink merged 2 commits into dotnet:main from Evangelink:evangelink/strengthen-codegen-scenario-coverage
Jun 18, 2026


@Evangelink (Member)

Motivation

While comparing two MSBench sweatlas-tw-unit runs that used the same model (claude-opus-4.6) and the same dotnet-test code-testing-* agents, differing only in the CLI harness, I traced every instance where one run passed and the other failed. The failures clustered into two patterns that the current pre-completion gate does not explicitly catch, even though the agent had enumerated the scenarios from the prompt:

  • Tested an adjacent function instead of the one named in the objective. E.g. the objective asked for in6_mactoifaceid, but the generated tests targeted the sibling in6_mac_eui64; another asked for the rewrap line-buffer behavior but tests landed in a neighboring module. The requested behavior was left uncovered (rubric/mutation failure).
  • Covered a single representative case where the wording implied a range. E.g. "rewrapping when dimensions stay the same or change in height" was tested with one height change only; an objective requiring an octal escape as the first character after a prefix was tested with the escape appearing somewhere in the middle.

The existing gate items (test-gap-analysis pseudo-mutation and assertion-quality) verify that assertions are strong, but not that the generated tests cover the exact scenarios the prompt asked for. The new check complements them.

Change

Adds a third item to the Step 7 pre-completion gate in code-testing-generator.agent.md: a prompt-scenario coverage check that, when the prompt enumerates behaviors/scenarios, requires the agent to:

  1. Target the exact function/feature named in the objective (preferring the canonical existing test file over a new, narrower one) rather than a look-alike sibling.
  2. Cover the full range each scenario's wording implies (e.g. "same or changed", "wider or narrower") instead of a single representative case.
  3. Honor positional/structural qualifiers literally (e.g. "first character after the prefix", "filename containing a literal space").
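The gate item itself is prose in the agent prompt, not code. Purely as an illustration, the three rules can be read as a bookkeeping check: each enumerated scenario records the exact function named in the objective plus every variant its wording implies (positional qualifiers such as "escape is the first character after the prefix" become variants too), and a test only counts toward a scenario if it exercises that exact function. Everything in this sketch is hypothetical: the `Scenario` and `GeneratedTest` types, `coverage_gaps`, and the `rewrap`/`rewrap_helper` names are invented for the example and do not come from the agent file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One behavior enumerated in the prompt (hypothetical model)."""
    description: str
    target: str               # exact function/feature named in the objective
    variants: frozenset[str]  # every case the wording implies


@dataclass(frozen=True)
class GeneratedTest:
    """One generated test (hypothetical model)."""
    name: str
    target: str   # function the test actually exercises
    variant: str  # which implied case it covers


def coverage_gaps(scenarios, tests):
    """Return {scenario description: sorted list of uncovered variants}.

    Tests that exercise a look-alike sibling do not count toward a
    scenario, even when their variant label matches (rule 1), and a
    scenario is only satisfied once every implied variant has a test
    (rules 2 and 3).
    """
    gaps = {}
    for s in scenarios:
        covered = {t.variant for t in tests if t.target == s.target}
        missing = s.variants - covered
        if missing:
            gaps[s.description] = sorted(missing)
    return gaps


# Usage with invented names: one height change tested on the right
# function, plus a "same dimensions" test that landed on a sibling helper.
scenarios = [
    Scenario(
        "rewrap when dimensions stay the same or change in height",
        target="rewrap",
        variants=frozenset({"same", "taller", "shorter"}),
    ),
]
tests = [
    GeneratedTest("rewrap_taller", target="rewrap", variant="taller"),
    GeneratedTest("helper_same", target="rewrap_helper", variant="same"),
]
print(coverage_gaps(scenarios, tests))
# {'rewrap when dimensions stay the same or change in height': ['same', 'shorter']}
```

Both failure patterns from the motivation show up as gaps here: the sibling test is ignored, and the single representative height change leaves the other implied variants uncovered.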

Docs-only change to one agent file (+5 lines). markdownlint-cli2 passes with 0 errors; file is well under the 30,000-char agent-prompt limit.

The pre-completion gate already verifies assertion strength (pseudo-mutation
and assertion-depth checks), but two recurring failure modes still slip
through when the prompt enumerates specific behaviors:

- Testing an *adjacent* function/helper instead of the exact feature named
  in the objective, leaving the requested behavior uncovered.
- Covering only a single representative case when the scenario wording
  implies multiple variations or pins a condition to a specific position
  or structure.

Add a third gate item that maps each enumerated scenario to a dedicated
test, requires targeting the exact named function (preferring the canonical
existing test file), and requires honoring range/positional qualifiers
literally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review was requested due to automatic review settings on June 18, 2026 13:56.

Copilot AI (Contributor) left a comment


Pull request overview

Updates the dotnet-test code-testing generator agent documentation to add an explicit “prompt-scenario coverage” item to the Step 7 pre-completion gate, aiming to ensure generated tests cover the exact behaviors/scenarios enumerated by the prompt (not adjacent/sibling functionality or only a single representative case).

Changes:

  • Add a third Step 7 pre-completion gate item: prompt-scenario coverage check.
  • Document guidance to (1) target the exact named function/feature, (2) cover implied scenario ranges, and (3) honor positional/structural qualifiers literally.
Summary per file:

  • plugins/dotnet-test/agents/code-testing-generator.agent.md: Adds a new Step 7 gate item describing prompt-scenario coverage expectations for generated tests.

Copilot's findings


  • Files reviewed: 1/1 changed files
  • Comments generated: 2

Two comment threads on plugins/dotnet-test/agents/code-testing-generator.agent.md.
- Remove benchmark-specific symbol names from the target-the-named-function
  bullet to avoid overfitting; phrase it generically.
- Fix the gate intro that said 'The two skills below' now that there are
  three numbered items (the third is a prompt self-review, not a skill).
- Update Step 8 and Rule 11 so re-running the gate includes the new
  prompt-scenario coverage check, not just test-gap-analysis + assertion-quality.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Evangelink (Member, Author)

/evaluate

Evangelink enabled auto-merge (squash) on June 18, 2026 14:18.

@github-actions (Contributor)

⏭️ No skills to evaluate: no changed skills with tests were found in this PR.

Evangelink merged commit 5d717db into dotnet:main on Jun 18, 2026 (37 checks passed).

3 participants