Cover MSTESTxxxx analyzer diagnostics in writing-mstest-tests skill by Evangelink · Pull Request #794 · dotnet/skills

Evangelink · 2026-06-19T14:14:16Z

What

Extends the existing writing-mstest-tests skill to cover the MSTest analyzer rules (MSTESTxxxx) instead of adding a new skill per rule.

Why

There are 63 MSTESTxxxx rules. They are Roslyn analyzers that already surface on their own during build and in the IDE (with messages and, in most cases, automated code fixes). What an agent needs is the idiomatic fix plus its rationale, which is content, not 63 separate, overlapping, activation-gated skills that would cannibalize each other's activation and require a web of "DO NOT USE" redirects. The existing skill already teaches the correct patterns these analyzers enforce, so this PR consolidates the remaining gaps in one place.
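As an illustration of what "idiomatic fix plus rationale" can look like for a single rule, here is a minimal sketch for MSTEST0006 (avoid `[ExpectedException]`). It is not taken from the skill itself. The class and test names are hypothetical, and it assumes an MSTest version recent enough to provide `Assert.ThrowsExactly`.

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical test class used only for illustration.
[TestClass]
public sealed class ParserTests
{
    // Before (the pattern MSTEST0006 reports): the attribute is satisfied if
    // any statement in the method throws the expected exception type, so the
    // test cannot tell setup failures from the behavior under test.
    //
    // [TestMethod]
    // [ExpectedException(typeof(FormatException))]
    // public void Parse_InvalidInput_Throws() => int.Parse("not a number");

    // After: the assertion wraps only the call that is expected to throw.
    [TestMethod]
    public void Parse_InvalidInput_Throws()
    {
        Assert.ThrowsExactly<FormatException>(() => int.Parse("not a number"));
    }
}
```

The rationale is the part an analyzer message alone may not convey: scoping the assertion to one call keeps an unrelated exception of the same type from producing a false pass.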

Changes

New "Step 8: Fix MSTest analyzer diagnostics (MSTESTxxxx)" section with a rule → problem → fix table covering the high-value rules not previously called out (MSTEST0023, 0025, 0032, 0038, 0044, 0052, 0024, 0036, 0061, the 0042/0060 duplicates, the 0002–0014 layout family), cross-linked to existing steps for the rules already covered (0006, 0017, 0037, 0039, 0046, and the 0045/0049/0054 group).
MSTestAnalysisMode (None/Default/Recommended/All) guidance and the opt-in rules note.
Link to the official MSTest code analysis overview.
Added a USE FOR keyword (fix MSTEST analyzer diagnostics (MSTESTxxxx rules)) and a When-to-Use trigger; trimmed the verbose assertion-API list to keep the description at 1000 chars (under the 1024 cap).
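For readers unfamiliar with `MSTestAnalysisMode`, the property is set in the test project file. The fragment below is a minimal sketch, not the text added to the skill. The value `Recommended` is chosen only as an example, and the rest of the project file is omitted.

```xml
<!-- Fragment of a test project file (.csproj); other project content omitted. -->
<PropertyGroup>
  <!-- Accepted values listed in this PR: None, Default, Recommended, All. -->
  <MSTestAnalysisMode>Recommended</MSTestAnalysisMode>
</PropertyGroup>
```

Which mode suits a given repository depends on how many opt-in rules the team wants enabled. The official MSTest code analysis overview linked from the skill describes the effect of each value.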

Validation

skill-validator check --plugin ./plugins/dotnet-test → all checks pass (27 skills, 11 agents). Only a soft "approaching comprehensive token range" warning on the skill size.

Add a 'Fix MSTest analyzer diagnostics' workflow step mapping the common MSTESTxxxx rules to their idiomatic fixes, plus MSTestAnalysisMode guidance, instead of creating one skill per rule.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-19T14:14:50Z

Skill Coverage Report

	Plugin	Skill	Covered	Coverage
✅	`dotnet-test`	`code-testing-agent`	5/5	100%
✅	`dotnet-test`	`writing-mstest-tests`	39/45	86.7%

Uncovered: dotnet-test/writing-mstest-tests

[CodePattern] Assert.IsNotEmpty (line 178)
[CodePattern] Assert.AreSame (line 154)
[CodePattern] Assert.IsEmpty (line 178)
[CodePattern] Assert.DoesNotContain (line 178)
[CodePattern] Assert.Contains (line 178)
[CodePattern] Assert.IsNull (line 154)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Extends the writing-mstest-tests skill content to explicitly cover fixing MSTest analyzer diagnostics (MSTESTxxxx) within the existing workflow, instead of creating many separate per-rule skills.

Changes:

Updates the skill description/triggering text to include fixing MSTESTxxxx analyzer diagnostics.
Adds a new “Step 8” section with a rule → problem → fix table and guidance on MSTestAnalysisMode.

Show a summary per file

File	Description
plugins/dotnet-test/skills/writing-mstest-tests/SKILL.md	Adds MSTest analyzer diagnostics guidance (new Step 8) and updates description/triggers to cover `MSTESTxxxx` rules.

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 2

…rammar

- Don't tie MSTest.Analyzers availability to TestFramework 3.7; note metapackage/SDK/explicit reference.
- Fix grammatically broken fix text for the MSTEST0002-0014 layout row.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evangelink · 2026-06-19T15:15:05Z

/evaluate

github-actions · 2026-06-19T15:29:14Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
writing-mstest-tests	Write unit tests for a service class	4.3/5 → 4.3/5	✅ writing-mstest-tests; tools: skill, glob / ✅ writing-mstest-tests; tools: skill	🟡 0.33	❌ [1]
writing-mstest-tests	Write data-driven tests for a calculator	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill, glob, view / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [2]
writing-mstest-tests	Write async tests with cancellation	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.33	✅
writing-mstest-tests	Fix swapped Assert.AreEqual arguments	4.7/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [3]
writing-mstest-tests	Modernize legacy test patterns	4.3/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [4]
writing-mstest-tests	Replace ExpectedException with Assert.Throws	3.0/5 → 3.7/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [5]
writing-mstest-tests	Use proper collection assertions	3.0/5 → 2.0/5 🔴	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.33	❌ [6]
writing-mstest-tests	Use proper type assertions instead of casts	4.0/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [7]
writing-mstest-tests	Set up test lifecycle correctly	2.0/5 → 4.0/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [8]
writing-mstest-tests	Use DynamicData with ValueTuples over object arrays	3.0/5 → 3.0/5	✅ writing-mstest-tests; tools: skill, report_intent / ⚠️ NOT ACTIVATED	🟡 0.33	❌ [9]
writing-mstest-tests	Use string assertions for format validation	3.7/5 → 4.0/5 ⏰ 🟢	✅ writing-mstest-tests; tools: skill, edit, view, bash / ⚠️ NOT ACTIVATED	🟡 0.33	❌ [10]
writing-mstest-tests	Use comparison assertions for boundary testing	2.3/5 → 3.7/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [11]
writing-mstest-tests	Write tests with collection, null, and reference assertions	4.0/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: glob, skill / ⚠️ NOT ACTIVATED	🟡 0.33	✅ [12]
writing-mstest-tests	Configure conditional execution, retry, and cleanup	3.0/5 → 4.3/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent / ⚠️ NOT ACTIVATED	🟡 0.33	❌ [13]
writing-mstest-tests	Configure test parallelization and MSTest.Sdk project	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.33	✅ [14]

[1] ⚠️ High run-to-run variance (CV=349%) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=68%) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=2394%) — consider re-running with --runs 5
[4] ⚠️ High run-to-run variance (CV=142%) — consider re-running with --runs 5
[5] ⚠️ High run-to-run variance (CV=110%) — consider re-running with --runs 5
[6] ⚠️ High run-to-run variance (CV=68%) — consider re-running with --runs 5
[7] ⚠️ High run-to-run variance (CV=103%) — consider re-running with --runs 5
[8] ⚠️ High run-to-run variance (CV=120%) — consider re-running with --runs 5
[9] ⚠️ High run-to-run variance (CV=60%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -2.1% due to: tokens (12819 → 18051)
[10] ⚠️ High run-to-run variance (CV=232%) — consider re-running with --runs 5. (Isolated) Quality improved but weighted score is -2.9% due to: tokens (116180 → 430002), tool calls (8 → 23), time (69.0s → 151.8s)
[11] ⚠️ High run-to-run variance (CV=9923%) — consider re-running with --runs 5
[12] ⚠️ High run-to-run variance (CV=100%) — consider re-running with --runs 5
[13] ⚠️ High run-to-run variance (CV=228%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -16.6% due to: judgment, quality, tokens (13228 → 18182)
[14] ⚠️ High run-to-run variance (CV=54%) — consider re-running with --runs 5

⏰ timeout — run(s) hit the (180s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 794 in dotnet/skills, download eval artifacts with gh run download 27833908923 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/15cfbbaf67a3a47e83a2b519f39d1949b4a82468/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

github-actions · 2026-06-19T15:29:16Z

✅ Evaluation passed for 15cfbba. cc @dotnet/dotnet-testing — please review.

…om code-testing-agent

In plugin runs, code-testing-agent (generic 'write/comprehensive unit tests') was stealing activation from writing-mstest-tests for MSTest-specific prompts. Broaden code-testing-agent's DO NOT USE carve-out to defer writing/fixing/modernizing MSTest-specific tests, assertions, attributes, and lifecycle to writing-mstest-tests, and have writing-mstest-tests claim 'comprehensive MSTest unit tests'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 0 new

Evangelink · 2026-06-19T16:05:58Z

/evaluate

github-actions · 2026-06-19T16:25:18Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
writing-mstest-tests	Write unit tests for a service class	4.0/5 → 4.3/5 🟢	✅ writing-mstest-tests; tools: skill, glob	🟡 0.29	✅ [1]
writing-mstest-tests	Write data-driven tests for a calculator	3.7/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill, glob, view / ✅ writing-mstest-tests; tools: report_intent, view, skill, create, bash, edit	🟡 0.29	✅ [2]
writing-mstest-tests	Write async tests with cancellation	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	✅ [3]
writing-mstest-tests	Fix swapped Assert.AreEqual arguments	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	🟡 0.29	❌ [4]
writing-mstest-tests	Modernize legacy test patterns	4.3/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [5]
writing-mstest-tests	Replace ExpectedException with Assert.Throws	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill / ⚠️ NOT ACTIVATED	🟡 0.29	✅
writing-mstest-tests	Use proper collection assertions	3.0/5 → 2.0/5 🔴	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [6]
writing-mstest-tests	Use proper type assertions instead of casts	4.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent / ⚠️ NOT ACTIVATED	🟡 0.29	✅ [7]
writing-mstest-tests	Set up test lifecycle correctly	2.0/5 → 4.0/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent / ✅ writing-mstest-tests; tools: report_intent, skill	🟡 0.29	✅ [8]
writing-mstest-tests	Use DynamicData with ValueTuples over object arrays	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [9]
writing-mstest-tests	Use string assertions for format validation	3.7/5 → 4.0/5 ⏰ 🟢	✅ writing-mstest-tests; tools: skill, view, edit / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [10]
writing-mstest-tests	Use comparison assertions for boundary testing	2.3/5 → 3.7/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [11]
writing-mstest-tests	Write tests with collection, null, and reference assertions	4.0/5 → 4.0/5	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [12]
writing-mstest-tests	Configure conditional execution, retry, and cleanup	2.7/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill / ✅ writing-mstest-tests; tools: skill	🟡 0.29	✅ [13]
writing-mstest-tests	Configure test parallelization and MSTest.Sdk project	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.29	✅
code-testing-agent	Generate tests for ContosoUniversity ASP.NET Core MVC app	3.3/5 → 3.0/5 🔴	✅ code-testing-agent; tools: skill / ✅ code-testing-extensions; code-testing-agent; tools: task, skill, read_agent, grep	✅ 0.18	❌ [14]
code-testing-agent	Generate pytest tests for the Flask tasks API (Python polyglot)	4.0/5 → 4.3/5 🟢	✅ code-testing-agent; tools: skill / ⚠️ NOT ACTIVATED	✅ 0.18	❌ [15]
code-testing-agent	Generate Vitest tests for the shopping-cart library (TypeScript polyglot)	4.7/5 → 4.7/5	✅ code-testing-agent; tools: skill	✅ 0.18	✅ [16]
code-testing-agent	Does not revert a gutted-looking workspace (workspace integrity)	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	✅ 0.18	❌ [17]

[1] ⚠️ High run-to-run variance (CV=2057%) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=113%) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=203%) — consider re-running with --runs 5
[4] (Plugin) Quality unchanged but weighted score is -2.3% due to: tokens (12757 → 17995)
[5] ⚠️ High run-to-run variance (CV=217%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -15.3% due to: judgment, quality
[6] ⚠️ High run-to-run variance (CV=82%) — consider re-running with --runs 5
[7] ⚠️ High run-to-run variance (CV=618%) — consider re-running with --runs 5. (Plugin) Quality dropped but weighted score is +3.4% due to: tool calls (1 → 0), tokens (21035 → 17873)
[8] ⚠️ High run-to-run variance (CV=74%) — consider re-running with --runs 5
[9] ⚠️ High run-to-run variance (CV=105%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -13.8% due to: judgment, tokens (12916 → 18055)
[10] ⚠️ High run-to-run variance (CV=739%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -27.7% due to: judgment, quality, tokens (97530 → 311183), tool calls (6 → 19), time (61.3s → 104.8s)
[11] ⚠️ High run-to-run variance (CV=131%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -13.7% due to: judgment, tokens (13590 → 18635)
[12] ⚠️ High run-to-run variance (CV=52%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -2.5% due to: tokens (186872 → 265498)
[13] ⚠️ High run-to-run variance (CV=157%) — consider re-running with --runs 5
[14] ⚠️ High run-to-run variance (CV=61%) — consider re-running with --runs 5
[15] ⚠️ High run-to-run variance (CV=1596%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -16.0% due to: judgment, tokens (184221 → 259184), quality
[16] ⚠️ High run-to-run variance (CV=145%) — consider re-running with --runs 5
[17] ⚠️ High run-to-run variance (CV=82%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -1.3% due to: tokens (101101 → 130138)

⏰ timeout — run(s) hit the (180s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 794 in dotnet/skills, download eval artifacts with gh run download 27836324404 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/c263a61f795a2aabc2788e97629d5eb54350b824/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

…ugin activation

Post-fix eval (run 27836324404) showed code-testing-agent no longer steals activation (no sibling fires), but string/comparison/reference-assertion scenarios still don't activate in the plugin run: their trigger keywords (StartsWith, EndsWith, MatchesRegex, IsGreaterThan, IsLessThan, IsInRange, AreSame) had been trimmed for budget. Rebuild the description to restore all eval-relevant assertion APIs and lead with write/create/modernize/fix, staying at 1013 chars.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evangelink · 2026-06-19T16:40:43Z

/evaluate

github-actions · 2026-06-19T17:04:36Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
writing-mstest-tests	Write unit tests for a service class	4.3/5 → 4.0/5 🔴	✅ writing-mstest-tests; tools: skill, glob	🟡 0.34	❌
writing-mstest-tests	Write data-driven tests for a calculator	3.3/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent, glob / ✅ writing-mstest-tests; tools: skill, report_intent, view, create, bash, edit	🟡 0.34	✅ [1]
writing-mstest-tests	Write async tests with cancellation	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	✅
writing-mstest-tests	Fix swapped Assert.AreEqual arguments	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	🟡 0.34	❌ [2]
writing-mstest-tests	Modernize legacy test patterns	4.3/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	❌ [3]
writing-mstest-tests	Replace ExpectedException with Assert.Throws	3.0/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill / ⚠️ NOT ACTIVATED	🟡 0.34	✅ [4]
writing-mstest-tests	Use proper collection assertions	3.3/5 → 2.7/5 🔴	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	❌ [5]
writing-mstest-tests	Use proper type assertions instead of casts	4.0/5 → 4.3/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	❌ [6]
writing-mstest-tests	Set up test lifecycle correctly	2.0/5 → 4.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	✅
writing-mstest-tests	Use DynamicData with ValueTuples over object arrays	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	❌ [7]
writing-mstest-tests	Use string assertions for format validation	4.0/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: skill, bash, edit, view / ⚠️ NOT ACTIVATED	🟡 0.34	✅ [8]
writing-mstest-tests	Use comparison assertions for boundary testing	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.34	❌ [9]
writing-mstest-tests	Write tests with collection, null, and reference assertions	4.0/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: skill, glob / ⚠️ NOT ACTIVATED	🟡 0.34	✅ [10]
writing-mstest-tests	Configure conditional execution, retry, and cleanup	2.7/5 → 4.3/5 🟢	✅ writing-mstest-tests; tools: report_intent, skill / ⚠️ NOT ACTIVATED	🟡 0.34	❌ [11]
writing-mstest-tests	Configure test parallelization and MSTest.Sdk project	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.34	✅
code-testing-agent	Generate tests for ContosoUniversity ASP.NET Core MVC app	3.3/5 → 3.0/5 🔴	✅ code-testing-agent; tools: skill, task, glob, read_agent, grep / ✅ code-testing-agent; code-testing-extensions; tools: skill, task, read_agent, glob	🟡 0.21	❌
code-testing-agent	Generate pytest tests for the Flask tasks API (Python polyglot)	4.3/5 → 4.0/5 🔴	✅ code-testing-agent; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.21	❌ [12]
code-testing-agent	Generate Vitest tests for the shopping-cart library (TypeScript polyglot)	5.0/5 → 4.3/5 🔴	✅ code-testing-agent; tools: skill / ✅ code-testing-agent; tools: skill, edit	🟡 0.21	❌ [13]
code-testing-agent	Does not revert a gutted-looking workspace (workspace integrity)	5.0/5 → 5.0/5	✅ code-testing-agent; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.21	❌ [14]

[1] ⚠️ High run-to-run variance (CV=119%) — consider re-running with --runs 5
[2] (Plugin) Quality unchanged but weighted score is -2.5% due to: tokens (12766 → 18009)
[3] ⚠️ High run-to-run variance (CV=88%) — consider re-running with --runs 5. (Isolated) Quality improved but weighted score is -0.3% due to: quality
[4] ⚠️ High run-to-run variance (CV=154%) — consider re-running with --runs 5
[5] ⚠️ High run-to-run variance (CV=112%) — consider re-running with --runs 5
[6] ⚠️ High run-to-run variance (CV=56%) — consider re-running with --runs 5
[7] ⚠️ High run-to-run variance (CV=617%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -3.0% due to: tokens (12804 → 18055), time (6.9s → 9.4s)
[8] ⚠️ High run-to-run variance (CV=424%) — consider re-running with --runs 5
[9] ⚠️ High run-to-run variance (CV=67814%) — consider re-running with --runs 5
[10] ⚠️ High run-to-run variance (CV=261%) — consider re-running with --runs 5
[11] ⚠️ High run-to-run variance (CV=111%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -1.6% due to: tokens (13263 → 18271)
[12] ⚠️ High run-to-run variance (CV=113%) — consider re-running with --runs 5
[13] ⚠️ High run-to-run variance (CV=54%) — consider re-running with --runs 5
[14] ⚠️ High run-to-run variance (CV=53%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -2.0% due to: tokens (85464 → 117141)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 794 in dotnet/skills, download eval artifacts with gh run download 27837850135 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/7e553b51f9796b219537fef735c7a6b5bef4b257/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

Copilot AI review requested due to automatic review settings June 19, 2026 14:14

Copilot started reviewing on behalf of Evangelink June 19, 2026 14:14

Fix MD012 markdown lint (trailing blank line)

3d6b040

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI reviewed Jun 19, 2026

Comment thread plugins/dotnet-test/skills/writing-mstest-tests/SKILL.md Outdated

Comment thread plugins/dotnet-test/skills/writing-mstest-tests/SKILL.md Outdated

Copilot AI review requested due to automatic review settings June 19, 2026 14:17

Copilot started reviewing on behalf of Evangelink June 19, 2026 14:18

Copilot AI reviewed Jun 19, 2026

Comment thread plugins/dotnet-test/skills/writing-mstest-tests/SKILL.md Outdated

Clarify the MSTESTxxxx table is non-exhaustive; defer to full reference

15cfbba

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evangelink enabled auto-merge (squash) June 19, 2026 15:14

github-actions Bot added the waiting-on-review label (PR state label) Jun 19, 2026

Copilot AI review requested due to automatic review settings June 19, 2026 15:56

Copilot started reviewing on behalf of Evangelink June 19, 2026 15:57

Copilot AI reviewed Jun 19, 2026

github-actions Bot added a commit that referenced this pull request Jun 19, 2026

Update PR token usage data (PR #794)

3dd530f

github-actions Bot added a commit that referenced this pull request Jun 19, 2026

Update PR token usage data (PR #794)

6de3c3f

Evangelink wants to merge 6 commits into main from improve/mstest-analyzer-rules

2 participants
