feat: robust YAML handling + from-the-web benchmark (real published skills) #43
Merged
…b" benchmark

Searched public GitHub skill collections, fetched 69 community SKILL.md files, and ran them through skval. Each was safety-scanned as inert text first: 0 tripped the gate, none executed.

The run surfaced a real bug class, invalid YAML frontmatter (an unquoted colon in `description`), that *crashed* skval: parse_frontmatter caught ValueError but not yaml.YAMLError.

Fix: convert yaml.YAMLError -> ValueError so the scan reports `frontmatter_valid_yaml: False` and scores the skill (50/D/Revise) instead of throwing. Regression test in tests/test_precision.py.

Benchmark (docs/examples/skill-benchmark):
- from-the-web.md: the real run over 69 web skills (3 public repos), the safety note, the genuine finding classes, and the skval upgrade.
- improved/web-frontmatter.*: a 50 -> 100 case study on the invalid-YAML class (skval's fix: quote the description), with the compare.py diff (+50, Revise -> Ship).
- README links the new tier.

Test count 162 -> 163.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
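The error conversion described in the commit message can be sketched roughly as follows. This is a minimal illustration, not skval's actual code: only the names `parse_frontmatter`, `yaml.YAMLError`, `ValueError`, and `frontmatter_valid_yaml` come from the PR, while the function body, the `---` delimiter handling, and the `scan_frontmatter` helper are assumptions. It relies on PyYAML being installed.

```python
import yaml  # PyYAML


def parse_frontmatter(text: str) -> dict:
    """Return the frontmatter mapping of a SKILL.md, or raise ValueError.

    Assumed layout: the file starts with '---', followed by YAML,
    followed by a closing '---'.
    """
    if not text.startswith("---"):
        raise ValueError("missing frontmatter delimiter")
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError("unterminated frontmatter")
    try:
        data = yaml.safe_load(parts[1])
    except yaml.YAMLError as exc:
        # The fix: surface YAML syntax errors as ValueError so callers
        # that already handle ValueError no longer crash.
        raise ValueError(f"invalid YAML frontmatter: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("frontmatter is not a mapping")
    return data


def scan_frontmatter(text: str) -> dict:
    """Hypothetical caller: report validity instead of propagating the error."""
    try:
        parse_frontmatter(text)
    except ValueError:
        return {"frontmatter_valid_yaml": False}
    return {"frontmatter_valid_yaml": True}
```

With this shape, a skill whose `description` contains an unquoted colon yields `frontmatter_valid_yaml: False` and can still be scored, because the caller only ever needs to handle `ValueError`.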
Goal
Find bad skills published on the web, run them through skval, and show the upgrade skval makes.
Safety
Every fetched skill was treated as inert text and run through skval's D6 safety gate before anything else: 0 / 69 tripped it, and none were executed (only `SKILL.md` text was read and statically analysed). Symlink/pointer entries (raw file = a path) were excluded as fetch artifacts, not real skills.
What the run found
Scored 69 community skills from 3 public collections (alirezarezvani, glebis, smartnews). Most are clean. Genuine finding classes: invalid YAML frontmatter (unquoted colon in `description`), `<>` in description, oversized SKILL.md.
The skval upgrade (driven by a real skill)
The invalid-YAML class used to crash skval: `parse_frontmatter` caught `ValueError` but not `yaml.YAMLError`. Now it converts the error and reports `frontmatter_valid_yaml: False`, scoring the skill (50/D/Revise) instead of throwing. Regression test: `test_invalid_yaml_frontmatter_scores_not_crashes`.
Benchmark + case study
- `docs/examples/skill-benchmark/from-the-web.md`: the run, the safety note, the findings, the upgrade.
- `improved/web-frontmatter.*`: 50 → 100 case study on the invalid-YAML class (fix: quote the description). skval `compare.py`: `overall_delta +50`, `Revise → Ship`.
- `docs/examples` (not the landing page); source repos linked/credited.
Validation
163 tests pass (+1); self-validation 100/A/Ship; ruff clean. Test count 162 → 163.
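A check in the spirit of the regression test, and of the quote-the-description fix from the case study, might look like the sketch below. It is not the test shipped in `tests/test_precision.py`: the frontmatter strings, the `frontmatter_yaml_ok` helper, and the test names are hypothetical, and it calls PyYAML directly rather than skval.

```python
import yaml  # PyYAML

# Hypothetical frontmatter bodies modelled on the invalid-YAML class.
UNQUOTED = "name: demo-skill\ndescription: Use when: the user asks for a summary\n"
QUOTED = 'name: demo-skill\ndescription: "Use when: the user asks for a summary"\n'


def frontmatter_yaml_ok(block: str) -> bool:
    """Return True if the block parses as a YAML mapping, False otherwise."""
    try:
        return isinstance(yaml.safe_load(block), dict)
    except yaml.YAMLError:
        # A syntax error is reported as a result, never raised to the caller.
        return False


def test_unquoted_colon_is_reported_not_raised():
    # The second ': ' on the line makes the plain scalar invalid YAML.
    assert frontmatter_yaml_ok(UNQUOTED) is False


def test_quoting_the_description_fixes_it():
    assert frontmatter_yaml_ok(QUOTED) is True
    parsed = yaml.safe_load(QUOTED)
    assert parsed["description"] == "Use when: the user asks for a summary"
```

Both functions follow pytest's naming convention, so running `pytest` on a file containing them would collect and execute them.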
🤖 Generated with Claude Code