feat: robust YAML handling + from-the-web benchmark (real published skills)#43

Merged: DCCA merged 1 commit into main from feat/robust-yaml-web-benchmark on Jun 23, 2026

DCCA (Owner) commented Jun 23, 2026

Goal

Find bad skills published on the web, run them through skval, and show the upgrade skval makes.

Safety

Every fetched skill was treated as inert text and run through skval's D6 safety gate before anything else: 0 / 69 tripped it, and none were executed (only SKILL.md text was read and statically analysed). Symlink/pointer entries (where the raw file is just a path) were excluded as fetch artifacts, not real skills.

What the run found

Scored 69 community skills from 3 public collections (alirezarezvani, glebis, smartnews). Most are clean. The genuine finding classes are: invalid YAML frontmatter (an unquoted colon in description), <> in description, and an oversized SKILL.md.

The skval upgrade (driven by a real skill)

The invalid-YAML class used to crash skval: parse_frontmatter caught ValueError but not yaml.YAMLError. It now converts yaml.YAMLError to ValueError, so the scan reports frontmatter_valid_yaml: False and scores the skill (50/D/Revise) instead of throwing. Regression test: test_invalid_yaml_frontmatter_scores_not_crashes.
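
A minimal sketch of the error conversion described above, assuming PyYAML. This is not skval's actual code: the body of `parse_frontmatter`, the `scan` helper, and the sample SKILL.md strings are hypothetical and only illustrate the pattern of turning yaml.YAMLError into the ValueError the caller already handles.

```python
import yaml  # PyYAML


def parse_frontmatter(text: str) -> dict:
    """Hypothetical stand-in: return the frontmatter mapping or raise ValueError."""
    parts = text.split("---", 2)
    if not text.startswith("---") or len(parts) < 3:
        raise ValueError("missing or unterminated frontmatter")
    try:
        data = yaml.safe_load(parts[1])
    except yaml.YAMLError as exc:
        # The conversion: callers only need to handle ValueError.
        raise ValueError(f"invalid YAML frontmatter: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("frontmatter is not a mapping")
    return data


def scan(text: str) -> dict:
    """Hypothetical caller: report a finding instead of crashing."""
    try:
        parse_frontmatter(text)
        return {"frontmatter_valid_yaml": True}
    except ValueError:
        return {"frontmatter_valid_yaml": False}


# Made-up SKILL.md contents: the first has an unquoted colon in description.
BAD = "---\nname: demo\ndescription: Use when: a report is needed\n---\nBody\n"
GOOD = '---\nname: demo\ndescription: "Use when: a report is needed"\n---\nBody\n'

print(scan(BAD))
print(scan(GOOD))
```

With the conversion in place, the malformed file yields a `frontmatter_valid_yaml: False` finding that can be scored, rather than an uncaught exception.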

Benchmark + case study

  • docs/examples/skill-benchmark/from-the-web.md: the run, the safety note, the findings, the upgrade.
  • improved/web-frontmatter.*: 50 → 100 case study on the invalid-YAML class (fix: quote the description). skval compare.py: overall_delta +50, Revise → Ship.
  • Kept in docs/examples (not the landing page); source repos linked/credited.
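
The "quote the description" fix from the case study can be illustrated with a short sketch, assuming PyYAML. The two description strings and the `loads_cleanly` helper are made up for illustration and are not taken from the benchmarked skills.

```python
import yaml  # PyYAML

# Hypothetical frontmatter lines: the same description, unquoted and quoted.
BROKEN = "description: Use this skill when: the user asks for a report\n"
FIXED = 'description: "Use this skill when: the user asks for a report"\n'


def loads_cleanly(doc: str) -> bool:
    """Return True if the document parses as YAML, False otherwise."""
    try:
        yaml.safe_load(doc)
        return True
    except yaml.YAMLError:
        return False


print(loads_cleanly(BROKEN))  # False: the second ": " reads as a nested mapping
print(loads_cleanly(FIXED))   # True: quoting makes the colon part of the string
```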

Validation

163 tests pass (+1); self-validation 100/A/Ship; ruff clean. Test count 162 → 163.

🤖 Generated with Claude Code

…b" benchmark

Searched public GitHub skill collections, fetched 69 community SKILL.md files,
and ran them through skval (each safety-scanned as inert text first — 0 tripped
the gate, none executed). The run surfaced a real bug class — invalid YAML
frontmatter (an unquoted colon in `description`) — that *crashed* skval:
parse_frontmatter caught ValueError but not yaml.YAMLError.

Fix: convert yaml.YAMLError -> ValueError so the scan reports
`frontmatter_valid_yaml: False` and scores the skill (50/D/Revise) instead of
throwing. Regression test in tests/test_precision.py.

Benchmark (docs/examples/skill-benchmark):
- from-the-web.md — the real run over 69 web skills (3 public repos), the
  safety note, the genuine finding classes, and the skval upgrade.
- improved/web-frontmatter.* — a 50 -> 100 case study on the invalid-YAML class
  (skval's fix: quote the description), with the compare.py diff (+50,
  Revise -> Ship).
- README links the new tier. Test count 162 -> 163.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DCCA merged commit c73ffc4 into main on Jun 23, 2026
1 check passed
DCCA deleted the feat/robust-yaml-web-benchmark branch on June 23, 2026 01:52