feat: robust YAML handling + from-the-web benchmark (real published skills) by DCCA · Pull Request #43 · DCCA/skval

DCCA · 2026-06-23T01:52:06Z

Goal

Find bad skills published on the web, run them through skval, and show the upgrade skval makes.

Safety

Every fetched skill was treated as inert text and run through skval's D6 safety gate before anything else: 0 of 69 tripped it, and none were executed (only the SKILL.md text was read and statically analysed). Symlink/pointer entries (where the raw file content is just a path) were excluded as fetch artifacts rather than real skills.
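The pointer-exclusion rule could look like the sketch below. It assumes that fetching a symlinked SKILL.md as a raw file returns a single line holding the link target. `looks_like_pointer` and `partition_fetched` are hypothetical names, not skval's API, and the heuristic is only illustrative. In keeping with the inert-text rule, it treats content as strings and never executes anything.

```python
from pathlib import PurePosixPath


def looks_like_pointer(raw: str) -> bool:
    """Heuristic (hypothetical): True if fetched SKILL.md content is only a path."""
    text = raw.strip()
    # A real skill spans several lines (frontmatter plus body); a pointer is one line.
    if not text or "\n" in text:
        return False
    # Real skills open with a '---' frontmatter fence.
    if text.startswith("---"):
        return False
    # A path token has no spaces and either a separator or a .md suffix.
    if " " in text:
        return False
    return "/" in text or PurePosixPath(text).suffix == ".md"


def partition_fetched(files: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split fetched files into skills to scan and pointer entries to exclude."""
    skills: dict[str, str] = {}
    pointers: list[str] = []
    for name, raw in files.items():
        # Content is only inspected as a string; nothing is executed.
        if looks_like_pointer(raw):
            pointers.append(name)
        else:
            skills[name] = raw
    return skills, pointers
```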

What the run found

Scored 69 community skills from 3 public collections (alirezarezvani, glebis, smartnews). Most are clean. The genuine issue classes were invalid YAML frontmatter (an unquoted colon in description), <> characters in description, and an oversized SKILL.md.
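The two non-YAML classes can be detected with plain string checks. `finding_classes` is a hypothetical helper, not a skval function. The 500-line limit is also an assumed threshold, because the PR does not say what size skval flags as oversized.

```python
MAX_SKILL_MD_LINES = 500  # assumed threshold; skval's real limit is not stated in the PR


def finding_classes(description: str, skill_md: str,
                    max_lines: int = MAX_SKILL_MD_LINES) -> list[str]:
    """Return hypothetical finding labels for the two non-YAML classes."""
    findings: list[str] = []
    # '<' or '>' in the description (e.g. "<tool>" placeholders).
    if "<" in description or ">" in description:
        findings.append("angle_brackets_in_description")
    # SKILL.md longer than the assumed line budget.
    if len(skill_md.splitlines()) > max_lines:
        findings.append("oversized_skill_md")
    return findings
```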

The skval upgrade (driven by a real skill)

The invalid-YAML class used to crash skval: parse_frontmatter caught ValueError but not yaml.YAMLError. It now converts yaml.YAMLError to ValueError, so the scan reports frontmatter_valid_yaml: False and scores the skill (50/D/Revise) instead of raising. Regression test: test_invalid_yaml_frontmatter_scores_not_crashes.
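Here is a minimal sketch of the fix. It assumes PyYAML (`yaml.safe_load`) and a `---`-fenced frontmatter block. Only the conversion from `yaml.YAMLError` to `ValueError` mirrors the PR. The fence handling and the `scan_frontmatter` wrapper are hypothetical. As a simplification, the wrapper reports any parse failure, including a missing fence, as `frontmatter_valid_yaml: False`.

```python
import yaml  # PyYAML; assumed to be the parser skval uses (the PR names yaml.YAMLError)


def parse_frontmatter(text: str) -> dict:
    """Return the frontmatter mapping of a SKILL.md; raise ValueError on any failure."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter fence")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("missing closing '---' frontmatter fence")
    try:
        data = yaml.safe_load("\n".join(lines[1:end]))
    except yaml.YAMLError as exc:
        # The fix: convert the YAML error so callers that catch ValueError still work.
        raise ValueError(f"invalid YAML frontmatter: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("frontmatter is not a YAML mapping")
    return data


def scan_frontmatter(text: str) -> dict:
    """Hypothetical scan step: report a check result instead of raising."""
    try:
        parse_frontmatter(text)
    except ValueError:
        return {"frontmatter_valid_yaml": False}
    return {"frontmatter_valid_yaml": True}
```

Chaining with `from exc` keeps the original YAML error, including its line and column, available on `__cause__` for the report.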

Benchmark + case study

- docs/examples/skill-benchmark/from-the-web.md: the run, the safety note, the findings, and the upgrade.
- improved/web-frontmatter.*: a 50 → 100 case study on the invalid-YAML class (fix: quote the description). skval compare.py reports overall_delta +50, Revise → Ship.
- Kept in docs/examples (not the landing page); source repos are linked and credited.
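The case-study fix, quoting the description, can be sketched as a text rewrite that double-quotes an unquoted `description:` value inside the frontmatter. `quote_description` is a hypothetical helper, and the case study may apply the fix differently. The sketch relies on `json.dumps` producing a double-quoted string whose escapes (`\"`, `\\`, `\n`, `\uXXXX`) are also valid in a YAML double-quoted scalar.

```python
import json


def quote_description(skill_md: str) -> str:
    """Hypothetical helper: double-quote an unquoted `description:` in the frontmatter."""
    out = []
    fences = 0
    for line in skill_md.splitlines():
        if line.strip() == "---":
            fences += 1
        elif fences == 1:  # only touch lines between the first two fences
            key, sep, value = line.partition(":")
            value = value.strip()
            # Skip values that are empty, already quoted, or block scalars.
            if key == "description" and sep and value and value[0] not in "\"'|>":
                # ensure_ascii=False keeps non-ASCII text literal instead of \u escapes.
                line = f"description: {json.dumps(value, ensure_ascii=False)}"
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if skill_md.endswith("\n") else result
```

The helper leaves already-quoted values alone, so running it twice is harmless. A `description:` line in the body (after the second fence) is not touched.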

Validation

163 tests pass (162 → 163, +1 for the regression test); self-validation scores 100/A/Ship; ruff is clean.

🤖 Generated with Claude Code

…b" benchmark

Searched public GitHub skill collections, fetched 69 community SKILL.md files, and ran them through skval (each safety-scanned as inert text first: 0 tripped the gate, none executed).

The run surfaced a real bug class, invalid YAML frontmatter (an unquoted colon in `description`), that *crashed* skval: parse_frontmatter caught ValueError but not yaml.YAMLError. Fix: convert yaml.YAMLError -> ValueError so the scan reports `frontmatter_valid_yaml: False` and scores the skill (50/D/Revise) instead of throwing. Regression test in tests/test_precision.py.

Benchmark (docs/examples/skill-benchmark):
- from-the-web.md: the real run over 69 web skills (3 public repos), the safety note, the genuine finding classes, and the skval upgrade.
- improved/web-frontmatter.*: a 50 -> 100 case study on the invalid-YAML class (skval's fix: quote the description), with the compare.py diff (+50, Revise -> Ship).
- README links the new tier.

Test count 162 -> 163.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DCCA merged commit c73ffc4 into main Jun 23, 2026
1 check passed

DCCA deleted the feat/robust-yaml-web-benchmark branch June 23, 2026 01:52

DCCA merged 1 commit into main from feat/robust-yaml-web-benchmark.