
[codex] Refine verify skill instructions#27

Merged
bensonwong merged 3 commits into main from fix/verify-skill-instructions
Apr 16, 2026


@bensonwong

Contributor

What changed

  • Tightened the skills/verify/SKILL.md guidance around claim/evidence triage for HTML and other claim-bearing inputs.
  • Simplified the prepare example command and removed the stale --model flag from the verify example.
  • Cleaned up a few wording issues in the instruction text.

Why

  • The skill should reflect the current CLI surface and keep the HTML embed path explicit.
  • The wording changes make the guidance easier to follow for agents using the verification workflow.

Validation

  • Reviewed the modified skills/verify/SKILL.md diff locally.
  • Confirmed the branch is pushed cleanly to origin.

bensonwong and others added 2 commits April 15, 2026 19:46
…es file

- New §3 "HTML annotation path": guides agents to annotate a source HTML
  file with data-cite="N" attributes and append <<<CITATION_DATA>>> after
  </html>, then run verify --html to preserve original HTML structure
- New §2 triage row for "embed citations into static HTML" case;
  narrows the existing "Existing verified HTML" row to CLI-prior-run only
- §4 --html note updated to cover both the embed-into and re-verify cases
- Parallel generation guidance extracted to rules/parallel-generation.md;
  SKILL.md now defers to it for 100+ page / 3+ file scenarios
- AGENTS.md guidance router updated with parallel-generation.md entry
- Removed cloud-sandbox probe block, proxy invariants, and tool
  alternatives list (moved to their respective rules files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
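
The HTML annotation path described in the commit message above can be sketched roughly as follows. This is only an illustration, not the skill's actual procedure: the commit names the `data-cite="N"` attribute and the `<<<CITATION_DATA>>>` marker, but wrapping claims in `<span>` elements, the JSON payload after the marker, and the helper name `annotate_html` are all assumptions made for this example.

```python
import json

# Marker named in the commit message; everything after it is assumed to be payload.
CITATION_MARKER = "<<<CITATION_DATA>>>"


def annotate_html(html: str, claims: dict[int, str], citation_data: dict) -> str:
    """Tag each claim with data-cite="N" and append the marker after </html>.

    Hypothetical helper: wrapping in <span> and the JSON payload shape are
    assumptions, not something the PR specifies.
    """
    for n, sentence in claims.items():
        if sentence not in html:
            raise ValueError(f"claim {n} not found in HTML")
        # Annotate only the first occurrence so the rest of the markup is untouched.
        html = html.replace(sentence, f'<span data-cite="{n}">{sentence}</span>', 1)

    body = html.rstrip()
    if not body.lower().endswith("</html>"):
        raise ValueError("expected the document to end with </html>")

    # The marker goes after the closing tag, so the original structure is preserved.
    return f"{body}\n{CITATION_MARKER}\n{json.dumps(citation_data, indent=2)}\n"
```

The annotated file would then be passed to `verify --html`, as the commit message describes.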
@claude

claude Bot commented Apr 16, 2026


PR Review

The restructuring intent is good - pulling parallel-generation detail into its own rules file reduces cognitive load for single-document tasks. A few issues need attention before merging.

Bugs / Inconsistencies

1. Flag mismatch: short vs long markdown output flag

SKILL.md section 4 now uses the short form of the markdown output flag, but parallel-generation.md still uses the long form. If the CLI accepts only one of the two, the other example will fail at runtime. Use the same form in both files.

2. Model flag removed in SKILL.md but kept in parallel-generation.md

The main verify command in SKILL.md dropped the model flag, matching the PR description. But the merge+verify command in parallel-generation.md still includes it. Either remove it from the rules file too, or restore it to SKILL.md. The current state is contradictory.
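
Drift like items 1 and 2 could be caught mechanically. The following is a minimal sketch, assuming the example commands appear on their own lines in each file; `flags_for` and `flag_drift` are hypothetical helpers, not part of the repository, and the flag regex is deliberately simple.

```python
import re

# Matches long flags (--model) and single-letter short flags (-o),
# but not hyphens inside words such as "claim-bearing".
FLAG_RE = re.compile(r"(?<![\w-])(--[a-z][\w-]*|-[a-zA-Z])\b")


def flags_for(text: str, subcommand: str) -> set[str]:
    """Collect flags from every line that mentions `subcommand` as a token."""
    flags: set[str] = set()
    for line in text.splitlines():
        if subcommand in line.split():
            flags.update(FLAG_RE.findall(line))
    return flags


def flag_drift(first: str, second: str, subcommand: str) -> dict[str, list[str]]:
    """Report flags that appear for `subcommand` in one text but not the other."""
    a, b = flags_for(first, subcommand), flags_for(second, subcommand)
    return {"only_in_first": sorted(a - b), "only_in_second": sorted(b - a)}
```

Running `flag_drift` over the contents of SKILL.md and parallel-generation.md with `"verify"` as the subcommand would list any flag that one file uses and the other does not.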

3. Parallel-generation trigger condition diverges between files

SKILL.md states the trigger as "100+ pages AND 3+ distinct files". parallel-generation.md states it as "100+ pages AND 2+ distinct topics". Files vs. topics, and 3 vs. 2 - agents reading only SKILL.md will apply the wrong threshold. Reconcile these or have SKILL.md defer entirely to the rules file.
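
To make the divergence concrete, here are the two conditions written as predicates. The thresholds are the ones quoted above; the function names and the sample inputs are made up for illustration.

```python
def skill_md_trigger(pages: int, files: int, topics: int) -> bool:
    """Condition as stated in SKILL.md: 100+ pages AND 3+ distinct files."""
    return pages >= 100 and files >= 3


def rules_file_trigger(pages: int, files: int, topics: int) -> bool:
    """Condition as stated in parallel-generation.md: 100+ pages AND 2+ distinct topics."""
    return pages >= 100 and topics >= 2


def disagreements(cases: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Return the (pages, files, topics) cases where the two files give different answers."""
    return [c for c in cases if skill_md_trigger(*c) != rules_file_trigger(*c)]


# Hypothetical workloads: (pages, files, topics)
cases = [(150, 2, 2), (150, 3, 1), (150, 3, 2), (80, 5, 5)]
print(disagreements(cases))  # [(150, 2, 2), (150, 3, 1)]
```

A 150-page job spread over 2 files and 2 topics triggers parallel generation under the rules file but not under SKILL.md, and a 150-page, 3-file, single-topic job does the opposite.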

Content Regressions

4. Explicit alternative-tool list removed

The old block named pdfplumber, PyPDF2, Tesseract, libreoffice, curl/wget, etc. by name. The replacement ("if you are tempted to reach for a generic read tool...") is weaker - agents pattern-match on concrete names. Consider keeping the list collapsed or moving it to a rules/tool-precedence.md file the router points to.

5. Proxy and timeout hard rules silently dropped

Two HARD RULES removed from the Invariants section with no replacement anywhere:

  • Never modify proxy environment variables on individual command runs
  • Never extend command timeouts via shell wrappers (no backgrounding, no timeout 600 npx)

These are not in cloud-sandbox-constraints.md either. If moving them there, do so explicitly.
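
If the two rules are re-anchored, they are simple enough to check mechanically. The sketch below is a hypothetical lint, not something that exists in the repository: it assumes the conventional proxy variable names (`HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY`, `ALL_PROXY`) and only recognizes a leading `timeout` wrapper and a trailing `&`.

```python
import re
import shlex

# Conventional proxy variable names (assumed; compared case-insensitively).
PROXY_VARS = {"http_proxy", "https_proxy", "no_proxy", "all_proxy"}
ENV_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def hard_rule_violations(command: str) -> list[str]:
    """Return the hard rules a shell command line appears to break."""
    violations: list[str] = []
    tokens = shlex.split(command)

    # Leading VAR=value tokens are per-command environment overrides.
    i = 0
    while i < len(tokens):
        name, sep, _ = tokens[i].partition("=")
        if not sep or not ENV_NAME_RE.fullmatch(name):
            break
        if name.lower() in PROXY_VARS:
            violations.append(f"per-command proxy override: {name}")
        i += 1

    # The first real word being `timeout` means the command is wrapped.
    if tokens[i:i + 1] == ["timeout"]:
        violations.append("timeout wrapper")

    # A single trailing ampersand backgrounds the command.
    stripped = command.rstrip()
    if stripped.endswith("&") and not stripped.endswith("&&"):
        violations.append("backgrounded command")

    return violations
```

For example, `hard_rule_violations("timeout 600 npx example-cli verify")` reports the timeout wrapper, where `example-cli` is a stand-in for the real command.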

6. Cloud sandbox probing logic removed without a redirect

SKILL.md used to tell agents exactly when/how to probe for sandbox markers and explicitly warned not to gate solely on CLAUDE_CODE_REMOTE. That probing logic is gone, and the AGENTS.md router only routes to cloud-sandbox-constraints.md when an agent is already working on sandbox behavior. A first-time agent will not know to load the file. Add at minimum a note to probe for sandbox markers before the first deepcitation command.

7. prepare text flag change unexplained

The prepare example dropped the text flag, and the PR description does not say why. If the flag was removed from the CLI, a brief note confirming the change is intentional would help reviewers.

Minor Notes

8. verify html example now includes claim flag

The new html example adds a claim flag which was absent before. If this is now required, clarify when to omit it (e.g. re-running existing verified HTML).

9. Results summary format removed without replacement

The checkmark/warning/X summary line gave users a scannable result at a glance. The new Step 4 closure is less structured. If leaving format to the agent, say so explicitly.

What works well

  • Extracting parallel-generation into its own rules file is the right call; SKILL.md is much easier to scan now.
  • The HTML annotation path in section 3 is clearly written with concrete examples.
  • Triage table improvements (explicit HTML embed row, clarified "prior run" wording) are good.
  • AGENTS.md router correctly points to the new file.

Summary: The structural refactor is sound, but items 1-3 are bugs producing broken CLI commands or wrong agent behavior. Items 4-6 remove safety guardrails. Fix 1-3 before merging; decide whether 4-6 need explicit re-anchoring in the rules tree.

@bensonwong bensonwong marked this pull request as ready for review April 16, 2026 17:24
@bensonwong bensonwong merged commit 92f0090 into main Apr 16, 2026
@bensonwong bensonwong deleted the fix/verify-skill-instructions branch April 16, 2026 17:24