[codex] Refine verify skill instructions by bensonwong · Pull Request #27 · DeepCitation/skills

bensonwong · 2026-04-16T17:03:32Z

What changed

Tightened the skills/verify/SKILL.md guidance around claim/evidence triage for HTML and other claim-bearing inputs.
Simplified the prepare example command and removed the stale --model flag from the verify example.
Cleaned up a few wording issues in the instruction text.

Why

The skill should reflect the current CLI surface and keep the HTML embed path explicit.
The wording changes make the guidance easier to follow for agents using the verification workflow.

Validation

Reviewed the modified skills/verify/SKILL.md diff locally.
Confirmed the branch is pushed cleanly to origin.

…es file

- New §3 "HTML annotation path": guides agents to annotate a source HTML file with data-cite="N" attributes and append <<<CITATION_DATA>>> after </html>, then run verify --html to preserve original HTML structure
- New §2 triage row for "embed citations into static HTML" case; narrows the existing "Existing verified HTML" row to CLI-prior-run only
- §4 --html note updated to cover both the embed-into and re-verify cases
- Parallel generation guidance extracted to rules/parallel-generation.md; SKILL.md now defers to it for 100+ page / 3+ file scenarios
- AGENTS.md guidance router updated with parallel-generation.md entry
- Removed cloud-sandbox probe block, proxy invariants, and tool alternatives list (moved to their respective rules files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
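The HTML annotation path in the commit message above (data-cite="N" attributes in the markup, then <<<CITATION_DATA>>> appended after </html>) can be illustrated with a minimal Python sketch. The marker string and the data-cite attribute come from the commit message; the helper name `embed_citations`, the JSON payload shape, and the newline layout are assumptions made only for illustration, not the skill's documented format.

```python
import json
import re

MARKER = "<<<CITATION_DATA>>>"  # marker string quoted in the commit message


def embed_citations(html: str, citations: dict) -> str:
    """Append a citation payload after the last closing </html> tag.

    Assumption: the payload is JSON keyed by the N used in data-cite="N".
    The real payload format is defined by the skill, not by this sketch.
    """
    matches = list(re.finditer(r"</html\s*>", html, re.IGNORECASE))
    if not matches:
        raise ValueError("input has no closing </html> tag")
    # Keep everything up to and including </html>; trailing text is dropped.
    head = html[: matches[-1].end()]
    payload = json.dumps(citations, indent=2)
    return f"{head}\n{MARKER}\n{payload}\n"


# Hypothetical page: one paragraph annotated as citation 1.
page = '<html><body><p data-cite="1">Revenue grew 12%.</p></body></html>'
result = embed_citations(page, {"1": {"claim": "Revenue grew 12%."}})
```

Because the payload is placed after the closing tag rather than inside the markup, the original HTML structure is left untouched, which is the property the commit message says the --html path is meant to preserve.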

claude · 2026-04-16T17:18:58Z

PR Review

The restructuring intent is good - pulling parallel-generation detail into its own rules file reduces cognitive load for single-document tasks. A few issues need attention before merging.

Bugs / Inconsistencies

1. Flag mismatch: short vs long markdown output flag

SKILL.md section 4 now uses the short form of the markdown output flag, but parallel-generation.md still uses the long form. Whichever form the CLI does not accept will fail at runtime. Reconcile them across both files.

2. Model flag removed in SKILL.md but kept in parallel-generation.md

The main verify command in SKILL.md dropped the model flag, matching the PR description. But the merge+verify command in parallel-generation.md still includes it. Either remove it from the rules file too, or restore it to SKILL.md. The current state is contradictory.

3. Parallel-generation trigger condition diverges between files

SKILL.md states the trigger as "100+ pages AND 3+ distinct files". parallel-generation.md states it as "100+ pages AND 2+ distinct topics". Files vs. topics, and 3 vs. 2 - agents reading only SKILL.md will apply the wrong threshold. Reconcile these or have SKILL.md defer entirely to the rules file.
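Drift of the kind described in items 1 and 2 could be caught mechanically before review. The following is a minimal Python sketch, not part of the repository: it extracts CLI-style flags from two documents and reports the ones that appear in only one of them. The function names and the sample excerpts are hypothetical; --html and --model are the only flag names taken from this PR.

```python
import re

# Matches -x or --long-name, but not the hyphen inside words like "claim-bearing".
FLAG_RE = re.compile(r"(?<![\w-])--?[A-Za-z][\w-]*")


def flags_in(text: str) -> set[str]:
    """Collect every CLI-style flag mentioned in the text."""
    return set(FLAG_RE.findall(text))


def flag_drift(first: str, second: str) -> tuple[set[str], set[str]]:
    """Return (flags only in first, flags only in second)."""
    a, b = flags_in(first), flags_in(second)
    return a - b, b - a


# Made-up excerpts standing in for SKILL.md and parallel-generation.md.
skill_excerpt = "verify --html report.html"
rules_excerpt = "verify --html report.html --model example-model"

only_in_skill, only_in_rules = flag_drift(skill_excerpt, rules_excerpt)
print(only_in_skill, only_in_rules)  # set() {'--model'}
```

A check like this only compares spellings; it cannot tell which form the CLI actually accepts, so a reported difference still needs a human decision about which file to change.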

Content Regressions

4. Explicit alternative-tool list removed

The old block named pdfplumber, PyPDF2, Tesseract, libreoffice, curl/wget, etc. by name. The replacement ("if you are tempted to reach for a generic read tool...") is weaker - agents pattern-match on concrete names. Consider keeping the list collapsed or moving it to a rules/tool-precedence.md file the router points to.

5. Proxy and timeout hard rules silently dropped

Two HARD RULES were removed from the Invariants section with no replacement anywhere:

Never modify proxy environment variables on individual command runs
Never extend command timeouts via shell wrappers (no backgrounding, no timeout 600 npx)

These are not in cloud-sandbox-constraints.md either. If moving them there, do so explicitly.
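To make the two dropped rules concrete, here is a minimal Python sketch of a checker that flags a shell command violating either of them. It is an illustration only: the function name, the list of proxy variable names (the conventional HTTP_PROXY family), and the placeholder command `some-cli` are assumptions, and the check covers just the patterns the review names (per-command proxy overrides, a `timeout` wrapper, and backgrounding with `&`).

```python
import re
import shlex

# Conventional proxy variable names; assumed here, not listed in the PR.
PROXY_VARS = {"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY", "ALL_PROXY"}
ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=.*", re.DOTALL)


def rule_violations(command: str) -> list[str]:
    """List which of the two hard rules a single shell command breaks."""
    tokens = shlex.split(command)
    problems = []
    index = 0
    # Leading VAR=value tokens are per-command environment overrides.
    while index < len(tokens):
        match = ASSIGNMENT.fullmatch(tokens[index])
        if match is None:
            break
        if match.group(1).upper() in PROXY_VARS:
            problems.append(f"sets {match.group(1)} for a single command")
        index += 1
    rest = tokens[index:]
    if rest and rest[0] == "timeout":
        problems.append("wraps the command in timeout")
    if rest and rest[-1] == "&":
        problems.append("runs the command in the background")
    return problems


print(rule_violations("timeout 600 npx some-cli verify"))
print(rule_violations("HTTPS_PROXY= npx some-cli verify &"))
print(rule_violations("npx some-cli verify"))
```

The first call reports the timeout wrapper, the second reports both the proxy override and the backgrounding, and the third returns an empty list.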

6. Cloud sandbox probing logic removed without a redirect

SKILL.md used to tell agents exactly when/how to probe for sandbox markers and explicitly warned not to gate solely on CLAUDE_CODE_REMOTE. That probing logic is gone, and the AGENTS.md router only routes to cloud-sandbox-constraints.md when an agent is already working on sandbox behavior. A first-time agent will not know to load the file. Add at minimum a note to probe for sandbox markers before the first deepcitation command.
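The point about not gating solely on CLAUDE_CODE_REMOTE can be sketched as a probe that treats any of several markers as a signal. This is a hedged Python illustration: only CLAUDE_CODE_REMOTE is named in the review, the other two marker names are hypothetical placeholders, and the removed SKILL.md block may have probed things other than environment variables.

```python
import os
from collections.abc import Mapping

# CLAUDE_CODE_REMOTE is named in the review; the EXAMPLE_* names are
# hypothetical placeholders for whatever other markers the rules file lists.
SANDBOX_MARKERS = (
    "CLAUDE_CODE_REMOTE",
    "EXAMPLE_SANDBOX_ID",
    "EXAMPLE_CONTAINER_RUNTIME",
)


def sandbox_signals(env: Mapping[str, str], markers=SANDBOX_MARKERS) -> list[str]:
    """Return the marker variables that are set to a non-empty value."""
    return [name for name in markers if env.get(name)]


def should_load_sandbox_rules(env: Mapping[str, str]) -> bool:
    """True if any marker is present, not only CLAUDE_CODE_REMOTE."""
    return bool(sandbox_signals(env))


if __name__ == "__main__":
    # Probe the real environment once, before the first CLI command.
    print(sandbox_signals(os.environ))
```

Running the probe once up front, and loading cloud-sandbox-constraints.md when it returns anything, is one way to give a first-time agent the redirect the review asks for.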

7. prepare text flag change unexplained

The prepare example dropped the text flag. If it was removed from the CLI, a brief note confirming this is intentional would help reviewers.

Minor Notes

8. verify html example now includes claim flag

The new html example adds a claim flag which was absent before. If this is now required, clarify when to omit it (e.g. re-running existing verified HTML).

9. Results summary format removed without replacement

The checkmark/warning/X summary line gave users a scannable result at a glance. The new Step 4 closure is less structured. If leaving format to the agent, say so explicitly.
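For reference, a scannable summary line of the checkmark/warning/X kind could look like the following Python sketch. The category names (verified, partial, not found) and the exact layout are assumptions; the removed SKILL.md format is not quoted in this PR.

```python
def summary_line(verified: int, partial: int, not_found: int) -> str:
    """Format a one-line result summary with check, warning, and X glyphs."""
    total = verified + partial + not_found
    return (
        f"\u2713 {verified} verified  "
        f"\u26a0 {partial} partial  "
        f"\u2717 {not_found} not found  "
        f"({total} citations)"
    )


print(summary_line(3, 1, 0))
```

If the format is intentionally left to the agent, a fixed template like this is unnecessary, but stating that choice in Step 4 would address the review's concern.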

What works well

Extracting parallel-generation into its own rules file is the right call; SKILL.md is much easier to scan now.
The HTML annotation path in section 3 is clearly written with concrete examples.
Triage table improvements (explicit HTML embed row, clarified "prior run" wording) are good.
AGENTS.md router correctly points to the new file.

Summary: The structural refactor is sound, but items 1-3 are bugs producing broken CLI commands or wrong agent behavior. Items 4-6 remove safety guardrails. Fix 1-3 before merging; decide whether 4-6 need explicit re-anchoring in the rules tree.

bensonwong and others added 2 commits April 15, 2026 19:46

Refine verify skill instructions

edea977

Align parallel generation rules

e4f4970

bensonwong marked this pull request as ready for review April 16, 2026 17:24

bensonwong merged commit 92f0090 into main Apr 16, 2026

bensonwong deleted the fix/verify-skill-instructions branch April 16, 2026 17:24

bensonwong merged 3 commits into main from fix/verify-skill-instructions
