refactor(verify): narrow trigger to claim+evidence; drop read-only mode #28

Merged
bensonwong merged 2 commits into main from refactor/verify-narrow-to-claim-evidence on Apr 17, 2026

@bensonwong

Contributor

Summary

  • Skill was firing too broadly — description and §1 Orient included OCR, extract, summarize, read, parse as triggers, so /verify ran even when the user only wanted file contents
  • Dropped read-only mode from §1 — skill now requires both a claim and an evidence source; plain document reading uses prepare directly and answers normally
  • Removed Tool precedence section ("ALWAYS use this skill for PDFs…") — overclaim replaced with a single clear note: use prepare for reading, /verify for citing
  • Dropped §2 Read-only fast path (~13 lines) — no longer a concern of this skill
  • Triage table 8→6 rows — removed read-only row and the self-correction loop ("you prepared the claims file as evidence")
  • Collapsed Format 2 k≠claimText subsection (8 lines) into a 2-line gotcha note — concept still enforced by field table and STOP AND CHECK
  • Updated frontmatter description to verify-only scope
  • Net: 309 → 271 lines (−38)

Test plan

  • Prompt with only a PDF and "summarize this" — skill should NOT fire; model uses prepare and answers directly
  • Prompt with a claim + evidence doc — skill fires, full §1→§4 pipeline runs
  • /verify in prompt — skill fires regardless
  • HTML embed case — §2 triage routes to HTML annotation path correctly
  • Existing verified HTML re-run — triage routes to verify --html correctly
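
The trigger rule this test plan exercises can be sketched as a small routing function. This is only an illustration: the skill itself is a prompt document, not code, and `Prompt`, `route`, and the three boolean flags are hypothetical names introduced here, not part of the skill or its tooling. The HTML triage cases are left out because the PR does not spell out their conditions.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    """Hypothetical summary of what the user's message contains."""
    has_claim: bool                # a statement the user wants checked or cited
    has_evidence: bool             # a document, file, or URL to check it against
    mentions_verify: bool = False  # the literal "/verify" appears in the prompt


def route(prompt: Prompt) -> str:
    """Return which path handles the prompt under the narrowed trigger."""
    if prompt.mentions_verify:
        # Explicit invocation fires the skill regardless of other signals.
        return "verify"
    if prompt.has_claim and prompt.has_evidence:
        # Both a claim and an evidence source: run the full pipeline.
        return "verify"
    if prompt.has_evidence:
        # Document only ("summarize this"): read with prepare, answer normally.
        return "prepare"
    # Nothing to read and nothing to cite.
    return "answer"
```

Under these assumptions, a PDF plus "summarize this" maps to `Prompt(has_claim=False, has_evidence=True)` and routes to `prepare`, which is the first case in the test plan.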

Skill was firing on any document mention (read, OCR, summarize, extract)
and running the full pipeline when the user only wanted file contents.

- Remove read-only mode from §1 Orient — skill now requires both a claim
  and an evidence source to run
- Drop Tool precedence section ("ALWAYS use this skill for PDFs...")
- Drop §2 Read-only fast path (13 lines)
- Remove read-only and self-correction-loop rows from §2 triage table
  (8 rows → 6)
- Collapse redundant Format 2 k≠claimText subsection into a 2-line gotcha
- Update frontmatter description to reflect verify-only scope
- Add explicit note: if only a document is provided, use prepare directly
  and answer normally; /verify only kicks in when there is something to cite

309 → 271 lines

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Apr 16, 2026

Review: refactor(verify): narrow trigger; drop read-only mode

Overall the refactor is sound. Narrowing the trigger is correct — the old description fired the skill on plain reads, burning steps unnecessarily. The changes are internally consistent.


What works well

Frontmatter description is now precise. The old version was a catch-all; the new one correctly scopes to verify/cite/fact-check.

citation-anchors.md reference block (lines 96-103) is now a structured when-to-consult list rather than an afterthought at the end of the Format 2 subsection. Clearer decision criteria for when to open that rules file — a genuine improvement.

Field table k note (line 166) captures both formats in one place. Cleaner than the old 8-line subsection, and STOP AND CHECK (line 208) reinforces it identically.

Triage table simplification is justified. The "you prepared claims as evidence" row was a self-correction path that the narrower trigger now prevents upstream.


Minor gaps worth addressing

1. Follow-up verification: no guidance on re-using an existing .txt

Line 15 introduces: "If the user first asks a question about a document, answer it using prepare. If they then ask you to verify that answer, run this skill..."

But there is no hint about whether to re-run prepare or reuse .deepcitation/name.txt if it still exists. The removed read-only fast path had this explicitly: "If the same document was used and .deepcitation/name.txt still exists on disk, no need to re-run prepare."

Worth preserving, either appended to line 15 or as a note in §2 under the file/URL row. Without it, an agent will re-run prepare unnecessarily on a document it has already processed.
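
A minimal sketch of the reuse check, assuming the prepared text lives at `.deepcitation/<name>.txt` as quoted above. The function name `needs_prepare` and its parameters are hypothetical, and the check only tests that the file exists; it does not confirm that the file came from the same document, which the removed fast-path wording also required.

```python
from pathlib import Path


def needs_prepare(doc_name: str, workdir: Path = Path(".")) -> bool:
    """Return True when no prepared text exists yet for doc_name.

    Assumes a prior prepare run wrote .deepcitation/<doc_name>.txt
    under workdir (path layout taken from the review comment).
    """
    cached = workdir / ".deepcitation" / f"{doc_name}.txt"
    return not cached.is_file()
```

An agent following the note would call prepare only when this returns `True`, and otherwise reuse the existing file for the verification step.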

2. Format 2 gotcha example (low priority)

"converts automatically" reads like a SAFE agreement. AGENTS.md favors generic, domain-neutral examples (invoices, leases, contracts, memos). A SAFE is a contract so it is technically fine, but something more self-evidently neutral — e.g. "expires automatically" from a lease — would better match the spirit of the invariant.

3. Test plan gap

No coverage for the follow-up scenario introduced on line 15 (user reads first, then verifies). Worth adding: prepare a doc, get an answer, then ask to verify — skill fires, prepare is skipped if .txt exists.


No issues with

  • Removal of the Tool precedence section ("ALWAYS use this skill") — overclaim, rightly gone
  • Preamble + prepare in same turn (unchanged, still correct)
  • HTML annotation path (untouched)
  • STOP AND CHECK (untouched)
  • Invariants section (untouched)

Net: approve. The re-prepare optimization is the one substantive thing worth adding before merge; the rest are suggestions.

- Add re-prepare optimization note to follow-up scenario (line 15):
  if .deepcitation/<name>.txt exists from a prior prepare run, skip §2
  and go straight to §3 — avoids redundant prepare on same doc [must-fix]
- Swap SAFE-flavored Format 2 gotcha example ("converts automatically")
  for a lease-neutral one ("renewed automatically" / "automatically renew")
  per AGENTS.md domain-neutral example invariant [nice-to-have]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bensonwong

Contributor Author

@claude review the latest changes

bensonwong merged commit 271b8bb into main on Apr 17, 2026
bensonwong deleted the refactor/verify-narrow-to-claim-evidence branch on April 17, 2026 00:01