rework /verify around lavish + live citations; drop proprietary form specs by bensonwong · Pull Request #29 · DeepCitation/skills

bensonwong · 2026-06-23T21:42:32Z

Summary

Two related cleanups to the public skills, landed on one branch:

1. `/verify` reworked around lavish styling + DeepCitation live citations

The skill now produces a clean lavish-styled HTML report whose citations are DeepCitation's own interactive citations (click → matched phrase, evidence keyhole, page view), opened in lavish-axi for an annotate → poll → reply review loop. DeepCitation owns verification + the citation UX; lavish owns the report shell + review loop.

Corrected end-to-end against the live verify --html CLI (not assumptions):

- Binary status model: verified / unverified, anchored on the sourceContext sentence. No variance/pending/isVerbatim and no confidence score in the CLI embed. The internal ambiguity.confidence is a localization signal and is never surfaced.
- CITATION_DATA must be a single JSON object grouped by attachmentId (a flat list is rejected with "No valid CITATION_DATA block found").
- verify --html writes {stem}-verified.html (ignores --out); --local-only keeps it off "My Verifications".
- Coexistence: a post-embed data-lavish-action sweep over [data-citation-key] so citation clicks reach DeepCitation's popover while prose stays commentable (verified against artifact-sdk.js: all three lavish handlers bail on isLavishAction).
- Accepted prepare inputs documented (PDF / image / Office / CSV-TSV / URL; .txt rejected).
- Adds a physician chart-prep test scenario (docs/scenario-physician-chart-prep.md) as a manual acceptance script.
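
To illustrate the CITATION_DATA grouping rule above, here is a minimal sketch that turns a flat citation list into a single object keyed by attachmentId. Only the "one JSON object grouped by attachmentId" requirement comes from this PR. The helper name `group_by_attachment`, the list-of-entries shape under each key, and the placement of `sourceMatch` / `sourceContext` inside each entry are assumptions for illustration, not the confirmed schema.

```python
import json
from collections import defaultdict


def group_by_attachment(flat_citations):
    """Group a flat list of citation dicts into one dict keyed by attachmentId.

    Hypothetical helper: the per-entry fields are assumed, not the real schema.
    """
    grouped = defaultdict(list)
    for citation in flat_citations:
        entry = dict(citation)  # copy so the caller's dicts are not mutated
        attachment_id = entry.pop("attachmentId")
        grouped[attachment_id].append(entry)
    return dict(grouped)


# Flat list: the shape the PR says is rejected.
flat = [
    {"attachmentId": "att_1", "sourceMatch": "120/80",
     "sourceContext": "Blood pressure was 120/80 at rest."},
    {"attachmentId": "att_1", "sourceMatch": "no known allergies",
     "sourceContext": "Patient reports no known allergies."},
    {"attachmentId": "att_2", "sourceMatch": "metformin 500 mg",
     "sourceContext": "Current medication: metformin 500 mg daily."},
]

citation_data = group_by_attachment(flat)
print(json.dumps(citation_data, indent=2))
```

The attachment IDs here are placeholders; in the real flow they would be captured from the prepare output.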

2. Remove proprietary content from the MIT submodule

Drops the [Judgement]/[Missing] marker teaching plus the medical form specs, rules, scripts, and source PDFs that don't belong in the open-source skills repo.

Test plan

/verify pipeline run end-to-end on a real OCR'd chart: auth → prepare → author → verify --html (553 KB interactive report) → popover inspection → lavish coexistence. Every documented behavior is observed, not assumed.
Each correction in the skill is backed by an observed CLI result (binary status, attachmentId grouping, -verified.html naming, data-lavish-action exclusion).
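
Because verify --html ignores --out, a caller has to derive the artifact name rather than choose it. A minimal sketch of the -verified.html naming noted above, assuming the artifact is written next to the authored file (the PR states the naming pattern but not the directory); `expected_verified_path` is a hypothetical helper.

```python
from pathlib import Path


def expected_verified_path(authored_html):
    """Return the {stem}-verified.html path for an authored report.

    Assumption: the artifact lands in the same directory as the input.
    """
    authored = Path(authored_html)
    return authored.with_name(f"{authored.stem}-verified.html")


print(expected_verified_path("reports/chart-prep.html"))
```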

Replace the bracket-token markers with plain prose throughout the form-fill skill:
- SKILL.md / field-citation-mapping.md: physician-judgment fields now state in plain language what the signing physician must confirm, instead of appending a [Physician Judgment] token.
- manual-copy-output.md: drop [Missing] from the copy-block exclusion list; undocumented fields are stated as prose.
- lint-form-draft.mjs: COPY_BLOCK_FORBIDDEN now detects the prose conventions ('not documented in the available records', 'physician must confirm') rather than the old marker tokens; drop the obsolete MISSING_OUTSIDE_COPY rule.

The markers read as distracting, unprofessional UI for the first release; the verbose prose carries the same meaning to physicians.
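
The COPY_BLOCK_FORBIDDEN change described above can be sketched as a phrase scan over a copy block. This is an illustrative Python stand-in, not the contents of lint-form-draft.mjs: the two phrases come from the commit message, while the function name, the case-insensitive matching, and the line-number reporting are assumptions.

```python
import re

# Prose conventions that must not appear inside a copy block (phrases from the
# commit message; case-insensitive matching is an assumption).
COPY_BLOCK_FORBIDDEN = [
    re.compile(r"not documented in the available records", re.IGNORECASE),
    re.compile(r"physician must confirm", re.IGNORECASE),
]


def find_forbidden_phrases(copy_block):
    """Return (line_number, pattern) pairs for each forbidden phrase found."""
    hits = []
    for line_number, line in enumerate(copy_block.splitlines(), start=1):
        for pattern in COPY_BLOCK_FORBIDDEN:
            if pattern.search(line):
                hits.append((line_number, pattern.pattern))
    return hits


sample = (
    "Diagnosis: type 2 diabetes.\n"
    "Onset date: not documented in the available records.\n"
    "Prognosis: the signing physician must confirm expected duration.\n"
)
for line_number, phrase in find_forbidden_phrases(sample):
    print(f"line {line_number}: forbidden phrase '{phrase}'")
```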

The 15 medical form specs + the forms/INDEX.md registry are proprietary, jurisdiction-specific work product. They were moved to packages/internal-skills (non-OSS) and are the source of truth for formSpecs.generated.ts. The copies here were redundant and out of place in the MIT-licensed skill library. review-perspectives.md no longer points reviewers at a forms/medical/ path — the active form spec is provided to the workflow at runtime.

Settle the unused verify skill on a single model: lavish-axi owns the report shell and the annotate/poll/reply review loop; DeepCitation owns verification and the interactive citation UX, embedded via verify --html. No confidence scores, no invented verification surfaces (evidence table / status grid / discrepancy list), no hero framing, no runtime adversarial roster; status is DeepCitation's discrete verified/variance/unverified/pending badge.
- SKILL.md: embed model, status table, two-lane review loop, invariants
- rules/lavish-loop.md: styling+loop+verify --html embed, [data-citation-key] coexistence
- rules/cloud-sandbox-constraints.md, parallel-generation.md: reconciled to verify --html
- AGENTS.md: router collapsed (roster + playbooks entries removed)
- delete playbooks/* and rules/runtime-roster.md (invented surfaces / roster)
- docs/scenario-physician-chart-prep.md: manual test walkthrough

Local end-to-end testing of the pipeline (prepare → author → verify --html → lavish) surfaced five factual errors and a model mismatch; fix all:
- verification model is BINARY (verified/unverified), anchored on sourceContext, not the web-app's 4-state verified/variance/unverified/pending. Badge follows the context sentence, not the key, so author must make sourceContext support the claim and sourceMatch a verbatim substring. ambiguity.confidence is a localization signal, never rendered, never surfaced.
- CITATION_DATA block must be a single JSON object grouped by attachmentId; a flat list fails 'No valid CITATION_DATA block found'. Capture attachmentId from prepare.
- prepare rejects plain .txt (PDF/image/Office/CSV-TSV/ODF/URL only).
- verify --html ignores --out and writes {stem}-verified.html; pass --local-only to avoid auto-upload. Authored file is <topic>.html, artifact <topic>-verified.html.
- coexistence is a POST-embed sweep: add data-lavish-action to [data-citation-key] (keys are hashed at verify time, can't pre-mark). Confirmed against artifact-sdk.js that all three handlers (hover/select/click) honor it.
- scenario doc checks updated to binary + 'click is what verifies' caveat.

Test artifacts kept local under scratch/ (uncommitted).
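
The post-embed coexistence sweep described above can be sketched as a single regex pass over the verified HTML. The skill itself uses a perl one-liner; this Python version is a stand-in under stated assumptions: `data-lavish-action` is added as a bare attribute (the PR does not say whether it carries a value), attribute values contain no `<` or `>`, and `mark_citations` is a hypothetical name.

```python
import re

# Opening tag that carries data-citation-key. [^<>]* keeps the match inside a
# single tag and also spans newlines, so multi-line attribute lists are covered.
CITATION_TAG = re.compile(
    r"<([a-zA-Z][\w-]*)\b([^<>]*\bdata-citation-key\b[^<>]*)>"
)


def mark_citations(html):
    """Add data-lavish-action to every element that has data-citation-key."""
    def add_marker(match):
        tag, attrs = match.group(1), match.group(2)
        if "data-lavish-action" in attrs:
            return match.group(0)  # already marked: keep the sweep idempotent
        return f"<{tag} data-lavish-action{attrs}>"

    return CITATION_TAG.sub(add_marker, html)


html = '<p>Prose <span class="cite" data-citation-key="abc123">claim</span> more.</p>'
print(mark_citations(html))
```

Running the sweep after verify matters because, per the commit message, citation keys are hashed at verify time and cannot be marked in the authored file.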

claude · 2026-06-23T21:48:55Z

test

claude · 2026-06-23T21:50:06Z

PR Review

Overall this is a well-motivated rework. The division-of-labor model (DeepCitation owns verification + citation UX; lavish owns the report shell + review loop) is a cleaner architecture, and the status model corrections are accurate. A few things worth discussing before merging.

Issues

1. attachmentId capture -- not shown how to get it from stderr

SKILL.md step 2 says the attachmentId is 'printed on stderr' but the code block only redirects stdout, so an agent cannot observe stderr during the run. Since the PR also says it is 'stored as attachmentId in the JSON', the simpler fix is to drop the 'printed on stderr' parenthetical and just say 'read attachmentId from the JSON.' Otherwise show the separate stderr redirect explicitly. The ambiguity could cause an agent to stall looking for something it cannot see.
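
A minimal sketch of the "read attachmentId from the JSON" route, assuming the prepare output is a JSON object with a top-level `attachmentId` field (the field name is from the PR; its position in the payload and the helper name `read_attachment_id` are assumptions):

```python
import json


def read_attachment_id(prepare_json_text):
    """Extract attachmentId from prepare's JSON output, failing loudly if absent."""
    payload = json.loads(prepare_json_text)
    attachment_id = payload.get("attachmentId")
    if not attachment_id:
        raise ValueError("prepare output has no attachmentId")
    return attachment_id


# Placeholder payload standing in for the file that stdout was redirected to.
sample_output = '{"attachmentId": "att_1", "pages": 12}'
print(read_attachment_id(sample_output))
```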

2. auth.md narrowed verify out of the 'action needed' check -- is that intentional?

Old: 'If prepare or verify output contains action needed...'
New: 'If prepare output contains action needed...'

verify --html hits the API too. If a session token expires between prepare and verify, that error path now has no documented recovery. If this cannot happen in practice (verify reuses an already-established auth context), a one-line rationale in auth.md would close the ambiguity.

3. Comprehensiveness guidance dropped without a replacement

The old 'Comprehensiveness' section guarded against agents that answer the easy sub-question deeply and the hard one shallowly. Nothing in the new SKILL.md fills that gap. A brief note in step 3 or the invariants would suffice.

4. Per-citation SELF-CHECK removed

The old SKILL.md had a 4-step in-flow self-check and a STOP AND CHECK gate before verify, both enforcing that k is a verbatim substring of f. The new skill documents the rule in the status model section but drops the in-flow reminder. A one-line checkpoint in step 3 or the invariants would prevent agents from discovering the constraint only when verify flags bad anchors.

Smaller observations

- verify --html ignoring --out -- correctly documented in SKILL.md step 4 and cloud-sandbox-constraints. Good catch.
- 2-redirect prohibition on prepare -- explicitly noted and explained. Correct.
- perl sweep for data-lavish-action -- regex looks right for single-line attributes. Multi-line edge case is unlikely from verify --html output; acceptable as-is.
- Scenario doc -- medical examples are appropriate in a test file; the disclaimer is correctly placed; acceptance checks are specific and testable. LGTM.
- Form-fill removal -- clean. Worth confirming no downstream host caches the old skill list in a way that ghost-routes /form-fill prompts post-merge.
- --local-only -- consistently documented across SKILL.md, cloud-sandbox-constraints, and lavish-loop. Good.

Summary

| Area | Status |
| --- | --- |
| Status model (binary verified/unverified) | Correct |
| Output path (stem-verified.html, ignores --out) | Correct |
| lavish/DeepCitation division of labor | Clean |
| data-lavish-action coexistence sweep | Well-documented |
| prepare stderr / attachmentId capture | Needs clarification |
| auth.md scope narrowing | Needs rationale or reversion |
| Comprehensiveness guidance | Lost in revision |
| Per-citation SELF-CHECK | Lost in revision |
| form-fill removal | Appropriate |
| Scenario test doc | Correctly scoped |

…JSON, restore self-check
- auth.md: 'action needed' recovery now explicitly covers verify, not just prepare (the session token can expire between prepare and verify; observed in live testing).
- SKILL.md step 2: read attachmentId from the prepare JSON (the only redirect is stdout, so stderr isn't observable mid-run).
- SKILL.md step 3: restore the in-flow per-citation self-check (k must be a verbatim substring of f) and the comprehensiveness reminder dropped in the rework.

bensonwong · 2026-06-23T23:47:03Z

Thanks — addressed all four findings in 5acae36:

- attachmentId capture: step 2 now reads attachmentId from the prepare JSON (attachmentId field) instead of stderr, since the run only redirects stdout and stderr isn't observable mid-run.
- auth scope: reverted the narrowing. auth.md now applies the "action needed" recovery to prepare or verify, with a one-line rationale: the session token can expire between the two (both hit the API). This wasn't hypothetical; it happened during live end-to-end testing of the skill.
- Comprehensiveness: restored as an in-flow line in step 3: "answer the hard part as fully as the easy part; a deep answer to the easy half is a failure."
- Per-citation self-check: restored in step 3: find the verbatim f first, derive k as a word-for-word substring of f, fix f first if it isn't. Don't wait for verify to flag a bad anchor.
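
The restored self-check can be expressed as a pre-verify gate. A minimal sketch, assuming each authored citation is available as a dict with string fields `k` and `f` (the substring rule is from the skill; the dict shape and the helper name `failing_citations` are assumptions):

```python
def failing_citations(citations):
    """Return citations whose k is empty or not a verbatim substring of f."""
    return [c for c in citations if not c["k"] or c["k"] not in c["f"]]


citations = [
    {"k": "metformin 500 mg", "f": "Current medication: metformin 500 mg daily."},
    # Paraphrased anchor: the wording differs from f, so it fails the check.
    {"k": "500mg of metformin", "f": "Current medication: metformin 500 mg daily."},
]

for bad in failing_citations(citations):
    print(f"fix before verify: {bad['k']!r} is not a substring of its f")
```

The check is case- and whitespace-sensitive on purpose, since the rule is word-for-word matching.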

Smaller observations (form-fill ghost-routing, multi-line perl edge case) noted; both are acceptable as-is per your assessment.

bensonwong added 14 commits May 16, 2026 23:38

chore: update disability package asset

54ff5a3

chore: remove moved form-fill bundle

b878ff6

docs(verify): add lavish review loop

18774f0

docs(verify): clarify lavish polling contract

6af6d78

docs(verify): refine runtime roster tiers

3c582fa

docs(verify): align annotated report playbook

5ae56f4

docs(verify): align evidence table playbook

958d0e6

docs(verify): align discrepancy and sandbox guidance

ec2905a

docs(verify): narrow auth recovery to prepare

112702b

docs(verify): route lavish workflow guidance

2ee5028

bensonwong merged commit d5605db into main Jun 23, 2026

bensonwong deleted the feat/judgement-marker branch June 23, 2026 23:52

bensonwong merged 15 commits into main from feat/judgement-marker
