rework /verify around lavish + live citations; drop proprietary form specs #29
Replace the bracket-token markers with plain prose throughout the
form-fill skill:
- SKILL.md / field-citation-mapping.md: physician-judgment fields now
state in plain language what the signing physician must confirm,
instead of appending a [Physician Judgment] token.
- manual-copy-output.md: drop [Missing] from the copy-block exclusion
list; undocumented fields are stated as prose.
- lint-form-draft.mjs: COPY_BLOCK_FORBIDDEN now detects the prose
conventions ('not documented in the available records',
'physician must confirm') rather than the old marker tokens; drop
the obsolete MISSING_OUTSIDE_COPY rule.
The markers read as distracting, unprofessional UI for the first
release; the verbose prose carries the same meaning to physicians.
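A minimal sketch of how a phrase-based copy-block rule might look, assuming the lint works line by line over the copy-block text. The rule ids, the `lintCopyBlock` helper, and the shape of the array are hypothetical; the actual COPY_BLOCK_FORBIDDEN in lint-form-draft.mjs may be structured differently. Only the two phrases come from the commit message.

```javascript
// Hypothetical shape: the real COPY_BLOCK_FORBIDDEN may differ.
// The two phrases are the prose conventions named in the commit message.
const COPY_BLOCK_FORBIDDEN = [
  { id: "undocumented-field", pattern: /not documented in the available records/i },
  { id: "physician-confirmation", pattern: /physician must confirm/i },
];

// Returns one finding per forbidden phrase found on a line of the copy block.
function lintCopyBlock(copyBlockText) {
  const findings = [];
  const lines = copyBlockText.split(/\r?\n/);
  lines.forEach((line, index) => {
    for (const rule of COPY_BLOCK_FORBIDDEN) {
      if (rule.pattern.test(line)) {
        findings.push({ rule: rule.id, line: index + 1, text: line.trim() });
      }
    }
  });
  return findings;
}
```

Matching on phrases rather than bracket tokens means the lint keeps working as long as the prose conventions stay stable, so the wording in the skill docs and the patterns here would need to change together.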
The 15 medical form specs + the forms/INDEX.md registry are proprietary, jurisdiction-specific work product. They were moved to packages/internal-skills (non-OSS) and are the source of truth for formSpecs.generated.ts. The copies here were redundant and out of place in the MIT-licensed skill library. review-perspectives.md no longer points reviewers at a forms/medical/ path — the active form spec is provided to the workflow at runtime.
Settle the unused verify skill on a single model: lavish-axi owns the report shell and the annotate/poll/reply review loop; DeepCitation owns verification and the interactive citation UX, embedded via verify --html. No confidence scores, no invented verification surfaces (evidence table / status grid / discrepancy list), no hero framing, no runtime adversarial roster. Status is DeepCitation's discrete verified/variance/unverified/pending badge.
- SKILL.md: embed model, status table, two-lane review loop, invariants
- rules/lavish-loop.md: styling+loop+verify --html embed, [data-citation-key] coexistence
- rules/cloud-sandbox-constraints.md, parallel-generation.md: reconciled to verify --html
- AGENTS.md: router collapsed (roster + playbooks entries removed)
- delete playbooks/* and rules/runtime-roster.md (invented surfaces / roster)
- docs/scenario-physician-chart-prep.md: manual test walkthrough
Local end-to-end testing of the pipeline (prepare → author → verify --html →
lavish) surfaced five factual errors and a model mismatch; fix all:
- verification model is BINARY (verified/unverified), anchored on sourceContext,
not the web-app's 4-state verified/variance/unverified/pending. Badge follows
the context sentence, not the key — so author must make sourceContext support
the claim and sourceMatch a verbatim substring. ambiguity.confidence is a
localization signal, never rendered, never surfaced.
- CITATION_DATA block must be a single JSON object grouped by attachmentId;
a flat list fails 'No valid CITATION_DATA block found'. Capture attachmentId
from prepare.
- prepare rejects plain .txt (PDF/image/Office/CSV-TSV/ODF/URL only).
- verify --html ignores --out and writes {stem}-verified.html; pass --local-only
to avoid auto-upload. Authored file is <topic>.html, artifact <topic>-verified.html.
- coexistence is a POST-embed sweep: add data-lavish-action to [data-citation-key]
(keys are hashed at verify time, can't pre-mark). Confirmed against artifact-sdk.js
that all three handlers (hover/select/click) honor it.
- scenario doc checks updated to binary + 'click is what verifies' caveat.
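The post-embed sweep described in the coexistence item could be sketched roughly as follows. This is an illustration, not the skill's actual script: the function name is hypothetical, and the empty attribute value is an assumption, since the commit message only says the attribute is added.

```javascript
// Post-embed sweep: mark every citation element so the lavish handlers skip it.
// `root` is anything exposing querySelectorAll (a Document in the browser).
function markCitationsAsLavishActions(root) {
  const citations = root.querySelectorAll("[data-citation-key]");
  let marked = 0;
  for (const el of citations) {
    if (!el.hasAttribute("data-lavish-action")) {
      // Assumed value: the source does not say what the attribute should contain.
      el.setAttribute("data-lavish-action", "");
      marked += 1;
    }
  }
  return marked; // number of elements newly marked
}
```

Because the citation keys are hashed at verify time, a sweep like this has to run against the verified artifact rather than the authored file. Running it twice is harmless, since already-marked elements are skipped.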
Test artifacts kept local under scratch/ (uncommitted).
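To illustrate the grouping requirement for the CITATION_DATA block, here is a hedged sketch that turns a flat citation list into a single object keyed by attachmentId. Only `attachmentId` is named in the commit message; the helper name, the other citation fields, and the array-per-attachment layout are illustrative assumptions, and the real inner layout may differ.

```javascript
// Groups a flat list of citations into one object keyed by attachmentId.
// Assumption: each group is an array of the remaining citation fields.
function groupCitationsByAttachment(flatCitations) {
  const grouped = {};
  for (const { attachmentId, ...citation } of flatCitations) {
    if (!attachmentId) {
      throw new Error("citation is missing attachmentId (capture it from prepare)");
    }
    if (!grouped[attachmentId]) {
      grouped[attachmentId] = [];
    }
    grouped[attachmentId].push(citation);
  }
  return grouped;
}
```

Failing early on a missing attachmentId surfaces the problem at authoring time instead of as a 'No valid CITATION_DATA block found' error from verify.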
PR Review
Overall this is a well-motivated rework. The division-of-labor model (DeepCitation owns verification + citation UX; lavish owns the report shell + review loop) is a cleaner architecture, and the status model corrections are accurate. A few things worth discussing before merging.
Issues
1. attachmentId capture -- not shown how to get it from stderr
SKILL.md step 2 says the attachmentId is 'printed on stderr' but the code block only redirects stdout, so an agent cannot observe stderr during the run. Since the PR also says it is 'stored as attachmentId in the JSON', the simpler fix is to drop the 'printed on stderr' parenthetical and just say 'read attachmentId from the JSON.' Otherwise show the separate stderr redirect explicitly. The ambiguity could cause an agent to stall looking for something it cannot see.
2. auth.md narrowed verify out of the 'action needed' check -- is that intentional?
Old: 'If prepare or verify output contains action needed...'
verify --html hits the API too. If a session token expires between prepare and verify, that error path now has no documented recovery. If this cannot happen in practice (verify reuses an already-established auth context), a one-line rationale in auth.md would close the ambiguity.
3. Comprehensiveness guidance dropped without a replacement
The old 'Comprehensiveness' section guarded against agents that answer the easy sub-question deeply and the hard one shallowly. Nothing in the new SKILL.md fills that gap. A brief note in step 3 or the invariants would suffice.
4. Per-citation SELF-CHECK removed
The old SKILL.md had a 4-step in-flow self-check and a STOP AND CHECK gate before verify, both enforcing that k is a verbatim substring of f. The new skill documents the rule in the status model section but drops the in-flow reminder. A one-line checkpoint in step 3 or the invariants would prevent agents from discovering the constraint only when verify flags bad anchors.
Smaller observations
Summary
…JSON, restore self-check
- auth.md: 'action needed' recovery now explicitly covers verify, not just prepare (the session token can expire between prepare and verify; observed in live testing).
- SKILL.md step 2: read attachmentId from the prepare JSON (the only redirect is stdout, so stderr isn't observable mid-run).
- SKILL.md step 3: restore the in-flow per-citation self-check (k must be a verbatim substring of f) and the comprehensiveness reminder dropped in the rework.
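The per-citation self-check could be expressed as a small pre-verify guard. This sketch assumes that k corresponds to sourceMatch and f to sourceContext; that mapping is an inference from the earlier commit message and is not confirmed here, and the function name is hypothetical.

```javascript
// Pre-verify self-check for one citation.
// Assumption: `sourceMatch` plays the role of k and `sourceContext` the role of f.
function selfCheckCitation({ sourceMatch = "", sourceContext = "" }) {
  const problems = [];
  if (sourceContext.trim() === "") {
    problems.push("sourceContext is empty");
  }
  if (sourceMatch.trim() === "") {
    problems.push("sourceMatch is empty");
  } else if (!sourceContext.includes(sourceMatch)) {
    // Exact, case-sensitive comparison: a paraphrase or re-cased phrase fails.
    problems.push("sourceMatch is not a verbatim substring of sourceContext");
  }
  return problems; // an empty array means the citation passes
}
```

Running a check like this over every citation before calling verify catches bad anchors at authoring time instead of after the verify run flags them.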
Thanks — addressed all four findings in
Smaller observations (form-fill ghost-routing, multi-line perl edge case) noted; both are acceptable as-is per your assessment.
Summary
Two related cleanups to the public skills, landed on one branch:
1. `/verify` reworked around lavish styling + DeepCitation live citations
The skill now produces a clean lavish-styled HTML report whose citations are DeepCitation's own interactive citations (click → matched phrase, evidence keyhole, page view), opened in lavish-axi for an annotate → poll → reply review loop. DeepCitation owns verification + the citation UX; lavish owns the report shell + review loop.
Corrected end-to-end against the live `verify --html` CLI (not assumptions):
- Verification is binary: `verified`/`unverified`, anchored on the `sourceContext` sentence. No `variance`/`pending`/`isVerbatim` and no confidence score in the CLI embed. The internal `ambiguity.confidence` is a localization signal and is never surfaced.
- The CITATION_DATA block is a single JSON object grouped by `attachmentId` (a flat list is rejected with "No valid CITATION_DATA block found").
- `verify --html` writes `{stem}-verified.html` (ignores `--out`); `--local-only` keeps it off "My Verifications".
- Post-embed `data-lavish-action` sweep over `[data-citation-key]` so citation clicks reach DeepCitation's popover while prose stays commentable (verified against `artifact-sdk.js`: all three lavish handlers bail on `isLavishAction`).
- `prepare` inputs documented (PDF / image / Office / CSV-TSV / URL; `.txt` rejected).
- Scenario doc (`docs/scenario-physician-chart-prep.md`) as a manual acceptance script.
2. Remove proprietary content from the MIT submodule
Drops the `[Judgement]`/`[Missing]` marker teaching plus the medical form specs, rules, scripts, and source PDFs that don't belong in the open-source skills repo.
Test plan
- `/verify` pipeline run end-to-end on a real OCR'd chart: auth → `prepare` → author → `verify --html` (553 KB interactive report) → popover inspection → lavish coexistence. Every documented behavior is observed, not assumed.
- … `-verified.html` naming, `data-lavish-action` exclusion).
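Since verify --html is described as ignoring --out, a caller has to derive the artifact name itself. The helper below is a hypothetical sketch of the {stem}-verified.html convention; placing the artifact in the same directory as the authored file is an assumption, not something the PR states.

```javascript
// Derives the expected artifact path from the authored file path,
// following the {stem}-verified.html naming described in the PR.
// Assumption: the artifact lands next to the authored file.
function verifiedArtifactPath(authoredPath) {
  const slash = Math.max(authoredPath.lastIndexOf("/"), authoredPath.lastIndexOf("\\"));
  const dir = authoredPath.slice(0, slash + 1);
  const file = authoredPath.slice(slash + 1);
  const dot = file.lastIndexOf(".");
  // Strip only the final extension; a name without one is used whole.
  const stem = dot > 0 ? file.slice(0, dot) : file;
  return `${dir}${stem}-verified.html`;
}
```

For an authored file at scratch/chart-prep.html this yields scratch/chart-prep-verified.html, which is the file a follow-up step (such as opening the report for review) would need to target.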