refactor(verify): proxy hard rule, retry guard, multi-signal sandbox detection, Format 1 citations, page/line tagging#24
Conversation
…detection, Format 1 citations, and page/line tagging Proxy rule (BREAKING reversal): - Remove the old "clear stale localhost proxy" snippet — that code was actively harmful; the Cowork proxy vars are the session's network lifeline, not noise to filter. - Add explicit hard rule: NEVER unset/override HTTP_PROXY or HTTPS_PROXY. Retry / fallback hard rules: - No retry spirals: if prepare/verify exits non-zero or is killed, stop and report stderr — don't background with nohup, wrap in sleep/timeout, swap --out↔--text, or shrink input. - No direct-read fallback: cannot substitute Read/urllib/web-fetch for a failed prepare and present the result as verified. - One exception: prepare emitting only the 2-line banner → retry once. Sandbox detection (multi-signal): - Probe on ANY of: $CLAUDE_CODE_REMOTE=="true", HTTP_PROXY containing localhost:3128, whoami returning adjective-color-name, or a prior bash call killed at ~45s. Do NOT gate on $CLAUDE_CODE_REMOTE alone. - cloud-sandbox-constraints.md: update detection header to match. Format 1 only for verifiable citations: - Remove Format 2 guidance from sub-agent prompts — Format 2 was proven broken in alignment iter4 (55 citations, 0 verified). - Add [N] adjacency hard rule: [N] must appear immediately after **k**, never at clause end. Page/line tagging for split-agent evidence files: - Replace bare page-range slice with render_chunk() that wraps each page in <page_number_N_index_I> tags and tags lines at idx 0, last, and every 5th — matching the CLI renderer. Subagents copy tags verbatim into p/l fields; without tags they confabulate page_ids from global line offsets, producing citations that 404 in the viewer. - Add validation: grep -c '<page_number_' before dispatching agents. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR ReviewOverall this is a solid, well-motivated set of changes. The proxy reversal and multi-signal sandbox detection are clearly backed by operational evidence, and the Format 1–only pivot is justified by the empirical data cited (iter4: 55 citations, 0 verified). A few issues worth addressing before merge: Bugs / correctness1. Missing Path(".deepcitation/evidence-a.txt").write_text(render_chunk(agent_a_pages))
2. Empty-line stripping may break raw_lines = [l for l in page_text.split("\n") if l.strip()]Stripping blank lines before assigning 3. Validation only checks Agent A's file is never validated. Check both files, or note explicitly that the same check applies to Guidance gap introduced by Format 2 removalThe old "Format 2 when bold ≠ k — HARD RULE" explicitly covered terms that aren't verbatim in the source (acronym expansions like The new CoT gate handles it indirectly: "if your planned key phrase doesn't appear word-for-word in Consider adding a one-liner: e.g., "If no short verbatim phrase in Minor nitsStep 1 preamble: parallelism urgency removed The deleted latency table and "fire
The BAD/GOOD example is clear. Worth adding that prose may continue after What's working well
🤖 Generated with Claude Code |
…al validation, non-verbatim k, prepare parallelism urgency, [N] continuation note - Fix: add `from pathlib import Path` to render_chunk() snippet (NameError on copy-paste) - Fix: remove `if l.strip()` blank-line filter from raw_lines — count all lines so idx+1 matches the CLI's 1-based line ids - Fix: extend pre-dispatch validation to both evidence files (was only checking evidence-b.txt) - Fix: add guidance for non-verbatim k — "bold the closest literal term that does appear word-for-word" when no short phrase in f can serve as k - Improve: restore the "emit preamble and call prepare in the same assistant turn" directive dropped from Step 1 when the latency table was removed - Improve: note in [N] adjacency example that prose may continue after [N] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude review the latest changes |
|
The PR diff is unchanged since my previous review above. Let me re-examine with fresh eyes to confirm what still applies. Still open:
Previously raised, now resolved on re-read:
No new issues found. Fix the two items above and this is good to merge. 🤖 Generated with Claude Code |
`grep -c '<page_number_'` matched both opening and closing page tags, returning 2×chunk_size instead of chunk_size. Use `grep -cP '^<page_number_'` (anchored to line start) to count only opening tags — closing tags start with `</` and are excluded by the anchor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review update — latest changesChecking against the two open items from the previous review: Both are resolved in the current diff:
One minor note
No blocking issues. Good to merge. 🤖 Generated with Claude Code |
…hunk Trailing \n on each part + \n from join produced two blank lines between page blocks. Drop the trailing \n from parts.append so join supplies the single separator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
unset/overrideHTTP_PROXYorHTTPS_PROXY. The CLI auto-detects the proxy correctly; touching these variables breaks it.prepare/verifyexits non-zero or is killed, stop and report stderr — no backgrounding withnohup, nosleep/timeoutwrappers, no--out↔--textswaps. No direct-read fallback (Read/urllib/web-fetch cannot substitute for a failedprepare). One exception: the 2-line banner-only truncation → retry once.$CLAUDE_CODE_REMOTE=="true",HTTP_PROXYcontaininglocalhost:3128,whoamireturning an adjective-color-name, or a prior bash call killed at ~45 s. Do NOT gate on$CLAUDE_CODE_REMOTEalone — it is not reliably set in every Cowork session. Updatedcloud-sandbox-constraints.mdto match.[N]adjacency hard rule:[N]must appear immediately after**k**, never at clause end.render_chunk()that wraps each page in<page_number_N_index_I>tags and tags lines at index 0, last, and every 5th (matching the CLI renderer). Without tags, subagents confabulatepage_id/line_idsfrom global file offsets, producing citations that 404 in the viewer.Test plan
skills/verify/SKILL.mdStep 2 — confirm the proxy-clearing bash block is gone and the "Never modify proxy environment variables" rule is prominentenv | grep+whoamiprobe snippetrender_chunk()code is present and the Format 2 guidance is fully removedskills/verify/rules/cloud-sandbox-constraints.md— confirm the detection header matches the multi-signal description in SKILL.md