[codex] admit sourceMatch-only citations and downgrade ambiguous matches #460
…er fully Verified

A citation whose model-displayed claimText differs from the verbatim sourceMatch located in the source (the case the popover marks with a `≈`) was still classified Verified (green). Verified must mean verified; an approximate anchor is at most a Partial Match. getCitationStatus now accepts the citation's claimText/sourceMatch and downgrades the status to a partial match (isPartialMatch=true) when they differ. CitationVariants and primitives (CitationRoot) pass the citation through. See FileLasso tracker issue 18.
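A minimal sketch of the downgrade described in this commit message. The type names, the `normalize` helper, and the whitespace-insensitive comparison are assumptions made for illustration; the real `getCitationStatus` signature and comparison rule may differ.

```typescript
// Hypothetical shapes; the repo's actual Citation and status types may differ.
interface CitationLike {
  claimText?: string;
  sourceMatch?: string;
}

interface StatusFlags {
  isVerified: boolean;
  isPartialMatch: boolean;
}

// Assumption: texts are compared after collapsing whitespace.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// True when the displayed claim text is not the verbatim anchor (the "≈" case).
function anchorIsApproximate(citation: CitationLike): boolean {
  const { claimText, sourceMatch } = citation;
  if (!claimText || !sourceMatch) return false; // nothing to compare
  return normalize(claimText) !== normalize(sourceMatch);
}

// A verified status with an approximate anchor becomes a partial match.
function downgradeIfApproximate(flags: StatusFlags, citation: CitationLike): StatusFlags {
  if (flags.isVerified && anchorIsApproximate(citation)) {
    return { isVerified: false, isPartialMatch: true };
  }
  return flags;
}
```

Statuses that are already partial or missed pass through unchanged, so the check can only lower a badge and never raise one.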
…ue-235)

Relax the sourceContext gate in both parseCitationResponse.ts and getAllCitationsFromNumericResponse (citationParser.ts) to admit citations that carry sourceMatch without sourceContext. Previously, such entries were silently dropped, leaving [N] markers in prose with no citationMarkerMap entry, causing permanently-pulsing chips in the UI.

Adds unit tests verifying that (a) a sourceMatch-only entry produces a markerMap entry, (b) an entry with neither field is still dropped, and (c) normal entries with both fields are unaffected.
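The relaxed gate can be sketched roughly as follows. `RawCitation`, `admitCitation`, and `buildMarkerMap` are hypothetical names, not the parser's real API, and treating a context-only entry as admissible is an assumption; the commit message only specifies the sourceMatch-only, neither-field, and both-fields cases.

```typescript
// Hypothetical raw entry as it might come out of the model response.
interface RawCitation {
  id: number;
  sourceMatch?: string;
  sourceContext?: string;
}

function hasText(value: string | undefined): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Relaxed gate: an anchor alone is enough. Entries with neither field are dropped.
// Assumption: an entry carrying only sourceContext is also admitted.
function admitCitation(citation: RawCitation): boolean {
  return hasText(citation.sourceMatch) || hasText(citation.sourceContext);
}

// Build a marker map keyed by the numeric [N] marker.
function buildMarkerMap(raw: RawCitation[]): Record<number, RawCitation> {
  const map: Record<number, RawCitation> = {};
  for (const citation of raw) {
    if (admitCitation(citation)) map[citation.id] = citation;
  }
  return map;
}
```

With a gate of this shape, a `[N]` marker in the prose always has a map entry as long as the model supplied an anchor, which is what keeps the chip from pulsing indefinitely.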
…ission

Two tests still expected sourceMatch-only citations to be filtered out, but commit b30f0b8 relaxed the gate. Update them to assert the new admitted behavior.
Gemini's infinite-loop hallucination isn't always a single character — it also emits repeated HTML tokens like </font>\n dozens or hundreds of times. The previous check only caught single-char repetition. Adds a line-split pass: if every line in the trimmed content is the same non-trivial string (≥ 2 chars, ≥ 2 lines), classify as garbage. Also adds the first test file for parseWorkAround, covering both the single-char and multi-char cases, plus cleanRepeatingLastSentence.
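The line-split pass described above might look roughly like this. `isRepeatedLineGarbage` is a hypothetical name, and trimming each line before comparison is an assumption; the actual check in parseWorkAround may be structured differently.

```typescript
// Returns true when the content is one non-trivial line repeated on every line.
// Thresholds follow the commit message: at least 2 characters and at least 2 lines.
function isRepeatedLineGarbage(content: string): boolean {
  const lines = content
    .trim()
    .split(/\r?\n/)
    .map((line) => line.trim()); // assumption: per-line whitespace is ignored

  if (lines.length < 2) return false; // a single line is not a repetition
  const first = lines[0];
  if (first.length < 2) return false; // single-char runs are left to the older check
  return lines.every((line) => line === first);
}
```

For example, fifty copies of `</font>\n` are classified as garbage, while ordinary two-line prose is not.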
…rtial match (issue-228)

sourceMatch ⊄ sourceContext violates §1 (anchor must be a verbatim substring). The status was treated as green/Verified, producing misleading "Verified" badges for DSM-5 code citations (F43.10, F60.3) whose context didn't contain the code.

- Move found_context_missed_source_match into PARTIAL_STATUSES → badge shows amber "Partial Match" instead of green "Verified"
- Remove the explicit status === "found_context_missed_source_match" arm from isVerified (now covered by isPartialMatch)
- statusRegistry: colorScheme green → amber, headerKey → status.partialMatch
- Update citationStatus and CitationDrawer tests to assert the new behaviour

Fixes: misleading "Verified" badge when sourceMatch is not in sourceContext
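The effect of this change on the badge can be illustrated with a small registry sketch. It is modelled as a lookup table rather than the `PARTIAL_STATUSES` set the PR uses, and the `"not_found"` status, the `"miss"` outcome, the `"red"` scheme, and the `status.verified` / `status.notFound` keys are placeholders assumed for illustration only.

```typescript
// Three statuses come from the PR; "not_found" is a hypothetical stand-in for misses.
type SearchStatus =
  | "found"
  | "found_source_match_only"
  | "found_context_missed_source_match"
  | "not_found";

type Outcome = "exact_match" | "partial_match" | "miss";

interface StatusEntry {
  outcome: Outcome;
  colorScheme: "green" | "amber" | "red";
  headerKey: string;
}

const PARTIAL: StatusEntry = {
  outcome: "partial_match",
  colorScheme: "amber",
  headerKey: "status.partialMatch",
};

// After the change, the context-miss status shares the partial entry.
const STATUS_REGISTRY: Record<SearchStatus, StatusEntry> = {
  found: { outcome: "exact_match", colorScheme: "green", headerKey: "status.verified" },
  found_source_match_only: PARTIAL,
  found_context_missed_source_match: PARTIAL,
  not_found: { outcome: "miss", colorScheme: "red", headerKey: "status.notFound" },
};

function isPartialMatch(status: SearchStatus): boolean {
  return STATUS_REGISTRY[status].outcome === "partial_match";
}

// No special-case arm is needed: a partial status is never verified.
function isVerified(status: SearchStatus): boolean {
  return STATUS_REGISTRY[status].outcome === "exact_match";
}
```

Deriving both predicates from one table means a status cannot be reported as both verified and partial.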
✅ Playwright Test Report
Status: Tests passed
📊 Download Report & Snapshots (see Artifacts section)
Run ID: 26307946475
@codex review for security regressions, missing tests, and risky behavior changes.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f07b5d482
    outcome: "partial_match",
    colorScheme: "amber",
    headerKey: "status.partialMatch",
Update fallback narrative for downgraded context-miss status
After downgrading found_context_missed_source_match to partial (status.partialMatch/amber) in STATUS_MAP, the fallback path in getOutcomeSummary still maps this status to outcome.exactMatch (src/analysis/narrative.ts, switch at lines 204-207). In cases where searchAttempts is empty or lacks matchedVariation, users will see conflicting signals (partial status header/color but exact-match summary text), which regresses the intended downgrade behavior and can mislead verification interpretation.
    outcome: "partial_match",
    colorScheme: "amber",
    headerKey: "status.partialMatch",
    showOnlyHit: true,
Show full search log for downgraded context-miss status
found_context_missed_source_match is now marked as a partial outcome, but this entry still has showOnlyHit: true, so buildSearchNarrative() treats it like an exact-match status and sets groupedAttemptCount to 0 (src/analysis/narrative.ts), which suppresses the search-log toggle in the evidence footer (src/react/evidence/EvidenceTray.tsx, toggle requires searchCount > 0). For this newly-partial status, users can lose access to the search trail that explains why it was downgraded.
     */
    export const PARTIAL_STATUSES: ReadonlySet<SearchStatus> = new Set<SearchStatus>([
      "found_source_match_only",
      "found_context_missed_source_match",
Classify downgraded context-miss as related in intent summary
By adding found_context_missed_source_match to PARTIAL_STATUSES, this status now enters the miss/partial evidence path, but buildIntentSummary() still classifies it as exact_match (src/analysis/intent.ts, status === "found" || status === "found_context_missed_source_match"). In that case SearchAnalysisSummary receives no primary message and no snippets, so the partial-analysis section renders essentially empty instead of explaining the partial result.
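One way the suggested reclassification could look, as a sketch only. `classifyIntent` and the `IntentClass` union are hypothetical; the real `buildIntentSummary()` returns a richer object, and `"not_found"` stands in for whatever miss statuses the library defines.

```typescript
type SearchStatus =
  | "found"
  | "found_source_match_only"
  | "found_context_missed_source_match"
  | "not_found"; // hypothetical placeholder for miss statuses

type IntentClass = "exact_match" | "related" | "miss";

// Only a plain "found" counts as exact; both partial statuses are "related",
// so the partial-analysis section has a primary message to render.
function classifyIntent(status: SearchStatus): IntentClass {
  switch (status) {
    case "found":
      return "exact_match";
    case "found_source_match_only":
    case "found_context_missed_source_match":
      return "related";
    default:
      return "miss";
  }
}
```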
Summary
This PR tightens DeepCitation citation parsing and status classification around partial/ambiguous evidence:
- Admit citations with sourceMatch without sourceContext, including orphan numeric-marker cases
- Treat found_source_match_only and found_context_missed_source_match as partial matches instead of fully verified results
- Downgrade claimText/sourceMatch mismatches to partial-match UI/status treatment

Why
Some LLM outputs include useful source anchors without a full sourceContext, and verification can find related-but-not-exact evidence. Those should remain visible and traceable, but they should not be shown as fully verified when the evidence is partial, ambiguous, or source-match-only.

Validation
- bun run test -- --runInBand: 2628 passed, 0 failed
- bun run build: passed
- bun run lint: passed with one Biome warning in src/react/Citation.tsx:1003 about viewState hook dependency