v0.3: Make DOC-01 decision-grade #13

Merged: t3chn merged 1 commit into main from feat/doc01-decision-grade on May 25, 2026

Conversation

t3chn (Contributor) commented May 25, 2026

Closes #12.

Adds DOC-01 as the third decision-grade task family for fixed-corpus document QA, grounded answers, citation validation, and unsupported-claim detection.

Includes:

  • synthetic public DOC-01 cases;
  • deterministic scorer;
  • answer.md / citations.json / claims.json artifact contract;
  • local corpus citation validation;
  • unsupported-claim and stale/distractor source checks;
  • mutation support;
  • sample artifact generation;
  • doc01-smoke;
  • tests and docs.

Scorer contract mapping:

  • artifact_exact for required files and no-extra-file checks;
  • schema_contract for citations.json and claims.json;
  • claim_rubric for supported/unsupported claims and evidence;
  • security_leak for future private canaries / hidden corpus leaks;
  • redacted feedback remains enforced through public compare report tests.
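
As an illustration of the `artifact_exact` item above, here is a minimal sketch of a required-files and no-extra-files check. The three file names come from the artifact contract in this description; the function name `check_artifact_exact` and its return shape are hypothetical and are not taken from `tasks/DOC-01/scorer.py`.

```python
from pathlib import Path

# File names are from the PR description; everything else here is an
# illustrative assumption, not the scorer's actual API.
REQUIRED_FILES = frozenset({"answer.md", "citations.json", "claims.json"})


def check_artifact_exact(output_dir):
    """Return (missing, extra) file names relative to the required set."""
    present = {p.name for p in Path(output_dir).iterdir() if p.is_file()}
    missing = sorted(REQUIRED_FILES - present)  # required but absent
    extra = sorted(present - REQUIRED_FILES)    # present but not allowed
    return missing, extra
```

Under this sketch a run would pass the check only when both lists are empty, which covers the "required files" and "no-extra-file" halves in one pass.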

Non-goals:

  • no live web;
  • no browser workflow;
  • no private holdout fixtures in public repo;
  • no real customer data;
  • no private scorer configs;
  • no consumer app coupling;
  • no private bundle runtime.

Test plan:

  • make validate
  • make test
  • make smoke
  • make compare-smoke
  • make if01-smoke
  • make data01-smoke
  • make doc01-smoke
  • make leak-check
  • python3 -m ruff check .
  • git diff --check
  • tracked-file audit

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa99722ea2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tasks/DOC-01/scorer.py
Comment on lines +150 to +154
unique_id = bool(citation_id and citation_id not in seen_ids)
checks.append(check(f"citation_id_unique:{item_name}", item_ok and unique_id, 0.02))
if citation_id:
    seen_ids.add(citation_id)
    citation_map[citation_id] = item

P1: Reject duplicate citation IDs as policy violations

A duplicate citation ID currently only loses 0.02 points and never adds a policy violation, so malformed citations.json can still pass (e.g., adding a second entry with an existing id yields success=true and a high score). Because citation_map is still updated for duplicates, claim resolution remains ambiguous while the run is marked successful, which weakens the decision-grade contract.
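
One way to address this, sketched under assumptions: record each repeated ID as a policy violation and keep only the first entry in the map, so claim resolution stays unambiguous. The helper name `collect_citations`, the `policy_violations` list, the violation string format, and the first-entry-wins rule are all hypothetical; only the `id` field and the idea of a citation map come from the comment above.

```python
def collect_citations(items):
    """Build an id -> citation map and list duplicate-ID policy violations.

    Hypothetical helper, not the scorer's real code.
    """
    citation_map = {}
    policy_violations = []
    for item in items:
        if not isinstance(item, dict):
            continue
        citation_id = item.get("id")
        # Assumption: valid IDs are non-empty strings.
        if not isinstance(citation_id, str) or not citation_id:
            continue
        if citation_id in citation_map:
            # Record the duplicate and do NOT overwrite the first entry.
            policy_violations.append(f"duplicate_citation_id:{citation_id}")
            continue
        citation_map[citation_id] = item
    return citation_map, policy_violations
```

A caller could then force `success=false` whenever `policy_violations` is non-empty, instead of relying on the 0.02 check weight alone.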

Comment thread tasks/DOC-01/scorer.py
Comment on lines +206 to +210
actual = {
    str(item.get("id")): item
    for item in items
    if isinstance(item, dict) and item.get("id") is not None
}

P1: Detect duplicate claim IDs before collapsing to a dict

The scorer collapses claims into a dict keyed by id, so repeated claim IDs are silently overwritten and never penalized; with allow_extra_claims=false, duplicated required IDs still produce a perfect pass. This accepts structurally invalid claims.json as successful output and can hide contradictory claim entries in evaluation artifacts.
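
A minimal sketch of the suggested pre-check, assuming the same filter as the dict comprehension quoted above: count IDs first, then report any that occur more than once before building the dict. The function name `find_duplicate_claim_ids` is hypothetical, and how the scorer would turn its result into a failed check or policy violation is left open.

```python
from collections import Counter


def find_duplicate_claim_ids(items):
    """Return the sorted claim IDs that appear more than once in items."""
    counts = Counter(
        str(item.get("id"))
        for item in items
        if isinstance(item, dict) and item.get("id") is not None
    )
    return sorted(claim_id for claim_id, n in counts.items() if n > 1)
```

Running this before the collapse means a repeated required ID is surfaced explicitly rather than silently overwritten by the last entry.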

t3chn merged commit 3e8ec69 into main on May 25, 2026 (1 check passed)
t3chn deleted the feat/doc01-decision-grade branch on May 25, 2026, 06:34
Development

Successfully merging this pull request may close these issues.

v0.3: Make DOC-01 decision-grade

1 participant