v0.2: Make DATA-01 decision-grade #6

Merged: t3chn merged 1 commit into main from feat/data01-decision-grade on May 25, 2026

t3chn (Contributor) commented May 24, 2026

Closes #3.

Summary:

  • Add three synthetic public DATA-01 cases with CSV and SQLite fixtures.
  • Replace the sample DATA-01 scorer with config-driven deterministic checks for metrics.json, report.md, chart_spec.json, extra files, and critical caps.
  • Align DATA-01 with the standard scorer-contract model: artifact_exact for required files, schema_contract for metrics.json/chart_spec.json, numeric_metric for exact/tolerance values, claim_rubric for report.md factual checks, and future security_leak checks for private canaries/honey rows.
  • Add DATA-01 mutation support, sample artifact generation, data01-smoke, CI coverage, tests, and docs.
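The numeric_metric check described above (exact or tolerance-based comparison of values in metrics.json) could look roughly like the sketch below. This is not the code from this PR: the `check` record shape, the `numeric_metric_checks` helper, and the `value`/`tolerance`/`weight` config keys are all assumptions made for illustration.

```python
import math
from typing import Any


def check(name: str, passed: bool, weight: float, detail: str = "") -> dict[str, Any]:
    """Build one check record (hypothetical shape; the real helper may differ)."""
    return {"name": name, "passed": bool(passed), "weight": weight, "detail": detail}


def numeric_metric_checks(
    metrics: dict[str, Any], expected: dict[str, dict[str, float]]
) -> list[dict[str, Any]]:
    """Compare each configured metric against its expected value.

    A missing `tolerance` key (assumed config key) means the match must be exact.
    """
    checks = []
    for key, spec in expected.items():
        got = metrics.get(key)
        tolerance = float(spec.get("tolerance", 0.0))
        ok = (
            isinstance(got, (int, float))
            and not isinstance(got, bool)  # bool is an int subclass; reject it
            and math.isfinite(got)
            and abs(got - spec["value"]) <= tolerance
        )
        checks.append(check(f"metric_{key}", ok, spec.get("weight", 0.1), f"got={got!r}"))
    return checks


# Hypothetical case config and submitted metrics.json content.
expected = {
    "revenue": {"value": 100.0, "tolerance": 0.05},
    "orders": {"value": 8},
    "refunds": {"value": 2},
}
metrics = {"revenue": 100.04, "orders": 7}
results = numeric_metric_checks(metrics, expected)
```

Because the expected values live in config rather than in the scorer, the same function can serve every case; a missing or non-numeric metric fails its check instead of raising.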

Test plan:

  • make validate
  • make test
  • make smoke
  • make compare-smoke
  • make if01-smoke
  • make data01-smoke
  • make leak-check
  • python3 -m ruff check .
  • git diff --check

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1db9638f39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tasks/DATA-01/scorer.py
Comment on lines +275 to +278
if "max_words" in report_config:
    count = word_count(text)
    maximum = int(report_config["max_words"])
    checks.append(check("report_max_words", count <= maximum, 0.04, f"got={count}"))
[P2] Enforce report word-limit failures as policy violations

When report_max_words fails, the scorer only records a failed check and does not add a policy violation or score cap, so overlong reports can still pass. For example, with otherwise-correct artifacts, a 500-word report in case_001 still returns success=True because the score remains above PASS_THRESHOLD and success is gated on not violations. This makes the configured max_words constraint effectively non-blocking and undermines the task’s “concise report” requirement.
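One possible way to make the limit blocking is sketched below. It is not the change that landed in this PR. `PASS_THRESHOLD`, `word_count`, `check`, and `violations` are names taken from the comment above, but their definitions here (including the 0.8 threshold and the weighted-average score) are assumptions, and `score_report` is a hypothetical wrapper.

```python
import re
from typing import Any

PASS_THRESHOLD = 0.8  # assumed value; the real threshold is not shown in this thread


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (assumed definition)."""
    return len(re.findall(r"\S+", text))


def check(name: str, passed: bool, weight: float, detail: str = "") -> dict[str, Any]:
    """Build one check record (hypothetical shape)."""
    return {"name": name, "passed": bool(passed), "weight": weight, "detail": detail}


def score_report(
    text: str, report_config: dict[str, Any], base_checks: list[dict[str, Any]]
) -> dict[str, Any]:
    """Score a report and treat a word-limit failure as a policy violation."""
    checks = list(base_checks)
    violations: list[str] = []
    if "max_words" in report_config:
        count = word_count(text)
        maximum = int(report_config["max_words"])
        passed = count <= maximum
        checks.append(check("report_max_words", passed, 0.04, f"got={count}"))
        if not passed:
            # Record the failure as a violation so it blocks success
            # even when the weighted score stays above the threshold.
            violations.append(f"report_max_words: got={count} max={maximum}")
    total = sum(c["weight"] for c in checks)
    earned = sum(c["weight"] for c in checks if c["passed"])
    score = earned / total if total else 0.0
    success = score >= PASS_THRESHOLD and not violations
    return {"score": score, "violations": violations, "success": success}


# Otherwise-correct artifacts (all other checks pass) with a 500-word report.
other_checks = [check("other_artifacts", True, 0.96)]
long_result = score_report("word " * 500, {"max_words": 200}, other_checks)
short_result = score_report("word " * 50, {"max_words": 200}, other_checks)
```

In this sketch the overlong report still scores above the threshold, but the recorded violation forces `success` to `False`. A score cap would be an alternative if a hard failure is considered too strict.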

t3chn force-pushed the feat/data01-decision-grade branch from 1db9638 to eae65f7 on May 25, 2026 05:46
t3chn merged commit 31ff201 into main on May 25, 2026
1 check passed
t3chn deleted the feat/data01-decision-grade branch on May 25, 2026 05:47