The repo has an eval harness and CI, but there is no explicit Phase 3B exit bar or published baseline that says when hardening is “good enough.”
Acceptance criteria:
- Define per-domain thresholds and an overall gate for Phase 3B.
- Publish the current baseline from the existing eval harness.
- Make CI and docs point at the same gate.
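
A minimal sketch of what a shared gate could look like, assuming per-domain scores in the range 0 to 1 and an overall gate on their mean. The domain names, threshold values, and the `check_gate` helper are all hypothetical placeholders; the real numbers would come from the published baseline.

```python
from dataclasses import dataclass, field

# Hypothetical domains and thresholds, for illustration only.
# Real values should be derived from the published baseline.
DOMAIN_THRESHOLDS = {
    "domain_a": 0.90,
    "domain_b": 0.85,
}
OVERALL_THRESHOLD = 0.88  # assumed overall gate, applied to the mean score


@dataclass
class GateResult:
    passed: bool
    failures: list = field(default_factory=list)


def check_gate(scores, thresholds=DOMAIN_THRESHOLDS, overall=OVERALL_THRESHOLD):
    """Compare eval scores against per-domain thresholds and an overall gate."""
    failures = []
    for domain, minimum in thresholds.items():
        score = scores.get(domain)
        if score is None:
            # A domain with no score counts as a failure, not a silent pass.
            failures.append(f"{domain}: no score reported")
        elif score < minimum:
            failures.append(f"{domain}: {score:.3f} < {minimum:.3f}")

    # Overall gate: mean over the gated domains that reported a score.
    reported = [scores[d] for d in thresholds if d in scores]
    mean = sum(reported) / len(reported) if reported else 0.0
    if mean < overall:
        failures.append(f"overall: {mean:.3f} < {overall:.3f}")

    return GateResult(passed=not failures, failures=failures)
```

Keeping the thresholds in one module is one way to satisfy the last criterion: CI could call `check_gate` and exit non-zero when `passed` is false, while the docs are generated from or link to the same constants.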