diff --git a/README.md b/README.md index 403d435..3cdb18d 100644 --- a/README.md +++ b/README.md @@ -93,7 +93,7 @@ The command exits `0` and writes a verification report shaped like this: ## Try the alpha verification loop Ethos is source-only pre-alpha. There are no release artifacts or package installs yet. From a -source checkout, the current product-proof command is: +source checkout, the current verification loop is: ```bash make verify-alpha @@ -104,7 +104,8 @@ That command builds the CLI and checks the alpha grounding loop across: - native Ethos document JSON - synthetic OpenDataLoader-style JSON - pinned real OpenDataLoader 2.4.7 JSON fixtures -- grounded, ungrounded, stale-fingerprint, and capability-limited citation cases +- grounded, ungrounded, not-found, stale-fingerprint, and capability-limited citation cases +- malformed citation inputs that must fail with usage diagnostics - byte-identical repeated verification reports for the checked-in fixtures - deterministic native crop descriptor JSON artifacts @@ -142,7 +143,14 @@ test result: ok. 
40 passed; 0 failed ok native-grounded matches examples/verify/goldens/native_grounded_report.json ok opendataloader-grounded matches examples/verify/goldens/opendataloader_grounded_report.json +ok native-ungrounded matches examples/verify/goldens/native_ungrounded_report.json +ok opendataloader-not-found matches examples/verify/goldens/opendataloader_not_found_report.json +ok native-stale matches examples/verify/goldens/native_stale_report.json +ok opendataloader-capability-limited matches examples/verify/goldens/opendataloader_capability_limited_report.json ok real-opendataloader-grounded matches fixtures/foreign/opendataloader/real/expected.verification_report.json +ok real-opendataloader-ungrounded matches fixtures/foreign/opendataloader/real/expected.ungrounded.verification_report.json +ok invalid-table-cell-citation exits 2 with expected usage diagnostic +ok invalid-bbox-citation exits 2 with expected usage diagnostic ok native-grounded-crops crop descriptors validate against schemas/ethos-crop-descriptor.schema.json verify-alpha demo checks passed diff --git a/docs/demos/verify-alpha.md b/docs/demos/verify-alpha.md index e14f411..4701ef1 100644 --- a/docs/demos/verify-alpha.md +++ b/docs/demos/verify-alpha.md @@ -1,19 +1,21 @@ # Verify Alpha Demo -## Product Proof +## Verification Loop Ethos verifies whether AI citations are grounded in document evidence. This is a citation grounding check, not a semantic-truth system: Ethos does not claim semantic -entailment, factual truth, arithmetic correctness, or answer quality. The alpha proof is the +entailment, factual truth, arithmetic correctness, or answer quality. 
The alpha loop is the repeatable `make verify-alpha` path: - native Ethos JSON citation checks can pass against checked-in document evidence - OpenDataLoader-style JSON can enter the same verification loop through a grounding adapter - real pinned OpenDataLoader 2.4.7 output has both grounded and ungrounded citation cases +- native and synthetic OpenDataLoader fixtures cover missing cited elements +- malformed citation inputs return usage diagnostics with exit code `2` - `--fail-on-ungrounded` turns the report into a CI/agent gate with exit code `1` when evidence is not fully grounded - native Ethos verification can emit deterministic crop descriptor artifacts with `--crop-dir` -- every demo report is compared against a golden and regenerated twice to prove byte-identical output +- every demo report is compared against a golden and regenerated twice to check byte-identical output - crop descriptor files are regenerated twice, compared byte-for-byte, validated against schema, and checked against the committed descriptor example Ethos verifies document evidence for AI systems. 
The deterministic parser is one grounding @@ -41,6 +43,26 @@ Golden report: examples/verify/goldens/native_grounded_report.json ``` +## Native Ethos Ungrounded Citations + +```bash +ethos verify schemas/examples/document.example.json \ + --citations examples/verify/native_ungrounded_citations.json \ + --out /tmp/ethos-native-ungrounded-report.json +``` + +Expected outcome: + +- `all_evidence_grounded` is `false` +- one quote check status is `mismatch` with reason `text_mismatch` +- one presence check status is `not_found` with reason `element_not_found` + +Golden report: + +```text +examples/verify/goldens/native_ungrounded_report.json +``` + ## OpenDataLoader-Style JSON ```bash @@ -67,6 +89,28 @@ Golden report: examples/verify/goldens/opendataloader_grounded_report.json ``` +## OpenDataLoader-Style Missing Element + +```bash +ethos verify examples/verify/opendataloader.json \ + --grounding opendataloader-json \ + --citations examples/verify/opendataloader_not_found_citations.json \ + --out /tmp/ethos-odl-not-found-report.json +``` + +Expected outcome: + +- `grounding.parser.adapter` is `opendataloader-json` +- `all_evidence_grounded` is `false` +- the presence check status is `not_found` with reason `element_not_found` +- `warnings` includes `capability_limited` + +Golden report: + +```text +examples/verify/goldens/opendataloader_not_found_report.json +``` + ## Real OpenDataLoader JSON ```bash @@ -172,6 +216,26 @@ Exit behavior: - `1`: verification completed, but not all requested evidence is grounded - `2`: invalid input, malformed citations, adapter failure, or another usage error +## Malformed Citation Inputs + +The harness also checks citation validation failures: + +```bash +ethos verify schemas/examples/document.example.json \ + --citations examples/verify/invalid_table_cell_citations.json +``` + +Expected outcome: exit code `2` with a diagnostic that the table-cell citation must include +`table_id` and `cell`. 
+ +```bash +ethos verify schemas/examples/document.example.json \ + --citations examples/verify/invalid_bbox_citations.json +``` + +Expected outcome: exit code `2` with a diagnostic that the bbox citation requires a page unless +another target locator is present. + ## Crop Descriptors Native Ethos document grounding can emit deterministic crop descriptor JSON files for each diff --git a/examples/verify/README.md b/examples/verify/README.md index 1e2f897..34e3a6d 100644 --- a/examples/verify/README.md +++ b/examples/verify/README.md @@ -1,6 +1,6 @@ # WS-VERIFY-ALPHA Demo -This directory contains the first parser-agnostic verification demo. +This directory contains verify-alpha fixtures, citations, and golden reports. ## Native Ethos Grounding @@ -22,6 +22,17 @@ ethos verify schemas/examples/document.example.json \ --format summary ``` +## Native Ethos Ungrounded Citations + +```bash +ethos verify schemas/examples/document.example.json \ + --citations examples/verify/native_ungrounded_citations.json \ + --out verification_report.json +``` + +Expected result: `all_evidence_grounded: false`. The quote check reports `text_mismatch`, and +the missing element check reports `element_not_found`. + ## OpenDataLoader-Style Grounding ```bash @@ -36,6 +47,18 @@ warning. The warning is intentional: the synthetic OpenDataLoader-style fixture fingerprint, spans, character offsets, or known coordinate origin, but its element and table evidence can still ground these claims. +## OpenDataLoader-Style Missing Element + +```bash +ethos verify examples/verify/opendataloader.json \ + --grounding opendataloader-json \ + --citations examples/verify/opendataloader_not_found_citations.json \ + --out verification_report.json +``` + +Expected result: `all_evidence_grounded: false`, check status `not_found`, reason +`element_not_found`, and the same `capability_limited` warning as the grounded synthetic fixture. 
+ ## Stale Fingerprint ```bash @@ -71,10 +94,17 @@ Non-grounded checks may include a stable `reason` label: | `missing_table_capability` | The claim needs table-cell lookup, but the grounding source does not expose tables. | | `missing_source_fingerprint` | Citations were fingerprint-pinned, but the grounding source did not declare one. | | `unknown_coordinate_origin` | A bbox locator was used with a source whose coordinate origin is unknown. | +| `element_not_found` | The cited element id was not found in a source that exposes element ids. | | `table_not_found` | The cited table id was not found in a source that exposes tables. | | `table_cell_not_found` | The cited table exists, but the cited cell address was not found. | | `unsupported_claim_kind` | The claim kind is unsupported by this verifier or the active config. | +## Usage Diagnostics + +Malformed citations are covered as usage errors. `invalid_table_cell_citations.json` must exit +`2` because a table-cell claim is missing `table_id` and `cell`. `invalid_bbox_citations.json` +must exit `2` because a bbox locator is missing a page or another target locator. + The OpenDataLoader fixtures are synthetic and limited to the adapter's documented alpha subset. They are not real pinned OpenDataLoader artifacts. Golden reports live in `examples/verify/goldens/` and are covered by the CLI verification test. 
diff --git a/examples/verify/check_verify_alpha.py b/examples/verify/check_verify_alpha.py index c3cd130..2458c76 100644 --- a/examples/verify/check_verify_alpha.py +++ b/examples/verify/check_verify_alpha.py @@ -24,6 +24,19 @@ "citations": "examples/verify/opendataloader_grounded_citations.json", "golden": "examples/verify/goldens/opendataloader_grounded_report.json", }, + { + "name": "native-ungrounded", + "input": "schemas/examples/document.example.json", + "citations": "examples/verify/native_ungrounded_citations.json", + "golden": "examples/verify/goldens/native_ungrounded_report.json", + }, + { + "name": "opendataloader-not-found", + "input": "examples/verify/opendataloader.json", + "grounding": "opendataloader-json", + "citations": "examples/verify/opendataloader_not_found_citations.json", + "golden": "examples/verify/goldens/opendataloader_not_found_report.json", + }, { "name": "native-stale", "input": "schemas/examples/document.example.json", @@ -53,6 +66,21 @@ }, ] +USAGE_ERROR_CASES = [ + { + "name": "invalid-table-cell-citation", + "input": "schemas/examples/document.example.json", + "citations": "examples/verify/invalid_table_cell_citations.json", + "stderr_contains": "table_cell citation must include table_id and cell", + }, + { + "name": "invalid-bbox-citation", + "input": "schemas/examples/document.example.json", + "citations": "examples/verify/invalid_bbox_citations.json", + "stderr_contains": "citation bbox requires page unless another target locator is present", + }, +] + def parse_args(): parser = argparse.ArgumentParser(description=__doc__) @@ -257,6 +285,34 @@ def verify_case(case, args): compare_json(first, args.repo_root / case["golden"], args.repo_root, case["name"]) +def verify_usage_error_case(case, args): + command = [ + str(args.ethos_bin), + "verify", + str(args.repo_root / case["input"]), + "--citations", + str(args.repo_root / case["citations"]), + ] + if "grounding" in case: + command.extend(["--grounding", case["grounding"]]) + + 
print("$ " + " ".join(str(part) for part in command), flush=True) + result = subprocess.run(command, cwd=args.repo_root, capture_output=True, check=False) + if result.returncode != 2: + sys.stderr.write(f"{case['name']} exited {result.returncode}, expected 2\n") + sys.stderr.write(result.stderr.decode("utf-8", errors="replace")) + sys.stderr.write(result.stdout.decode("utf-8", errors="replace")) + raise SystemExit(1) + if result.stdout: + raise SystemExit(f"{case['name']} wrote unexpected stdout") + stderr = result.stderr.decode("utf-8", errors="replace") + if case["stderr_contains"] not in stderr: + raise SystemExit( + f"{case['name']} stderr did not contain {case['stderr_contains']!r}\n{stderr}" + ) + print(f"ok {case['name']} exits 2 with expected usage diagnostic") + + def main(): args = parse_args() args.repo_root = args.repo_root.resolve() @@ -270,6 +326,8 @@ def main(): for case in CASES: verify_case(case, args) + for case in USAGE_ERROR_CASES: + verify_usage_error_case(case, args) verify_crop_descriptor_case(args) print("\nverify-alpha demo checks passed") diff --git a/examples/verify/goldens/native_ungrounded_report.json b/examples/verify/goldens/native_ungrounded_report.json new file mode 100644 index 0000000..f07729f --- /dev/null +++ b/examples/verify/goldens/native_ungrounded_report.json @@ -0,0 +1,66 @@ +{ + "all_evidence_grounded": false, + "capability_limits": [], + "checks": [ + { + "claim": { + "citation": { + "element_id": "e000002", + "page": "p0001" + }, + "kind": "quote", + "text": "Operating margin was 99%" + }, + "evidence": { + "bbox": [ + 7200, + 10100, + 54000, + 11500 + ], + "page": "p0001", + "text": "Revenue grew to $12.4M in Q3 2025, driven by enterprise expansion." 
+ }, + "id": "v0001", + "match_method": "normalized_text_contains", + "reason": "text_mismatch", + "semantic_unverified": false, + "status": "mismatch", + "warnings": [] + }, + { + "claim": { + "citation": { + "element_id": "missing-element" + }, + "kind": "presence" + }, + "id": "v0002", + "match_method": "none", + "reason": "element_not_found", + "semantic_unverified": false, + "status": "not_found", + "warnings": [] + } + ], + "document_fingerprint": "sha256:b5d30710d0c25cc38d8dec924ecaf57ae4f81276dd5dc14d75cb3b5b6bde62d3", + "fingerprint_stale": false, + "grounding": { + "capabilities": { + "char_offsets": true, + "coordinate_origin": "top-left", + "crop_support": false, + "fingerprint": true, + "spans": true, + "tables": true + }, + "parser": { + "name": "ethos", + "version": "0.1.0" + } + }, + "schema_version": "1.0.0", + "unsupported_claim_kinds": [], + "verification_config_sha256": "4bb224166a04a25fed2dd3ecdb9638ddcc5b398658532b73f1c0547e4983d0b0", + "warnings": [] +} diff --git a/examples/verify/goldens/opendataloader_not_found_report.json b/examples/verify/goldens/opendataloader_not_found_report.json new file mode 100644 index 0000000..9aff6dc --- /dev/null +++ b/examples/verify/goldens/opendataloader_not_found_report.json @@ -0,0 +1,48 @@ +{ + "all_evidence_grounded": false, + "capability_limits": [ + "missing_fingerprint", + "missing_spans", + "missing_char_offsets", + "unknown_coordinate_origin" + ], + "checks": [ + { + "claim": { + "citation": { + "element_id": "odl-missing" + }, + "kind": "presence" + }, + "id": "v0001", + "match_method": "none", + "reason": "element_not_found", + "semantic_unverified": false, + "status": "not_found", + "warnings": [] + } + ], + "fingerprint_stale": false, + "grounding": { + "capabilities": { + "char_offsets": false, + "coordinate_origin": "unknown", + "crop_support": false, + "fingerprint": false, + "spans": false, + "tables": true + }, + "parser": { + "adapter": "opendataloader-json", + "adapter_version": "0.1.0", 
+ "name": "opendataloader-pdf", + "version": "0.0.0-synthetic" + } + }, + "schema_version": "1.0.0", + "unsupported_claim_kinds": [], + "verification_config_sha256": "4bb224166a04a25fed2dd3ecdb9638ddcc5b398658532b73f1c0547e4983d0b0", + "warnings": [ + "capability_limited" + ] +} diff --git a/examples/verify/invalid_bbox_citations.json b/examples/verify/invalid_bbox_citations.json new file mode 100644 index 0000000..2d82653 --- /dev/null +++ b/examples/verify/invalid_bbox_citations.json @@ -0,0 +1,15 @@ +{ + "claims": [ + { + "kind": "presence", + "citation": { + "bbox": [ + 7300, + 10200, + 8000, + 11000 + ] + } + } + ] +} diff --git a/examples/verify/invalid_table_cell_citations.json b/examples/verify/invalid_table_cell_citations.json new file mode 100644 index 0000000..1f1f686 --- /dev/null +++ b/examples/verify/invalid_table_cell_citations.json @@ -0,0 +1,11 @@ +{ + "claims": [ + { + "kind": "table_cell", + "text": "$12.4M", + "citation": { + "element_id": "e000002" + } + } + ] +} diff --git a/examples/verify/native_ungrounded_citations.json b/examples/verify/native_ungrounded_citations.json new file mode 100644 index 0000000..fcaca48 --- /dev/null +++ b/examples/verify/native_ungrounded_citations.json @@ -0,0 +1,19 @@ +{ + "document_fingerprint": "sha256:b5d30710d0c25cc38d8dec924ecaf57ae4f81276dd5dc14d75cb3b5b6bde62d3", + "claims": [ + { + "kind": "quote", + "text": "Operating margin was 99%", + "citation": { + "page": "p0001", + "element_id": "e000002" + } + }, + { + "kind": "presence", + "citation": { + "element_id": "missing-element" + } + } + ] +} diff --git a/examples/verify/opendataloader_not_found_citations.json b/examples/verify/opendataloader_not_found_citations.json new file mode 100644 index 0000000..0cfb7a8 --- /dev/null +++ b/examples/verify/opendataloader_not_found_citations.json @@ -0,0 +1,10 @@ +{ + "claims": [ + { + "kind": "presence", + "citation": { + "element_id": "odl-missing" + } + } + ] +}
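
The new ungrounded golden can also be checked from a script. What follows is a minimal Python sketch, not part of the patch and not Ethos' own implementation. It reads the `all_evidence_grounded`, `checks[].status`, and `checks[].reason` fields as they appear in `examples/verify/goldens/native_ungrounded_report.json`. The helper names `summarize_report` and `gate_exit_code` are hypothetical, and the embedded JSON is a trimmed copy of the golden rather than the full file. The exit-code mapping mirrors the documented `--fail-on-ungrounded` behavior only (`0` when everything is grounded, `1` otherwise) and does not model usage errors (`2`).

```python
import json

# Trimmed copy of the fields this sketch reads from
# examples/verify/goldens/native_ungrounded_report.json (assumption: the
# remaining fields of the golden are not needed for this check).
REPORT_JSON = """
{
  "all_evidence_grounded": false,
  "checks": [
    {"id": "v0001", "status": "mismatch", "reason": "text_mismatch"},
    {"id": "v0002", "status": "not_found", "reason": "element_not_found"}
  ],
  "warnings": []
}
"""


def summarize_report(report):
    """Hypothetical helper: return (grounded, [(id, status, reason), ...])."""
    checks = [
        # `reason` is read with .get() because the docs say non-grounded
        # checks "may" include it, so it is treated as optional here.
        (check["id"], check["status"], check.get("reason"))
        for check in report["checks"]
    ]
    return report["all_evidence_grounded"], checks


def gate_exit_code(report):
    """Hypothetical helper mirroring the documented gate: 0 if grounded, else 1."""
    return 0 if report["all_evidence_grounded"] else 1


report = json.loads(REPORT_JSON)
grounded, checks = summarize_report(report)
for check_id, status, reason in checks:
    print(f"{check_id}: {status} ({reason})")
print(f"grounded={grounded} exit={gate_exit_code(report)}")
```

Pointing the same helpers at the full golden file (via `json.load` on the checked-in path) should give the same two check rows, since the sketch only reads fields that the golden already contains.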