Merged
12 changes: 10 additions & 2 deletions README.md
@@ -93,7 +93,7 @@ The command exits `0` and writes a verification report shaped like this:
## Try the alpha verification loop

Ethos is source-only pre-alpha. There are no release artifacts or package installs yet. From a
-source checkout, the current product-proof command is:
+source checkout, the current verification loop is:

```bash
make verify-alpha
@@ -104,7 +104,8 @@ That command builds the CLI and checks the alpha grounding loop across:
- native Ethos document JSON
- synthetic OpenDataLoader-style JSON
- pinned real OpenDataLoader 2.4.7 JSON fixtures
-- grounded, ungrounded, stale-fingerprint, and capability-limited citation cases
+- grounded, ungrounded, not-found, stale-fingerprint, and capability-limited citation cases
+- malformed citation inputs that must fail with usage diagnostics
- byte-identical repeated verification reports for the checked-in fixtures
- deterministic native crop descriptor JSON artifacts

@@ -142,7 +143,14 @@ test result: ok. 40 passed; 0 failed

ok native-grounded matches examples/verify/goldens/native_grounded_report.json
ok opendataloader-grounded matches examples/verify/goldens/opendataloader_grounded_report.json
ok native-ungrounded matches examples/verify/goldens/native_ungrounded_report.json
ok opendataloader-not-found matches examples/verify/goldens/opendataloader_not_found_report.json
ok native-stale matches examples/verify/goldens/native_stale_report.json
ok opendataloader-capability-limited matches examples/verify/goldens/opendataloader_capability_limited_report.json
ok real-opendataloader-grounded matches fixtures/foreign/opendataloader/real/expected.verification_report.json
ok real-opendataloader-ungrounded matches fixtures/foreign/opendataloader/real/expected.ungrounded.verification_report.json
ok invalid-table-cell-citation exits 2 with expected usage diagnostic
ok invalid-bbox-citation exits 2 with expected usage diagnostic
ok native-grounded-crops crop descriptors validate against schemas/ethos-crop-descriptor.schema.json

verify-alpha demo checks passed
70 changes: 67 additions & 3 deletions docs/demos/verify-alpha.md
@@ -1,19 +1,21 @@
# Verify Alpha Demo

-## Product Proof
+## Verification Loop

Ethos verifies whether AI citations are grounded in document evidence.

This is a citation grounding check, not a semantic-truth system: Ethos does not claim semantic
-entailment, factual truth, arithmetic correctness, or answer quality. The alpha proof is the
+entailment, factual truth, arithmetic correctness, or answer quality. The alpha loop is the
repeatable `make verify-alpha` path:

- native Ethos JSON citation checks can pass against checked-in document evidence
- OpenDataLoader-style JSON can enter the same verification loop through a grounding adapter
- real pinned OpenDataLoader 2.4.7 output has both grounded and ungrounded citation cases
- native and synthetic OpenDataLoader fixtures cover missing cited elements
- malformed citation inputs return usage diagnostics with exit code `2`
- `--fail-on-ungrounded` turns the report into a CI/agent gate with exit code `1` when evidence is not fully grounded
- native Ethos verification can emit deterministic crop descriptor artifacts with `--crop-dir`
-- every demo report is compared against a golden and regenerated twice to prove byte-identical output
+- every demo report is compared against a golden and regenerated twice to check byte-identical output
- crop descriptor files are regenerated twice, compared byte-for-byte, validated against schema, and checked against the committed descriptor example
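
The byte-identical regeneration checks in the list above can be sketched roughly as follows. This is a minimal illustration rather than the harness's actual code: `generate_report` is a hypothetical callable standing in for whatever writes a report to a given path (for example, a wrapper around an `ethos verify ... --out` invocation).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_byte_identical(generate_report, runs=2):
    """Run a report generator several times and compare the output bytes.

    `generate_report` is a hypothetical callable (an assumption of this
    sketch) that takes an output path and writes a report there.
    """
    digests = set()
    with tempfile.TemporaryDirectory() as tmp:
        for index in range(runs):
            out = Path(tmp) / f"report-{index}.json"
            generate_report(out)
            digests.add(sha256_of(out))
    # A single distinct digest means every run produced the same bytes.
    return len(digests) == 1
```

Comparing digests of the raw bytes, rather than parsed JSON, also catches differences in key order, whitespace, or trailing newlines.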

Ethos verifies document evidence for AI systems. The deterministic parser is one grounding
@@ -41,6 +43,26 @@ Golden report:
examples/verify/goldens/native_grounded_report.json
```

## Native Ethos Ungrounded Citations

```bash
ethos verify schemas/examples/document.example.json \
--citations examples/verify/native_ungrounded_citations.json \
--out /tmp/ethos-native-ungrounded-report.json
```

Expected outcome:

- `all_evidence_grounded` is `false`
- one quote check status is `mismatch` with reason `text_mismatch`
- one presence check status is `not_found` with reason `element_not_found`
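
A script consuming the report could assert this outcome along the following lines. The field names (`all_evidence_grounded`, `checks`, `claim.kind`, `status`, `reason`) are taken from the golden report added in this change; the helper names are hypothetical, and the sketch assumes at most one check per claim kind, which holds for this fixture but not in general.

```python
import json


def load_report(path):
    """Read a verification report written via `--out`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def outcomes_by_kind(report):
    """Map each check's claim kind to its (status, reason) pair.

    Assumes one check per claim kind; later checks of the same kind
    would overwrite earlier ones.
    """
    return {
        check["claim"]["kind"]: (check["status"], check.get("reason"))
        for check in report["checks"]
    }


def matches_native_ungrounded(report):
    """Return True if the report shows the expected ungrounded outcome."""
    outcomes = outcomes_by_kind(report)
    return (
        report["all_evidence_grounded"] is False
        and outcomes.get("quote") == ("mismatch", "text_mismatch")
        and outcomes.get("presence") == ("not_found", "element_not_found")
    )
```

For example, `matches_native_ungrounded(load_report("/tmp/ethos-native-ungrounded-report.json"))` would be expected to hold after running the command above.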

Golden report:

```text
examples/verify/goldens/native_ungrounded_report.json
```

## OpenDataLoader-Style JSON

```bash
@@ -67,6 +89,28 @@ Golden report:
examples/verify/goldens/opendataloader_grounded_report.json
```

## OpenDataLoader-Style Missing Element

```bash
ethos verify examples/verify/opendataloader.json \
--grounding opendataloader-json \
--citations examples/verify/opendataloader_not_found_citations.json \
--out /tmp/ethos-odl-not-found-report.json
```

Expected outcome:

- `grounding.parser.adapter` is `opendataloader-json`
- `all_evidence_grounded` is `false`
- the presence check status is `not_found` with reason `element_not_found`
- `warnings` includes `capability_limited`
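
The dotted names in this list read naturally as paths into the report JSON. A minimal sketch of checking them is below; it assumes, without confirmation from the source, that the top-level `warnings` field is a list of plain strings. The helper names are hypothetical.

```python
def lookup(report, dotted_path):
    """Follow a dotted path such as 'grounding.parser.adapter' through nested dicts."""
    value = report
    for key in dotted_path.split("."):
        value = value[key]
    return value


def matches_odl_not_found(report):
    """Check the expected outcome for the synthetic missing-element case.

    Assumption: `warnings` is a list of plain string labels.
    """
    statuses = [(c["status"], c.get("reason")) for c in report["checks"]]
    return (
        lookup(report, "grounding.parser.adapter") == "opendataloader-json"
        and report["all_evidence_grounded"] is False
        and ("not_found", "element_not_found") in statuses
        and "capability_limited" in report["warnings"]
    )
```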

Golden report:

```text
examples/verify/goldens/opendataloader_not_found_report.json
```

## Real OpenDataLoader JSON

```bash
@@ -172,6 +216,26 @@ Exit behavior:
- `1`: verification completed, but not all requested evidence is grounded
- `2`: invalid input, malformed citations, adapter failure, or another usage error
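
A CI or agent wrapper might translate these exit codes as sketched below. The labels and function names are hypothetical. The meanings of `1` and `2` follow the list above; the entry for `0` falls outside the visible part of this hunk, so treating `0` as "completed without signalling a problem" is an assumption.

```python
import subprocess


def classify_exit(returncode):
    """Translate a verify exit code into a coarse label (labels are hypothetical)."""
    if returncode == 0:
        return "ok"  # assumed: completed without signalling a problem
    if returncode == 1:
        return "ungrounded"  # completed, but not all evidence is grounded
    if returncode == 2:
        return "usage_error"  # invalid input, malformed citations, adapter failure
    return "unexpected"


def run_gate(command):
    """Run an argv list (e.g. an `ethos verify` call) and classify its exit code."""
    result = subprocess.run(command, capture_output=True, check=False)
    return classify_exit(result.returncode)
```

Keeping `"ungrounded"` and `"usage_error"` distinct lets a pipeline tell a grounding failure apart from a broken invocation.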

## Malformed Citation Inputs

The harness also checks citation validation failures:

```bash
ethos verify schemas/examples/document.example.json \
--citations examples/verify/invalid_table_cell_citations.json
```

Expected outcome: exit code `2` with a diagnostic that the table-cell citation must include
`table_id` and `cell`.

```bash
ethos verify schemas/examples/document.example.json \
--citations examples/verify/invalid_bbox_citations.json
```

Expected outcome: exit code `2` with a diagnostic that the bbox citation requires a page unless
another target locator is present.
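
The two rules can be illustrated with a small pre-flight check. This is a sketch under stated assumptions, not the validator Ethos uses: it assumes a table-cell claim is marked by `kind == "table_cell"`, and that `element_id` or `table_id` counts as "another target locator". Neither assumption is confirmed by the source. The diagnostic strings are the ones the harness expects on stderr.

```python
def citation_problems(claim):
    """Return diagnostics for one claim dict; an empty list means none found.

    Assumptions (hypothetical, not confirmed by the source):
    - a table-cell claim has kind == "table_cell"
    - `element_id` or `table_id` counts as another target locator
    """
    citation = claim.get("citation", {})
    problems = []
    if claim.get("kind") == "table_cell" and not (
        "table_id" in citation and "cell" in citation
    ):
        problems.append("table_cell citation must include table_id and cell")
    has_other_locator = any(key in citation for key in ("element_id", "table_id"))
    if "bbox" in citation and "page" not in citation and not has_other_locator:
        problems.append(
            "citation bbox requires page unless another target locator is present"
        )
    return problems
```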

## Crop Descriptors

Native Ethos document grounding can emit deterministic crop descriptor JSON files for each
32 changes: 31 additions & 1 deletion examples/verify/README.md
@@ -1,6 +1,6 @@
# WS-VERIFY-ALPHA Demo

-This directory contains the first parser-agnostic verification demo.
+This directory contains verify-alpha fixtures, citations, and golden reports.

## Native Ethos Grounding

@@ -22,6 +22,17 @@ ethos verify schemas/examples/document.example.json \
--format summary
```

## Native Ethos Ungrounded Citations

```bash
ethos verify schemas/examples/document.example.json \
--citations examples/verify/native_ungrounded_citations.json \
--out verification_report.json
```

Expected result: `all_evidence_grounded: false`. The quote check reports `text_mismatch`, and
the missing element check reports `element_not_found`.

## OpenDataLoader-Style Grounding

```bash
@@ -36,6 +47,18 @@ warning. The warning is intentional: the synthetic OpenDataLoader-style fixture
fingerprint, spans, character offsets, or known coordinate origin, but its element and table
evidence can still ground these claims.

## OpenDataLoader-Style Missing Element

```bash
ethos verify examples/verify/opendataloader.json \
--grounding opendataloader-json \
--citations examples/verify/opendataloader_not_found_citations.json \
--out verification_report.json
```

Expected result: `all_evidence_grounded: false`, check status `not_found`, reason
`element_not_found`, and the same `capability_limited` warning as the grounded synthetic fixture.

## Stale Fingerprint

```bash
@@ -71,10 +94,17 @@ Non-grounded checks may include a stable `reason` label:
| `missing_table_capability` | The claim needs table-cell lookup, but the grounding source does not expose tables. |
| `missing_source_fingerprint` | Citations were fingerprint-pinned, but the grounding source did not declare one. |
| `unknown_coordinate_origin` | A bbox locator was used with a source whose coordinate origin is unknown. |
| `element_not_found` | The cited element id was not found in a source that exposes element ids. |
| `table_not_found` | The cited table id was not found in a source that exposes tables. |
| `table_cell_not_found` | The cited table exists, but the cited cell address was not found. |
| `unsupported_claim_kind` | The claim kind is unsupported by this verifier or the active config. |

## Usage Diagnostics

Malformed citations are covered as usage errors. `invalid_table_cell_citations.json` must exit
`2` because a table-cell claim is missing `table_id` and `cell`. `invalid_bbox_citations.json`
must exit `2` because a bbox locator is missing a page or another target locator.

The OpenDataLoader fixtures are synthetic and limited to the adapter's documented alpha
subset. They are not real pinned OpenDataLoader artifacts. Golden reports live in
`examples/verify/goldens/` and are covered by the CLI verification test.
58 changes: 58 additions & 0 deletions examples/verify/check_verify_alpha.py
@@ -24,6 +24,19 @@
        "citations": "examples/verify/opendataloader_grounded_citations.json",
        "golden": "examples/verify/goldens/opendataloader_grounded_report.json",
    },
    {
        "name": "native-ungrounded",
        "input": "schemas/examples/document.example.json",
        "citations": "examples/verify/native_ungrounded_citations.json",
        "golden": "examples/verify/goldens/native_ungrounded_report.json",
    },
    {
        "name": "opendataloader-not-found",
        "input": "examples/verify/opendataloader.json",
        "grounding": "opendataloader-json",
        "citations": "examples/verify/opendataloader_not_found_citations.json",
        "golden": "examples/verify/goldens/opendataloader_not_found_report.json",
    },
    {
        "name": "native-stale",
        "input": "schemas/examples/document.example.json",
@@ -53,6 +66,21 @@
    },
]

USAGE_ERROR_CASES = [
    {
        "name": "invalid-table-cell-citation",
        "input": "schemas/examples/document.example.json",
        "citations": "examples/verify/invalid_table_cell_citations.json",
        "stderr_contains": "table_cell citation must include table_id and cell",
    },
    {
        "name": "invalid-bbox-citation",
        "input": "schemas/examples/document.example.json",
        "citations": "examples/verify/invalid_bbox_citations.json",
        "stderr_contains": "citation bbox requires page unless another target locator is present",
    },
]


def parse_args():
    parser = argparse.ArgumentParser(description=__doc__)
@@ -257,6 +285,34 @@ def verify_case(case, args):
    compare_json(first, args.repo_root / case["golden"], args.repo_root, case["name"])


def verify_usage_error_case(case, args):
    command = [
        str(args.ethos_bin),
        "verify",
        str(args.repo_root / case["input"]),
        "--citations",
        str(args.repo_root / case["citations"]),
    ]
    if "grounding" in case:
        command.extend(["--grounding", case["grounding"]])

    print("$ " + " ".join(str(part) for part in command), flush=True)
    result = subprocess.run(command, cwd=args.repo_root, capture_output=True, check=False)
    if result.returncode != 2:
        sys.stderr.write(f"{case['name']} exited {result.returncode}, expected 2\n")
        sys.stderr.write(result.stderr.decode("utf-8", errors="replace"))
        sys.stderr.write(result.stdout.decode("utf-8", errors="replace"))
        raise SystemExit(1)
    if result.stdout:
        raise SystemExit(f"{case['name']} wrote unexpected stdout")
    stderr = result.stderr.decode("utf-8", errors="replace")
    if case["stderr_contains"] not in stderr:
        raise SystemExit(
            f"{case['name']} stderr did not contain {case['stderr_contains']!r}\n{stderr}"
        )
    print(f"ok {case['name']} exits 2 with expected usage diagnostic")


def main():
    args = parse_args()
    args.repo_root = args.repo_root.resolve()
@@ -270,6 +326,8 @@ def main():

    for case in CASES:
        verify_case(case, args)
    for case in USAGE_ERROR_CASES:
        verify_usage_error_case(case, args)
    verify_crop_descriptor_case(args)

    print("\nverify-alpha demo checks passed")
66 changes: 66 additions & 0 deletions examples/verify/goldens/native_ungrounded_report.json
@@ -0,0 +1,66 @@
{
  "all_evidence_grounded": false,
  "capability_limits": [],
  "checks": [
    {
      "claim": {
        "citation": {
          "element_id": "e000002",
          "page": "p0001"
        },
        "kind": "quote",
        "text": "Operating margin was 99%"
      },
      "evidence": {
        "bbox": [
          7200,
          10100,
          54000,
          11500
        ],
        "page": "p0001",
        "text": "Revenue grew to $12.4M in Q3 2025, driven by enterprise expansion."
      },
      "id": "v0001",
      "match_method": "normalized_text_contains",
      "reason": "text_mismatch",
      "semantic_unverified": false,
      "status": "mismatch",
      "warnings": []
    },
    {
      "claim": {
        "citation": {
          "element_id": "missing-element"
        },
        "kind": "presence"
      },
      "id": "v0002",
      "match_method": "none",
      "reason": "element_not_found",
      "semantic_unverified": false,
      "status": "not_found",
      "warnings": []
    }
  ],
  "document_fingerprint": "sha256:b5d30710d0c25cc38d8dec924ecaf57ae4f81276dd5dc14d75cb3b5b6bde62d3",
  "fingerprint_stale": false,
  "grounding": {
    "capabilities": {
      "char_offsets": true,
      "coordinate_origin": "top-left",
      "crop_support": false,
      "fingerprint": true,
      "spans": true,
      "tables": true
    },
    "parser": {
      "name": "ethos",
      "version": "0.1.0"
    }
  },
  "schema_version": "1.0.0",
  "unsupported_claim_kinds": [],
  "verification_config_sha256": "4bb224166a04a25fed2dd3ecdb9638ddcc5b398658532b73f1c0547e4983d0b0",
  "warnings": []
}