docushell · docushell-dev · Jun 16, 2026 · Jun 16, 2026
diff --git a/docs/execution-status.md b/docs/execution-status.md
@@ -16,7 +16,7 @@ The committed implementation now includes:
 - The determinism workflow includes a Windows x64 preflight lane for core c14n/profile/fingerprint contract tests, while PDFium-backed corpus work remains explicitly skipped unless the pinned runtime is configured on that runner. A static workflow test guards that matrix wiring.
 - `ethos doc parse` / `ethos fingerprint` PDF execution through a worker process with `max_parse_ms` timeout enforcement, stable error-envelope relay, diagnostics-gated worker stderr, and page-range validation/filtering.
 - Quantized page/span extraction at the backend boundary, plus a basic deterministic layout pass that assembles paragraph `text_block` elements, fixture-backed alpha heading and flat list-item elements, and simple column reading order for the current born-digital fixtures. Current alpha layout confidence is explicit for heading signals, and below-threshold layout confidence emits deterministic `low_confidence_reading_order` diagnostics instead of staying silent. Fixture validation binds selected `fixture.json` expectations to committed extraction/layout goldens and binds current alpha text/Markdown exports to committed layout output so current read-order, element-type, heading-export, list-item, and export cases fail closed on drift.
-- An internal layout evaluator scaffold exists at `fixtures/evaluate_layout_alpha.py` and `make layout-evaluator-alpha`. It reads committed `fixture.json` and `layout.json` files, summarizes alpha element-type and subset coverage, and fails closed on missing layout expectations, confidence-policy drift, or drift in fixture-backed reading order / heading / list-item cases.
+- An internal layout evaluator scaffold exists at `fixtures/evaluate_layout_alpha.py` and `make layout-evaluator-alpha`. It reads committed `fixture.json`, `layout.json`, `text.txt`, and `markdown.md` files, summarizes alpha element-type and subset coverage, and fails closed on missing layout expectations, dangling/invalid warning references, confidence-policy drift, export-golden drift, or drift in fixture-backed reading order / heading / list-item cases.
 - Schema/example/profile validation is green through `schemas/validate_examples.py` using `jsonschema` draft 2020-12 validation, including the crop descriptor artifact contract plus referential-integrity and bbox sanity checks outside JSON Schema.
 - `ethos verify` now produces non-empty quote, value, presence, and table-cell verification checks over native Ethos document JSON and synthetic OpenDataLoader-style JSON through `--grounding opendataloader-json`; it also verifies quote/value/presence citations over pinned real OpenDataLoader 2.4.7 JSON, including grounded and ungrounded cases. Citation/config inputs are rejected when they drift outside the closed schemas. The public demo harness covers grounded, ungrounded, split-quote, not-found, stale-fingerprint, unsupported non-v1 claim, capability-limited, malformed-citation, malformed OpenDataLoader-style input, and summary-format reject paths.
 - Verification semantics are now trust-honest at alpha scope: quote containment is explicitly labeled, value/table-cell checks require normalized equality, fingerprint-pinned citations fail closed when source fingerprints are unavailable, and structured capability limits explain why a run is downgraded.
@@ -53,7 +53,7 @@ Milestone A has an accepted internal Gate Zero decision for roadmap control, so
 | PDFium loader/runtime checks | Landed: missing/mismatched version, artifact, and runtime library hashes fail deterministically | Release packaging and operator setup path still need hardening |
 | Real PDF backend | Landed for simple born-digital PDFs: page count, quantized spans, worker execution, timeout, page filtering, and fingerprint path exist | Wider corpus coverage, failure fixtures, memory-limit behavior, quirk log, and Gate Zero run are still missing |
 | Layout groundwork | Landed: basic paragraph text blocks, fixture-backed alpha heading and flat list-item elements, simple column reading order over quantized spans, explicit alpha heading-confidence values, deterministic below-threshold confidence diagnostics, fixture metadata checks against committed extraction/layout goldens for current read-order and element-type expectations, and alpha text/Markdown export goldens derived from committed layout output | Tables, nested/richer list and heading semantics, rotation/quirk handling, and broader confidence dimensions remain future work |
-| Layout evaluator scaffold | Landed: deterministic internal evaluator over committed layout fixture expectations, with heading/list/reading-order coverage checks, confidence-policy checks, expectation drift diagnostics, report JSON, Make target, and unit coverage | Broader evaluator dimensions and CI matrix integration remain future work |
+| Layout evaluator scaffold | Landed: deterministic internal evaluator over committed layout fixture expectations, with heading/list/reading-order coverage checks, warning-reference checks, confidence-policy checks, text/Markdown export-golden checks, expectation drift diagnostics, report JSON, Make target, and unit coverage | Broader evaluator dimensions and CI matrix integration remain future work |
 | Python surface scaffold | Landed: internal stdlib wrapper over a caller-provided local `ethos doc parse` command, with explicit JSON/Markdown/text methods, page selection passthrough, diagnostics passthrough, timeout handling, command failure reporting, and mocked-command unit coverage | Native binding work, broader API design, and public setup path remain future work |
 | Font policy groundwork | Partially landed: substitution table and profile policy are present; fixture output uses deterministic substitution IDs | Bundled fallback asset hashing and broader font/CID validation remain open |
 | Schema/example validation | Landed: schemas, examples, deterministic profile, referential integrity, and bbox sanity pass the `jsonschema` validation gate | Contract changes still require explicit versioning and compatibility review |
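
Reviewer note: the "dangling/invalid warning references" check described in the status doc can be pictured with the minimal sketch below. It is an illustration, not the evaluator's code. It assumes `layout.json` exposes top-level `elements` and `warnings` arrays (the diff does not show how they are loaded), and `find_dangling_refs` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def find_dangling_refs(layout: Dict[str, Any]) -> List[str]:
    """Return one message per cross-reference that points at nothing.

    Assumption: `layout` has top-level "elements" and "warnings" lists.
    """
    elements = [e for e in layout.get("elements", []) if isinstance(e, dict)]
    warnings = [w for w in layout.get("warnings", []) if isinstance(w, dict)]
    element_ids = {e["id"] for e in elements if isinstance(e.get("id"), str)}
    warning_ids = {w["id"] for w in warnings if isinstance(w.get("id"), str)}

    problems: List[str] = []
    # Direction 1: each warning.element_ref must name an existing element.
    for warning in warnings:
        ref = warning.get("element_ref")
        if isinstance(ref, str) and ref not in element_ids:
            problems.append(
                f"warning {warning.get('id')!r} -> missing element {ref!r}"
            )
    # Direction 2: each element.warning_refs entry must name an existing warning.
    for element in elements:
        refs = element.get("warning_refs", [])
        if not isinstance(refs, list):
            continue  # shape errors are a separate concern in this sketch
        for ref in refs:
            if ref not in warning_ids:
                problems.append(
                    f"element {element.get('id')!r} -> missing warning {ref!r}"
                )
    return problems
```

Checking both directions matters: a warning can point at a removed element, and an element can keep a reference to a warning that was dropped from the list.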

diff --git a/fixtures/evaluate_layout_alpha.py b/fixtures/evaluate_layout_alpha.py
@@ -20,7 +20,8 @@
 This script does not parse PDFs and does not compare Ethos to other tools. It
 summarizes the committed alpha layout fixture expectations and fails closed when
 layout.json drifts away from fixture.json expectations, when required expectation
-fields are missing, or when heading/list/reading-order fixture coverage is absent.
+fields are missing, when warning references dangle, when committed export
+goldens drift, or when heading/list/reading-order fixture coverage is absent.
 """
 
 from __future__ import annotations
@@ -36,6 +37,8 @@
 REQUIRED_EXPECTATION_FIELDS = ("expected_text", "expected_element_types")
 ALPHA_LAYOUT_CONFIDENCE_WARNING_THRESHOLD = 800
 LOW_CONFIDENCE_READING_ORDER_CODE = "low_confidence_reading_order"
+TEXT_EXPORT = "text.txt"
+MARKDOWN_EXPORT = "markdown.md"
 COVERAGE_GATES = {
     "heading_fixture": {
         "subset": "headings",
@@ -86,6 +89,7 @@ def main(argv: Optional[List[str]] = None) -> int:
             f"{json.dumps(report['element_type_counts'], sort_keys=True)}"
         )
         print("ok    layout evaluator heading/list/reading-order coverage present")
+        print("ok    layout evaluator export-golden and warning-reference checks clean")
         if args.out is not None:
             print(f"ok    layout evaluator report wrote {args.out}")
         return 0
@@ -294,13 +298,27 @@ def evaluate_fixture(
         len(elements),
         diagnostics,
     )
+    warning_shape_status = compare_warning_shape(
+        fixture_id,
+        fixture_rel,
+        elements,
+        warnings,
+        diagnostics,
+    )
     confidence_policy_status = compare_confidence_policy(
         fixture_id,
         fixture_rel,
         elements,
         warnings,
         diagnostics,
     )
+    export_goldens_status = compare_export_goldens(
+        fixture_id,
+        fixture_dir,
+        fixture_rel,
+        elements,
+        diagnostics,
+    )
     subset_status = compare_subset_expectations(
         fixture_id,
         fixture_rel,
@@ -319,7 +337,9 @@ def evaluate_fixture(
         "expected_text": expected_text_status,
         "expected_element_types": expected_element_types_status,
         "expected_elements": expected_elements_status,
+        "warning_shape": warning_shape_status,
         "confidence_policy": confidence_policy_status,
+        "export_goldens": export_goldens_status,
         "subset_expectations": subset_status,
     }
 
@@ -438,6 +458,136 @@ def compare_expected_elements(
     return "pass"
 
 
+def compare_warning_shape(
+    fixture_id: str,
+    fixture_rel: str,
+    elements: List[Any],
+    warnings: List[Any],
+    diagnostics: List[Dict[str, Any]],
+) -> str:
+    invalid = False
+    mismatch = False
+    checked = False
+    element_ids = {
+        element.get("id")
+        for element in elements
+        if isinstance(element, dict) and isinstance(element.get("id"), str)
+    }
+    warning_ids = set()
+
+    for warning_index, warning in enumerate(warnings):
+        checked = True
+        if not isinstance(warning, dict):
+            diagnostics.append(
+                diagnostic(
+                    "invalid_layout",
+                    fixture_id,
+                    f"layout warning {warning_index} must be an object",
+                    f"{fixture_rel}/layout.json",
+                )
+            )
+            invalid = True
+            continue
+        warning_id = warning.get("id")
+        if not isinstance(warning_id, str) or not warning_id:
+            diagnostics.append(
+                diagnostic(
+                    "invalid_layout",
+                    fixture_id,
+                    f"layout warning {warning_index} id must be a non-empty string",
+                    f"{fixture_rel}/layout.json",
+                )
+            )
+            invalid = True
+        elif warning_id in warning_ids:
+            diagnostics.append(
+                diagnostic(
+                    "invalid_layout",
+                    fixture_id,
+                    f"layout warning {warning_index} id must be unique",
+                    f"{fixture_rel}/layout.json",
+                )
+            )
+            invalid = True
+        else:
+            warning_ids.add(warning_id)
+        for field in ("code", "message"):
+            if not isinstance(warning.get(field), str) or not warning[field]:
+                diagnostics.append(
+                    diagnostic(
+                        "invalid_layout",
+                        fixture_id,
+                        f"layout warning {warning_index} {field} must be a non-empty string",
+                        f"{fixture_rel}/layout.json",
+                    )
+                )
+                invalid = True
+        for field in ("page", "element_ref", "span_ref", "region_ref"):
+            value = warning.get(field)
+            if value is not None and not isinstance(value, str):
+                diagnostics.append(
+                    diagnostic(
+                        "invalid_layout",
+                        fixture_id,
+                        f"layout warning {warning_index} {field} must be a string when present",
+                        f"{fixture_rel}/layout.json",
+                    )
+                )
+                invalid = True
+        element_ref = warning.get("element_ref")
+        if isinstance(element_ref, str) and element_ref not in element_ids:
+            diagnostics.append(
+                diagnostic(
+                    "warning_ref_mismatch",
+                    fixture_id,
+                    "layout warning element_ref must reference a committed layout element",
+                    f"{fixture_rel}/layout.json",
+                    expected=sorted(element_ids),
+                    actual=element_ref,
+                )
+            )
+            mismatch = True
+
+    for element_index, element in enumerate(elements):
+        if not isinstance(element, dict):
+            continue
+        warning_refs = element.get("warning_refs", [])
+        if warning_refs:
+            checked = True
+        if not isinstance(warning_refs, list) or not all(
+            isinstance(item, str) for item in warning_refs
+        ):
+            diagnostics.append(
+                diagnostic(
+                    "invalid_layout",
+                    fixture_id,
+                    f"layout element {element_index} warning_refs must be a string array",
+                    f"{fixture_rel}/layout.json",
+                )
+            )
+            invalid = True
+            continue
+        for warning_ref in warning_refs:
+            if warning_ref not in warning_ids:
+                diagnostics.append(
+                    diagnostic(
+                        "warning_ref_mismatch",
+                        fixture_id,
+                        "layout element warning_refs must reference committed layout warnings",
+                        f"{fixture_rel}/layout.json",
+                        expected=sorted(warning_ids),
+                        actual=warning_ref,
+                    )
+                )
+                mismatch = True
+
+    if invalid:
+        return "invalid"
+    if mismatch:
+        return "mismatch"
+    return "pass" if checked else "not_applicable"
+
+
 def compare_confidence_policy(
     fixture_id: str,
     fixture_rel: str,
@@ -537,6 +687,130 @@ def compare_confidence_policy(
     return "pass" if checked else "not_applicable"
 
 
+def compare_export_goldens(
+    fixture_id: str,
+    fixture_dir: Path,
+    fixture_rel: str,
+    elements: List[Any],
+    diagnostics: List[Dict[str, Any]],
+) -> Dict[str, str]:
+    return {
+        "text": compare_export_file(
+            fixture_id,
+            fixture_dir / TEXT_EXPORT,
+            f"{fixture_rel}/{TEXT_EXPORT}",
+            render_text_export(elements),
+            "text export",
+            diagnostics,
+        ),
+        "markdown": compare_export_file(
+            fixture_id,
+            fixture_dir / MARKDOWN_EXPORT,
+            f"{fixture_rel}/{MARKDOWN_EXPORT}",
+            render_markdown_export(fixture_id, fixture_rel, elements, diagnostics),
+            "Markdown export",
+            diagnostics,
+        ),
+    }
+
+
+def compare_export_file(
+    fixture_id: str,
+    path: Path,
+    display_path: str,
+    expected: Optional[str],
+    label: str,
+    diagnostics: List[Dict[str, Any]],
+) -> str:
+    if expected is None:
+        return "invalid"
+    try:
+        actual = path.read_bytes()
+    except FileNotFoundError:
+        diagnostics.append(
+            diagnostic(
+                "missing_file",
+                fixture_id,
+                f"{path.name} is missing",
+                display_path,
+            )
+        )
+        return "missing"
+    try:
+        actual_text = actual.decode("utf-8")
+    except UnicodeDecodeError as exc:
+        diagnostics.append(
+            diagnostic(
+                "invalid_export",
+                fixture_id,
+                f"{path.name} must be UTF-8 text: {exc.reason}",
+                display_path,
+            )
+        )
+        return "invalid"
+    if actual_text != expected:
+        diagnostics.append(
+            diagnostic(
+                "export_golden_mismatch",
+                fixture_id,
+                f"{path.name} does not match {label} rendered from layout.json",
+                display_path,
+                expected=expected,
+                actual=actual_text,
+            )
+        )
+        return "mismatch"
+    return "pass"
+
+
+def render_text_export(elements: List[Any]) -> Optional[str]:
+    text_blocks = []
+    for element in elements:
+        if not isinstance(element, dict):
+            return None
+        text = element.get("text")
+        if not isinstance(text, str):
+            return None
+        text_blocks.append(text)
+    return "\n\n".join(text_blocks) + "\n"
+
+
+def render_markdown_export(
+    fixture_id: str,
+    fixture_rel: str,
+    elements: List[Any],
+    diagnostics: List[Dict[str, Any]],
+) -> Optional[str]:
+    blocks = []
+    invalid = False
+    for element_index, element in enumerate(elements):
+        if not isinstance(element, dict):
+            return None
+        text = element.get("text")
+        if not isinstance(text, str):
+            return None
+        if element.get("type") == "heading":
+            level = element.get("heading_level", 1)
+            if isinstance(level, bool) or not isinstance(level, int):
+                diagnostics.append(
+                    diagnostic(
+                        "invalid_layout",
+                        fixture_id,
+                        f"layout heading element {element_index} heading_level must be an integer",
+                        f"{fixture_rel}/layout.json",
+                    )
+                )
+                invalid = True
+                level = 1
+            level = min(max(level, 1), 6)
+            blocks.append(f"{'#' * level} {text}")
+        else:
+            blocks.append(text)
+    if invalid:
+        return None
+    return "\n\n".join(blocks) + "\n"
+
+
 def compare_subset_expectations(
     fixture_id: str,
     fixture_rel: str,
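
Reviewer note: the export-golden check added above renders the expected export from `layout.json` elements and compares it to the committed file. The sketch below is a simplified, standalone approximation of that flow for the Markdown case. `render_markdown` and `golden_status` are hypothetical names, and the sketch assumes every element is a dict with a string `text` field, which the real code validates rather than assumes.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def render_markdown(elements: List[Dict[str, Any]]) -> str:
    """Render headings as ATX headings and everything else as plain blocks."""
    blocks = []
    for element in elements:
        text = element["text"]
        if element.get("type") == "heading":
            # Clamp to the six heading levels Markdown supports.
            level = min(max(element.get("heading_level", 1), 1), 6)
            blocks.append(f"{'#' * level} {text}")
        else:
            blocks.append(text)
    return "\n\n".join(blocks) + "\n"


def golden_status(path: Path, expected: str) -> str:
    """Compare a committed golden file to the expected rendering."""
    try:
        raw = path.read_bytes()
    except FileNotFoundError:
        return "missing"
    try:
        actual = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"
    return "pass" if actual == expected else "mismatch"


if __name__ == "__main__":
    elements = [
        {"type": "heading", "heading_level": 2, "text": "Title"},
        {"type": "text_block", "text": "Body."},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "markdown.md"
        golden.write_bytes(b"## Title\n\nBody.\n")
        print(golden_status(golden, render_markdown(elements)))
```

Reading bytes and decoding explicitly, as the diff does, skips the newline translation that text-mode reads apply by default, so a golden file committed with CRLF line endings is reported as a mismatch instead of passing silently.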