cosai-oasis · davidlabianca · May 27, 2026 · May 28, 2026 · May 29, 2026 · May 29, 2026
diff --git a/docs/adr/017-yaml-prose-authoring-subset.md b/docs/adr/017-yaml-prose-authoring-subset.md
@@ -38,6 +38,8 @@ Authors may use exactly these forms in any prose field. Everything else is rejec
 
 Bold and italic may compose (`**emphatically *not* this**` is valid). Sentinels are atomic identifier tokens; they do not nest into bold or italic. An author who wants the rendered title to appear bold relies on the renderer's stylesheet, not on wrapping `**` around `{{<entity-id>}}`.
 
+The linter enforces the "no nested same-family emphasis" rule and the "no emphasis-wrapped sentinel" rule with the depth-counter bracket-matching pass specified in [ADR-028](028-prose-linter-bracket-matching-architecture.md) D5, which is the canonical mechanism for both.
+
 The subset operates on the string contents of each prose paragraph. Paragraph and hard-break shape is carried by the YAML *array* structure ([ADR-011](011-persona-site-data-schema-contract.md) `definitions/prose`); prose strings are not list-bearing.
 
 ### D2. Disallowed by construction
@@ -106,6 +108,8 @@ The two hooks have overlapping rejection sets (raw `<a>` blocked by both, bare c
 
 If ADR-016 lands its hook before this one, the bare-camelCase and raw-`<a>` checks live in `validate_prose_references.py` until ADR-017's hook ships and the shared tokenizer is extracted. The end state is the two-hook-shared-tokenizer split above.
 
+The shared tokenizer's Token contract — the `Token` NamedTuple structure (including the emphasis-shape field), the `TokenKind` enumeration, the tokenizer's emission invariants, and the consumer surface — is formally specified in [ADR-028](028-prose-linter-bracket-matching-architecture.md) D1-D4. ADR-017 owns the grammar's authoring rules; ADR-028 owns the contract every consumer of the shared tokenizer reads.
+
 ### D6. Redistribution contract surface
 
 Per [ADR-014](014-yaml-content-security-posture.md) P5, the framework guarantees "shape via schemas" to downstream consumers. ADR-017 is the canonical statement of "content within prose strings." After the conformance sweep closes, the contract becomes strictly: **YAML prose contains no URLs at all.** Every URL lives in a structured `externalReferences` entry (ADR-016) and is referenced from prose by sentinel. This is a stronger guarantee than "YAML prose contains some URLs you must sanitize" — a third-party redistributor parsing the YAML knows that any URL it ingests came through a typed, schema-validated structured field.

diff --git a/docs/adr/028-prose-linter-bracket-matching-architecture.md b/docs/adr/028-prose-linter-bracket-matching-architecture.md
diff --git a/docs/adr/README.md b/docs/adr/README.md
@@ -43,6 +43,7 @@ If a decision is about *how the Risk Map content model is shaped*, it belongs in
 | [025](025-testing-strategy.md) | Testing strategy and posture across Python, site JS, schemas, and infrastructure | Accepted | 2026-05-05 |
 | [026](026-issue-template-domain.md) | Issue-template domain — generator scope, schema-derived enums, and ADR-content alignment contract | Accepted | 2026-05-20 |
 | [027](027-framework-versioning-and-mapping-convention.md) | Per-mapping framework version pinning | Accepted | 2026-05-22 |
+| [028](028-prose-linter-bracket-matching-architecture.md) | Prose-linter emphasis enforcement via bracket-matching depth pass over a Token-shape contract | Accepted | 2026-05-28 |
 
 ## Conventions
 

diff --git a/pyproject.toml b/pyproject.toml
@@ -2,6 +2,7 @@
 pythonpath = ["scripts/hooks"]
 markers = [
     "slow: marks tests as slow (deselect with '-m \"not slow\"')",
+    "live_corpus: marks tests that read the live risk-map YAML corpus (deselect with '-m \"not live_corpus\"')",
 ]
 
 [tool.coverage.run]

diff --git a/scripts/hooks/precommit/_prose_tokens.py b/scripts/hooks/precommit/_prose_tokens.py
@@ -38,7 +38,7 @@
 
 import re
 from enum import Enum
-from typing import NamedTuple
+from typing import Literal, NamedTuple
 
 
 class TokenKind(Enum):
@@ -89,10 +89,15 @@ class Token(NamedTuple):
     Attributes:
         kind:  The token's classification (accepting or rejecting).
         value: The exact substring from the input that this token covers.
+        shape: Emphasis classification per ADR-028 D3. One of 'complete',
+               'open', 'close', 'neutral'; always 'neutral' for non-emphasis
+               tokens. Defaults to 'neutral' so two-positional construction
+               Token(kind, value) still works and yields shape='neutral'.
     """
 
     kind: TokenKind
     value: str
+    shape: Literal["complete", "open", "close", "neutral"] = "neutral"
 
 
 # ---------------------------------------------------------------------------
@@ -177,15 +182,67 @@ class Token(NamedTuple):
 _RE_ITALIC_ASTERISK = re.compile(r"\*(.+?)\*", re.DOTALL)
 
 # Italic underscore: _..._ — single underscore only; __ is NOT italic (ADR-017 D1).
-# Lookahead/lookbehind prevent matching when adjacent to another underscore.
-_RE_ITALIC_UNDERSCORE = re.compile(r"(?<![_])_(?![_])(.+?)(?<![_])_(?![_])")
+# ADR-028 D3 invariant 3: intraword \S_\S does NOT qualify as an italic delimiter.
+# Requirements (combined):
+#   - Opening _: preceded by whitespace or start-of-string (left-flank)
+#   - Opening _: NOT adjacent to another _ (no __)
+#   - Opening _: followed by non-whitespace (interior must start immediately)
+#   - Closing _: preceded by non-whitespace
+#   - Closing _: NOT adjacent to another _ (no __)
+#   - Closing _: followed by whitespace or end-of-string (right-flank)
+# The (?=\S) after the open and (?<=\S) before the close structurally guarantee
+# non-whitespace at both interior edges, so _classify_emphasis_shape(span, "_")
+# always returns "complete" — there is no open/close-shape underscore-italic token.
+_RE_ITALIC_UNDERSCORE = re.compile(r"(?:^|(?<=\s))(?<![_])_(?![_])(?=\S)(.+?)(?<=\S)(?<![_])_(?![_])(?=\s|$)")
 
 # Bare camelCase entity-prefix identifier: (risk|control|component|persona) immediately
 # followed by a capital letter, then the rest of the identifier word.
 # This fires only on plain prose; the sentinel branch consumes it first when inside {{}}.
 _RE_BARE_CAMELCASE = re.compile(r"(risk|control|component|persona)([A-Z]\w*)")
 
 
+# ---------------------------------------------------------------------------
+# Emphasis-shape classifier (ADR-028 D3)
+# ---------------------------------------------------------------------------
+
+
+def _classify_emphasis_shape(span: str, delim: str) -> Literal["complete", "open", "close", "neutral"]:
+    """Classify the emphasis shape of a matched span by examining interior edge whitespace.
+
+    The tokenizer calls this at emission time for BOLD, ITALIC-asterisk, and
+    ITALIC-underscore tokens.  The shape drives the depth-counter walk in the
+    prose-subset linter (ADR-028 D5).
+
+    Rules (ADR-028 D3 table):
+      - Both edges whitespace  -> 'open'  (convention: this case is tested first)
+      - Trailing whitespace only -> 'open'  (match closed early on an intended inner opener)
+      - Leading whitespace only  -> 'close' (trailing half of an early-closed match)
+      - Neither edge whitespace  -> 'complete' (well-formed span)
+      - Empty interior           -> 'neutral' (defensive; not emitted in practice)
+
+    Uses str.isspace() — no regex (ADR-028 D-Open-6).
+
+    Args:
+        span:  The full matched span including delimiters (e.g. '**foo **').
+        delim: The delimiter string ('**', '*', or '_').
+
+    Returns:
+        One of 'complete', 'open', 'close', 'neutral'.
+    """
+    interior = span[len(delim) : -len(delim)]
+    if not interior:
+        return "neutral"
+    leading_ws = interior[0].isspace()
+    trailing_ws = interior[-1].isspace()
+    if leading_ws and trailing_ws:
+        return "open"
+    if leading_ws:
+        return "close"
+    if trailing_ws:
+        return "open"
+    return "complete"
+
+
 # ---------------------------------------------------------------------------
 # Sentinel helper
 # ---------------------------------------------------------------------------
@@ -263,7 +320,7 @@ def tokenize(text: str) -> list[Token]:
     The `text` argument is expected to be a single prose field value as
     decoded by PyYAML — not raw YAML, not a file path.
 
-    Test fixtures for all 42 grammar cases live at:
+    Test fixtures live at:
         scripts/hooks/tests/fixtures/prose_subset/
 
     Args:
@@ -286,11 +343,11 @@ def flush_text(end: int) -> None:
             tokens.append(Token(TokenKind.TEXT, text[pending_text_start:end]))
         pending_text_start = -1
 
-    def emit(kind: TokenKind, value: str) -> None:
-        """Flush any pending TEXT, then emit the given token."""
+    def emit(
+        kind: TokenKind,
+        value: str,
+        *,
+        shape: Literal["complete", "open", "close", "neutral"] = "neutral",
+    ) -> None:
+        """Flush any pending TEXT, then emit the given token with the given shape."""
         nonlocal i
         flush_text(i)
-        tokens.append(Token(kind, value))
+        tokens.append(Token(kind, value, shape))
         i += len(value)
 
     def at_line_start() -> bool:
@@ -422,21 +479,21 @@ def at_line_start() -> bool:
         if ch == "*" and i + 1 < len(text) and text[i + 1] == "*":
             m = _RE_BOLD.match(text, i)
             if m:
-                emit(TokenKind.BOLD, m.group())
+                emit(TokenKind.BOLD, m.group(), shape=_classify_emphasis_shape(m.group(), "**"))
                 continue
 
         # --- Rule 13: Italic asterisk *...* ---
         if ch == "*":
             m = _RE_ITALIC_ASTERISK.match(text, i)
             if m:
-                emit(TokenKind.ITALIC, m.group())
+                emit(TokenKind.ITALIC, m.group(), shape=_classify_emphasis_shape(m.group(), "*"))
                 continue
 
         # --- Rule 14: Italic underscore _..._ (single underscore only) ---
         if ch == "_":
             m = _RE_ITALIC_UNDERSCORE.match(text, i)
             if m:
-                emit(TokenKind.ITALIC, m.group())
+                emit(TokenKind.ITALIC, m.group(), shape=_classify_emphasis_shape(m.group(), "_"))
                 continue
 
         # --- Rule 15: Bare camelCase entity-prefix identifier ---
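
Reviewer sketch (not part of the diff): the shape table in `_classify_emphasis_shape` is easiest to check against the new fixture inputs. The block below is a standalone copy of the classifier logic from the hunk above, renamed `classify_emphasis_shape` for illustration; the `Shape` alias is an assumption introduced here, not a name from the PR.

```python
from typing import Literal

# Illustrative alias; the PR spells the Literal out inline.
Shape = Literal["complete", "open", "close", "neutral"]


def classify_emphasis_shape(span: str, delim: str) -> Shape:
    """Classify a matched emphasis span by whitespace at its interior edges."""
    interior = span[len(delim) : -len(delim)]
    if not interior:
        return "neutral"  # defensive branch for an empty interior
    leading_ws = interior[0].isspace()
    trailing_ws = interior[-1].isspace()
    if leading_ws and trailing_ws:
        return "open"  # both-edges case is tested first
    if leading_ws:
        return "close"
    if trailing_ws:
        return "open"
    return "complete"


if __name__ == "__main__":
    # Inputs taken from the emphasis_shapes fixtures added in this PR.
    for span, delim in [
        ("**foo **", "**"),
        ("** bar**", "**"),
        ("** foo **", "**"),
        ("*foo *", "*"),
        ("**foo**", "**"),
    ]:
        print(f"{span!r:14} -> {classify_emphasis_shape(span, delim)}")
```

Under this copy, `**foo **` and `** foo **` both classify as `open`, `** bar**` as `close`, and `**foo**` as `complete`, which matches the docstring table.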

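Reviewer sketch (not part of the diff): the flanking requirements on the new `_RE_ITALIC_UNDERSCORE` pattern can be exercised in isolation. The pattern string below is copied from the hunk above; the helper name `italic_underscore_spans` is hypothetical, and it uses `finditer` rather than the tokenizer's positional `match(text, i)`, so it only approximates how the tokenizer reaches each candidate position.

```python
import re

# Pattern copied verbatim from the _prose_tokens.py hunk.
RE_ITALIC_UNDERSCORE = re.compile(
    r"(?:^|(?<=\s))(?<![_])_(?![_])(?=\S)(.+?)(?<=\S)(?<![_])_(?![_])(?=\s|$)"
)


def italic_underscore_spans(text: str) -> list[str]:
    """Return every substring the pattern accepts as underscore italic."""
    return [m.group() for m in RE_ITALIC_UNDERSCORE.finditer(text)]


if __name__ == "__main__":
    samples = [
        "_foo_",            # whole-string italic
        "use _this_ here",  # whitespace-flanked italic
        "snake_case_name",  # intraword underscores
        "__dunder__",       # double underscore
        "_ foo_",           # whitespace right after the opener
        "_foo _",           # whitespace right before the closer
    ]
    for sample in samples:
        print(f"{sample!r:20} -> {italic_underscore_spans(sample)}")
```

Only the first two samples produce a span. The intraword, double-underscore, and interior-whitespace cases yield no match, which is why the comment in the hunk can say an underscore-italic token is always `complete`.
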
diff --git a/scripts/hooks/precommit/validate_yaml_prose_subset.py b/scripts/hooks/precommit/validate_yaml_prose_subset.py
@@ -30,7 +30,20 @@
 
 from precommit._linter_types import Diagnostic, ProseField, format_diagnostic_line  # noqa: E402
 from precommit._prose_fields import find_prose_fields  # noqa: E402
-from precommit._prose_tokens import TokenKind  # noqa: E402
+from precommit._prose_tokens import (  # noqa: E402
+    _RE_SENTINEL_INTRA_INNER,
+    _RE_SENTINEL_REF_INNER,
+    TokenKind,
+)
+
+# Deliberate cross-module coupling: _RE_SENTINEL_INTRA_INNER and
+# _RE_SENTINEL_REF_INNER are internal to _prose_tokens (leading-underscore per
+# ADR-028 D4).  The wrapped-sentinel predicate (ADR-028 D5) reuses them directly
+# so the linter's notion of a "sentinel" cannot drift from the tokenizer's own
+# classification.  They are NOT promoted to public constants — ADR-028 D4 fixes
+# the public surface of _prose_tokens at exactly Token, TokenKind, and tokenize();
+# a consumer importing these _RE_* names accepts the reorganization-coupling risk
+# that D4 describes.
 
 # Re-export so callers can import ProseField and Diagnostic from this module
 # (the test suite imports both from here, not from _linter_types).
@@ -87,6 +100,69 @@
     "If you added a new INVALID_* kind to _REJECTED_KINDS, add its reason to _REASONS too."
 )
 
+# Reason strings for emphasis violations (ADR-028 D6). These are stable
+# constants; any change requires a D6 amendment.
+_REASON_NESTED_EMPHASIS = "nested emphasis"
+_REASON_EMPHASIS_WRAPPED_SENTINEL = "emphasis-wrapped sentinel"
+
+# The two emphasis token kinds; used in the depth-counter walk (ADR-028 D5).
+_EMPHASIS_KINDS: frozenset[TokenKind] = frozenset({TokenKind.BOLD, TokenKind.ITALIC})
+
+
+def _is_emphasis_wrapped_sentinel(token_value: str, delim: str) -> bool:
+    """Return True if the emphasis token wraps exactly one sentinel.
+
+    Removes the emphasis delimiter pair from token_value, trims surrounding
+    whitespace with .strip(), then checks whether the result is a `{{ }}` span
+    whose inner content fullmatches either the intra-doc or ref sentinel inner regex.
+
+    This mirrors how _match_sentinel classifies sentinels: outer {{ }} are
+    stripped first, then the inner content is matched against the patterns.
+
+    Args:
+        token_value: The full emphasis token value including delimiters.
+        delim:       The delimiter string ('**', '*', or '_').
+
+    Returns:
+        True if the stripped interior is a well-formed sentinel.
+    """
+    interior = token_value[len(delim) : -len(delim)].strip()
+    # Interior must be wrapped in {{ }} to be a sentinel form.
+    if not (interior.startswith("{{") and interior.endswith("}}")):
+        return False
+    inner = interior[2:-2]
+    return bool(_RE_SENTINEL_INTRA_INNER.fullmatch(inner) or _RE_SENTINEL_REF_INNER.fullmatch(inner))
+
+
+def _delim_for_token(token_value: str) -> str:
+    """Return the delimiter prefix for an emphasis token value.
+
+    Inspects the leading characters to distinguish '**' (BOLD) from '*' (ITALIC
+    asterisk) from '_' (ITALIC underscore). Called only on BOLD/ITALIC tokens,
+    whose values always start with one of those delimiters.
+
+    Args:
+        token_value: The full token value string (a BOLD or ITALIC token).
+
+    Returns:
+        The delimiter string: '**', '*', or '_'.
+
+    Raises:
+        ValueError: if token_value does not start with '**', '*', or '_'. The
+            helper fails loud rather than guessing a delimiter, so a future
+            emphasis kind that reaches it with an unhandled delimiter surfaces
+            immediately instead of silently mis-slicing the token interior.
+    """
+    if token_value.startswith("**"):
+        return "**"
+    if token_value.startswith("*"):
+        return "*"
+    if token_value.startswith("_"):
+        return "_"
+    raise ValueError(
+        f"_delim_for_token expects a BOLD/ITALIC token value starting with '**', '*', or '_'; got {token_value!r}"
+    )
+
 
 def check_prose_field(field: ProseField) -> list[Diagnostic]:
     """Check one ProseField against the ADR-017 D4 grammar rejection rules.
@@ -96,21 +172,18 @@ def check_prose_field(field: ProseField) -> list[Diagnostic]:
     — ADR-017 D4 rule 5 delegates bare-camelCase rejection to
     validate_prose_references.
 
+    Also runs the ADR-028 D5 depth-counter emphasis-rejection walk, emitting
+    diagnostics for nested emphasis and emphasis-wrapped sentinels.
+
     Args:
         field: A ProseField with tokens already populated by tokenize().
 
     Returns:
         List of Diagnostic objects (empty if the field is clean).
     """
     diagnostics: list[Diagnostic] = []
-    for token in field.tokens:
-        if token.kind not in _REJECTED_KINDS:
-            continue
-        base_reason = _REASONS[token.kind]
-        # ADR-017 D4: append the offending token value as a snippet for context.
-        # Only append when token.value is non-empty (tokenizer guarantees this,
-        # but guard defensively to avoid "at ''" in edge cases).
-        reason = f"{base_reason} at {token.value!r}" if token.value else base_reason
+
+    def _emit_diag(reason: str) -> None:
         diagnostics.append(
             Diagnostic(
                 hook_id=_HOOK_ID,
@@ -122,6 +195,53 @@ def check_prose_field(field: ProseField) -> list[Diagnostic]:
                 nested_index=field.nested_index,
             )
         )
+
+    # --- INVALID_* token rejection (ADR-017 D4) ---
+    for token in field.tokens:
+        if token.kind not in _REJECTED_KINDS:
+            continue
+        base_reason = _REASONS[token.kind]
+        # ADR-017 D4: append the offending token value as a snippet for context.
+        # Only append when token.value is non-empty (tokenizer guarantees this,
+        # but guard defensively to avoid "at ''" in edge cases).
+        reason = f"{base_reason} at {token.value!r}" if token.value else base_reason
+        _emit_diag(reason)
+
+    # --- ADR-028 D5 depth-counter emphasis walk ---
+    # Single pass over the token stream with a bare integer depth counter.
+    # Emphasis tokens with shape='open' increment depth; 'close' decrements.
+    # Any emphasis token arriving at depth > 0 is a nested-emphasis violation.
+    # The wrapped-sentinel predicate is independent of depth state.
+    depth = 0
+    for token in field.tokens:
+        if token.kind not in _EMPHASIS_KINDS:
+            continue
+
+        # Nested-emphasis predicate (ADR-028 D5).
+        if token.shape == "open":
+            if depth > 0:
+                _emit_diag(f"{_REASON_NESTED_EMPHASIS} at {token.value!r}")
+            depth += 1
+        elif token.shape == "close":
+            # Check before decrementing: the close token is the one arriving
+            # at depth > 0 in the canonical [open, text, close] stream
+            # (e.g. **foo **nested** bar**), so it is the attribution point for
+            # the single nested-emphasis diagnostic. ADR-028 D5 (as amended
+            # 2026-05-29) emits in the close branch when depth > 0, before the
+            # decrement.
+            if depth > 0:
+                _emit_diag(f"{_REASON_NESTED_EMPHASIS} at {token.value!r}")
+            depth = max(0, depth - 1)
+        elif token.shape == "complete":
+            if depth > 0:
+                _emit_diag(f"{_REASON_NESTED_EMPHASIS} at {token.value!r}")
+            # complete = open + close, net depth change 0
+
+        # Emphasis-wrapped-sentinel predicate (independent of depth state).
+        delim = _delim_for_token(token.value)
+        if _is_emphasis_wrapped_sentinel(token.value, delim):
+            _emit_diag(f"{_REASON_EMPHASIS_WRAPPED_SENTINEL} at {token.value!r}")
+
     return diagnostics
 
 
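
Reviewer sketch (not part of the diff): a condensed, standalone version of the depth-counter walk, run over the token stream from the `nested_bold_three_token` fixture. The `Token` and `TokenKind` definitions here are minimal stand-ins (the real enum has more members), the function name `nested_emphasis_reasons` is hypothetical, and the wrapped-sentinel predicate is deliberately left out.

```python
from enum import Enum
from typing import Literal, NamedTuple


class TokenKind(Enum):
    # Minimal stand-in; the real enumeration has more kinds.
    TEXT = "TEXT"
    BOLD = "BOLD"
    ITALIC = "ITALIC"


class Token(NamedTuple):
    kind: TokenKind
    value: str
    shape: Literal["complete", "open", "close", "neutral"] = "neutral"


EMPHASIS_KINDS = frozenset({TokenKind.BOLD, TokenKind.ITALIC})


def nested_emphasis_reasons(tokens: list[Token]) -> list[str]:
    """Single pass with an integer depth counter, as in check_prose_field."""
    reasons: list[str] = []
    depth = 0
    for token in tokens:
        if token.kind not in EMPHASIS_KINDS or token.shape == "neutral":
            continue
        # Any open/close/complete emphasis token arriving at depth > 0 is flagged.
        if depth > 0:
            reasons.append(f"nested emphasis at {token.value!r}")
        if token.shape == "open":
            depth += 1
        elif token.shape == "close":
            depth = max(0, depth - 1)  # clamp so a stray close cannot go negative
        # 'complete' opens and closes in one token: net depth change 0
    return reasons


if __name__ == "__main__":
    # Token stream for the input: **foo **nested** bar**
    stream = [
        Token(TokenKind.BOLD, "**foo **", "open"),
        Token(TokenKind.TEXT, "nested"),
        Token(TokenKind.BOLD, "** bar**", "close"),
    ]
    print(nested_emphasis_reasons(stream))
```

For this stream the walk reports exactly one reason, attributed to the `close` token, which is the behavior the comment in the close branch describes.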

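Reviewer sketch (not part of the diff): the wrapped-sentinel predicate, runnable on its own. The two inner patterns below are HYPOTHETICAL stand-ins; the real `_RE_SENTINEL_INTRA_INNER` and `_RE_SENTINEL_REF_INNER` live in `_prose_tokens.py` and are not shown in this diff, so their exact grammar is assumed here purely for illustration. The entity id `riskPromptInjection` is likewise an invented example.

```python
import re

# ASSUMPTION: simplified stand-ins for the tokenizer's private inner patterns.
RE_SENTINEL_INTRA_INNER = re.compile(r"(?:risk|control|component|persona)[A-Z]\w*")
RE_SENTINEL_REF_INNER = re.compile(r"ref:[\w-]+")


def is_emphasis_wrapped_sentinel(token_value: str, delim: str) -> bool:
    """Return True if the emphasis token wraps exactly one sentinel."""
    # Drop the delimiter pair, then trim whitespace around the interior.
    interior = token_value[len(delim) : -len(delim)].strip()
    if not (interior.startswith("{{") and interior.endswith("}}")):
        return False
    inner = interior[2:-2]
    return bool(
        RE_SENTINEL_INTRA_INNER.fullmatch(inner)
        or RE_SENTINEL_REF_INNER.fullmatch(inner)
    )


if __name__ == "__main__":
    cases = [
        ("**{{ref:x}}**", "**"),              # bold around a ref sentinel
        ("** {{ref:x}} **", "**"),            # same, with padding whitespace
        ("*{{riskPromptInjection}}*", "*"),   # italic around a made-up entity id
        ("**see {{ref:x}}**", "**"),          # sentinel plus other text
        ("**{{ref:x}} and {{ref:y}}**", "**"),  # two sentinels
    ]
    for value, delim in cases:
        print(f"{value!r:32} -> {is_emphasis_wrapped_sentinel(value, delim)}")
```

With these assumed patterns the first three cases are flagged and the last two are not, because `fullmatch` on the inner text rejects anything beyond a single sentinel body.
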
diff --git a/...ts/fixtures/prose_subset/accepting/emphasis_shapes/bold_both_edges_whitespace.tokens.json b/...ts/fixtures/prose_subset/accepting/emphasis_shapes/bold_both_edges_whitespace.tokens.json
@@ -0,0 +1 @@
+[{"kind": "BOLD", "value": "** foo **"}]
diff --git a/...ooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_both_edges_whitespace.txt b/...ooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_both_edges_whitespace.txt
@@ -0,0 +1 @@
+** foo **
diff --git a/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_close.tokens.json b/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_close.tokens.json
@@ -0,0 +1 @@
+[{"kind": "BOLD", "value": "** bar**"}]
diff --git a/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_close.txt b/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_close.txt
@@ -0,0 +1 @@
+** bar**
diff --git a/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_open.tokens.json b/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_open.tokens.json
@@ -0,0 +1 @@
+[{"kind": "BOLD", "value": "**foo **"}]
diff --git a/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_open.txt b/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_open.txt
@@ -0,0 +1 @@
+**foo **
diff --git a/...ts/fixtures/prose_subset/accepting/emphasis_shapes/bold_wraps_sentinel_nested.tokens.json b/...ts/fixtures/prose_subset/accepting/emphasis_shapes/bold_wraps_sentinel_nested.tokens.json
@@ -0,0 +1 @@
+[{"kind": "BOLD", "value": "**x **"}, {"kind": "TEXT", "value": "y"}, {"kind": "BOLD", "value": "**{{ref:x}}**"}]
diff --git a/...ooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_wraps_sentinel_nested.txt b/...ooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/bold_wraps_sentinel_nested.txt
@@ -0,0 +1 @@
+**x **y**{{ref:x}}**
diff --git a/...ks/tests/fixtures/prose_subset/accepting/emphasis_shapes/italic_asterisk_open.tokens.json b/...ks/tests/fixtures/prose_subset/accepting/emphasis_shapes/italic_asterisk_open.tokens.json
@@ -0,0 +1 @@
+[{"kind": "ITALIC", "value": "*foo *"}]
diff --git a/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/italic_asterisk_open.txt b/scripts/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/italic_asterisk_open.txt
@@ -0,0 +1 @@
+*foo *
diff --git a/...tests/fixtures/prose_subset/accepting/emphasis_shapes/nested_bold_three_token.tokens.json b/...tests/fixtures/prose_subset/accepting/emphasis_shapes/nested_bold_three_token.tokens.json
@@ -0,0 +1 @@
+[{"kind": "BOLD", "value": "**foo **"}, {"kind": "TEXT", "value": "nested"}, {"kind": "BOLD", "value": "** bar**"}]
diff --git a/...s/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/nested_bold_three_token.txt b/...s/hooks/tests/fixtures/prose_subset/accepting/emphasis_shapes/nested_bold_three_token.txt
@@ -0,0 +1 @@
+**foo **nested** bar**