feat(validate): ontology policy check (#69 Session 2) #97
stackbilt-admin merged 2 commits into main on Apr 9, 2026
added 2 commits
April 9, 2026 15:38
…ed-data-access (#69)

Session 2 of charter#69. Ships the enforcement half of the typed data access policy: a deterministic commit-time check that loads a data-registry YAML file and flags non-canonical alias usage in the current diff.

## What this lands

### Pure-logic core (@stackbilt/validate)

- **`parseOntologyRegistry(yamlText)`**: minimal YAML subset parser tailored to the stackbilt_llc/policies/data-registry.yaml shape. Handles: 2-level nested maps, inline flow-sequences, # comments, blank lines, bare string values, table: null for derived concepts. No external dependencies (keeps validate zero-dep beyond @stackbilt/types).
- **`checkOntologyDiff(changedLines, registry)`**: scans each line's identifiers against the registry's canonical + alias indexes. Returns two outputs: informational references (what concepts were touched) and violations (WARN on non-canonical alias usage in new code).
- **`extractIdentifiersFromLine`** + **`stripCommentsAndStrings`**: language-agnostic token extraction that strips JS/C `//`, YAML/shell `#`, SQL `--`, and all string literals before tokenizing. Prevents false positives on alias words appearing in comments or user-facing copy (e.g., "usage" in a TODO comment). Guards against stripping URLs (`http://`) and C-style preprocessor directives (`#include`).
- **`normalizeToken`**: lowercases, strips underscores/hyphens/spaces so `tenant_id`, `tenantId`, `TENANT-ID`, and `tenant id` all normalize to the same token.
- Six sensitivity tiers typed as `OntologySensitivityTier`.

### CLI surface (@stackbilt/cli)

- **`charter validate --policy typed-data-access`**: new policy dispatch in validate.ts. When `--policy typed-data-access` is provided, routes to runOntologyPolicyCheck in validate-ontology.ts instead of the default trailer validation. Unknown policies throw CLIError.
- **`validate-ontology.ts`**: CLI wrapper:
  - Loads registry via --registry flag, .charter/config.json ontology.registry, or default path (.charter/data-registry.yaml)
  - Extracts added lines from `git diff --unified=0 <range>`
  - Calls the validate package's checker
  - Formats output in text or JSON (per --format)
  - Exit codes: 0 on PASS, 0 on WARN without --ci, 1 on WARN+--ci or FAIL
- **`.charter/config.json` ontology section**: new optional config field: `{ "ontology": { "registry": "path/to/registry.yaml" } }`. Path is resolved relative to .charter/ directory if not absolute.

### Tests (41 new, 395 total)

Covers registry parsing (scalar fields, flow sequences, comments, null tables, malformed input), identifier extraction (comment stripping for JS/#/SQL, string literal stripping for ', ", \`), URL guard, alias detection, canonical vs alias reference counts, dedup within a line, ignoreAliasViolations suppression, and violation metadata propagation.

## Validation

### Local e2e against the real stackbilt_llc registry (21 concepts)

Added a handler.ts file with 4 alias usages (credits, tier), 3 canonical usages (quota, user, subscription). The check correctly reported:

- 4 non-canonical aliases (credits × 2, tier × 2)
- Correctly flagged each with file:line, canonical form, owner service, sensitivity tier
- 3 clean canonical references surfaced informationally
- Actionable suggestions: "credits → quota, tier → subscription"
- False-positive check: the first iteration extracted "usage" from // comments, the stripCommentsAndStrings pass eliminated that noise

### Test suite

- 41 new ontology.test.ts unit tests, all passing
- 395/395 total charter tests passing (no regressions, was 354)
- pnpm run build: clean
- pnpm run verify:adf: PASS on all metrics

## Not yet

Session 3+ will add:

- FAIL-severity detection for direct D1 access to other services' tables
- Unregistered-concept heuristic (flag identifiers that look like business terms but aren't in the registry)
- Charter doctor integration for standing registry health checks
- Auto-sync of the registry across consumer repos

## References

- Closes part of #69 (Session 2 of 4)
- Session 1 (policy module): charter#96
- Source registry: Stackbilt-dev/stackbilt_llc/policies/data-registry.yaml
- Related: codebeast#9 (DATA_AUTHORITY), aegis#344 (disambiguation firewall)
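The `normalizeToken` behavior described in this commit message can be illustrated with a minimal sketch. This is an assumption about how such a function might look, based only on the description (lowercase, then strip underscores, hyphens, and spaces); the actual implementation in @stackbilt/validate may differ.

```ts
// Hypothetical sketch of the normalization the commit message describes.
// Not the actual @stackbilt/validate source.
function normalizeToken(raw: string): string {
  // Lowercase first, then drop underscores, hyphens, and whitespace.
  return raw.toLowerCase().replace(/[_\-\s]+/g, "");
}

// The four spellings named in the commit message.
const variants = ["tenant_id", "tenantId", "TENANT-ID", "tenant id"];
const normalized = new Set(variants.map(normalizeToken));

console.log(Array.from(normalized)); // [ 'tenantid' ]
```

Collapsing spellings before lookup means the registry only needs one entry per alias, regardless of the naming convention used in the code under review.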
…check

Dogfooding Session 2 of charter#69 against charter's own source tree surfaced two classes of false positives:

## JSDoc interior lines

Multi-line block comments (/** ... */) weren't stripped because the per-line comment logic only handled //, #, --, and inline /* */. JSDoc continuation lines look like ` * some prose here` and were tokenized as if they were code.

Fix: stripCommentsAndStrings now treats lines whose first non-whitespace token is `*` or `/**` as full-line comments. If `*/` appears later on the line, content after it is preserved. Distinct from multiplication because the `*` must be the first non-whitespace character, not an operator between operands.

4 new test cases cover JSDoc interior, JSDoc opener, multiplication preservation (`2 * tier`), and closer-on-same-line edge case.

## Generic alias collisions in programming vocabulary

Several registry aliases (token, key, usage, tier, audit) collide with common programming vocabulary. Every charter file that mentions `token` (lexer token, API token) or `key` (map key, lookup key) was getting flagged against the api_key/quota concepts, producing dozens of noise-warnings per file.

Fix: new `ontology.ignoreAliases` config field in .charter/config.json. Accepts an array of normalized alias tokens that the check should silently skip (still reports canonical references). Implemented as a per-repo override: the registry stays authoritative ecosystem-wide, but individual repos can opt out of noisy terms for their codebase.

Wired into checkOntologyDiff via `options.ignoredAliasTokens: Set<string>`. Refactored runOntologyPolicyCheck to load config once and pass it to resolveRegistryPath (was loading twice).

Charter's own .charter/config.json now ignores: token, tokens, key, keys, usage, audit, tier, plan, limit, limits. All are programming-vocabulary collisions with the current registry. The ignore list should stay small and case-specific; prefer narrowing the registry's own aliases upstream when a term is globally noisy.

## Validation

- Charter main..HEAD dogfood: PASS, 916 added lines across 5 files, 0 violations, 0 false positives
- 45/45 ontology tests passing (+4 from JSDoc cases)
- 399/399 total charter tests passing (+4 from 395)
- Build clean, typecheck clean

## Follow-up signals

The dogfood surfaced a real finding worth reporting upstream: the stackbilt_llc data-registry.yaml aliases list is overclaimed. Words like `token`, `key`, `usage`, `tier`, `audit`, `plan`, `limit` are too generic to be reliable aliases; they match ordinary programming vocabulary. A registry cleanup PR in stackbilt_llc would remove these from the aliases lists, making the per-repo ignore list unnecessary for most downstream consumers. Filing as follow-up.

Part of #69 (Session 2 tuning).
Summary
Session 2 of #69 ships the enforcement half of the typed data access policy:
```
charter validate --policy typed-data-access
```
A deterministic commit-time check that loads a data-registry YAML file, scans the current diff for business-concept references, and flags non-canonical alias usage in new code. Built on the Session 1 policy module (#96) and the canonical registry in stackbilt_llc.
What it does
Given this diff:
```ts
export async function handler(tenantId: string) {
  const credits = await checkCredits(tenantId);
  const quota = await getQuota(tenantId);
  const tier = subscription.tier;
}
```
Charter validate reports:
```
[warn] Ontology policy: WARN
4 non-canonical aliases found in 1 changed file.
Registry: .charter/data-registry.yaml (default) | 21 concepts
Referenced concepts:
Violations (4):
Uses alias 'credits' for concept 'quota' (edge-auth, cross_service_rpc).
Prefer the canonical form in new code.
Uses alias 'tier' for concept 'subscription' (edge-auth, billing_critical).
Suggestions:
```
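A WARN result like the one above does not necessarily fail the command. The first commit message states the exit codes as 0 on PASS, 0 on WARN without `--ci`, and 1 on WARN with `--ci` or on FAIL. A minimal sketch of that mapping follows; the type and function names are hypothetical and are not taken from validate-ontology.ts.

```ts
// Hypothetical names; only the exit-code rules come from the commit message.
type OntologyStatus = "PASS" | "WARN" | "FAIL";

function exitCodeFor(status: OntologyStatus, ci: boolean): number {
  if (status === "FAIL") return 1;          // FAIL always exits non-zero
  if (status === "WARN") return ci ? 1 : 0; // WARN only blocks under --ci
  return 0;                                 // PASS
}

console.log(exitCodeFor("WARN", false)); // 0
console.log(exitCodeFor("WARN", true));  // 1
```

Under these rules, local runs stay advisory while CI can treat alias warnings as blocking.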
Architecture
Pure-logic core (@stackbilt/validate)
CLI surface (@stackbilt/cli)
Config (`.charter/config.json`)
```jsonc
{
  "ontology": {
    "registry": ".charter/data-registry.yaml",
    "ignoreAliases": ["token", "key", "usage", "audit"]
  }
}
```
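One way the CLI wrapper might turn this config into checker inputs is sketched below. It assumes the precedence given in the commit message (`--registry` flag, then `ontology.registry`, then the default path) and the `Set<string>` shape of `ignoredAliasTokens`. The function and interface names are hypothetical, and the path resolution relative to `.charter/` is deliberately left out.

```ts
// Hypothetical helper; interface and function names are illustrative only.
interface OntologyConfig {
  registry?: string;
  ignoreAliases?: string[];
}

interface CharterConfig {
  ontology?: OntologyConfig;
}

const DEFAULT_REGISTRY = ".charter/data-registry.yaml";

function readOntologySettings(
  configText: string,
  registryFlag?: string,
): { registry: string; ignoredAliasTokens: Set<string> } {
  const config = JSON.parse(configText) as CharterConfig;
  const ontology = config.ontology ?? {};
  return {
    // Precedence: CLI flag, then config field, then default path.
    registry: registryFlag ?? ontology.registry ?? DEFAULT_REGISTRY,
    // Both fields are optional, so a missing list becomes an empty set.
    ignoredAliasTokens: new Set(ontology.ignoreAliases ?? []),
  };
}

const settings = readOntologySettings(
  '{ "ontology": { "ignoreAliases": ["token", "key", "usage", "audit"] } }',
);
console.log(settings.registry); // .charter/data-registry.yaml
console.log(settings.ignoredAliasTokens.has("key")); // true
```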
Two iterations after initial ship
Dogfooding the first iteration against charter's own source tree surfaced two classes of false positives:
1. JSDoc continuation lines
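Multi-line block comments (`/** ... */`) weren't stripped because each line was processed independently. Lines like ` * tiers and sensitivity levels` in JSDoc prose were tokenized as code. Fixed by treating lines whose first non-whitespace token is `*` or `/**` as full-line comments. The multiplication guard is positional: the `*` must be the first non-whitespace character, so an expression like `2 * tier` is still tokenized as code.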
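The rule can be sketched as a small per-line filter. This is an illustration based on the description in the second commit message (including the detail that content after a `*/` on the same line is preserved), not the actual `stripCommentsAndStrings` code; the function name is hypothetical.

```ts
// Hypothetical sketch of the JSDoc full-line rule; not the real implementation.
function stripBlockCommentLine(line: string): string {
  // Only lines whose first non-whitespace token is `*` or `/**` qualify.
  // `2 * tier` does not match because `2` comes first.
  if (!/^\s*(\*|\/\*\*)/.test(line)) return line;

  // If the comment closes on this line, keep whatever follows the closer.
  const close = line.indexOf("*/");
  return close === -1 ? "" : line.slice(close + 2);
}

console.log(JSON.stringify(stripBlockCommentLine(" * tiers and sensitivity levels"))); // ""
console.log(JSON.stringify(stripBlockCommentLine("const x = 2 * tier;")));             // "const x = 2 * tier;"
console.log(JSON.stringify(stripBlockCommentLine("/** doc */ const tier = 1;")));      // " const tier = 1;"
```

Because the check is purely positional, it needs no state carried between lines, which suits a checker that only sees added lines from a diff.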
2. Generic alias collisions
Aliases like `token`, `key`, `usage`, `tier`, `audit` collide with common programming vocabulary. Charter's own codebase uses these words in their programming sense (lexer tokens, map keys, resource usage). Every file containing the word was getting flagged.
Fix: added `ontology.ignoreAliases` per-repo config field. Registry stays authoritative ecosystem-wide, but individual repos can opt out of specific noisy alias tokens without touching the shared source of truth.
Charter's own `.charter/config.json` ignores: `token, tokens, key, keys, usage, audit, tier, plan, limit, limits`.
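To show how an ignore set interacts with alias detection, here is a simplified sketch. It assumes a registry reduced to canonical names and alias lists, and it skips the comment and string stripping that the real checker performs first. All names here are hypothetical and the shape of the actual `checkOntologyDiff` result is not reproduced.

```ts
// Simplified, hypothetical model of alias checking with a per-repo ignore set.
interface Concept {
  canonical: string;
  aliases: string[];
}

interface Violation {
  line: number;
  alias: string;
  canonical: string;
}

function normalize(raw: string): string {
  return raw.toLowerCase().replace(/[_\-\s]+/g, "");
}

function findAliasViolations(
  lines: string[],
  registry: Concept[],
  ignoredAliasTokens: Set<string>,
): Violation[] {
  // Index every alias by its normalized form.
  const aliasIndex = new Map<string, string>();
  for (const concept of registry) {
    for (const alias of concept.aliases) {
      aliasIndex.set(normalize(alias), concept.canonical);
    }
  }

  const violations: Violation[] = [];
  lines.forEach((text, i) => {
    const seen = new Set<string>(); // dedup within a line
    for (const ident of text.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? []) {
      const token = normalize(ident);
      if (seen.has(token) || ignoredAliasTokens.has(token)) continue;
      seen.add(token);
      const canonical = aliasIndex.get(token);
      if (canonical !== undefined) {
        violations.push({ line: i + 1, alias: token, canonical });
      }
    }
  });
  return violations;
}

// Alias mappings taken from the PR's own example output.
const registry: Concept[] = [
  { canonical: "quota", aliases: ["credits"] },
  { canonical: "subscription", aliases: ["tier"] },
];
const added = [
  "const credits = await checkCredits(tenantId);",
  "const tier = subscription.tier;",
];

console.log(findAliasViolations(added, registry, new Set<string>()).length); // 2
console.log(findAliasViolations(added, registry, new Set(["tier"])).length); // 1
```

The ignore set only suppresses alias matches; the registry itself is not modified, which matches the intent that it stays authoritative across repos.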
Validation
Unit tests (45 new, 399 total)
End-to-end dogfood
Full suite
What's NOT in this PR (deferred to Session 3+)
References
Governed-By: #69
🤖 Generated with Claude Code