Catch semantic breaking changes in dbt metrics before they land in production.
dbt-semguard is a CLI-first semantic change detector for dbt Semantic Layer definitions. It compares two versions of the semantic contract, classifies changes as breaking, risky, or safe, and renders local or GitHub-friendly output without requiring warehouse access or dbt runtime internals.
dbt-semguard is a semantic PR guard for dbt metrics and semantic models.
It answers one question:
What changed in the meaning of this metric?
That matters because many dbt changes are valid from a parser or build point of view, but still dangerous for downstream consumers.
For example, a PR may:
- change `gross_revenue` from `sum(order_total)` to `avg(order_total)`
- remove a dimension people use to slice a KPI
- change a ratio metric denominator
- widen or narrow a metric filter
- change entity or time-grain semantics
In all of those cases, dbt may still parse successfully and CI may still be green. But the business meaning of the metric has changed, and dashboards, notebooks, reverse ETL jobs, or APIs may silently start returning different answers.
dbt-semguard exists to catch that class of change before it reaches production.
dbt-semguard does not lint YAML style and it does not validate warehouse execution.
Instead, it:
- reads the dbt Semantic Layer definition from two inputs
- extracts only the semantic parts that affect meaning
- builds a canonical contract for each side
- diffs those contracts
- classifies each change as `breaking`, `risky`, or `safe`
- renders the result for local CLI use or GitHub Actions
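The diff-and-classify steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the `Change` class, the `diff_metrics` function, the flat `{metric_name: {field: value}}` contract shape, and the severity assigned to additions and removals are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    metric: str
    severity: str  # "breaking", "risky", or "safe"
    message: str


def diff_metrics(base: dict[str, dict], head: dict[str, dict]) -> list[Change]:
    """Compare two {metric_name: {field: value}} contracts (hypothetical shape)."""
    changes: list[Change] = []
    for name in sorted(base.keys() | head.keys()):
        old, new = base.get(name), head.get(name)
        if old is None:
            # Assumed policy: additions deserve review but do not block.
            changes.append(Change(name, "risky", f"Metric `{name}` was added."))
        elif new is None:
            # Assumed policy: removals break downstream consumers.
            changes.append(Change(name, "breaking", f"Metric `{name}` was removed."))
        else:
            for field in sorted(old.keys() | new.keys()):
                if old.get(field) != new.get(field):
                    message = (
                        f"Metric `{name}` changed {field} "
                        f"from `{old.get(field)}` to `{new.get(field)}`."
                    )
                    # Assumed policy: any field change on an existing metric blocks.
                    changes.append(Change(name, "breaking", message))
    return changes


base = {"gross_revenue": {"aggregation": "sum", "expression": "order_total"}}
head = {"gross_revenue": {"aggregation": "avg", "expression": "order_total"}}
for change in diff_metrics(base, head):
    print(f"[{change.severity}] {change.message}")
# [breaking] Metric `gross_revenue` changed aggregation from `sum` to `avg`.
```

The real severity rules are finer-grained than this single "any change is breaking" policy; the sketch only shows why a contract diff can flag a change that a parser accepts.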
In practical terms, it helps teams review semantic changes the same way they already review code changes.
The tool reduces dbt semantic definitions into a normalized contract that is easier to compare than raw YAML.
It keeps fields that affect meaning, such as:
- semantic model identity
- backing model name
- entities and entity types
- dimensions and time granularity
- measures and measure expressions
- metric type
- aggregation and expression
- filters
- ratio numerator and denominator
It intentionally ignores noise such as:
- descriptions
- docs blocks
- YAML ordering
- whitespace and comments
That means the output is focused on semantic drift, not formatting drift.
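One way to picture this normalization is a recursive pass over the parsed YAML. The sketch below is an assumption-laden illustration: `NOISE_KEYS`, `canonicalize`, and `fingerprint` are hypothetical names, and treating every list as order-insensitive is a simplification that the real contract may not make.

```python
import json

# Hypothetical noise keys, taken from the "ignores" list above.
NOISE_KEYS = {"description", "docs"}


def canonicalize(node):
    """Drop noise keys and make ordering irrelevant (illustrative only)."""
    if isinstance(node, dict):
        return {
            key: canonicalize(value)
            for key, value in sorted(node.items())
            if key not in NOISE_KEYS
        }
    if isinstance(node, list):
        items = [canonicalize(item) for item in node]
        # Sort by a stable JSON rendering so mixed item types stay comparable.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    # Scalars pass through; YAML comments and layout are already gone after parsing.
    return node


def fingerprint(node) -> str:
    """Stable string form of the canonical contract, suitable for comparison."""
    return json.dumps(canonicalize(node), sort_keys=True)


before = {
    "name": "gross_revenue",
    "description": "Total revenue",
    "agg": "sum",
    "dimensions": ["region", "channel"],
}
after = {
    "dimensions": ["channel", "region"],
    "agg": "sum",
    "name": "gross_revenue",
    "description": "Reworded docs only",
}
print(fingerprint(before) == fingerprint(after))  # True: only noise changed
```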
python -m pip install dbt-semguard
dbt-semguard requires Python 3.11 or newer.
python -m pip install "git+https://github.com/yeaight7/dbt-semguard.git@v0.5.4"
Use the GitHub install path when you need to pin directly to a repository tag.
git clone https://github.com/yeaight7/dbt-semguard.git
cd dbt-semguard
python -m pip install .
Use this when you want to sanity-check semantic changes while you are still developing:
semguard diff --base-ref main --head-ref HEAD --project-dir .
semguard check --base-ref main --head-ref HEAD --project-dir . --fail-on breaking
Typical use:
- `diff` when you want to inspect what changed
- `check` when you want a blocking exit code for automation or local scripts
For monorepos, always point --project-dir at the dbt project root you want to analyze:
semguard diff --base-ref main --head-ref HEAD --project-dir analytics/dbt
Git ref mode and local YAML mode now both scope discovery to this directory.
Use this when you want to compare two precomputed semantic contracts:
semguard diff --base-contract base-contract.json --head-contract head-contract.json --format markdown
Use this when your workflow already has dbt `semantic_manifest.json` artifacts available:
semguard diff --base-manifest base-semantic-manifest.json --head-manifest head-semantic-manifest.json --format json
Use this when you want a stable machine-readable snapshot of semantic meaning:
semguard extract --source yaml --project-dir examples/ecommerce_dbt_project --output base-contract.json
semguard extract --source manifest --manifest semantic_manifest.json --output manifest-contract.json
Create `.semguard.yml` in your dbt project root to control which YAML files are scanned:
include:
- models/**/*.yml
- models/**/*.yaml
- metrics/**/*.yml
- metrics/**/*.yaml
- semantic_models/**/*.yml
- semantic_models/**/*.yaml
exclude:
- target/**
- dbt_packages/**
- .venv/**
- .github/**
If the file is not present, these defaults are applied automatically.
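To make the include/exclude behavior concrete, here is a small Python sketch of how such patterns could be applied to project-relative paths. It assumes `**` matches zero or more directories and `*` stays within one path segment; the README does not spell out the exact matching rules, and `glob_to_regex` and `select_files` are hypothetical helpers, not part of dbt-semguard.

```python
import re

DEFAULT_INCLUDE = [
    "models/**/*.yml", "models/**/*.yaml",
    "metrics/**/*.yml", "metrics/**/*.yaml",
    "semantic_models/**/*.yml", "semantic_models/**/*.yaml",
]
DEFAULT_EXCLUDE = ["target/**", "dbt_packages/**", ".venv/**", ".github/**"]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex under the assumptions stated above."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # everything below this point
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def select_files(paths, include=DEFAULT_INCLUDE, exclude=DEFAULT_EXCLUDE):
    """Keep paths that match an include pattern and no exclude pattern."""
    inc = [glob_to_regex(p) for p in include]
    exc = [glob_to_regex(p) for p in exclude]
    return [
        path for path in paths
        if any(r.fullmatch(path) for r in inc)
        and not any(r.fullmatch(path) for r in exc)
    ]


paths = [
    "models/marts/orders.yml",
    "models/orders.yaml",
    "dbt_packages/pkg/models/x.yml",
    "README.md",
]
print(select_files(paths))
# ['models/marts/orders.yml', 'models/orders.yaml']
```

The exclude list matters most when a broad include such as `**/*.yml` is configured, since that would otherwise pick up installed packages and build output.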
- A developer changes a metric or semantic model in dbt.
- `dbt-semguard diff` compares the base branch and the current branch.
- The tool reports semantic changes only.
- The team decides whether the change is acceptable, needs migration planning, or should be blocked.
- In CI, `semguard check --fail-on breaking` can fail the PR automatically.
- `breaking`: the semantic meaning changed in a way that should usually block by default
- `risky`: the change may be legitimate, but downstream consumers should review it
- `safe`: cosmetic-only changes that do not appear in the semantic diff
diff and check emit one of:
- `text`
- `markdown`
- `json`
JSON reports contain:
- `summary`
- `highest_severity`
- `blocking`
- `changes`
- `metadata`
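A downstream script can read these fields directly. The Python sketch below parses a report and builds a one-line status; `summarize_report` is a hypothetical helper, and only the `summary`, `highest_severity`, and `blocking` keys are used, since the layout of `changes` and `metadata` is not described here.

```python
import json


def summarize_report(report_text: str) -> str:
    """Turn a semguard JSON report into a one-line status string."""
    report = json.loads(report_text)
    counts = report["summary"]
    status = "blocking" if report["blocking"] else "passing"
    return (
        f"{status}: {counts['breaking']} breaking, "
        f"{counts['risky']} risky, {counts['safe']} safe "
        f"(highest: {report['highest_severity']})"
    )


sample = """
{
  "summary": {"breaking": 3, "risky": 1, "safe": 0},
  "highest_severity": "breaking",
  "blocking": true
}
"""
print(summarize_report(sample))
# blocking: 3 breaking, 1 risky, 0 safe (highest: breaking)
```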
## dbt-semguard report
### Breaking changes
#### Metric `gross_revenue`
- Metric `gross_revenue` changed aggregation from `sum` to `avg`.
Status: blocking
{
"summary": {
"breaking": 3,
"risky": 1,
"safe": 0
},
"highest_severity": "breaking",
"blocking": true
}
dbt-semguard currently covers the highest-value semantic changes in the latest dbt Semantic Layer spec.
Covered extractors and inputs:
- Latest-spec YAML projects
- Legacy top-level `semantic_models`/`metrics` YAML projects
- Explicit dbt `semantic_manifest.json` input
- Canonical contract JSON emitted by `semguard extract`
Covered semantic comparisons:
- Semantic model add/remove and backing model changes
- Semantic model default aggregation time dimension changes
- Entity add/remove, type changes, and expression changes
- Dimension add/remove, type changes, expression changes, and time granularity changes
- Measure add/remove, aggregation, expression, aggregation-time, and non-additive changes
- Simple metric aggregation, expression, label, filter, ownership, aggregation-time, and non-additive changes
- Ratio metric numerator and denominator changes
- Derived metric expression and input metric changes
- Cumulative metric input, window, grain-to-date, and period-aggregation changes
- Conversion metric entity, calculation, base metric, conversion metric, and constant-property changes
- Additive changes such as new entities, new dimensions, new measures, and new metrics
Current automated coverage:
- YAML extraction for the latest spec
- Manifest normalization
- Semantic diff severity mapping for breaking and risky changes
- Declarative field-coverage policy so contract fields are explicitly diffed, nested, or intentionally excluded
- Source diagnostics in extracted YAML contracts and change reports
- CLI `extract`, `diff`, and `check`
- Sticky PR comment delivery through the GitHub Action
- Checkout-free git ref mode
- Pre-release local action smoke coverage in CI, plus post-release published action smoke coverage in both git-ref and manifest modes, including spaced manifest paths
Known v0.5.4 limitations are intentionally narrow:
- There is no allowlist for intentional semantic changes yet.
- Manifest parsing expects dbt `semantic_manifest.json`, not the general-purpose dbt `manifest.json` artifact.
- Legacy YAML support covers top-level `semantic_models`, `measures`, and `type_params`, but cross-project ref semantics are still normalized conservatively into the single `model_name` contract field.
- Rename handling is intentionally conservative: a rename is treated as a removal plus an addition.
- Source diagnostics are best-effort and currently strongest for YAML extraction; manifest-derived contracts may still lack file/line detail.
- GitHub integration supports sticky PR comments and inline annotations for pull_request workflows, but does not yet manage review-thread lifecycles.
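The rename limitation follows naturally from comparing definitions by name. As a rough illustration (the `name_level_changes` helper is hypothetical and much simpler than the real diff), a name-keyed comparison has no way to link an old name to a new one:

```python
def name_level_changes(base_names: set[str], head_names: set[str]) -> list[str]:
    """List names that disappeared and names that appeared between two contracts."""
    removed = [f"removed: {name}" for name in sorted(base_names - head_names)]
    added = [f"added: {name}" for name in sorted(head_names - base_names)]
    return removed + added


# Renaming gross_revenue to gross_sales shows up as two unrelated changes.
print(name_level_changes({"gross_revenue"}, {"gross_sales"}))
# ['removed: gross_revenue', 'added: gross_sales']
```

Because there is no allowlist yet, an intentional rename reported this way has to be accepted through review rather than suppressed.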
Use the included composite action from this repository:
jobs:
  semguard:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
      pull-requests: read
      checks: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: yeaight7/dbt-semguard@v0.5.4
        id: semguard
        with:
          base-ref: ${{ github.event.pull_request.base.sha }}
          head-ref: ${{ github.sha }}
          fail-on: breaking
          pr-comment: true
          pr-comment-mode: sticky
          github-token: ${{ github.token }}
      - name: Inspect semguard outputs
        run: |
          echo "Highest severity: ${{ steps.semguard.outputs.highest-severity }}"
          echo "Blocking: ${{ steps.semguard.outputs.blocking }}"
The action now exposes structured outputs so downstream CI can branch on semantic severity without reparsing JSON:
- `steps.semguard.outputs.highest-severity`
- `steps.semguard.outputs.blocking`
- `steps.semguard.outputs.breaking-count`
- `steps.semguard.outputs.risky-count`
- `steps.semguard.outputs.safe-count`
`pr-comment-mode` accepts:
- `sticky`: update the previous dbt-semguard PR comment when one already exists
- `create`: always publish a new PR comment instead of updating the previous one
The action writes:
- a Markdown summary to the workflow summary
- a JSON artifact named `semguard-report`
- structured step outputs for severity and counts
- an optional sticky PR comment when `pr-comment: true`
- inline check-run annotations when source diagnostics are available
- a failing status when the configured threshold is reached
The action requires Python 3.11 or newer. GitHub API calls for PR comments and annotations use a 30-second timeout so stalled API responses do not hold CI indefinitely.
When there are zero semantic changes, the Markdown artifact and workflow summary explicitly include `No semantic changes detected.` followed by `Status: passing`.
This is the recommended setup when you want the semantic review to happen automatically on every PR.
If you enable `pr-comment: true`, the workflow needs:
- `contents: read`
- `issues: write`
- `pull-requests: read`
- `checks: write`
Missing `checks: write` can prevent inline annotations and check runs from appearing even when the semantic diff succeeds.
For forked pull requests, the standard `pull_request` event usually does not get a write-capable `GITHUB_TOKEN`, so sticky PR comments and check-run annotations may be unavailable unless you adopt a separate trusted workflow pattern.
Common CI and configuration issues are covered in docs/troubleshooting.md.
- Severity handling now uses an internal enum while preserving the same JSON strings (`breaking`, `risky`, `safe`).
- SQL filter diffs preserve case and quote semantics while still ignoring insignificant operator spacing.
- GitHub workflow examples now scope write access to PR comments and check annotations only.
- Extractor internals are split into YAML, manifest, and normalization modules behind the same public facade.
- Native measure diffing, sub-day granularity severity, 30-second GitHub API timeouts, and git ref validation are included in the release surface.
- Git ref extraction now scopes strictly to `--project-dir` for monorepos.
- YAML discovery now uses safe default include/exclude patterns.
- Optional `.semguard.yml` include/exclude rules are applied in both local and git-ref YAML extraction.
- Invalid semantic YAML now raises user-facing errors with source context instead of raw `KeyError` tracebacks.
- Composite action shell steps now read user-controlled values from environment variables instead of embedding GitHub expressions directly in Bash.
- Composite action now generates JSON, Markdown, summary text, and step outputs in a single pass before enforcing the blocking threshold.
- Composite action report files now live in an isolated runner temp directory derived from `artifact-name`, which avoids workspace filename collisions in matrix-style CI jobs.
- The repository now documents security reporting, contribution setup, and common action troubleshooting paths.
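The SQL filter item in the list above is easiest to understand with an example. The sketch below is not the project's normalizer; `normalize_filter` and both regular expressions are hypothetical, and only a handful of comparison operators are handled. It shows the general idea: quoted segments are kept verbatim, and spacing around operators is normalized everywhere else.

```python
import re

# Single-quoted literals and double-quoted identifiers, with doubled-quote escapes.
_QUOTED = re.compile(r"""('(?:[^']|'')*'|"(?:[^"]|"")*")""")
# A small, illustrative set of comparison operators.
_OPERATORS = re.compile(r"\s*(<>|!=|<=|>=|=|<|>)\s*")


def normalize_filter(sql: str) -> str:
    """Normalize operator spacing outside quotes; keep case and quoted text as-is."""
    pieces = _QUOTED.split(sql.strip())
    out: list[str] = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            out.append(piece)  # quoted segment captured by the split: untouched
        else:
            piece = _OPERATORS.sub(r" \1 ", piece)
            out.append(re.sub(r"\s+", " ", piece))
    return "".join(out).strip()


print(normalize_filter("status='Paid'") == normalize_filter("status  =  'Paid'"))  # True
print(normalize_filter("status = 'Paid'") == normalize_filter("status = 'paid'"))  # False
```

Comparing the normalized strings treats `status='Paid'` and `status = 'Paid'` as the same filter, while a change in literal case still registers as a difference.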
An example latest-spec dbt project lives in examples/ecommerce_dbt_project.
- Contract spec
- How to use and explain dbt-semguard
- Severity rules
- Troubleshooting
- Roadmap
- Changelog
- Contributing
- Security policy
This project is open source under the MIT License. See LICENSE.