dbt-semguard

Catch semantic breaking changes in dbt metrics before they land in production.

dbt-semguard is a CLI-first semantic change detector for dbt Semantic Layer definitions. It compares two versions of the semantic contract, classifies changes as breaking, risky, or safe, and renders local or GitHub-friendly output without requiring warehouse access or dbt runtime internals.

What Is This For?

dbt-semguard is a semantic PR guard for dbt metrics and semantic models.

It answers one question:

What changed in the meaning of this metric?

That matters because many dbt changes are valid from a parser or build point of view, but still dangerous for downstream consumers.

For example, a PR may:

change gross_revenue from sum(order_total) to avg(order_total)
remove a dimension people use to slice a KPI
change a ratio metric denominator
widen or narrow a metric filter
change entity or time-grain semantics

In all of those cases, dbt may still parse successfully and CI may still be green. But the business meaning of the metric has changed, and dashboards, notebooks, reverse ETL jobs, or APIs may silently start returning different answers.

dbt-semguard exists to catch that class of change before it reaches production.

What It Does Exactly

dbt-semguard does not lint YAML style and it does not validate warehouse execution.

Instead, it:

reads the dbt Semantic Layer definition from two inputs
extracts only the semantic parts that affect meaning
builds a canonical contract for each side
diffs those contracts
classifies each change as breaking, risky, or safe
renders the result for local CLI use or GitHub Actions

In practical terms, it helps teams review semantic changes the same way they already review code changes.
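As a rough illustration of the diff-and-classify steps above, the sketch below compares two tiny contract dictionaries. The contract shape, the `diff_metrics` helper, and the severity assigned to each kind of change are assumptions made for this example; they are not dbt-semguard's internal data model or its actual severity policy.

```python
from typing import Any

# Hypothetical contract shape: metric name -> fields that affect meaning.
Contract = dict[str, dict[str, Any]]


def diff_metrics(base: Contract, head: Contract) -> list[dict[str, str]]:
    """Return one change record per semantic difference (illustrative rules only)."""
    changes: list[dict[str, str]] = []
    for name in sorted(base.keys() | head.keys()):
        if name not in head:
            changes.append({"metric": name, "kind": "removed", "severity": "breaking"})
        elif name not in base:
            changes.append({"metric": name, "kind": "added", "severity": "safe"})
        else:
            # Compare field by field so each difference gets its own record.
            for field in sorted(base[name].keys() | head[name].keys()):
                if base[name].get(field) != head[name].get(field):
                    changes.append(
                        {"metric": name, "kind": f"{field} changed", "severity": "breaking"}
                    )
    return changes


base = {"gross_revenue": {"agg": "sum", "expr": "order_total"}}
head = {
    "gross_revenue": {"agg": "avg", "expr": "order_total"},
    "order_count": {"agg": "count", "expr": "order_id"},
}
changes = diff_metrics(base, head)
for change in changes:
    print(change["severity"], change["metric"], change["kind"])
```

The point of the sketch is that the comparison runs on extracted meaning rather than on raw file text, which is why a one-word aggregation change surfaces as its own reviewable record.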

How It Works

The tool reduces dbt semantic definitions into a normalized contract that is easier to compare than raw YAML.

It keeps fields that affect meaning, such as:

semantic model identity
backing model name
entities and entity types
dimensions and time granularity
measures and measure expressions
metric type
aggregation and expression
filters
ratio numerator and denominator

It intentionally ignores noise such as:

descriptions
docs blocks
YAML ordering
whitespace and comments

That means the output is focused on semantic drift, not formatting drift.
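One way to picture the normalization described above is the following minimal sketch. It assumes a definition has already been loaded into plain Python dictionaries; the `NOISE_KEYS` set and the `canonicalize` and `fingerprint` helpers are hypothetical names for this example and do not reflect the tool's real field policy.

```python
import json
from typing import Any

# Keys treated as noise in this sketch; the real tool's exclusion list may differ.
NOISE_KEYS = {"description", "docs"}


def canonicalize(node: Any) -> Any:
    """Recursively drop noise keys so only meaning-bearing fields remain."""
    if isinstance(node, dict):
        return {k: canonicalize(v) for k, v in node.items() if k not in NOISE_KEYS}
    if isinstance(node, list):
        return [canonicalize(item) for item in node]
    return node


def fingerprint(node: Any) -> str:
    """Serialize with sorted keys so key ordering cannot affect the comparison."""
    return json.dumps(canonicalize(node), sort_keys=True, separators=(",", ":"))


before = {"name": "gross_revenue", "description": "Total revenue", "agg": "sum"}
after = {"agg": "sum", "name": "gross_revenue", "description": "Revenue, gross"}
changed = {"name": "gross_revenue", "agg": "avg"}

print(fingerprint(before) == fingerprint(after))    # True: only noise differs
print(fingerprint(before) == fingerprint(changed))  # False: aggregation differs
```

Whitespace and comments never reach this stage because they are discarded when the YAML is parsed, so only the description and ordering cases need explicit handling here.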

Install From PyPI

python -m pip install dbt-semguard

dbt-semguard requires Python 3.11 or newer.

Install From GitHub

python -m pip install "git+https://github.com/yeaight7/dbt-semguard.git@v0.5.4"

Use the GitHub install path when you need to pin directly to a repository tag.

Install From Source

git clone https://github.com/yeaight7/dbt-semguard.git
cd dbt-semguard
python -m pip install .

How To Use It

Run locally before opening a PR

Use this when you want to sanity-check semantic changes while you are still developing:

semguard diff --base-ref main --head-ref HEAD --project-dir .
semguard check --base-ref main --head-ref HEAD --project-dir . --fail-on breaking

Typical use:

diff when you want to inspect what changed
check when you want a blocking exit code for automation or local scripts

For monorepos, always point --project-dir at the dbt project root you want to analyze:

semguard diff --base-ref main --head-ref HEAD --project-dir analytics/dbt

Git ref mode and local YAML mode now both scope discovery to this directory.
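For local scripts that wrap the check, a small Python helper along these lines may be convenient. It only assembles the flags shown above; the helper names are hypothetical, and the assumption that a non-zero exit code means the threshold was reached is inferred from the "blocking exit code" wording rather than from a documented exit-code table.

```python
import subprocess


def build_check_command(
    base_ref: str, head_ref: str, project_dir: str, fail_on: str = "breaking"
) -> list[str]:
    """Assemble the argv for `semguard check` using only the flags shown above."""
    return [
        "semguard", "check",
        "--base-ref", base_ref,
        "--head-ref", head_ref,
        "--project-dir", project_dir,
        "--fail-on", fail_on,
    ]


def semantic_check_passes(command: list[str]) -> bool:
    """Run the command; assumes a non-zero exit code means the check did not pass."""
    return subprocess.run(command, check=False).returncode == 0


command = build_check_command("main", "HEAD", "analytics/dbt")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the project directory contains spaces.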

Compare exported contracts directly

Use this when you want to compare two precomputed semantic contracts:

semguard diff --base-contract base-contract.json --head-contract head-contract.json --format markdown

Compare manifests explicitly

Use this when your workflow already has dbt semantic_manifest.json artifacts available:

semguard diff --base-manifest base-semantic-manifest.json --head-manifest head-semantic-manifest.json --format json

Extract a contract

Use this when you want a stable machine-readable snapshot of semantic meaning:

semguard extract --source yaml --project-dir examples/ecommerce_dbt_project --output base-contract.json
semguard extract --source manifest --manifest semantic_manifest.json --output manifest-contract.json

Configure YAML discovery with `.semguard.yml`

Create .semguard.yml in your dbt project root to control which YAML files are scanned:

include:
  - models/**/*.yml
  - models/**/*.yaml
  - metrics/**/*.yml
  - metrics/**/*.yaml
  - semantic_models/**/*.yml
  - semantic_models/**/*.yaml
exclude:
  - target/**
  - dbt_packages/**
  - .venv/**
  - .github/**

If the file is not present, these defaults are applied automatically.
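To reason about which files a given set of patterns would select, the sketch below applies the include and exclude lists above to project-relative paths. The matching rules are an assumption: it treats `**/` as "zero or more directories" and `*` as "anything except a slash", which is a common glob convention but may not match dbt-semguard's matcher in every edge case. The helper names are hypothetical.

```python
import re

DEFAULT_INCLUDE = [
    "models/**/*.yml", "models/**/*.yaml",
    "metrics/**/*.yml", "metrics/**/*.yaml",
    "semantic_models/**/*.yml", "semantic_models/**/*.yaml",
]
DEFAULT_EXCLUDE = ["target/**", "dbt_packages/**", ".venv/**", ".github/**"]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex under the assumed `**` and `*` semantics."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directory levels
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_scanned(path: str, include=DEFAULT_INCLUDE, exclude=DEFAULT_EXCLUDE) -> bool:
    """A path is scanned when it matches an include pattern and no exclude pattern."""
    included = any(glob_to_regex(p).fullmatch(path) for p in include)
    excluded = any(glob_to_regex(p).fullmatch(path) for p in exclude)
    return included and not excluded


print(is_scanned("models/marts/orders.yml"))
print(is_scanned("dbt_packages/pkg/schema.yml", include=["**/*.yml"]))
```

The second call shows why the exclude list matters once the include patterns are broadened: a catch-all include would otherwise pull in YAML shipped by installed packages.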

Example Review Flow

A developer changes a metric or semantic model in dbt.
semguard diff compares the base branch and the current branch.
The tool reports semantic changes only.
The team decides whether the change is acceptable, needs migration planning, or should be blocked.
In CI, semguard check --fail-on breaking can fail the PR automatically.

How To Read The Result

breaking: the semantic meaning changed in a way that should block the PR by default
risky: the change may be legitimate, but downstream consumers should review it
safe: cosmetic-only changes that do not appear in the semantic diff
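The three levels can be read as an ordered scale, with the configured threshold deciding when a result blocks. The sketch below assumes that ordering (safe below risky below breaking) and assumes the threshold may be set to any of the three names; only `--fail-on breaking` appears in this README, and the `should_block` helper is hypothetical.

```python
# Assumed ordering of the severity levels, lowest to highest.
SEVERITY_ORDER = {"safe": 0, "risky": 1, "breaking": 2}


def should_block(highest_severity: str, fail_on: str) -> bool:
    """Block when the worst observed severity reaches the configured threshold."""
    return SEVERITY_ORDER[highest_severity] >= SEVERITY_ORDER[fail_on]


print(should_block("breaking", "breaking"))  # True
print(should_block("risky", "breaking"))     # False
```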

Output

diff and check emit one of:

text
markdown
json

JSON reports contain:

summary
highest_severity
blocking
changes
metadata

Example Markdown report

## dbt-semguard report

### Breaking changes
#### Metric `gross_revenue`
- Metric `gross_revenue` changed aggregation from `sum` to `avg`.

Status: blocking

Example JSON report

{
  "summary": {
    "breaking": 3,
    "risky": 1,
    "safe": 0
  },
  "highest_severity": "breaking",
  "blocking": true
}
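A downstream script can consume that report with the standard library alone. The sketch below uses only the fields shown in the example above; the `summarize` helper and the wording of its output line are illustrative, and the `changes` and `metadata` fields are left out because their structure is not shown here.

```python
import json

# The example report from above, embedded as text for a self-contained demo.
report_text = """
{
  "summary": {"breaking": 3, "risky": 1, "safe": 0},
  "highest_severity": "breaking",
  "blocking": true
}
"""


def summarize(report: dict) -> str:
    """Condense the report into a single status line."""
    total = sum(report["summary"].values())
    return (
        f"{total} semantic changes, "
        f"highest severity {report['highest_severity']}, "
        f"blocking={report['blocking']}"
    )


report = json.loads(report_text)
summary_line = summarize(report)
print(summary_line)  # 4 semantic changes, highest severity breaking, blocking=True
```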

Coverage

dbt-semguard currently covers the highest-value semantic changes in the latest dbt Semantic Layer spec.

Covered extractors and inputs:

Latest-spec YAML projects
Legacy top-level semantic_models / metrics YAML projects
Explicit dbt semantic_manifest.json input
Canonical contract JSON emitted by semguard extract

Covered semantic comparisons:

Semantic model add/remove and backing model changes
Semantic model default aggregation time dimension changes
Entity add/remove, type changes, and expression changes
Dimension add/remove, type changes, expression changes, and time granularity changes
Measure add/remove, aggregation, expression, aggregation-time, and non-additive changes
Simple metric aggregation, expression, label, filter, ownership, aggregation-time, and non-additive changes
Ratio metric numerator and denominator changes
Derived metric expression and input metric changes
Cumulative metric input, window, grain-to-date, and period-aggregation changes
Conversion metric entity, calculation, base metric, conversion metric, and constant-property changes
Additive changes such as new entities, new dimensions, new measures, and new metrics

Current automated coverage:

YAML extraction for the latest spec
Manifest normalization
Semantic diff severity mapping for breaking and risky changes
Declarative field-coverage policy so contract fields are explicitly diffed, nested, or intentionally excluded
Source diagnostics in extracted YAML contracts and change reports
CLI extract, diff, and check
Sticky PR comment delivery through the GitHub Action
Checkout-free git ref mode
Pre-release local action smoke coverage in CI, plus post-release published action smoke coverage in both git-ref and manifest modes, including spaced manifest paths

Current Limitations

Known v0.5.4 limitations are intentionally narrow:

There is no allowlist for intentional semantic changes yet.
Manifest parsing expects dbt semantic_manifest.json, not the general-purpose dbt manifest.json artifact.
Legacy YAML support covers top-level semantic_models, measures, and type_params, but cross-project ref semantics are still normalized conservatively into the single model_name contract field.
Rename handling is intentionally conservative: a rename is treated as a removal plus an addition.
Source diagnostics are best-effort and currently strongest for YAML extraction; manifest-derived contracts may still lack file/line detail.
GitHub integration supports sticky PR comments and inline annotations for pull_request workflows, but does not yet manage review-thread lifecycles.

Use As A GitHub Action

Use the included composite action from this repository:

jobs:
  semguard:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
      pull-requests: read
      checks: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: yeaight7/dbt-semguard@v0.5.4
        id: semguard
        with:
          base-ref: ${{ github.event.pull_request.base.sha }}
          head-ref: ${{ github.sha }}
          fail-on: breaking
          pr-comment: true
          pr-comment-mode: sticky
          github-token: ${{ github.token }}

      - name: Inspect semguard outputs
        run: |
          echo "Highest severity: ${{ steps.semguard.outputs.highest-severity }}"
          echo "Blocking: ${{ steps.semguard.outputs.blocking }}"

The action now exposes structured outputs so downstream CI can branch on semantic severity without reparsing JSON:

steps.semguard.outputs.highest-severity
steps.semguard.outputs.blocking
steps.semguard.outputs.breaking-count
steps.semguard.outputs.risky-count
steps.semguard.outputs.safe-count

pr-comment-mode accepts:

sticky: update the previous dbt-semguard PR comment when one already exists
create: always publish a new PR comment instead of updating the previous one

The action writes:

a Markdown summary to the workflow summary
a JSON artifact named semguard-report
structured step outputs for severity and counts
an optional sticky PR comment when pr-comment: true
inline check-run annotations when source diagnostics are available
a failing status when the configured threshold is reached

The action requires Python 3.11 or newer. GitHub API calls for PR comments and annotations use a 30-second timeout so stalled API responses do not hold CI indefinitely.

When there are zero semantic changes, the Markdown artifact and workflow summary explicitly include `No semantic changes detected.` followed by `Status: passing`.

This is the recommended setup when you want the semantic review to happen automatically on every PR.

If you enable pr-comment: true, the workflow needs:

contents: read
issues: write
pull-requests: read
checks: write

Missing checks: write can prevent inline annotations and check runs from appearing even when the semantic diff succeeds.

For forked pull requests, the standard pull_request event usually does not get a write-capable GITHUB_TOKEN, so sticky PR comments and check-run annotations may be unavailable unless you adopt a separate trusted workflow pattern.

Troubleshooting

Common CI and configuration issues are covered in docs/troubleshooting.md.

Migration notes (`v0.5.4`)

Severity handling now uses an internal enum while preserving the same JSON strings (breaking, risky, safe).
SQL filter diffs preserve case and quote semantics while still ignoring insignificant operator spacing.
GitHub workflow examples now scope write access to PR comments and check annotations only.
Extractor internals are split into YAML, manifest, and normalization modules behind the same public facade.
Native measure diffing, sub-day granularity severity, 30-second GitHub API timeouts, and git ref validation are included in the release surface.
Git ref extraction now scopes strictly to --project-dir for monorepos.
YAML discovery now uses safe default include/exclude patterns.
Optional .semguard.yml include/exclude rules are applied in both local and git-ref YAML extraction.
Invalid semantic YAML now raises user-facing errors with source context instead of raw KeyError tracebacks.
Composite action shell steps now read user-controlled values from environment variables instead of embedding GitHub expressions directly in Bash.
Composite action now generates JSON, Markdown, summary text, and step outputs in a single pass before enforcing the blocking threshold.
Composite action report files now live in an isolated runner temp directory derived from artifact-name, which avoids workspace filename collisions in matrix-style CI jobs.
The repository now documents security reporting, contribution setup, and common action troubleshooting paths.
