Skip to content

[Feature]: sst diff - Preview Changes Before Deployment #68

@mluizzi-whoop

Description

@mluizzi-whoop

Problem Statement

Currently, SST provides limited visibility into what will change before running destructive operations:

Existing commands:

  • sst generate --dry-run - Shows which views would be generated, but not the actual SQL
  • sst deploy - Runs full workflow (validate → extract → generate → deploy) with no preview
  • sst validate - Only checks for errors, doesn't show what will change

Problems this creates:

  1. No SQL preview - Can't review the generated SQL before deploying to Snowflake
  2. Blind deployments - Running sst deploy is a black box until it executes
  3. Difficult debugging - When a view fails, can't inspect the SQL that would be generated
  4. CI/CD risk - No way to review changes in PR before merging
  5. Incomplete --dry-run - Shows view names but not view definitions

Real-world scenario:

# Developer wants to preview changes before deploying to production
sst generate --all --target prod --dry-run

# Output:
Generation would create:
  - customer_analytics
  - product_performance  
  - sales_overview

# But... what's actually IN those views?
# What tables do they reference?
# What metrics are computed?
# Are there any breaking changes?

# Developer has no choice but to either:
# 1. Deploy blindly and hope it works
# 2. Run `sst generate` to a dev schema first (requires cleanup)
# 3. Manually inspect YAML files and mentally compile SQL (error-prone)

Proposed Solution

Add a new sst diff command that shows exactly what will change in Snowflake:

sst diff [OPTIONS]

Core functionality:

  1. Compare current state vs. proposed changes

    • Query Snowflake for existing semantic views
    • Generate new SQL (same as sst generate but don't execute)
    • Show side-by-side diff
  2. Multiple diff modes:

    • --summary - High-level: new/modified/deleted/unchanged views
    • --names-only - Just list view names that would change
    • --full - Complete SQL diff for each changed view (default)
    • --view <name> - Diff a specific view only
  3. Actionable output:

    • Color-coded diff (red for deletions, green for additions)
    • Summary stats (X views added, Y modified, Z unchanged)
    • Option to save diff to file for review
  4. Integration with defer:

    • sst diff --defer-target prod --only-modified
    • Shows only views that would change due to dbt model updates

Command Behavior

Mode 1: Summary diff (default)

sst diff --target dev

Output:

Comparing: DEV.SST_JAFFLE_SHOP vs. proposed changes

📊 Summary:
  ✨ New:        2 views
  📝 Modified:   3 views  
  🗑️  Deleted:    0 views
  ✓  Unchanged: 4 views

New views:
  + customer_lifetime_value
  + product_category_metrics

Modified views:
  ~ customer_analytics (metric definition changed)
  ~ sales_overview (added new measure)
  ~ product_performance (table reference changed)

Use --full to see complete SQL diff

Mode 2: Full SQL diff

sst diff --full --view customer_analytics

Output:

View: customer_analytics
Status: MODIFIED

Database: DEV.SST_JAFFLE_SHOP
Last modified: 2026-01-02 15:30:00

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SQL Diff:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  CREATE OR REPLACE SEMANTIC VIEW customer_analytics AS
    SELECT 
      customer_id,
-     SUM(order_count) as total_orders,
+     COUNT(DISTINCT order_id) as total_orders,
      AVG(order_value) as avg_order_value,
+     SUM(order_value) as lifetime_value
    FROM PROD.ANALYTICS.CUSTOMERS
    WHERE is_active = true;

Changes:
  • Metric 'total_orders' aggregation changed: SUM → COUNT DISTINCT
  • Added new metric: 'lifetime_value'

Mode 3: Names only (for scripting)

sst diff --names-only

Output:

customer_analytics
sales_overview
product_performance

Mode 4: Export for review

sst diff --full --output diff_report.md

Creates markdown file with complete diff for CI/CD review.

Alternatives Considered

  1. Enhanced --dry-run mode

    • Alternative: Improve sst generate --dry-run to show SQL
    • Downside: Doesn't compare against existing views, just shows what would be created
  2. Manual inspection workflow

    • Alternative: Deploy to dev schema, manually compare using SQL
    • Downside: Requires cleanup, time-consuming, error-prone
  3. Git diff on generated SQL files

    • Alternative: Commit generated SQL to git, use git diff
    • Downside: Adds clutter to repo, SQL is generated anyway (not source of truth)
  4. Snowflake history queries

    • Alternative: Query Snowflake's query history after deployment
    • Downside: Too late - changes already deployed
  5. Separate --show-sql flag on generate

    • Alternative: sst generate --show-sql --dry-run
    • Downside: Conflates two separate concerns (generation vs. inspection)

Priority

High - Would significantly improve workflow

This is important for:

  • Production deployments (reduce risk)
  • CI/CD pipelines (enable PR review of SQL changes)
  • Debugging (inspect generated SQL without deploying)
  • Collaborative review (share diffs with team)

Impact

Who benefits:

  • Production users - Preview changes before deploying to prod
  • DevOps/CI engineers - Add diff reports to PR comments
  • Developers - Debug semantic view generation without trial-and-error deployments
  • Team leads - Review semantic view changes during code review
  • Compliance teams - Audit what SQL is being deployed

Scope:

  • Any user deploying semantic views to Snowflake
  • Particularly valuable for production environments
  • Essential for mature CI/CD workflows

Comparable tools:

  • terraform plan - Shows infrastructure changes before apply
  • dbt --defer + dbt run --dry-run - Preview which models would run
  • kubectl diff - Shows Kubernetes resource changes
  • git diff - Universal diff standard
  • All major deployment tools have preview/plan modes

Technical Considerations

Implementation approach:

  1. Create: snowflake_semantic_tools/interfaces/cli/commands/diff.py

    @click.command()
    @target_option
    @database_schema_options
    @defer_options
    @click.option("--summary", is_flag=True, help="Show high-level summary only")
    @click.option("--full", is_flag=True, help="Show complete SQL diff (default)")
    @click.option("--names-only", is_flag=True, help="Output changed view names only")
    @click.option("--view", help="Diff specific view only")
    @click.option("--output", "-o", help="Save diff to file")
    def diff(dbt_target, db, schema, defer_target, state, only_modified, 
             no_defer, summary, full, names_only, view, output):
        """Preview semantic view changes before deployment."""
        # Implementation
  2. Create: snowflake_semantic_tools/services/diff_service.py

    • DiffService class with methods:
      • get_existing_views() - Query Snowflake for current views
      • generate_proposed_sql() - Generate new SQL (without executing)
      • compare_views() - Diff algorithm (line-by-line comparison)
      • format_diff() - Pretty-print with colors
      • export_diff() - Save to markdown/text file
  3. Leverage existing services:

    • Use SemanticViewGenerationService in dry-run mode
    • Use SnowflakeClient to query existing views
    • Reuse defer logic from defer.py
  4. Diff algorithm:

    from difflib import unified_diff
    
    def compare_sql(existing: str, proposed: str) -> List[str]:
        """Generate unified diff between SQL statements."""
        return list(unified_diff(
            existing.splitlines(),
            proposed.splitlines(),
            fromfile="existing",
            tofile="proposed",
            lineterm=""
        ))
  5. Output formatting:

    • Use rich for colorized terminal output
    • Support markdown export for CI/CD (GitHub PR comments)
    • Support JSON export for programmatic use

Edge cases:

  • View doesn't exist in Snowflake → Mark as "NEW"
  • View exists but SST won't generate it → Mark as "DELETED"
  • Normalized SQL comparison (whitespace, case-insensitive keywords)
  • Views with dynamic SQL → Diff the template, not resolved SQL

Performance:

  • For large projects (100+ views), parallelize Snowflake queries
  • Cache existing view definitions to avoid repeated queries
  • Option to diff only modified views (--only-modified)

Testing:

  • Unit tests for diff algorithm
  • Integration tests: Create view, modify, diff
  • Snapshot tests for diff output formatting

Example Usage

Scenario 1: Preview before production deploy

# Show summary of changes
sst diff --target prod --summary

# Review specific view
sst diff --target prod --view customer_analytics --full

# Export full diff for team review
sst diff --target prod --full --output prod_deploy_preview.md

Scenario 2: CI/CD integration

# .github/workflows/sst-preview.yml
- name: Generate diff preview
  run: |
    sst diff --target prod --full --output diff.md
    
- name: Post diff to PR
  uses: actions/github-script@v6
  with:
    script: |
      const fs = require('fs');
      const diff = fs.readFileSync('diff.md', 'utf8');
      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: `## Semantic View Changes\n\n${diff}`
      });

Scenario 3: Debug view generation

# See what SQL would be generated without deploying
sst diff --view problematic_view --full

# Compare output to expectations, iterate on YAML
# No cleanup needed (no Snowflake objects created)

Scenario 4: Selective review with defer

# Show only views impacted by modified dbt models
sst diff --defer-target prod --only-modified --summary

# Result: Focused diff on what actually changed

Additional Context

Output examples:

Summary mode:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Semantic View Diff: DEV.SST_JAFFLE_SHOP
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Status Summary:
  ✨ New:        2 views
  📝 Modified:   3 views
  🗑️  Deleted:    0 views  
  ✓  Unchanged: 4 views

Modified Views:
  📝 customer_analytics
     • Metric 'total_orders' changed (SUM → COUNT)
     • Added metric 'lifetime_value'
  
  📝 sales_overview
     • Added measure 'discount_amount'
  
  📝 product_performance
     • Table reference changed (PROD → DEV)

New Views:
  ✨ customer_lifetime_value
  ✨ product_category_metrics

To deploy these changes, run:
  sst deploy --target dev

Names-only mode (for scripting):

# Get list of changed views
CHANGED_VIEWS=$(sst diff --names-only)

# Pass to other commands
for view in $CHANGED_VIEWS; do
  echo "Reviewing $view..."
  sst diff --view "$view" --full
done

JSON export (for tooling):

sst diff --format json --output diff.json
{
  "summary": {
    "new": 2,
    "modified": 3,
    "deleted": 0,
    "unchanged": 4
  },
  "changes": [
    {
      "view": "customer_analytics",
      "status": "modified",
      "database": "DEV",
      "schema": "SST_JAFFLE_SHOP",
      "changes": [
        {
          "type": "metric_changed",
          "metric": "total_orders",
          "old_agg": "SUM",
          "new_agg": "COUNT"
        },
        {
          "type": "metric_added",
          "metric": "lifetime_value"
        }
      ]
    }
  ]
}

Integration with existing commands:

  • sst diff → Review changes
  • sst validate → Check for errors
  • sst deploy → Execute changes

This creates a complete pre-deployment workflow:

  1. sst diff - Preview what will change
  2. sst validate - Ensure no errors
  3. sst deploy - Execute with confidence

Pre-submission Checklist

  • I have searched existing issues to avoid duplicates
  • I have described a clear problem and solution
  • I have considered alternatives and workarounds

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions