Skip to content

[Feature]: Support Verified Query Repository (VQR) in SST Generate #122

@sophiascaglioni-whoop

Description

@sophiascaglioni-whoop

Problem Statement

Snowflake's Cortex Analyst Verified Query Repository (VQR) improves accuracy by providing curated question-SQL pairs that Cortex Analyst leverages when answering similar questions. Currently, SST has no support for defining or managing VQR entries — users must manually write verified_queries YAML blocks with raw SQL inline, which is error-prone, hard to review, and disconnected from the dbt project.

Proposed Solution

Add VQR support to SST generation with two authoring modes:

Option A: .sql file references (recommended)

Users create a folder of .sql files containing verified queries, then reference them in the semantic model config:

snowflake_semantic_models/
  verified_queries/
    domain/
      california_profit.sql
      monthly_revenue.sql
verified_queries:
  - name: california_profit
    question: "What was the profit from California last month?"
    sql: {{ ref('california_profit') }}
    verified_by: Jane Doe
    use_as_onboarding_question: true

Benefits:

  • SQL files get syntax highlighting, linting, and proper diff visibility in PRs
  • Reusable across semantic views
  • Easier to test independently
  • Consistent with dbt's file-per-query philosophy

Option B: Inline SQL (fallback)

Users write SQL directly in the YAML for simple one-liners:

verified_queries:
  - name: total_revenue
    question: "What is the total revenue?"
    sql: "SELECT SUM(profit) FROM __sales_data"
    verified_by: Jane Doe

SST responsibilities:

  1. Validation — Verify that {{ ref() }} paths resolve to actual .sql files, validate required fields (name, question, sql), and warn on missing optional fields (verified_by, verified_at)
  2. Generation — Inline the .sql file contents into the verified_queries block of the generated semantic view DDL
  3. Template resolution — Support {{ table() }} and {{ column() }} references inside VQR .sql files so queries use logical names (as required by Snowflake)

Alternatives Considered

  • Manual YAML editing — Current state; no visibility into SQL, hard to review in PRs
  • Snowflake's semantic model generator Streamlit app — Good for exploration but doesn't integrate with dbt/SST workflow or version control
  • Storing queries in Snowflake stage only — Loses git history and PR review workflow

Priority

High - Would significantly improve workflow

Impact

  • Any team using Cortex Analyst with SST-generated semantic views
  • Analysts who want to curate trusted queries for their domain
  • Teams adopting Cortex Analyst where answer accuracy is critical

Technical Considerations

  • VQR SQL must use logical table/column names (prefixed with __), not physical names — SST's existing template system ({{ table() }}, {{ column() }}) maps naturally to this
  • verified_at could be auto-populated from git commit timestamp if not specified
  • sst validate should check VQR SQL for template resolution and basic syntax
  • sst generate should inline resolved SQL into the semantic view DDL's verified_queries block

Example Usage

# Validate including VQR entries
sst validate

# Generate semantic views with VQR
sst generate --all --defer-target prod

Output:

✓ Resolved 3 verified queries for semantic view 'sales_analytics'
  - california_profit.sql → inline
  - monthly_revenue.sql → inline  
  - top_customers.sql → inline
✓ Generated semantic view: DB.schema.table

Additional Context

Metadata

Metadata

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions