Problem Statement
The Challenge
Users frequently need to join tables on transformed columns without modifying their underlying dbt models. A common scenario is joining a timestamp column to a date column:
Example Scenario:
snowflake_relationships:
- name: transactions_to_daily_summary
left_table: {{ table('transactions') }}
right_table: {{ table('daily_summary') }}
relationship_conditions:
- "{{ column('transactions', 'user_id') }} = {{ column('daily_summary', 'user_id') }}"
- "{{ column('transactions', 'transaction_timestamp') }} = {{ column('daily_summary', 'summary_date') }}"
The problem: If transactions.transaction_timestamp is a TIMESTAMP but daily_summary.summary_date is a DATE, users need to join on transaction_timestamp::DATE = summary_date.
Current limitation: SST explicitly rejects SQL transformations in relationship conditions:
# From snowflake_semantic_tools/core/validation/rules/references.py (lines 401-426)
patterns = [
(r"::", "type casting (::)"),
(r"\bCAST\s*\(", "CAST function"),
(r"\bTO_DATE\s*\(", "TO_DATE function"),
(r"\bDATE\s*\(", "DATE function"),
(r"\bDATE_TRUNC\s*\(", "DATE_TRUNC function"), # Not currently listed but relevant
# ... many more patterns
]
Current workaround: Add a computed column to the dbt model (e.g., transaction_date), which:
- Requires modifying the upstream model
- Adds redundant columns to tables
- Increases maintenance burden
- May not be feasible for models owned by other teams
Use Cases
- Timestamp to Date joins:
DATE(event_timestamp) = calendar_date
- Monthly aggregation joins:
DATE_TRUNC('month', order_date) = month_start_date
- Quarterly joins:
DATE_TRUNC('quarter', transaction_date) = quarter_date
- Case-insensitive joins:
UPPER(email) = UPPER(normalized_email) (less common)
Proposed Solution
Overview
Allow SQL expressions in relationship conditions by automatically generating "join key dimensions" during semantic view generation. The user writes natural SQL expressions, and SST handles the complexity behind the scenes.
User Experience (No New Syntax Required!)
Users write intuitive expressions:
snowflake_relationships:
- name: events_to_calendar
left_table: {{ table('events') }}
right_table: {{ table('calendar') }}
relationship_conditions:
- "{{ column('events', 'user_id') }} = {{ column('calendar', 'user_id') }}"
# NEW: Expressions now supported!
- "DATE({{ column('events', 'event_timestamp') }}) = {{ column('calendar', 'calendar_date') }}"
Or with DATE_TRUNC:
relationship_conditions:
- "DATE_TRUNC('month', {{ column('events', 'event_timestamp') }}) = {{ column('calendar', 'month_date') }}"
How It Works
Step 1: Detection (during template resolution/parsing)
- Parse each relationship condition
- Detect SQL transformation patterns around
{{ column() }} templates
- Extract: table name, base column, full expression
Step 2: Join Key Dimension Generation
- Generate a synthetic dimension name:
_JK_{TABLE}_{COLUMN}_{HASH}
- Example:
_JK_EVENTS_EVENT_TIMESTAMP_A1B2
- Store mapping: expression → dimension name
Step 3: SQL Generation
- Add synthetic dimensions to
DIMENSIONS clause
- Reference dimension names (not raw columns) in
RELATIONSHIPS clause
Generated SQL Example
Input:
relationship_conditions:
- "DATE({{ column('events', 'event_timestamp') }}) = {{ column('calendar', 'calendar_date') }}"
Output:
CREATE SEMANTIC VIEW my_semantic_view
TABLES (
EVENTS AS DB.SCHEMA.EVENTS
PRIMARY KEY (EVENT_ID)
COMMENT = 'Events table',
CALENDAR AS DB.SCHEMA.CALENDAR
PRIMARY KEY (CALENDAR_DATE)
COMMENT = 'Calendar dimension'
)
DIMENSIONS (
-- Regular dimensions from dbt model metadata...
EVENTS.EVENT_ID AS EVENT_ID
COMMENT = 'Event identifier',
EVENTS.USER_ID AS USER_ID
COMMENT = 'User identifier',
EVENTS.EVENT_TIMESTAMP AS EVENT_TIMESTAMP
COMMENT = 'Event timestamp',
CALENDAR.CALENDAR_DATE AS CALENDAR_DATE
COMMENT = 'Calendar date',
-- AUTO-GENERATED JOIN KEY DIMENSION
EVENTS._JK_EVENT_TIMESTAMP_DATE AS DATE(EVENT_TIMESTAMP)
COMMENT = 'Auto-generated join key: DATE(EVENT_TIMESTAMP)'
)
RELATIONSHIPS (
EVENTS_TO_CALENDAR AS
EVENTS (_JK_EVENT_TIMESTAMP_DATE) REFERENCES CALENDAR (CALENDAR_DATE)
)
COMMENT = 'Semantic view with expression-based join';
Key Insight: Snowflake Dimensions Support Expressions
Research confirmed that Snowflake semantic views support expressions in the DIMENSIONS clause:
DIMENSIONS (
TABLE_NAME.DIM_NAME AS <expression>
)
The critical question is: Can RELATIONSHIPS reference dimension aliases instead of raw table columns?
Alternatives Considered
Alternative 1: Add Computed Column to dbt Model ❌ (Current Workaround)
-- In dbt model
SELECT
*,
transaction_timestamp::DATE AS transaction_date
FROM ...
Pros:
- Works today
- Explicit and visible
Cons:
- Requires modifying upstream models
- Adds redundant columns to physical tables
- May not be feasible for shared/external models
- Increases table storage and maintenance burden
Alternative 2: New Template Function {{ join_key() }} ❌
relationship_conditions:
- "{{ join_key('events', 'event_date', 'DATE(event_timestamp)') }} = {{ column('calendar', 'calendar_date') }}"
Pros:
- Explicit about creating a join key dimension
- Clear separation of concerns
Cons:
- More verbose syntax
- Requires learning new template function
- Breaks the intuitive SQL-like pattern users expect
Alternative 3: Explicit Join Key Dimension Definition ❌
snowflake_relationships:
- name: events_to_calendar
join_key_dimensions:
- table: events
name: event_date
expr: DATE(event_timestamp)
relationship_conditions:
- "{{ column('events', 'event_date') }} = {{ column('calendar', 'calendar_date') }}"
Pros:
- Very explicit
- Separates dimension definition from usage
Cons:
- Verbose
- Two places to maintain (dimension + condition)
- Not inline with relationship definition
Alternative 4: Auto-Detect and Generate (Proposed Solution) ✅
Pros:
- No new syntax to learn
- Users write natural SQL expressions
- Complexity handled automatically
- Inline with relationship definition
- Matches how users think about joins
Cons:
- Magic happening behind the scenes (mitigated by logging/comments)
- Need robust expression parsing
- Potential edge cases with complex expressions
Priority
High - Would significantly improve workflow
This is a frequently requested pattern that currently requires modifying dbt models, which is often not desirable or possible.
Impact
Who Benefits
- All SST users who need to join tables on transformed columns
- Teams with shared models where modifying upstream tables isn't an option
- Data engineers working with timestamp/date mismatches (very common)
- Organizations with strict model ownership where adding columns requires approval
Common Patterns This Enables
| Pattern |
Expression |
Use Case |
| Timestamp → Date |
DATE(timestamp_col) |
Join events to daily calendar |
| Date Truncation |
DATE_TRUNC('month', date_col) |
Monthly aggregation joins |
| Type Casting |
col::DATE or CAST(col AS DATE) |
Type alignment |
| Quarterly |
DATE_TRUNC('quarter', date_col) |
Quarterly reporting |
Technical Considerations
Research Completed
1. Current Validation Blocks Transformations
File: snowflake_semantic_tools/core/validation/rules/references.py (lines 389-460)
The has_sql_transformation() function explicitly rejects:
- Type casting (
::)
CAST(), CONVERT(), TO_DATE(), TO_TIMESTAMP(), TO_CHAR(), TO_NUMBER()
DATE(), TRIM(), UPPER(), LOWER(), SUBSTRING()
COALESCE(), NVL(), IFNULL()
CASE WHEN statements
- Arithmetic operations (
+, -, *, /)
- String concatenation (
||)
2. Relationship SQL Generation
File: snowflake_semantic_tools/core/parsing/join_condition_parser.py (lines 247-285)
The generate_sql_references() method generates:
TABLE1 (col1, col2) REFERENCES TABLE2 (col1, col2)
Currently only uses column names from ParsedCondition.left_column and .right_column.
3. Dimension SQL Generation Supports Expressions
File: snowflake_semantic_tools/core/generation/semantic_view_builder.py (line 800)
dim_def = f" {table_name}.{dim_name} AS {expression}"
This confirms dimensions CAN have expressions in the generated SQL.
Hypothesis to Validate
Critical Question: Can Snowflake semantic view RELATIONSHIPS reference dimension aliases (which can contain expressions) instead of raw table columns?
-- If this works, the feature is viable:
DIMENSIONS (
EVENTS.EVENT_DATE AS DATE(EVENT_TIMESTAMP)
)
RELATIONSHIPS (
REL_NAME AS
EVENTS (EVENT_DATE) REFERENCES CALENDAR (CALENDAR_DATE)
-- ^^^^^^^^^^^ dimension alias, not raw column
)
Validation SQL Script
Run this in Snowflake to test if dimension aliases work in RELATIONSHIPS:
-- ============================================================================
-- TEST: Can Snowflake semantic views join on dimension expressions?
-- ============================================================================
--
-- HYPOTHESIS: You can define a dimension with an expression (like DATE(timestamp_col))
-- and then reference that dimension name in a RELATIONSHIPS clause.
--
-- WHY THIS MATTERS: If this works, users can join a timestamp column to a date column
-- without adding a new column to their dbt model. They could write:
--
-- relationship_conditions:
-- - "DATE_TRUNC('month', {{ column('events', 'timestamp') }}) = {{ column('calendar', 'month_date') }}"
--
-- And the tool would auto-generate a dimension for the expression.
--
-- INSTRUCTIONS:
-- 1. Set your database/schema context below
-- 2. Run this script to test each hypothesis
-- ============================================================================
-- Set context (adjust to your environment)
USE ROLE your_role;
USE WAREHOUSE your_warehouse;
USE DATABASE your_database;
USE SCHEMA your_schema;
-- ============================================================================
-- Step 1: Create test tables
-- ============================================================================
CREATE OR REPLACE TABLE SST_TEST_EVENTS (
event_id INT,
user_id INT,
event_timestamp TIMESTAMP_NTZ,
event_type VARCHAR(50)
);
CREATE OR REPLACE TABLE SST_TEST_CALENDAR (
calendar_date DATE,
day_name VARCHAR(20),
is_weekend BOOLEAN
);
-- Insert sample data
INSERT INTO SST_TEST_EVENTS VALUES
(1, 100, '2025-01-15 10:30:00', 'login'),
(2, 100, '2025-01-15 11:45:00', 'purchase'),
(3, 101, '2025-01-16 09:00:00', 'login');
INSERT INTO SST_TEST_CALENDAR VALUES
('2025-01-15', 'Wednesday', FALSE),
('2025-01-16', 'Thursday', FALSE),
('2025-01-17', 'Friday', FALSE);
-- ============================================================================
-- TEST 1: Define dimension with expression, reference in relationship
-- ============================================================================
CREATE OR REPLACE SEMANTIC VIEW SST_TEST_DIMENSION_EXPR_JOIN
TABLES (
SST_TEST_EVENTS AS CURRENT_DATABASE().CURRENT_SCHEMA().SST_TEST_EVENTS
PRIMARY KEY (EVENT_ID)
COMMENT = 'Events with timestamps',
SST_TEST_CALENDAR AS CURRENT_DATABASE().CURRENT_SCHEMA().SST_TEST_CALENDAR
PRIMARY KEY (CALENDAR_DATE)
COMMENT = 'Calendar dimension'
)
DIMENSIONS (
-- Regular dimension (simple column reference)
SST_TEST_EVENTS.EVENT_ID AS EVENT_ID
COMMENT = 'Event identifier',
SST_TEST_EVENTS.USER_ID AS USER_ID
COMMENT = 'User identifier',
SST_TEST_EVENTS.EVENT_TYPE AS EVENT_TYPE
COMMENT = 'Type of event',
-- EXPRESSION DIMENSION: This is the key test!
-- Can we define a dimension that truncates timestamp to date?
SST_TEST_EVENTS.EVENT_DATE AS DATE(EVENT_TIMESTAMP)
COMMENT = 'Event date derived from timestamp',
-- Calendar dimensions
SST_TEST_CALENDAR.CALENDAR_DATE AS CALENDAR_DATE
COMMENT = 'Calendar date',
SST_TEST_CALENDAR.DAY_NAME AS DAY_NAME
COMMENT = 'Day of week name'
)
RELATIONSHIPS (
-- THE KEY TEST: Reference the dimension alias (EVENT_DATE)
-- not the raw column (EVENT_TIMESTAMP)
EVENTS_TO_CALENDAR AS
SST_TEST_EVENTS (EVENT_DATE) REFERENCES SST_TEST_CALENDAR (CALENDAR_DATE)
)
COMMENT = 'Test: Can relationships reference dimension expressions?';
-- If the above succeeds, print success
SELECT 'TEST 1 SUCCESS: Dimension expression joins work!' AS result;
-- Verify the semantic view
SHOW SEMANTIC VIEWS LIKE 'SST_TEST_DIMENSION_EXPR_JOIN';
DESCRIBE SEMANTIC VIEW SST_TEST_DIMENSION_EXPR_JOIN;
-- ============================================================================
-- TEST 2: DATE_TRUNC as join key (the actual use case)
-- ============================================================================
CREATE OR REPLACE TABLE SST_TEST_MONTHLY_CALENDAR (
month_date DATE, -- First day of month
month_name VARCHAR(20),
quarter INT
);
INSERT INTO SST_TEST_MONTHLY_CALENDAR VALUES
('2025-01-01', 'January', 1),
('2025-02-01', 'February', 1);
CREATE OR REPLACE SEMANTIC VIEW SST_TEST_MONTHLY_JOIN
TABLES (
SST_TEST_EVENTS AS CURRENT_DATABASE().CURRENT_SCHEMA().SST_TEST_EVENTS
PRIMARY KEY (EVENT_ID)
COMMENT = 'Events with timestamps',
SST_TEST_MONTHLY_CALENDAR AS CURRENT_DATABASE().CURRENT_SCHEMA().SST_TEST_MONTHLY_CALENDAR
PRIMARY KEY (MONTH_DATE)
COMMENT = 'Monthly calendar'
)
DIMENSIONS (
SST_TEST_EVENTS.EVENT_ID AS EVENT_ID,
SST_TEST_EVENTS.USER_ID AS USER_ID,
-- DATE_TRUNC expression dimension - simulating auto-generated join key
SST_TEST_EVENTS.EVENT_MONTH AS DATE_TRUNC('month', EVENT_TIMESTAMP)
COMMENT = 'Auto-generated: Event month for joining to monthly calendar',
SST_TEST_MONTHLY_CALENDAR.MONTH_DATE AS MONTH_DATE,
SST_TEST_MONTHLY_CALENDAR.MONTH_NAME AS MONTH_NAME
)
RELATIONSHIPS (
-- Use the expression-based dimension in the relationship
EVENTS_TO_MONTHLY AS
SST_TEST_EVENTS (EVENT_MONTH) REFERENCES SST_TEST_MONTHLY_CALENDAR (MONTH_DATE)
)
COMMENT = 'Test: DATE_TRUNC expression as join key';
SELECT 'TEST 2 SUCCESS: DATE_TRUNC join key works!' AS result;
-- ============================================================================
-- CLEANUP (run when done testing)
-- ============================================================================
-- DROP SEMANTIC VIEW IF EXISTS SST_TEST_DIMENSION_EXPR_JOIN;
-- DROP SEMANTIC VIEW IF EXISTS SST_TEST_MONTHLY_JOIN;
-- DROP TABLE IF EXISTS SST_TEST_EVENTS;
-- DROP TABLE IF EXISTS SST_TEST_CALENDAR;
-- DROP TABLE IF EXISTS SST_TEST_MONTHLY_CALENDAR;
Implementation Plan
Phase 1: Validate Hypothesis
Phase 2: Expression Detection
Modify JoinConditionParser to detect SQL transformations:
# Suggested patterns to support initially:
EXPRESSION_PATTERNS = [
r"DATE\s*\(\s*{{\s*column\([^)]+\)\s*}}\s*\)", # DATE({{ column(...) }})
r"DATE_TRUNC\s*\([^,]+,\s*{{\s*column\([^)]+\)\s*}}\)", # DATE_TRUNC('month', {{ column(...) }})
r"{{\s*column\([^)]+\)\s*}}::DATE", # {{ column(...) }}::DATE
r"CAST\s*\(\s*{{\s*column\([^)]+\)\s*}}\s+AS\s+DATE\)", # CAST({{ column(...) }} AS DATE)
]
Phase 3: Join Key Dimension Generation
New class: JoinKeyDimensionGenerator
import hashlib
from typing import Dict, Any
class JoinKeyDimensionGenerator:
"""Generates synthetic dimensions for join key expressions."""
def __init__(self):
self.generated_dimensions: Dict[tuple, Dict[str, Any]] = {}
def generate_dimension_name(self, table: str, column: str, expression: str) -> str:
"""Generate unique dimension name for join key."""
hash_suffix = hashlib.md5(expression.encode()).hexdigest()[:4].upper()
return f"_JK_{table.upper()}_{column.upper()}_{hash_suffix}"
def register_join_key(self, table: str, column: str, expression: str) -> str:
"""
Register a join key expression and return dimension name.
Args:
table: Source table name
column: Base column name (extracted from expression)
expression: Full SQL expression (e.g., "DATE(EVENT_TIMESTAMP)")
Returns:
Generated dimension name to use in RELATIONSHIPS
"""
key = (table.lower(), expression)
if key not in self.generated_dimensions:
dim_name = self.generate_dimension_name(table, column, expression)
self.generated_dimensions[key] = {
'name': dim_name,
'table': table.upper(),
'expression': expression,
'base_column': column.upper(),
'comment': f'Auto-generated join key: {expression}'
}
return self.generated_dimensions[key]['name']
def get_all_dimensions(self) -> list:
"""Return all generated dimensions for inclusion in DIMENSIONS clause."""
return list(self.generated_dimensions.values())
def clear(self):
"""Clear all generated dimensions (call between semantic views)."""
self.generated_dimensions.clear()
Phase 4: Update SQL Generation
Modify SemanticViewBuilder:
_build_dimensions_clause() - Include join key dimensions:
def _build_dimensions_clause(self, conn, table_names: List[str], join_key_generator: JoinKeyDimensionGenerator = None) -> str:
"""Build the DIMENSIONS clause of the CREATE SEMANTIC VIEW statement."""
# ... existing code ...
# Add auto-generated join key dimensions
if join_key_generator:
for jk_dim in join_key_generator.get_all_dimensions():
dim_def = f" {jk_dim['table']}.{jk_dim['name']} AS {jk_dim['expression']}"
dim_def += f"\n COMMENT = '{jk_dim['comment']}'"
dim_definitions.append(dim_def)
return ",\n".join(dim_definitions)
_build_relationships_clause() - Use dimension names from generator
Phase 5: Update Validation
- Remove hard block on SQL transformations (or make it conditional)
- Add validation that base column exists in dbt catalog
- Add warning for ASOF joins with expressions (see Caveats)
Caveats & Edge Cases
ASOF Joins
ASOF joins require UNIQUE constraints on the right table's join columns. If an expression destroys uniqueness (e.g., DATE_TRUNC groups multiple timestamps to same month), ASOF joins may fail.
Recommendation:
- Warn users when expressions are used with
>= (ASOF) operators
- Document this limitation
Expression on Both Sides
If both sides have expressions:
- "DATE({{ column('events', 'timestamp') }}) = DATE({{ column('calendar', 'created_at') }})"
Generate two join key dimensions, one for each side.
Same Expression Multiple Times
If the same expression appears in multiple relationships, reuse the dimension.
Complex/Nested Expressions
Initially, support only single-function expressions. Complex nested expressions can be added later:
# Phase 1: Support these
- "DATE({{ column('events', 'timestamp') }})"
- "DATE_TRUNC('month', {{ column('events', 'timestamp') }})"
# Future: Consider supporting these
- "DATE_TRUNC('month', DATE({{ column('events', 'timestamp') }}))"
Files to Modify
| File |
Changes |
core/parsing/join_condition_parser.py |
Add expression detection, extract base column from expressions |
core/validation/rules/references.py |
Conditionally allow expressions when auto-generating dimensions |
core/generation/semantic_view_builder.py |
Add join key dimensions to DIMENSIONS clause, use in RELATIONSHIPS |
core/parsing/parsers/semantic_parser.py |
Handle expression metadata during relationship parsing |
New: core/generation/join_key_generator.py |
JoinKeyDimensionGenerator class |
Example Usage
Before (Current - Fails Validation)
snowflake_relationships:
- name: events_to_calendar
left_table: {{ table('events') }}
right_table: {{ table('calendar') }}
relationship_conditions:
- "{{ column('events', 'user_id') }} = {{ column('calendar', 'user_id') }}"
- "DATE({{ column('events', 'event_timestamp') }}) = {{ column('calendar', 'calendar_date') }}"
# ❌ ERROR: SQL transformation (DATE function) not allowed
After (Proposed - Works)
snowflake_relationships:
- name: events_to_calendar
left_table: {{ table('events') }}
right_table: {{ table('calendar') }}
relationship_conditions:
- "{{ column('events', 'user_id') }} = {{ column('calendar', 'user_id') }}"
- "DATE({{ column('events', 'event_timestamp') }}) = {{ column('calendar', 'calendar_date') }}"
# ✅ Auto-generates join key dimension: EVENTS._JK_EVENT_TIMESTAMP_DATE AS DATE(EVENT_TIMESTAMP)
Generated Output (Verbose Mode)
$ sst generate --all --verbose
Parsing relationships...
events_to_calendar: Detected expression in join condition
Left side: DATE({{ column('events', 'event_timestamp') }})
Creating join key dimension: EVENTS._JK_EVENT_TIMESTAMP_A1B2 AS DATE(EVENT_TIMESTAMP)
Generating semantic view 'analytics_view'...
Tables: 2
Dimensions: 5 (including 1 auto-generated join key)
Relationships: 1
✓ Created successfully
Additional Context
Related Code References
| Component |
File |
Lines |
Purpose |
| Expression blocking |
core/validation/rules/references.py |
389-460 |
Currently rejects transformations |
| Join condition parsing |
core/parsing/join_condition_parser.py |
47-285 |
Parses relationship conditions |
| Dimension SQL generation |
core/generation/semantic_view_builder.py |
772-820 |
Builds DIMENSIONS clause |
| Relationship SQL generation |
core/generation/semantic_view_builder.py |
700-739 |
Builds RELATIONSHIPS clause |
| Relationship parsing |
core/parsing/parsers/semantic_parser.py |
147-203 |
Parses relationship YAML |
Snowflake Documentation References
Pre-submission Checklist
Summary
Problem: Users can't join on transformed columns without modifying dbt models.
Solution: Auto-generate "join key dimensions" when SQL expressions are detected in relationship conditions.
Key Benefits:
- No new syntax required
- No dbt model changes needed
- Complexity handled automatically
- Intuitive SQL-like expressions
Next Step: Validate that Snowflake semantic views allow dimension aliases in RELATIONSHIPS by running the test SQL script above.
Problem Statement
The Challenge
Users frequently need to join tables on transformed columns without modifying their underlying dbt models. A common scenario is joining a timestamp column to a date column:
Example Scenario:
The problem: If
transactions.transaction_timestampis aTIMESTAMPbutdaily_summary.summary_dateis aDATE, users need to join ontransaction_timestamp::DATE = summary_date.Current limitation: SST explicitly rejects SQL transformations in relationship conditions:
Current workaround: Add a computed column to the dbt model (e.g.,
transaction_date), which:Use Cases
DATE(event_timestamp) = calendar_dateDATE_TRUNC('month', order_date) = month_start_dateDATE_TRUNC('quarter', transaction_date) = quarter_dateUPPER(email) = UPPER(normalized_email)(less common)Proposed Solution
Overview
Allow SQL expressions in relationship conditions by automatically generating "join key dimensions" during semantic view generation. The user writes natural SQL expressions, and SST handles the complexity behind the scenes.
User Experience (No New Syntax Required!)
Users write intuitive expressions:
Or with
DATE_TRUNC:How It Works
Step 1: Detection (during template resolution/parsing)
{{ column() }}templatesStep 2: Join Key Dimension Generation
_JK_{TABLE}_{COLUMN}_{HASH}_JK_EVENTS_EVENT_TIMESTAMP_A1B2Step 3: SQL Generation
DIMENSIONSclauseRELATIONSHIPSclauseGenerated SQL Example
Input:
Output:
CREATE SEMANTIC VIEW my_semantic_view TABLES ( EVENTS AS DB.SCHEMA.EVENTS PRIMARY KEY (EVENT_ID) COMMENT = 'Events table', CALENDAR AS DB.SCHEMA.CALENDAR PRIMARY KEY (CALENDAR_DATE) COMMENT = 'Calendar dimension' ) DIMENSIONS ( -- Regular dimensions from dbt model metadata... EVENTS.EVENT_ID AS EVENT_ID COMMENT = 'Event identifier', EVENTS.USER_ID AS USER_ID COMMENT = 'User identifier', EVENTS.EVENT_TIMESTAMP AS EVENT_TIMESTAMP COMMENT = 'Event timestamp', CALENDAR.CALENDAR_DATE AS CALENDAR_DATE COMMENT = 'Calendar date', -- AUTO-GENERATED JOIN KEY DIMENSION EVENTS._JK_EVENT_TIMESTAMP_DATE AS DATE(EVENT_TIMESTAMP) COMMENT = 'Auto-generated join key: DATE(EVENT_TIMESTAMP)' ) RELATIONSHIPS ( EVENTS_TO_CALENDAR AS EVENTS (_JK_EVENT_TIMESTAMP_DATE) REFERENCES CALENDAR (CALENDAR_DATE) ) COMMENT = 'Semantic view with expression-based join';Key Insight: Snowflake Dimensions Support Expressions
Research confirmed that Snowflake semantic views support expressions in the
DIMENSIONSclause:DIMENSIONS ( TABLE_NAME.DIM_NAME AS <expression> )The critical question is: Can
RELATIONSHIPSreference dimension aliases instead of raw table columns?Alternatives Considered
Alternative 1: Add Computed Column to dbt Model ❌ (Current Workaround)
Pros:
Cons:
Alternative 2: New Template Function
{{ join_key() }}❌Pros:
Cons:
Alternative 3: Explicit Join Key Dimension Definition ❌
Pros:
Cons:
Alternative 4: Auto-Detect and Generate (Proposed Solution) ✅
Pros:
Cons:
Priority
High - Would significantly improve workflow
This is a frequently requested pattern that currently requires modifying dbt models, which is often not desirable or possible.
Impact
Who Benefits
Common Patterns This Enables
DATE(timestamp_col)DATE_TRUNC('month', date_col)col::DATEorCAST(col AS DATE)DATE_TRUNC('quarter', date_col)Technical Considerations
Research Completed
1. Current Validation Blocks Transformations
File:
snowflake_semantic_tools/core/validation/rules/references.py(lines 389-460)The
has_sql_transformation()function explicitly rejects:::)CAST(),CONVERT(),TO_DATE(),TO_TIMESTAMP(),TO_CHAR(),TO_NUMBER()DATE(),TRIM(),UPPER(),LOWER(),SUBSTRING()COALESCE(),NVL(),IFNULL()CASE WHENstatements+,-,*,/)||)2. Relationship SQL Generation
File:
snowflake_semantic_tools/core/parsing/join_condition_parser.py(lines 247-285)The
generate_sql_references()method generates:TABLE1 (col1, col2) REFERENCES TABLE2 (col1, col2)Currently only uses column names from
ParsedCondition.left_columnand.right_column.3. Dimension SQL Generation Supports Expressions
File:
snowflake_semantic_tools/core/generation/semantic_view_builder.py(line 800)This confirms dimensions CAN have expressions in the generated SQL.
Hypothesis to Validate
Critical Question: Can Snowflake semantic view RELATIONSHIPS reference dimension aliases (which can contain expressions) instead of raw table columns?
Validation SQL Script
Run this in Snowflake to test if dimension aliases work in RELATIONSHIPS:
Implementation Plan
Phase 1: Validate Hypothesis
Phase 2: Expression Detection
Modify
JoinConditionParserto detect SQL transformations:Phase 3: Join Key Dimension Generation
New class:
JoinKeyDimensionGeneratorPhase 4: Update SQL Generation
Modify
SemanticViewBuilder:_build_dimensions_clause()- Include join key dimensions:_build_relationships_clause()- Use dimension names from generatorPhase 5: Update Validation
Caveats & Edge Cases
ASOF Joins
ASOF joins require
UNIQUEconstraints on the right table's join columns. If an expression destroys uniqueness (e.g.,DATE_TRUNCgroups multiple timestamps to same month), ASOF joins may fail.Recommendation:
>=(ASOF) operatorsExpression on Both Sides
If both sides have expressions:
- "DATE({{ column('events', 'timestamp') }}) = DATE({{ column('calendar', 'created_at') }})"Generate two join key dimensions, one for each side.
Same Expression Multiple Times
If the same expression appears in multiple relationships, reuse the dimension.
Complex/Nested Expressions
Initially, support only single-function expressions. Complex nested expressions can be added later:
Files to Modify
core/parsing/join_condition_parser.pycore/validation/rules/references.pycore/generation/semantic_view_builder.pycore/parsing/parsers/semantic_parser.pycore/generation/join_key_generator.pyJoinKeyDimensionGeneratorclassExample Usage
Before (Current - Fails Validation)
After (Proposed - Works)
Generated Output (Verbose Mode)
Additional Context
Related Code References
core/validation/rules/references.pycore/parsing/join_condition_parser.pycore/generation/semantic_view_builder.pycore/generation/semantic_view_builder.pycore/parsing/parsers/semantic_parser.pySnowflake Documentation References
Pre-submission Checklist
Summary
Problem: Users can't join on transformed columns without modifying dbt models.
Solution: Auto-generate "join key dimensions" when SQL expressions are detected in relationship conditions.
Key Benefits:
Next Step: Validate that Snowflake semantic views allow dimension aliases in RELATIONSHIPS by running the test SQL script above.