
[Feature]: Embedded Sample dbt+SST Project for Integration Testing #95

@mluizzi-whoop

Description


Problem Statement

SST requires a dbt project to function, but our current integration tests rely heavily on:

  1. Programmatic temp fixtures - Creating minimal dbt project structures on-the-fly using tmp_path
  2. Heavy mocking - Mocking Snowflake connections and service responses
  3. Fragmented YAML fixtures - Isolated validation scenarios without full project context

This approach has several limitations:

  • Incomplete coverage: Cannot test full end-to-end workflows (init → enrich → validate → extract → generate)
  • Maintenance burden: Each test recreates project structures differently, leading to inconsistent test scenarios
  • Divergence from reality: Programmatic fixtures may not accurately reflect real-world dbt+SST project structures
  • Difficult debugging: When tests fail, it's hard to inspect the "project" state because it's ephemeral
  • Missing edge cases: Hard to systematically test complex interactions (multiple models, relationships, metrics referencing each other)

Currently, the only "real" reference project is sst-jaffle-shop, which exists as an external tutorial repository and cannot be used directly in CI/CD without network dependencies.

Proposed Solution

Create an embedded minimal sample dbt+SST project within the test fixtures directory that serves as a realistic, self-contained test harness:

tests/
├── fixtures/
│   ├── sample_project/                    # NEW: Complete dbt+SST project
│   │   ├── dbt_project.yml
│   │   ├── profiles.yml.example           # Template (not real credentials)
│   │   ├── models/
│   │   │   ├── staging/
│   │   │   │   ├── stg_customers.sql
│   │   │   │   ├── stg_customers.yml
│   │   │   │   ├── stg_orders.sql
│   │   │   │   └── stg_orders.yml
│   │   │   └── marts/
│   │   │       ├── customers.sql
│   │   │       ├── customers.yml
│   │   │       ├── orders.sql
│   │   │       └── orders.yml
│   │   ├── target/
│   │   │   └── manifest.json              # Pre-generated for testing
│   │   ├── snowflake_semantic_models/
│   │   │   ├── semantic_views.yml
│   │   │   ├── metrics/
│   │   │   │   └── core_metrics.yml
│   │   │   ├── relationships/
│   │   │   │   └── core_relationships.yml
│   │   │   ├── filters/
│   │   │   ├── verified_queries/
│   │   │   └── custom_instructions/
│   │   └── sst_config.yaml
│   ├── project_factory.py                 # NEW: Factory for creating test variants
│   └── ... (existing fixtures)

Implementation Components

1. Base Sample Project

A minimal but complete dbt+SST project with:

  • 2-3 interconnected dbt models (customers, orders)
  • Pre-enriched YAML with SST metadata
  • Valid relationships, metrics, and a semantic view
  • Pre-compiled manifest.json (no dbt execution required)
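
The project's dbt_project.yml can stay tiny. A minimal sketch (project and profile names here are illustrative, not prescribed by this issue):

```yaml
# tests/fixtures/sample_project/dbt_project.yml (illustrative names)
name: sst_sample_project
version: "1.0.0"
profile: sst_sample_project   # matches profiles.yml.example

model-paths: ["models"]

models:
  sst_sample_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```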

2. Project Factory

A utility class to create project variants for specific test scenarios:

# tests/fixtures/project_factory.py
from pathlib import Path
import shutil

class SampleProjectFactory:
    """Factory for creating test project variants."""
    
    BASE_PROJECT = Path(__file__).parent / "sample_project"
    
    @classmethod
    def create(cls, tmp_path: Path, name: str = "test_project") -> Path:
        """Create a clean copy of the sample project."""
        project_dir = tmp_path / name
        shutil.copytree(cls.BASE_PROJECT, project_dir)
        return project_dir
    
    @classmethod
    def create_without_sst(cls, tmp_path: Path) -> Path:
        """Create project without SST initialization (for init wizard tests)."""
        project_dir = cls.create(tmp_path)
        shutil.rmtree(project_dir / "snowflake_semantic_models")
        (project_dir / "sst_config.yaml").unlink()
        return project_dir
    
    @classmethod
    def create_with_invalid_metrics(cls, tmp_path: Path) -> Path:
        """Create project with intentionally invalid metrics."""
        project_dir = cls.create(tmp_path)
        invalid_metrics = project_dir / "snowflake_semantic_models/metrics/invalid.yml"
        invalid_metrics.write_text("""
snowflake_metrics:
  - name: broken_metric
    # Missing required fields
    expr: SELECT * FROM  # Invalid SQL
""")
        return project_dir
    
    @classmethod
    def create_with_circular_dependencies(cls, tmp_path: Path) -> Path:
        """Create project with circular metric dependencies (metric_a <-> metric_b)."""
        project_dir = cls.create(tmp_path)
        circular = project_dir / "snowflake_semantic_models/metrics/circular.yml"
        circular.write_text("""
snowflake_metrics:
  - name: metric_a
    expr: metric_b + 1
  - name: metric_b
    expr: metric_a + 1
""")
        return project_dir

3. Integration Test Base Class

# tests/integration/base.py
import pytest
from pathlib import Path
from tests.fixtures.project_factory import SampleProjectFactory

class IntegrationTestBase:
    """Base class for integration tests with sample project support."""
    
    @pytest.fixture
    def sample_project(self, tmp_path) -> Path:
        """Provide a clean sample project for each test."""
        return SampleProjectFactory.create(tmp_path)
    
    @pytest.fixture
    def uninitialized_project(self, tmp_path) -> Path:
        """Provide a dbt project without SST setup."""
        return SampleProjectFactory.create_without_sst(tmp_path)

Alternatives Considered

| Alternative | Pros | Cons |
| --- | --- | --- |
| Git submodule to sst-jaffle-shop | Real-world project, shared with tutorials | Network dependency in CI, version drift, tutorial-focused not test-focused |
| Continue with programmatic fixtures | Flexible, already in place | Inconsistent, hard to maintain, incomplete coverage |
| External test repo with CI triggers | Isolation, real Snowflake testing | Complex CI setup, slow feedback loop, maintenance overhead |
| Docker-based test environment | Complete isolation, reproducible | Heavy setup, overkill for parsing/validation tests |

The embedded sample project approach provides the best balance of realism, maintainability, and CI/CD compatibility.

Priority

High - Would significantly improve workflow

Specifically, it would improve:

  • Developer confidence when making changes
  • Test coverage for complex workflows
  • Debugging experience when tests fail
  • Onboarding for new contributors

Impact

This feature would benefit:

  • SST maintainers - More comprehensive test coverage, easier debugging
  • Contributors - Clear examples of expected project structure, easier to add tests
  • CI/CD reliability - Self-contained tests that don't depend on external resources
  • Documentation - The sample project itself serves as reference documentation

Technical Considerations

Files to Create/Modify

  1. New directory: tests/fixtures/sample_project/ with full project structure
  2. New file: tests/fixtures/project_factory.py for project variants
  3. New file: tests/integration/base.py for shared test infrastructure
  4. Update: tests/README.md to document the sample project approach
  5. Update: Existing integration tests to use the sample project

Manifest Generation

The manifest.json should be pre-generated and committed to avoid requiring dbt in the test environment. It can be regenerated periodically with:

cd tests/fixtures/sample_project
dbt compile --target test

Considerations

  • Keep the sample project minimal (2-3 models) to reduce maintenance burden
  • Include commented examples of advanced features (complex metrics, multi-table relationships)
  • Add deliberate edge cases (special characters in descriptions, long expressions)
  • Maintain parity with sst-jaffle-shop structure for consistency
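
As an illustration of the "deliberate edge cases" point, one column description in the sample project could exercise quoting and unicode (the content below is illustrative, not part of the actual fixtures):

```yaml
# models/marts/customers.yml (illustrative edge-case content)
models:
  - name: customers
    columns:
      - name: customer_name
        description: >
          Customer's "display" name; may contain unicode (Ümlauts, 你好)
          and YAML-special characters: [brackets], {braces}, #hash, 'quotes'
```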

Example Usage

Running Full Workflow Tests

# tests/integration/test_full_workflow.py
from click.testing import CliRunner
from snowflake_semantic_tools.interfaces.cli.main import cli
from tests.integration.base import IntegrationTestBase

class TestFullWorkflow(IntegrationTestBase):

    def test_validate_succeeds_on_valid_project(self, sample_project, monkeypatch):
        """Test that validate passes on a properly configured project."""
        monkeypatch.chdir(sample_project)  # run the CLI from inside the project
        runner = CliRunner()
        result = runner.invoke(cli, ['validate'], catch_exceptions=False)
        assert result.exit_code == 0
        assert "Validation passed" in result.output

    def test_init_wizard_on_fresh_dbt_project(self, uninitialized_project, monkeypatch):
        """Test init wizard creates correct structure."""
        monkeypatch.chdir(uninitialized_project)
        runner = CliRunner()
        result = runner.invoke(cli, ['init', '--skip-prompts'], catch_exceptions=False)
        assert result.exit_code == 0
        assert (uninitialized_project / "sst_config.yaml").exists()
        assert (uninitialized_project / "snowflake_semantic_models").is_dir()

Testing Specific Edge Cases

def test_validation_catches_invalid_metrics(tmp_path, monkeypatch):
    """Test that validation fails on invalid metrics."""
    project = SampleProjectFactory.create_with_invalid_metrics(tmp_path)
    monkeypatch.chdir(project)  # setting PWD in env does not change the cwd
    runner = CliRunner()
    result = runner.invoke(cli, ['validate'])

    assert result.exit_code != 0
    assert "broken_metric" in result.output
    assert "Missing required field" in result.output

Additional Context

Related Work

  • The sst-jaffle-shop tutorial provides a real-world example that this sample project should mirror (in structure, not size)
  • Current integration tests in tests/integration/ already create temp project structures - this proposal standardizes that approach

Success Criteria

  1. All existing integration tests can be refactored to use the sample project
  2. At least one new end-to-end test covering init → enrich → validate → extract → generate workflow
  3. CI passes without any external dependencies (no network calls for test fixtures)
  4. Test execution time does not significantly increase (< 10% slower)
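
The end-to-end criterion (item 2) could be driven by a small helper that runs the steps in order and stops at the first failure. A sketch; `run_workflow` is a hypothetical name, and the exact CLI flags are assumptions:

```python
# Hypothetical helper for the init → enrich → validate → extract → generate test.
# `invoke` is any callable taking an argv list and returning an object with an
# `exit_code` attribute, e.g. functools.partial(CliRunner().invoke, cli).
from typing import Callable, List, Sequence

WORKFLOW: Sequence[List[str]] = (
    ["init", "--skip-prompts"],  # flag taken from the init example above
    ["enrich"],
    ["validate"],
    ["extract"],
    ["generate"],
)

def run_workflow(invoke: Callable, steps: Sequence[List[str]] = WORKFLOW) -> List[int]:
    """Invoke each step in order; return exit codes, stopping at the first failure."""
    codes: List[int] = []
    for args in steps:
        result = invoke(args)
        codes.append(result.exit_code)
        if result.exit_code != 0:
            break
    return codes
```

The test then reduces to `assert run_workflow(...) == [0, 0, 0, 0, 0]`, and a failure report shows exactly which step broke.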

Suggested Milestones

  1. Phase 1: Create base sample project structure with minimal models
  2. Phase 2: Implement SampleProjectFactory with common variants
  3. Phase 3: Refactor existing integration tests to use new fixtures
  4. Phase 4: Add new end-to-end workflow tests

Pre-submission Checklist

  • I have searched existing issues to avoid duplicates
  • I have described a clear problem and solution
  • I have considered alternatives and workarounds

Metadata

Labels: enhancement (New feature or request)