
Data Warehouse Optimization and FinOps

A comprehensive guide to Snowflake data warehouse optimization and financial operations (FinOps) practices. This repository demonstrates practical techniques for optimizing query performance, managing costs, and monitoring resource consumption in Snowflake.

🎯 Overview

This project provides hands-on examples of:

  • Query Performance Optimization - Techniques to improve query execution time
  • Partition Pruning - Reducing data scanned by leveraging filtering
  • Materialized Views & Summary Tables - Pre-aggregating data for faster queries
  • FinOps Monitoring - Tracking credit consumption and budget management
  • Cost Attribution - Identifying expensive queries and resource usage patterns

Platform: Snowflake Data Warehouse


📦 Prerequisites

Before running these scripts, ensure you have:

  • Active Snowflake account with appropriate privileges
  • Access to SNOWFLAKE.ACCOUNT_USAGE schema for monitoring queries
  • Permissions to create warehouses, databases, schemas, and tables
  • Basic understanding of SQL and data warehousing concepts

📁 Project Structure

├── create_data_set.sql              # Initial setup and sample data generation
├── query_performance.sql            # Query performance monitoring
├── query_pruning.sql                # Partition pruning optimization
├── materialized_optimization.sql    # Aggregation optimization techniques
├── finops_and_monitoring.sql        # Cost tracking and budget monitoring
└── images/                          # FinOps dashboard screenshots
    ├── budget.png
    ├── consumption.png
    └── cost_management_overview.png

🚀 Implementation Guide

1. Setup & Data Generation (create_data_set.sql)

Purpose: Initialize the Snowflake environment and create a sample sales dataset for testing optimization techniques.

What it does:

  • Creates a dedicated warehouse (demo_wh) with XSMALL size
  • Configures auto-suspend (60 seconds) and auto-resume for cost efficiency
  • Creates database (demo_db) and schema (sales_schema)
  • Generates 1 million random sales records with realistic data distribution

Key Concepts:

-- Warehouse configuration for cost optimization
CREATE WAREHOUSE IF NOT EXISTS demo_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60           -- Suspends after 60 seconds of inactivity
    AUTO_RESUME = TRUE          -- Automatically resumes when queries arrive
    INITIALLY_SUSPENDED = TRUE; -- Starts in suspended state (no charges)

Sample Data Structure:

| Field      | Type   | Description                                          |
|------------|--------|------------------------------------------------------|
| order_id   | INT    | Unique identifier for each order                     |
| order_date | DATE   | Random date within a ~2-year window from 2023-01-01  |
| region     | STRING | One of: North, South, East, West, Central            |
| product    | STRING | One of: Laptop, Phone, Tablet, Monitor               |
| amount     | FLOAT  | Random sales amount between 100 and 5000             |
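A dataset matching this schema can be produced with Snowflake's GENERATOR table function; a minimal sketch (the repository's actual script may generate the data differently):

```sql
-- Sketch: generate 1M random sales rows matching the schema above
CREATE OR REPLACE TABLE sales AS
SELECT
    SEQ4() AS order_id,
    DATEADD('day', UNIFORM(0, 730, RANDOM()), '2023-01-01'::DATE) AS order_date,
    ARRAY_CONSTRUCT('North','South','East','West','Central')[UNIFORM(0, 4, RANDOM())]::STRING AS region,
    ARRAY_CONSTRUCT('Laptop','Phone','Tablet','Monitor')[UNIFORM(0, 3, RANDOM())]::STRING AS product,
    UNIFORM(100, 5000, RANDOM())::FLOAT AS amount
FROM TABLE(GENERATOR(ROWCOUNT => 1000000));
```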

Run Order: Execute this file first to set up your environment.


2. Query Performance Monitoring (query_performance.sql)

Purpose: Analyze query execution metrics to identify performance bottlenecks and optimization opportunities.

Key Metrics Tracked:

  • Execution Time - Total time to complete query (converted to seconds)
  • Bytes Scanned - Amount of data read from storage
  • Rows Produced - Number of rows returned
  • Warehouse Size - Compute resources used

Queries Included:

A. Basic Aggregation Performance

-- Aggregates sales by region
SELECT region, SUM(amount), COUNT(*), AVG(amount)
FROM sales
GROUP BY region;

B. Query History Analysis

Uses INFORMATION_SCHEMA.QUERY_HISTORY() to inspect execution metrics:

  • Filter queries containing SUM(amount)
  • Exclude monitoring queries themselves
  • Display bytes scanned in MB for easier interpretation
  • Sort by most recent executions
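Such an inspection query might look like the following sketch (column names are from Snowflake's QUERY_HISTORY table function; the repository script may differ):

```sql
-- Sketch: recent executions of the aggregation, with metrics in friendlier units
SELECT
    query_text,
    total_elapsed_time / 1000     AS execution_seconds,  -- reported in milliseconds
    bytes_scanned / (1024 * 1024) AS mb_scanned,
    rows_produced,
    warehouse_size
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%SUM(amount)%'
  AND query_text NOT ILIKE '%QUERY_HISTORY%'  -- exclude the monitoring query itself
ORDER BY start_time DESC;
```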

When to use this: After running any query to measure its performance impact and identify optimization opportunities.


3. Partition Pruning Optimization (query_pruning.sql)

Purpose: Demonstrate how filtering reduces data scanning and improves performance.

Core Concept: Snowflake uses micro-partitions and metadata to skip reading irrelevant data when filters are applied.

Comparison Examples:

❌ Bad Practice - Full Table Scan

-- Scans all 1 million rows
SELECT COUNT(*), SUM(amount) FROM sales;

✅ Good Practice - Filtered Query

-- Only scans relevant partitions (6 months of data)
SELECT COUNT(*), SUM(amount)
FROM sales
WHERE order_date >= '2024-01-01'
    AND order_date < '2024-07-01';

Performance Impact Measurement: The script includes a query that compares mb_scanned and execution_time between filtered and unfiltered queries.

Expected Results:

  • Filtered queries typically scan 30-70% less data
  • Execution time reduces proportionally
  • Lower data scanning = lower compute costs

Best Practice: Always use WHERE clauses with date or categorical columns when possible.
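Pruning effectiveness can also be checked without executing the query at all: Snowflake's EXPLAIN reports how many micro-partitions a query would touch.

```sql
-- Sketch: inspect pruning without running the query
EXPLAIN
SELECT COUNT(*), SUM(amount)
FROM sales
WHERE order_date >= '2024-01-01'
  AND order_date < '2024-07-01';
-- Compare partitionsAssigned vs. partitionsTotal in the plan output
```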


4. Materialized Optimization (materialized_optimization.sql)

Purpose: Pre-aggregate commonly queried data to avoid repeated expensive calculations.

Strategy: Create summary tables for frequently accessed aggregations.

Implementation Approach:

Option A - Materialized View (requires Snowflake Enterprise Edition or higher):

CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count,
       AVG(amount) AS avg_amount
FROM sales
GROUP BY region;

Option B - Physical Summary Table (Fallback):

CREATE OR REPLACE TABLE sales_by_region_summary AS
SELECT region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count,
       AVG(amount) AS avg_amount
FROM sales
GROUP BY region;

Performance Comparison:

The script includes a query that compares:

  • Original aggregation - Scans 1M rows every time
  • Summary table - Scans only 5 pre-aggregated rows

Use Cases:

  • Dashboards with repeated aggregations
  • Reports that don't need real-time data (can be refreshed periodically)
  • Executive summaries with high-level metrics

Trade-offs:

  • Pros: 50-90% faster query performance, reduced compute costs
  • ⚠️ Cons: Data may be stale, requires refresh strategy, additional storage cost

Refresh Strategy:

-- Re-run the CTAS periodically to refresh the summary
CREATE OR REPLACE TABLE sales_by_region_summary AS
SELECT region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count,
       AVG(amount) AS avg_amount
FROM sales
GROUP BY region;
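The refresh can be automated with a Snowflake task; a sketch (the task name and CRON schedule are assumptions, not part of the repository scripts):

```sql
CREATE OR REPLACE TASK refresh_sales_summary
    WAREHOUSE = demo_wh
    SCHEDULE  = 'USING CRON 0 2 * * * UTC'  -- nightly at 02:00 UTC
AS
    CREATE OR REPLACE TABLE sales_by_region_summary AS
    SELECT region,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count,
           AVG(amount) AS avg_amount
    FROM sales
    GROUP BY region;

ALTER TASK refresh_sales_summary RESUME;  -- newly created tasks start suspended
```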

5. FinOps & Monitoring (finops_and_monitoring.sql)

Purpose: Track credit consumption, identify expensive queries, and monitor budget compliance.

A. Warehouse Credit Consumption

Monitors daily credit usage across all warehouses:

-- 7-day credit consumption trend (assumes $2/credit; adjust to your rate)
SELECT
    warehouse_name,
    DATE(start_time) AS usage_date,
    SUM(credits_used) AS total_credits,
    ROUND(SUM(credits_used) * 2, 2) AS estimated_cost_usd
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_date
ORDER BY usage_date DESC, total_credits DESC;

Key Insights:

  • Identify which warehouses consume the most credits
  • Track daily spending trends
  • Estimate costs (assuming $2 per credit - adjust to your rate)

FinOps Actions:

  • Identify idle warehouses to suspend
  • Right-size over-provisioned warehouses
  • Consolidate underutilized warehouses

B. Top 10 Most Expensive Queries

Identifies queries that consume the most credits:

Credit Calculation Formula:

Credits = (execution_seconds / 3600) × warehouse_credit_rate

Warehouse Credit Rates:
- XSMALL: 1 credit/hour
- SMALL:  2 credits/hour
- MEDIUM: 4 credits/hour
- LARGE:  8 credits/hour
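Applying this formula to SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY gives a ranking query; a sketch (the size-to-credit mapping is hard-coded from the table above, and the 'X-Small'/'Small' labels follow Snowflake's reported warehouse_size values):

```sql
-- Sketch: top 10 queries of the past week by estimated credit cost
SELECT
    query_id,
    user_name,
    warehouse_name,
    total_elapsed_time / 1000 AS execution_seconds,
    (total_elapsed_time / 1000 / 3600)
        * DECODE(warehouse_size, 'X-Small', 1, 'Small', 2, 'Medium', 4, 'Large', 8, 1)
        AS estimated_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY estimated_credits DESC
LIMIT 10;
```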

Use This To:

  • Find queries that should be optimized first (highest ROI)
  • Identify inefficient user queries
  • Detect long-running ETL jobs

Optimization Priority:

  1. Queries with high execution time + large warehouse size
  2. Frequently executed expensive queries
  3. Queries scanning excessive data

C. Monthly Budget Tracking Dashboard

Provides real-time budget compliance monitoring:

Status Categories:

| Credits Used | Percentage | Status          | Action Required                    |
|--------------|------------|-----------------|------------------------------------|
| < 25         | < 50%      | On Track        | Continue monitoring                |
| 25-40        | 50-80%     | Monitor Closely | Review optimization opportunities  |
| 40-50        | 80-100%    | Near Limit      | Immediate cost reduction needed    |
| > 50         | > 100%     | Over Budget     | Emergency cost controls            |

Budget Dashboard Features:

  • Real-time credit consumption by warehouse
  • Percentage of monthly budget consumed
  • Credits remaining until budget limit
  • Alert status based on thresholds

Example Output:

warehouse_name  | credits_used | monthly_budget | pct_consumed | status          | credits_remaining
----------------|--------------|----------------|--------------|-----------------|------------------
prod_etl_wh     | 42.5         | 50             | 85.0%        | Near Limit      | 7.5
analytics_wh    | 18.2         | 50             | 36.4%        | Monitor Closely | 31.8
dev_wh          | 8.5          | 50             | 17.0%        | On Track        | 41.5
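Output like this can come from a single query over ACCOUNT_USAGE; a sketch using the thresholds from the status table (the flat 50-credit monthly budget per warehouse is an assumption):

```sql
-- Sketch: month-to-date budget tracking per warehouse
WITH usage AS (
    SELECT warehouse_name, SUM(credits_used) AS credits_used
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE start_time >= DATE_TRUNC('month', CURRENT_DATE())
    GROUP BY warehouse_name
)
SELECT
    warehouse_name,
    credits_used,
    50 AS monthly_budget,
    ROUND(credits_used / 50 * 100, 1) AS pct_consumed,
    CASE
        WHEN credits_used > 50  THEN 'Over Budget'
        WHEN credits_used >= 40 THEN 'Near Limit'
        WHEN credits_used >= 25 THEN 'Monitor Closely'
        ELSE 'On Track'
    END AS status,
    50 - credits_used AS credits_remaining
FROM usage
ORDER BY pct_consumed DESC;
```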

📊 FinOps Dashboards

Visual monitoring of Snowflake costs and consumption patterns:

1. Budget Overview

![Budget Dashboard](images/budget.png)

Monthly budget tracking showing credit consumption by warehouse and alert status.

2. Consumption Analysis

![Consumption Dashboard](images/consumption.png)

Detailed view of credit consumption trends, peak usage periods, and cost drivers.

3. Cost Management Overview

![Cost Management Overview](images/cost_management_overview.png)

Comprehensive cost management dashboard with budget alerts and optimization recommendations.


🎯 Best Practices

Query Optimization

  1. Use WHERE Clauses - Always filter on date or clustering-key columns (Snowflake has no traditional indexes)
  2. Limit Result Sets - Use LIMIT when testing queries
  3. Avoid SELECT * - Only select columns you need
  4. Use Clustering Keys - For large tables with common filter columns
  5. Monitor Query Profiles - Review execution plans regularly

Warehouse Management

  1. Right-Size Warehouses - Start small, scale up if needed
  2. Enable Auto-Suspend - Set to 60-300 seconds based on usage patterns
  3. Use Multi-Cluster - Only for concurrent workload spikes
  4. Separate Workloads - Different warehouses for ETL vs analytics
  5. Query Acceleration - Enable for unpredictable workloads

Cost Optimization

  1. Set Budget Alerts - Configure email notifications at 75%, 90%, 100%
  2. Review Query History - Weekly analysis of expensive queries
  3. Use Result Caching - Identical queries return cached results (free)
  4. Schedule ETL Wisely - Avoid overlapping heavy workloads
  5. Archive Old Data - Move to cheaper storage tiers

Data Management

  1. Prune Stale Data - Snowflake has no manual VACUUM; drop or archive unused tables and rows to cut storage and scan costs
  2. Cluster on Time - Micro-partitioning is automatic; add clustering keys on date columns for very large tables
  3. Compression - Snowflake auto-compresses but consider data types
  4. Time Travel Settings - Reduce retention period for non-critical tables
  5. Materialized Views - For expensive aggregations used in dashboards

💰 Cost Optimization Strategies

Immediate Actions (Quick Wins)

  • ✅ Enable auto-suspend on all warehouses
  • ✅ Reduce time travel retention from 90 to 7 days for dev tables
  • ✅ Suspend unused warehouses
  • ✅ Scale down over-provisioned warehouses
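Most of these quick wins are single ALTER statements; hedged examples (object names are placeholders):

```sql
ALTER WAREHOUSE demo_wh SET AUTO_SUSPEND = 60;          -- enable auto-suspend
ALTER TABLE sales SET DATA_RETENTION_TIME_IN_DAYS = 7;  -- shorten Time Travel retention
ALTER WAREHOUSE demo_wh SUSPEND;                        -- suspend an unused warehouse
ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'XSMALL';  -- scale down
```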

Short-Term (1-4 Weeks)

  • 📊 Analyze and optimize top 10 expensive queries
  • 📊 Create summary tables for dashboard aggregations
  • 📊 Implement budget alerts for all warehouses
  • 📊 Review and consolidate similar warehouses

Long-Term (1-3 Months)

  • 🎯 Implement clustering keys on large tables
  • 🎯 Migrate infrequently accessed data to cheaper storage
  • 🎯 Establish query governance policies
  • 🎯 Automate cost reporting and anomaly detection
  • 🎯 Train team on optimization best practices

ROI Calculation

Monthly Savings = (Before Credits - After Credits) × Credit Rate
ROI % = (Monthly Savings / Implementation Cost) × 100

Example:

  • Before: 500 credits/month at $2/credit = $1,000
  • After optimization: 350 credits/month = $700
  • Monthly savings: $300
  • Annual savings: $3,600

🔍 Monitoring & Alerts

Set Up Budget Alerts

-- Create resource monitor
CREATE RESOURCE MONITOR monthly_limit WITH
    CREDIT_QUOTA = 500
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 75 PERCENT DO NOTIFY
        ON 90 PERCENT DO NOTIFY
        ON 100 PERCENT DO SUSPEND
        ON 110 PERCENT DO SUSPEND_IMMEDIATE;

-- A monitor has no effect until it is assigned to a warehouse (or the account)
ALTER WAREHOUSE demo_wh SET RESOURCE_MONITOR = monthly_limit;

Weekly Review Checklist

  • Review top 10 expensive queries
  • Check warehouse credit consumption trends
  • Identify queries with declining performance
  • Verify auto-suspend settings are working
  • Review failed query patterns
  • Update summary tables/materialized views


🎓 Learning Path

Beginner:

  1. Start with create_data_set.sql to set up your environment
  2. Run queries and observe execution times
  3. Review query_performance.sql to understand metrics

Intermediate:

  4. Experiment with query_pruning.sql to see filtering impact
  5. Implement summary tables using materialized_optimization.sql
  6. Set up basic monitoring with finops_and_monitoring.sql

Advanced:

  7. Create custom FinOps dashboards
  8. Implement automated optimization workflows
  9. Establish organization-wide cost governance policies

