A comprehensive guide to Snowflake data warehouse optimization and financial operations (FinOps) practices. This repository demonstrates practical techniques for optimizing query performance, managing costs, and monitoring resource consumption in Snowflake.
- Overview
- Prerequisites
- Project Structure
- Implementation Guide
- FinOps Dashboards
- Best Practices
- Cost Optimization Strategies
This project provides hands-on examples of:
- Query Performance Optimization - Techniques to improve query execution time
- Partition Pruning - Reducing data scanned by leveraging filtering
- Materialized Views & Summary Tables - Pre-aggregating data for faster queries
- FinOps Monitoring - Tracking credit consumption and budget management
- Cost Attribution - Identifying expensive queries and resource usage patterns
Platform: Snowflake Data Warehouse
Before running these scripts, ensure you have:
- Active Snowflake account with appropriate privileges
- Access to the `SNOWFLAKE.ACCOUNT_USAGE` schema for monitoring queries
- Permissions to create warehouses, databases, schemas, and tables
- Basic understanding of SQL and data warehousing concepts
```
├── create_data_set.sql             # Initial setup and sample data generation
├── query_performance.sql           # Query performance monitoring
├── query_pruning.sql               # Partition pruning optimization
├── materialized_optimization.sql   # Aggregation optimization techniques
├── finops_and_monitoring.sql       # Cost tracking and budget monitoring
└── images/                         # FinOps dashboard screenshots
    ├── budget.png
    ├── consumption.png
    └── cost_management_overview.png
```
Purpose: Initialize the Snowflake environment and create a sample sales dataset for testing optimization techniques.
What it does:
- Creates a dedicated warehouse (`demo_wh`) with XSMALL size
- Configures auto-suspend (60 seconds) and auto-resume for cost efficiency
- Creates database (`demo_db`) and schema (`sales_schema`)
- Generates 1 million random sales records with realistic data distribution
Key Concepts:
```sql
-- Warehouse configuration for cost optimization
AUTO_SUSPEND = 60          -- Suspends after 60 seconds of inactivity
AUTO_RESUME = TRUE         -- Automatically resumes when queries arrive
INITIALLY_SUSPENDED = TRUE -- Starts in suspended state (no charges)
```

Sample Data Structure:
| Field | Type | Description |
|---|---|---|
| order_id | INT | Unique identifier for each order |
| order_date | DATE | Random date within roughly two years starting 2023-01-01 |
| region | STRING | One of: North, South, East, West, Central |
| product | STRING | One of: Laptop, Phone, Tablet, Monitor |
| amount | FLOAT | Random sales amount between 100-5000 |
Run Order: Execute this file first to set up your environment.
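The sample dataset described above can be generated with Snowflake's `GENERATOR` table function. A hedged sketch, using the column definitions from the table (exact expressions in `create_data_set.sql` may differ):

```sql
-- Sketch of the 1M-row sample data generation (column spec from the table above)
CREATE OR REPLACE TABLE sales AS
SELECT
    SEQ4()                                                     AS order_id,
    DATEADD('day', UNIFORM(0, 730, RANDOM()), '2023-01-01'::DATE) AS order_date,
    ARRAY_CONSTRUCT('North','South','East','West','Central')
        [UNIFORM(0, 4, RANDOM())]::STRING                      AS region,
    ARRAY_CONSTRUCT('Laptop','Phone','Tablet','Monitor')
        [UNIFORM(0, 3, RANDOM())]::STRING                      AS product,
    UNIFORM(100, 5000, RANDOM())::FLOAT                        AS amount
FROM TABLE(GENERATOR(ROWCOUNT => 1000000));
```

`UNIFORM(min, max, RANDOM())` draws a random value per row, and the array-index trick picks a random categorical value, which yields the distribution the field table describes.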
Purpose: Analyze query execution metrics to identify performance bottlenecks and optimization opportunities.
Key Metrics Tracked:
- Execution Time - Total time to complete query (converted to seconds)
- Bytes Scanned - Amount of data read from storage
- Rows Produced - Number of rows returned
- Warehouse Size - Compute resources used
Queries Included:
```sql
-- Aggregates sales by region
SELECT region, SUM(amount), COUNT(*), AVG(amount)
FROM sales
GROUP BY region;
```

Uses `INFORMATION_SCHEMA.QUERY_HISTORY()` to inspect execution metrics:
- Filter queries containing `SUM(amount)`
- Exclude the monitoring queries themselves
- Display bytes scanned in MB for easier interpretation
- Sort by most recent executions
When to use this: After running any query to measure its performance impact and identify optimization opportunities.
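The inspection query sketched below follows the four bullets above; the exact column list and filters in `query_performance.sql` may differ, but these are real `QUERY_HISTORY()` output columns:

```sql
-- Hypothetical sketch of the performance-monitoring query described above
SELECT query_text,
       total_elapsed_time / 1000      AS execution_seconds,  -- ms -> seconds
       bytes_scanned / 1024 / 1024    AS mb_scanned,         -- bytes -> MB
       rows_produced,
       warehouse_size
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%SUM(amount)%'
  AND query_text NOT ILIKE '%QUERY_HISTORY%'  -- exclude this monitor itself
ORDER BY start_time DESC;
```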
Purpose: Demonstrate how filtering reduces data scanning and improves performance.
Core Concept: Snowflake uses micro-partitions and metadata to skip reading irrelevant data when filters are applied.
Comparison Examples:
```sql
-- Scans all 1 million rows
SELECT COUNT(*), SUM(amount) FROM sales;

-- Only scans relevant micro-partitions (6 months of data)
SELECT COUNT(*), SUM(amount)
FROM sales
WHERE order_date >= '2024-01-01'
  AND order_date < '2024-07-01';
```

Performance Impact Measurement:
The script includes a query that compares `mb_scanned` and `execution_time` between the filtered and unfiltered queries.
Expected Results:
- Filtered queries typically scan 30-70% less data
- Execution time reduces proportionally
- Lower data scanning = lower compute costs
Best Practice: Always use WHERE clauses with date or categorical columns when possible.
Purpose: Pre-aggregate commonly queried data to avoid repeated expensive calculations.
Strategy: Create summary tables for frequently accessed aggregations.
Option A - Materialized View (If Supported):
```sql
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region, SUM(amount), COUNT(*), AVG(amount)
FROM sales
GROUP BY region;
```

Option B - Physical Summary Table (Fallback):
```sql
CREATE OR REPLACE TABLE sales_by_region_summary AS
SELECT region, SUM(amount), COUNT(*), AVG(amount)
FROM sales
GROUP BY region;
```

The script includes a query that compares:
- Original aggregation - Scans 1M rows every time
- Summary table - Scans only 5 pre-aggregated rows
Use Cases:
- Dashboards with repeated aggregations
- Reports that don't need real-time data (can be refreshed periodically)
- Executive summaries with high-level metrics
Trade-offs:
- ✅ Pros: 50-90% faster query performance, reduced compute costs
- ⚠️ Cons: Data may be stale, requires a refresh strategy, additional storage cost
Refresh Strategy:
```sql
-- Schedule periodic refresh
CREATE OR REPLACE TABLE sales_by_region_summary AS
SELECT region, SUM(amount), COUNT(*), AVG(amount)
FROM sales
GROUP BY region;
```

Purpose: Track credit consumption, identify expensive queries, and monitor budget compliance.
Monitors daily credit usage across all warehouses:
```sql
-- 7-day credit consumption trend (assumes $2/credit; adjust to your rate)
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_date,
       SUM(credits_used)             AS total_credits,
       SUM(credits_used) * 2         AS estimated_cost_usd
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_date
ORDER BY usage_date DESC, total_credits DESC;
```

Key Insights:
- Identify which warehouses consume the most credits
- Track daily spending trends
- Estimate costs (assuming $2 per credit - adjust to your rate)
FinOps Actions:
- Identify idle warehouses to suspend
- Right-size over-provisioned warehouses
- Consolidate underutilized warehouses
Identifies queries that consume the most credits:
Credit Calculation Formula:
Credits = (execution_seconds / 3600) × warehouse_credit_rate
Warehouse Credit Rates:
- XSMALL: 1 credit/hour
- SMALL: 2 credits/hour
- MEDIUM: 4 credits/hour
- LARGE: 8 credits/hour
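A hedged sketch of such a top-spender query, combining the formula and rates above (the `DECODE` mapping hard-codes the rates, and `ACCOUNT_USAGE.QUERY_HISTORY` has up to 45 minutes of latency):

```sql
-- Hypothetical: estimate per-query credits from elapsed time and warehouse size
SELECT query_id,
       user_name,
       warehouse_size,
       total_elapsed_time / 1000 AS execution_seconds,
       (total_elapsed_time / 1000 / 3600)
           * DECODE(warehouse_size,
                    'X-Small', 1, 'Small', 2, 'Medium', 4, 'Large', 8,
                    1)                          -- default rate if size unmatched
           AS estimated_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY estimated_credits DESC
LIMIT 10;
```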
Use This To:
- Find queries that should be optimized first (highest ROI)
- Identify inefficient user queries
- Detect long-running ETL jobs
Optimization Priority:
- Queries with high execution time + large warehouse size
- Frequently executed expensive queries
- Queries scanning excessive data
Provides real-time budget compliance monitoring:
Status Categories:
| Credits Used | Percentage | Status | Action Required |
|---|---|---|---|
| < 25 | < 50% | On Track | Continue monitoring |
| 25-40 | 50-80% | Monitor Closely | Review optimization opportunities |
| 40-50 | 80-100% | Near Limit | Immediate cost reduction needed |
| > 50 | > 100% | Over Budget | Emergency cost controls |
Budget Dashboard Features:
- Real-time credit consumption by warehouse
- Percentage of monthly budget consumed
- Credits remaining until budget limit
- Alert status based on thresholds
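The thresholds in the status table above translate directly into a `CASE` expression. A sketch, assuming a flat 50-credit monthly budget per warehouse (your budgets will differ):

```sql
-- Hypothetical budget-status query implementing the thresholds above
WITH monthly_usage AS (
    SELECT warehouse_name, SUM(credits_used) AS credits_used
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE start_time >= DATE_TRUNC('month', CURRENT_DATE())
    GROUP BY warehouse_name
)
SELECT warehouse_name,
       credits_used,
       50                                    AS monthly_budget,  -- assumed budget
       ROUND(credits_used / 50 * 100, 1)     AS pct_consumed,
       CASE
           WHEN credits_used < 25  THEN 'On Track'
           WHEN credits_used < 40  THEN 'Monitor Closely'
           WHEN credits_used <= 50 THEN 'Near Limit'
           ELSE                         'Over Budget'
       END                                   AS status,
       50 - credits_used                     AS credits_remaining
FROM monthly_usage;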
Example Output:
```
warehouse_name | credits_used | monthly_budget | pct_consumed | status          | credits_remaining
---------------|--------------|----------------|--------------|-----------------|------------------
prod_etl_wh    | 42.5         | 50             | 85.0%        | Near Limit      | 7.5
analytics_wh   | 18.2         | 50             | 36.4%        | Monitor Closely | 31.8
dev_wh         | 8.5          | 50             | 17.0%        | On Track        | 41.5
```
Visual monitoring of Snowflake costs and consumption patterns:
Monthly budget tracking showing credit consumption by warehouse and alert status
Detailed view of credit consumption trends, peak usage periods, and cost drivers
Comprehensive cost management dashboard with budget alerts and optimization recommendations
- Use WHERE Clauses - Always filter on date or indexed columns
- Limit Result Sets - Use `LIMIT` when testing queries
- Avoid SELECT * - Only select columns you need
- Use Clustering Keys - For large tables with common filter columns
- Monitor Query Profiles - Review execution plans regularly
- Right-Size Warehouses - Start small, scale up if needed
- Enable Auto-Suspend - Set to 60-300 seconds based on usage patterns
- Use Multi-Cluster - Only for concurrent workload spikes
- Separate Workloads - Different warehouses for ETL vs analytics
- Query Acceleration - Enable for unpredictable workloads
- Set Budget Alerts - Configure email notifications at 75%, 90%, 100%
- Review Query History - Weekly analysis of expensive queries
- Use Result Caching - Identical queries return cached results (free)
- Schedule ETL Wisely - Avoid overlapping heavy workloads
- Archive Old Data - Move to cheaper storage tiers
- Drop Stale Data - Remove unused tables and stages; Snowflake has no manual vacuum, but pruning stale objects reduces storage and scanning
- Cluster on Time - Snowflake manages micro-partitions automatically; use clustering keys on time-based columns for large tables
- Compression - Snowflake auto-compresses but consider data types
- Time Travel Settings - Reduce retention period for non-critical tables
- Materialized Views - For expensive aggregations used in dashboards
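Several of the items above are one-line `ALTER` statements. Illustrative examples (table and warehouse names are taken from the demo setup; apply to your own objects):

```sql
-- Reduce Time Travel retention on a non-critical table
ALTER TABLE demo_db.sales_schema.sales SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Enable aggressive auto-suspend on an existing warehouse
ALTER WAREHOUSE demo_wh SET AUTO_SUSPEND = 60;
```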
- ✅ Enable auto-suspend on all warehouses
- ✅ Reduce time travel retention from 90 to 7 days for dev tables
- ✅ Suspend unused warehouses
- ✅ Scale down over-provisioned warehouses
- 📊 Analyze and optimize top 10 expensive queries
- 📊 Create summary tables for dashboard aggregations
- 📊 Implement budget alerts for all warehouses
- 📊 Review and consolidate similar warehouses
- 🎯 Implement clustering keys on large tables
- 🎯 Migrate infrequently accessed data to cheaper storage
- 🎯 Establish query governance policies
- 🎯 Automate cost reporting and anomaly detection
- 🎯 Train team on optimization best practices
Monthly Savings = (Before Credits - After Credits) × Credit Rate
ROI % = (Monthly Savings / Implementation Cost) × 100
Example:
- Before: 500 credits/month at $2/credit = $1,000
- After optimization: 350 credits/month = $700
- Monthly savings: $300
- Annual savings: $3,600
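The arithmetic above can be checked directly in SQL:

```sql
-- Worked example of the savings formula (500 -> 350 credits/month at $2/credit)
SELECT (500 - 350) * 2      AS monthly_savings_usd,  -- 300
       (500 - 350) * 2 * 12 AS annual_savings_usd;   -- 3600
```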
```sql
-- Create resource monitor
CREATE RESOURCE MONITOR monthly_limit WITH
  CREDIT_QUOTA = 500
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND
    ON 110 PERCENT DO SUSPEND_IMMEDIATE;
```

- Review top 10 expensive queries
- Check warehouse credit consumption trends
- Identify queries with declining performance
- Verify auto-suspend settings are working
- Review failed query patterns
- Update summary tables/materialized views
- Snowflake Cost Optimization Guide
- Query Performance Optimization
- Account Usage Views
- Resource Monitors
Beginner:
1. Start with `create_data_set.sql` to set up your environment
2. Run queries and observe execution times
3. Review `query_performance.sql` to understand metrics
Intermediate:
4. Experiment with `query_pruning.sql` to see filtering impact
5. Implement summary tables using `materialized_optimization.sql`
6. Set up basic monitoring with `finops_and_monitoring.sql`

Advanced:
7. Create custom FinOps dashboards
8. Implement automated optimization workflows
9. Establish organization-wide cost governance policies