Testing Guide
Comprehensive testing strategies and guidelines for DataProfiler development.
DataProfiler employs a multi-layered testing approach:
- Fast feedback with unit tests
- Integration confidence with component tests
- Real-world validation with end-to-end tests
- Performance assurance with benchmarks
- Security validation with specialized tests
tests/
├── fixtures/                   # Test data and utilities
│   ├── standard_datasets/      # Standard CSV/JSON test files
│   ├── dataset_generator.rs    # Dynamic test data generation
│   └── domain_datasets.rs      # Domain-specific test data
├── *_test.rs                   # Integration tests
├── *_tests.rs                  # Test suites
└── data/                       # Large test datasets
benches/
├── unified_benchmarks.rs       # General performance tests
├── domain_benchmarks.rs        # Domain-specific benchmarks
└── statistical_benchmark.rs    # Statistical operations benchmarks
src/
└── **/*_test.rs                # Unit tests (co-located with source)
Location: Inline with source code (#[cfg(test)] modules)
Purpose: Test individual functions and methods in isolation
# Run unit tests only
just test # Fast unit tests
cargo test --lib # Library unit tests only
cargo test --bin dataprof-cli # CLI unit tests only
# Run specific unit tests
cargo test engine_selection # Test specific functionality
cargo test --package dataprof quality # Test specific module
Example Unit Test Structure:
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_engine_selection_small_file() {
        // Test automatic engine selection logic
    }

    #[test]
    fn test_memory_estimation() {
        // Test memory usage calculations
    }
}
Location: tests/ directory
Purpose: Test component interactions and feature integration
# Run all integration tests
just test-all # All tests including integration
cargo test # All tests
# Run specific integration test files
cargo test --test integration_tests
cargo test --test data_quality_simple
cargo test --test error_handling_simple
Key Integration Test Files:
- integration_tests.rs - Core functionality integration
- data_quality_simple.rs - Data quality analysis tests
- error_handling_simple.rs - Error handling and recovery
- v03_comprehensive.rs - Version 0.3 feature validation
- adaptive_engine_tests.rs - Engine selection and fallback
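For illustration, a minimal sketch of what a file under tests/ can look like; analyze_csv and the report fields here are hypothetical stand-ins for the crate's actual public API:
// tests/example_integration.rs - illustrative sketch only; analyze_csv and
// report.columns are hypothetical stand-ins for the real public API.
use std::path::Path;

#[test]
fn profiles_standard_fixture_end_to_end() {
    let path = Path::new("tests/fixtures/standard_datasets/financial_sample.csv");
    let report = dataprof::analyze_csv(path).expect("profiling should succeed");
    assert!(!report.columns.is_empty());
}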
Location: tests/database_integration.rs
Purpose: Test database connectors and data operations
# Setup databases first
just db-setup # Start PostgreSQL, MySQL, Redis
# Run database tests
just test-db # All database tests
just test-postgres # PostgreSQL-specific tests
just test-mysql # MySQL-specific tests
just test-sqlite # SQLite tests
just test-duckdb # DuckDB tests
# Run all database tests with setup
just test-all-db # Setup + test + teardown
Database Test Requirements:
- Docker must be running
- Test databases are automatically configured
- Connection pooling and cleanup tested
- SQL injection prevention validated
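As a rough illustration, a database integration test might look like the following sketch; it assumes the sqlx crate and a tokio test runtime, which may not match the project's actual connector layer:
// Sketch only - assumes sqlx and #[tokio::test]; the project's connector API
// and feature names may differ.
#[cfg(feature = "database")]
#[tokio::test]
async fn connects_to_test_database() -> Result<(), sqlx::Error> {
    // DATAPROF_TEST_DB_URL is the override documented later in this guide.
    let url = std::env::var("DATAPROF_TEST_DB_URL")
        .unwrap_or_else(|_| "postgresql://test_user:test_pass@localhost:5433/security_test".into());
    let pool = sqlx::PgPool::connect(&url).await?;

    // Placeholder query; a real test would exercise the profiling connectors.
    let row: (i32,) = sqlx::query_as("SELECT 1").fetch_one(&pool).await?;
    assert_eq!(row.0, 1);
    Ok(())
}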
Location: tests/cli_basic_tests.rs
Purpose: Test CLI interface and user workflows
# Run CLI tests (slower)
just test-cli # CLI integration tests
cargo test --test cli_basic_tests
# Debug CLI behavior
just debug-run examples/sample.csv
cargo run -- --help # Test help output
CLI Test Coverage:
- Command-line argument parsing
- File format detection
- Output formatting
- Error message clarity
- Performance metrics display
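A short sketch of a CLI test using the assert_cmd crate (a common choice for this kind of test; the project's actual test helpers may differ):
// Sketch assuming the assert_cmd dev-dependency; the binary name matches the
// `cargo test --bin dataprof-cli` target mentioned above.
use assert_cmd::Command;

#[test]
fn help_flag_exits_successfully() {
    let mut cmd = Command::cargo_bin("dataprof-cli").expect("binary should build");
    cmd.arg("--help");
    cmd.assert().success();
}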
Location: tests/security_tests.rs
Purpose: Validate security properties and prevent vulnerabilities
# Run security tests
just test-security # Security-focused tests
cargo audit # Dependency vulnerability scan
# Memory safety tests
RUSTFLAGS="-Zsanitizer=address" cargo test
Security Test Areas:
- Input validation and sanitization
- SQL injection prevention
- Memory safety verification
- Unsafe code block validation
- Dependency vulnerability checks
Location: tests/memory_leak_tests.rs
Purpose: Ensure memory efficiency and detect leaks
# Memory tests
cargo test --test memory_leak_tests
just profile-memory examples/large.csv
# Performance tests
cargo test --test arrow_performance_test
Location: Various test files
Purpose: Test specific features and integrations
# Apache Arrow integration
just test-arrow # Arrow feature tests
cargo test --features arrow --test arrow_integration_test
# Feature flag combinations
cargo test --features database
cargo test --features all-db
cargo test --no-default-features
# Run all benchmarks
just bench # cargo bench
cargo bench # Direct cargo command
# Run specific benchmarks
cargo bench unified # General performance
cargo bench domain # Domain-specific operations
cargo bench statistical # Statistical computations
Unified benchmarks cover:
- File processing performance
- Engine selection overhead
- Memory allocation patterns
- Cross-platform performance
Domain benchmarks cover:
- Financial data processing
- Scientific dataset analysis
- Log file analysis
- Geospatial data handling
Statistical benchmarks cover:
- Statistical computation performance
- Large dataset processing
- Memory-efficient algorithms
- SIMD optimization validation
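The benches/ targets are ordinary cargo bench harnesses; below is a minimal Criterion-style sketch (Criterion is assumed, and the measured function is a hypothetical placeholder rather than the project's real workload):
// Sketch of a Criterion benchmark; profile_small_input is a placeholder workload.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn profile_small_input() -> usize {
    // Stand-in for a real profiling call.
    (0..10_000u64).filter(|n| n % 3 == 0).count()
}

fn bench_profiling(c: &mut Criterion) {
    c.bench_function("profile_small_input", |b| b.iter(|| black_box(profile_small_input())));
}

criterion_group!(benches, bench_profiling);
criterion_main!(benches);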
# Baseline performance measurement
cargo bench > baseline.txt
# After changes, compare performance
cargo bench > current.txt
# Compare baseline.txt vs current.txt
Location: tests/fixtures/standard_datasets/
# View available test datasets
ls tests/fixtures/standard_datasets/
cat tests/fixtures/standard_datasets/README.md
Standard Datasets Include:
- Small CSV files (< 1MB) for unit tests
- Medium files (1-10MB) for integration tests
- Large files (10-100MB) for performance tests
- Malformed files for error handling tests
- Unicode and special character files
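A small sketch of pulling one of these fixtures from disk in a test; the helper shown here is a hypothetical mirror of the load_standard_dataset utility used later in this guide:
use std::fs;
use std::path::PathBuf;

// Hypothetical helper; the real shared utilities live under tests/fixtures/.
fn load_standard_dataset(name: &str) -> String {
    let path = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
        .join("tests/fixtures/standard_datasets")
        .join(name);
    fs::read_to_string(path).expect("standard dataset should exist")
}

#[test]
fn fixture_is_readable() {
    let csv = load_standard_dataset("financial_sample.csv");
    assert!(csv.lines().count() > 1);
}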
Location: tests/fixtures/dataset_generator.rs
// Generate test data programmatically
use tests::fixtures::dataset_generator::*;
let csv_data = generate_csv(1000, vec!["name", "age", "salary"]);
let json_data = generate_json_array(500, field_types);
Location: tests/fixtures/domain_datasets.rs
Pre-generated datasets for specific domains:
- Financial data (stock prices, transactions)
- Scientific data (measurements, experiments)
- Web logs (access logs, error logs)
- Geospatial data (coordinates, boundaries)
# Fast feedback loop (< 30 seconds)
just test # Unit tests only
# Pre-commit validation (< 2 minutes)
just quality # Format + lint + test
# Full validation (< 10 minutes)
just test-all # All tests including integration
# CI test matrix
cargo test --all-features # All features enabled
cargo test --no-default-features # Minimal features
cargo test --features database # Database features only
# Comprehensive validation
just test-all-db # All tests with databases
just bench # Performance validation
cargo audit # Security audit
just coverage # Code coverage report
# Install coverage tool
cargo install cargo-tarpaulin
# Generate HTML coverage report
just coverage # Uses tarpaulin
open coverage/tarpaulin-report.html
# Generate different format reports
cargo tarpaulin --out Xml # For CI systems
cargo tarpaulin --out Json # For tooling
- Unit tests: >90% line coverage
- Integration tests: >80% feature coverage
- Critical paths: 100% coverage required
- Error paths: >70% coverage
// Exclude from coverage
#[cfg(not(tarpaulin_include))]
fn platform_specific_function() {
    // Platform-specific code
}
Property-based testing example using proptest:
use proptest::prelude::*;
proptest! {
    #[test]
    fn test_column_analysis_properties(
        data in prop::collection::vec(any::<String>(), 0..1000)
    ) {
        // Test properties that should always hold
        let analysis = analyze_column(&data);
        assert!(analysis.count >= 0);
        assert!(analysis.null_count <= analysis.count);
    }
}
// Common test utilities
use tests::fixtures::*;

#[test]
fn test_with_standard_data() {
    let test_data = load_standard_dataset("financial_sample.csv");
    let result = process_data(test_data);
    assert_validation_rules(result);
}
// Slow tests (excluded from default test run)
#[test]
#[ignore = "slow"]
fn test_large_file_processing() {
    // Test with multi-GB files
}

// Database tests (require database setup)
#[test]
#[cfg(feature = "database")]
fn test_database_connection() {
    // Database-specific tests
}
For IDE debugging:
- Set breakpoints in test code
- Use "Debug unit tests" configuration
- Run specific test with debugger attached
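Note that tests marked #[ignore] (the slow and load tests shown earlier) are skipped by default; run them explicitly with cargo test -- --ignored, or include them alongside everything else with cargo test -- --include-ignored.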
# Debug specific test
cargo test test_name -- --nocapture
# Show test output
cargo test -- --show-output
# Run single test with logging
RUST_LOG=debug cargo test test_name -- --nocapture
# Test with memory debugging
RUSTFLAGS="-Zsanitizer=address" cargo test test_name
# Useful environment variables for testing
export RUST_LOG=debug # Enable debug logging
export RUST_BACKTRACE=1 # Show backtraces on panic
export DATAPROF_TEST_DB_URL=... # Override test database
export DATAPROF_TEST_DATA_DIR=... # Override test data location
- Arrange-Act-Assert pattern in tests (see the sketch after these lists)
- One assertion per test when possible
- Descriptive test names that explain the scenario
- Test both success and failure paths
- Deterministic test data for consistent results
- Representative data that matches real usage
- Edge cases coverage (empty, null, malformed)
- Size-appropriate data for test performance
- Baseline measurements before optimization
- Consistent test environment for comparisons
- Multiple iterations to account for variance
- Memory profiling alongside CPU profiling
- Isolated test databases per test run
- Automatic cleanup after tests
- Transaction rollback for test isolation
- Connection pool testing under load
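As referenced in the test structure list above, a minimal Arrange-Act-Assert sketch; analyze_column is borrowed from the property-test example earlier and stands in for the real API (the null handling shown is an assumption, not confirmed behaviour):
#[test]
fn column_with_missing_values_reports_null_count() {
    // Arrange: deterministic input with two empty (missing) values.
    let values = vec!["1".to_string(), "".to_string(), "3".to_string(), "".to_string()];

    // Act: run the analysis under test (stand-in for the real API).
    let analysis = analyze_column(&values);

    // Assert: one focused expectation (assumes empty strings count as nulls).
    assert_eq!(analysis.null_count, 2);
}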
# Test different feature combinations
cargo test --features database
cargo test --features arrow
cargo test --features all-db
cargo test --no-default-features --features minimal
# Cross-platform test validation
cargo test --target x86_64-unknown-linux-gnu
cargo test --target x86_64-pc-windows-msvc
cargo test --target x86_64-apple-darwin
# In Cargo.toml
[[test]]
name = "integration_tests"
path = "tests/integration_tests.rs"
required-features = ["database"]
- Test execution time: Monitor for performance regression
- Test coverage percentage: Maintain >80% overall coverage
- Test failure rate: Target <1% flaky tests
- Database test isolation: Ensure no cross-test contamination
# Generate test reports for CI
cargo test --message-format json > test_results.json
cargo tarpaulin --out Json > coverage.json
// Fuzz testing with arbitrary data
#[cfg(fuzzing)]
mod fuzz_tests {
    use super::*;

    #[test]
    fn fuzz_csv_parser() {
        // Generate random CSV-like data and test parsing
    }
}
#[test]
#[ignore = "load_test"]
fn test_concurrent_processing() {
    // Test system under concurrent load
    use std::thread;

    let handles: Vec<_> = (0..10)
        .map(|i| thread::spawn(move || process_large_file(i)))
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
# Test against real databases
export DATABASE_URL=postgresql://real_server/test_db
cargo test --features postgres test_real_database_integration
DataProfiler implements comprehensive security testing to prevent SQL injection, credential exposure, and other vulnerabilities. Security tests are automatically run in CI/CD and should be executed before any database-related changes.
# Run all security tests
cargo test --test security_tests --features database
# Run specific security categories
cargo test sql_injection_tests
cargo test error_sanitization_tests
cargo test integration_security_tests
Test Coverage Areas:
- Union-based injection attacks
- Boolean-based blind attacks
- Time-based blind attacks
- Error-based attacks
- Stacked queries
- Comment injection
#[test]
fn test_credential_sanitization() {
    let error_with_creds = "Connection failed: postgresql://user:secret@host/db";
    let sanitized = sanitize_error_message(error_with_creds);
    assert!(!sanitized.contains("secret"));
    assert!(sanitized.contains("[REDACTED]"));
}
#[test]
fn test_sql_identifier_validation() {
    // Valid identifiers
    assert!(validate_sql_identifier("users").is_ok());
    assert!(validate_sql_identifier("\"quoted table\"").is_ok());

    // Malicious attempts
    assert!(validate_sql_identifier("users; DROP TABLE").is_err());
    assert!(validate_sql_identifier("users' OR 1=1--").is_err());
}
Test against realistic attack patterns:
let attack_patterns = vec![
    "users'; DROP TABLE users; --",
    "products' UNION SELECT password FROM admin",
    "orders' AND (SELECT SLEEP(5))",
    "customers'; EXEC xp_cmdshell('rm -rf /')",
];

for pattern in attack_patterns {
    assert!(validate_sql_identifier(pattern).is_err());
}
# Test credential loading from environment
export POSTGRES_USER=testuser
export POSTGRES_PASSWORD=testpass
cargo test test_env_credential_loading
#[test]
fn test_ssl_enforcement() {
    let config = SslConfig::production();
    assert!(config.require_ssl);
    assert!(config.verify_server_cert);
}
// Use non-sensitive test credentials
const TEST_USER: &str = "dataprof_test";
const TEST_PASS: &str = "test_password_123";
const TEST_DB: &str = "dataprof_test_db";
const INJECTION_PATTERNS: &[&str] = &[
"'; DROP TABLE users; --",
"' UNION SELECT * FROM passwords --",
"'; WAITFOR DELAY '00:00:05'; --",
"' AND 1=1 --",
"'; EXEC sp_configure; --",
];# Setup isolated test environment
docker run -d --name dataprof-security-test \
-e POSTGRES_DB=security_test \
-e POSTGRES_USER=test_user \
-e POSTGRES_PASSWORD=test_pass \
-p 5433:5432 postgres:15
# Run security tests against isolated instance
export TEST_DATABASE_URL=postgresql://test_user:test_pass@localhost:5433/security_test
cargo test --test security_tests
Security tests are automatically run in multiple CI workflows:
- Basic CI: cargo audit for dependency vulnerabilities
- Advanced Security: Comprehensive scanning with multiple tools
- Production Pipeline: Enhanced validation before deployment
// Async test: the `.await` call needs an async test attribute (tokio assumed here)
// and a Result return type so the `?` operator works; `config` comes from test setup.
#[tokio::test]
async fn test_security_audit_logging() -> anyhow::Result<()> {
    let result = profile_database(config, "sensitive_table").await?;

    // Verify security warnings are captured
    assert!(!result.security_warnings.is_empty());

    // Verify no sensitive data in logs
    for warning in &result.security_warnings {
        assert!(!warning.contains("password"));
        assert!(!warning.contains("secret"));
    }
    Ok(())
}
#[test]
fn test_connection_security_validation() {
    let warnings = validate_connection_security(
        "postgresql://user:pass@localhost:5432/db",
        &SslConfig::default(),
        "postgresql"
    ).unwrap();

    assert!(warnings.iter().any(|w| w.contains("Password embedded")));
    assert!(warnings.iter().any(|w| w.contains("localhost")));
}
Security features should not significantly impact performance:
#[test]
fn test_validation_performance() {
    use std::time::Instant;

    let start = Instant::now();
    for _ in 0..1000 {
        validate_sql_identifier("test_table").unwrap();
    }
    let duration = start.elapsed();
    assert!(duration.as_millis() < 100); // Should be very fast
}
- Security functions: 100% test coverage required
- SQL validation: All injection patterns tested
- Error sanitization: All sensitive patterns covered
- SSL configuration: All modes validated
# Run comprehensive security scan
just security-scan
# Individual security tools
cargo audit # Dependency vulnerabilities
cargo deny check # License and security policy
semgrep --config=p/security # Static analysis