This test plan covers the DeCube distributed system, including CRDT-based catalog synchronization, gossip protocols, snapshot lifecycle management, and consensus mechanisms. It spans unit, integration, and chaos testing scenarios.
-
Test Merkle Tree Construction
- Verify correct root hash calculation for various data sizes
- Test with empty data, single leaf, and large datasets
- Validate tree structure and hash propagation
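To make these cases concrete, here is a minimal Go sketch of a Merkle root construction. The conventions (SHA-256 leaves, duplicating the last node on odd-sized levels, hashing nil for empty input) are assumptions for illustration; DeCube's actual tree rules may differ.

```go
package merkle

import "crypto/sha256"

// Root builds a binary Merkle tree over the given leaf blocks and
// returns the root hash. Empty input hashes nil; a single leaf's hash
// is the root itself.
func Root(leaves [][]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil) // convention for empty data (assumed)
	}
	level := make([][32]byte, len(leaves))
	for i, l := range leaves {
		level[i] = sha256.Sum256(l)
	}
	for len(level) > 1 {
		if len(level)%2 == 1 {
			level = append(level, level[len(level)-1]) // duplicate last node
		}
		next := make([][32]byte, 0, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next = append(next, sha256.Sum256(append(level[i][:], level[i+1][:]...)))
		}
		level = next
	}
	return level[0]
}
```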
-
Test Merkle Proof Generation
- Generate proofs for different leaf indices
- Verify proof validation against root hash
- Test proof size and efficiency
-
Test Merkle Proof Verification
- Valid proofs should verify successfully
- Invalid proofs (tampered data, wrong indices) should fail
- Test edge cases: first leaf, last leaf, middle leaves
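A matching proof-verification sketch in the same hypothetical merkle package. The index's parity at each level decides whether the current node is a left or right child, which covers the first/last/middle-leaf edge cases above; tampered data or a wrong index changes the recomputed root, so the check fails.

```go
package merkle

import "crypto/sha256"

// VerifyProof checks an inclusion proof for the leaf at the given index
// against a known root. proof holds sibling hashes from the leaf level up
// to (but excluding) the root.
func VerifyProof(leaf []byte, index int, proof [][32]byte, root [32]byte) bool {
	h := sha256.Sum256(leaf)
	for _, sib := range proof {
		var buf []byte
		if index%2 == 0 { // current node is a left child
			buf = append(append(buf, h[:]...), sib[:]...)
		} else { // right child
			buf = append(append(buf, sib[:]...), h[:]...)
		}
		h = sha256.Sum256(buf)
		index /= 2
	}
	return h == root
}
```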
-
OR-Set Operations
- Test add operations with unique tags
- Test remove operations and conflict resolution
- Verify concurrent add/remove scenarios
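A state-based OR-Set sketch in Go, using node-scoped tags and tombstones so concurrent add/remove resolves add-wins. The names and representation are illustrative, not DeCube's actual catalog types.

```go
package crdt

import "fmt"

// ORSet is an observed-remove set: every add carries a unique tag, and a
// remove tombstones only the tags it has observed, so a concurrent add
// survives the merge (add-wins semantics).
type ORSet struct {
	nodeID  string
	counter uint64
	added   map[string]map[string]struct{} // value -> tags ever added
	removed map[string]map[string]struct{} // value -> tombstoned tags
}

func NewORSet(nodeID string) *ORSet {
	return &ORSet{
		nodeID:  nodeID,
		added:   map[string]map[string]struct{}{},
		removed: map[string]map[string]struct{}{},
	}
}

// Add attaches a tag unique to this (node, operation) pair.
func (s *ORSet) Add(v string) {
	s.counter++
	tag := fmt.Sprintf("%s:%d", s.nodeID, s.counter)
	if s.added[v] == nil {
		s.added[v] = map[string]struct{}{}
	}
	s.added[v][tag] = struct{}{}
}

// Remove tombstones every tag observed so far for v; tags added
// concurrently on other nodes are unaffected.
func (s *ORSet) Remove(v string) {
	if s.removed[v] == nil {
		s.removed[v] = map[string]struct{}{}
	}
	for tag := range s.added[v] {
		s.removed[v][tag] = struct{}{}
	}
}

// Contains reports whether some add tag for v is not tombstoned.
func (s *ORSet) Contains(v string) bool {
	for tag := range s.added[v] {
		if _, gone := s.removed[v][tag]; !gone {
			return true
		}
	}
	return false
}

// Merge unions both tag sets; commutative, associative, and idempotent.
func (s *ORSet) Merge(o *ORSet) {
	for v, tags := range o.added {
		if s.added[v] == nil {
			s.added[v] = map[string]struct{}{}
		}
		for t := range tags {
			s.added[v][t] = struct{}{}
		}
	}
	for v, tags := range o.removed {
		if s.removed[v] == nil {
			s.removed[v] = map[string]struct{}{}
		}
		for t := range tags {
			s.removed[v][t] = struct{}{}
		}
	}
}
```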
-
LWW-Register Operations
- Test timestamp-based conflict resolution
- Verify last-write-wins semantics
- Test merge operations with different node IDs
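A minimal LWW-register sketch, assuming integer timestamps with node-ID tiebreaking; the tiebreak rule is an assumption, chosen so all replicas converge deterministically even on equal timestamps.

```go
package crdt

// LWWRegister resolves concurrent writes by timestamp, breaking ties on
// node ID so every replica converges on the same value.
type LWWRegister struct {
	Value     string
	Timestamp int64  // write time, e.g. wall-clock nanoseconds
	NodeID    string // tiebreaker for equal timestamps (assumed rule)
}

// Set applies a write only if it is newer, or wins the tiebreak.
func (r *LWWRegister) Set(v string, ts int64, node string) {
	if ts > r.Timestamp || (ts == r.Timestamp && node > r.NodeID) {
		r.Value, r.Timestamp, r.NodeID = v, ts, node
	}
}

// Merge keeps whichever side's write wins: later writes win.
func (r *LWWRegister) Merge(other LWWRegister) {
	r.Set(other.Value, other.Timestamp, other.NodeID)
}
```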
-
Vector Clock Operations
- Test increment operations
- Test merge operations for concurrent events
- Verify causality comparison logic
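A compact vector clock sketch covering all three operations. Compare returning 0 for both equal and concurrent clocks is a deliberate simplification, noted in the comments; the concurrent case is what forces CRDT conflict resolution.

```go
package crdt

// VectorClock maps node IDs to per-node event counters.
type VectorClock map[string]uint64

// Increment records a local event on node.
func (vc VectorClock) Increment(node string) { vc[node]++ }

// Merge takes the pairwise maximum, combining knowledge of both histories.
func (vc VectorClock) Merge(other VectorClock) {
	for n, c := range other {
		if c > vc[n] {
			vc[n] = c
		}
	}
}

// Compare returns -1 if vc happened-before other, +1 if other
// happened-before vc, and 0 when they are equal or concurrent.
func (vc VectorClock) Compare(other VectorClock) int {
	less, greater := false, false
	for n := range keys(vc, other) {
		switch a, b := vc[n], other[n]; {
		case a < b:
			less = true
		case a > b:
			greater = true
		}
	}
	switch {
	case less && !greater:
		return -1
	case greater && !less:
		return +1
	default:
		return 0
	}
}

func keys(a, b VectorClock) map[string]struct{} {
	u := map[string]struct{}{}
	for n := range a {
		u[n] = struct{}{}
	}
	for n := range b {
		u[n] = struct{}{}
	}
	return u
}
```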
-
Key Generation and Management
- Test cryptographic key pair generation
- Verify key rotation procedures
- Test key persistence and recovery
-
Data Encryption/Decryption
- Test symmetric encryption of data chunks
- Verify decryption with correct/incorrect keys
- Test encryption performance with large data
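A hedged example of chunk encryption using AES-256-GCM from Go's standard library. The cipher choice is an assumption for illustration, but the failure mode it exercises (wrong key or tampered ciphertext fails authentication) is exactly what the negative tests assert.

```go
package cryptotest

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// EncryptChunk seals a data chunk with AES-256-GCM; the random nonce is
// prepended to the ciphertext so DecryptChunk can recover it.
func EncryptChunk(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptChunk returns an authentication error for a wrong key or
// tampered ciphertext.
func DecryptChunk(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}
```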
-
Digital Signatures
- Test transaction signing and verification
- Verify signature validation for tampered data
- Test multi-signature scenarios
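A sketch of the sign/verify/tamper case as a Go test function, assuming Ed25519 for illustration; the signature scheme and payload string are hypothetical, not confirmed by the plan.

```go
package cryptotest

import (
	"crypto/ed25519"
	"crypto/rand"
	"testing"
)

// TestSignTamperDetection covers the happy path and the tampered-payload
// negative case.
func TestSignTamperDetection(t *testing.T) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		t.Fatal(err)
	}
	tx := []byte("gcl-transaction-payload") // hypothetical payload
	sig := ed25519.Sign(priv, tx)

	if !ed25519.Verify(pub, tx, sig) {
		t.Fatal("valid signature rejected")
	}
	tx[0] ^= 0xFF // flip one byte to simulate tampering
	if ed25519.Verify(pub, tx, sig) {
		t.Fatal("tampered payload accepted")
	}
}
```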
-
Create Test Environment
- Set up mock etcd instance with test data
- Create test volume data structure
- Configure object store and GCL endpoints
-
Snapshot Creation Phase
- Execute the decub-snapshot create command
- Verify etcd snapshot creation
- Verify volume data compression
- Validate combined snapshot file integrity
-
Upload Phase
- Test chunking into 64MB pieces
- Verify SHA256 hash calculation for each chunk
- Test upload to object store with progress tracking
- Validate uploaded chunk integrity
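A chunking-and-hashing sketch matching the 64MB/SHA256 scheme above; Chunk and SplitAndHash are illustrative names, not DeCube's API.

```go
package snapshot

import (
	"crypto/sha256"
	"io"
)

const chunkSize = 64 << 20 // 64MB, per the upload spec above

// Chunk pairs a piece of the snapshot stream with its index and hash.
type Chunk struct {
	Index int
	Hash  [32]byte
	Data  []byte
}

// SplitAndHash reads the snapshot stream into 64MB chunks and records the
// SHA256 of each, so uploader and downloader can verify integrity.
func SplitAndHash(r io.Reader) ([]Chunk, error) {
	var chunks []Chunk
	for i := 0; ; i++ {
		buf := make([]byte, chunkSize)
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			data := buf[:n]
			chunks = append(chunks, Chunk{Index: i, Hash: sha256.Sum256(data), Data: data})
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return chunks, nil // final (possibly short) chunk consumed
		}
		if err != nil {
			return nil, err
		}
	}
}
```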
-
Registration Phase
- Test metadata extraction from snapshot
- Verify GCL transaction creation
- Test transaction submission and confirmation
- Validate metadata storage in catalog
-
Restore Phase
- Test metadata retrieval from GCL
- Verify chunk download with hash verification
- Test snapshot reconstruction from chunks
- Validate restore to target directories
-
Catalog-Gossip Integration
- Test delta generation from catalog changes
- Verify gossip message broadcasting
- Test delta application on receiving nodes
- Validate eventual consistency across nodes
-
Consensus Integration
- Test hybrid 2PC with RAFT and BFT
- Verify prepare/commit phases
- Test failure scenarios and recovery
-
Partition Scenarios
- Split network into isolated segments
- Test gossip message delivery during partition
- Verify anti-entropy sync after partition healing
- Test Merkle root comparison and full sync
-
Expected Behavior
- Nodes in different partitions continue independent operation
- Upon reconnection, Merkle root mismatch triggers sync
- Full state transfer for diverged partitions
- Eventual consistency restoration
-
Single Validator Failure
- Simulate validator node crash/failure
- Test consensus continuation with remaining validators
- Verify transaction processing during failure
- Test recovery when validator returns
-
Multiple Validator Failures
- Test BFT tolerance up to f faulty nodes (f = ⌊(n-1)/3⌋; see the sketch after this list)
- Verify system halts when too many validators fail
- Test recovery procedures for failed validators
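A worked example of the fault budget referenced above: with n = 4 validators, f = 1, so one failure is tolerated and a second must halt the system. The helper below is an illustrative sketch of the standard BFT arithmetic.

```go
package consensus

// bftTolerance gives the fault budget for n validators: classic BFT
// tolerates f = floor((n-1)/3) faults and commits on 2f+1 matching votes.
// Example: n=4 -> f=1, quorum=3.
func bftTolerance(n int) (f, quorum int) {
	f = (n - 1) / 3  // Go integer division floors
	quorum = 2*f + 1 // matching responses needed to commit
	return
}
```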
-
Expected Behavior
- Consensus continues while at most f = ⌊(n-1)/3⌋ validators have failed
- Failed validators are detected and replaced
- Transaction finality maintained
- State recovery from surviving validators
-
Upload Corruption
- Simulate chunk corruption during upload
- Test hash verification failure detection
- Verify automatic retry mechanisms
- Test corruption handling in object store
-
Download Corruption
- Simulate chunk corruption in object store
- Test hash verification during download
- Verify chunk redownload on corruption detection
- Test snapshot reconstruction with corrupted chunks
-
Expected Behavior
- Corruption detected via SHA256 mismatch
- Automatic retry with exponential backoff
- Failed operations logged and reported
- Snapshot integrity maintained through verification
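A retry-with-exponential-backoff sketch matching this expected behavior; WithBackoff and the jitter policy are illustrative assumptions, not the actual retry implementation.

```go
package retry

import (
	"math/rand"
	"time"
)

// WithBackoff retries op with exponential backoff plus jitter: detect the
// failure (e.g. a SHA256 mismatch), wait, and retry up to maxAttempts
// before surfacing the error for logging and reporting.
func WithBackoff(maxAttempts int, base time.Duration, op func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		backoff := base << attempt                        // 1x, 2x, 4x, ...
		jitter := time.Duration(rand.Int63n(int64(base))) // spread retries out
		time.Sleep(backoff + jitter)
	}
	return err // caller logs and reports the exhausted retries
}
```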
-
Cold Start Recovery
- New node joins existing cluster
- Downloads current catalog state via gossip
- Verifies Merkle root consistency
- Applies missing deltas in causal order
-
Crash Recovery
- Node restarts after failure
- Replays RAFT log for incomplete transactions
- Checks BFT state for global transaction status
- Completes or rolls back pending operations
-
Merkle-Based Anti-Entropy
- Periodic Merkle root exchange between peers
- Root mismatch triggers full state comparison
- Efficient delta transfer for small differences
- Full sync for major divergence
-
CRDT State Reconciliation
- Vector clock comparison for causality
- Application of newer operations only
- Conflict resolution via CRDT semantics
- Eventual consistency guarantee
-
Incomplete Transactions
- Identify prepared but uncommitted transactions
- Query BFT network for commit status
- Apply commit or trigger rollback accordingly
- Ensure atomicity across domains
-
Network Healing
- Detect partition resolution
- Exchange state summaries
- Synchronize diverged operations
- Validate consistency across all nodes
-
Test Environment Requirements
- Multi-node test cluster (minimum 4 nodes for BFT)
- Isolated network segments for partition testing
- Mock services for external dependencies (etcd, object store)
- Monitoring and logging infrastructure
-
Success Criteria
- All unit tests pass with >95% coverage
- Integration tests complete end-to-end workflows
- Chaos tests demonstrate system resilience
- Recovery completes within specified time bounds
- No data loss or inconsistency after failures
-
Performance Targets
- Snapshot creation: <5 minutes for 100GB data
- Gossip propagation: <1 second for delta broadcast
- Recovery time: <30 seconds for node restart
- Consensus latency: <2 seconds for transaction commit