A high-performance, SIMD-optimized Cuckoo filter implementation in Go with native assembly for AMD64 (AVX2) and ARM64 (NEON).
- SIMD Acceleration: Native assembly implementations for maximum performance
- AMD64: AVX2 instructions (32-byte parallel processing)
- ARM64: NEON instructions (16-byte parallel processing)
- Concurrent-Safe: Thread-safe operations with mutex protection
- Batch Operations: Optimized batch insert, lookup, and delete
- Configurable: Customizable fingerprint size, bucket size, and relocation parameters
- Zero Dependencies: Pure Go with assembly optimizations
SIMD optimizations provide significant speedups for bucket operations:
- ARM64 NEON: ~2-3x faster than scalar for buckets ≥16 bytes
- AMD64 AVX2: ~3-4x faster than scalar for buckets ≥32 bytes
- Batch Operations: Additional speedup through parallel hash computation
go get github.com/shaia/simdcuckoofilterpackage main
import (
"fmt"
"github.com/shaia/simdcuckoofilter"
)
func main() {
// Create a filter with capacity for 10,000 items
filter, err := cuckoofilter.New(10000)
if err != nil {
panic(err)
}
// Insert items
filter.Insert([]byte("apple"))
filter.Insert([]byte("banana"))
// Lookup items
found := filter.Lookup([]byte("apple")) // true
notFound := filter.Lookup([]byte("grape")) // false (probably)
// Delete items
filter.Delete([]byte("banana"))
// Check statistics
fmt.Printf("Count: %d\n", filter.Count())
fmt.Printf("Load Factor: %.2f%%\n", filter.LoadFactor()*100)
}filter, err := cuckoofilter.New(10000,
cuckoofilter.WithFingerprintSize(8), // 8-bit fingerprints (max supported)
cuckoofilter.WithBucketSize(32), // Larger buckets for better SIMD utilization
cuckoofilter.WithMaxKicks(500), // Relocation attempts before failure
cuckoofilter.WithXXHash(), // Use XXHash64 for better performance
)| Option | Description | Default | Notes |
|---|---|---|---|
WithFingerprintSize(bits) |
Bits per fingerprint | 8 | Valid: 1, 2, 4, 8 |
WithBucketSize(size) |
Fingerprints per bucket | 4 | Valid: 2, 4, 8, 16, 32, 64 |
WithMaxKicks(kicks) |
Relocation attempts | 500 | Range: 1-1000 |
WithFNVHash() |
Use FNV-1a hash (default) | ✓ | Moderate speed, good distribution |
WithXXHash() |
Use XXHash64 | Fast, excellent distribution | |
WithCRC32Hash() |
Use CRC32C | Fastest, hardware-accelerated | |
WithBatchSize(size) |
Batch processing size | 32 | Range: 1-256 |
Batch operations provide better performance through parallel hash computation:
items := [][]byte{
[]byte("item1"),
[]byte("item2"),
[]byte("item3"),
}
// Batch insert
results := filter.InsertBatch(items)
// Batch lookup
found := filter.LookupBatch(items)
// Batch delete
deleted := filter.DeleteBatch(items)New(capacity uint, opts ...Option) (*CuckooFilter, error)- Create a new filter
Insert(item []byte) bool- Insert an itemLookup(item []byte) bool- Check if item existsDelete(item []byte) bool- Remove an itemInsertBatch(items [][]byte) []bool- Batch insertLookupBatch(items [][]byte) []bool- Batch lookupDeleteBatch(items [][]byte) []bool- Batch delete
Count() uint- Number of items in filterCapacity() uint- Maximum capacityLoadFactor() float64- Current load (0.0 to 1.0)OptimalBatchSize() int- Recommended batch sizeReset()- Clear all items
AMD64 (AVX2)
- Bucket lookup: 32 bytes processed in parallel
- Batch hashing: 4 items processed simultaneously
- File:
internal/lookup/bucket_lookup_avx2_amd64.s
ARM64 (NEON)
- Bucket lookup: 16 bytes processed in parallel
- Optimized assembly hashing: ~32% faster than Go
- File:
internal/lookup/bucket_lookup_neon_arm64.s
| Strategy | Speed | Quality | Use Case |
|---|---|---|---|
| FNV-1a | Moderate | Good | Default, compatibility |
| XXHash64 | Fast | Excellent | General purpose, better distribution |
| CRC32C | Fastest | Good | High-throughput scenarios |
Memory usage depends on configuration:
Memory = numBuckets × bucketSize × (fingerprintBits / 8)
Example for 10,000 item capacity:
- 8-bit fingerprints, 4-entry buckets: ~10 KB
- 8-bit fingerprints, 32-entry buckets: ~80 KB
False positive probability depends on fingerprint size and load factor:
| Fingerprint Bits | Load 50% | Load 75% | Load 90% |
|---|---|---|---|
| 1 bit | ~25% | ~37% | ~45% |
| 2 bits | ~12% | ~19% | ~25% |
| 4 bits | ~3% | ~6% | ~9% |
| 8 bits | ~0.4% | ~0.8% | ~1.2% |
See the examples/ directory for complete examples:
examples/basic_usage/- Simple insert/lookup/delete operationsexamples/custom_config/- Advanced configuration options
Run the comprehensive test suite:
# All tests
go test ./...
# With race detection
go test -race ./...
# Benchmarks
go test -bench=. ./...See TESTING.md for detailed testing documentation.
- AMD64: Requires AVX2 support (Intel Haswell 2013+, AMD Excavator 2015+)
- ARM64: Requires NEON support (all ARM64 processors)
- Other platforms: Falls back to Go implementation (not recommended)
- Fingerprint size: Maximum 8 bits (1 byte)
- No resizing: Filter capacity is fixed at creation
- False positives: Small probability of false positives (no false negatives)
- Delete caveat: Deleting non-existent items may cause false negatives
Contributions are welcome! Please ensure:
- All tests pass:
go test ./... - Code is formatted:
go fmt ./... - Assembly changes include tests and benchmarks
[License information to be added]