All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Serialization/Deserialization for all perfect hash algorithms
serialize()method returnsstd::vector<std::byte>for persistencedeserialize(std::span<const std::byte>)static method to restore from bytes- Format includes magic number, version, and algorithm type validation
- Supports RecSplit, CHD, BBHash, and FCH hashers
- Parallel Construction for RecSplit
- New
with_threads(n)builder method for multi-threaded construction - Automatically uses parallel processing for >100 buckets with >1 thread
- Linear speedup for large key sets
- New
- SIMD-Optimized Overflow Lookup
- AVX2-accelerated fingerprint search for overflow keys
- Processes 4 fingerprints per cycle on supported hardware
- Automatic fallback to scalar code on non-AVX2 systems
- RecSplit lookup: ~15ns (with SIMD overflow)
- CHD lookup: ~12ns (with SIMD overflow)
- Parallel construction up to 4x faster with 4 threads
- Hybrid Perfect Hash with Overflow Handling: All perfect hash algorithms (RecSplit, CHD, BBHash, PTHash, FCH) now support graceful overflow handling
- Keys that cannot be perfectly hashed fall back to fingerprint-based linear search
- Build operations never fail - they always produce a valid hasher
- New
perfect_count()andoverflow_count()statistics to track placement efficiency - Fingerprint verification ensures correctness for both perfect and overflow keys
- Comprehensive perfect hash test suite (96 test cases across all algorithms)
- Extended perfect hash tests with edge cases and stress testing
- Perfect hash builders now always succeed, placing overflow keys in a fallback structure
- Unified v3 references in documentation - removed obsolete version prefixes from docstrings
- Improved RecSplit bits-per-key threshold for small key sets (accounts for fixed overhead)
- FCH algorithm stability with large bucket sizes
- RecSplit and CHD
hash()methods now correctly handle overflow keys - Test file GENERATE macro issues with Catch2 random generators
- RecSplit lookup: ~75ns (1000 keys)
- BBHash lookup: ~32ns (1000 keys)
- CHD lookup: ~12ns (1000 keys)
- FCH lookup: ~38ns (1000 keys)
- PTHash lookup: ~39ns (1000 keys)
- Complete architectural rewrite using C++20/23
- Concept-based design with
hasher,perfect_hasher,storage_backendconcepts - Strong types:
slot_index,hash_value,slot_countreplacing raw integers - Modern error handling with
std::expected<T, error> - Policy-based perfect hash implementations (RecSplit, CHD, BBHash, PTHash, FCH)
- Composable architecture: mix any hasher with any storage backend
- Comprehensive API documentation with Doxygen-style comments in
include/maph.hpp - Complete architecture documentation in
docs/ARCHITECTURE.md - User guide with examples and best practices in
docs/USER_GUIDE.md - CLI documentation with all commands and examples in
docs/CLI.md - REST API documentation with endpoint reference in
integrations/rest_api/API.md - Performance benchmarking methodology documentation
- Header-only library design for easier integration
- Enhanced inline documentation throughout the codebase
- Improved README with better examples and quick start guide
- Documentation inconsistencies and outdated references
1.0.0 - 2024-01-15
- Initial release of maph (Memory-mapped Approximate Perfect Hash)
- Core key-value storage engine with sub-microsecond lookups
- Memory-mapped I/O for automatic persistence
- Command-line interface (
maph_cli) with full CRUD operations - REST API server with HTTP/WebSocket support
- Batch operations (mget, mset, parallel operations)
- SIMD acceleration with AVX2 for batch hashing
- Durability manager for configurable sync intervals
- Database statistics and monitoring capabilities
- Web interface for visual database management
- Docker and Kubernetes deployment configurations
- Renamed project from
rd_ph_filtertomaphfor clarity - Consolidated multiple implementations into single unified codebase
- Extracted magic numbers to named constants
- Improved error handling with Result type and error codes
- Optimized slot structure for cache-line alignment
remove()function implementation for proper key deletion- Batch operation race conditions
- Statistics tracking accuracy
- Memory alignment issues on certain architectures
- File descriptor leaks in error paths
- Achieved <200ns lookup latency for cached data
- 5M+ ops/sec single-threaded throughput
- 20M+ ops/sec with 8-thread parallel operations
- Reduced memory overhead by 30% through slot optimization
0.9.0 - 2023-12-01
- Python bindings via pybind11
- Approximate map generalization for arbitrary mappings
- Custom decoder support for specialized applications
- Lazy iterator implementations
- Builder pattern for fluent configuration
- Template-based design for flexibility
- Header-only library architecture
- Improved perfect hash function interface
- Original
rd_ph_filterclass (useapproximate_mapinstead) - Old Python module name (use
approximate_filters)
- False positive rate calculations
- Memory leaks in Python bindings
- Thread safety issues in concurrent access
0.8.0 - 2023-10-15
- Set membership filter implementation
- Threshold filter with configurable FPR
- Compact lookup tables
- CMake build system
- Catch2 unit tests
- Moved from Makefile to CMake
- Reorganized project structure
- Improved documentation
- Compilation warnings on GCC 11+
- Undefined behavior in hash functions
0.7.0 - 2023-08-01
- Initial proof of concept
- Basic perfect hash filter
- Simple benchmarking tools
- Limited to 8-bit storage
- No persistence support
- Single-threaded only
- Maph advantages: 10x faster lookups, zero maintenance, automatic persistence
- Redis advantages: Richer data types, replication, Lua scripting
- Maph advantages: Persistence, faster lookups, no network overhead
- Memcached advantages: Distributed caching, LRU eviction, larger values
- Maph advantages: 25x faster reads, simpler operation, no compaction
- RocksDB advantages: Unbounded size, transactions, compression
-
Database Format Change:
- Old databases are not compatible
- Export data using old version, import with new
-
API Changes:
// Old rd_ph_filter<PH> filter(elements); // New auto db = maph::create("data.maph", 1000000);
-
Python Module:
# Old import rd_ph_filter # New import maph
-
Configuration:
- Load factor now split into static/dynamic ratios
- Error rate configuration replaced with storage size selection
- Cuckoo hashing for better collision handling
- Value compression support
- Authentication and access control
- Replication support
- Multi-file sharding
- Learned index structures
- GPU acceleration
- Persistent memory support
- ACID transactions
- SQL-like query language
- Distributed clustering
- Cloud-native features
Please see CONTRIBUTING.md for guidelines on how to contribute to this project.
This project is licensed under the MIT License - see the LICENSE file for details.
- FNV hash algorithm by Glenn Fowler, Landon Curt Noll, and Kiem-Phong Vo
- Inspired by papers on perfect hashing and approximate data structures
- Community contributors and early adopters