Skip to content

Latest commit

 

History

History
256 lines (200 loc) · 8.68 KB

File metadata and controls

256 lines (200 loc) · 8.68 KB

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[3.2.0] - 2026-01-04

Added

  • Serialization/Deserialization for all perfect hash algorithms
    • serialize() method returns std::vector<std::byte> for persistence
    • deserialize(std::span<const std::byte>) static method to restore from bytes
    • Format includes magic number, version, and algorithm type validation
    • Supports RecSplit, CHD, BBHash, and FCH hashers
  • Parallel Construction for RecSplit
    • New with_threads(n) builder method for multi-threaded construction
    • Automatically uses parallel processing for >100 buckets with >1 thread
    • Linear speedup for large key sets
  • SIMD-Optimized Overflow Lookup
    • AVX2-accelerated fingerprint search for overflow keys
    • Processes 4 fingerprints per cycle on supported hardware
    • Automatic fallback to scalar code on non-AVX2 systems

Performance

  • RecSplit lookup: ~15ns (with SIMD overflow)
  • CHD lookup: ~12ns (with SIMD overflow)
  • Parallel construction up to 4x faster with 4 threads

[3.1.0] - 2024-12-14

Added

  • Hybrid Perfect Hash with Overflow Handling: All perfect hash algorithms (RecSplit, CHD, BBHash, PTHash, FCH) now support graceful overflow handling
    • Keys that cannot be perfectly hashed fall back to fingerprint-based linear search
    • Build operations never fail - they always produce a valid hasher
    • New perfect_count() and overflow_count() statistics to track placement efficiency
    • Fingerprint verification ensures correctness for both perfect and overflow keys
  • Comprehensive perfect hash test suite (96 test cases across all algorithms)
  • Extended perfect hash tests with edge cases and stress testing

Changed

  • Perfect hash builders now always succeed, placing overflow keys in a fallback structure
  • Unified v3 references in documentation - removed obsolete version prefixes from docstrings
  • Improved RecSplit bits-per-key threshold for small key sets (accounts for fixed overhead)

Fixed

  • FCH algorithm stability with large bucket sizes
  • RecSplit and CHD hash() methods now correctly handle overflow keys
  • Test file GENERATE macro issues with Catch2 random generators

Performance

  • RecSplit lookup: ~75ns (1000 keys)
  • BBHash lookup: ~32ns (1000 keys)
  • CHD lookup: ~12ns (1000 keys)
  • FCH lookup: ~38ns (1000 keys)
  • PTHash lookup: ~39ns (1000 keys)

[3.0.0] - 2024-09-15

Added

  • Complete architectural rewrite using C++20/23
  • Concept-based design with hasher, perfect_hasher, storage_backend concepts
  • Strong types: slot_index, hash_value, slot_count replacing raw integers
  • Modern error handling with std::expected<T, error>
  • Policy-based perfect hash implementations (RecSplit, CHD, BBHash, PTHash, FCH)
  • Composable architecture: mix any hasher with any storage backend
  • Comprehensive API documentation with Doxygen-style comments in include/maph.hpp
  • Complete architecture documentation in docs/ARCHITECTURE.md
  • User guide with examples and best practices in docs/USER_GUIDE.md
  • CLI documentation with all commands and examples in docs/CLI.md
  • REST API documentation with endpoint reference in integrations/rest_api/API.md
  • Performance benchmarking methodology documentation

Changed

  • Header-only library design for easier integration
  • Enhanced inline documentation throughout the codebase
  • Improved README with better examples and quick start guide

Fixed

  • Documentation inconsistencies and outdated references

1.0.0 - 2024-01-15

Added

  • Initial release of maph (Memory-mapped Approximate Perfect Hash)
  • Core key-value storage engine with sub-microsecond lookups
  • Memory-mapped I/O for automatic persistence
  • Command-line interface (maph_cli) with full CRUD operations
  • REST API server with HTTP/WebSocket support
  • Batch operations (mget, mset, parallel operations)
  • SIMD acceleration with AVX2 for batch hashing
  • Durability manager for configurable sync intervals
  • Database statistics and monitoring capabilities
  • Web interface for visual database management
  • Docker and Kubernetes deployment configurations

Changed

  • Renamed project from rd_ph_filter to maph for clarity
  • Consolidated multiple implementations into single unified codebase
  • Extracted magic numbers to named constants
  • Improved error handling with Result type and error codes
  • Optimized slot structure for cache-line alignment

Fixed

  • remove() function implementation for proper key deletion
  • Batch operation race conditions
  • Statistics tracking accuracy
  • Memory alignment issues on certain architectures
  • File descriptor leaks in error paths

Performance

  • Achieved <200ns lookup latency for cached data
  • 5M+ ops/sec single-threaded throughput
  • 20M+ ops/sec with 8-thread parallel operations
  • Reduced memory overhead by 30% through slot optimization

0.9.0 - 2023-12-01

Added

  • Python bindings via pybind11
  • Approximate map generalization for arbitrary mappings
  • Custom decoder support for specialized applications
  • Lazy iterator implementations
  • Builder pattern for fluent configuration

Changed

  • Template-based design for flexibility
  • Header-only library architecture
  • Improved perfect hash function interface

Deprecated

  • Original rd_ph_filter class (use approximate_map instead)
  • Old Python module name (use approximate_filters)

Fixed

  • False positive rate calculations
  • Memory leaks in Python bindings
  • Thread safety issues in concurrent access

0.8.0 - 2023-10-15

Added

  • Set membership filter implementation
  • Threshold filter with configurable FPR
  • Compact lookup tables
  • CMake build system
  • Catch2 unit tests

Changed

  • Moved from Makefile to CMake
  • Reorganized project structure
  • Improved documentation

Fixed

  • Compilation warnings on GCC 11+
  • Undefined behavior in hash functions

0.7.0 - 2023-08-01

Added

  • Initial proof of concept
  • Basic perfect hash filter
  • Simple benchmarking tools

Known Issues

  • Limited to 8-bit storage
  • No persistence support
  • Single-threaded only

Comparison with Alternatives

vs Redis

  • Maph advantages: 10x faster lookups, zero maintenance, automatic persistence
  • Redis advantages: Richer data types, replication, Lua scripting

vs memcached

  • Maph advantages: Persistence, faster lookups, no network overhead
  • Memcached advantages: Distributed caching, LRU eviction, larger values

vs RocksDB

  • Maph advantages: 25x faster reads, simpler operation, no compaction
  • RocksDB advantages: Unbounded size, transactions, compression

Upgrade Guide

From 0.x to 1.0.0

  1. Database Format Change:

    • Old databases are not compatible
    • Export data using old version, import with new
  2. API Changes:

    // Old
    rd_ph_filter<PH> filter(elements);
    
    // New
    auto db = maph::create("data.maph", 1000000);
  3. Python Module:

    # Old
    import rd_ph_filter
    
    # New
    import maph
  4. Configuration:

    • Load factor now split into static/dynamic ratios
    • Error rate configuration replaced with storage size selection

Roadmap

Version 1.1.0 (Q2 2024)

  • Cuckoo hashing for better collision handling
  • Value compression support
  • Authentication and access control
  • Replication support

Version 1.2.0 (Q3 2024)

  • Multi-file sharding
  • Learned index structures
  • GPU acceleration
  • Persistent memory support

Version 2.0.0 (Q4 2024)

  • ACID transactions
  • SQL-like query language
  • Distributed clustering
  • Cloud-native features

Contributing

Please see CONTRIBUTING.md for guidelines on how to contribute to this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • FNV hash algorithm by Glenn Fowler, Landon Curt Noll, and Kiem-Phong Vo
  • Inspired by papers on perfect hashing and approximate data structures
  • Community contributors and early adopters