Skip to content

refactor: Consolidate duplicate LinearPool/SearchBuffer and Bitset implementations #60

@huanglune

Description

@huanglune

Summary

There are duplicate implementations for search buffer and visited set structures across the codebase that should be consolidated.

Duplicate Implementations

1. LinearPool vs SearchBuffer

Feature include/utils/query_utils.hpp::LinearPool include/utils/rabitq_utils/search_utils/buffer.hpp::SearchBuffer
Purpose Sorted linear buffer for graph-based ANN search Sorted linear buffer for graph-based ANN search
Binary search find_bsearch() binary_search()
Insert method insert() with memmove insert() with memmove
Pop method pop() with set_checked pop() with set_checked
Checked flag Uses top bit (bit 31) Uses top bit (bit 31)
has_next
is_full

Both classes use the same algorithm and design pattern, with minor implementation differences.

2. DynamicBitset vs HashBasedBooleanSet

Feature include/utils/query_utils.hpp::DynamicBitset include/utils/rabitq_utils/search_utils/hashset.hpp::HashBasedBooleanSet
Purpose Track visited vertices Track visited vertices
Implementation Bit vector (uint64_t array) Hash table + unordered_set fallback
Interface set(), get(), reset() set(), get(), clear()

Both serve the same purpose of tracking visited nodes during graph search.

Suggested Refactoring

  1. Unify SearchBuffer/LinearPool: Keep one implementation (suggest SearchBuffer as it's cleaner) and deprecate the other
  2. Create a unified visited set interface: Abstract the visited tracking with a common interface, allowing different implementations (bitset for dense, hash for sparse) based on use case
  3. Move shared utilities to a common location: The set_checked/is_checked pattern using bit 31 could be a shared utility

Files Involved

  • include/utils/query_utils.hpp
  • include/utils/rabitq_utils/search_utils/buffer.hpp
  • include/utils/rabitq_utils/search_utils/hashset.hpp

Benefits

  • Reduced code duplication and maintenance burden
  • Consistent behavior across different index types
  • Easier to optimize and fix bugs in one place

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions