Summary
There are duplicate implementations for search buffer and visited set structures across the codebase that should be consolidated.
Duplicate Implementations
1. LinearPool vs SearchBuffer
| Feature |
include/utils/query_utils.hpp::LinearPool |
include/utils/rabitq_utils/search_utils/buffer.hpp::SearchBuffer |
| Purpose |
Sorted linear buffer for graph-based ANN search |
Sorted linear buffer for graph-based ANN search |
| Binary search |
find_bsearch() |
binary_search() |
| Insert method |
insert() with memmove |
insert() with memmove |
| Pop method |
pop() with set_checked |
pop() with set_checked |
| Checked flag |
Uses top bit (bit 31) |
Uses top bit (bit 31) |
| has_next |
✅ |
✅ |
| is_full |
✅ |
✅ |
Both classes use the same algorithm and design pattern, with minor implementation differences.
2. DynamicBitset vs HashBasedBooleanSet
| Feature |
include/utils/query_utils.hpp::DynamicBitset |
include/utils/rabitq_utils/search_utils/hashset.hpp::HashBasedBooleanSet |
| Purpose |
Track visited vertices |
Track visited vertices |
| Implementation |
Bit vector (uint64_t array) |
Hash table + unordered_set fallback |
| Interface |
set(), get(), reset() |
set(), get(), clear() |
Both serve the same purpose of tracking visited nodes during graph search.
Suggested Refactoring
- Unify SearchBuffer/LinearPool: Keep one implementation (suggest
SearchBuffer as it's cleaner) and deprecate the other
- Create a unified visited set interface: Abstract the visited tracking with a common interface, allowing different implementations (bitset for dense, hash for sparse) based on use case
- Move shared utilities to a common location: The
set_checked/is_checked pattern using bit 31 could be a shared utility
Files Involved
include/utils/query_utils.hpp
include/utils/rabitq_utils/search_utils/buffer.hpp
include/utils/rabitq_utils/search_utils/hashset.hpp
Benefits
- Reduced code duplication and maintenance burden
- Consistent behavior across different index types
- Easier to optimize and fix bugs in one place
Summary
There are duplicate implementations for search buffer and visited set structures across the codebase that should be consolidated.
Duplicate Implementations
1. LinearPool vs SearchBuffer
include/utils/query_utils.hpp::LinearPoolinclude/utils/rabitq_utils/search_utils/buffer.hpp::SearchBufferfind_bsearch()binary_search()insert()with memmoveinsert()with memmovepop()with set_checkedpop()with set_checkedBoth classes use the same algorithm and design pattern, with minor implementation differences.
2. DynamicBitset vs HashBasedBooleanSet
include/utils/query_utils.hpp::DynamicBitsetinclude/utils/rabitq_utils/search_utils/hashset.hpp::HashBasedBooleanSetset(),get(),reset()set(),get(),clear()Both serve the same purpose of tracking visited nodes during graph search.
Suggested Refactoring
SearchBufferas it's cleaner) and deprecate the otherset_checked/is_checkedpattern using bit 31 could be a shared utilityFiles Involved
include/utils/query_utils.hppinclude/utils/rabitq_utils/search_utils/buffer.hppinclude/utils/rabitq_utils/search_utils/hashset.hppBenefits