feat: add GPU buffer loader for IndexProvider integration #175

Closed
cluster2600 wants to merge 31 commits into alibaba:main from cluster2600:feat/gpu-buffer-loader

cluster2600 (Contributor) commented Feb 25, 2026

Summary

  • GpuBufferLoader (gpu_buffer_loader.h): streams vectors from any IndexProvider into contiguous GPU-ready float32 buffers
  • Metal C++ docs (docs/METAL_CPP.md): architecture overview and kernel reference

Replaces #174 (now closed), which incorrectly used a standalone RocksDB store. This PR integrates with zvec's existing storage architecture via IndexProvider::Iterator.

Follow-up to #166 ("Future Work: Integration with storage").

How it works

IndexProvider (Flat/HNSW/IVF)
    |
    +-- Iterator -> GpuBufferLoader::load() -> GpuBuffer
                                                  |
                                        +---------+----------+
                                        |                    |
                                  Metal device buf      cudaMemcpy

auto provider = index->create_provider();
auto buffer = zvec::GpuBufferLoader::load(provider);

// buffer.vectors is contiguous (N x dim) float32
// Ready for Metal newBufferWithBytes or cudaMemcpy
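
To make the data flow above concrete, here is a minimal host-side sketch of the streaming step. It is not the code in gpu_buffer_loader.h: `ToyIterator`, `ToyGpuBuffer`, and `load_all` are hypothetical stand-ins, and the real IndexProvider::Iterator interface may expose different method names. The only point it illustrates is that each vector is appended to one contiguous (N x dim) float32 allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IndexProvider iterator; the real zvec
// interface is not shown in this PR description and may differ.
struct ToyIterator {
    const std::vector<std::vector<float>>* rows;
    std::size_t pos = 0;

    bool is_valid() const { return pos < rows->size(); }
    void next() { ++pos; }
    const float* data() const { return (*rows)[pos].data(); }
};

// Hypothetical host-side result: count x dim float32 values, row-major.
struct ToyGpuBuffer {
    std::vector<float> vectors;
    std::size_t count = 0;
    std::size_t dim = 0;
};

// Streams every vector from the iterator into one contiguous allocation.
// Each source row is assumed to hold exactly `dim` floats.
inline ToyGpuBuffer load_all(ToyIterator it, std::size_t dim,
                             std::size_t expected_count) {
    assert(dim > 0);
    ToyGpuBuffer buf;
    buf.dim = dim;
    buf.vectors.reserve(expected_count * dim);  // avoid repeated regrowth
    for (; it.is_valid(); it.next()) {
        const float* src = it.data();
        buf.vectors.insert(buf.vectors.end(), src, src + dim);
        ++buf.count;
    }
    return buf;
}
```

After `load_all` returns, `buf.vectors.data()` points at `count * dim` floats that could be handed to a single device upload call, which is the property the diagram relies on.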

Features

  • load() — stream all vectors into a single contiguous buffer
  • load_chunk() — chunked loading for datasets exceeding GPU memory
  • Automatic type conversion — FP16, INT8 -> FP32
  • Works with all index types — Flat, HNSW, IVF providers
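
The chunked path can be pictured with the sketch below. It assumes a cursor that remembers how far it has read; `ToyCursor` and `load_chunk_sketch` are illustrative names, not the signatures in this PR. Each call copies at most `max_rows` vectors, so the caller can size `max_rows` to whatever fits in device memory and upload one chunk at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read cursor over a row-major source of float32 vectors.
// The real loader reads from an IndexProvider; this only models the position.
struct ToyCursor {
    const std::vector<float>* flat;  // total_rows() x dim values
    std::size_t dim;
    std::size_t row = 0;             // next row to read

    std::size_t total_rows() const { return flat->size() / dim; }
    bool done() const { return row >= total_rows(); }
};

// Copies up to `max_rows` vectors into `out`, replacing its previous
// contents, and advances the cursor. Returns the number of rows copied;
// 0 means the source is exhausted.
inline std::size_t load_chunk_sketch(ToyCursor& cur, std::size_t max_rows,
                                     std::vector<float>& out) {
    assert(max_rows > 0 && cur.dim > 0);
    const std::size_t remaining = cur.total_rows() - cur.row;
    const std::size_t n = std::min(max_rows, remaining);
    const float* src = cur.flat->data() + cur.row * cur.dim;
    out.assign(src, src + n * cur.dim);
    cur.row += n;
    return n;
}
```

A caller would typically loop until the function returns 0, uploading and processing `out` after each call. Reusing the same `out` vector keeps host memory bounded by one chunk.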

Why not RocksDB?

zvec already has a complete storage stack: IndexProvider -> Iterator -> block-based segments with mmap/buffer pool backends. A parallel RocksDB store would duplicate this. GpuBufferLoader sits on top of the existing pipeline instead.

Merge order

This PR shares a common base with #172, #173, #176. Recommended merge order: #172 -> #173 -> #175 -> #176. Merging any one brings in the shared base commits; the rest then apply cleanly.

Test plan

  • Header compiles with clang++ C++17
  • Integrates with existing IndexProvider / IndexHolder::Iterator interfaces
  • End-to-end: Flat provider -> GpuBufferLoader -> Metal compute pipeline
  • Benchmark: load throughput for 1M+ vectors
  • Test FP16 and INT8 conversion paths
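
For the FP16 and INT8 conversion paths, a reference implementation to compare against could look like the sketch below. It is an assumption-laden illustration, not the conversion code in this PR: `half_to_float` decodes IEEE 754 binary16 bit patterns by hand, and `int8_to_float` takes a hypothetical `scale` parameter because the PR does not say how (or whether) quantized INT8 values are rescaled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decodes one IEEE 754 binary16 bit pattern into a float32 value.
inline float half_to_float(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t mant = h & 0x3FFu;
    std::uint32_t bits = 0;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;  // signed zero
        } else {
            // Subnormal half: shift the mantissa until its implicit
            // leading bit appears, adjusting the exponent to match.
            int shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                ++shift;
            }
            mant &= 0x3FFu;
            exp = static_cast<std::uint32_t>(127 - 15 - shift + 1);
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
    } else {
        // Normal number: rebias the exponent from 15 to 127.
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
    }

    float out;
    std::memcpy(&out, &bits, sizeof(out));  // bit copy without aliasing issues
    return out;
}

// Widens an INT8 component to float32. `scale` is a hypothetical
// dequantization factor; with the default of 1.0f this is a plain cast.
inline float int8_to_float(std::int8_t v, float scale = 1.0f) {
    return static_cast<float>(v) * scale;
}
```

Useful spot checks for a test of this kind are the values 1.0, -2.0, 0.5, the smallest subnormal, infinity, and the INT8 extremes, since they exercise each branch of the decoder.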

