Skip to content

Mgepahmge/T-BLAEQ

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

T-BLAEQ

T-BLAEQ (Tensor BLAS-based CAE Query) is the official implementation for our submitted paper to VLDB 2026 titled:
"T-BLAEQ: A Tensor-Based Multigrid Index for GPU-Accelerated Multi-dimensional Query Processing"

The code is fully implemented in CUDA C/C++.

Overview

T-BLAEQ is a GPU-accelerated hierarchical mesh index designed for fast range and K-nearest-neighbour (KNN) queries over large high-dimensional datasets.

It organises the dataset into a coarsening hierarchy of meshes connected by sparse prolongation tensors (P-tensors). Each query traverses the hierarchy from coarsest to finest, pruning irrelevant clusters at every level before expanding survivors via a sparse matrix-vector multiply (SpTSpM). A greedy memory-policy scheduler automatically decides how aggressively to cache index data on the GPU, adapting to whatever device memory is available.

Requirements

Hardware

  • NVIDIA GPU with CUDA Compute Capability 7.0 or higher (Volta+)

Software

  • CUDA Toolkit: 12.0 or higher
  • GCC / G++: 12.0 or higher
  • CMake: 3.20 or higher
  • cuVS / RAFT: CUDA Vector Search library (https://github.com/rapidsai/cuvs)
  • Doxygen (optional, for building documentation): 1.9 or higher
  • Graphviz (optional, for call/class graphs): any recent version

Building

To build the project, use CMake. It is highly recommended to build in Release mode for optimal GPU performance:

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Usage

Command-Line Interface (CLI)

After building, run the program from the build directory:

./T-BLAEQ [OPTIONS]

Key Options

  • --build-index - Build and save index for the specified dataset
  • --test-query - Test queries on the specified dataset
  • -d, --dataset TEXT - Path to the dataset file
  • -i, --index-path TEXT - Path to save/load index (default: indexes/)
  • -f, --query-file TEXT - Path to the query file
  • -q, --max-queries INT - Maximum number of queries to process (default: 10)
  • -t, --query-type INT - Type of query to test: 0 for Range Query, 1 for KNN Query
  • -k, --K INT - Number of neighbors for KNN query (default: 10)
  • -r TEXT - Path to save query results (e.g., results.csv)
  • --force-policy TEXT - Force a specific memory policy for benchmarking (e.g., L0, L1, L2, L3)
  • --height INT - Hierarchy height for index building (both --build-index and --gen-index) (default: 4)
  • --ratios TEXT - Comma-separated coarsening ratios for index building (both --build-index and --gen-index) (default: 100,50,20)
  • --gen-seed INT - Random seed for RandomKmeans synthetic hierarchy (default: 12345)
  • --gen-sigma-divisor FLOAT - Random spread control in RandomKmeans: sigma = spacing/divisor (default: 3.0)

CLI Examples

Build an index:

./T-BLAEQ --build-index -d dataset.txt -i indexes/my_index/

Generate a random index with controllable hierarchy/noise:

./T-BLAEQ --gen-index --gen-N 500000 --gen-D 5 --height 4 --ratios 80,40,20 --gen-seed 20260330 --gen-sigma-divisor 2.2 -i indexes/random_index/

Run range queries:

./T-BLAEQ --test-query -i indexes/my_index/ -f queries/range_50.txt -t 0 -q 100 -r results.csv

Run KNN queries:

./T-BLAEQ --test-query -i indexes/my_index/ -f queries/knn_points.txt -t 1 --K 20 -q 100 -r results.csv

Force a specific memory policy (for benchmarking):

./T-BLAEQ --test-query -i indexes/my_index/ -f queries/range_50.txt -t 0 -r results.csv --force-policy L1

JSONL Index + T-BLAEQ Tool

Build a JSONL index (with binary payloads) and a co-located T-BLAEQ index:

./T-BLAEQ-JsonIndex build <dataset_root> <coord.csv> <index_root> --height 4 --ratios 100,50,20

Query the combined indexes (T-BLAEQ coarse, JSONL refine, payload dump):

./T-BLAEQ-JsonIndex query <index_root> --mode knn --timestamp 1715064123.123456 --seq seq0001 --knn-k 10
./T-BLAEQ-JsonIndex query <index_root> --mode range --start 1715064100.0 --end 1715064200.0 --seq 1,2

Note: The JSONL group size (N) and fixed file-name bytes are compile-time constants in src/file_index/IndexJsonTblaeqTypes.cuh (kGroupSize, kFileNameBytes).

JSONL Index HTTP Server

Start the server (JSONL index + T-BLAEQ coarse search; clients read payloads locally):

./T-BLAEQ-JsonIndex-Server --host 0.0.0.0 --port 8090 --index-cache ./tempJsonIndexCache

Build an index via HTTP:

curl -X POST http://localhost:8090/build-index \
  -H "Content-Type: application/json" \
  -d '{"dataset_name":"demo","dataset_root":"/data/demo","csv_path":"/data/demo/coord.csv","height":4,"ratios":"100,50,20"}'

Query (returns JSON index entries for matched slots):

curl -X POST http://localhost:8090/query \
  -H "Content-Type: application/json" \
  -d '{"dataset_name":"demo","mode":"knn","timestamp":1715064123.123456,"seq":"seq0001","knn_k":10}'

The response fields mirror the HTTP server (success, dataset_name, query_type, result_count, result, memory_policy, level_original_sizes, level_pruned_sizes, query_runtime). Each result entry contains the minimal JSON index fields needed to locate payloads locally: data_file, slot, index_copy_offset, binary_index_record_size, data_offsets, index_vector, timestamps, and slot_count. Payload offset can be computed as:

data_base_offset = index_copy_offset + binary_index_record_size
payload_offset = data_base_offset + data_offsets[slot]
payload_size = data_offsets[slot + 1] - data_offsets[slot]

C++ API Usage

You can also integrate T-BLAEQ directly into your own C++ applications:

#include "QueryHandler.h"

// Load a pre-built index and run a range query
QueryHandler handler("indexes/my_index/", /*loadFromIndex=*/true);

// Auto-selects the optimal memory policy based on available GPU VRAM
handler.prepareForQuery();   

// Execute query
QueryResult result = handler.performQuery("queries/range_50.txt", QueryType::RANGE);

saveQueryResult(result, "output.csv");

Architecture Overview

QueryHandler          (Public facade: build/load, prepare, query)
    └── QueryEngine   (Outer query loop, timing, inter-level grid lifetime)
            └── IQueryStrategy   (Per-level strategy interface)
                    ├── L0Strategy   (Zero cudaMalloc hot path)
                    ├── L1Strategy   (Full permanent cache)
                    ├── L2Strategy   (Selective lazy-load)
                    └── L3Strategy   (Tiled fallback)
  • IndexData is the central data container. It owns all host-side index arrays and all device buffers whose lifetimes are managed by the active strategy.
  • PolicyScheduler analyses available device memory, assigns the most aggressive feasible policy to each hierarchy level, and instantiates the appropriate strategy object.

Directory Structure

src/
  core/               IndexData, QueryEngine, QueryHandler, MemoryPolicy
    strategies/       L0–L3 strategy implementations and shared helpers
  Data_Structures/    PointCloud, SparseGrid, SparseTensorCscFormat, Query
  Kernel/             SpMSpV kernels and strategy-specific wrappers
  Kmeans/             GPU KMeans clustering (RAFT/cuVS wrapper)
  MergeSort/          Block merge sort kernel and host merge utilities
  Query/              Pruning kernels, compact, refactor, CUDA error macros
  Setup/              Index construction helpers (P-tensor, max-radius)
  utils/              NVTXProfiler, path string utilities
docs/                 Doxygen configuration and Markdown documentation pages

Documentation

To build the full Doxygen documentation (including Memory Policies and Query Pipeline details):

cd docs
./buildDoc.sh

The generated documentation will be available at docs/html/index.html.

License

See LICENSE file for details.

Third-Party Licenses

This project uses third-party libraries. See NOTICE and the THIRD_PARTY_LICENSES directory for details.

About

The code to our submitted paper to VLDB 2026 titled T-BLAEQ: A Tensor-Based Multigrid Index for GPU-Accelerated Multi-dimensional Query Processing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors