A high-performance spatial indexing research codebase implementing SIMD-optimized R-tree structures using Intel AVX-512 instructions. This project explores various fanout sizes, vectorization strategies, and optimization techniques for spatial queries.
This repository contains implementations and benchmarks for two primary spatial query operations:
- Range Search (RS): Find all objects intersecting a query rectangle
- Spatial Join: Join two R-tree datasets based on spatial overlap
Multiple implementation strategies are provided for systematic performance evaluation across different hardware characteristics.
- Multiple Fanout Sizes: Support for 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096-way fanouts
- SIMD Vectorization: Intel AVX-512 optimized implementations
- Vectorization Depths:
- D0 (Scalar): Non-vectorized baseline with logical and bitwise variants
- D1: Single-level SIMD vectorization with partial/full variants and prefetching
- D2: Two-level SIMD vectorization for deeper tree traversal
- Performance Monitoring: Integration with Intel CPU performance counters
- Cache-Aware Design: 64-byte aligned data structures for optimal cache utilization
- Multiple Dataset Sizes: 1M, 5M, and 10M data points
.
├── r_tree/ # R-tree node implementations
│ ├── rtreeN.h # Standard fanout N implementations (16-4096)
│ ├── rtreeN_d1.h # Depth-1 optimized variants
│ └── commonMethod.h # Shared utilities
├── rs/ # Range search implementations
│ ├── 1M/ # 1 million data points
│ ├── 5M/ # 5 million data points
│ └── 10M/ # 10 million data points
│ ├── scalar/ # Non-vectorized implementations
│ ├── vector-d1/ # Single-level SIMD vectorization
│ └── vector-d2/ # Two-level SIMD vectorization
├── join/ # Spatial join implementations
│ └── [same structure as rs/]
├── create_rtree/ # R-tree data generation scripts
│ ├── 1M/ # 1M point dataset generators
│ ├── 5M/ # 5M point dataset generators
│ ├── 10M/ # 10M point dataset generators
│ │ ├── main_{fanout}-1.py # First index scripts (seed=100)
│ │ └── main_{fanout}-2.py # Second index scripts (seed=200)
│ └── gen_qpool.py # Query pool generator for range search
├── scripts/ # Build and execution scripts
│ ├── rs.sh # Range search build script
│ └── join.sh # Join build script
├── data/ # Dataset files
│ ├── join/ # R-tree structure files
│ └── search/ # Query datasets
├── out/ # Output results
├── utils.h # Core utilities and SIMD helpers
├── PerfEvent.hpp # Performance counter interface
├── commonMethod.cpp # Common methods implementation
└── CLAUDE.md # AI assistant guidance
- CPU: Intel Ice Lake Server or newer (AVX-512 support required)
- Compiler: g++ with AVX-512 support
- OS: Linux (tested on systems with Intel Ice Lake architecture)
To verify your CPU supports AVX-512:
gcc -march=native -Q --help=target | grep march
# Should show: icelake-server or newerpip install rtreelibThe create_rtree/ directory contains Python scripts to generate R-tree data files for spatial join experiments. Each script is pre-configured for specific dataset sizes, fanout values, and index variants.
create_rtree/
├── 1M/ # 1 million point datasets
├── 5M/ # 5 million point datasets
└── 10M/ # 10 million point datasets
├── main_{fanout}-1.py # First index (seed=100)
└── main_{fanout}-2.py # Second index (seed=200)
Each dataset size folder contains scripts for fanouts: 16, 32, 64, 128, 256, 512, 1024, 2048
Naming Convention: main_{fanout}-{index}.py
{fanout}: R-tree maximum fanout (16, 32, 64, 128, 256, 512, 1024, 2048){index}: Index number (1 or 2) for generating paired join datasets
# Generate 1M dataset with fanout 64, first index
cd create_rtree/1M/
python main_64-1.py
# Generate 10M dataset with fanout 128, second index
cd create_rtree/10M/
python main_128-2.pyEach script generates point datasets (zero-width/height rectangles) with the following characteristics:
| Dataset | Objects | Coordinate Space | Rectangle Size | Seed (Index 1) | Seed (Index 2) |
|---|---|---|---|---|---|
| 1M | 1,000,000 | 1000 × 1000 | 0 × 0 (points) | 100 | 200 |
| 5M | 5,000,000 | 1000 × 1000 | 0 × 0 (points) | 100 | 200 |
| 10M | 10,000,000 | 1000 × 1000 | 0 × 0 (points) | 100 | 200 |
Key characteristics:
- half_width = 0, half_height = 0: Objects are points, not rectangles
- Different seeds: Index 1 uses seed(100), Index 2 uses seed(200) for diverse join pairs
- Sorted output: Nodes are sorted by low_x coordinate before writing
Generated files are written to the local data/join/ directory with the naming convention:
{index}-{size}-s1000_v0_fan{fanout}.txt
Examples:
data/join/1-1M-s1000_v0_fan64.txt- First index, 1M objects, fanout 64data/join/2-10M-s1000_v0_fan128.txt- Second index, 10M objects, fanout 128data/join/1-5M-s1000_v0_fan256.txt- First index, 5M objects, fanout 256
Important: These generated data files in data/join/ are the R-tree structure files that the join and range search implementations load and query during experiments.
The create_rtree/gen_qpool.py script generates query rectangles for range search experiments.
cd create_rtree/
python gen_qpool.pyThe script generates a pool of query rectangles with the following default parameters:
- Query pool size: 150,000 rectangles
- Coordinate space: 1000 × 1000
- Rectangle size: 10 × 10 (square_dim = 10, half_dim = 5)
- Random seed: 100 (for reproducibility)
Generated query pool file is written to data/search/ with the naming convention:
qs_{start}_{end}.txt
Default output: data/search/qs_0_75.txt
Where:
start: Lower bound of query rectangle size range (default: 0)end: Upper bound of query rectangle size range (default: 75)
Each line in the output file contains one query rectangle:
<min_x> <min_y> <max_x> <max_y>
Important: These query pool files in data/search/ are used by the range search implementations to perform queries against the R-tree indices.
Each line in the R-tree structure files (data/join/) represents one R-tree node:
<count_child> <id> <lx[0]...lx[n]> <ly[0]...ly[n]> <hx[0]...hx[n]> <hy[0]...hy[n]> <child_id[0]...child_id[n]> <level> <parent_id>
Where:
count_child: Number of entries in this nodeid: Node identifierlx, ly: Low x and y coordinates of bounding rectangleshx, hy: High x and y coordinates of bounding rectangleschild_id: Child node identifierslevel: Tree level (0 = root)parent_id: Parent node identifier
Build all range search variants for all dataset sizes:
cd scripts/
./rs.shBuild all join variants:
cd scripts/
./join.shNavigate to the specific implementation directory and compile:
# Example: Range search, 1M dataset, scalar, fanout 64
cd rs/1M/scalar/
g++ -g f64.cpp -march=native -funroll-loops -O3 -o rs_s_d0_f64
./rs_s_d0_f64# Example: Join, 10M dataset, vector D1, fanout 128
cd join/10M/vector-d1/
g++ -g f128_o3.cpp -march=native -funroll-loops -mcmodel=medium -O3 -o join_v_d1_o3_f128
./join_v_d1_o3_f128-march=native: Enable native CPU instruction set (AVX-512)-funroll-loops: Loop unrolling optimization-O3: Maximum optimization level-mcmodel=medium: Required for large datasets (join/nn operations)-g: Debug symbols
Format: {operation}_{implementation}_{depth}_{optimization}_f{fanout}
Examples:
rs_s_d0_f64- Range Search, Scalar, Depth 0, fanout 64join_v_d1_o3_f128- Join, Vector, Depth 1, optimization 3, fanout 128
Suffixes:
_w_bw- with bitwise operations_w_p- with partial vectorization_w_pf- with prefetching
- Vector Width: 512-bit (
__m512for 16 floats) - Mask Types:
__mmask16,__mmask32,__mmask64 - Intrinsics: AVX-512 Foundation and AVX-512 DQ
Configurable prefetch distance and hints:
int lkahead_dis = 1; // Lookahead distance
const int hint = _MM_HINT_T0; // L1 cacheControl debug output via the XDEBUG macro:
#define XDEBUG 0 // 0 = production, 1 = debugSet to 1 for detailed SIMD operation debugging, 0 for performance measurements.
The codebase uses PerfEvent.hpp for Intel CPU performance counter integration:
- Cache misses (L1, L2, L3)
- TLB misses
- Instructions per cycle (IPC)
- Branch mispredictions
R-tree structure files (in data/join/) use the format:
<count_child> <level> <lx[0]...lx[n]> <ly[0]...ly[n]> <hx[0]...hx[n]> <hy[0]...hy[n]> <child_id[0]...child_id[n]> <level> <id>
Edit the arrays at the top of build scripts to customize:
# scripts/rs.sh or scripts/join.sh
DATASETS=(1M 5M 10M)
FANOUTS=(16 32 64 128 256 512 1024 2048)Results are written to the out/ directory with performance metrics including:
- Query processing time
- Node visits
- Result counts
- Hardware counter statistics
This codebase implements the research presented in the following paper:
SIMD-ified R-tree Query Processing and Optimization Yeasir Rayhan and Walid G. Aref In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL '23) November 13–16, 2023, Hamburg, Germany
If you use this code in your research, please cite:
@inproceedings{DBLP:conf/gis/RayhanA23,
author = {Yeasir Rayhan and
Walid G. Aref},
title = {SIMD-ified R-tree Query Processing and Optimization},
booktitle = {Proceedings of the 31st {ACM} International Conference on Advances
in Geographic Information Systems, {SIGSPATIAL} 2023, Hamburg, Germany,
November 13-16, 2023},
pages = {37:1--37:10},
publisher = {{ACM}},
year = {2023},
url = {https://doi.org/10.1145/3589132.3625610},
doi = {10.1145/3589132.3625610},
}This work is licensed under a Creative Commons Attribution International 4.0 License.
Yeasir Rayhan Purdue University, West Lafayette, IN, USA Email: yrayhan@purdue.edu
This work was supported by the National Science Foundation under Grant Number IIS-1910216.