- Introduction
- Demo of RidgeWalker for URW
- Getting Started
- Acknowledging Our Work
This repository contains the source code for RidgeWalker, an FPGA accelerator for graph random walks (GRWs). GRWs are core to graph analytics and ML pipelines but are bottlenecked by irregular, random memory access and variable walk lengths, which cause low bandwidth utilization and pipeline bubbles on conventional architectures. RidgeWalker makes the repo self-contained by capturing the paper’s key ideas here: it decomposes each walk into stateless tasks (leveraging the Markov property), executes them asynchronously across multiple pipelines to hide memory latency, and uses a zero-bubble scheduler plus routing fabric to dynamically balance workload and keep pipelines fully utilized under imbalance.
RidgeWalker is a high-performance GRW FPGA accelerator. The URW implementation lives under app/urw and follows the same build flow as other applications in this repo.
src/rng_data_type/Makefile: RNG data types shared across sampling and pipeline stages.src/rw_data_type/Makefile: random-walk metadata types shared across URW/LV stages.src/switch_data_type/Makefile: switch-network framing and routing data types.src/switch_butterfly/Makefile: butterfly routing fabric for task/packet balancing.test/end_ctrl_16/Makefile: 16-lane end-control/termination helper for LV pipelines.src/irsgu/Makefile: sampling unit that generates candidate transitions.src/lv_sou/Makefile: long-vertex stream source utilities for LV pipeline injection.src/lv_rpa_router/Makefile: RPA task router between pipeline stages.src/lv_rpa_access/Makefile: row-pointer access stage for LV pipeline.src/lv_sample/Makefile: LV sampling stage that selects next-hop indices.src/lv_cla_router/Makefile: CLA task router between pipeline stages.src/lv_cla_access/Makefile: column-list access stage for LV pipeline.test/lv_host_urw/Makefile: host harness for LV/URW pipeline testing.src/lv_rpa_ws_scheduler/Makefile: zero-bubble scheduler for work-stealing and load balance.test/lv_test_rpa_loader/Makefile: RPA loader test/validation module.
This app is an example pipeline configuration for a uniform random walk build.
Module map (from app/uniform_random_walk/kernel.mk):
lib/common_host.mk: common host build flags and utilities.host/pcg/Makefile: host-side RNG seed/setup.host/xcl2/Makefile: XRT helper wrappers for device/queue setup.host/helper/Makefile: CLI and helper utilities.host/log/Makefile: host logging utilities.host/xgraph/Makefile: graph loader and CSR utilities.src/rng_data_type/Makefile: RNG data types shared by kernels.src/rw_data_type/Makefile: random-walk data types shared by kernels.src/switch_data_type/Makefile: switch network data types and framing.src/switch_butterfly/Makefile: butterfly switch fabric for routing tasks/frames.test/end_ctrl_16/Makefile: end-control/termination for 16-lane paths.src/irsgu/Makefile: sampling unit feeding traversal.src/lv_sou/Makefile: long-vertex stream source utilities.src/lv_rpa_router/Makefile: router for RPA task streams.src/lv_rpa_access/Makefile: RPA access stage for vertex reads.src/lv_sample/Makefile: sampling stage for next-hop selection.src/lv_cla_router/Makefile: router for CLA task streams.src/lv_cla_access/Makefile: CLA access stage for vertex reads.src/lv_rpa_ws_scheduler/Makefile: work-stealing scheduler for pipeline balance.test/lv_host_urw/Makefile: host test harness for LV/URW pipeline.test/lv_test_rpa_loader/Makefile: RPA loader test module.
- Asynchronous pipeline (RPA / Sampling / CLA):
- RPA (Row Pointer Access):
src/lv_rpa_access/implements row-pointer access in the LV pipeline. - Sampling:
src/urw/cal_sample_index.cpp(URW) andsrc/lv_sample/(LV pipeline) select next-hop indices. - CLA (Column List Access):
src/lv_cla_access/implements column-list access in the LV pipeline.
- RPA (Row Pointer Access):
- Asynchronous memory-access engine:
src/mem/mem_access.cppimplements the burst-read engine, whilesrc/step_loader/(query injection) andsrc/ring_manager/(active query slots) maintain outstanding work for non-blocking access. - Task Router / Interconnect:
src/switch_butterfly/implements the butterfly routing fabric used to steer tasks between pipelines and memory channels. - Zero-Bubble Scheduler:
src/lv_rpa_ws_scheduler/andsrc/lv_rpa_ws_scheduler_2x2/implement the work-stealing schedulers described in the zero-bubble design. - Query Loader (host):
host/urw/andhost/kernel/load_graph.hload queries and graph data for uniform_random_walk runs.
Example build (uniform_random_walk target):
make app=uniform_random_walk all -jExample run (URW host app + bitstream + graph):
./uniform_random_walk.app -fpga build_dir_urw/kernel.xclbin -graph /path/to/formatted/graphURW host CLI options (from host/kernel/load_graph.h):
-graph: path to a CSR-formatted graph (defaults to/data/graph_at_scale/youtube)-fpga: path to the.xclbinbitstream (defaults tobuild_dir_${app_name}/kernel.xclbin)-qsize: override query size (minimum 512)-vl-el-vw-ew-ep-al-rj: graph feature flags (labels/weights/prefix-sum/alias/rejection)
Under Construction
To ensure the smooth operation of our code, we have outlined the necessary system requirements and dependencies below. Please ensure your environment aligns with the following specifications:
| Dependency | Description |
|---|---|
| OS | Ubuntu 20.04 |
| Linux Kernel: | 5.4.x |
| FPGA Platform: | AMD Alveo U55C |
| FPGA Development: | Vitis Core Development Kit 2021.2 |
| FPGA Shell: | xilinx_u55c_gen3x16_xdma_base_3 (modified) |
| FPGA runtime: | XRT 2.18.179 |
*NOTE: Please note that the FPGA Shell has been modified to suit the specific needs of our project. Ensure you have the correct version with the following instructions.
The official U55C shell limits the number of FPGA kernels that can be implemented. To accommodate our design, we have modified the official U55C shell. The following instructions will guide you through setting up a development environment that can reproduce our implementations.
We recommend using the HACC@NUS cluster, as it already has all the required environment set up.
RidgeWalker uses the Rule-Based Accelerator Building System (RABS) on top of Vitis to build accelerators. To install the required packages:
sudo apt install graphviz libgraphviz-dev faketime opencl-headers
pip install numpy
pip install graphviz
pip install networkx
pip install networkitTo clone the repository along with its submodules, use the following command:
git clone --recurse-submodules git@github.com:Xtra-Computing/RidgeWalker.gitThe repository is structured as follows:
├── app # Contains the configurations for the accelerator project. Each subfolder corresponds to a hardware accelerator.
│ ├── test # Contains the configurations for unit tests or benchmark projects.
├── host # Contains the CPU code for preparing data and controlling the FPGA accelerator.
├── src # Contains the FPGA source code, grouped in modules.
├── test # Contains the CPU/FPGA test code.
├── lib # Contains libraries of rules to use as a subgroup of kernels.
├── misc # Contains scripts for automating tests.
└── mk # Contains the RABS submodule for building the accelerator.To build RidgeWalker, use the make command as shown below. Replace ${app_name} with the name of the application in the app directory:
make app=${app_name} TARGET=hw allHere's a quick guide to the arguments:
app: Specifies the target accelerator to be built.TARGET:hw: Builds an accelerator that can run on real FPGA hardware.hw_emu: Builds a project that can run waveform-based simulation.- The default value can be found in the corresponding
kernel.mkfiles.
all: Builds the entire project.
For example, to build a full implementation of the MetaPath random walk accelerator, use:
make app=metadata TARGET=hw allTo build a full implementation of the Node2Vec random walk accelerator, use:
make app=node2vec TARGET=hw allPlease note that the build of each accelerator may take around 15 hours, depending on the server's performance.
To build tests and benchmarks (projects in app/test/), replace app=${app_name} with test=${test_name}, where ${test_name} the name of the test project. For example:
make test=vcache_test TARGET=hw allThe above make command generate the accelerator of vcache_test unit test of DAC cache.
A successful build will generate the program and FPGA bitstream. Here is an example of the metapath build:
├── metapath.app # CPU program.
├── build_dir_metapath_metapath
│ ├── clock.log # Accelerator clock.
│ ├── git
│ ├── kernel.cfg.json.pdf # Accelerator kernel topology.
│ ├── kernel.link.ltx
│ ├── kernel.link.xclbin.info
│ ├── kernel.link.xclbin.link_summary
│ ├── kernel.xclbin # FPGA bitstream.
│ ├── kernel.xclbin.package_summary
│ ├── link # FPGA build log.
│ ├── report # FPGA report.
└── xrt.ini
RidgeWalker accepts graphs in the same format as ThunderRW. You can follow the prepare_data.sh script here to prepare the input graphs, or you can download them in the next section.
You can download the formatted graphs here. These can be used directly as the input for our accelerator.
Here, ${PCIE_ID} is the PCIE BDF id of the FPGA board.
To execute the program, use the following command, for example the ${metapath_x4}
./metapath_x4.app -fpga build_dir_metapath_metapath_x4/kernel.xclbin -graph ${path/to/formatted/graph}The arguments are as follows:
fpga: Thekernel.xclbinfile used to configure the FPGAs.graph: The path that stores the graph dataset.
We kindly request that you acknowledge our work by citing our research paper if you found our repository beneficial for your research. Below is the citation in BibTeX format for your convenience:
@inproceedings{tan2026ridgewalker,
author = {Hongshi Tan and Yao Chen and Xinyu Chen and Qizhen Zhang and Cheng Chen and Weng-Fai Wong and Bingsheng He},
title = {RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs},
booktitle = {Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA)},
year = {2026},
organization = {IEEE},
note = {20\,min talk, Main Conference, FPGA, SmartNIC, and Reconfigurable Computing Track},
url = {https://2026.hpca-conf.org/details/hpca-2026-main-conference/26/RidgeWalker-Perfectly-Pipelined-Graph-Random-Walks-on-FPGAs}
}
