This document describes how to reproduce the results presented in the paper "Soaring with TRILLI: an HW/SW Heterogeneous Accelerator for Multi-Modal Image Registration", submitted to the 33rd IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2025). Specifically, it provides instructions for compiling the host applications and running the experiments with the prepared bitstreams.
The figures that can be reproduced are the following:
- Figure 6 - Geometric transformation IPE scaling
- Figure 7 - Transformation, MI and Complete Registration comparison with SoA
- Figure 8 - Registration accuracy
3D rigid image registration is a pivotal procedure in computer vision that aligns a floating volume with a reference one to correct positional and rotational distortions. It serves either as a stand-alone process or as a pre-processing step for non-rigid registration, where the rigid part dominates the computational cost. Various hardware accelerators have been proposed to optimize its compute-intensive components: geometric transformation with interpolation and similarity metric computation. However, existing solutions fail to address both components effectively, as GPUs excel at image transformation, while FPGAs excel at similarity metric computation. To close this gap, we propose TRILLI, a novel Versal-based accelerator for image transformation and interpolation. TRILLI optimally maps each computational step on the proper heterogeneous hardware component. TRILLI achieves between 5.32× and 36.75× speedup and between 10.04× and 104.60× energy efficiency improvement for image transformation and interpolation against the top hardware-accelerated solutions. Moreover, we integrate it with an FPGA-based similarity metric from literature to complete a rigid image registration step (i.e., transformation, interpolation, and similarity metric), attaining between 18.60× and 74.04× speedup and between 36.11× and 117.65× energy efficiency improvement over the top-performing hardware-accelerated solutions.
Software Dependencies
- Vitis 2022.1 & Vivado 2022.1: To build the different designs
- XRT 2022.1: To target the accelerator
- OpenCV-3.0.0 - Static Library: To load and store images
- Python 3.8
- GCC 7.3.1
Hardware Dependencies
- Versal VCK5000 - XDMA2022.1 - PCIe 3.0
- Intel I7-4470: Other CPUs may work as well, but are currently untested.
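Before building, it can help to confirm that the listed tools are reachable after sourcing the setup scripts. The following is a minimal sketch, not part of the repository: the executable names (`v++` for Vitis, `vivado`, `xbutil` for XRT, `gcc`, `python3`) are assumptions about a typical installation, and the check only looks at the PATH, not at the installed versions.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the dependencies listed above; adjust them if
# your installation exposes different names.
REQUIRED_TOOLS = ["v++", "vivado", "xbutil", "gcc", "python3"]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it in the same shell where the XRT and Vitis setup scripts were sourced, otherwise the tools will be reported as missing.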
The folder bitstreams/ contains the bitstreams used for the evaluation.
Alternatively, the bitstreams can be rebuilt by following the instructions in the building section.
- Clone the repository:
    git clone https://github.com/necst/trilli.git
- Move into the repository:
    cd trilli
- Source Vitis & XRT:
    source <YOUR_PATH_TO_XRT>/setup.sh
    source <YOUR_PATH_TO_VITIS>/2022.1/settings64.sh
- Build the host code for all the necessary configurations:
    ./build_hosts.sh
  This will create multiple folders under build/ containing the various configurations that should be tested.
- Move the build/ folder to the deploy machine.
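One possible way to prepare the build/ folder for the move is to pack it into a single archive first. This is a sketch under assumptions: the archive name `trilli_build` is hypothetical, and the transfer itself (for example with scp or rsync) is left to your environment.

```python
import shutil
from pathlib import Path


def archive_build(repo_root: str = ".", archive_name: str = "trilli_build") -> str:
    """Pack <repo_root>/build into <repo_root>/<archive_name>.tar.gz and return its path.

    `archive_name` is a hypothetical name chosen for this example.
    """
    root = Path(repo_root).resolve()
    if not (root / "build").is_dir():
        raise FileNotFoundError(f"no build/ folder under {root}; run ./build_hosts.sh first")
    # base_dir="build" keeps the top-level build/ folder inside the archive,
    # so extracting it on the deploy machine recreates build/ as-is.
    return shutil.make_archive(
        str(root / archive_name), "gztar", root_dir=str(root), base_dir="build"
    )


if __name__ == "__main__":
    print("Archive written to", archive_build())
```

On the deploy machine, extracting the archive with a standard tar tool restores the build/ folder used in the sections that follow.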
Note: the following operations must be performed on the deploy machine (where the build/ folder has been moved).
In figure 6, we evaluate how scaling the number of IPEs (1, 2, 4, 8, 16 and 32) impacts execution time for the geometric transformation with interpolation, for different depths (32, 64, 128, 256 and 512). The builds for this experiment are placed in subfolders under the build/ folder, named onlyTX_XXIPE, where XX is the number of IPEs. The needed builds are the following:
- build/onlyTX_01IPE
- build/onlyTX_02IPE
- build/onlyTX_04IPE
- build/onlyTX_08IPE
- build/onlyTX_16IPE
- build/onlyTX_32IPE
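The onlyTX_XXIPE naming pattern can be used to check that every needed build was moved to the deploy machine. The sketch below is illustrative only; the helper names are hypothetical and it assumes the folders sit directly under build/.

```python
from pathlib import Path
from typing import Iterable, List

IPE_COUNTS = [1, 2, 4, 8, 16, 32]


def expected_folders(ipe_counts: Iterable[int] = IPE_COUNTS) -> List[str]:
    """Folder names following the onlyTX_XXIPE pattern, with XX zero-padded to two digits."""
    return [f"onlyTX_{n:02d}IPE" for n in ipe_counts]


def missing_folders(build_dir: str = "build") -> List[str]:
    """Return the expected configuration folders that do not exist under build_dir."""
    base = Path(build_dir)
    return [name for name in expected_folders() if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_folders()
    print("Missing builds:", ", ".join(missing) if missing else "none")
```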
- Move into build/ and source XRT:
    cd build
    source <YOUR_PATH_TO_XRT>/setup.sh
- For each configuration, enter its respective folder under build/. E.g.:
    cd onlyTX_01IPE
- Run the experiment:
    ./run_scaling_depth.sh
  This will run the transformation for each depth (32, 64, 128, 256 and 512) and store the execution times in 5 different csv files.
- After all configurations have been run, each folder will contain 5 csv files, one for each depth. For plotting, the csv files need to be copied into paper_fig/figure6/csv/. To do so, launch the following commands in the build/ folder:
    cd ..
    ./gather_results_fig6.sh
- Plot figure 6:
    cd paper_fig/figure6/
    python3 figure6.py
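The per-configuration steps above can be scripted. The following is a sketch, not a script shipped with the repository: it assumes it is launched from the build/ folder with XRT already sourced, and that each run writes its csv files directly into the configuration folder. Because the csv file names are not specified here, the check only counts files.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

CONFIGS = [f"onlyTX_{n:02d}IPE" for n in (1, 2, 4, 8, 16, 32)]
EXPECTED_CSVS = 5  # one per depth: 32, 64, 128, 256 and 512


def run_all(build_dir: str = ".") -> None:
    """Run ./run_scaling_depth.sh inside every configuration folder, stopping on the first failure."""
    for name in CONFIGS:
        subprocess.run(["./run_scaling_depth.sh"], cwd=str(Path(build_dir) / name), check=True)


def csv_counts(build_dir: str = ".") -> Dict[str, int]:
    """Map each configuration folder to the number of .csv files it directly contains."""
    base = Path(build_dir)
    return {name: len(list((base / name).glob("*.csv"))) for name in CONFIGS}


def incomplete(build_dir: str = ".") -> List[str]:
    """Configurations holding fewer csv files than the five expected after a full run."""
    return [name for name, count in csv_counts(build_dir).items() if count < EXPECTED_CSVS]


if __name__ == "__main__":
    run_all()
    todo = incomplete()
    print("Incomplete configurations:", ", ".join(todo) if todo else "none")
```

If `incomplete()` reports a folder, rerunning that configuration before gathering the results avoids plotting a partial data set.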
Note: the following operations must be performed on the deploy machine (where the build/ folder has been moved).
In figure 7, we compare the execution times of the transformation only (build/onlyTX_32IPE), single registration step (build/STEP_32IPE) and complete registration application (build/3DIR_Application) against the state of the art.
- Move into build/ if you haven't done it before, and source XRT:
    cd build
    source <YOUR_PATH_TO_XRT>/setup.sh
- If you have already run the configurations for Figure 6, configuration onlyTX_32IPE has already been run and you can skip to the next step. Otherwise, enter the folder and run the tests:
    cd onlyTX_32IPE
    ./run_for_SoA_comparison.sh
    cd ..
- For the single registration step, enter the respective folder and run the tests:
    cd STEP_32IPE
    ./run_for_SoA_comparison.sh
    cd ..
- Finally, run the complete registration application:
    cd 3DIR_Application
    ./exec.sh
  Note: to get a proper dataset, contact the authors privately. Alternatively, run ./generate_dataset.sh
- Each folder will contain a csv file with the execution times. For plotting, the csv files need to be copied into paper_fig/figure7/csv/. To do so, launch the following commands in the build/ folder:
    cd ..
    ./gather_results_fig7.sh
- Plot figure 7:
    cd paper_fig/figure7/
    python3 figure7.py
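The run order for Figure 7, including the optional skip of onlyTX_32IPE, can be written down as a small plan. This is only an illustrative sketch: the function names are hypothetical, the folder and script names are taken from the steps above, and the plan is printed rather than executed so it can be reviewed first.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (folder under build/, script to run inside it)


def figure7_plan(only_tx_done: bool = False) -> List[Step]:
    """List the Figure 7 runs in order; skip onlyTX_32IPE when its results already exist."""
    steps: List[Step] = []
    if not only_tx_done:
        steps.append(("onlyTX_32IPE", "./run_for_SoA_comparison.sh"))
    steps.append(("STEP_32IPE", "./run_for_SoA_comparison.sh"))
    steps.append(("3DIR_Application", "./exec.sh"))
    return steps


def as_shell(steps: List[Step]) -> str:
    """Render the plan as shell lines; each subshell returns to build/ when it ends."""
    return "\n".join(f"(cd {folder} && {script})" for folder, script in steps)


if __name__ == "__main__":
    print(as_shell(figure7_plan(only_tx_done=False)))
```

The printed lines are meant to be run from the build/ folder after XRT has been sourced.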
Note: the following operations must be performed on the deploy machine (where the build/ folder has been moved).
This figure evaluates the correctness of the whole 3D image registration step, which aligns a transformed floating volume with a reference.
The paper_fig/figure8/data folder already contains all the material for reproducing the figure, so it can be plotted directly:
    cd build/paper_fig/figure8/
    python3 figure8.py
Alternatively, the images in the data folder can be re-created with the steps listed next. In this latter case, please contact us and we will privately send you the dataset.
- Move into build/ if you haven't done it before, and source XRT:
    cd build
    source <YOUR_PATH_TO_XRT>/setup.sh
- Apply a deformation to the floating volume:
    ./generate_distortion.sh 246 10 10 10
- Now, apply the 3D image registration step using the distorted volume as the floating volume:
    cd 3DIRG_Application
    ./exec.sh ../onlyTX_32IPE/dataset_output/
    cd ..
- To plot this figure, the first slice of the original volume, of the distorted one and of the registered one need to be copied into paper_fig/figure8/data/. To do so, launch the following command in the build/ folder:
    ./gather_images_fig8.sh
- Now it is possible to produce the plot again:
    cd paper_fig/figure8/
    python3 figure8.py