
necst/ReFHE-NTT

ReFHE-NTT: Resource-Driven NTT FPGA Architecture for Fully Homomorphic Encryption

Artifact Evaluation — FCCM 2026

This repository contains the complete artifact for reproducing the two main experimental results presented in the paper:

#   Result                              Script             Output
1   FPGA resource utilization (Table)   build_all.py       utilization.csv
2   CKKS HW vs. SW timing (Figure)      run_all_tests.py   timing_results.pdf

Pre-built FPGA bitstreams for all 10 configurations are included in Pre_Built_Bitstreams/, so Result 2 can be reproduced without any Xilinx synthesis tools — only the Kria KV260 board is needed.


Repository Structure

ReFHE-NTT/
├── src/                        # HLS source code (NTT kernel + modular arithmetic)
├── Build/
│   ├── HLS/                    # Vitis HLS project (script.tcl, directives.tcl)
│   └── VIVADO/                 # Vivado block designs
│       ├── vivado_proj.tcl     #   Mersenne template (300 MHz PL clock)
│       └── vivado_proj_b.tcl   #   Barrett  template (143 MHz PL clock)
├── Pre_Built_Bitstreams/       # Pre-compiled bitstreams for all 10 configurations
│   ├── MERSENNE/               #   NTT_M_64_{12..16}_wrapper.xsa
│   └── BARRETT/                #   NTT_B_64_{12..16}_wrapper.xsa
├── Heaan_ckks/                 # CKKS integration test (HW/SW co-design)
├── builds/                     # Output directory for newly built bitstreams
│
├── configure.sh                # Generate src/parameters.h for a configuration
├── Makefile                    # Top-level build orchestration (HLS + Vivado)
├── build_all.py                # [Result 1] Build all configs, collect utilization
├── package_test.sh             # Package test + bitstreams for device deployment
├── utilization.csv             # [Result 1] Pre-generated utilization table
└── README.md                   # This file

Design Configurations

The NTT kernel is parameterized by polynomial dimension (LOGN) and modular reduction strategy (Mersenne or Barrett):

Parameter   Values               Description
LOGN        12, 13, 14, 15, 16   log2(N), where N is the polynomial ring dimension
MODE        mersenne, barrett    Modular reduction strategy
LOGQ        64                   Prime modulus bit-width (fixed)
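For readers unfamiliar with the Barrett option, here is a textbook Barrett reduction in Python using arbitrary-precision integers. It illustrates the general technique under the assumption 0 <= x < q*q and is not the HLS kernel's implementation. The 64-bit prime 2^64 - 59 is chosen only as an example modulus, and barrett_setup / barrett_reduce are hypothetical names.

```python
from typing import Tuple


def barrett_setup(q: int) -> Tuple[int, int]:
    """Precompute k = bit length of q and mu = floor(4^k / q)."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_reduce(x: int, q: int, k: int, mu: int) -> int:
    """Return x mod q for 0 <= x < q*q using a multiply and a shift instead of a division."""
    if not 0 <= x < q * q:
        raise ValueError("x must satisfy 0 <= x < q*q")
    q_est = (x * mu) >> (2 * k)  # estimate of floor(x / q); never too large
    r = x - q_est * q            # 0 <= r < 3q
    while r >= q:                # at most two corrective subtractions
        r -= q
    return r


if __name__ == "__main__":
    q = (1 << 64) - 59           # a 64-bit prime, used only as an example modulus
    k, mu = barrett_setup(q)
    a, b = q - 2, q - 3
    print(barrett_reduce(a * b, q, k, mu))  # (-2) * (-3) mod q = 6
```

The constant mu is computed once per modulus, so each reduction costs two multiplications, a shift and at most two subtractions.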

This yields 10 configurations (5 LOGN values x 2 modes). Each mode uses a different Vivado block design with a distinct PL clock frequency:

Mode       HLS Clock Target   Vivado PL Frequency   Vivado TCL Template
Mersenne   5 ns               300 MHz               Build/VIVADO/vivado_proj.tcl
Barrett    7 ns               143 MHz               Build/VIVADO/vivado_proj_b.tcl
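The 10-configuration matrix can be enumerated with a short Python sketch. The mode-to-clock and mode-to-template mapping is copied from the table above; the design names (NTT_M_64_<LOGN>, NTT_B_64_<LOGN>) are inferred from the file names in Pre_Built_Bitstreams/ and may not match what build_all.py uses internally. configurations() is a hypothetical helper.

```python
from itertools import product

# Mode-specific build settings, copied from the table above.
MODE_SETTINGS = {
    "mersenne": {"hls_clock_ns": 5, "pl_mhz": 300, "tcl": "Build/VIVADO/vivado_proj.tcl"},
    "barrett": {"hls_clock_ns": 7, "pl_mhz": 143, "tcl": "Build/VIVADO/vivado_proj_b.tcl"},
}
LOGN_VALUES = [12, 13, 14, 15, 16]
LOGQ = 64


def configurations():
    """Yield one dict per (mode, LOGN) pair: 2 modes x 5 LOGN values = 10 configs."""
    for mode, logn in product(MODE_SETTINGS, LOGN_VALUES):
        tag = "M" if mode == "mersenne" else "B"
        yield {
            "mode": mode,
            "logn": logn,
            "n": 1 << logn,                        # ring dimension N = 2^LOGN
            "design": f"NTT_{tag}_{LOGQ}_{logn}",  # naming inferred from Pre_Built_Bitstreams/
            **MODE_SETTINGS[mode],
        }


if __name__ == "__main__":
    for cfg in configurations():
        print(cfg["design"], cfg["n"], f'{cfg["hls_clock_ns"]} ns', f'{cfg["pl_mhz"]} MHz', cfg["tcl"])
```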

Dependencies

Host Machine (Result 1 — building bitstreams from source)

Dependency             Version   Notes
AMD/Xilinx Vitis HLS   2023.2    HLS synthesis and IP export
AMD/Xilinx Vivado      2024.2    Block design, place & route
Python                 >= 3.6    Runs build_all.py
GNU Make               any       Build orchestration
bash                   any       configure.sh and Makefile recipes
Linux x86_64           any       At least 16 GB RAM recommended

Building all 10 configurations takes several hours. Because a pre-generated utilization.csv and pre-built bitstreams are provided, building from source is optional.

Kria KV260 Device (Result 2 — timing measurements)

Dependency             Version            Notes
PYNQ SD image          3.0+               Base OS for the Kria board
XRT (Xilinx Runtime)   included in PYNQ   FPGA programming and buffer management
g++                    C++17 support      Compiles the test binary on-board
Python 3               >= 3.6             Runs run_all_tests.py
matplotlib             any                pip3 install matplotlib
numpy                  any                pip3 install numpy
sudo access            -                  Required for FPGA device access

Configurable Paths

Several scripts reference environment-specific paths via Make variables. All have defaults that match common installation layouts, but must be overridden if your setup differs.

Host machine (top-level Makefile, used by build_all.py):

Variable             Default                                        Purpose
VITIS_HLS_SETTINGS   /home/xilinx/Vitis_HLS/2023.2/settings64.sh   Vitis HLS environment
VIVADO_SETTINGS      /home/xilinx/Vivado/2024.2/settings64.sh      Vivado environment

Override from the command line or environment:

make VITIS_HLS_SETTINGS=/opt/Xilinx/Vitis_HLS/2023.2/settings64.sh \
     VIVADO_SETTINGS=/opt/Xilinx/Vivado/2024.2/settings64.sh \
     LOGN=16 LOGQ=64 all

Or export before running build_all.py:

export VITIS_HLS_SETTINGS=/opt/Xilinx/Vitis_HLS/2023.2/settings64.sh
export VIVADO_SETTINGS=/opt/Xilinx/Vivado/2024.2/settings64.sh
python3 build_all.py

Kria device (heaan_test/Makefile, generated by package_test.sh):

Variable    Default                                                          Purpose
XRT_SETUP   /home/ubuntu/Kria-PYNQ/pynq/sdbuild/packages/xrt/xrt_setup.sh   XRT runtime environment
PYNQ_VENV   /usr/local/share/pynq-venv/bin/activate                         PYNQ Python virtual environment

If your PYNQ image uses different paths, override when running:

make XRT_SETUP=/path/to/xrt_setup.sh PYNQ_VENV=/path/to/pynq-venv/bin/activate \
     LOGN=16 MODE=mersenne run

The scripts these variables point to are sourced with 2>/dev/null, so if the files do not exist but XRT is already in your PATH, the build and run will still succeed.
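Before running make on the board, a quick pre-flight check can report which of the two setup scripts actually exist. This is a hypothetical helper, not part of the artifact: check_device_paths and its overrides argument are illustrative names, and the xbutil lookup assumes that an XRT installation on PATH exposes that tool.

```python
import os
import shutil

# Default locations from the table above (Kria-PYNQ image layout).
DEFAULTS = {
    "XRT_SETUP": "/home/ubuntu/Kria-PYNQ/pynq/sdbuild/packages/xrt/xrt_setup.sh",
    "PYNQ_VENV": "/usr/local/share/pynq-venv/bin/activate",
}


def check_device_paths(overrides=None):
    """Return {variable: (path, exists)}, applying any overrides to the defaults."""
    paths = dict(DEFAULTS, **(overrides or {}))
    return {var: (path, os.path.isfile(path)) for var, path in paths.items()}


if __name__ == "__main__":
    for var, (path, ok) in check_device_paths().items():
        print(f"{var:10s} {'found' if ok else 'MISSING':8s} {path}")
    # Assumption: an XRT install that is already on PATH exposes the xbutil tool.
    print("xbutil on PATH:", shutil.which("xbutil") is not None)
```

If both scripts are reported missing but xbutil is found, the note above suggests make run should still work without overrides.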


Result 1: FPGA Resource Utilization

This result produces a CSV table with LUT, FF, BRAM, DSP, and URAM counts for each of the 10 NTT configurations.

Option A: Verify Pre-Generated Results

A pre-generated utilization.csv is included at the repository root:

cat utilization.csv

Expected columns: Design, Mode, LOGN, LOGQ, LUTs, FF, BRAM, DSP, URAM
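To sanity-check utilization.csv programmatically, a minimal sketch with Python's csv module follows. It assumes the header matches the expected columns exactly and in this order; load_utilization is a hypothetical helper name.

```python
import csv
import os

EXPECTED_COLUMNS = ["Design", "Mode", "LOGN", "LOGQ", "LUTs", "FF", "BRAM", "DSP", "URAM"]


def load_utilization(path="utilization.csv"):
    """Read the utilization table, failing loudly if the header is unexpected."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        return list(reader)


if __name__ == "__main__":
    if os.path.exists("utilization.csv"):
        rows = load_utilization()
        print(f"{len(rows)} rows (10 configurations expected)")
        for row in rows:
            print(row["Design"], "LUTs:", row["LUTs"], "DSP:", row["DSP"], "BRAM:", row["BRAM"])
    else:
        print("utilization.csv not found; run this from the repository root")
```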

Option B: Rebuild from Source

Step 1. Set Xilinx tool paths (adjust for your installation):

export VITIS_HLS_SETTINGS=/path/to/Vitis_HLS/2023.2/settings64.sh
export VIVADO_SETTINGS=/path/to/Vivado/2024.2/settings64.sh

Step 2. Preview the build plan (dry run):

python3 build_all.py --dry-run

This prints all 10 configurations without building.

Step 3. Build all configurations and collect utilization:

python3 build_all.py

For each of the 10 configurations the script:

  1. Calls configure.sh to generate src/parameters.h with the correct defines (MERSENNE_NTT or BARRETT_RED, LOGN, LOGp, etc.)
  2. Copies sources into Build/HLS/ and runs Vitis HLS with the mode-appropriate clock period (5 ns for Mersenne, 7 ns for Barrett)
  3. Substitutes the design name into the matching Vivado TCL template (vivado_proj.tcl at 300 MHz for Mersenne, vivado_proj_b.tcl at 143 MHz for Barrett) and runs place & route
  4. Parses the Vivado post-place utilization report (*_wrapper_utilization_placed.rpt)
  5. Writes one row to utilization.csv
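Step 4 can be approximated with the parser below. It is a sketch that assumes the UltraScale+ layout of Vivado's placed utilization report: pipe-delimited tables whose first column holds row labels such as CLB LUTs, CLB Registers, Block RAM Tile, URAM and DSPs, and whose second column holds the Used count. The actual parsing in build_all.py may differ; parse_utilization and the LABELS mapping are illustrative.

```python
# Row labels assumed from UltraScale+ Vivado utilization reports.
LABELS = {
    "LUTs": "CLB LUTs",
    "FF": "CLB Registers",
    "BRAM": "Block RAM Tile",
    "URAM": "URAM",
    "DSP": "DSPs",
}


def parse_utilization(report_text):
    """Return the 'Used' value of each resource in LABELS.

    Only the first row carrying each label is kept, so a label repeated later
    in the report does not overwrite the summary value.
    """
    found = {}
    for line in report_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders, section titles and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        label = cells[0].rstrip("*").strip()  # some labels carry a trailing '*'
        for key, wanted in LABELS.items():
            if key not in found and label == wanted:
                found[key] = float(cells[1])  # Block RAM tiles may be fractional
    return found


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:  # e.g. a *_wrapper_utilization_placed.rpt file
        with open(sys.argv[1]) as f:
            print(parse_utilization(f.read()))
```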

Built bitstreams (.xsa files) are saved to builds/. At the end, the script prints the full utilization table to the terminal.

Step 4. To build a single configuration manually:

make LOGN=14 LOGQ=64 all

Result 2: CKKS HW vs. SW Timing Comparison

This result measures the end-to-end timing of NTT, Encode, and Encrypt operations using the FPGA accelerator (HW) versus a pure software implementation (SW). The test runs all 10 configurations (5 LOGN x 2 modes), each repeated 50 times by default, and produces a grouped bar chart.

Pre-built bitstreams are included so that this result can be reproduced without rebuilding — only the Kria KV260 board is needed.

Step 1: Package the Test for the Device

On the host machine, run:

./package_test.sh

By default, this packages the bitstreams from builds/ — i.e., the ones generated in Result 1. If builds/ is empty or you want to skip Result 1 entirely, use --prebuilt to package the provided pre-built bitstreams instead:

./package_test.sh --prebuilt

Either way, heaan_test.zip is created containing:

  • CKKS test source code (from Heaan_ckks/)
  • All 10 bitstreams (Mersenne + Barrett, LOGN 12..16)
  • A self-contained Makefile with configurable XRT_SETUP and PYNQ_VENV paths
  • The run_all_tests.py automation script
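To confirm that the archive contains all ten bitstreams before copying it to the board, the sketch below lists any missing ones. It assumes the .xsa files sit under bitstreams/MERSENNE/ and bitstreams/BARRETT/ (the layout the Troubleshooting section checks on the device) and follow the NTT_{M,B}_64_{LOGN}_wrapper.xsa naming from Pre_Built_Bitstreams/. Matching by suffix tolerates a top-level heaan_test/ folder. missing_bitstreams is a hypothetical helper.

```python
import os
import zipfile

MODES = {"MERSENNE": "M", "BARRETT": "B"}
LOGN_VALUES = range(12, 17)


def expected_bitstreams():
    """Bitstream paths following the naming used in Pre_Built_Bitstreams/."""
    return {
        f"bitstreams/{mode}/NTT_{tag}_64_{logn}_wrapper.xsa"
        for mode, tag in MODES.items()
        for logn in LOGN_VALUES
    }


def missing_bitstreams(zip_path="heaan_test.zip"):
    """Return the expected .xsa files that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    return sorted(
        want for want in expected_bitstreams()
        if not any(name.endswith(want) for name in names)
    )


if __name__ == "__main__":
    if os.path.exists("heaan_test.zip"):
        print(missing_bitstreams() or "all 10 bitstreams present")
```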

Step 2: Deploy to the Kria KV260

Transfer the zip to the board and extract it:

scp heaan_test.zip ubuntu@<kria-ip>:~/Desktop/
ssh ubuntu@<kria-ip>
cd ~/Desktop
unzip heaan_test.zip
cd heaan_test

Step 3: Install Python Dependencies

pip3 install matplotlib numpy

Step 4: Verify Device Paths

Check that the XRT and PYNQ paths in the Makefile match your board. The defaults assume a standard Kria-PYNQ image:

make help    # shows current variable values and available bitstreams

If your paths differ, either edit the Makefile or override on every make / run_all_tests.py invocation (see Configurable Paths).

Step 5: Run All Tests

sudo python3 run_all_tests.py

This script automatically:

  1. Iterates over all 10 configurations (LOGN=12..16, MODE=mersenne+barrett)
  2. For each configuration:
    • Recompiles the test binary with the correct -DLOGN, -DCSV_PATH, -DHwXSA_PATH, and -DMM_WIDTH_64 (Mersenne only) flags
    • Loads the corresponding bitstream onto the FPGA
    • Runs a correctness check (HW decode must match SW decode)
    • Executes 50 timed runs measuring NTT, Encode, and Encrypt (HW and SW)
    • Writes per-configuration results to timing_{mode}_{logn}.csv
  3. Aggregates all results into results_summary.csv
  4. Generates timing_results.pdf and timing_results.png
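The per-configuration recompilation in step 2 can be pictured as building a list of -D flags. The sketch is hypothetical: the macro names come from the list above, but the CSV and .xsa path values, and the assumption that CSV_PATH and HwXSA_PATH are expected as C string literals, are illustrative.

```python
def compile_flags(mode, logn, logq=64):
    """Return hypothetical -D flags for one configuration (macro names from the list above)."""
    tag = {"mersenne": "M", "barrett": "B"}[mode]
    csv_path = f"timing_{mode}_{logn}.csv"
    xsa_path = f"bitstreams/{mode.upper()}/NTT_{tag}_{logq}_{logn}_wrapper.xsa"
    flags = [
        f"-DLOGN={logn}",
        f'-DCSV_PATH="{csv_path}"',    # assumed to be used as a C string literal
        f'-DHwXSA_PATH="{xsa_path}"',
    ]
    if mode == "mersenne":
        flags.append("-DMM_WIDTH_64")  # defined for Mersenne builds only
    return flags


if __name__ == "__main__":
    for mode in ("mersenne", "barrett"):
        print(" ".join(compile_flags(mode, 16)))
```

When such a list is passed to g++ through subprocess.run without a shell, the embedded double quotes reach the preprocessor unchanged; typed into a shell command, they would need escaping.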

Options:

# Fewer runs per configuration (faster, less precise)
sudo python3 run_all_tests.py --nruns 10

# Regenerate the plot from existing CSV files (no FPGA needed)
python3 run_all_tests.py --plot-only

Step 6 (optional): Run a Single Configuration Manually

# Compile and run Mersenne LOGN=16 with 50 runs
make LOGN=16 MODE=mersenne run

# Compile and run Barrett LOGN=14 with 20 runs
make LOGN=14 MODE=barrett NRUNS=20 run

The run target requires sudo for FPGA access and automatically sources the XRT and PYNQ environments.

Output Files

File                       Description
timing_{mode}_{logn}.csv   Per-run timing for one configuration (microseconds)
results_summary.csv        Aggregated averages across all configurations (milliseconds)
timing_results.pdf         Grouped bar chart (publication quality)
timing_results.png         Same chart in PNG format
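results_summary.csv reports averages in milliseconds, while the per-configuration files hold per-run timings in microseconds. A minimal sketch of that aggregation for one file is below. It assumes every column in timing_{mode}_{logn}.csv is numeric, since the exact column names written by the test binary are not documented here; summarize is a hypothetical name.

```python
import csv
from statistics import mean


def summarize(per_run_csv):
    """Average each column of a per-run timing CSV and convert microseconds to milliseconds.

    Assumes a header row followed by one row per run, with only numeric columns.
    """
    with open(per_run_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{per_run_csv} contains no runs")
    return {col: mean(float(row[col]) for row in rows) / 1000.0 for col in rows[0]}


if __name__ == "__main__":
    import glob

    for path in sorted(glob.glob("timing_*_*.csv")):
        print(path, summarize(path))
```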

Troubleshooting

Build fails with "Mersenne not available"

Mersenne reduction requires pre-computed shift-based multipliers, available only for LOGQ in {52, 55, 60, 61, 63, 64}. For other bit-widths, only Barrett mode is built.
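The LOGQ restriction above amounts to a simple lookup. Here is a hypothetical helper (available_modes is not part of the artifact) that a build script could use to decide which modes to attempt:

```python
# LOGQ values for which shift-based Mersenne multipliers are pre-computed.
MERSENNE_LOGQ = frozenset({52, 55, 60, 61, 63, 64})


def available_modes(logq):
    """Return the reduction modes that can be built for a prime of logq bits."""
    modes = ["barrett"]
    if logq in MERSENNE_LOGQ:
        modes.insert(0, "mersenne")
    return modes


if __name__ == "__main__":
    for logq in (50, 60, 64):
        print(logq, available_modes(logq))
```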

"Bitstream not found" on the device

Verify that bitstreams/MERSENNE/ and bitstreams/BARRETT/ contain the .xsa files. Re-run ./package_test.sh on the host if they are missing.

XRT / PYNQ environment not found

If make run fails to find XRT or PYNQ, check that the paths are correct:

ls /home/ubuntu/Kria-PYNQ/pynq/sdbuild/packages/xrt/xrt_setup.sh
ls /usr/local/share/pynq-venv/bin/activate

Override via make XRT_SETUP=... PYNQ_VENV=... run if they differ. If XRT is already in your PATH (e.g., sourced in .bashrc), the defaults will work even if the script paths do not exist.

Stack overflow / segfault on the device

Large polynomial dimensions (LOGN=15 and 16) require a large stack. The Makefile sets ulimit -s 1000000 automatically. If you run the binary manually, raise the limit yourself:

sudo bash -c 'ulimit -s 1000000; ./main'
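The same limit can be applied from Python when launching the binary, for example from a custom driver script. This sketch relies on ulimit -s counting in KiB while resource.setrlimit expects bytes, and on bash's ulimit setting both soft and hard limits when neither -S nor -H is given. Raising the hard limit normally requires root, which matches the sudo requirement. run_with_big_stack is a hypothetical helper.

```python
import resource
import subprocess

STACK_KIB = 1000000  # same value as `ulimit -s 1000000`, which is expressed in KiB


def raise_stack_limit(kib=STACK_KIB):
    """Set soft and hard stack limits, like bash's `ulimit -s <kib>`."""
    limit = kib * 1024  # setrlimit works in bytes
    resource.setrlimit(resource.RLIMIT_STACK, (limit, limit))


def run_with_big_stack(cmd=("./main",), kib=STACK_KIB):
    """Run cmd in a child process whose stack limit is set just before exec.

    Raising the hard limit usually needs root, so launch this under sudo.
    """
    return subprocess.run(list(cmd), preexec_fn=lambda: raise_stack_limit(kib), check=True)
```

Setting the limit in preexec_fn affects only the child, so the calling Python process keeps its own stack limit.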

matplotlib font warnings

The plotting script uses serif fonts with a fallback chain (DejaVu Serif, Palatino, Times New Roman). If LaTeX rendering is not available (no texlive installed), it falls back to standard matplotlib fonts automatically. No additional font packages are required.

Plot shows only some configurations

Ensure all timing_{mode}_{logn}.csv files exist before running --plot-only. Missing CSVs mean the corresponding tests did not complete.
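To see which configurations are missing before re-running with --plot-only, a small check over the ten expected file names is enough. missing_timing_csvs is a hypothetical helper that assumes the CSVs sit in the current directory by default.

```python
import os

MODES = ("mersenne", "barrett")
LOGN_VALUES = range(12, 17)


def missing_timing_csvs(directory="."):
    """List the timing_{mode}_{logn}.csv files that --plot-only needs but cannot find."""
    expected = [f"timing_{mode}_{logn}.csv" for mode in MODES for logn in LOGN_VALUES]
    return [name for name in expected if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_timing_csvs()
    print("missing:", ", ".join(missing) if missing else "none")
```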

Reference

@inproceedings{guerrini2026refhe,
  title     = {ReFHE-NTT: Resource-Driven NTT FPGA Architecture for Fully Homomorphic Encryption},
  author    = {Guerrini, Valentino and Sorrentino, Giuseppe and Barenghi, Alessandro and Conficconi, Davide},
  booktitle = {Proceedings of the 34th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
  year      = {2026},
  organization = {IEEE},
  note      = {To appear}
}
