ReFHE-NTT: Resource-Driven NTT FPGA Architecture for Fully Homomorphic Encryption

Artifact Evaluation — FCCM 2026

This repository contains the complete artifact for reproducing the two main experimental results presented in the paper:

#	Result	Script	Output
1	FPGA resource utilization (Table)	`build_all.py`	`utilization.csv`
2	CKKS HW vs. SW timing (Figure)	`run_all_tests.py`	`timing_results.pdf`

Pre-built FPGA bitstreams for all 10 configurations are included in Pre_Built_Bitstreams/, so Result 2 can be reproduced without any Xilinx synthesis tools — only the Kria KV260 board is needed.

Repository Structure

ReFHE-NTT/
├── src/                        # HLS source code (NTT kernel + modular arithmetic)
├── Build/
│   ├── HLS/                    # Vitis HLS project (script.tcl, directives.tcl)
│   └── VIVADO/                 # Vivado block designs
│       ├── vivado_proj.tcl     #   Mersenne template (300 MHz PL clock)
│       └── vivado_proj_b.tcl   #   Barrett  template (143 MHz PL clock)
├── Pre_Built_Bitstreams/       # Pre-compiled bitstreams for all 10 configurations
│   ├── MERSENNE/               #   NTT_M_64_{12..16}_wrapper.xsa
│   └── BARRETT/                #   NTT_B_64_{12..16}_wrapper.xsa
├── Heaan_ckks/                 # CKKS integration test (HW/SW co-design)
├── builds/                     # Output directory for newly built bitstreams
│
├── configure.sh                # Generate src/parameters.h for a configuration
├── Makefile                    # Top-level build orchestration (HLS + Vivado)
├── build_all.py                # [Result 1] Build all configs, collect utilization
├── package_test.sh             # Package test + bitstreams for device deployment
├── utilization.csv             # [Result 1] Pre-generated utilization table
└── README.md                   # This file

Design Configurations

The NTT kernel is parameterized by polynomial dimension (LOGN) and modular reduction strategy (Mersenne or Barrett):

Parameter	Values	Description
LOGN	12, 13, 14, 15, 16	log2(N), polynomial ring dimension
MODE	`mersenne`, `barrett`	Modular reduction strategy
LOGQ	64	Prime bit-width (fixed)

This yields 10 configurations (5 LOGN values x 2 modes). Each mode uses a different Vivado block design with a distinct PL clock frequency:

Mode	HLS Clock Target	Vivado PL Frequency	Vivado TCL Template
Mersenne	5 ns	300 MHz	`Build/VIVADO/vivado_proj.tcl`
Barrett	7 ns	143 MHz	`Build/VIVADO/vivado_proj_b.tcl`
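
The configuration space above can be enumerated programmatically. The following is a minimal sketch, not code from the repository: the design-name pattern `NTT_{M|B}_64_{LOGN}` is inferred from the file names in Pre_Built_Bitstreams/, and the function and dictionary names are hypothetical.

```python
from itertools import product

LOGQ = 64                    # fixed prime bit-width
LOGN_VALUES = range(12, 17)  # LOGN = 12..16

# Per-mode settings copied from the table above; the "tag" letter is
# inferred from the pre-built bitstream file names (assumption).
MODES = {
    "mersenne": {"tag": "M", "hls_clock_ns": 5, "pl_mhz": 300,
                 "tcl": "Build/VIVADO/vivado_proj.tcl"},
    "barrett": {"tag": "B", "hls_clock_ns": 7, "pl_mhz": 143,
                "tcl": "Build/VIVADO/vivado_proj_b.tcl"},
}


def enumerate_configs():
    """Return one dict per (mode, LOGN) pair, 10 in total."""
    configs = []
    for mode, logn in product(MODES, LOGN_VALUES):
        settings = MODES[mode]
        configs.append({
            "design": f"NTT_{settings['tag']}_{LOGQ}_{logn}",
            "mode": mode,
            "logn": logn,
            "hls_clock_ns": settings["hls_clock_ns"],
            "pl_mhz": settings["pl_mhz"],
            "tcl": settings["tcl"],
        })
    return configs


if __name__ == "__main__":
    for cfg in enumerate_configs():
        print(cfg["design"], cfg["mode"], cfg["pl_mhz"], "MHz")
```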

Dependencies

Host Machine (Result 1 — building bitstreams from source)

Dependency	Version	Notes
AMD/Xilinx Vitis HLS	2023.2	HLS synthesis and IP export
AMD/Xilinx Vivado	2024.2	Block design, place & route
Python	>= 3.6	Runs `build_all.py`
GNU Make	any	Build orchestration
bash	any	`configure.sh` and Makefile recipes
Linux x86_64	any	At least 16 GB RAM recommended

Building all 10 configurations takes several hours. A pre-generated utilization.csv and pre-built bitstreams are provided, so neither result requires a full rebuild.

Kria KV260 Device (Result 2 — timing measurements)

Dependency	Version	Notes
PYNQ SD image	3.0+	Base OS for the Kria board
XRT (Xilinx Runtime)	included in PYNQ	FPGA programming and buffer management
g++	C++17 support	Compiles the test binary on-board
Python 3	>= 3.6	Runs `run_all_tests.py`
matplotlib	any	`pip3 install matplotlib`
numpy	any	`pip3 install numpy`
sudo access	—	Required for FPGA device access

Configurable Paths

Several scripts reference environment-specific paths via Make variables. All have defaults that match common installation layouts; override them if your setup differs.

Host machine (top-level Makefile, used by build_all.py):

Variable	Default	Purpose
`VITIS_HLS_SETTINGS`	`/home/xilinx/Vitis_HLS/2023.2/settings64.sh`	Vitis HLS environment
`VIVADO_SETTINGS`	`/home/xilinx/Vivado/2024.2/settings64.sh`	Vivado environment

Override from the command line or environment:

make VITIS_HLS_SETTINGS=/opt/Xilinx/Vitis_HLS/2023.2/settings64.sh \
     VIVADO_SETTINGS=/opt/Xilinx/Vivado/2024.2/settings64.sh \
     LOGN=16 LOGQ=64 all

Or export before running build_all.py:

export VITIS_HLS_SETTINGS=/opt/Xilinx/Vitis_HLS/2023.2/settings64.sh
export VIVADO_SETTINGS=/opt/Xilinx/Vivado/2024.2/settings64.sh
python3 build_all.py
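
Before starting a multi-hour build, it may help to confirm that both settings scripts resolve to real files. The sketch below is an illustration only and is not part of build_all.py; the helper names are hypothetical, and it assumes the Makefile honors the two environment variables as described above.

```python
import os
from pathlib import Path

# Defaults copied from the table above.
DEFAULTS = {
    "VITIS_HLS_SETTINGS": "/home/xilinx/Vitis_HLS/2023.2/settings64.sh",
    "VIVADO_SETTINGS": "/home/xilinx/Vivado/2024.2/settings64.sh",
}


def resolve_settings(env=None):
    """Return the effective path for each variable (environment wins over default)."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


def missing_settings(env=None):
    """List 'NAME=path' entries whose settings script does not exist."""
    return [f"{name}={path}"
            for name, path in resolve_settings(env).items()
            if not Path(path).is_file()]


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        print("Settings scripts not found:")
        for entry in problems:
            print("  " + entry)
    else:
        print("Both settings scripts found.")
```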

Kria device (heaan_test/Makefile, generated by package_test.sh):

Variable	Default	Purpose
`XRT_SETUP`	`/home/ubuntu/Kria-PYNQ/pynq/sdbuild/packages/xrt/xrt_setup.sh`	XRT runtime environment
`PYNQ_VENV`	`/usr/local/share/pynq-venv/bin/activate`	PYNQ Python virtual environment

If your PYNQ image uses different paths, override when running:

make XRT_SETUP=/path/to/xrt_setup.sh PYNQ_VENV=/path/to/pynq-venv/bin/activate \
     LOGN=16 MODE=mersenne run

The scripts these variables point to are sourced with errors suppressed (2>/dev/null), so if the files do not exist but XRT is already in your PATH, the build and run will still succeed.

Result 1: FPGA Resource Utilization

This result produces a CSV table with LUT, FF, BRAM, DSP, and URAM counts for each of the 10 NTT configurations.

Option A: Verify Pre-Generated Results

A pre-generated utilization.csv is included at the repository root:

cat utilization.csv

Expected columns: Design, Mode, LOGN, LOGQ, LUTs, FF, BRAM, DSP, URAM
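
The header and row count can also be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the file is comma-delimited, the header matches the column list above in the same order, and there is one row per configuration. The function name is hypothetical.

```python
import csv

EXPECTED_COLUMNS = ["Design", "Mode", "LOGN", "LOGQ",
                    "LUTs", "FF", "BRAM", "DSP", "URAM"]


def check_utilization(stream, expected_rows=10):
    """Return a list of problems found in a utilization CSV (empty if none)."""
    reader = csv.DictReader(stream)
    header = [name.strip() for name in (reader.fieldnames or [])]
    problems = []
    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    return problems
```

For example, `check_utilization(open("utilization.csv", newline=""))` returns an empty list when the file has the expected shape.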

Option B: Rebuild from Source

Step 1. Set Xilinx tool paths (adjust for your installation):

export VITIS_HLS_SETTINGS=/path/to/Vitis_HLS/2023.2/settings64.sh
export VIVADO_SETTINGS=/path/to/Vivado/2024.2/settings64.sh

Step 2. Preview the build plan (dry run):

python3 build_all.py --dry-run

This prints all 10 configurations without building.

Step 3. Build all configurations and collect utilization:

python3 build_all.py

For each of the 10 configurations the script:

1. Calls configure.sh to generate src/parameters.h with the correct defines (MERSENNE_NTT or BARRETT_RED, LOGN, LOGp, etc.)
2. Copies sources into Build/HLS/ and runs Vitis HLS with the mode-appropriate clock period (5 ns for Mersenne, 7 ns for Barrett)
3. Substitutes the design name into the matching Vivado TCL template (vivado_proj.tcl at 300 MHz for Mersenne, vivado_proj_b.tcl at 143 MHz for Barrett) and runs place & route
4. Parses the Vivado post-place utilization report (*_wrapper_utilization_placed.rpt)
5. Writes one row to utilization.csv

Built bitstreams (.xsa files) are saved to builds/. At the end, the script prints the full utilization table to the terminal.
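
To illustrate the report-parsing step, the sketch below extracts the "Used" column from pipe-delimited rows of a Vivado utilization report. It is not the parser used by build_all.py. The row labels (`CLB LUTs`, `CLB Registers`, `Block RAM Tile`, `DSPs`, `URAM`) are an assumption based on typical UltraScale+ reports and may need adjusting for your Vivado version.

```python
import re

# CSV column -> row label in the report (labels are an assumption).
ROW_LABELS = {
    "LUTs": "CLB LUTs",
    "FF": "CLB Registers",
    "BRAM": "Block RAM Tile",
    "DSP": "DSPs",
    "URAM": "URAM",
}


def parse_utilization(report_text):
    """Return {column: used count or None} from the text of a utilization report."""
    result = {}
    for column, label in ROW_LABELS.items():
        # Match "| <label> | <used> |" at the start of a line; the label may
        # carry a trailing '*' footnote marker in some reports.
        pattern = re.compile(
            r"^\|\s*" + re.escape(label) + r"\*?\s*\|\s*([0-9]+(?:\.[0-9]+)?)\s*\|",
            re.MULTILINE,
        )
        match = pattern.search(report_text)
        result[column] = float(match.group(1)) if match else None
    return result
```

Values are returned as floats because block RAM usage can be fractional (half tiles); a label that is not found maps to `None` rather than raising.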

Step 4. To build a single configuration manually:

make LOGN=14 LOGQ=64 all

Result 2: CKKS HW vs. SW Timing Comparison

This result measures the end-to-end timing of NTT, Encode, and Encrypt operations using the FPGA accelerator (HW) versus a pure software implementation (SW). The test runs all 10 configurations (5 LOGN x 2 modes), each repeated 50 times, and produces a grouped bar chart.

Pre-built bitstreams are included so that this result can be reproduced without rebuilding — only the Kria KV260 board is needed.

Step 1: Package the Test for the Device

On the host machine, run:

./package_test.sh

By default, this packages the bitstreams from builds/ — i.e., the ones generated in Result 1. If builds/ is empty or you want to skip Result 1 entirely, use --prebuilt to package the provided pre-built bitstreams instead:

./package_test.sh --prebuilt

Either way, heaan_test.zip is created containing:

- CKKS test source code (from Heaan_ckks/)
- All 10 bitstreams (Mersenne + Barrett, LOGN 12..16)
- A self-contained Makefile with configurable XRT_SETUP and PYNQ_VENV paths
- The run_all_tests.py automation script
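
After extracting the package, a quick check can confirm that all 10 bitstreams are present. This is a sketch under assumptions: it expects the packaged files to sit under bitstreams/MERSENNE/ and bitstreams/BARRETT/ (the layout named in Troubleshooting) and to keep the `NTT_{M|B}_64_{LOGN}_wrapper.xsa` names used in Pre_Built_Bitstreams/. The function names are hypothetical.

```python
from pathlib import Path


def expected_bitstreams(logq=64, logn_values=range(12, 17)):
    """Relative paths of the 10 bitstreams, Mersenne first, then Barrett."""
    return [Path(folder) / f"NTT_{tag}_{logq}_{logn}_wrapper.xsa"
            for folder, tag in (("MERSENNE", "M"), ("BARRETT", "B"))
            for logn in logn_values]


def missing_bitstreams(root):
    """Return the expected bitstream paths that do not exist under root."""
    root = Path(root)
    return [str(rel) for rel in expected_bitstreams()
            if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_bitstreams("bitstreams")
    print("missing:", missing if missing else "none")
```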

Step 2: Deploy to the Kria KV260

Transfer the zip to the board and extract it:

scp heaan_test.zip ubuntu@<kria-ip>:~/Desktop/
ssh ubuntu@<kria-ip>
cd ~/Desktop
unzip heaan_test.zip
cd heaan_test

Step 3: Install Python Dependencies

pip3 install matplotlib numpy

Step 4: Verify Device Paths

Check that the XRT and PYNQ paths in the Makefile match your board. The defaults assume a standard Kria-PYNQ image:

make help    # shows current variable values and available bitstreams

If your paths differ, either edit the Makefile or override on every make / run_all_tests.py invocation (see Configurable Paths).

Step 5: Run All Tests

sudo python3 run_all_tests.py

This script automatically:

1. Iterates over all 10 configurations (LOGN=12..16, MODE=mersenne and barrett)
2. For each configuration:
   - Recompiles the test binary with the correct -DLOGN, -DCSV_PATH, -DHwXSA_PATH, and -DMM_WIDTH_64 (Mersenne only) flags
   - Loads the corresponding bitstream onto the FPGA
   - Runs a correctness check (HW decode must match SW decode)
   - Executes 50 timed runs measuring NTT, Encode, and Encrypt (HW and SW)
   - Writes per-configuration results to timing_{mode}_{logn}.csv
3. Aggregates all results into results_summary.csv
4. Generates timing_results.pdf and timing_results.png
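
The per-configuration recompilation can be pictured as assembling a different set of preprocessor flags for each (LOGN, mode) pair. The sketch below is illustrative and is not taken from run_all_tests.py: the macro names come from the list above, but the values passed to them (string quoting, CSV name, bitstream path) are assumptions.

```python
def compile_flags(logn, mode):
    """Build the -D flags for one configuration (values are assumptions)."""
    tag = {"mersenne": "M", "barrett": "B"}[mode]
    csv_path = f"timing_{mode}_{logn}.csv"
    # Assumed bitstream location and naming inside the extracted package.
    xsa_path = f"bitstreams/{mode.upper()}/NTT_{tag}_64_{logn}_wrapper.xsa"
    flags = [
        f"-DLOGN={logn}",
        f'-DCSV_PATH="{csv_path}"',    # string macro, quoted for the preprocessor
        f'-DHwXSA_PATH="{xsa_path}"',
    ]
    if mode == "mersenne":
        flags.append("-DMM_WIDTH_64")  # Mersenne builds only
    return flags


if __name__ == "__main__":
    for mode in ("mersenne", "barrett"):
        print(mode, " ".join(compile_flags(16, mode)))
```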

Options:

# Fewer runs per configuration (faster, less precise)
sudo python3 run_all_tests.py --nruns 10

# Regenerate the plot from existing CSV files (no FPGA needed)
python3 run_all_tests.py --plot-only

Step 6 (optional): Run a Single Configuration Manually

# Compile and run Mersenne LOGN=16 with 50 runs
make LOGN=16 MODE=mersenne run

# Compile and run Barrett LOGN=14 with 20 runs
make LOGN=14 MODE=barrett NRUNS=20 run

The run target requires sudo for FPGA access and automatically sources the XRT and PYNQ environments.

Output Files

File	Description
`timing_{mode}_{logn}.csv`	Per-run timing for one configuration (microseconds)
`results_summary.csv`	Aggregated averages across all configurations (milliseconds)
`timing_results.pdf`	Grouped bar chart (publication quality)
`timing_results.png`	Same chart in PNG format
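
Note the unit change between the per-run files (microseconds) and the summary (milliseconds). A minimal sketch of that aggregation follows, assuming each per-configuration CSV has a header row and one numeric column per measured quantity. The actual column names are not documented here, so the code averages whatever numeric columns it finds; the function name is hypothetical.

```python
import csv
from statistics import mean


def average_ms(stream):
    """Average every numeric column of a per-run CSV, converting us to ms."""
    columns = {}
    for row in csv.DictReader(stream):
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric or missing cells
            columns.setdefault(name, []).append(number)
    # 1 ms = 1000 us
    return {name: mean(values) / 1000.0 for name, values in columns.items()}
```

For example, `average_ms(open("timing_mersenne_16.csv", newline=""))` would return one averaged value in milliseconds per numeric column.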

Troubleshooting

Build fails with "Mersenne not available": Mersenne reduction requires pre-computed shift-based multipliers, available only for LOGQ in {52, 55, 60, 61, 63, 64}. For other bit-widths, only Barrett mode is built.
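
The rule above amounts to a simple membership test. A minimal sketch (the function and constant names are hypothetical, not from the build scripts):

```python
# Bit-widths for which Mersenne reduction is available, as listed above.
MERSENNE_LOGQ = frozenset({52, 55, 60, 61, 63, 64})


def modes_for_logq(logq):
    """Return the reduction modes that can be built for a given LOGQ."""
    modes = ["barrett"]  # Barrett is built for every bit-width
    if logq in MERSENNE_LOGQ:
        modes.insert(0, "mersenne")
    return modes


if __name__ == "__main__":
    for logq in (58, 64):
        print(logq, modes_for_logq(logq))
```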

"Bitstream not found" on the device: Verify that bitstreams/MERSENNE/ and bitstreams/BARRETT/ contain the .xsa files. Re-run ./package_test.sh on the host if they are missing.

XRT / PYNQ environment not found: If make run fails to find XRT or PYNQ, check that the default paths exist:

ls /home/ubuntu/Kria-PYNQ/pynq/sdbuild/packages/xrt/xrt_setup.sh
ls /usr/local/share/pynq-venv/bin/activate

Override via make XRT_SETUP=... PYNQ_VENV=... run if they differ. If XRT is already in your PATH (e.g., sourced in .bashrc), the defaults will work even if the script paths do not exist.

Stack overflow / segfault on the device: Large polynomial dimensions (LOGN=15, 16) require a large stack. The Makefile sets ulimit -s 1000000 automatically. If running manually:

sudo bash -c 'ulimit -s 1000000; ./main'
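
If the binary is launched from a Python wrapper rather than the Makefile, the current limit can be inspected first. This is a minimal sketch, assuming `ulimit -s` counts in 1024-byte units as it does in bash; the helper name is hypothetical.

```python
import resource

REQUIRED_KIB = 1_000_000  # same value the Makefile passes to `ulimit -s`


def stack_limit_sufficient(soft_limit_bytes, required_kib=REQUIRED_KIB):
    """Return True if a soft stack limit (in bytes) covers the required size."""
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return True
    return soft_limit_bytes >= required_kib * 1024


if __name__ == "__main__":
    # getrlimit reports the limits of the current process in bytes.
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if stack_limit_sufficient(soft):
        print("Stack limit is large enough.")
    else:
        print("Stack limit too small; raise it with: ulimit -s", REQUIRED_KIB)
```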

matplotlib font warnings: The plotting script uses serif fonts with a fallback chain (DejaVu Serif, Palatino, Times New Roman). If LaTeX rendering is not available (no texlive installed), it falls back to standard matplotlib fonts automatically. No additional font packages are required.

Plot shows only some configurations: Ensure all timing_{mode}_{logn}.csv files exist before running --plot-only. Missing CSVs mean the corresponding tests did not complete.
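
A short script can list which per-configuration CSVs are absent. This sketch assumes `{mode}` expands to the lowercase mode name and `{logn}` to the integer LOGN value; the function names are hypothetical.

```python
from pathlib import Path


def expected_timing_files():
    """File names for all 10 configurations (naming is an assumption)."""
    return [f"timing_{mode}_{logn}.csv"
            for mode in ("mersenne", "barrett")
            for logn in range(12, 17)]


def missing_timing_files(directory="."):
    """Return the expected timing CSVs not present in the given directory."""
    base = Path(directory)
    return [name for name in expected_timing_files()
            if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_timing_files()
    print("missing:", missing if missing else "none")
```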

Reference

@inproceedings{guerrini2026refhe,
  title     = {ReFHE-NTT: Resource-Driven NTT FPGA Architecture for Fully Homomorphic Encryption},
  author    = {Guerrini, Valentino and Sorrentino, Giuseppe and Barenghi, Alessandro and Conficconi, Davide},
  booktitle = {Proceedings of the 34th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
  year      = {2026},
  organization = {IEEE},
  note      = {To appear}
}
