
Varun2459/FPGA-Image-Processing-HLS


Real-Time FPGA Image Processing Accelerator using HLS

A hardware-accelerated Canny-style edge detection pipeline running on the DE1-SoC (Intel Cyclone V), built with Intel's High-Level Synthesis (HLS) compiler. The system processes a live 640x480 video stream at 60 Hz and outputs the edge-detected result over VGA in real time.

This project was completed as an MSc dissertation at the University of Liverpool (Telecommunications and Wireless Systems, 2024-2025).

Key Results

| Metric | Value |
|---|---|
| Initiation interval (II) | 1 (one pixel per clock cycle) |
| Pixel processing clock | 100 MHz |
| Throughput capacity (II = 1 at 100 MHz) | 100 M pixels/s (~325 fps at 640x480) |
| Worst-case Fmax, lowest across all clock domains (VGA path, Slow 1100mV 85C) | 138.87 MHz |
| ALM usage (full system) | 7,398 / 32,070 (23%) |
| Block RAM bits | 369,278 / 4,065,280 (9%) |
| DSP blocks | 21 / 87 (24%) |
| Total registers | 16,845 |
| Total pins | 204 / 457 (45%) |
| PLLs | 1 / 6 (17%) |
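The ~325 fps figure follows directly from the clock rate and II. The sketch below reproduces that arithmetic; the function names are hypothetical, and it deliberately ignores video blanking and pipeline fill, so it gives an idealised upper bound rather than a measured frame rate.

```cpp
#include <cassert>

// Idealised throughput of a pipeline that accepts one new pixel every
// `ii` clock cycles. Blanking intervals and pipeline fill are ignored,
// so this is an upper bound, not a measured frame rate.
inline double pixels_per_second(double clock_hz, unsigned ii) {
    assert(clock_hz > 0.0 && ii > 0);
    return clock_hz / ii;
}

// Frames per second for a width x height frame at that pixel rate.
inline double frames_per_second(double clock_hz, unsigned ii,
                                unsigned width, unsigned height) {
    assert(width > 0 && height > 0);
    return pixels_per_second(clock_hz, ii) /
           (static_cast<double>(width) * height);
}
```

For example, `frames_per_second(100e6, 1, 640, 480)` evaluates to about 325.5, roughly 5.4 times the 60 Hz rate of the live input, which is the headroom the II = 1 design provides.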

HLS Component Estimated Resources (i++ Report)

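| Resource | Usage | Available | % |
|---|---|---|---|
| ALUTs | 8,005 | 109,572 | 7% |
| Flip-Flops | 13,235 | 219,144 | 6% |
| RAMs | 30 | 514 | 6% |
| DSPs | 3.5 | 112 | 3% |

These are i++ estimates for the HLS component alone. The Available column reflects the device i++ assumed for -march="Cyclone V" rather than the DE1-SoC's 5CSEMA5F31C6N (for example, 112 DSPs versus the 87 in the full-system table above), so these percentages are not directly comparable with the full-system figures.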

System Architecture

                        DE1-SoC VIP Pipeline (Platform Designer / Qsys)

  ┌──────────────┐    ┌──────────┐    ┌────────────┐    ┌───────────┐    ┌──────────┐
  │  Clocked     │    │  Colour  │    │Deinterlacer│    │  Chroma   │    │ Clipper  │
  │  Video Input ├───►│  Plane   ├───►│   II (4K   ├───►│ Resampler ├───►│ II (4K   │
  │  II (4K)     │    │ Seq. II  │    │HDR Passthru│    │  II (4K)  │    │  Ready)  │
  └──────────────┘    └──────────┘    └────────────┘    └───────────┘    └────┬─────┘
                                                                              │
                                                                              ▼
  ┌──────────────┐    ┌──────────┐    ┌────────────┐    ┌───────────┐    ┌──────────┐
  │  Clocked     │    │  Frame   │    │  RAW-to-   │    │  filters  │    │ VIP-to-  │
  │  Video Out   │◄───┤  Buffer  │◄───┤    VIP     │◄───┤  (HLS)    │◄───┤   RAW    │
  │  (VGA)       │    │  II (4K) │    │   Bridge   │    │           │    │  Bridge  │
  └──────────────┘    └──────────┘    └────────────┘    └───────────┘    └──────────┘
        │                  │
        ▼                  ▼
   VGA Monitor         SDRAM Controller

The HLS filters block sits between two VIP/RAW bridges. Video data enters as packetised Avalon-ST (8 bits/symbol, 3 symbols/beat for RGB 8:8:8, with SOP/EOP framing). The pipeline processes one pixel per clock cycle, maintaining II = 1 throughout.
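The beat format described above can be modelled in software as a 24-bit data word plus two sideband flags. This is a hypothetical sketch for illustration only: the struct and helper names are invented, and the symbol-to-bit placement (symbol 0 in bits [7:0]) is an assumption. Which colour plane lands in which symbol depends on the VIP colour-plane configuration and is not taken from filters.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one Avalon-ST video beat: 3 symbols x 8 bits
// (RGB 8:8:8) plus start-of-packet / end-of-packet flags.
// ASSUMPTION: symbol 0 -> bits [7:0], symbol 1 -> [15:8], symbol 2 -> [23:16].
struct VideoBeat {
    uint32_t data;  // only the low 24 bits are used
    bool sop;       // start of packet
    bool eop;       // end of packet
};

// Pack three 8-bit symbols and the framing flags into one beat.
inline VideoBeat pack_beat(uint8_t s0, uint8_t s1, uint8_t s2,
                           bool sop, bool eop) {
    uint32_t data = static_cast<uint32_t>(s0)
                  | (static_cast<uint32_t>(s1) << 8)
                  | (static_cast<uint32_t>(s2) << 16);
    return VideoBeat{data, sop, eop};
}

// Extract symbol i (0, 1 or 2) from a beat.
inline uint8_t symbol(const VideoBeat& beat, unsigned i) {
    assert(i < 3);
    return static_cast<uint8_t>((beat.data >> (8 * i)) & 0xFFu);
}
```

Because all three symbols travel in one beat, the kernel receives a whole RGB pixel per clock cycle, which is what makes one-pixel-per-cycle (II = 1) processing possible.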

Processing Pipeline Stages

Input (RGB 8:8:8)
      │
      ▼
  Greyscale Conversion ──► (R + G + B) / 3
      │
      ▼
  5×5 Gaussian Blur ──► Weighted kernel (sum = 159), buffered across 5 line stores
      │
      ▼
  3×3 Sobel Gradient ──► Gx and Gy computed over 3 buffered Gaussian output lines
      │                    Gradient magnitude = |Gx| + |Gy|
      │                    Direction quantised to 4 bins (0°, 45°, 90°, 135°)
      ▼
  Non-Maximum Suppression ──► 3×3 neighbourhood comparison along gradient direction
      │                        Thins edges to single-pixel width
      ▼
  Double Threshold ──► Low = 50, High = 130
      │                 Strong edges (> 130) → 255
      │                 Weak edges (50-130) → kept only if NMS flagged a local maximum (flag = 1)
      │                 Below low → 0
      ▼
  Output (greyscale edge map, replicated to RGB for VGA)
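The first two stages are plain integer arithmetic. The following is a software sketch, not the kernel source: the function names are hypothetical, the kernel is the standard 5x5 approximation whose weights sum to 159 (centre 15), and normalisation uses truncating division, which is an assumption since the README does not state how filters.cpp rounds.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 5x5 integer Gaussian approximation: weights sum to 159, centre weight 15.
constexpr int kGauss[5][5] = {
    {2,  4,  5,  4, 2},
    {4,  9, 12,  9, 4},
    {5, 12, 15, 12, 5},
    {4,  9, 12,  9, 4},
    {2,  4,  5,  4, 2},
};
constexpr int kGaussNorm = 159;

using Window5 = std::array<std::array<uint8_t, 5>, 5>;  // [row][col]

// Sum of all kernel weights; should equal kGaussNorm.
inline int kernel_sum() {
    int s = 0;
    for (const auto& row : kGauss)
        for (int w : row) s += w;
    return s;
}

// Greyscale as the plain average of the three channels (integer division).
inline uint8_t greyscale(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>((r + g + b) / 3);  // operands promote to int
}

// Weighted 5x5 sum normalised by 159 (truncating division, an assumption).
inline uint8_t gaussian5x5(const Window5& w) {
    assert(kernel_sum() == kGaussNorm);
    int acc = 0;
    for (int y = 0; y < 5; ++y)
        for (int x = 0; x < 5; ++x)
            acc += kGauss[y][x] * w[y][x];
    return static_cast<uint8_t>(acc / kGaussNorm);  // at most 255
}
```

A flat window stays unchanged (all 100s blur to 100), while a single bright pixel is spread out: a centre value of 159 contributes only 15 to its own output.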

All line buffers are shift-register based: each new pixel shifts into position 0, and the oldest pixel falls off the far end. Because the shift loops are fully unrolled, the source never computes read or write addresses, and the HLS compiler can recognise the pattern and implement the long line buffers in on-chip block RAM instead of flip-flops.
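A line buffer of this kind can be modelled as a fixed-length shift register. The sketch below is a software model with a hypothetical class name; in the HLS source the shift loop would carry `#pragma unroll`, whereas here it is an ordinary loop. With a depth of 640 (one 640-pixel line), the value that falls off the end is the pixel from exactly one row earlier in the same column, which is how stacked line buffers form the vertical extent of the 5x5 and 3x3 windows.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shift-register line buffer: index 0 holds the newest pixel and
// index N-1 the oldest. Every push moves all elements one step.
template <std::size_t N>
class ShiftLine {
public:
    // Shift in a new pixel and return the pixel that falls off the far end.
    uint8_t push(uint8_t px) {
        uint8_t oldest = buf_[N - 1];
        // In HLS this loop would be fully unrolled (#pragma unroll).
        for (std::size_t i = N - 1; i > 0; --i) buf_[i] = buf_[i - 1];
        buf_[0] = px;
        return oldest;
    }

    // Read any tap without shifting.
    uint8_t at(std::size_t i) const {
        assert(i < N);
        return buf_[i];
    }

private:
    std::array<uint8_t, N> buf_{};  // zero-initialised
};
```

Feeding the value returned by one `ShiftLine<640>` into the next gives the pixel two rows up, and so on, so a chain of line buffers yields one column of the window per clock cycle.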

Clock Domains

| Clock | Frequency | Source | Role |
|---|---|---|---|
| CLOCK_50 | 50 MHz | Board oscillator | System reference |
| PLL general[0] | 100 MHz | Derived | SDRAM clock (phase-shifted for the external SDRAM) |
| PLL general[1] | 100 MHz | Derived | Pixel processing clock (VIP chain + HLS filters) |
| PLL general[2] | 25 MHz | Derived | VGA pixel clock (640x480 @ 60 Hz) |
| PLL general[3] | ~18.33 MHz | Derived | Audio clock |
| TD_CLK27 | 27 MHz | Board oscillator | TV decoder |

The HLS block and VIP pipeline both run on the 100 MHz general[1] domain. VGA output uses the 25 MHz general[2] domain. A single fractional PLL (1100.11 MHz VCO) generates all derived clocks.
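Each PLL output is the VCO frequency divided by an integer counter, which is why the PLL reports "actual" frequencies that differ slightly from the requested ones. In this sketch the divider values (11, 44, 60) are inferred from the ratios in the clock table and are not read from the real PLL configuration. The helper names are hypothetical as well, and the model leaves out the PLL's feedback and pre-divider counters.

```cpp
#include <cassert>
#include <cmath>

// Output frequency of one PLL counter: VCO divided by an integer divider.
inline double pll_output_mhz(double vco_mhz, unsigned divider) {
    assert(divider > 0);
    return vco_mhz / divider;
}

// Nearest integer divider for a requested output frequency.
inline unsigned best_divider(double vco_mhz, double requested_mhz) {
    assert(requested_mhz > 0.0);
    double d = std::round(vco_mhz / requested_mhz);
    return d < 1.0 ? 1u : static_cast<unsigned>(d);
}
```

With a 1100.11 MHz VCO, the nearest dividers for 100, 25 and 18.33 MHz come out as 11, 44 and 60. For example, 1100.11 / 11 gives about 100.01 MHz, a small quantisation error of the kind the PLL warning refers to.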

Fmax Summary (Slow 1100mV 85C, Worst-Case Corner)

| Clock domain | Achieved Fmax | Required | Margin |
|---|---|---|---|
| PLL general[2] (VGA path) | 138.87 MHz | 25 MHz | +113.87 MHz |
| PLL general[1] (pixel processing) | 141.18 MHz | 100 MHz | +41.18 MHz |
| CLOCK_50 | 193.99 MHz | 50 MHz | +143.99 MHz |
| TD_CLK27 | 201.98 MHz | 27 MHz | +174.98 MHz |
| PLL general[3] (audio) | 499.50 MHz | 18.33 MHz | +481.17 MHz |

All clocks meet timing with positive slack. Total negative slack (TNS) is 0.000 ns in every domain and at every timing corner (Slow 85C, Slow 0C, Fast 85C, Fast 0C).

HLS Loop Analysis

The main processing loop (filters.B2 at filters.cpp:59) is pipelined, and the i++ loop analysis reports its II as ~1, meaning a new iteration (one pixel) is launched about once per clock cycle. All inner shift-register loops are fully unrolled via #pragma unroll, so each shift becomes a set of parallel register-to-register moves completed within the same clock cycle rather than a sequential inner loop, which removes loop overhead from every iteration.
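The reason II dominates the frame budget is captured by the standard cycle count for a pipelined loop: the first result takes the full pipeline latency, and each later iteration finishes II cycles after the previous one. In this sketch the 50-cycle latency is a hypothetical placeholder (the README does not report the loop's latency), and the unpipelined comparison is a simplification that assumes each iteration takes the full latency.

```cpp
#include <cassert>
#include <cstdint>

// Cycles for a pipelined loop to complete n iterations.
inline uint64_t pipelined_cycles(uint64_t n, uint64_t ii, uint64_t latency) {
    assert(n > 0 && ii > 0);
    return latency + (n - 1) * ii;
}

// Simplified unpipelined case: each iteration runs start to finish.
inline uint64_t sequential_cycles(uint64_t n, uint64_t latency) {
    return n * latency;
}
```

For a 640x480 frame (307,200 pixels) and a hypothetical 50-cycle latency, II = 1 needs 307,249 cycles, II = 2 roughly doubles that to 614,448, and a non-pipelined loop would need 15,360,000. The latency term is negligible next to n x II, so halving II matters far more than shortening the pipeline.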

Repository Structure

FPGA-Image-Processing-HLS/
├── README.md                   ← You are here
├── .gitignore
├── LICENSE
│
├── hls/
│   └── filters.cpp             ← HLS C++ kernel source (Intel i++ compiler)
│
├── quartus/
│   └── README.md               ← Notes on the Quartus project structure
│
└── docs/
    ├── build-guide.md           ← Step-by-step build and programming instructions
    ├── screenshots/             ← Place Platform Designer, Fitter, STA screenshots here
    └── reports/                 ← Place .rpt and .summary files here

Note on Quartus project files: The full Quartus project (DE1_SoC_VIP_TV_640x480_W/) and generated IP files are not included in this repository due to their size and Intel's VIP IP licensing. The hls/ directory contains the original HLS source code, and the docs/build-guide.md explains how to recreate the full project from Intel's DE1-SoC VIP demonstration design.

Hardware and Software Requirements

Hardware

  • FPGA Board: Terasic DE1-SoC (Intel Cyclone V 5CSEMA5F31C6N)
  • Video Input: TV decoder on-board, or any Avalon-ST video source
  • Video Output: VGA monitor connected to the DE1-SoC VGA port
  • Programming: USB-Blaster II (on-board, via USB cable)

Software

  • Intel Quartus Prime 18.1 Standard Edition (Build 625)
  • Intel HLS Compiler (i++) version 18.1.0 Build 625, invoked from the Nios II Command Shell
  • Visual Studio 2010 Professional (required by the Intel HLS Compiler for Windows)
  • Platform Designer (Qsys) (included with Quartus Prime)
  • OS: Windows 10/11

Quick Start

Full instructions are in docs/build-guide.md. The short version:

1. Compile the HLS kernel

# From the Nios II Command Shell
cd C:/path/to/HLS/filters
i++ -march="Cyclone V" --simulator none -v -o filters filters.cpp

This generates filters.prj/components/filters/ containing the synthesisable RTL and Platform Designer IP files.

2. Integrate into Platform Designer

Open the DE1-SoC VIP Quartus project, launch Platform Designer, and add the HLS component's IP search path:

C:/path/to/HLS/filters/filters.prj/components

Insert the filters component between the VIP-to-RAW and RAW-to-VIP bridges. Connect its clock and reset to the 100 MHz pixel processing clock (PLL general[1]) and its Avalon-ST sink and source to the two bridges, then generate HDL.

3. Compile and program

# Full Quartus compilation
quartus_sh --flow compile DE1_SoC_VIP_TV

# Program the FPGA via JTAG
quartus_pgm -m jtag -o "p;output_files/DE1_SoC_VIP_TV.sof"

HLS Source: filters.cpp

The kernel is a single hls_always_run_component that reads from an Avalon-ST input and writes to an Avalon-ST output. The full Canny-style pipeline runs inside a single while(!end_of_packet) loop, processing one pixel per iteration.
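The control structure (read a beat, process it, write a beat, stop at end-of-packet) can be modelled in ordinary C++. In this hypothetical sketch a `std::deque` stands in for the Avalon-ST input stream, whereas the real kernel uses Intel HLS stream types. The per-pixel work is reduced to the greyscale step replicated to three channels, and sideband flags pass straight through. The real pipeline delays its output through line buffers, which this model does not reproduce.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Software stand-in for one Avalon-ST beat (hypothetical type).
struct Beat {
    uint32_t data;  // 24-bit RGB 8:8:8 in the low bits
    bool sop;       // start of packet
    bool eop;       // end of packet
};

// Placeholder per-pixel work: greyscale average, replicated to 3 channels.
inline uint32_t process_pixel(uint32_t rgb) {
    uint32_t s0 = rgb & 0xFFu, s1 = (rgb >> 8) & 0xFFu, s2 = (rgb >> 16) & 0xFFu;
    uint32_t g = (s0 + s1 + s2) / 3;
    return g | (g << 8) | (g << 16);
}

// Mirrors the while(!end_of_packet) loop: one beat in, one beat out per
// iteration, stopping after the beat that carries EOP.
inline std::vector<Beat> run_packet(std::deque<Beat>& in) {
    std::vector<Beat> out;
    bool end_of_packet = false;
    while (!end_of_packet && !in.empty()) {
        Beat b = in.front();
        in.pop_front();
        out.push_back(Beat{process_pixel(b.data), b.sop, b.eop});
        end_of_packet = b.eop;
    }
    return out;
}
```

Beats after the EOP beat remain in the input queue for the next call, which corresponds to the next packet (frame) arriving on the stream.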

Key implementation details:

  • Greyscale is computed as the average of R, G, B channels (integer division by 3).
  • Gaussian blur uses a 5x5 kernel with weights summing to 159. Five line buffers (line0 through line4, each 640 entries) store the pixel rows. The kernel coefficients match a standard 5x5 Gaussian approximation (centre weight 15).
  • Sobel computes horizontal and vertical gradients over a 3x3 window on the Gaussian output. Three additional line buffers (gaussian_line0 through gaussian_line2) hold the blurred rows. Gradient direction is quantised into four bins by approximating atan with threshold comparisons on the Gy/Gx ratio.
  • Non-maximum suppression compares each pixel's gradient magnitude against its two neighbours along the gradient direction. Three more line buffers (sobel_line0 through sobel_line2) store the Sobel results with direction metadata.
  • Double threshold classifies pixels as strong (> 130), weak (50-130), or suppressed (< 50). Weak pixels are kept only if the NMS pass flagged them as local maxima.
  • Output delay uses a small shift register (final_delay, depth 12) and a large RGB buffer (rgb_buffer, depth 3201) to align the processed output with the original stream timing.
  • All line buffer shifts use #pragma unroll to ensure single-cycle execution.
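The Sobel, direction, NMS and threshold steps listed above can be sketched as plain integer functions. Several details here are assumptions rather than details taken from filters.cpp:

- The direction thresholds are tan 22.5° ≈ 0.414 and tan 67.5° ≈ 2.414, scaled by 1000 to avoid division.
- Rows run top to bottom, and equal signs of Gx and Gy map to the 45° bin.
- Surviving weak edges are written as 255.

All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

using Window3 = std::array<std::array<int, 3>, 3>;  // [row][col], row 0 = top

struct Gradient { int gx; int gy; int mag; int dir; };  // dir in {0, 45, 90, 135}

// Quantise direction by comparing |Gy|/|Gx| against tan 22.5° and tan 67.5°
// using cross-multiplication instead of division or atan.
inline int quantise_dir(int gx, int gy) {
    int ax = std::abs(gx), ay = std::abs(gy);
    if (ay * 1000 <= ax * 414) return 0;    // near-horizontal gradient
    if (ay * 1000 >= ax * 2414) return 90;  // near-vertical gradient
    return ((gx > 0) == (gy > 0)) ? 45 : 135;  // diagonal; both nonzero here
}

// 3x3 Sobel with the |Gx| + |Gy| magnitude approximation.
inline Gradient sobel(const Window3& w) {
    int gx = (w[0][2] + 2 * w[1][2] + w[2][2]) - (w[0][0] + 2 * w[1][0] + w[2][0]);
    int gy = (w[2][0] + 2 * w[2][1] + w[2][2]) - (w[0][0] + 2 * w[0][1] + w[0][2]);
    return Gradient{gx, gy, std::abs(gx) + std::abs(gy), quantise_dir(gx, gy)};
}

// Non-maximum suppression: keep the centre only if it is not smaller than
// both neighbours along the quantised gradient direction.
inline bool is_local_max(const Window3& mags, int dir) {
    int c = mags[1][1], a, b;
    switch (dir) {
        case 0:  a = mags[1][0]; b = mags[1][2]; break;  // left / right
        case 90: a = mags[0][1]; b = mags[2][1]; break;  // up / down
        case 45: a = mags[0][0]; b = mags[2][2]; break;  // up-left / down-right
        default: a = mags[0][2]; b = mags[2][0]; break;  // up-right / down-left
    }
    return c >= a && c >= b;
}

// Double threshold: strong (> high) -> 255; weak (low..high) -> 255 only if
// NMS flagged a local maximum; everything else -> 0.
inline uint8_t double_threshold(int mag, bool local_max, int low = 50, int high = 130) {
    if (mag > high) return 255;
    if (mag >= low && local_max) return 255;
    return 0;
}
```

For example, a vertical step edge (left columns 0, right column 100) gives Gx = 400, Gy = 0, and falls in the 0° bin, so NMS compares the pixel with its left and right neighbours.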

Quartus Compilation Settings

The following non-default settings were applied for timing closure:

| Setting | Value | Default |
|---|---|---|
| Optimisation Mode | Aggressive Performance | Balanced |
| Optimisation Technique | Speed | Balanced |
| Physical Synthesis (Combo Logic) | On | Off |
| Physical Synthesis (Register Retiming) | On | Off |
| Placement Effort Multiplier | 4.0 | 1.0 |
| Router Timing Optimisation Level | Maximum | Normal |
| Fitter Effort | Standard Fit | Auto Fit |

Total compilation time: approximately 10 minutes 31 seconds (Analysis & Synthesis: 3:36, Fitter: 6:03, Assembler: 0:23, Timing Analyzer: 0:29).

STA Timing Summary (All Corners, All Clocks)

All setup, hold, recovery, removal, and minimum pulse width checks pass with positive slack. Zero TNS across every timing corner.

Worst-case setup slack on the pixel processing clock (general[1], 100 MHz):

| Corner | Setup slack |
|---|---|
| Slow 1100mV 85C | +0.827 ns |
| Slow 1100mV 0C | +1.124 ns |
| Fast 1100mV 85C | +3.224 ns |
| Fast 1100mV 0C | +3.416 ns |
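Slack and Fmax can be converted into each other with simple period arithmetic, which helps when comparing this table with the Fmax summary. The sketch below is a simplified model with hypothetical function names: slack is taken as the clock period minus the data-path delay, and clock skew, uncertainty and multicycle constraints are ignored.

```cpp
#include <cassert>

// Simplified: slack = period - path delay, so path delay = period - slack.
inline double path_delay_ns(double clock_mhz, double slack_ns) {
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz - slack_ns;
}

// Frequency at which a path of the given delay would have zero slack.
inline double zero_slack_mhz(double path_ns) {
    assert(path_ns > 0.0);
    return 1000.0 / path_ns;
}
```

Under this model, +0.827 ns at 100 MHz corresponds to a path of about 9.17 ns, or roughly 109 MHz. The 141.18 MHz Fmax for the same clock implies a path of about 7.08 ns. The Quartus Fmax Summary is computed from paths launched and latched by the same clock, so the +0.827 ns worst-case path is presumably a transfer that the Fmax figure does not cover, such as a clock-domain crossing. The detailed timing report would confirm which path it is.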

Platform Designer Warnings (Benign)

  • edi_read_master / write_master "must be an export" on the Deinterlacer II. These are unused memory-mapped master interfaces in the VIP IP. They can be safely ignored or exported and left unconnected.
  • PLL "Actual settings differ from Requested settings." The PLL quantises to the nearest achievable frequency. If the VGA image is stable and timing passes, this is fine.
  • Deinterlacer "configured for YCbCr 4:2:2 colour space." This is an informational message; the colour pipeline is coherent (confirmed by correct VGA output).

Author

Varun Venkata Tej Molleti, MSc Telecommunications and Wireless Systems, University of Liverpool (2024-2025)

License

This project is provided for educational and portfolio purposes. The HLS source code (filters.cpp) is original work. Intel VIP IP cores and Quartus project templates are subject to Intel's licensing terms.

About

Real-time Canny-style edge detection pipeline on Intel Cyclone V - HLS C++ kernel, II=1, 100M pixels/s, Fmax ≥ 138.87 MHz, live VGA output
