A hardware-accelerated Canny-style edge detection pipeline running on the DE1-SoC (Intel Cyclone V), built with Intel's High-Level Synthesis (HLS) compiler. The system processes a live 640x480 video stream at 60 Hz and outputs the edge-detected result over VGA in real time.
This project was completed as an MSc dissertation at the University of Liverpool (Telecommunications and Wireless Systems, 2024-2025).
| Metric | Value |
|---|---|
| Initiation Interval (II) | 1 (one pixel per clock cycle) |
| Pixel clock | 100 MHz |
| Sustained throughput | 100 M pixels/s (~325 fps at 640x480) |
| Worst-case Fmax (Slow 1100mV 85C) | 138.87 MHz |
| ALM usage (full system) | 7,398 / 32,070 (23%) |
| Block RAM bits | 369,278 / 4,065,280 (9%) |
| DSP blocks | 21 / 87 (24%) |
| Total registers | 16,845 |
| Total pins | 204 / 457 (45%) |
| PLLs | 1 / 6 (17%) |
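
The sustained-throughput row follows from II = 1 at the pixel clock: one pixel is accepted per cycle, so the frame rate is the clock frequency divided by the pixels per frame. A minimal sketch of that arithmetic follows; the function name is illustrative and does not come from the project sources.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper (not part of filters.cpp): frames per second when the
// pipeline accepts one pixel every `ii` clock cycles.
inline double frames_per_second(double clock_hz, std::uint32_t width,
                                std::uint32_t height, std::uint32_t ii) {
    assert(width > 0 && height > 0 && ii > 0);
    return clock_hz / (static_cast<double>(width) * height * ii);
}
```

Calling `frames_per_second(100e6, 640, 480, 1)` gives roughly 325.5, which matches the figure quoted in the table.
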
| Resource | Usage | Available | % |
|---|---|---|---|
| ALUTs | 8,005 | 109,572 | 7% |
| Flip-Flops | 13,235 | 219,144 | 6% |
| RAMs | 30 | 514 | 6% |
| DSPs | 3.5 | 112 | 4% |
DE1-SoC VIP Pipeline (Platform Designer / Qsys)
┌──────────────┐    ┌──────────┐    ┌────────────┐    ┌───────────┐    ┌──────────┐
│ Clocked      │    │ Colour   │    │Deinterlacer│    │ Chroma    │    │ Clipper  │
│ Video Input  ├───►│ Plane    ├───►│ II (4K     ├───►│ Resampler ├───►│ II (4K   │
│ II (4K)      │    │ Seq. II  │    │HDR Passthru│    │ II (4K)   │    │ Ready)   │
└──────────────┘    └──────────┘    └────────────┘    └───────────┘    └────┬─────┘
                                                                            │
                                                                            ▼
┌──────────────┐    ┌──────────┐    ┌────────────┐    ┌───────────┐    ┌──────────┐
│ Clocked      │    │ Frame    │    │ RAW-to-    │    │ filters   │    │ VIP-to-  │
│ Video Out    │◄───┤ Buffer   │◄───┤ VIP        │◄───┤ (HLS)     │◄───┤ RAW      │
│ (VGA)        │    │ II (4K)  │    │ Bridge     │    │           │    │ Bridge   │
└──────────────┘    └──────────┘    └────────────┘    └───────────┘    └──────────┘
       │                 │
       ▼                 ▼
  VGA Monitor    SDRAM Controller
The HLS filters block sits between two VIP/RAW bridges. Video data enters as packetised Avalon-ST (8 bits/symbol, 3 symbols/beat for RGB 8:8:8, with SOP/EOP framing). The pipeline processes one pixel per clock cycle, maintaining II = 1 throughout.
Input (RGB 8:8:8)
    │
    ▼
Greyscale Conversion    ──► (R + G + B) / 3
    │
    ▼
5×5 Gaussian Blur       ──► Weighted kernel (sum = 159), buffered across 5 line stores
    │
    ▼
3×3 Sobel Gradient      ──► Gx and Gy computed over 3 buffered Gaussian output lines
    │                       Gradient magnitude = |Gx| + |Gy|
    │                       Direction quantised to 4 bins (0°, 45°, 90°, 135°)
    ▼
Non-Maximum Suppression ──► 3×3 neighbourhood comparison along gradient direction
    │                       Thin edges to single-pixel width
    ▼
Double Threshold        ──► Low = 50, High = 130
    │                       Strong edges (> 130) → 255
    │                       Weak edges (50-130) → kept only if NMS direction = 1
    │                       Below low → 0
    ▼
Output (greyscale edge map, replicated to RGB for VGA)
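
The first and last stages of this flow are simple per-pixel functions. The sketch below models them in plain C++ under stated assumptions: the function names are hypothetical, `is_local_max` stands in for whatever flag the NMS stage produces, and a kept weak edge is assumed to be written as 255 (the source only gives the output value for strong edges).

```cpp
#include <cassert>
#include <cstdint>

// Thresholds quoted in the pipeline description.
constexpr int kLowThreshold = 50;
constexpr int kHighThreshold = 130;

// Greyscale as the integer average of the three colour channels.
inline std::uint8_t to_grey(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((r + g + b) / 3);
}

// Double threshold on the gradient magnitude.
// Assumption: weak edges that survive are output as 255, like strong edges.
inline std::uint8_t double_threshold(int magnitude, bool is_local_max) {
    assert(magnitude >= 0);
    if (magnitude > kHighThreshold) return 255;                  // strong edge
    if (magnitude >= kLowThreshold && is_local_max) return 255;  // kept weak edge
    return 0;                                                    // suppressed
}
```

For example, `double_threshold(130, false)` is suppressed, while the same magnitude with `is_local_max` set is kept.
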
All line buffers are shift-register based. Each new pixel shifts into position 0, and the oldest pixel falls off the far end. This avoids any address-based memory access and lets the HLS compiler map the buffers to on-chip block RAM.
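
As a rough software model of one such line store, the sketch below shifts every entry one place and inserts the new pixel at position 0. The `ShiftLine` type is hypothetical and written in standard C++; the real kernel uses plain arrays in HLS C++ and is not reproduced here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical software model of a single shift-register line store.
template <std::size_t N>
struct ShiftLine {
    static_assert(N > 0, "a line store needs at least one entry");
    std::array<std::uint8_t, N> taps{};

    // Insert `pixel` at position 0 and return the pixel that fell off the end.
    std::uint8_t push(std::uint8_t pixel) {
        const std::uint8_t oldest = taps[N - 1];
        // In HLS source a loop like this would carry "#pragma unroll" so that
        // all N moves are scheduled in the same clock cycle.
        for (std::size_t i = N - 1; i > 0; --i) {
            taps[i] = taps[i - 1];
        }
        taps[0] = pixel;
        return oldest;
    }
};
```

One plausible way to stack rows, assumed here rather than taken from the source, is to feed the value returned by one line's `push` into the next line, so that each store holds the row above the previous one.
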
| Clock | Frequency | Source | Role |
|---|---|---|---|
| CLOCK_50 | 50 MHz | Board oscillator | System reference |
| PLL general[0] | 100 MHz | Derived | SDRAM (phase-shifted for external) |
| PLL general[1] | 100 MHz | Derived | Pixel processing clock (VIP chain + HLS filters) |
| PLL general[2] | 25 MHz | Derived | VGA pixel clock (640x480 @ 60 Hz) |
| PLL general[3] | ~18.33 MHz | Derived | Audio clock |
| TD_CLK27 | 27 MHz | Board oscillator | TV decoder |
The HLS block and VIP pipeline both run on the 100 MHz general[1] domain. VGA output uses the 25 MHz general[2] domain. A single fractional PLL (1100.11 MHz VCO) generates all derived clocks.
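
The 25 MHz VGA clock can be related to the 60 Hz refresh rate with a short calculation. The sketch assumes the industry-standard 640x480 timing totals of 800 clocks per line and 525 lines per frame (active area plus blanking); those totals are not stated in this README, and the names are illustrative.

```cpp
#include <cassert>

// Assumed standard 640x480 timing totals (active + blanking), not taken
// from this project's sources.
constexpr int kTotalClocksPerLine = 800;
constexpr int kTotalLinesPerFrame = 525;

// Refresh rate produced by a given pixel clock under those totals.
inline double refresh_hz(double pixel_clock_hz) {
    assert(pixel_clock_hz > 0.0);
    return pixel_clock_hz / (kTotalClocksPerLine * kTotalLinesPerFrame);
}
```

Under these assumptions `refresh_hz(25e6)` comes out at about 59.5 Hz, slightly below the nominal 60 Hz, which monitors generally accept.
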
| Clock Domain | Achieved Fmax | Required | Margin |
|---|---|---|---|
| PLL general[2] (VGA path) | 138.87 MHz | 25 MHz | +113.87 MHz |
| PLL general[1] (pixel processing) | 141.18 MHz | 100 MHz | +41.18 MHz |
| CLOCK_50 | 193.99 MHz | 50 MHz | +143.99 MHz |
| TD_CLK27 | 201.98 MHz | 27 MHz | +174.98 MHz |
| PLL general[3] (audio) | 499.50 MHz | 18.33 MHz | +481.17 MHz |
All clocks meet timing with positive slack. TNS = 0.000 across all domains and all timing corners (Slow 85C, Slow 0C, Fast 85C, Fast 0C).
The main processing loop (filters.B2 at filters.cpp:59) is pipelined with II ~1 (approximately 1 cycle per iteration). All inner shift-register loops are fully unrolled via #pragma unroll, which eliminates loop overhead and lets the compiler schedule them as combinational logic within a single clock cycle.
FPGA-Image-Processing-HLS/
├── README.md                 ← You are here
├── .gitignore
├── LICENSE
│
├── hls/
│   └── filters.cpp           ← HLS C++ kernel source (Intel i++ compiler)
│
├── quartus/
│   └── README.md             ← Notes on the Quartus project structure
│
└── docs/
    ├── build-guide.md        ← Step-by-step build and programming instructions
    ├── screenshots/          ← Place Platform Designer, Fitter, STA screenshots here
    └── reports/              ← Place .rpt and .summary files here
Note on Quartus project files: The full Quartus project (DE1_SoC_VIP_TV_640x480_W/) and generated IP files are not included in this repository due to their size and Intel's VIP IP licensing. The hls/ directory contains the original HLS source code, and docs/build-guide.md explains how to recreate the full project from Intel's DE1-SoC VIP demonstration design.
- FPGA Board: Terasic DE1-SoC (Intel Cyclone V 5CSEMA5F31C6N)
- Video Input: TV decoder on-board, or any Avalon-ST video source
- Video Output: VGA monitor connected to the DE1-SoC VGA port
- Programming: USB-Blaster II (on-board, via USB cable)
- Intel Quartus Prime 18.1 Standard Edition (Build 625)
- Intel HLS Compiler (i++) version 18.1.0 Build 625, invoked from the Nios II Command Shell
- Visual Studio 2010 Professional (required by the Intel HLS Compiler for Windows)
- Platform Designer (Qsys) (included with Quartus Prime)
- OS: Windows 10/11
Full instructions are in docs/build-guide.md. The short version:
1. Compile the HLS kernel
# From the Nios II Command Shell
cd C:/path/to/HLS/filters
i++ -march="Cyclone V" --simulator none -v -o filters filters.cpp

This generates filters.prj/components/filters/ containing the synthesisable RTL and Platform Designer IP files.
2. Integrate into Platform Designer
Open the DE1-SoC VIP Quartus project, launch Platform Designer, and add the HLS component's IP search path:
C:/path/to/HLS/filters/filters.prj/components
Insert the filters component between the VIP-to-RAW and RAW-to-VIP bridges. Connect clock, reset, and Avalon-ST interfaces. Generate HDL.
3. Compile and program
# Full Quartus compilation
quartus_sh --flow compile DE1_SoC_VIP_TV
# Program the FPGA via JTAG
quartus_pgm -m jtag -o "p;output_files/DE1_SoC_VIP_TV.sof"

The kernel is a single hls_always_run_component that reads from an Avalon-ST input and writes to an Avalon-ST output. The full Canny-style pipeline runs inside a single while(!end_of_packet) loop, processing one pixel per iteration.
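
The control flow of that loop can be modelled in ordinary C++. The sketch below is a software stand-in only: `Beat`, `process_pixel`, and `run_one_packet` are hypothetical names, `std::queue` replaces the hardware streams, and none of it uses the Intel HLS stream API.

```cpp
#include <cstdint>
#include <queue>

// Stand-in for one Avalon-ST beat: three 8-bit symbols plus framing flags.
struct Beat {
    std::uint8_t r, g, b;
    bool sop;  // start of packet
    bool eop;  // end of packet
};

// Hypothetical per-pixel stage; only the greyscale average is modelled here.
inline Beat process_pixel(const Beat& in) {
    const std::uint8_t grey =
        static_cast<std::uint8_t>((in.r + in.g + in.b) / 3);
    return Beat{grey, grey, grey, in.sop, in.eop};
}

// Consume beats until the end-of-packet flag has been seen, producing one
// output beat per input beat (one pixel per iteration).
inline void run_one_packet(std::queue<Beat>& in, std::queue<Beat>& out) {
    bool end_of_packet = false;
    while (!end_of_packet && !in.empty()) {
        const Beat beat = in.front();
        in.pop();
        out.push(process_pixel(beat));
        end_of_packet = beat.eop;
    }
}
```

In this model the framing flags pass through unchanged, so the downstream consumer still sees the same packet boundaries.
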
Key implementation details:
- Greyscale is computed as the average of R, G, B channels (integer division by 3).
- Gaussian blur uses a 5x5 kernel with weights summing to 159. Five line buffers (line0 through line4, each 640 entries) store the pixel rows. The kernel coefficients match a standard 5x5 Gaussian approximation (centre weight 15).
- Sobel computes horizontal and vertical gradients over a 3x3 window on the Gaussian output. Three additional line buffers (gaussian_line0 through gaussian_line2) hold the blurred rows. Gradient direction is quantised into four bins using atan approximation via threshold comparisons on the Gy/Gx ratio.
- Non-maximum suppression compares each pixel's gradient magnitude against its two neighbours along the gradient direction. Three more line buffers (sobel_line0 through sobel_line2) store the Sobel results with direction metadata.
- Double threshold classifies pixels as strong (> 130), weak (50-130), or suppressed (< 50). Weak pixels are kept only if the NMS pass flagged them as local maxima.
- Output delay uses a small shift register (final_delay, depth 12) and a large RGB buffer (rgb_buffer, depth 3201) to align the processed output with the original stream timing.
- All line buffer shifts use #pragma unroll to ensure single-cycle execution.
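
The Gaussian and Sobel steps above can be sketched on small windows in plain C++. The coefficients are the widely used 5x5 approximation whose weights sum to 159 with centre weight 15, which is assumed to be the kernel meant here. The ratio thresholds 0.414 and 2.414 (tan 22.5° and tan 67.5°), the sign convention for the 45°/135° bins, and all function names are likewise assumptions, since the source does not give them.

```cpp
#include <cstdint>
#include <cstdlib>

// Common 5x5 Gaussian approximation: centre weight 15, weights sum to 159.
constexpr int kGauss[5][5] = {
    {2, 4, 5, 4, 2},
    {4, 9, 12, 9, 4},
    {5, 12, 15, 12, 5},
    {4, 9, 12, 9, 4},
    {2, 4, 5, 4, 2},
};

// Weighted sum over a 5x5 window, normalised by the kernel sum.
inline std::uint8_t gaussian5x5(const std::uint8_t w[5][5]) {
    int acc = 0;
    for (int y = 0; y < 5; ++y)
        for (int x = 0; x < 5; ++x)
            acc += kGauss[y][x] * w[y][x];
    return static_cast<std::uint8_t>(acc / 159);
}

struct Gradient {
    int gx;
    int gy;
    int magnitude;      // |Gx| + |Gy|
    int direction_deg;  // one of 0, 45, 90, 135
};

// 3x3 Sobel with direction quantised by comparing |Gy| against scaled |Gx|,
// which avoids both division and atan.
inline Gradient sobel3x3(const std::uint8_t w[3][3]) {
    const int gx = (w[0][2] + 2 * w[1][2] + w[2][2]) -
                   (w[0][0] + 2 * w[1][0] + w[2][0]);
    const int gy = (w[2][0] + 2 * w[2][1] + w[2][2]) -
                   (w[0][0] + 2 * w[0][1] + w[0][2]);
    const int ax = std::abs(gx);
    const int ay = std::abs(gy);
    int dir;
    if (ay * 1000 <= ax * 414) {
        dir = 0;   // ratio below tan(22.5 deg)
    } else if (ay * 1000 >= ax * 2414) {
        dir = 90;  // ratio above tan(67.5 deg)
    } else {
        dir = ((gx > 0) == (gy > 0)) ? 45 : 135;  // assumed sign convention
    }
    return Gradient{gx, gy, ax + ay, dir};
}
```

A uniform window passes through the blur unchanged, and a vertical step edge yields a gradient in the 0° bin, which is a quick sanity check on both functions.
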
The following non-default settings were applied for timing closure:
| Setting | Value | Default |
|---|---|---|
| Optimisation Mode | Aggressive Performance | Balanced |
| Optimisation Technique | Speed | Balanced |
| Physical Synthesis (Combo Logic) | On | Off |
| Physical Synthesis (Register Retiming) | On | Off |
| Placement Effort Multiplier | 4.0 | 1.0 |
| Router Timing Optimisation Level | Maximum | Normal |
| Fitter Effort | Standard Fit | Auto Fit |
Total compilation time: approximately 10 minutes 31 seconds (Analysis & Synthesis: 3:36, Fitter: 6:03, Assembler: 0:23, Timing Analyzer: 0:29).
All setup, hold, recovery, removal, and minimum pulse width checks pass with positive slack. Zero TNS across every timing corner.
Worst-case setup slack on the pixel processing clock (general[1], 100 MHz):
| Corner | Slack |
|---|---|
| Slow 1100mV 85C | +0.827 ns |
| Slow 1100mV 0C | +1.124 ns |
| Fast 1100mV 85C | +3.224 ns |
| Fast 1100mV 0C | +3.416 ns |
- edi_read_master/write_master "must be an export" on the Deinterlacer II. These are unused memory-mapped master interfaces in the VIP IP. They can be safely ignored or exported and left unconnected.
- PLL "Actual settings differ from Requested settings." The PLL quantises to the nearest achievable frequency. If the VGA image is stable and timing passes, this is fine.
- Deinterlacer "configured for YCbCr 4:2:2 colour space." This is an informational message; the colour pipeline is coherent (confirmed by correct VGA output).
Varun Venkata Tej Molleti, MSc Telecommunications and Wireless Systems, University of Liverpool (2024-2025)
This project is provided for educational and portfolio purposes. The HLS source code (filters.cpp) is original work. Intel VIP IP cores and Quartus project templates are subject to Intel's licensing terms.