UNIVERSITY OF WEST ATTICA
SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATICS
Parallel Systems
Vasileios Evangelos Athanasiou
Student ID: 19390005
Supervision
Supervisor: Vasileios Mamalis, Professor
Co-supervisor: Michalis Iordanakis, Academic Scholar
Athens, February 2025
This repository implements parallel computations on 2D integer arrays using CUDA, developed as part of the Parallel Systems course at the University of West Attica. The project demonstrates GPU-accelerated matrix operations using CUDA threads, shared memory, and atomic operations.
- NVIDIA CUDA Toolkit (≥ 11.0 recommended)
  Download: https://developer.nvidia.com/cuda-downloads
- NVIDIA GPU with compute capability ≥ 3.0 (tested on NVIDIA TITAN RTX)
- GCC compiler (Linux/macOS) or compatible compiler on Windows
- Make / Terminal for compilation and execution
- Text editor or IDE (VSCode, CLion, Nsight)
- Spreadsheet viewer for performance analysis
git clone https://github.com/Parallel-Systems-aka-Uniwa/CUDA.git
Or download the ZIP archive and extract it.
cd CUDA
Folder structure:
assign/
docs/
src/
README.md
src/ contains CUDA source code (cuda1.cu) and input/output directories
docs/ contains theory, exercises, and performance analysis
Compile the CUDA program using nvcc:
nvcc -o cuda1 src/cuda1.cu
Explanation:
- -o cuda1 → output executable named cuda1
- src/cuda1.cu → CUDA source file
Ensure CUDA Toolkit paths are correctly set ($PATH and $LD_LIBRARY_PATH on Linux).
Run the program with input matrix file and output file:
./cuda1 src/A/AtoB.txt src/OutArr/OutArrB.txt
Arguments:
- Input file → Path to input matrix (e.g., src/A/AtoB.txt)
- Output file → Path to store result matrix (e.g., src/OutArr/OutArrB.txt)
Example Runs
./cuda1 src/A/AtoB.txt src/OutArr/OutArrB.txt
./cuda1 src/A/AtoC.txt src/OutArr/OutArrC.txt
- Performs Matrix B or Matrix C operations based on input file
- Supports different matrix sizes (e.g., N=8, 512, 1024, 10000, 20000)
- Located in src/A/
- Files contain integer matrices in text format
Typical names:
- AtoB.txt → Input for Matrix B computation
- AtoC.txt → Input for Matrix C computation
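The exact layout of the input files is not described here. As a minimal sketch, assuming a file holds N×N whitespace-separated integers in row-major order and that N is known to the caller, host code could load it as follows. The function name `loadSquareMatrix` is hypothetical and is not taken from cuda1.cu.

```cuda
#include <cstdio>
#include <vector>

// Hypothetical helper (not from cuda1.cu). Assumes the file holds n*n
// whitespace-separated integers in row-major order and nothing else.
// Returns an empty vector if the file cannot be opened or is too short.
std::vector<int> loadSquareMatrix(const char* path, size_t n)
{
    std::vector<int> values;
    FILE* file = std::fopen(path, "r");
    if (!file)
        return values;

    values.resize(n * n);
    for (size_t i = 0; i < n * n; ++i) {
        if (std::fscanf(file, "%d", &values[i]) != 1) {
            values.clear();   // malformed or truncated input
            break;
        }
    }
    std::fclose(file);
    return values;
}
```

The flat row-major vector can be passed to cudaMemcpy directly, which avoids copying row by row.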
Stored in src/OutArr/ (intermediate arrays) or src/Output/ (final results)
Examples:
- OutArrB.txt → Matrix B after computation
- OutArrC.txt → Matrix C after computation
- Output512.txt → Result for N=512
- Output20000.txt → Result for N=20000
Parallel reduction with atomic operations
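The following is a minimal sketch of this technique, not the code in cuda1.cu. Each block reduces its share of the matrix in shared memory, then a single thread per block adds the partial sum to a global total with atomicAdd. The names `sumKernel` and `averageOnGpu`, the block size of 256, and the cap of 1024 blocks are assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: sums `count` integers into *total.
__global__ void sumKernel(const int* data, size_t count, unsigned long long* total)
{
    extern __shared__ long long partial[];   // one slot per thread in the block

    const unsigned int tid = threadIdx.x;
    const size_t step = static_cast<size_t>(gridDim.x) * blockDim.x;

    // Grid-stride loop: each thread first accumulates a private sum.
    long long local = 0;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + tid; i < count; i += step)
        local += data[i];

    partial[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    // One atomic per block. atomicAdd has no signed 64-bit overload, so the
    // sum is added as unsigned; two's-complement wrap-around keeps negative
    // sums correct once the host reinterprets the total as signed.
    if (tid == 0)
        atomicAdd(total, static_cast<unsigned long long>(partial[0]));
}

// Hypothetical host wrapper: returns the mean, or 0.0 on empty input or CUDA error.
double averageOnGpu(const int* hostData, size_t count)
{
    if (count == 0)
        return 0.0;

    const int threads = 256;
    const size_t wanted = (count + threads - 1) / threads;
    const int blocks = static_cast<int>(wanted < 1024 ? wanted : 1024);

    int* devData = nullptr;
    unsigned long long* devTotal = nullptr;
    unsigned long long total = 0;

    bool ok = cudaMalloc(&devData, count * sizeof(int)) == cudaSuccess
           && cudaMalloc(&devTotal, sizeof(unsigned long long)) == cudaSuccess
           && cudaMemcpy(devData, hostData, count * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemset(devTotal, 0, sizeof(unsigned long long)) == cudaSuccess;
    if (ok) {
        sumKernel<<<blocks, threads, threads * sizeof(long long)>>>(devData, count, devTotal);
        // The blocking copy also surfaces any error raised by the kernel.
        ok = cudaMemcpy(&total, devTotal, sizeof(total), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devData);
    cudaFree(devTotal);

    if (!ok)
        return 0.0;
    return static_cast<double>(static_cast<long long>(total)) / static_cast<double>(count);
}
```

Reducing inside each block first means the global counter sees one atomic update per block rather than one per element, which keeps contention low. The 64-bit accumulator is a deliberate choice: a 20000×20000 matrix has 4×10^8 elements, so a 32-bit sum could overflow.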
Parallel search for largest element
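One way to sketch this search, assuming the matrix holds int values, is a per-block maximum in shared memory followed by a single atomicMax per block. The names `maxKernel` and `maxOnGpu` and the launch parameters are hypothetical and may differ from the findMax kernel in the repository.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical kernel: writes the largest of `count` integers to *result.
// *result must be initialised to INT_MIN before the launch.
__global__ void maxKernel(const int* data, size_t count, int* result)
{
    extern __shared__ int blockMax[];   // one slot per thread in the block

    const unsigned int tid = threadIdx.x;
    const size_t step = static_cast<size_t>(gridDim.x) * blockDim.x;

    // Grid-stride loop: each thread tracks the largest value it has seen.
    int local = INT_MIN;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + tid; i < count; i += step)
        local = max(local, data[i]);

    blockMax[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            blockMax[tid] = max(blockMax[tid], blockMax[tid + stride]);
        __syncthreads();
    }

    // One atomic per block merges the block maximum into the global result.
    if (tid == 0)
        atomicMax(result, blockMax[0]);
}

// Hypothetical host wrapper: returns INT_MIN on empty input or CUDA error.
int maxOnGpu(const int* hostData, size_t count)
{
    if (count == 0)
        return INT_MIN;

    const int threads = 256;
    const size_t wanted = (count + threads - 1) / threads;
    const int blocks = static_cast<int>(wanted < 1024 ? wanted : 1024);

    int* devData = nullptr;
    int* devResult = nullptr;
    int result = INT_MIN;

    bool ok = cudaMalloc(&devData, count * sizeof(int)) == cudaSuccess
           && cudaMalloc(&devResult, sizeof(int)) == cudaSuccess
           && cudaMemcpy(devData, hostData, count * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(devResult, &result, sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        maxKernel<<<blocks, threads, threads * sizeof(int)>>>(devData, count, devResult);
        ok = cudaMemcpy(&result, devResult, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devData);
    cudaFree(devResult);
    return ok ? result : INT_MIN;
}
```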
Also, to find the minimum element of B:
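The element type of B is not stated here. If B holds int values, atomicMin can be used directly, mirroring an atomicMax-based search. The sketch below assumes instead that B holds float values, for which CUDA provides no native atomicMin, and uses a compare-and-swap loop as a workaround. The names `atomicMinFloat`, `minKernel`, and `minOnGpu` are hypothetical.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

// Hypothetical helper: atomic minimum for float, built on atomicCAS.
__device__ float atomicMinFloat(float* address, float value)
{
    int* asInt = reinterpret_cast<int*>(address);
    int observed = *asInt;
    // Retry while our value is still smaller than what is stored.
    while (value < __int_as_float(observed)) {
        const int expected = observed;
        observed = atomicCAS(asInt, expected, __float_as_int(value));
        if (observed == expected)
            break;   // the swap succeeded, our value is now stored
    }
    return __int_as_float(observed);
}

// Hypothetical kernel: *result must be initialised to FLT_MAX before the launch.
__global__ void minKernel(const float* data, size_t count, float* result)
{
    extern __shared__ float blockMin[];

    const unsigned int tid = threadIdx.x;
    const size_t step = static_cast<size_t>(gridDim.x) * blockDim.x;

    float local = FLT_MAX;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + tid; i < count; i += step)
        local = fminf(local, data[i]);

    blockMin[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            blockMin[tid] = fminf(blockMin[tid], blockMin[tid + stride]);
        __syncthreads();
    }

    if (tid == 0)
        atomicMinFloat(result, blockMin[0]);
}

// Hypothetical host wrapper: returns FLT_MAX on empty input or CUDA error.
float minOnGpu(const float* hostData, size_t count)
{
    if (count == 0)
        return FLT_MAX;

    const int threads = 256;
    const size_t wanted = (count + threads - 1) / threads;
    const int blocks = static_cast<int>(wanted < 1024 ? wanted : 1024);

    float* devData = nullptr;
    float* devResult = nullptr;
    float result = FLT_MAX;

    bool ok = cudaMalloc(&devData, count * sizeof(float)) == cudaSuccess
           && cudaMalloc(&devResult, sizeof(float)) == cudaSuccess
           && cudaMemcpy(devData, hostData, count * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(devResult, &result, sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        minKernel<<<blocks, threads, threads * sizeof(float)>>>(devData, count, devResult);
        ok = cudaMemcpy(&result, devResult, sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devData);
    cudaFree(devResult);
    return ok ? result : FLT_MAX;
}
```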
Experiments were conducted on various matrix sizes:
| Matrix Size | calcAvg (ms) | findMax (ms) | createB/C (ms) |
|---|---|---|---|
| 8×8 | 0.204736 | 0.015552 | 0.015040 (B) |
| 512×512 | 0.136576 | 0.016704 | 0.016576 (C) |
| 1024×1024 | 36.310913 | 0.059424 | 0.072832 (B) |
| 20000×20000 | 39.388447 | 0.015104 | 0.011424 (C) |
Observations:
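- calcAvg times do not scale linearly with matrix size in these runs: the 512×512 run (0.137 ms) is faster than the 8×8 run (0.205 ms), and the time jumps to 36.3 ms at 1024×1024 but rises only to 39.4 ms at 20000×20000.
- findMax and createB/C remain fast even for large matrices, staying below 0.1 ms at every tested size.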
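How these times were measured is not stated here. CUDA events are one common way to time a kernel in milliseconds, and the sketch below shows the pattern under that assumption. `incrementKernel` is a hypothetical stand-in for the kernel being measured, and `timeIncrementMs` is likewise an illustrative name.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel whose run time is being measured.
__global__ void incrementKernel(int* data, size_t count)
{
    const size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < count)
        data[i] += 1;
}

// Returns the kernel time in milliseconds, or -1.0f if any step fails.
float timeIncrementMs(size_t count)
{
    if (count == 0)
        return -1.0f;

    int* devData = nullptr;
    if (cudaMalloc(&devData, count * sizeof(int)) != cudaSuccess)
        return -1.0f;
    cudaMemset(devData, 0, count * sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = static_cast<int>((count + threads - 1) / threads);

    // Events are recorded in the same stream as the kernel, so the elapsed
    // time covers the kernel only, not the allocation or any host-side copies.
    cudaEventRecord(start);
    incrementKernel<<<blocks, threads>>>(devData, count);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the kernel and stop event finish

    float ms = -1.0f;
    const bool launched = cudaGetLastError() == cudaSuccess;
    if (!launched || cudaEventElapsedTime(&ms, start, stop) != cudaSuccess)
        ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devData);
    return ms;
}
```

Checking cudaGetLastError before trusting the elapsed time matters when comparing sizes: a kernel that fails to launch returns almost immediately and would otherwise be reported as a very fast run.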
- Memory Allocation: Large arrays may cause cudaMemcpy failures if GPU memory is insufficient.
- Shared Memory Limits: Over-allocation per block may lead to kernel launch failures.
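Both failure modes can be detected before they occur. For scale, a 20000×20000 matrix of 4-byte integers needs 20000 × 20000 × 4 = 1,600,000,000 bytes (about 1.6 GB) of device memory for a single copy. The sketch below, which is not taken from cuda1.cu, compares a planned allocation and a per-block shared-memory request against what the device reports. The macro `CUDA_CHECK` and the function `fitsOnDevice` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: prints the CUDA error and makes the enclosing
// function return false.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            return false;                                             \
        }                                                             \
    } while (0)

// Returns true only if `globalBytes` fits in the currently free device memory
// and `sharedBytesPerBlock` does not exceed the per-block shared-memory limit.
bool fitsOnDevice(size_t globalBytes, size_t sharedBytesPerBlock)
{
    size_t freeBytes = 0;
    size_t totalBytes = 0;
    CUDA_CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));

    int device = 0;
    CUDA_CHECK(cudaGetDevice(&device));

    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, device));

    return globalBytes <= freeBytes
        && sharedBytesPerBlock <= prop.sharedMemPerBlock;
}
```

A caller might run `fitsOnDevice(n * n * sizeof(int), threads * sizeof(int))` before allocating, and call cudaGetLastError right after each kernel launch, since a launch that requests too much shared memory fails without raising any error on its own.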
