UNIVERSITY OF WEST ATTICA
SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATICS

Parallel Systems

Parallel Computing using CUDA

Vasileios Evangelos Athanasiou
Student ID: 19390005

GitHub · LinkedIn

Supervision

Supervisor: Vasileios Mamalis, Professor

UNIWA Profile

Co-supervisor: Michalis Iordanakis, Academic Scholar

UNIWA Profile · Scholar

Athens, February 2025

README

Parallel Computing using CUDA

This repository implements parallel operations on 2D integer arrays using CUDA for high-performance GPU computation. The project was developed as part of the Parallel Systems course at the University of West Attica.

| Section | Folder/File | Description |
|---|---|---|
| 1 | `assign/` | Assignment material for the CUDA workshop |
| 1.1 | `assign/_Par_Sys_Ask_2-1_2024-25.pdf` | Assignment description in English |
| 1.2 | `assign/_Παρ_Συσ_Ασκ_2-1_2024-25.pdf` | Assignment description in Greek |
| 2 | `docs/` | Documentation for parallel computing using CUDA |
| 2.1 | `docs/Parallel-Computig-using-CUDA.pdf` | English documentation for CUDA parallel computing |
| 2.2 | `docs/Παράλληλος-Υπολογισμός-με-CUDA.pdf` | Greek documentation for CUDA parallel computing |
| 3 | `src/` | Source code, input/output files, and CUDA implementation |
| 3.1 | `src/cuda1.cu` | Main CUDA program |
| 3.2 | `src/A/` | Input data files for CUDA exercise A |
| 3.2.1 | `src/A/AtoB.txt` | Input file for transformation A to B |
| 3.2.2 | `src/A/AtoC.txt` | Input file for transformation A to C |
| 3.3 | `src/OutArr/` | Intermediate output arrays |
| 3.3.1 | `src/OutArr/OutArrB.txt` | Output array B |
| 3.3.2 | `src/OutArr/OutArrC.txt` | Output array C |
| 3.4 | `src/Output/` | Final output files |
| 3.4.1 | `src/Output/Output_no_args.txt` | Output without arguments |
| 3.4.2 | `src/Output/Output8B.txt` | Output for 8B case |
| 3.4.3 | `src/Output/Output8C.txt` | Output for 8C case |
| 3.4.4 | `src/Output/Output512.txt` | Output for N=512 |
| 3.4.5 | `src/Output/Output1024.txt` | Output for N=1024 |
| 3.4.6 | `src/Output/Output10000.txt` | Output for N=10000 |
| 3.4.7 | `src/Output/Output20000.txt` | Output for N=20000 |
| 4 | `README.md` | Project documentation |
| 5 | `INSTALL.md` | Usage instructions |

1. Overview

The project uses CUDA to perform matrix calculations in parallel on a GPU. A random N×N integer matrix is generated, and its elements are distributed across CUDA threads so that they are processed concurrently rather than one at a time.
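As a hedged illustration of distributing the matrix across threads, the sketch below maps a 2D grid onto a row-major N×N array, one thread per element, and folds every element into a global maximum with `atomicMax`. The names `maxKernel`, `makeRandomMatrix`, and `deviceMax`, the 16×16 block shape, and the value range [0, 100) are assumptions made for illustration and are not taken from `src/cuda1.cu`. Error checking is omitted for brevity.

```cuda
#include <climits>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per element of a row-major n x n matrix.
__global__ void maxKernel(const int *a, int n, int *result) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {            // guard threads that fall outside the matrix
        atomicMax(result, a[row * n + col]);
    }
}

// Hypothetical host helper: n x n matrix with pseudo-random values in [0, 100).
std::vector<int> makeRandomMatrix(int n, unsigned seed) {
    std::srand(seed);
    std::vector<int> a(static_cast<size_t>(n) * n);
    for (int &v : a) v = std::rand() % 100;
    return a;
}

// Copies the matrix to the GPU, launches the kernel, and returns the maximum.
int deviceMax(const std::vector<int> &a, int n) {
    int *dA = nullptr, *dMax = nullptr;
    int init = INT_MIN;
    size_t bytes = a.size() * sizeof(int);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dMax, sizeof(int));
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dMax, &init, sizeof(int), cudaMemcpyHostToDevice);

    // Round the grid up so that every element is covered even when n % 16 != 0.
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    maxKernel<<<grid, block>>>(dA, n, dMax);

    int result = init;
    cudaMemcpy(&result, dMax, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dA);
    cudaFree(dMax);
    return result;
}
```

Calling `deviceMax(makeRandomMatrix(512, 42u), 512)` would return the largest generated value. One atomic per element is simple but heavily contended; the reduction techniques listed in section 3 reduce the number of atomics to one per block.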

2. Core Operations

Average Calculation (calcAvg): Computes the mean of all elements using parallel reduction and atomic operations.
Maximum Finding (findMax): Identifies the largest element in the matrix, denoted $a_{\max}$.
Matrix B Creation (createB):
$B_{ij} = a_{\max} - A_{ij}$ for $i \neq j$
$B_{ii} = a_{\max}$
Also identifies the minimum element in B.
Matrix C Creation (createC):
$C_{ij} = 3A_{ij} + A_{i(j+1)} + A_{i(j-1)}$
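The two element-wise formulas above can be sketched as one-thread-per-element kernels. This is a minimal sketch under stated assumptions, not the code in `src/cuda1.cu`: the names `createBSketch`, `createCSketch`, and `buildBandC` are hypothetical, B is stored as `float` (a guess motivated by the floating-point `atomicMin` mentioned in section 3), and the row neighbours in C wrap around at the matrix edges, which the formula itself does not specify.

```cuda
#include <vector>
#include <cuda_runtime.h>

// B[i][j] = aMax - A[i][j] off the diagonal, B[i][i] = aMax.
// ASSUMPTION: B is float; the README does not state its element type.
__global__ void createBSketch(const int *a, float *b, int n, int aMax) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        int idx = row * n + col;
        b[idx] = (row == col) ? (float)aMax : (float)(aMax - a[idx]);
    }
}

// C[i][j] = 3*A[i][j] + A[i][j+1] + A[i][j-1].
// ASSUMPTION: column indices wrap around within the row at the edges.
__global__ void createCSketch(const int *a, int *c, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        int right = (col + 1) % n;
        int left = (col + n - 1) % n;
        c[row * n + col] = 3 * a[row * n + col] + a[row * n + right] + a[row * n + left];
    }
}

// Hypothetical host wrapper: runs both kernels and copies B and C back.
void buildBandC(const std::vector<int> &a, int n, int aMax,
                std::vector<float> &b, std::vector<int> &c) {
    size_t count = static_cast<size_t>(n) * n;
    int *dA = nullptr, *dC = nullptr;
    float *dB = nullptr;
    cudaMalloc(&dA, count * sizeof(int));
    cudaMalloc(&dB, count * sizeof(float));
    cudaMalloc(&dC, count * sizeof(int));
    cudaMemcpy(dA, a.data(), count * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    createBSketch<<<grid, block>>>(dA, dB, n, aMax);
    createCSketch<<<grid, block>>>(dA, dC, n);

    b.resize(count);
    c.resize(count);
    cudaMemcpy(b.data(), dB, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(c.data(), dC, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

For the 3×3 matrix with rows (1, 2, 3), (4, 5, 6), (7, 8, 9) and a maximum of 9, this sketch gives B a diagonal of 9 and, under the wrap-around assumption, C[0][0] = 3·1 + 2 + 3 = 8.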

3. Design & Implementation

Optimization Techniques:

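Parallel Reductions: Combine per-thread values into per-block partial results for the sum and maximum computations, so far fewer values have to be merged globally.
Shared Memory: Keeps block-local data in on-chip shared memory, avoiding repeated global memory accesses during a reduction.
Synchronization: Uses `__syncthreads()` to coordinate threads within a block between reduction steps.
Atomic Operations: Implements `atomicMin` for floating-point numbers with an `atomicCAS` loop, since CUDA provides a built-in `atomicMin` only for integer types.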
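The techniques above can be combined as in the following sketch: each block reduces its slice in shared memory, `__syncthreads()` separates the halving steps, and thread 0 merges the block's partial sum and minimum into global results with atomics. It is a minimal sketch under stated assumptions rather than the repository's implementation: the names `atomicMinFloat`, `sumAndMinSketch`, `reduceOnDevice`, and `kThreads` are hypothetical, the input is a `float` array, NaN handling is ignored, and error checking is omitted.

```cuda
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // power of two, required by the halving loop below

// Float minimum built on atomicCAS: retry until the stored value is <= value
// or our compare-and-swap succeeds. Returns the previously stored value.
__device__ float atomicMinFloat(float *addr, float value) {
    int *addrAsInt = reinterpret_cast<int *>(addr);
    int old = *addrAsInt;
    int assumed;
    do {
        assumed = old;
        if (__int_as_float(assumed) <= value) break;  // nothing to update
        old = atomicCAS(addrAsInt, assumed, __float_as_int(value));
    } while (assumed != old);                         // another thread won; retry
    return __int_as_float(old);
}

// One partial sum and one partial minimum per block, merged with atomics.
__global__ void sumAndMinSketch(const float *data, int count, float *sum, float *minOut) {
    __shared__ float sSum[kThreads];
    __shared__ float sMin[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sSum[tid] = (i < count) ? data[i] : 0.0f;     // neutral element for the sum
    sMin[tid] = (i < count) ? data[i] : FLT_MAX;  // neutral element for the minimum
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            sSum[tid] += sSum[tid + stride];
            sMin[tid] = fminf(sMin[tid], sMin[tid + stride]);
        }
        __syncthreads();  // every thread must finish this step before the next
    }
    if (tid == 0) {
        atomicAdd(sum, sSum[0]);
        atomicMinFloat(minOut, sMin[0]);
    }
}

// Hypothetical host wrapper: returns the average and minimum of h.
void reduceOnDevice(const std::vector<float> &h, float &avg, float &minVal) {
    int count = static_cast<int>(h.size());
    float *dData = nullptr, *dSum = nullptr, *dMin = nullptr;
    float sum = 0.0f, big = FLT_MAX;
    cudaMalloc(&dData, count * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMalloc(&dMin, sizeof(float));
    cudaMemcpy(dData, h.data(), count * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dSum, &sum, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dMin, &big, sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (count + kThreads - 1) / kThreads;
    sumAndMinSketch<<<blocks, kThreads>>>(dData, count, dSum, dMin);

    cudaMemcpy(&sum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&minVal, dMin, sizeof(float), cudaMemcpyDeviceToHost);
    avg = sum / count;
    cudaFree(dData);
    cudaFree(dSum);
    cudaFree(dMin);
}
```

Because only thread 0 of each block touches the global results, the number of atomic operations drops from one per element to two per block. Summing in `float` is order-dependent for large inputs, so an integer or double accumulator may be preferable when exact averages matter.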
