
UNIVERSITY OF WEST ATTICA
SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATICS



Parallel Systems

Parallel Computing using CUDA

Vasileios Evangelos Athanasiou
Student ID: 19390005

GitHub · LinkedIn


Supervision

Supervisor: Vasileios Mamalis, Professor

UNIWA Profile

Co-supervisor: Michalis Iordanakis, Academic Scholar

UNIWA Profile · Scholar


Athens, February 2025



README

Parallel Computing using CUDA

This repository implements parallel operations on 2D integer arrays using CUDA for high-performance GPU computation. The project was developed as part of the Parallel Systems course at the University of West Attica.


Table of Contents

| Section | Folder/File | Description |
| --- | --- | --- |
| 1 | assign/ | Assignment material for the CUDA workshop |
| 1.1 | assign/_Par_Sys_Ask_2-1_2024-25.pdf | Assignment description in English |
| 1.2 | assign/_Παρ_Συσ_Ασκ_2-1_2024-25.pdf | Assignment description in Greek |
| 2 | docs/ | Documentation for parallel computing using CUDA |
| 2.1 | docs/Parallel-Computig-using-CUDA.pdf | English documentation for CUDA parallel computing |
| 2.2 | docs/Παράλληλος-Υπολογισμός-με-CUDA.pdf | Greek documentation for CUDA parallel computing |
| 3 | src/ | Source code, input/output files, and CUDA implementation |
| 3.1 | src/cuda1.cu | Main CUDA program |
| 3.2 | src/A/ | Input data files for CUDA exercise A |
| 3.2.1 | src/A/AtoB.txt | Input file for transformation A to B |
| 3.2.2 | src/A/AtoC.txt | Input file for transformation A to C |
| 3.3 | src/OutArr/ | Intermediate output arrays |
| 3.3.1 | src/OutArr/OutArrB.txt | Output array B |
| 3.3.2 | src/OutArr/OutArrC.txt | Output array C |
| 3.4 | src/Output/ | Final output files |
| 3.4.1 | src/Output/Output_no_args.txt | Output without arguments |
| 3.4.2 | src/Output/Output8B.txt | Output for 8B case |
| 3.4.3 | src/Output/Output8C.txt | Output for 8C case |
| 3.4.4 | src/Output/Output512.txt | Output for N=512 |
| 3.4.5 | src/Output/Output1024.txt | Output for N=1024 |
| 3.4.6 | src/Output/Output10000.txt | Output for N=10000 |
| 3.4.7 | src/Output/Output20000.txt | Output for N=20000 |
| 4 | README.md | Project documentation |
| 5 | INSTALL.md | Usage instructions |

1. Overview

The project uses CUDA to perform matrix computations in parallel on a GPU. A random N×N integer matrix A is generated, and its elements are processed concurrently by CUDA threads.


2. Core Operations

  • Average Calculation (calcAvg): Computes the mean of all elements using parallel reduction and atomic operations.
  • Maximum Finding (findMax): Identifies the largest element in the matrix.
  • Matrix B Creation (createB):
    Bᵢⱼ = a_max − Aᵢⱼ for i ≠ j
    Bᵢᵢ = a_max
    Also identifies the minimum element in B.
  • Matrix C Creation (createC):
    Cᵢⱼ = 3·Aᵢⱼ + Aᵢ,ⱼ₊₁ + Aᵢ,ⱼ₋₁
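
The two element-wise formulas above can be sketched as one kernel each, with one thread per matrix element. This is a minimal illustration and not necessarily how src/cuda1.cu is written: the names createBSketch, createCSketch, and runSketch are hypothetical, the matrix is assumed to be stored row-major in a flat array, and the neighbours Aᵢ,ⱼ₊₁ and Aᵢ,ⱼ₋₁ are assumed to wrap around within a row, since the formula alone does not say what happens at the first and last columns. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// B[i][j] = aMax - A[i][j] off the diagonal, B[i][i] = aMax on the diagonal.
__global__ void createBSketch(const int* A, int* B, int n, int aMax) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (i < n && j < n) {
        B[i * n + j] = (i == j) ? aMax : aMax - A[i * n + j];
    }
}

// C[i][j] = 3*A[i][j] + A[i][j+1] + A[i][j-1].
// ASSUMPTION: column indices wrap around within the row (cyclic neighbours).
__global__ void createCSketch(const int* A, int* C, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (i < n && j < n) {
        int right = (j + 1) % n;
        int left = (j + n - 1) % n;
        C[i * n + j] = 3 * A[i * n + j] + A[i * n + right] + A[i * n + left];
    }
}

// Hypothetical host helper: copies A to the device, runs one of the two
// kernels on a 2D grid, and returns the resulting n x n matrix.
std::vector<int> runSketch(const std::vector<int>& A, int n, int aMax, bool makeB) {
    size_t bytes = A.size() * sizeof(int);
    int* dA = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    if (makeB) {
        createBSketch<<<grid, block>>>(dA, dOut, n, aMax);
    } else {
        createCSketch<<<grid, block>>>(dA, dOut, n);
    }

    std::vector<int> out(A.size());
    // cudaMemcpy blocks until the kernel above has finished.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dOut);
    return out;
}
```

For a 3×3 matrix holding 1 through 9 with a_max = 9, runSketch(A, 3, 9, true) would be expected to produce 9 on the diagonal and 9 − Aᵢⱼ elsewhere. If the assignment specifies a different boundary rule for C, only the computation of left and right needs to change.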

3. Design & Implementation

Optimization Techniques:

  • Parallel Reductions: Efficient aggregation for sum and maximum computations.
  • Shared Memory: Reduces global memory latency by storing block-local data.
  • Synchronization: Uses __syncthreads() to coordinate threads within a block.
  • Atomic Operations: Implements atomicMin for floating-point numbers on top of atomicCAS, since CUDA's built-in atomicMin supports only integer types.
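
The techniques listed above are commonly combined as follows: each block reduces its slice of the data in shared memory, synchronizing with __syncthreads() between steps, and one thread per block then merges the block result into a global value with an atomic operation. The sketch below applies this pattern to a floating-point minimum. It is an illustration under stated assumptions, not the repository's code: the names atomicMinFloat, blockMinSketch, and minOnGpu are hypothetical, the block size is assumed to be a power of two, the input is assumed to be non-empty, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>

// Floating-point atomic min emulated with an integer compare-and-swap loop.
__device__ float atomicMinFloat(float* addr, float value) {
    int* addrAsInt = reinterpret_cast<int*>(addr);
    int old = *addrAsInt;
    int assumed;
    do {
        assumed = old;
        // Stop if the stored value is already less than or equal to ours.
        if (__int_as_float(assumed) <= value) break;
        // Swap in our value only if nobody changed the slot in the meantime.
        old = atomicCAS(addrAsInt, assumed, __float_as_int(value));
    } while (old != assumed);
    return __int_as_float(old);
}

// Each block reduces its slice in shared memory, then thread 0 merges the
// block minimum into *result. ASSUMPTION: blockDim.x is a power of two.
__global__ void blockMinSketch(const float* data, int n, float* result) {
    extern __shared__ float tile[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    tile[tid] = (idx < n) ? data[idx] : FLT_MAX;  // pad out-of-range slots
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            tile[tid] = fminf(tile[tid], tile[tid + stride]);
        }
        __syncthreads();  // all threads must finish this step before the next
    }
    if (tid == 0) {
        atomicMinFloat(result, tile[0]);
    }
}

// Hypothetical host helper; assumes a non-empty input vector.
float minOnGpu(const std::vector<float>& host) {
    const int n = static_cast<int>(host.size());
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float* dData = nullptr;
    float* dMin = nullptr;
    float result = FLT_MAX;  // initial value of the global minimum
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dMin, sizeof(float));
    cudaMemcpy(dData, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dMin, &result, sizeof(float), cudaMemcpyHostToDevice);

    blockMinSketch<<<blocks, threads, threads * sizeof(float)>>>(dData, n, dMin);

    cudaMemcpy(&result, dMin, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dMin);
    return result;
}
```

Reducing within the block first means only one atomic operation per block reaches global memory, rather than one per element. The same structure would apply to a sum or a maximum by swapping fminf and atomicMinFloat for the corresponding operation.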