Code Repository for the 100 Days of GPU Challenge
Coding: Vector Addition Kernel for an Array of 1024 Elements
Reading: Read Chapter 1 of PMPP Book
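A minimal sketch of what a vector addition kernel like the one above might look like. The names `vecAddKernel` and `vecAdd` and the block size of 256 are illustrative assumptions, not taken from the repository; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread adds one pair of elements; the bounds check covers the
// case where n is not a multiple of the block size.
__global__ void vecAddKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel, and copies the result back. Assumes a and b have the same size.
std::vector<float> vecAdd(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    vecAddKernel<<<blocks, threads>>>(dA, dB, dC, n);

    std::vector<float> c(n);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

For 1024 elements and 256 threads per block, this launches 4 blocks.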
Coding:-
Encrypted and decrypted a character array of lowercase letters.
Used the CUDA Events API to record the time and compute the bandwidth utilization.
Reading:
Read Chapter 2 of PMPP Book
Read about Performance Metrics in CUDA
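The timing and bandwidth measurement described above can be sketched as follows. The log does not say which cipher was used, so a Caesar shift is assumed here purely for illustration; `shiftKernel` and `shiftText` are hypothetical names. The effective bandwidth is computed as bytes read plus bytes written, divided by the kernel time reported by CUDA events.

```cuda
#include <cuda_runtime.h>
#include <string>

// Assumed cipher: Caesar shift over 'a'..'z'. Decrypt by passing 26 - shift.
__global__ void shiftKernel(const char* in, char* out, int n, int shift) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 'a' + (in[i] - 'a' + shift) % 26;
}

// Hypothetical helper: shifts the text on the GPU and stores the effective
// bandwidth of the kernel, in GB/s, in *gbPerSec.
std::string shiftText(const std::string& text, int shift, float* gbPerSec) {
    int n = static_cast<int>(text.size());
    char *dIn, *dOut;
    cudaMalloc(&dIn, n);
    cudaMalloc(&dOut, n);
    cudaMemcpy(dIn, text.data(), n, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cudaEventRecord(start);
    shiftKernel<<<blocks, threads>>>(dIn, dOut, n, shift);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and the stop event finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    // n bytes read + n bytes written; bytes / (ms * 1e6) gives GB/s.
    *gbPerSec = (2.0f * n) / (ms * 1.0e6f);

    std::string out(n, ' ');
    cudaMemcpy(&out[0], dOut, n, cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Dividing the reported figure by the device's theoretical peak bandwidth would give a utilization percentage; for very small inputs the number mostly reflects launch overhead.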
Coding:-
Coded the Matrix Multiplication Kernel Basic
Reading:-
Read about Matmul from this link
Solved the Exercises of Chapter 2 of PMPP book.
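A basic matrix multiplication kernel typically assigns one output element to each thread. The sketch below assumes row-major storage and uses the hypothetical names `matMulKernel` and `matMul`; it is not the repository's code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// C (M x N) = A (M x K) * B (K x N), all row-major.
// Each thread computes one element of C as a dot product.
__global__ void matMulKernel(const float* A, const float* B, float* C,
                             int M, int K, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < K; ++k) sum += A[row * K + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

// Hypothetical host wrapper around the kernel.
std::vector<float> matMul(const std::vector<float>& A, const std::vector<float>& B,
                          int M, int K, int N) {
    float *dA, *dB, *dC;
    cudaMalloc(&dA, M * K * sizeof(float));
    cudaMalloc(&dB, K * N * sizeof(float));
    cudaMalloc(&dC, M * N * sizeof(float));
    cudaMemcpy(dA, A.data(), M * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), K * N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block shape
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    matMulKernel<<<grid, block>>>(dA, dB, dC, M, K, N);

    std::vector<float> C(M * N);
    cudaMemcpy(C.data(), dC, M * N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```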
Coding:-
Coded the Color to Grayscale Image Converter Kernel
Reading:-
Read the first half of Chapter 3 of the PMPP book.
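A color-to-grayscale kernel can be sketched with one thread per pixel. The example assumes interleaved 8-bit RGB input and the luminance weights 0.21, 0.72, and 0.07; the names `grayKernel` and `toGray` and the rounding step are illustrative choices.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per pixel. Input is interleaved RGB (3 bytes per pixel),
// output is one byte per pixel.
__global__ void grayKernel(const unsigned char* rgb, unsigned char* gray,
                           int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        int p = row * width + col;
        float r = rgb[3 * p], g = rgb[3 * p + 1], b = rgb[3 * p + 2];
        // Assumed luminance weights; + 0.5f rounds to the nearest integer.
        gray[p] = static_cast<unsigned char>(0.21f * r + 0.72f * g + 0.07f * b + 0.5f);
    }
}

// Hypothetical host wrapper.
std::vector<unsigned char> toGray(const std::vector<unsigned char>& rgb,
                                  int width, int height) {
    int pixels = width * height;
    unsigned char *dRgb, *dGray;
    cudaMalloc(&dRgb, 3 * pixels);
    cudaMalloc(&dGray, pixels);
    cudaMemcpy(dRgb, rgb.data(), 3 * pixels, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    grayKernel<<<grid, block>>>(dRgb, dGray, width, height);

    std::vector<unsigned char> gray(pixels);
    cudaMemcpy(gray.data(), dGray, pixels, cudaMemcpyDeviceToHost);
    cudaFree(dRgb);
    cudaFree(dGray);
    return gray;
}
```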
Coding: -
Coded the Image Blur Kernel
Reading:-
Read about Image Blur Kernel from PMPP book.
Coding: -
Coded Matrix Multiplication Kernels that increase the work per thread, with each thread computing the output:
Row-wise (one output row per thread)
Column-wise (one output column per thread)
Reading: -
Solved the Exercises of Chapter 3 in the PMPP book.
Coding: -
Coded the Matrix-Vector Multiplication Kernel
Reading: -
Started Reading Chapter 4 of the PMPP Book.
Coding: -
Coded the Reduction Kernel using Atomic Operation
Reading: -
Read Chapter 4 of the PMPP Book.
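The simplest form of an atomic reduction has every thread add its element to a single global accumulator. This is a sketch of that idea using integer data so the result is exact; `reduceAtomicKernel` and `reduceAtomic` are hypothetical names, and the repository's version may differ.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every thread adds its element to one global accumulator. atomicAdd keeps
// the result correct, but all threads contend for the same address.
__global__ void reduceAtomicKernel(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Hypothetical host wrapper: returns the sum of all elements.
int reduceAtomic(const std::vector<int>& data) {
    int n = static_cast<int>(data.size());
    int *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(int));  // accumulator starts at zero

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    reduceAtomicKernel<<<blocks, threads>>>(dIn, dOut, n);

    int sum = 0;
    cudaMemcpy(&sum, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return sum;
}
```

Because the atomic updates to one address are serialized, this version mainly serves as a correctness baseline for the shared-memory reductions.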
Coding: -
Coded the Reduction Kernel that uses interleaved addressing
Reading: -
Reading about reduction kernels from this link
Coding: -
Coded the Reduction Kernel that uses sequential addressing to reduce thread divergence.
Reading: -
Reading about reduction kernels from this link
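A sketch of a shared-memory reduction with sequential addressing, under the assumption of a power-of-two block size. The names `reduceSeqKernel` and `reduceSeq` are illustrative, and the final sum over per-block results is done on the host here for simplicity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // assumed block size; must be a power of two

// Each block reduces BLOCK elements in shared memory and writes one partial sum.
__global__ void reduceSeqKernel(const int* in, int* blockSums, int n) {
    __shared__ int sdata[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Sequential addressing: the stride starts at half the block and halves
    // each step, so the active threads are always tid = 0..s-1 (contiguous).
    // The interleaved variant instead doubles the stride and activates
    // threads with tid % (2 * s) == 0, which scatters the active threads.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: sums the per-block results on the CPU.
int reduceSeq(const std::vector<int>& data) {
    int n = static_cast<int>(data.size());
    int blocks = (n + BLOCK - 1) / BLOCK;
    int *dIn, *dSums;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dSums, blocks * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduceSeqKernel<<<blocks, BLOCK>>>(dIn, dSums, n);

    std::vector<int> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);
    int total = 0;
    for (int s : sums) total += s;
    return total;
}
```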
Coding: -
Coded the Reduction Kernel that uses sequential addressing to reduce bank conflicts.
Reading: -
Reading about reduction kernels from this link
Coding: -
Coded the Reduction Kernel that uses sequential addressing to reduce bank conflicts and increases per-thread work by performing the first addition while loading from global memory.
Reading: -
Reading about reduction kernels from this link
Coding: -
Coded the Reduction Kernel that uses sequential addressing to reduce bank conflicts and increases per-thread work by performing the first addition while loading from global memory. Also added a warp-reduce optimization that speeds up the final steps handled by the last warp.
Reading: -
Reading about reduction kernels from this link
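The combination of an add during load and a last-warp optimization might look like the sketch below. One deliberate substitution: older material often finishes the last warp with unsynchronized `volatile` shared-memory accesses, whereas this sketch uses `__shfl_down_sync`, which is the safer choice on recent GPUs. The repository's kernel may use the older idiom. `reduceWarpKernel` and `reduceWarp` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // assumed; must be a power of two and at least 64

__global__ void reduceWarpKernel(const int* in, int* blockSums, int n) {
    __shared__ int sdata[BLOCK];
    int tid = threadIdx.x;
    // Each block covers 2 * BLOCK inputs: every thread loads two elements
    // and adds them before anything is stored in shared memory.
    int i = blockIdx.x * (blockDim.x * 2) + tid;
    int v = 0;
    if (i < n) v = in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Sequential addressing down to 64 remaining partial sums.
    for (int s = blockDim.x / 2; s > 32; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // The first warp finishes without further __syncthreads() calls,
    // exchanging values through registers with warp shuffles.
    if (tid < 32) {
        int sum = sdata[tid] + sdata[tid + 32];
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffffu, sum, offset);
        if (tid == 0) blockSums[blockIdx.x] = sum;
    }
}

// Hypothetical host wrapper: sums the per-block results on the CPU.
int reduceWarp(const std::vector<int>& data) {
    int n = static_cast<int>(data.size());
    int blocks = (n + 2 * BLOCK - 1) / (2 * BLOCK);
    int *dIn, *dSums;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dSums, blocks * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduceWarpKernel<<<blocks, BLOCK>>>(dIn, dSums, n);

    std::vector<int> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);
    int total = 0;
    for (int s : sums) total += s;
    return total;
}
```

Adding during the load halves the number of blocks needed for the same input, so no thread sits idle in the first reduction step.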
Coding: -
Built the CSR representation on the CPU, then used a CUDA kernel to print the degree of each node from it.
Reading: -
Reading about CSR Format from this paper
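As a sketch of the idea: in CSR, the neighbors of vertex `v` occupy `colIdx[rowPtr[v] .. rowPtr[v+1])`, so the degree is just the difference of two adjacent row pointers. The example assumes a directed edge list (so it computes out-degrees) and stores the degrees in an array rather than printing them from the kernel; `Csr`, `buildCsr`, `degreeKernel`, and `degrees` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

struct Csr {
    std::vector<int> rowPtr;  // size numVertices + 1
    std::vector<int> colIdx;  // size numEdges
};

// Builds CSR on the CPU from a directed edge list (src, dst).
Csr buildCsr(int numVertices, const std::vector<std::pair<int, int>>& edges) {
    Csr g;
    g.rowPtr.assign(numVertices + 1, 0);
    for (const auto& e : edges) g.rowPtr[e.first + 1]++;  // count edges per source
    for (int v = 0; v < numVertices; ++v) g.rowPtr[v + 1] += g.rowPtr[v];  // prefix sum
    g.colIdx.resize(edges.size());
    std::vector<int> next(g.rowPtr.begin(), g.rowPtr.end() - 1);  // next free slot per row
    for (const auto& e : edges) g.colIdx[next[e.first]++] = e.second;
    return g;
}

// One thread per vertex: degree is the length of the vertex's CSR row.
__global__ void degreeKernel(const int* rowPtr, int* degree, int numVertices) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < numVertices) degree[v] = rowPtr[v + 1] - rowPtr[v];
}

// Hypothetical host wrapper: returns the degree of every vertex.
std::vector<int> degrees(const Csr& g) {
    int numVertices = static_cast<int>(g.rowPtr.size()) - 1;
    int *dRowPtr, *dDegree;
    cudaMalloc(&dRowPtr, g.rowPtr.size() * sizeof(int));
    cudaMalloc(&dDegree, numVertices * sizeof(int));
    cudaMemcpy(dRowPtr, g.rowPtr.data(), g.rowPtr.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (numVertices + threads - 1) / threads;
    degreeKernel<<<blocks, threads>>>(dRowPtr, dDegree, numVertices);

    std::vector<int> degree(numVertices);
    cudaMemcpy(degree.data(), dDegree, numVertices * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dRowPtr);
    cudaFree(dDegree);
    return degree;
}
```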
Coding: -
Added Edge Weights to the CSR Kernel
Added Timing for GPU/CPU Comparison
Used the Synthetic Benchmark Suite to Generate a Graph of 250k edges and 15k vertices and did performance measurements
Reading: -
Reading about CSR Format from this paper
Coding: -
Wrote a GPU Kernel to Print Per-Vertex Edge Weight Mean
Used the Synthetic Benchmark Suite to Generate a Graph of 250k edges and 15k vertices and did performance measurements: -
Command: ./csr 15229 245952 .\bio-CE-CX.edges
Reading: -
Reading about CSR Format from this paper
Coding: -
Wrote a GPU Kernel to Implement BFS
Reading: -
Reading about BFS algorithm in this paper
Coding: -
Wrote a GPU Kernel to Implement Matrix Multiplication using Tiling
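A tiled matrix multiplication stages square tiles of both inputs in shared memory so that each value loaded from global memory is reused by a whole row or column of threads. The sketch below assumes row-major storage and a 16 x 16 tile; `matMulTiledKernel` and `matMulTiled` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // assumed tile width; the block is TILE x TILE threads

// C (M x N) = A (M x K) * B (K x N), all row-major.
__global__ void matMulTiledKernel(const float* A, const float* B, float* C,
                                  int M, int K, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range entries are loaded as zero so partial tiles are harmless.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it

        for (int k = 0; k < TILE; ++k) sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done before the tile is overwritten
    }
    if (row < M && col < N) C[row * N + col] = sum;
}

// Hypothetical host wrapper.
std::vector<float> matMulTiled(const std::vector<float>& A, const std::vector<float>& B,
                               int M, int K, int N) {
    float *dA, *dB, *dC;
    cudaMalloc(&dA, M * K * sizeof(float));
    cudaMalloc(&dB, K * N * sizeof(float));
    cudaMalloc(&dC, M * N * sizeof(float));
    cudaMemcpy(dA, A.data(), M * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), K * N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matMulTiledKernel<<<grid, block>>>(dA, dB, dC, M, K, N);

    std::vector<float> C(M * N);
    cudaMemcpy(C.data(), dC, M * N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

With a tile width of 16, each element of A and B is read from global memory once per tile rather than once per output element, cutting global loads by roughly a factor of 16.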
Coding: -
Wrote a GPU Kernel to Invert Colors of an image
Coding: -
Wrote a GPU Kernel to Compute the ReLU function for a set of floating point numbers.
Coding: -
Wrote a GPU Kernel to Compute the Leaky ReLU function for a set of floating point numbers.
Ran a performance analysis against the CPU version.
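Both activation functions can be expressed with one element-wise kernel: Leaky ReLU returns `x` for positive inputs and `alpha * x` otherwise, and setting `alpha` to 0 gives plain ReLU. The names `leakyReluKernel` and `leakyRelu` and the `alpha` parameter are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// out[i] = x if x > 0, otherwise alpha * x. With alpha = 0 this is plain ReLU.
__global__ void leakyReluKernel(const float* in, float* out, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        out[i] = (x > 0.0f) ? x : alpha * x;
    }
}

// Hypothetical host wrapper.
std::vector<float> leakyRelu(const std::vector<float>& x, float alpha) {
    int n = static_cast<int>(x.size());
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    leakyReluKernel<<<blocks, threads>>>(dIn, dOut, n, alpha);

    std::vector<float> y(n);
    cudaMemcpy(y.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return y;
}
```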
Coding: -
Wrote a GPU Kernel to Compute the Matrix Transpose by loading through rows and storing by columns.
Coding: -
Wrote a GPU Kernel to Compute the Matrix Transpose by loading through columns and storing by rows.
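The two transpose variants differ only in which side of the copy gets contiguous (coalesced) accesses. A sketch under the assumption of row-major storage, with the hypothetical names `transposeReadRows`, `transposeReadCols`, and `transpose`:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant 1: consecutive threads read along an input row (coalesced reads)
// and write down an output column (strided writes).
__global__ void transposeReadRows(const float* in, float* out, int rows, int cols) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols) out[c * rows + r] = in[r * cols + c];
}

// Variant 2: consecutive threads read down an input column (strided reads)
// and write along an output row (coalesced writes).
__global__ void transposeReadCols(const float* in, float* out, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    int c = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols) out[c * rows + r] = in[r * cols + c];
}

// Hypothetical host wrapper: input is rows x cols, output is cols x rows.
std::vector<float> transpose(const std::vector<float>& in, int rows, int cols,
                             bool readRows) {
    size_t bytes = in.size() * sizeof(float);
    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    if (readRows) {
        dim3 grid((cols + 15) / 16, (rows + 15) / 16);  // x spans input columns
        transposeReadRows<<<grid, block>>>(dIn, dOut, rows, cols);
    } else {
        dim3 grid((rows + 15) / 16, (cols + 15) / 16);  // x spans input rows
        transposeReadCols<<<grid, block>>>(dIn, dOut, rows, cols);
    }

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Both variants leave one side of the copy strided; staging tiles in shared memory is the usual way to make both sides coalesced.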