This project presents a comprehensive study of high-performance matrix multiplication on shared-memory architectures using OpenMP. It focuses on performance optimisation, benchmarking, and strong scaling analysis of parallel matrix multiplication algorithms implemented in C.
- To implement parallel matrix multiplication using OpenMP
- To analyse performance improvements through cache-aware optimisations
- To study strong scaling behaviour on multicore processors
- To understand the impact of memory bandwidth and thread scalability
- Naive OpenMP matrix multiplication implementation
- Cache-optimised (tiled) matrix multiplication
- Automated benchmarking using shell scripts
- Strong scaling and efficiency evaluation
```
hpc-matrix-multiplication/
├── src/
│   ├── matrix_mul_omp.c
│   ├── matrix_mul_tiled.c
│   ├── run_benchmark.sh
│   ├── run_strong_scaling.sh
│   ├── benchmark_results.txt
│   └── strong_scaling_results.txt
├── results/
└── README.md
```
The project includes two matrix multiplication strategies:
- Naive OpenMP Implementation — A straightforward parallelisation of the triple-nested loop matrix multiplication using OpenMP directives.
- Tiled (Cache-Optimised) Implementation — A block-based approach designed to improve cache locality and reduce memory access latency.
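The two strategies can be sketched as follows. This is an illustrative reconstruction, not the project's exact source: the matrix size `N`, tile size `TILE`, and function names are assumptions, and `N` is chosen small and divisible by `TILE` for simplicity.

```c
#include <string.h>

#define N 64      /* illustrative size; benchmarks use much larger matrices */
#define TILE 16   /* illustrative block size; assumed to divide N evenly */

/* Naive version: triple-nested loop, parallelised over rows of C. */
void matmul_naive(const double *A, const double *B, double *C) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

/* Tiled version: operate on TILE x TILE blocks so each block's working
   set stays cache-resident, improving locality. Each thread owns a
   distinct block of C, so the accumulation is race-free. */
void matmul_tiled(const double *A, const double *B, double *C) {
    memset(C, 0, N * N * sizeof(double));
    #pragma omp parallel for collapse(2)
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++) {
                        double a = A[i * N + k];
                        for (int j = jj; j < jj + TILE; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

Hoisting `A[i*N + k]` into a scalar and keeping the `j` loop innermost gives unit-stride access to both `B` and `C`, which is the main source of the tiled version's cache friendliness.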
```
cd src
gcc -O3 -fopenmp matrix_mul_omp.c -o matrix_omp
gcc -O3 -fopenmp matrix_mul_tiled.c -o matrix_tiled
```
```
cd src
./run_benchmark.sh
./run_strong_scaling.sh
```
The following table compares execution time between a naive OpenMP implementation and a cache-optimised tiled implementation. The reported speedup is calculated as the ratio of naive execution time to tiled execution time.
| Threads | Naive Time (s) | Tiled Time (s) | Speedup |
|---|---|---|---|
| 1 | 87.9724 | 10.1177 | 8.69 |
| 2 | 44.8283 | 5.5742 | 8.04 |
| 4 | 34.9294 | 3.4970 | 9.99 |
| 8 | 32.6481 | 2.1029 | 15.53 |
These results demonstrate that cache tiling significantly reduces memory access latency and improves data locality, leading to substantial performance gains. The benefits become increasingly pronounced as the number of threads increases.
Strong scaling evaluates how execution time decreases as the number of threads increases for a fixed problem size. The following table reports runtime, speedup, and parallel efficiency for the tiled OpenMP implementation.
| Threads | Time (s) | Speedup | Efficiency (%) |
|---|---|---|---|
| 1 | 8.0001 | 1.00 | 100.00 |
| 2 | 4.5733 | 1.75 | 87.47 |
| 4 | 2.8410 | 2.82 | 70.40 |
| 8 | 2.1627 | 3.70 | 46.24 |
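The speedup and efficiency columns follow directly from the measured times: speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p. A small sketch in C (the helper names are illustrative, not part of the project's sources):

```c
/* Strong-scaling metrics: S(p) = T(1) / T(p), E(p) = S(p) / p. */
static double speedup(double t1, double tp) {
    return t1 / tp;
}

/* Parallel efficiency expressed as a percentage. */
static double efficiency_pct(double t1, double tp, int p) {
    return 100.0 * speedup(t1, tp) / p;
}
```

For example, the 8-thread row gives S(8) = 8.0001 / 2.1627 ≈ 3.70 and E(8) ≈ 46.24%, matching the table.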
The results indicate diminishing parallel efficiency as thread count increases, primarily due to memory bandwidth saturation and synchronisation overheads. Nevertheless, the implementation exhibits good scalability up to moderate thread counts.
- Cache tiling significantly reduces execution time
- Near-linear speedup is observed at lower thread counts
- Parallel efficiency decreases as memory bandwidth becomes a bottleneck
- Performance gains are more prominent for larger matrix sizes
- C Programming Language
- OpenMP
- GCC Compiler
- Bash Scripting
- Linux / WSL Environment
Aman Bashir Sheikh
Email: aman.b.sheikh119@gmail.com
LinkedIn: www.linkedin.com/in/aman-sheikh-aa701b253