# Distributed parallel computing

Here I investigate the parallelisation of matrix relaxation over a distributed Azure cluster using MPI.

MPI is the de facto standard for scalable message passing between parallel programs. This project uses the OpenMPI implementation of this standard.

Matrix relaxation is sometimes also called the Jacobi method. In essence, a sliding window is passed over a matrix, replacing each cell with the average of its neighbours. This is an interesting problem to parallelise because each iteration depends on the previous one, and the value of each cell depends on its neighbours. It is especially interesting to parallelise over a distributed system because communication over a network is far more expensive than communication through shared memory: you must carefully minimise the data transferred between processes while maintaining correctness.
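To make the update rule concrete, here is a minimal sketch of one relaxation step, assuming a row-major n × n matrix with `prev` holding the previous iteration and `next` receiving the new values (these names are illustrative, not necessarily those used in `relaxation.c`):

```c
#include <stddef.h>

/* One Jacobi relaxation step: each interior cell becomes the average
 * of its four neighbours from the previous iteration; the boundary
 * cells stay fixed throughout. */
static void relax_step(const double *prev, double *next, size_t n) {
    for (size_t i = 1; i < n - 1; i++) {
        for (size_t j = 1; j < n - 1; j++) {
            next[i * n + j] = 0.25 * (prev[(i - 1) * n + j] +
                                      prev[(i + 1) * n + j] +
                                      prev[i * n + (j - 1)] +
                                      prev[i * n + (j + 1)]);
        }
    }
}
```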

I solve the same problem using threads on a shared-memory machine here.

## High-level design

Broadly speaking:

1. The root process splits a large matrix into chunks and sends them to worker processes, which could be on the same machine or on a different node.
2. After each iteration of relaxation, a check is performed to see whether the matrix has converged, or rather whether its difference from the previous iteration is sufficiently small (see the sketch after this list).
3. If it has not converged, each process communicates the boundary of its local problem to the processes holding neighbouring chunks.
4. Repeat steps 2-3.
5. Once the problem has converged, the matrix is carefully reconstructed by the root process.
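A minimal sketch of how the convergence check in step 2 can be done with a single collective call; `local_max_change` (the largest per-cell change a rank saw in its own chunk this iteration) and `precision` are assumed names, not necessarily those used in `relaxation.c`:

```c
#include <mpi.h>
#include <stdbool.h>

/* Global convergence test: every rank contributes the largest change
 * it observed in its own chunk, and MPI_Allreduce takes the maximum
 * across all ranks, so every process reaches the same verdict. */
static bool has_converged(double local_max_change, double precision) {
    double global_max_change = 0.0;
    MPI_Allreduce(&local_max_change, &global_max_change, 1,
                  MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
    return global_max_change < precision;
}
```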

## Asynchronous communications

This program carefully overlaps communications with computations in a bid to reduce the communication overhead.
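A sketch of what this overlap might look like with non-blocking point-to-point calls, assuming a 1D row decomposition with one halo row above and below each chunk (`up`/`down` are neighbour ranks, `MPI_PROC_NULL` at the edges; all names are illustrative):

```c
#include <mpi.h>
#include <stddef.h>

/* Exchange halo rows with the neighbouring ranks without blocking.
 * `chunk` has `local_rows` rows of `n` doubles; row 0 and row
 * local_rows-1 are the halo rows, rows 1 and local_rows-2 are the
 * outermost rows this rank owns. */
static void exchange_halos(double *chunk, size_t n, size_t local_rows,
                           int up, int down) {
    MPI_Request reqs[4];

    /* Receive the neighbours' boundary rows into our halo rows. */
    MPI_Irecv(&chunk[0], (int)n, MPI_DOUBLE, up, 0,
              MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&chunk[(local_rows - 1) * n], (int)n, MPI_DOUBLE, down, 1,
              MPI_COMM_WORLD, &reqs[1]);

    /* Send our outermost owned rows to the neighbours. */
    MPI_Isend(&chunk[n], (int)n, MPI_DOUBLE, up, 1,
              MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&chunk[(local_rows - 2) * n], (int)n, MPI_DOUBLE, down, 0,
              MPI_COMM_WORLD, &reqs[3]);

    /* In the full program, the interior rows (those not adjacent to
     * the halos) are relaxed here while the messages are in flight. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
}
```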

The report includes an alternate implementation which uses synchronous communications, along with a performance comparison between the asynchronous and synchronous versions. It also discusses different communication strategies and the reasoning behind my specific choice of strategy.

## Scalability investigation

The report includes an investigation of the scalability of this system, including graphs. It provides calculations of speedup and efficiency, as well as comments on Amdahl's law and Gustafson's law.
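For reference, these are the standard definitions, where T(p) is the runtime on p processes:

```
Speedup:    S(p) = T(1) / T(p)
Efficiency: E(p) = S(p) / p
```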

## Testing

The report also includes details on correctness testing.

## Running locally

1. Have OpenMPI installed:

   ```sh
   sudo apt install libopenmpi-dev
   ```

   or build from source: https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/quickstart.html

2. Compile with `mpicc`:

   ```sh
   mpicc relaxation.c -o relaxation
   ```

3. Run with `mpirun` via the helper script to spin up multiple processes locally (the equivalent direct invocation is shown below):

   ```sh
   # Usage: run.sh [num of nodes] [problem size] [precision]
   ./scripts/run.sh 4 20000 0.01
   ```
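If you'd rather not use the script, the direct invocation is roughly the following, assuming `relaxation` takes the problem size and precision as positional arguments (check `scripts/run.sh` for the exact argument order):

```sh
mpirun -np 4 ./relaxation 20000 0.01
```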

## Running on a cluster

This will look different depending on your cluster's architecture. This project was run on an Azure cluster using Slurm as the workload manager.

1. `ssh` into the head node and compile as before.

2. Dispatch with Slurm; the batch script will look something like this:

   ```sh
   #!/bin/bash
   #SBATCH --account=<your account>
   #SBATCH --partition=<your partition>
   #SBATCH --job-name=<your job name>
   #SBATCH --nodes=<number of nodes>
   #SBATCH --mail-type=END
   #SBATCH --mail-user=<your email>
   pwd
   mpirun ./relaxation  # mpirun picks up the Slurm allocation
   ```
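Submitting and monitoring then uses the standard Slurm commands (`job.sh` here is whatever you named the script above):

```sh
sbatch job.sh    # queue the job
squeue -u $USER  # check its status
```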

This project was tested on several node counts with a varying number of MPI processes on each node. If you're interested and would like more details on performance, take a look at the report.
