# Distributed parallel computing

Here I investigate the parallelisation of matrix relaxation over a distributed Azure cluster using MPI.

MPI is the de facto standard for scalable message passing between parallel programs. This project uses the OpenMPI implementation of this standard.

Matrix relaxation is sometimes also called the Jacobi method. In essence, a sliding window is passed over a matrix, replacing each cell with the average of its neighbours. This is an interesting problem to parallelise because each iteration depends on the previous one, and the value of each cell depends on its neighbours. It is especially interesting to parallelise over a distributed system because communication over a network is far more expensive than communication through shared memory: you must carefully minimise the data transferred between processes while maintaining correctness.
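To make the update rule concrete, here is a minimal sketch of one relaxation step, assuming a row-major n × n matrix with `prev` holding the previous iteration and `next` receiving the new values (these names are illustrative, not necessarily those used in `relaxation.c`):

```c
#include <stddef.h>

/* One Jacobi relaxation step: each interior cell becomes the average
 * of its four neighbours from the previous iteration; the boundary
 * cells stay fixed throughout. */
static void relax_step(const double *prev, double *next, size_t n) {
    for (size_t i = 1; i < n - 1; i++) {
        for (size_t j = 1; j < n - 1; j++) {
            next[i * n + j] = 0.25 * (prev[(i - 1) * n + j] +
                                      prev[(i + 1) * n + j] +
                                      prev[i * n + (j - 1)] +
                                      prev[i * n + (j + 1)]);
        }
    }
}
```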

I solve the same problem using threads on a shared-memory machine here.

## High-level design

Broadly speaking:

1. The root process splits a large matrix into chunks and sends them to worker processes, which could be on the same machine or on a different node.
2. After each iteration of relaxation, a check is performed to see whether the matrix has converged, or rather whether its difference from the previous iteration is sufficiently small (see the sketch after this list).
3. If it has not converged, each process communicates the boundary of its local problem to the processes holding neighbouring chunks.
4. Repeat steps 2-3.
5. Once the problem has converged, the matrix is carefully reconstructed by the root process.
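A minimal sketch of how the convergence check in step 2 can be done with a single collective call; `local_max_change` (the largest per-cell change a rank saw in its own chunk this iteration) and `precision` are assumed names, not necessarily those used in `relaxation.c`:

```c
#include <mpi.h>
#include <stdbool.h>

/* Global convergence test: every rank contributes the largest change
 * it observed in its own chunk, and MPI_Allreduce takes the maximum
 * across all ranks, so every process reaches the same verdict. */
static bool has_converged(double local_max_change, double precision) {
    double global_max_change = 0.0;
    MPI_Allreduce(&local_max_change, &global_max_change, 1,
                  MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
    return global_max_change < precision;
}
```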

## Asynchronous communications

This program carefully overlaps communications with computations in a bid to reduce the communication overhead.
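A sketch of what this overlap might look like with non-blocking point-to-point calls, assuming a 1D row decomposition with one halo row above and below each chunk (`up`/`down` are neighbour ranks, `MPI_PROC_NULL` at the edges; all names are illustrative):

```c
#include <mpi.h>
#include <stddef.h>

/* Exchange halo rows with the neighbouring ranks without blocking.
 * `chunk` has `local_rows` rows of `n` doubles; row 0 and row
 * local_rows-1 are the halo rows, rows 1 and local_rows-2 are the
 * outermost rows this rank owns. */
static void exchange_halos(double *chunk, size_t n, size_t local_rows,
                           int up, int down) {
    MPI_Request reqs[4];

    /* Receive the neighbours' boundary rows into our halo rows. */
    MPI_Irecv(&chunk[0], (int)n, MPI_DOUBLE, up, 0,
              MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&chunk[(local_rows - 1) * n], (int)n, MPI_DOUBLE, down, 1,
              MPI_COMM_WORLD, &reqs[1]);

    /* Send our outermost owned rows to the neighbours. */
    MPI_Isend(&chunk[n], (int)n, MPI_DOUBLE, up, 1,
              MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&chunk[(local_rows - 2) * n], (int)n, MPI_DOUBLE, down, 0,
              MPI_COMM_WORLD, &reqs[3]);

    /* In the full program, the interior rows (those not adjacent to
     * the halos) are relaxed here while the messages are in flight. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
}
```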

The report includes an alternate implementation which uses synchronous communications, along with a performance comparison between the asynchronous and synchronous versions. It also discusses different communication strategies and the reasoning behind my specific choice of strategy.

## Scalability investigation

The report includes an investigation of the scalability of this system, including graphs. It provides calculations of speedup and efficiency, as well as comments on Amdahl's law and Gustafson's law.
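For reference, these are the standard definitions, where T(p) is the runtime on p processes:

```
Speedup:    S(p) = T(1) / T(p)
Efficiency: E(p) = S(p) / p
```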

## Testing

The report also includes details on correctness testing.

## Running locally

1. Have OpenMPI installed:

   ```sh
   sudo apt install libopenmpi-dev
   ```

   or build from source: https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/quickstart.html

2. Compile with `mpicc`:

   ```sh
   mpicc relaxation.c -o relaxation
   ```

3. Run with `mpirun` via the helper script to spin up multiple processes locally (the equivalent direct invocation is shown below):

   ```sh
   # Usage: run.sh [num of nodes] [problem size] [precision]
   ./scripts/run.sh 4 20000 0.01
   ```
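If you'd rather not use the script, the direct invocation is roughly the following, assuming `relaxation` takes the problem size and precision as positional arguments (check `scripts/run.sh` for the exact argument order):

```sh
mpirun -np 4 ./relaxation 20000 0.01
```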

## Running on a cluster

This will look different depending on your cluster's architecture. This project was run on an Azure cluster using Slurm as the workload manager.

1. `ssh` into the head node and compile as before.

2. Dispatch with Slurm; the batch script will look something like this:

   ```sh
   #!/bin/bash
   #SBATCH --account=<your account>
   #SBATCH --partition=<your partition>
   #SBATCH --job-name=<your job name>
   #SBATCH --nodes=<number of nodes>
   #SBATCH --mail-type=END
   #SBATCH --mail-user=<your email>
   pwd
   mpirun ./relaxation  # mpirun picks up the Slurm allocation
   ```
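Submitting and monitoring then uses the standard Slurm commands (`job.sh` here is whatever you named the script above):

```sh
sbatch job.sh    # queue the job
squeue -u $USER  # check its status
```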

This project was tested on several node counts with a varying number of MPI processes on each node. If you're interested and would like more details on performance, take a look at the report.
