The current OpenMP implementation is not optimal: it gives roughly a 2x speedup on 8 threads, or about 25% parallel efficiency. OpenMP is applied in two places: building the interaction matrix A, and solving the linear system Ax = b.
The issue for building A is probably load balancing: the A/B VSH (vector spherical harmonic) translation coefficients involve recursion relations that depend on inter-particle separation, so the work per particle pair is uneven and some threads finish before others.
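If uneven work per pair is indeed the cause, one common remedy is to switch the loop from the default static schedule to a dynamic one. The following is a minimal sketch, not the project's actual code: `translation_block` and `build_interaction_matrix` are hypothetical names, the kernel is a stand-in whose loop length grows with separation, and the particles are assumed to sit on a line at distinct positions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical stand-in for one (i, j) block of A. The loop length grows with
// the separation, mimicking a recursion whose cost varies between pairs.
cplx translation_block(double separation, int lmax) {
    assert(separation > 0.0 && lmax > 0);
    const int steps = lmax * static_cast<int>(std::ceil(separation));
    const cplx phase = std::polar(1.0, 0.1);
    cplx acc(1.0, 0.0);
    for (int k = 0; k < steps; ++k) {
        acc = acc * phase + cplx(1.0 / (k + 1.0), 0.0);
    }
    return acc;
}

// Build an N x N matrix with one scalar per particle pair (row-major).
// Particles sit on a line at positions x, an assumption made for brevity.
std::vector<cplx> build_interaction_matrix(const std::vector<double>& x,
                                           int lmax, bool use_threads) {
    const int n = static_cast<int>(x.size());
    std::vector<cplx> A(static_cast<std::size_t>(n) * n, cplx(0.0, 0.0));

    // schedule(dynamic, 4): idle threads grab the next chunk of 4 pairs,
    // so a thread stuck on expensive pairs does not hold up the others.
    #pragma omp parallel for schedule(dynamic, 4) if(use_threads)
    for (int p = 0; p < n * n; ++p) {
        const int i = p / n;
        const int j = p % n;
        if (i == j) continue;  // diagonal left at zero in this sketch
        A[p] = translation_block(std::fabs(x[i] - x[j]), lmax);
    }
    return A;
}
```

Each iteration writes to its own element of `A`, so no synchronization is needed. The chunk size of 4 is arbitrary; whether dynamic scheduling actually helps, and with which chunk size, would need to be measured on the real kernel.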
The issue for solving Ax = b is less obvious. Since solving linear systems is a well-studied problem, it's worth looking into existing software solutions.
There are a few things that can easily be parallelized: source decomposition, cross-section evaluation, force/torque evaluation, and E/H field evaluation.
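These tasks are easy to parallelize because each output (a field point, a particle, a wavelength) can be computed independently. As a rough illustration for field evaluation, the sketch below sums a scalar outgoing spherical wave exp(ikr)/r over sources at each observation point. The scalar kernel is only a stand-in for the real vector field expansion, and `Point`, `scalar_wave`, and `evaluate_field` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

struct Point { double x, y, z; };

// Stand-in kernel (assumption): scalar outgoing spherical wave exp(ikr)/r.
std::complex<double> scalar_wave(const Point& src, const Point& obs, double k) {
    const double dx = obs.x - src.x;
    const double dy = obs.y - src.y;
    const double dz = obs.z - src.z;
    const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
    assert(r > 0.0);  // observation point must not coincide with a source
    return std::polar(1.0 / r, k * r);
}

// Evaluate the summed field at every observation point.
std::vector<std::complex<double>> evaluate_field(
        const std::vector<Point>& sources,
        const std::vector<Point>& points, double k) {
    const int n = static_cast<int>(points.size());
    std::vector<std::complex<double>> field(n);

    // Every point costs the same (one pass over all sources), so a static
    // schedule is enough; each thread writes only its own field[p].
    #pragma omp parallel for schedule(static)
    for (int p = 0; p < n; ++p) {
        std::complex<double> sum(0.0, 0.0);
        for (const Point& s : sources) {
            sum += scalar_wave(s, points[p], k);
        }
        field[p] = sum;
    }
    return field;
}
```

The same pattern (one parallel loop over independent outputs, no shared writes) would apply to cross-section or force/torque evaluation over particles.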
Lastly, there are two algorithmic optimizations not currently being used:
- Using the rotation-translation-rotation algorithm to construct the A matrix
- Using a solver for the linear system that exploits the structure of the physical problem, if one exists (see the Xu papers)