Improving OpenMP acceleration #6

The current OpenMP implementation scales poorly (about 2x speedup on 8 threads). There are two parts to the OpenMP acceleration: building the interaction matrix (A), and solving the linear system Ax = b.
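
As a rough way to read that number, here is a back-of-the-envelope estimate using Amdahl's law. It assumes the parallel part scales perfectly and ignores load imbalance and threading overhead, so it is only an indication, not a measurement. With speedup $S$ on $N$ threads and effective parallel fraction $p$:

$$
S = \frac{1}{(1 - p) + p/N}, \qquad
S = 2,\ N = 8 \;\Rightarrow\; 1 - \tfrac{7}{8}p = \tfrac{1}{2} \;\Rightarrow\; p = \tfrac{4}{7} \approx 0.57
$$

Under that assumption the speedup would saturate near $1/(1 - p) = 7/3 \approx 2.3$ no matter how many threads are added, which suggests that a large share of the runtime is effectively serial or idle.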

The issue for building A is probably load balancing: the A/B VSH translation coefficients involve recursion relations that depend on inter-particle separation, so the work per particle pair is likely uneven and some threads will finish before others.
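
One possible way to test the load-balancing hypothesis is to switch the pair loop to a dynamic schedule. The sketch below is not the project's code: `pair_coefficient` and `build_matrix` are hypothetical names, positions are 1D for brevity, and the inner loop is only a stand-in whose cost grows with separation to mimic an uneven recursion.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for one translation-coefficient evaluation.
// The loop length grows with the separation to mimic uneven per-pair cost;
// it is NOT the actual A/B recursion.
double pair_coefficient(double separation) {
    const int steps = 10 + static_cast<int>(separation);
    double value = 0.0;
    for (int k = 1; k <= steps; ++k) {
        value += 1.0 / (static_cast<double>(k) * k);
    }
    return value;
}

// Fill the off-diagonal entries of an n x n row-major matrix,
// one entry per ordered particle pair (i, j).
std::vector<double> build_matrix(const std::vector<double>& x) {
    const long n = static_cast<long>(x.size());
    std::vector<double> A(n * n, 0.0);

    // Flatten (i, j) into a single index so the scheduler sees n*n small
    // tasks; schedule(dynamic) hands the next iteration to whichever
    // thread becomes free instead of pre-assigning equal-sized chunks.
    #pragma omp parallel for schedule(dynamic)
    for (long idx = 0; idx < n * n; ++idx) {
        const long i = idx / n;
        const long j = idx % n;
        if (i == j) continue;  // diagonal left at zero in this sketch
        A[idx] = pair_coefficient(std::fabs(x[i] - x[j]));
    }
    return A;
}
```

Each iteration writes a distinct element of `A`, so no synchronization is needed. Built with an OpenMP flag (for example `-fopenmp` on GCC or Clang) the loop runs in parallel; without it the pragma is ignored and the loop runs serially with the same result. Comparing timings against a static schedule would show whether imbalance is really the bottleneck.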

The issue for solving Ax = b is less obvious. Since this is a well-studied problem, it's worth looking into existing software solutions.

There are also a few things that can easily be parallelized: source decomposition, cross-section evaluation, force/torque evaluation, and E/H field evaluation.
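
Several of these evaluations boil down to independent per-term work followed by a sum, which maps onto an OpenMP reduction. The sketch below is a generic illustration under that assumption: `sum_squared_magnitudes` is a hypothetical helper that sums squared magnitudes of complex coefficients, not the project's actual cross-section formula.

```cpp
#include <complex>
#include <vector>

// Sum |c_i|^2 over a list of complex coefficients.
// Each thread accumulates a private partial sum; OpenMP combines the
// partial sums at the end of the loop (reduction clause).
double sum_squared_magnitudes(const std::vector<std::complex<double>>& coeffs) {
    double total = 0.0;
    const long n = static_cast<long>(coeffs.size());

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += std::norm(coeffs[i]);  // std::norm returns the squared magnitude
    }
    return total;
}
```

Because the per-term cost here is uniform, the default schedule should be adequate. Note that a parallel reduction can change the order of floating-point additions, so results may differ from the serial sum in the last digits.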

Lastly, there are two algorithmic optimizations that are not currently used:
1. Using the rotation-translation-rotation algorithm to construct the A matrix.
2. There might exist an optimal solver for the linear system based on the physical problem; see the Xu papers.
