This PR adds an interface to QUDA's gauge flow routine. This is an intermediate update for MILC gauge flow before a planned future update that will refactor the gauge flow so that it can be called as a function from other applications rather than only operating as a standalone application.
Recent updates to QUDA (lattice/quda#1555) added a fourth-order integrator, fixed a bug in the topological charge calculation, added the rectangle observable, and added support for anisotropic flow and smearing. With those updates, MILC can now usefully offload gauge flow to QUDA.
Caveats
Usage
The following MILC targets are supported:
- wilson_flow: Wilson or Symanzik flow using third-order Runge-Kutta integrator
- wilson_flow_bbb: Wilson or Symanzik flow using fourth-order Runge-Kutta integrator
- wilson_flow_a: Anisotropic Wilson or Symanzik flow using third-order Runge-Kutta integrator
- wilson_flow_bbb_a: Anisotropic Wilson or Symanzik flow using fourth-order Runge-Kutta integrator
Simply compile with WANTQUDA=true and provide QUDA, QIO, and QMP paths in the usual way.
Performance
0.09 fm ($64^3 \times 96$):
On one Big Red 200 CPU node (128 cores), 40 steps of Wilson flow took 1234 s and 40 steps of Symanzik flow took 3993 s in a single benchmarking run. On one DeltaAI GH200 GPU (a single GPU, not a full node), the same flows took 15.6 s and 45.9 s, respectively. In a rough sense, this means that 1 GH200 $\sim$ 10125 CPU cores for Wilson flow and 1 GH200 $\sim$ 11135 CPU cores for Symanzik flow. This is an ideal case for the GPU because the 0.09 fm lattice fits on a single GH200, so no inter-GPU or inter-node communication is needed, and it almost completely fills the GPU. The QUDA test used a previously generated tunecache.
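The core-equivalent figures above follow from simple arithmetic on the quoted timings. A minimal sketch, assuming perfect linear scaling across CPU cores (the helper name `cpu_core_equivalent` is hypothetical and used only for illustration):

```python
def cpu_core_equivalent(cpu_seconds, cpu_cores, gpu_seconds, gpus=1):
    """CPU cores needed to match one GPU's throughput.

    Assumes the CPU run scales linearly with core count, which is an
    idealization; it is only meant to reproduce the rough comparison.
    """
    return cpu_cores * cpu_seconds / (gpus * gpu_seconds)


# Timings quoted above: one 128-core Big Red 200 node vs. one GH200.
wilson = cpu_core_equivalent(cpu_seconds=1234, cpu_cores=128, gpu_seconds=15.6)
symanzik = cpu_core_equivalent(cpu_seconds=3993, cpu_cores=128, gpu_seconds=45.9)

print(f"Wilson:   1 GH200 ~ {wilson:.0f} CPU cores")
print(f"Symanzik: 1 GH200 ~ {symanzik:.0f} CPU cores")
```

Because the inputs come from a single benchmarking run, the results are best read as order-of-magnitude estimates.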
0.04 fm ($144^3 \times 288$):
This is a much larger lattice, so even the GPU runs cannot avoid inter-node communication. GPU flow benchmarks on Perlmutter are compared with Perlmutter CPU production runs from 2022. In 2022, 720 steps of Wilson flow on 64 CPU nodes took 2.97 hours, whereas the same calculation on 32 GPU nodes now takes 10.6 minutes. In this comparison, 1 A100 $\sim$ 1080 CPU cores. For Symanzik flow, the CPU run used 128 nodes and 720 flow steps took 4.69 hours for one particular configuration; the same calculation on 32 GPU nodes takes only 15.2 minutes. In this comparison, 1 A100 $\sim$ 2360 CPU cores.
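The multi-node comparison can be checked the same way. The sketch below assumes 128 cores per Perlmutter CPU node and 4 A100 GPUs per Perlmutter GPU node; neither number is stated in this PR, so treat both as assumptions. The function name `cores_per_gpu` is likewise hypothetical.

```python
# Assumed node layouts (not stated in the PR text).
CORES_PER_CPU_NODE = 128
GPUS_PER_GPU_NODE = 4


def cores_per_gpu(cpu_nodes, cpu_hours, gpu_nodes, gpu_minutes):
    """Ratio of CPU core-seconds to GPU-seconds for the same workload."""
    cpu_core_seconds = cpu_nodes * CORES_PER_CPU_NODE * cpu_hours * 3600
    gpu_seconds = gpu_nodes * GPUS_PER_GPU_NODE * gpu_minutes * 60
    return cpu_core_seconds / gpu_seconds


# 720 flow steps on the 0.04 fm lattice, timings as quoted above.
wilson = cores_per_gpu(cpu_nodes=64, cpu_hours=2.97, gpu_nodes=32, gpu_minutes=10.6)
symanzik = cores_per_gpu(cpu_nodes=128, cpu_hours=4.69, gpu_nodes=32, gpu_minutes=15.2)

print(f"Wilson:   1 A100 ~ {wilson:.0f} CPU cores")
print(f"Symanzik: 1 A100 ~ {symanzik:.0f} CPU cores")
```

Under these assumptions the ratios land close to the quoted values; small differences are expected from rounding in the reported wall-clock times.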