Enable parallel memset/memcpy in `clear_x_field`/`copy_x_field`. by SaltyChiang · Pull Request #94 · milc-qcd/milc_qcd

SaltyChiang · 2026-03-03T06:36:02Z

The `clear_x_field`/`copy_x_field` functions perform their memset/memcpy from a single thread, which cannot use the full memory bandwidth of modern CPUs. In addition, the RHMC/RHMD algorithm contains several memset calls that clear memory for gauge fields. Together, these account for most of the overall CPU overhead.

This PR uses parallel memcpy and memset calls in `clear_x_field` and `copy_x_field`, and replaces the memset calls that clear a gauge field with `clear_m_array_field`. In general, this cuts the CPU overhead by more than half.
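As a rough illustration of the idea (not the code in this PR), a parallel clear/copy can split the buffer into one contiguous chunk per OpenMP thread and let each thread call memset/memcpy on its own chunk. The names `chunk_range`, `parallel_clear`, and `parallel_copy` are hypothetical and do not come from MILC; the sketch assumes the source and destination buffers do not overlap, and it falls back to a plain serial call when built without OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: byte range [*begin, *begin + *len) handled by
   thread `tid` when `bytes` bytes are split across `nthr` threads. */
static void chunk_range(size_t bytes, size_t nthr, size_t tid,
                        size_t *begin, size_t *len)
{
  assert(nthr > 0 && tid < nthr);
  size_t chunk = (bytes + nthr - 1) / nthr; /* ceiling division */
  *begin = tid * chunk;
  if (*begin >= bytes) {
    *begin = bytes; /* trailing threads may get an empty range */
    *len = 0;
  } else {
    *len = (bytes - *begin < chunk) ? bytes - *begin : chunk;
  }
}

/* Hypothetical: zero `bytes` bytes at `dst`, one chunk per thread. */
static void parallel_clear(void *dst, size_t bytes)
{
#ifdef _OPENMP
#pragma omp parallel
  {
    size_t begin, len;
    chunk_range(bytes, (size_t)omp_get_num_threads(),
                (size_t)omp_get_thread_num(), &begin, &len);
    memset((char *)dst + begin, 0, len);
  }
#else
  memset(dst, 0, bytes); /* serial fallback */
#endif
}

/* Hypothetical: copy `bytes` bytes from `src` to `dst` (no overlap). */
static void parallel_copy(void *dst, const void *src, size_t bytes)
{
#ifdef _OPENMP
#pragma omp parallel
  {
    size_t begin, len;
    chunk_range(bytes, (size_t)omp_get_num_threads(),
                (size_t)omp_get_thread_num(), &begin, &len);
    memcpy((char *)dst + begin, (const char *)src + begin, len);
  }
#else
  memcpy(dst, src, bytes); /* serial fallback */
#endif
}
```

Contiguous per-thread chunks keep each thread streaming through its own region of memory. Whether this saturates the memory bandwidth will depend on the thread count and NUMA placement of the field.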

Also, `scalar_mult_fn` + `add_fn` can be fused into a single `scalar_mult_add_fn` call, which is better because the fused call makes only 3 passes over field memory (2 reads, 1 write) instead of 5 (3 reads, 2 writes).
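The pass count can be seen in a simplified sketch that uses plain `double` arrays in place of MILC field types. The functions `scale_then_add` and `scale_add_fused` are hypothetical stand-ins, not the actual `scalar_mult_fn`, `add_fn`, or `scalar_mult_add_fn` signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-step version: tmp = s * b, then dest = a + tmp.
   Field passes: read b, write tmp, read a, read tmp, write dest = 5. */
static void scale_then_add(const double *a, const double *b, double s,
                           double *tmp, double *dest, size_t n)
{
  assert(a && b && tmp && dest);
  for (size_t i = 0; i < n; i++)
    tmp[i] = s * b[i];
  for (size_t i = 0; i < n; i++)
    dest[i] = a[i] + tmp[i];
}

/* Hypothetical fused version: dest = a + s * b in one loop.
   Field passes: read a, read b, write dest = 3, and no temporary. */
static void scale_add_fused(const double *a, const double *b, double s,
                            double *dest, size_t n)
{
  assert(a && b && dest);
  for (size_t i = 0; i < n; i++)
    dest[i] = a[i] + s * b[i];
}
```

For fields much larger than the cache, the run time of both versions is likely dominated by memory traffic, so the fused loop would be expected to approach a 5:3 advantage. That ratio is an estimate under this assumption, not a measurement from this PR.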

SaltyChiang added 2 commits March 3, 2026 00:26

Enable parallel memset in clear_x_field and parallel memcpy in `copy_x_field`.

2e17c52

Merge branch 'develop' into feature/parallel-clear-copy-field

019c05b

SaltyChiang wants to merge 2 commits into milc-qcd:develop from SaltyChiang:feature/parallel-clear-copy-field
