Enable parallel memset/memcpy in clear_x_field/copy_x_field. #94

Open
SaltyChiang wants to merge 2 commits into milc-qcd:develop from SaltyChiang:feature/parallel-clear-copy-field
@SaltyChiang
Collaborator

The clear_x_field/copy_x_field functions run their memcpy/memset on a single thread, which cannot saturate the memory bandwidth of modern CPUs. In addition, the RHMC/RHMD algorithm uses several memset calls to clear memory for gauge fields. Together these account for most of the overall CPU overhead.

This PR uses parallel memcpy and memset calls in the clear_x_field and copy_x_field functions, and replaces the memset calls that clear a gauge field with clear_m_array_field. In general, this cuts the CPU overhead by more than half.
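For readers unfamiliar with the idea, here is a minimal sketch of a chunked parallel memset/memcpy using OpenMP. It is not the code in this PR: the names `parallel_memset_zero`, `parallel_memcpy`, `chunk_len`, and the `CHUNK_BYTES` value are hypothetical, and the chunk size is an untuned placeholder. Each loop iteration calls the ordinary libc routine on a disjoint chunk, so several threads can stream to memory at once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size (1 MiB); an untuned placeholder, not a MILC setting. */
#define CHUNK_BYTES ((size_t)1 << 20)

/* Length of the chunk starting at byte offset `off` in a buffer of `bytes` bytes. */
static size_t chunk_len(size_t bytes, size_t off)
{
    size_t rest = bytes - off;
    return rest < CHUNK_BYTES ? rest : CHUNK_BYTES;
}

/* Zero `bytes` bytes at `dst`, one memset per chunk, chunks shared among threads. */
void parallel_memset_zero(void *dst, size_t bytes)
{
    char *d = (char *)dst;
    long nchunks = (long)((bytes + CHUNK_BYTES - 1) / CHUNK_BYTES);
    long i;
#pragma omp parallel for schedule(static)
    for (i = 0; i < nchunks; i++) {
        size_t off = (size_t)i * CHUNK_BYTES;
        memset(d + off, 0, chunk_len(bytes, off));
    }
}

/* Copy `bytes` bytes from `src` to `dst`; the two buffers must not overlap. */
void parallel_memcpy(void *dst, const void *src, size_t bytes)
{
    char *d = (char *)dst;
    const char *s = (const char *)src;
    long nchunks = (long)((bytes + CHUNK_BYTES - 1) / CHUNK_BYTES);
    long i;
#pragma omp parallel for schedule(static)
    for (i = 0; i < nchunks; i++) {
        size_t off = (size_t)i * CHUNK_BYTES;
        memcpy(d + off, s + off, chunk_len(bytes, off));
    }
}
```

Built without OpenMP support, the pragmas are ignored and both functions fall back to a serial loop over chunks, so the result is the same either way.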

Also, a scalar_mult_fn call followed by an add_fn call can be fused into a single scalar_mult_add_fn call, which is better because it reads and writes the field only 3 times in total instead of 5.
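To make the 3-versus-5 count concrete, the sketch below works on flat `double` arrays rather than MILC's link structures. The function names `scalar_mult`, `add`, and `scalar_mult_add` and their argument order are hypothetical stand-ins, not the MILC signatures. The unfused path touches memory five times per element (read `b`, write `tmp`, read `a`, read `tmp`, write `c`), while the fused path touches it three times (read `a`, read `b`, write `c`).

```c
#include <stddef.h>

/* Pass 1 of the unfused path: tmp[i] = s * b[i]  (1 read, 1 write). */
void scalar_mult(const double *b, double s, double *tmp, size_t n)
{
    long i;
#pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++)
        tmp[i] = s * b[i];
}

/* Pass 2 of the unfused path: c[i] = a[i] + tmp[i]  (2 reads, 1 write). */
void add(const double *a, const double *tmp, double *c, size_t n)
{
    long i;
#pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++)
        c[i] = a[i] + tmp[i];
}

/* Fused path: c[i] = a[i] + s * b[i] in one pass  (2 reads, 1 write). */
void scalar_mult_add(const double *a, const double *b, double s,
                     double *c, size_t n)
{
    long i;
#pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++)
        c[i] = a[i] + s * b[i];
}
```

Since these loops are limited by memory bandwidth rather than arithmetic, removing the intermediate array is where the saving comes from.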
