Algorithms should be designed to fit data into PMem blocks (256 bytes) rather than single cache lines (64 bytes)
Use streaming (non-temporal) stores, or regular stores followed by clwb, especially for data that is repeatedly written to the same cache line (e.g. array-like structures with a size field, or a global counter used for time-stamping)
Using too many threads can lead to reduced performance
PMem read and write bandwidth is lower than that of DRAM. Prefer DRAM for performance-critical code.
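The first two guidelines can be illustrated with a minimal C sketch. It is not taken from this repository: the names pmem_node, PMEM_BLOCK and block_store are hypothetical, the 31-key layout is only one way to fill a 256-byte block, and an ordinary DRAM buffer stands in for a mapped PMem region. On x86-64 it writes the block with SSE2 non-temporal stores followed by a store fence; on other targets it falls back to memcpy. The alternative from the guideline (regular stores followed by clwb) would use the _mm_clwb intrinsic from immintrin.h, which needs compiler and CPU support for CLWB and is therefore left out of the sketch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2: _mm_stream_si128, _mm_loadu_si128, _mm_sfence */
#define HAVE_NT_STORES 1
#endif

#define PMEM_BLOCK 256 /* PMem block size named in the guideline */

/* Hypothetical record that occupies exactly one 256-byte block:
 * an 8-byte size field plus 31 8-byte keys, aligned to the block size. */
typedef struct {
    alignas(PMEM_BLOCK) uint64_t count;
    uint64_t keys[31];
} pmem_node;

static_assert(sizeof(pmem_node) == PMEM_BLOCK,
              "pmem_node must fill exactly one PMem block");

/* Write one full block to dst. dst must be aligned to PMEM_BLOCK
 * (non-temporal 128-bit stores require at least 16-byte alignment). */
static void block_store(void *dst, const void *src)
{
#ifdef HAVE_NT_STORES
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < PMEM_BLOCK / sizeof(__m128i); i++) {
        /* Streaming store: bypasses the cache instead of dirtying a line. */
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
    }
    _mm_sfence(); /* order the streaming stores before later stores */
#else
    memcpy(dst, src, PMEM_BLOCK); /* portable fallback, no cache bypass */
#endif
}
```

Building the whole node in DRAM and then writing it out in one call means the size field and the keys reach the target as a single block-sized write, rather than as many small updates to the same cache line.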
Guidelines for effective usage of PMem