hpx::parallel::traits::vector_pack_load::unaligned does not perform an unaligned load. For full SIMD packs it broadcasts a single scalar across all lanes. As a result, both the load and the store silently produce wrong values in any datapar algorithm that calls them on non-uniform input.
Example:
The single int returned by *iter is replicated across all 8 lanes of the SIMD pack, giving {5, 5, 5, 5, 5, 5, 5, 5}, instead of the 8 consecutive ints starting at that address, {5, 7, 2, 9, 1, 4, 8, 3}.
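The difference between the two results can be reproduced without any SIMD library. The following is a minimal sketch, not HPX code: `pack8`, `broadcast_load`, and `elementwise_load` are hypothetical names introduced for illustration, and `pack8` only imitates a pack type that is implicitly constructible from a scalar by broadcasting.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 8-lane pack, for illustration only (not an HPX or EVE type).
struct pack8
{
    std::array<int, 8> lanes{};

    pack8() = default;

    // Broadcasting constructor: the scalar is copied into every lane.
    // It is deliberately implicit, which is what lets `return *iter;` compile.
    pack8(int scalar)
    {
        lanes.fill(scalar);
    }
};

// Mirrors the reported behaviour: *iter yields one scalar, which is
// implicitly converted to a pack by broadcasting.
template <typename Iter>
pack8 broadcast_load(Iter& iter)
{
    return *iter;
}

// What a load is expected to do: read 8 consecutive elements,
// one per lane, starting at the iterator position.
template <typename Iter>
pack8 elementwise_load(Iter& iter)
{
    pack8 result;
    Iter it = iter;    // copy, so the caller's iterator is not advanced
    for (std::size_t i = 0; i != result.lanes.size(); ++i, ++it)
        result.lanes[i] = *it;
    return result;
}
```

With an iterator pointing at {5, 7, 2, 9, 1, 4, 8, 3}, `broadcast_load` fills every lane with 5, while `elementwise_load` returns the eight values unchanged.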
The same issue exists in the EVE backend:
File: libs/core/execution/include/hpx/execution/traits/detail/eve/vector_pack_load_store.hpp
Lines 37–40 (load):
    template <typename Iter>
    HPX_HOST_DEVICE HPX_FORCEINLINE static V unaligned(Iter& iter)
    {
        return *iter;
    }
Lines 56–61 (store):
    template <typename Iter>
    HPX_HOST_DEVICE HPX_FORCEINLINE static void unaligned(V& value, Iter& iter)
    {
        *iter = value;
    }