NEC gather inefficiency #313

@raver119

Here's sample code.
https://gist.github.com/raver119/3988237c9bb2376b0cd745120c5bc38e

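The gather and load functions do exactly the same work, but because their inner loops are written slightly differently, the compiler emits gather instructions for one and plain load instructions for the other. On Aurora the gather variant is about 3.5x slower, while on x86 the two run in about the same time.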

Output on Aurora:

/opt/nec/ve/bin/nc++ -O3 -fopenmp bug_gather.cpp
Time gather: [210 us]; Time load: [60 us]

Output on x86:

g++ -O3 -mmmx -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mfma -mf16c -mprefetchwt1 -fopenmp bug_gather.cpp
Time gather: [209 us]; Time load: [215 us]
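
For readers without access to the gist, here is a minimal sketch of the kind of loop pair that can trigger this. It is not the gist's code: the function names `copy_indexed`, `copy_contiguous`, and `outputs_match` are hypothetical, and the assumption is that the first loop reads through an index array (which a vectorizer would typically turn into a gather) while the second reads the source directly (a plain vector load). With an identity index array both produce the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "gather" variant: the address of each element depends on idx[i],
// so a vectorizing compiler generally cannot prove the accesses are contiguous.
void copy_indexed(const float* src, const std::size_t* idx, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[idx[i]];
    }
}

// Hypothetical "load" variant: unit-stride access that maps to plain vector loads.
void copy_contiguous(const float* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Runs both variants on the same input with an identity index array
// and reports whether the outputs are identical.
bool outputs_match(std::size_t n) {
    assert(n > 0 && "need at least one element");
    std::vector<float> src(n), a(n, 0.0f), b(n, 0.0f);
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = static_cast<float>(i);
        idx[i] = i;  // identity mapping: logically the same as src[i]
    }
    copy_indexed(src.data(), idx.data(), a.data(), n);
    copy_contiguous(src.data(), b.data(), n);
    return a == b;
}
```

Which instructions a given compiler actually emits for each loop is not guaranteed by this sketch; checking the generated assembly is the only way to confirm it.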
