
Improve hwloc #2

Draft: panau161 wants to merge 3 commits into master from improve_hwloc

panau161 (Owner) commented Dec 18, 2020

[draft]

Taking into account the discussion that followed my comment in Issue 810, I propose an improved hwloc-based cache detection mechanism. With Hyper-Threading enabled, the size of the cache used as local memory is now divided by the number of PUs (processing units, i.e. hardware threads) below it. The proposed code also covers the edge cases where all or none of the detected cache levels are shared.

PoCL currently uses L3/L2 for global_mem_cache_size/local_mem_size. As identified by Oblomov, Intel uses L2/L1, while AMD (fglrx) uses L1/L1. I therefore included [commit-hash] to change PoCL's strategy from L3/L2 to L2/L1 (without having evaluated the performance implications). Please let me know if you prefer L3/L2 over L2/L1, and I will remove that commit from this PR. The table below shows the resulting values for two example CPUs, with Hyper-Threading enabled and disabled, under both strategies.

| CPU (#Cores/#Threads) | L1* (I+D) | L2* | L3** | Strategy | global_mem_cache_size | local_mem_size | Tested |
|---|---|---|---|---|---|---|---|
| i5-7200U (2C/4T) | 32K+32K | 256K | 3M | L3/L2 | 3M | 128K (=L2/2) | yes |
| i5-7200U (2C/2T) | 32K+32K | 256K | 3M | L3/L2 | 3M | 256K | |
| i5-7200U (2C/4T) | 32K+32K | 256K | 3M | L2/L1 | 256K | 16K (=L1d/2) | yes |
| i5-7200U (2C/2T) | 32K+32K | 256K | 3M | L2/L1 | 256K | 32K | |
| Ryzen 5 3600 (6C/12T) | 32K+32K | 512K | 32M | L3/L2 | 32M | 256K (=L2/2) | |
| Ryzen 5 3600 (6C/6T) | 32K+32K | 512K | 32M | L3/L2 | 32M | 512K | yes |
| Ryzen 5 3600 (6C/12T) | 32K+32K | 512K | 32M | L2/L1 | 512K | 16K (=L1d/2) | |
| Ryzen 5 3600 (6C/6T) | 32K+32K | 512K | 32M | L2/L1 | 512K | 32K | yes |

hwloc cache info available for Ryzen 5 3600?

v1.11.9 v2.1.0
yes tested
no tested
| | HT disabled | HT enabled |
|---|---|---|
| i5-7200U | | tested |
| Ryzen 5 3600 | tested | |

This code sets global_mem_cache_size equal to the size of the last-level
cache (typically L3 on current CPUs) and local_mem_size equal to the size
of the last cache private to a single core (typically L2 on current CPUs),
divided by the number of hardware threads sharing that cache. Move the
specification of local_mem_size from common.c to pocl_topology.c.
isuruf pushed a commit that referenced this pull request Feb 9, 2022
The segmentation fault can be observed with llvm-10, llvm-11 and llvm-12
and seems to be fixed in llvm-13. It always happens on the 32-bit
architectures armhf and armel, and sporadically on x86_64.
The test segfaults only on the first run (i.e. when the kernel is not yet
in pocl's kernel cache), while it passes on subsequent executions (with
something already in the kernel cache), emitting only some llvm
diagnostics:

inlinable function call in a function with debug info must have a !dbg location
  %11 = call i32 @_Z12get_local_idj(i32 0)
inlinable function call in a function with debug info must have a !dbg location
  %19 = call i32 @_Z12get_local_idj(i32 1)
inlinable function call in a function with debug info must have a !dbg location
  %27 = call i32 @_Z12get_local_idj(i32 2)

The backtrace of the segmentation fault as observed with llvm-10 and pocl 1.6:
 #0  getEmissionKind () at .../llvm/include/llvm/IR/DebugInfoMetadata.h:1244
 #1  initialize () at .../llvm/lib/CodeGen/LexicalScopes.cpp:53
 #2  0xb14102f0 in computeIntervals () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:979
 #3  runOnMachineFunction () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:996
 #4  runOnMachineFunction () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:1023
 #5  0xb14856c8 in runOnFunction () at .../llvm/lib/CodeGen/MachineFunctionPass.cpp:73
 #6  0xb12ff494 in runOnFunction () at .../llvm/lib/IR/LegacyPassManager.cpp:1481
 #7  0xb12ff750 in runOnModule () at .../llvm/lib/IR/LegacyPassManager.cpp:1517
 #8  0xb12ffba8 in runOnModule () at .../llvm/lib/IR/LegacyPassManager.cpp:1582
 #9  run () at .../llvm/lib/IR/LegacyPassManager.cpp:1694
 #10 0xb6e64c82 in pocl_llvm_codegen (Device=Device@entry=0xdb0010, Modp=0x1361838, Output=Output@entry=0xbefde86c, OutputSize=OutputSize@entry=0xbefde880) at ./lib/CL/pocl_llvm_wg.cc:624
 #11 0xb6e291de in llvm_codegen (output=output@entry=0xdeb898 "...BMDHA/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xbefe0240,
     device=0xdb0010, command=command@entry=0xbefe0278, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
 #12 0xb6e2ae44 in pocl_check_kernel_disk_cache (command=command@entry=0xbefe0278, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
 #13 0xb6e2b262 in pocl_check_kernel_dlhandle_cache (command=0xbefe0278, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
 #14 0xb6e033d4 in program_compile_dynamic_wg_binaries (program=program@entry=0xd8ab88) at ./lib/CL/pocl_build.c:179
 #15 0xb6e13f20 in get_binary_sizes (sizes=0xbefe0384, program=0xd8ab88) at ./lib/CL/clGetProgramInfo.c:36
 #16 POclGetProgramInfo (program=0xd8ab88, param_name=4453, param_value_size=128, param_value=0xbefe0384, param_value_size_ret=0xbefe0380) at ./lib/CL/clGetProgramInfo.c:115
 #17 0x00473070 in main () at 975931.c:238

pocl#889
https://bugs.debian.org/975931