Draft
This code sets global_mem_cache_size to the size of the last-level cache (typically L3 on current CPUs) and local_mem_size to the size of the last cache private to a core (typically L2 on current CPUs), divided by the number of hardware threads sharing that cache. Move the specification of local_mem_size from common.c to pocl_topology.c.
isuruf pushed a commit that referenced this pull request on Feb 9, 2022.
The segmentation fault can be observed with llvm-10, llvm-11 and llvm-12 and seems to be fixed in llvm-13. It always happens on the 32-bit architectures armhf and armel, and sporadically on x86_64. The test segfaults only on the first run (i.e. when the kernel is not yet in pocl's kernel cache); on subsequent runs (with the kernel already in the cache) it passes, emitting only some LLVM diagnostics:
inlinable function call in a function with debug info must have a !dbg location
  %11 = call i32 @_Z12get_local_idj(i32 0)
inlinable function call in a function with debug info must have a !dbg location
  %19 = call i32 @_Z12get_local_idj(i32 1)
inlinable function call in a function with debug info must have a !dbg location
  %27 = call i32 @_Z12get_local_idj(i32 2)
The backtrace of the segmentation fault as observed with llvm-10 and pocl 1.6:
#0  getEmissionKind () at .../llvm/include/llvm/IR/DebugInfoMetadata.h:1244
#1  initialize () at .../llvm/lib/CodeGen/LexicalScopes.cpp:53
#2  0xb14102f0 in computeIntervals () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:979
#3  runOnMachineFunction () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:996
#4  runOnMachineFunction () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:1023
#5  0xb14856c8 in runOnFunction () at .../llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6  0xb12ff494 in runOnFunction () at .../llvm/lib/IR/LegacyPassManager.cpp:1481
#7  0xb12ff750 in runOnModule () at .../llvm/lib/IR/LegacyPassManager.cpp:1517
#8  0xb12ffba8 in runOnModule () at .../llvm/lib/IR/LegacyPassManager.cpp:1582
#9  run () at .../llvm/lib/IR/LegacyPassManager.cpp:1694
#10 0xb6e64c82 in pocl_llvm_codegen (Device=Device@entry=0xdb0010, Modp=0x1361838, Output=Output@entry=0xbefde86c, OutputSize=OutputSize@entry=0xbefde880) at ./lib/CL/pocl_llvm_wg.cc:624
#11 0xb6e291de in llvm_codegen (output=output@entry=0xdeb898 "...BMDHA/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xbefe0240, device=0xdb0010, command=command@entry=0xbefe0278, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
#12 0xb6e2ae44 in pocl_check_kernel_disk_cache (command=command@entry=0xbefe0278, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
#13 0xb6e2b262 in pocl_check_kernel_dlhandle_cache (command=0xbefe0278, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
#14 0xb6e033d4 in program_compile_dynamic_wg_binaries (program=program@entry=0xd8ab88) at ./lib/CL/pocl_build.c:179
#15 0xb6e13f20 in get_binary_sizes (sizes=0xbefe0384, program=0xd8ab88) at ./lib/CL/clGetProgramInfo.c:36
#16 POclGetProgramInfo (program=0xd8ab88, param_name=4453, param_value_size=128, param_value=0xbefe0384, param_value_size_ret=0xbefe0380) at ./lib/CL/clGetProgramInfo.c:115
#17 0x00473070 in main () at 975931.c:238
pocl#889
https://bugs.debian.org/975931
Taking into account the discussion following my comment in Issue 810, I propose an improved hwloc detection mechanism. With Hyper-Threading enabled, the size of the cache used as local memory is now divided by the number of PUs (processing units, i.e. hardware threads) below it. The proposed code also covers the edge cases where all or none of the detected cache levels are shared.
PoCL currently uses L3/L2 for global_mem_cache_size/local_mem_size. As identified by Oblomov, Intel uses L2/L1, while AMD (fglrx) uses L1/L1. Hence, I included [commit-hash] to change PoCL's strategy from L3/L2 to L2/L1 (without assessing the performance implications). Please let me know if you prefer L3/L2 over L2/L1, and I will remove that commit from this PR. The table below shows the resulting values for two example CPUs, with Hyper-Threading enabled and disabled, under both proposed strategies.
Is hwloc cache info available for the Ryzen 5 3600?