Improve hwloc by panau161 · Pull Request #2 · panau161/pocl

panau161 · 2020-12-18T05:58:14Z

[draft]

Taking the discussion following my comment in Issue 810 into account, I hereby propose an improved hwloc detection mechanism. With Hyper-Threading enabled, the size of the cache used as local memory will now be divided by the number of PUs (=hardware threads) below it. The proposed code also covers edge cases where all or none of the detected cache levels are shared.

PoCL currently uses L3/L2 for global_mem_cache_size/local_mem_size. As identified by Oblomov, Intel uses L2/L1 while AMD (fglrx) uses L1/L1. Hence, I included [commit-hash] to change PoCL's strategy from L3/L2 to L2/L1 (without identifying performance implications). Please let me know in case you prefer the L3/L2 over L2/L1 and I will remove the commit from this PR. The table below shows the resulting values for two example CPUs with Hyper-Threading enabled/disabled and the two proposed strategies.

| CPU (#Cores/#Threads) | L1* (I+D) | L2* | L3** | Strategy | global_mem_cache_size | local_mem_size | Tested |
|---|---|---|---|---|---|---|---|
| i5-7200U (2C/4T) | 32K+32K | 256K | 3M | L3/L2 | 3M | 128K (=L2/2) | yes |
| i5-7200U (2C/2T) | 32K+32K | 256K | 3M | L3/L2 | 3M | 256K | |
| i5-7200U (2C/4T) | 32K+32K | 256K | 3M | L2/L1 | 256K | 16K (=L1d/2) | yes |
| i5-7200U (2C/2T) | 32K+32K | 256K | 3M | L2/L1 | 256K | 32K | |
| Ryzen 5 3600 (6C/12T) | 32K+32K | 512K | 32M | L3/L2 | 32M | 256K (=L2/2) | |
| Ryzen 5 3600 (6C/6T) | 32K+32K | 512K | 32M | L3/L2 | 32M | 512K | yes |
| Ryzen 5 3600 (6C/12T) | 32K+32K | 512K | 32M | L2/L1 | 512K | 16K (=L1d/2) | |
| Ryzen 5 3600 (6C/6T) | 32K+32K | 512K | 32M | L2/L1 | 512K | 32K | yes |
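
The local_mem_size column follows from a single division. The snippet below is only a sketch of that arithmetic, not the code from this PR; `per_thread_local_mem` and its parameter names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PoCL): the share of a cache that one
   hardware thread gets when the cache is used as local memory.
   cache_bytes: size of the chosen cache (L2 for L3/L2, L1d for L2/L1).
   pus_below:   number of PUs (hardware threads) below that cache. */
static size_t per_thread_local_mem(size_t cache_bytes, unsigned pus_below)
{
    /* Assumption: unknown sharing (0) is treated as a single PU. */
    if (pus_below == 0)
        pus_below = 1;
    return cache_bytes / pus_below;
}
```

For the i5-7200U with Hyper-Threading enabled, `per_thread_local_mem(256 * 1024, 2)` gives 131072 bytes (128K) for the L3/L2 strategy and `per_thread_local_mem(32 * 1024, 2)` gives 16384 bytes (16K) for L2/L1. With Hyper-Threading disabled the divisor is 1 and the full cache size is reported.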

hwloc cache info available for Ryzen 5 3600?

| | v1.11.9 | v2.1.0 |
|---|---|---|
| yes | tested | |
| no | | tested |

| | HT disabled | HT enabled |
|---|---|---|
| i5-7200U | | tested |
| Ryzen 5 3600 | tested | |

This code sets global_mem_cache_size equal to the size of the last level cache (typically L3 on current CPUs) and local_mem_size equal to the size of the last private cache of a core (typically L2 on current CPUs) divided by the number of hardware threads sharing that cache. It also moves the specification of local_mem_size from common.c to pocl_topology.c.
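
The selection rule described above can be sketched without hwloc by modelling each cache level as a small struct. Everything here is illustrative: `struct cache_level`, `struct mem_sizes` and `pick_sizes` are hypothetical names, and falling back to the innermost cache when every level is shared is an assumption, since the description does not say how the PR resolves that case.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one cache level as seen from one core.
   In real code these fields would be filled in from the hwloc topology. */
struct cache_level {
    size_t   size;   /* cache size in bytes */
    unsigned cores;  /* number of cores sharing this cache */
    unsigned pus;    /* number of PUs (hardware threads) below this cache */
};

struct mem_sizes {
    size_t global_mem_cache_size;
    size_t local_mem_size;
};

/* levels[] is ordered from the cache closest to the core (L1d) to the
   last level cache. Returns zeros when n == 0 so that the caller can
   apply its own fallback values. */
static struct mem_sizes pick_sizes(const struct cache_level *levels, size_t n)
{
    struct mem_sizes out = {0, 0};
    if (n == 0)
        return out;

    /* The last level cache becomes the global memory cache. */
    out.global_mem_cache_size = levels[n - 1].size;

    /* Find the outermost cache that is still private to one core.
       Assumption: if every level is shared, keep the innermost one. */
    size_t idx = 0;
    for (size_t i = 0; i < n; i++)
        if (levels[i].cores <= 1)
            idx = i;

    /* Split the chosen cache between the hardware threads below it. */
    unsigned pus = levels[idx].pus ? levels[idx].pus : 1;
    out.local_mem_size = levels[idx].size / pus;
    return out;
}
```

When no level is shared, the loop ends on the last level cache, so both values are derived from the same cache. Returning zeros for an empty list leaves room for a separate fallback step, such as the one introduced by the "Use required minimum values as fallback" commit.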

The segmentation fault can be observed with llvm-10, llvm-11 and llvm-12 and seems to be fixed in llvm-13. It happens on the architectures armhf and armel (both 32-bit) always and on x86_64 sporadically. The test segfaults only on the first run (i.e. the kernel is not yet in pocl's kernel cache) while it passes on subsequent execution (with something already in the kernel cache), emitting only some llvm diagnostics:

    inlinable function call in a function with debug info must have a !dbg location
      %11 = call i32 @_Z12get_local_idj(i32 0)
    inlinable function call in a function with debug info must have a !dbg location
      %19 = call i32 @_Z12get_local_idj(i32 1)
    inlinable function call in a function with debug info must have a !dbg location
      %27 = call i32 @_Z12get_local_idj(i32 2)

The backtrace of the segmentation fault as observed with llvm-10 and pocl 1.6:

    #0 getEmissionKind () at .../llvm/include/llvm/IR/DebugInfoMetadata.h:1244
    #1 initialize () at .../llvm/lib/CodeGen/LexicalScopes.cpp:53
    #2 0xb14102f0 in computeIntervals () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:979
    #3 runOnMachineFunction () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:996
    #4 runOnMachineFunction () at .../llvm/lib/CodeGen/LiveDebugVariables.cpp:1023
    #5 0xb14856c8 in runOnFunction () at .../llvm/lib/CodeGen/MachineFunctionPass.cpp:73
    #6 0xb12ff494 in runOnFunction () at .../llvm/lib/IR/LegacyPassManager.cpp:1481
    #7 0xb12ff750 in runOnModule () at .../llvm/lib/IR/LegacyPassManager.cpp:1517
    #8 0xb12ffba8 in runOnModule () at .../llvm/lib/IR/LegacyPassManager.cpp:1582
    #9 run () at .../llvm/lib/IR/LegacyPassManager.cpp:1694
    #10 0xb6e64c82 in pocl_llvm_codegen (Device=Device@entry=0xdb0010, Modp=0x1361838, Output=Output@entry=0xbefde86c, OutputSize=OutputSize@entry=0xbefde880) at ./lib/CL/pocl_llvm_wg.cc:624
    #11 0xb6e291de in llvm_codegen (output=output@entry=0xdeb898 "...BMDHA/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xbefe0240, device=0xdb0010, command=command@entry=0xbefe0278, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
    #12 0xb6e2ae44 in pocl_check_kernel_disk_cache (command=command@entry=0xbefe0278, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
    #13 0xb6e2b262 in pocl_check_kernel_dlhandle_cache (command=0xbefe0278, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
    #14 0xb6e033d4 in program_compile_dynamic_wg_binaries (program=program@entry=0xd8ab88) at ./lib/CL/pocl_build.c:179
    #15 0xb6e13f20 in get_binary_sizes (sizes=0xbefe0384, program=0xd8ab88) at ./lib/CL/clGetProgramInfo.c:36
    #16 POclGetProgramInfo (program=0xd8ab88, param_name=4453, param_value_size=128, param_value=0xbefe0384, param_value_size_ret=0xbefe0380) at ./lib/CL/clGetProgramInfo.c:115
    #17 0x00473070 in main () at 975931.c:238

pocl#889
https://bugs.debian.org/975931

panau161 added 3 commits December 7, 2020 02:23

Use L2/L1 instead of L3/L2 for cache/local_mem

7b91f10

Use required minimum values as fallback

1e7630d

panau161 wants to merge 3 commits into master from improve_hwloc
