GPU heuristics determinism integration #10
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,249 @@ | ||
| <?xml version="1.0" encoding="utf-8"?> | ||
| <ComputeSanitizerOutput> | ||
| <record> | ||
| <kind>Initcheck</kind> | ||
| <what> | ||
| <text>Uninitialized __global__ memory read of size 4 bytes</text> | ||
| <size>4</size> | ||
| </what> | ||
| <where> | ||
| <func>.*</func> | ||
| </where> | ||
| <hostStack> | ||
| <frame> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>cusparseCsr2cscEx2</func> | ||
| <module>.*libcusparse.so.*</module> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <record> | ||
| <kind>Initcheck</kind> | ||
| <what> | ||
| <text>Uninitialized __global__ memory read of size 4 bytes</text> | ||
| <size>4</size> | ||
| </what> | ||
| <where> | ||
| <func>ThreadLoad</func> | ||
| </where> | ||
| <hostStack> | ||
| <frame> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>libcudart.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>cudaLaunchKernel</func> | ||
| </frame> | ||
| <frame> | ||
| <func>.*cub::.*::Device(Segmented)?(Reduce|Scan)(SingleTile)?Kernel.*</func> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <record> | ||
| <kind>Initcheck</kind> | ||
| <what> | ||
| <text>Uninitialized __global__ memory read of size 2 bytes</text> | ||
| <size>2</size> | ||
| </what> | ||
| <where> | ||
| <func>ThreadLoad</func> | ||
| </where> | ||
| <hostStack> | ||
| <frame> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>libcudart.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>cudaLaunchKernel</func> | ||
| </frame> | ||
| <frame> | ||
| <func>.*cub::.*::Device(Segmented)?(Reduce|Scan)(SingleTile)?Kernel.*</func> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <record> | ||
| <kind>Initcheck</kind> | ||
| <what> | ||
| <text>Uninitialized __global__ memory read of size 8 bytes</text> | ||
| <size>8</size> | ||
| </what> | ||
| <where> | ||
| <func>DeviceSegmentedReduceKernel</func> | ||
| </where> | ||
| </record> | ||
|
Comment on lines +70 to +79
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import xml.etree.ElementTree as ET
p = "ci/compute-sanitizer-suppressions.xml"
root = ET.parse(p).getroot()
for i, rec in enumerate(root.findall("record"), 1):
    kind = (rec.findtext("kind") or "").strip()
    where_func = (rec.findtext("./where/func") or "").strip()
    has_host_stack = rec.find("hostStack") is not None
    if kind == "Initcheck" and (not has_host_stack):
        print(f"record {i}: Initcheck without hostStack, where/func={where_func or '<none>'}")
PY
Repository: aliceb-nv/cuopt-fork
Length of output: 141
Add a The suppression for |
||
| <record> | ||
| <kind>Initcheck</kind> | ||
| <what> | ||
| <text>Uninitialized __global__ memory read of size 4 bytes</text> | ||
| <size>4</size> | ||
| </what> | ||
| <where> | ||
| <func>ThreadLoad</func> | ||
| </where> | ||
| <hostStack> | ||
| <frame> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>libcudart.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>libcudart.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcuopt.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>.*Device(Reduce|Scan)Kernel.*</func> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <!-- Rule matching cccl's pattern of copying tuples back to host after reduce_by_keys, which contain uninitialized padding --> | ||
| <!-- Because of aggressive inlining, thrust calls are elided out of the host stack, which prevents a more finely grained rule. In practice this is good enough --> | ||
| <record> | ||
| <kind>InitcheckApiError</kind> | ||
| <level>Error</level> | ||
| <what> | ||
| <text>Host API uninitialized memory access</text> | ||
| <accessSize>16</accessSize> | ||
| </what> | ||
| <hostStack> | ||
| <frame> | ||
| <func>cuMemcpyDtoHAsync.*</func> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <!-- Suppress uninit copies on rmm::device_vector copy constructor - often vector members are allocated but not filled --> | ||
| <record> | ||
| <kind>InitcheckApiError</kind> | ||
| <level>Error</level> | ||
| <what> | ||
| <text>Host API uninitialized memory access</text> | ||
| </what> | ||
| <hostStack> | ||
| <frame> | ||
| <func>cuMemcpyAsync</func> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*librmm.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>rmm::device_buffer::device_buffer</func> | ||
| <module>.*librmm.so.*</module> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <!-- Suppress likely harmless Thrust/CUB tuple-buffer initcheck reads during sort_by_key | ||
| in trivial_presolve's COO->CSC reorder. Source and destination arrays are validated | ||
| immediately before the sort; the warning appears to come from internal tuple staging. --> | ||
| <record> | ||
| <kind>Initcheck</kind> | ||
| <what> | ||
| <text>Uninitialized __global__ memory read</text> | ||
| </what> | ||
| <where> | ||
| <func>transform_kernel</func> | ||
| </where> | ||
| <hostStack> | ||
| <frame> | ||
| <func>cuLaunchKernel_ptsz</func> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>cudaLaunchKernel_ptsz</func> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <record> | ||
| <kind>InitcheckApiError</kind> | ||
| <level>Error</level> | ||
| <what> | ||
| <text>Host API uninitialized memory access</text> | ||
| </what> | ||
| <hostStack> | ||
| <frame> | ||
| <func>cuMemcpyAsync</func> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*librmm.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*librmm.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>rmm::device_uvector.*::device_uvector</func> | ||
| <module>.*libcuopt.so.*</module> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| <!-- Uninitialized device-to-device copies are usually harmless - if actually bogus, errors may be caught later on --> | ||
| <record> | ||
| <kind>InitcheckApiError</kind> | ||
| <level>Error</level> | ||
| <what> | ||
| <text>Host API uninitialized memory access</text> | ||
| </what> | ||
| <hostStack> | ||
| <frame> | ||
| <func>cuMemcpyDtoDAsync.*</func> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
|
Comment on lines +210 to +222
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import xml.etree.ElementTree as ET, re
p = "ci/compute-sanitizer-suppressions.xml"
root = ET.parse(p).getroot()
for i, rec in enumerate(root.findall("record"), 1):
    if (rec.findtext("kind") or "").strip() != "InitcheckApiError":
        continue
    frames = rec.findall("./hostStack/frame")
    funcs = [(f.findtext("func") or "").strip() for f in frames]
    if any(re.search(r"cuMemcpyDtoDAsync", fn) for fn in funcs):
        print(f"record {i}: cuMemcpyDtoDAsync suppression has {len(frames)} host frame(s)")
        for n, fn in enumerate(funcs, 1):
            print(f" frame {n}: {fn or '<module-only frame>'}")
PY
Repository: aliceb-nv/cuopt-fork
Length of output: 155
🏁 Script executed:
# First, let's look at the full context of the suppression file
head -250 ci/compute-sanitizer-suppressions.xml | tail -50
Repository: aliceb-nv/cuopt-fork
Length of output: 1363
🏁 Script executed:
# Check if there are any comments in the XML that explain this suppression
grep -B5 -A5 "cuMemcpyDtoDAsync" ci/compute-sanitizer-suppressions.xml
Repository: aliceb-nv/cuopt-fork
Length of output: 308
🏁 Script executed:
# Look at the git history for when this suppression was added and why
git log --oneline -n 20 -- ci/compute-sanitizer-suppressions.xml
Repository: aliceb-nv/cuopt-fork
Length of output: 84
🏁 Script executed:
# Check if this suppression is mentioned in any documentation or comments in the codebase
rg "cuMemcpyDtoDAsync" --type txt --type md -A3 -B3
Repository: aliceb-nv/cuopt-fork
Length of output: 46
Constrain This rule broadly suppresses all |
||
| <record> | ||
| <kind>InitcheckApiError</kind> | ||
| <level>Error</level> | ||
| <what> | ||
| <text>Host API uninitialized memory access</text> | ||
| </what> | ||
| <hostStack> | ||
| <frame> | ||
| <func>cuMemcpyAsync</func> | ||
| <module>.*libcuda.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <module>.*libcudart.so.*</module> | ||
| </frame> | ||
| <frame> | ||
| <func>cudaMemcpyAsync</func> | ||
| </frame> | ||
| <frame> | ||
| <func>rmm::device_buffer::resize</func> | ||
| <module>.*librmm.so.*</module> | ||
| </frame> | ||
| </hostStack> | ||
| </record> | ||
| </ComputeSanitizerOutput> | ||
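Both review comments above flag suppression rules that match more than intended: one `Initcheck` record has no `hostStack` at all, and the `cuMemcpyDtoDAsync` record pins only a single host frame. A minimal sketch of a check that could flag such rules is shown below. It is an illustration, not part of this PR: the `audit_suppressions` helper, the `min_frames` threshold of 2, and the three-record `SAMPLE` (a trimmed subset modeled on the diff) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three records trimmed from the shape of the diff above.
SAMPLE = """
<ComputeSanitizerOutput>
  <record>
    <kind>Initcheck</kind>
    <where><func>DeviceSegmentedReduceKernel</func></where>
  </record>
  <record>
    <kind>InitcheckApiError</kind>
    <hostStack>
      <frame><func>cuMemcpyDtoDAsync.*</func><module>.*libcuda.so.*</module></frame>
    </hostStack>
  </record>
  <record>
    <kind>InitcheckApiError</kind>
    <hostStack>
      <frame><func>cuMemcpyAsync</func><module>.*libcuda.so.*</module></frame>
      <frame><module>.*libcudart.so.*</module></frame>
      <frame><func>rmm::device_buffer::resize</func><module>.*librmm.so.*</module></frame>
    </hostStack>
  </record>
</ComputeSanitizerOutput>
"""


def audit_suppressions(root, min_frames=2):
    """Return (record_index, kind, reason) for rules that look too broad.

    min_frames is an assumed threshold: a rule with fewer host frames
    matches every caller of the API named in its first frame.
    """
    findings = []
    for index, record in enumerate(root.findall("record"), start=1):
        kind = (record.findtext("kind") or "").strip()
        frames = record.findall("./hostStack/frame")
        if record.find("hostStack") is None:
            findings.append((index, kind, "no hostStack"))
        elif len(frames) < min_frames:
            findings.append((index, kind, f"only {len(frames)} host frame(s)"))
    return findings


findings = audit_suppressions(ET.fromstring(SAMPLE.strip()))
for index, kind, reason in findings:
    print(f"record {index}: {kind}: {reason}")
```

On the sample, the first two records are reported and the third, which pins the call path down to `rmm::device_buffer::resize`, passes. Running the same function on `ET.parse("ci/compute-sanitizer-suppressions.xml").getroot()` would likely also report the `cuMemcpyDtoHAsync.*` rule, whose own comment in the diff already acknowledges that it is coarse.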
🧩 Analysis chain
🏁 Script executed:
Repository: aliceb-nv/cuopt-fork
Length of output: 113
🏁 Script executed:
Repository: aliceb-nv/cuopt-fork
Length of output: 1762
🏁 Script executed:
Repository: aliceb-nv/cuopt-fork
Length of output: 433
🏁 Script executed:
# Get line count to understand file size
wc -l benchmarks/linear_programming/cuopt/run_mip.cpp
Repository: aliceb-nv/cuopt-fork
Length of output: 115
🏁 Script executed:
Repository: aliceb-nv/cuopt-fork
Length of output: 1964
🏁 Script executed:
Repository: aliceb-nv/cuopt-fork
Length of output: 1228
Use `steady_clock` for elapsed-time capture.
`high_resolution_clock` is not guaranteed to be monotonic and may alias `system_clock` on some implementations, causing the incumbent trace timestamps to jump backwards if system time is adjusted. `steady_clock` is guaranteed monotonic and is the correct choice for benchmark elapsed-time measurement. Apply this change to the constructor parameter, member variable, callback implementation, and the `start_run_solver` assignment and final `end` calculation.