Fixes for nightly#803
Open
maleadt wants to merge 7 commits into
…21+. The NVPTX back-end on LLVM 21 dropped its dependence on the legacy nvvm.annotations metadata for maxntid/reqntid/minctasm/maxnreg; the asm printer now reads function-level attributes that LLVM auto-upgrades the annotations into at IR parse time. Modules built in-memory don't go through that auto-upgrade, so emit the attributes ourselves on LLVM 21+. Also move the metadata emission ahead of optimization so the AnnotationCache lookups done by NVVMIntrRangePass on older releases don't latch onto a stale empty entry.
Hmm, I can't reproduce these failures locally...
ParallelTestRunner buffers each worker's stdio into an IOBuffer that's only printed after the testset completes; an abrupt crash (the metal testset on Julia nightly) loses the libjulia stderr that names the signal. Drop to a single worker and enable --verbose so the crash lands on the parent process' stderr and the 'started at' line identifies the testset in flight. Revert once the underlying crash is diagnosed.
…nly. The previous --jobs=1 debug commit didn't help: PTR still constructs the Malt worker with monitor_stdout/stderr=false, drains the pipes into an IOBuffer asynchronously, and prints the buffer only when the testset completes. If the worker is killed abruptly, the libjulia signal/stack trace lands on the worker's stderr while no one is reading, then the pipe closes and the message is gone before PTR's @async reader is scheduled. Reopen the constructor and pass monitor_stdout=monitor_stderr=true so Malt forwards directly to the parent. Also narrow the testsuite to the metal testset so the failing run reaches the crash quickly. Revert all of this once we have a signal line.
Previous attempts (forcing --jobs=1, then live-forwarding the worker's stdio via monitor_stdout=monitor_stderr=true) still produced zero output on the crash. Drop ParallelTestRunner/Malt altogether for this triage run and include test/metal.jl directly in the parent process. Whatever kills the worker — signal, libjulia abort, allocation failure, ulimit — now hits the test driver itself and lands on the CI log's stderr with nothing in between.
CI surfaced 'received signal: 11' from Pkg.test on the inline-runner commit, but with no Julia-side stack trace. Preprocess metal.jl on include and inject an 'entering testset: <name>' line on stderr before each @testset so the last line in the CI log before the SIGSEGV pins the crash to a single testset.
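A minimal sketch of what such a preprocessing step could look like, not the code from this PR. The helper name `announce_testsets` is hypothetical, and the sketch assumes every `@testset` starts on its own line with a string-literal name.

```julia
# Hypothetical helper (not taken from the PR): return a copy of `src` in which
# every line starting with `@testset "name"` is preceded by a statement that
# prints an 'entering testset' marker on stderr and flushes it.
function announce_testsets(src::AbstractString)
    out = String[]
    for line in split(src, '\n')
        # Assumption: the testset name is a plain string literal on the same line.
        m = match(r"^(\s*)@testset\s+\"([^\"]*)\"", line)
        if m !== nothing
            indent, name = m.captures
            # `repr` yields a valid, escaped Julia string literal for the marker,
            # so a `$` in the testset name is printed literally.
            marker = repr("entering testset: " * name)
            push!(out, string(indent, "println(stderr, ", marker, "); flush(stderr)"))
        end
        push!(out, line)
    end
    return join(out, '\n')
end
```

Under the same assumptions, the rewritten source could then be evaluated in place of a plain `include`, for example with `include_string(Main, announce_testsets(read("test/metal.jl", String)), "test/metal.jl")`. Flushing after each marker matters because a SIGSEGV gives the process no chance to flush buffered output on exit.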
Previous CI run pinned the SIGSEGV to the 'byref primitives' testset (last 'entering testset' marker; segv 4s later). Open up the two byref testsets and announce each individual Metal.code_llvm call on stderr so we can see whether the crash is the @eval, the non-kernel compile, or the kernel=true compile.
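One way to announce individual calls is a small macro that prints the expression before evaluating it. This is a sketch only: the macro name `@announce` is hypothetical and the PR may well use plain `println` calls instead.

```julia
# Hypothetical macro (not taken from the PR): print the source text of an
# expression on stderr, flush, then evaluate the expression in the caller's scope.
macro announce(ex)
    label = string(ex)  # textual form of the expression, computed at expansion time
    quote
        println(stderr, "about to run: ", $label)
        flush(stderr)   # make sure the marker is written before a potential crash
        $(esc(ex))
    end
end
```

Wrapping each step separately (the `@eval`, the non-kernel `Metal.code_llvm` call, and the `kernel=true` call) would leave the last 'about to run' line in the log pointing at the step that crashed.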
No description provided.