LLVM and SPIRV-LLVM-Translator pulldown (WW22)#6
Draft
bb-sycl wants to merge 3164 commits into
Draft
Conversation
…argets By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) as a cost of 6.
This reverts commit 8649778. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio
…invariant As per the reproducer provided by Mikael Holmén in post-commit review.
…op-invariant Given that BaseX is an incoming value when coming from the preheader, it *should* be loop-invariant, but let's just document this assumption.
The removed code just replicated what use_llvm_tool does, plus looked for an installed LLDB on the PATH to use. In a monorepo world, it seems likely that if people want to run the tests that require LLDB, they should enable and build LLDB itself. If users really want to use the installed LLDB executable, they can specify the path to the executable as an environment variable "LLDB". See the discussion in https://reviews.llvm.org/D95339#2638619 for more details. Reviewed by: jmorse, aprantl Differential Revision: https://reviews.llvm.org/D102680
RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion it may, on occasion, be expanded to scalar stores through the stack. Unfortunately these stores may then be picked up by the post-legalization DAGCombiner which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (being passed the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exception to this rule are mask vectors, which are always either custom-lowered or are expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913
Drop old cmake variable names that were kept around so that zorg buildbot could be migrated, which has now happened (D102977). D102976 had fixed the inconsistent names. Differential Revision: https://reviews.llvm.org/D102997
The trip count for a memcpy/memset will be n/16 rounded up to the nearest integer. So (n+15)>>4. The old code was including a BIC too, to clear one of the bits, which does not seem correct. This remove the extra BIC. Note that ideally this would never actually be generated, as in the creation of a tail predicated loop we will DCE that setup code, letting the WLSTP perform the trip count calculation. So this doesn't usually come up in testing (and apparently the ARMLowOverheadLoops pass does not do any sort of validation on the tripcount). Only if the generation of the WLTP fails will it use the incorrect BIC instructions. Differential Revision: https://reviews.llvm.org/D102629
This makes sure that the blocks created for lowering memcpy to loops end up with branches, even if they fall through to the successor. Otherwise IfCvt is getting confused with unanalyzable branches and creating invalid block layouts. The extra branches should be removed as the tail predicated loop is finalized in almost all cases.
This reverts commit 94d5415. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
…ccessors The findLoopPreheader function will currently not find a preheader if it branches to multiple different loop headers. This patch adds an option to relax that, allowing ARMLowOverheadLoops to process more loops successfully. This helps with WhileLoopStart setup instructions that can branch/fallthrough to the low overhead loop and to branch to a separate loop from the same preheader (but I don't believe it is possible for both loops to be low overhead loops). Differential Revision: https://reviews.llvm.org/D102747
Allow use of bit-fields as a clang extension in OpenCL. The extension can be enabled using pragma directives. This fixes PR45339! Differential Revision: https://reviews.llvm.org/D101843
Differential Revision: https://reviews.llvm.org/D102498
Depends on D102498 Differential Revision: https://reviews.llvm.org/D102607
This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.
The input IR for @load_extract_idx_var_i64_known_valid_by_assume and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load has been swapped. This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume has the assume before the load and the other test has it after.
…n XOP/AVX2 targets By llvm-mca analysis, Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of the AVX2 targets at 2 for both 128 and 256-bit vectors - XOP capable targets have better 128-bit vector shifts so improve the fallback in those cases.
This relands part of the UB fix in 4b074b4. The original commit also added some additional tests that uncovered some other issues (see D102845). I landed all the passing tests in 4878052 and this patch is now just fixing the UB in half2float. See D102846 for a proposed rewrite of the function. Original commit message: The added DumpDataExtractorTest uncovered that this is lshifting a negative integer which upsets ubsan and breaks the sanitizer bot. This patch just changes the variable we shift to be unsigned.
CONFLICT (content): Merge conflict in clang/test/Misc/nvptx.languageOptsOpenCL.cl
…tests
At the moment nearly every test calls something similar to
`self.dbg.CreateTarget(self.getBuildArtifact("a.out"))` and them sometimes
checks if the created target is actually valid with something like
`self.assertTrue(target.IsValid(), "some useless text")`.
Beside being really verbose the error messages generated by this pattern are
always just indicating that the target failed to be created but now why.
This patch introduces a helper function `createTestTarget` to our Test class
that creates the target with the much more verbose `CreateTarget` overload that
gives us back an SBError (with a fancy error). If the target couldn't be created
the function prints out the SBError that LLDB returned and asserts for us. It
also defaults to the "a.out" build artifact path that nearly all tests are using
to avoid to hardcode "a.out" in every test.
I converted a bunch of tests to the new function but I'll do the rest of the
test suite as follow ups.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D102771
In D102771 wanted to make `test_var` global to demonstrate the a no-launch test, but the old variable is still needed for another test. This just creates the global var with a different name to demonstrate the no-launch functionality.
This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverse a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe. Continue with processing the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258
This follows from the underlying logic for binops and min/max. Although it does not appear that we handle this for min/max intrinsics currently. https://alive2.llvm.org/ce/z/Kq9Xnh
CONFLICT (content): Merge conflict in clang/test/OpenMP/remarks_parallel_in_target_state_machine.c CONFLICT (content): Merge conflict in clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Defined input/output procedures are specified in 12.6.4.8. There are different versions for read versus write and formatted versus unformatted, but they all share the same basic set of dummy arguments. I added several checking functions to check-declarations.cpp along with a test. In the process of implementing this, I noticed and fixed a typo in .../lib/Evaluate/characteristics.cpp. Differential Revision: https://reviews.llvm.org/D103045
This provides a sizable compile time improvement by seeding the worklist in an order that leads to less iterations of the worklist. This patch only changes the behavior of the Canonicalize pass itself, it does not affect other passes that use the GreedyPatternRewrite driver Differential Revision: https://reviews.llvm.org/D103053
If the nested create_directory call fails, we'd still want to re-report the errors with the create_directories function name, which is what the caller called. This fixes one aspect from MS STL's tests for std::filesystem. Differential Revision: https://reviews.llvm.org/D102365
…D_NEGATORS. This also provides some of the scaffolding needed by D102992 and D101729, and mops up after D101730 etc. Differential Revision: https://reviews.llvm.org/D103055
…s exited on 1st iteration" This reverts commit 2531fd7 due to performance regression on the PPC buildbot.
…t has more than one member Beside the `comdat any` deduplication feature, instrumentations use comdat to establish dependencies among a group of sections, to prevent section based linker garbage collection from discarding some members without discarding all. LangRef acknowledges this usage with the following wording: > All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated `__llvm_prf_data` section are placed in the same GRP_COMDAT group. A `__llvm_prf_data` is usually not referenced and expects the liveness of its associated `__llvm_prf_cnts` to retain it. The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained). The main goal of this patch is to fix the dependency relationship. I think it makes sense for InternalizePass to internalize a comdat and thus suppress the deduplication feature, e.g. a relocatable link of a regular LTO can create an object file affected by InternalizePass. If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the a.o references to the comdat definitions will be non-resolvable (references cannot bind to STB_LOCAL definitions in b.o). On PE-COFF, for a non-external selection symbol, deduplication is naturally suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend to believe the spec creator has not thought about this use case (see D102973). GNU ld and gold are still using the "signature is name based" interpretation. So even if D102973 for ld.lld is accepted, for portability, a better approach is to rename the comdat. A comdat with one single member is the common case, leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by deleting the comdat; otherwise we rename the comdat. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103043
[libomptarget][nfc] Move hostcall required test to rtl Remove a global, fix minor race. First of N patches to bring up hostcall. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103058
…ies. NFC. MLIRContext holds a few special case values that occur frequently like empty dictionary and NoneType, which allow us to avoid taking locks to get an instance of them. Give the empty StringAttr this treatment as well. This cuts several percent off compile time for CIRCT. Differential Revision: https://reviews.llvm.org/D103117
Reviewed By: cryptoad Differential Revision: https://reviews.llvm.org/D103122
… linalg.index Indexed Generic should be going away in the future. Migrate to linalg.index. Reviewed By: NatashaKnk, nicolasvasilache Differential Revision: https://reviews.llvm.org/D103110
When building with Clang 11 on Windows, silence the following:
[432/5643] Building C object projects\compiler-rt\lib\profile\CMakeFiles\clang_rt.profile-x86_64.dir\GCDAProfiling.c.obj
F:\aganea\llvm-project\compiler-rt\lib\profile\GCDAProfiling.c(464,13): warning: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
if (val != (gcov_version >= 90 ? GCOV_TAG_OBJECT_SUMMARY
~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
When building with Clang 11 on Windows, silence the following:
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(955,8): warning: 'Run' overrides a member function but is not marked 'override' [-Wsuggest-override]
void Run(State& st);
^
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(895,16): note: overridden virtual function is here
virtual void Run(State& state) = 0;
^
1 warning generated.
…tect-stack-use-after-return-mode. Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always. for issue: google/sanitizers#1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102462
The generic approach can still be used by musl and FreeBSD. Note: on glibc
2.31, TLS_PRE_TCB_SIZE is 0x700, larger than ThreadDescriptorSize() by 16, but
this is benign: as long as the range includes pthread::{specific_1stblock,specific}
pthread_setspecific will not cause false positives.
Note: the state before afec953 underestimated
the TLS size a lot (nearly ThreadDescriptorSize() = 1776).
That may explain why afec953 actually made some
tests pass.
Since the opaque pointer type won't contain the pointee type, we need to separately encode the value type for an atomicrmw. Emit this new code for atomicrmw. Handle this new code and the old one in the bitcode reader. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103123
When the lower type test pass is invoked a second time with DropTypeTests set to true, it expects that all remaining type tests feed assume instructions, which are removed along with the type tests. In some cases the llvm.assume might have been merged with another one, i.e. from a builtin_assume instruction, in which case the type test would actually feed a phi that in turn feeds the merged assume instruction. In this case we can simply replace that operand of the phi with "true" before removing the type test. Differential Revision: https://reviews.llvm.org/D103073
Pass LLVM_PRETTY_FUNCTION directly for the no-argument macro.
Right after pushing, I remembered that this was added to silence a GCC warning (https://reviews.llvm.org/D99120). This reverts my patch and adds a comment.
FullTy is only necessary when we need to figure out what type an instruction works with given a pointer's pointee type. However, we just end up using the value operand's type, so FullTy isn't necessary. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102788
Implements part of P0896 'The One Ranges Proposal'. Implements [range.iter.op.advance]. Differential Revision: https://reviews.llvm.org/D101922
…st (intel#3801) When compiling with the integration footer enabled, we want to be sure that the integration header is only pulled in once (during the preprocessing step) and not during the full compilation of the source w/ appended footer.
Compiler generated ITT annotations may interfere with debugging the instrumented user code even in non-ITT mode (when ITT is disabled via the specialization constant). This change marks the wrappers used by the compiler generated code with always_inline so that device compiler are able to get rid of the extra code in non-ITT mode. Signed-off-by: Vyacheslav Zakharin <vyacheslav.p.zakharin@intel.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
LLVM: llvm/llvm-project@9ef66ed
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@efa761b