LLVM and SPIRV-LLVM-Translator pulldown (WW22) by bb-sycl · Pull Request #6 · hanzhan1/llvm

bb-sycl · 2021-05-26T08:28:03Z

LLVM: llvm/llvm-project@9ef66ed
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@efa761b

…argets By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) as a cost of 6.

The removed code just replicated what use_llvm_tool does, plus looked for an installed LLDB on the PATH to use. In a monorepo world, it seems likely that if people want to run the tests that require LLDB, they should enable and build LLDB itself. If users really want to use the installed LLDB executable, they can specify the path to the executable as an environment variable "LLDB". See the discussion in https://reviews.llvm.org/D95339#2638619 for more details. Reviewed by: jmorse, aprantl Differential Revision: https://reviews.llvm.org/D102680

RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion it may, on occasion, be expanded to scalar stores through the stack. Unfortunately these stores may then be picked up by the post-legalization DAGCombiner which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (being passed the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exceptions to this rule are mask vectors, which are always either custom-lowered or expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913

The trip count for a memcpy/memset will be n/16 rounded up to the nearest integer, i.e. (n+15)>>4. The old code was including a BIC too, to clear one of the bits, which does not seem correct. This removes the extra BIC. Note that ideally this would never actually be generated, as in the creation of a tail predicated loop we will DCE that setup code, letting the WLSTP perform the trip count calculation. So this doesn't usually come up in testing (and apparently the ARMLowOverheadLoops pass does not do any sort of validation on the tripcount). Only if the generation of the WLSTP fails will it use the incorrect BIC instructions. Differential Revision: https://reviews.llvm.org/D102629
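
The rounding described in this commit message can be illustrated in isolation. The following is only a sketch of the arithmetic, not the ARM backend code; the function name `tripCount16` is hypothetical, and the no-overflow precondition is an assumption added for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of 16-byte iterations needed to cover n bytes,
// i.e. n/16 rounded up, computed as (n + 15) >> 4.
inline std::uint32_t tripCount16(std::uint32_t n) {
  // Assumption for this sketch: n + 15 must not wrap around.
  assert(n <= UINT32_MAX - 15u && "n + 15 must not overflow");
  return (n + 15u) >> 4;
}
```

For instance, `tripCount16(16)` gives 1 and `tripCount16(17)` gives 2. Adding 15 before the shift is what performs the rounding up, so no further bit clearing is needed on the result.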

This makes sure that the blocks created for lowering memcpy to loops end up with branches, even if they fall through to the successor. Otherwise IfCvt is getting confused with unanalyzable branches and creating invalid block layouts. The extra branches should be removed as the tail predicated loop is finalized in almost all cases.

…ccessors The findLoopPreheader function will currently not find a preheader if it branches to multiple different loop headers. This patch adds an option to relax that, allowing ARMLowOverheadLoops to process more loops successfully. This helps with WhileLoopStart setup instructions that can branch/fallthrough to the low overhead loop and to branch to a separate loop from the same preheader (but I don't believe it is possible for both loops to be low overhead loops). Differential Revision: https://reviews.llvm.org/D102747

The input IR for @load_extract_idx_var_i64_known_valid_by_assume and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load has been swapped. This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume has the assume before the load and the other test has it after.

…n XOP/AVX2 targets By llvm-mca analysis, Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of the AVX2 targets at 2 for both 128 and 256-bit vectors - XOP capable targets have better 128-bit vector shifts so improve the fallback in those cases.

This relands part of the UB fix in 4b074b4. The original commit also added some additional tests that uncovered some other issues (see D102845). I landed all the passing tests in 4878052 and this patch is now just fixing the UB in half2float. See D102846 for a proposed rewrite of the function. Original commit message: The added DumpDataExtractorTest uncovered that this is lshifting a negative integer which upsets ubsan and breaks the sanitizer bot. This patch just changes the variable we shift to be unsigned.
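
The kind of fix described here (shifting an unsigned value instead of a possibly negative signed one) can be sketched as follows. This is not the actual half2float code; the helper name `shiftLeftUnsigned` and the 32-bit width are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: left-shifting a negative signed integer is undefined
// behaviour before C++20, so convert to unsigned first and shift that.
inline std::uint32_t shiftLeftUnsigned(std::int32_t value, unsigned amount) {
  assert(amount < 32 && "shift amount must be smaller than the bit width");
  // The signed-to-unsigned conversion is well defined (modulo 2^32).
  return static_cast<std::uint32_t>(value) << amount;
}
```

With this shape, `shiftLeftUnsigned(-1, 4)` yields `0xFFFFFFF0` without tripping UBSan, because the shift itself only ever sees an unsigned operand.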

…tests At the moment nearly every test calls something similar to `self.dbg.CreateTarget(self.getBuildArtifact("a.out"))` and then sometimes checks if the created target is actually valid with something like `self.assertTrue(target.IsValid(), "some useless text")`. Besides being really verbose, the error messages generated by this pattern always just indicate that the target failed to be created, but not why. This patch introduces a helper function `createTestTarget` to our Test class that creates the target with the much more verbose `CreateTarget` overload that gives us back an SBError (with a fancy error). If the target couldn't be created, the function prints out the SBError that LLDB returned and asserts for us. It also defaults to the "a.out" build artifact path that nearly all tests are using, to avoid hardcoding "a.out" in every test. I converted a bunch of tests to the new function but I'll do the rest of the test suite as follow ups. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D102771

This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverses a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe. Processing then continues with the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258

Defined input/output procedures are specified in 12.6.4.8. There are different versions for read versus write and formatted versus unformatted, but they all share the same basic set of dummy arguments. I added several checking functions to check-declarations.cpp along with a test. In the process of implementing this, I noticed and fixed a typo in .../lib/Evaluate/characteristics.cpp. Differential Revision: https://reviews.llvm.org/D103045

This provides a sizable compile time improvement by seeding the worklist in an order that leads to fewer iterations of the worklist. This patch only changes the behavior of the Canonicalize pass itself; it does not affect other passes that use the GreedyPatternRewrite driver. Differential Revision: https://reviews.llvm.org/D103053

If the nested create_directory call fails, we'd still want to re-report the errors with the create_directories function name, which is what the caller called. This fixes one aspect from MS STL's tests for std::filesystem. Differential Revision: https://reviews.llvm.org/D102365

…t has more than one member Besides the `comdat any` deduplication feature, instrumentations use comdat to establish dependencies among a group of sections, to prevent section based linker garbage collection from discarding some members without discarding all. LangRef acknowledges this usage with the following wording: > All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated `__llvm_prf_data` section are placed in the same GRP_COMDAT group. A `__llvm_prf_data` is usually not referenced and expects the liveness of its associated `__llvm_prf_cnts` to retain it. The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained). The main goal of this patch is to fix the dependency relationship. I think it makes sense for InternalizePass to internalize a comdat and thus suppress the deduplication feature, e.g. a relocatable link of a regular LTO can create an object file affected by InternalizePass. If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the a.o references to the comdat definitions will be non-resolvable (references cannot bind to STB_LOCAL definitions in b.o). On PE-COFF, for a non-external selection symbol, deduplication is naturally suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend to believe the spec creator has not thought about this use case (see D102973). GNU ld and gold are still using the "signature is name based" interpretation. So even if D102973 for ld.lld is accepted, for portability, a better approach is to rename the comdat. A comdat with one single member is the common case, and leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by deleting the comdat; otherwise we rename the comdat.
Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103043
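
The final decision in this commit message (delete a single-member comdat, rename a multi-member one) can be modelled in a few lines. This is a toy sketch and not the InternalizePass implementation: `ComdatAction`, `chooseComdatAction` and the constants are hypothetical names, and the 64-byte size of `Elf64_Shdr` is written out as a constant rather than taken from `<elf.h>` so the example stays portable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the two outcomes described in the commit message.
enum class ComdatAction { Delete, Rename };

// Size of an ELF64 section header (Elf64_Shdr), stated as a constant here.
constexpr std::size_t kElf64ShdrSize = 64;

// Bytes a group section can cost for a one-member comdat, per the message:
// sizeof(Elf64_Shdr) + 4*2.
constexpr std::size_t kSingleMemberGroupOverhead = kElf64ShdrSize + 4 * 2;

// Hypothetical helper: a comdat with exactly one member is deleted to save
// the overhead; a comdat with several members is renamed so that the
// dependency between its members is preserved.
inline ComdatAction chooseComdatAction(std::size_t memberCount) {
  assert(memberCount > 0 && "a comdat in use has at least one member");
  return memberCount == 1 ? ComdatAction::Delete : ComdatAction::Rename;
}
```

Under these assumptions the overhead works out to 72 bytes per single-member comdat, which is the saving the message refers to.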

…ies. NFC. MLIRContext holds a few special case values that occur frequently like empty dictionary and NoneType, which allow us to avoid taking locks to get an instance of them. Give the empty StringAttr this treatment as well. This cuts several percent off compile time for CIRCT. Differential Revision: https://reviews.llvm.org/D103117

When building with Clang 11 on Windows, silence the following:
[432/5643] Building C object projects\compiler-rt\lib\profile\CMakeFiles\clang_rt.profile-x86_64.dir\GCDAProfiling.c.obj
F:\aganea\llvm-project\compiler-rt\lib\profile\GCDAProfiling.c(464,13): warning: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
if (val != (gcov_version >= 90 ? GCOV_TAG_OBJECT_SUMMARY
~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

When building with Clang 11 on Windows, silence the following:
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(955,8): warning: 'Run' overrides a member function but is not marked 'override' [-Wsuggest-override]
void Run(State& st);
^
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(895,16): note: overridden virtual function is here
virtual void Run(State& state) = 0;
^
1 warning generated.

…tect-stack-use-after-return-mode. Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always. For issue google/sanitizers#1394. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102462

The generic approach can still be used by musl and FreeBSD. Note: on glibc 2.31, TLS_PRE_TCB_SIZE is 0x700, larger than ThreadDescriptorSize() by 16, but this is benign: as long as the range includes pthread::{specific_1stblock,specific} pthread_setspecific will not cause false positives. Note: the state before afec953 underestimated the TLS size a lot (nearly ThreadDescriptorSize() = 1776). That may explain why afec953 actually made some tests pass.

Since the opaque pointer type won't contain the pointee type, we need to separately encode the value type for an atomicrmw. Emit this new code for atomicrmw. Handle this new code and the old one in the bitcode reader. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103123

When the lower type test pass is invoked a second time with DropTypeTests set to true, it expects that all remaining type tests feed assume instructions, which are removed along with the type tests. In some cases the llvm.assume might have been merged with another one, i.e. from a builtin_assume instruction, in which case the type test would actually feed a phi that in turn feeds the merged assume instruction. In this case we can simply replace that operand of the phi with "true" before removing the type test. Differential Revision: https://reviews.llvm.org/D103073

FullTy is only necessary when we need to figure out what type an instruction works with given a pointer's pointee type. However, we just end up using the value operand's type, so FullTy isn't necessary. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102788

…st (intel#3801) When compiling with the integration footer enabled, we want to be sure that the integration header is only pulled in once (during the preprocessing step) and not during the full compilation of the source w/ appended footer.

Compiler generated ITT annotations may interfere with debugging the instrumented user code even in non-ITT mode (when ITT is disabled via the specialization constant). This change marks the wrappers used by the compiler generated code with always_inline so that device compilers are able to get rid of the extra code in non-ITT mode. Signed-off-by: Vyacheslav Zakharin <vyacheslav.p.zakharin@intel.com>

RKSimon and others added 30 commits May 24, 2021 09:48

Revert "[VectorCombine] Scalarize vector load/extract."

94d5415

This reverts commit 8649778. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio

flang: include limits

0f140ce

[LoopIdiom] 'logical right shift until zero': the value must be loop-…

aa3dac9

…invariant As per the reproducer provided by Mikael Holmén in post-commit review.

[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is lo…

32bee42

…op-invariant Given that BaseX is an incoming value when coming from the preheader, it *should* be loop-invariant, but let's just document this assumption.

[MLIR] Drop old cmake var names

587408c

Drop old cmake variable names that were kept around so that zorg buildbot could be migrated, which has now happened (D102977). D102976 had fixed the inconsistent names. Differential Revision: https://reviews.llvm.org/D102997

Recommit "[VectorCombine] Scalarize vector load/extract."

4e8c28b

This reverts commit 94d5415. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.

Fix parameterized PI CUDA unittests with empty parameter

0621c56

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

Merge remote-tracking branch 'intel_llvm/sycl' into llvmspirv_pulldown

0365bd4

[OpenCL] Add clang extension for bit-fields.

237c692

Allow use of bit-fields as a clang extension in OpenCL. The extension can be enabled using pragma directives. This fixes PR45339! Differential Revision: https://reviews.llvm.org/D101843

[AArch64][SVE] Improve codegen for fixed length vector concat

4bc14be

Differential Revision: https://reviews.llvm.org/D102498

[AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND

e405132

Depends on D102498 Differential Revision: https://reviews.llvm.org/D102607

[OpenCL] Fix test by adding SPIR triple

626e964

[VPlan] Add mayReadOrWriteMemory & friends.

e9d97d7

This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.

Merge from 'sycl' to 'sycl-web'

bdac370

[OpenCL][Docs] Minor update to OpenCL 3.0

5ccc79d

Merge from 'main' to 'sycl-web' (#2)

21f3f75

CONFLICT (content): Merge conflict in clang/test/Misc/nvptx.languageOptsOpenCL.cl

[lldb] Readd deleted variable in the sample test

5d7c1d8

In D102771 I wanted to make `test_var` global to demonstrate a no-launch test, but the old variable is still needed for another test. This just creates the global var with a different name to demonstrate the no-launch functionality.

[ConstProp] add tests for vector reductions with poison elements; NFC

3dd2063

[ConstProp] propagate poison from vector reduction element(s) to result

a0e71f1

This follows from the underlying logic for binops and min/max. Although it does not appear that we handle this for min/max intrinsics currently. https://alive2.llvm.org/ce/z/Kq9Xnh

asudarsa and others added 30 commits May 25, 2021 13:26

Merge from 'sycl' to 'sycl-web' (#1)

5e39487

CONFLICT (content): Merge conflict in clang/test/OpenMP/remarks_parallel_in_target_state_machine.c CONFLICT (content): Merge conflict in clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c

[libc++] [P0619] Hide not1 and not2 under _LIBCPP_ENABLE_CXX20_REMOVE…

d42d9e1

…D_NEGATORS. This also provides some of the scaffolding needed by D102992 and D101729, and mops up after D101730 etc. Differential Revision: https://reviews.llvm.org/D103055

Revert "[LoopDeletion] Break backedge if we can prove that the loop i…

832c99f

…s exited on 1st iteration" This reverts commit 2531fd7 due to performance regression on the PPC buildbot.

[ARM] Extra predicated tests for VMULH. NFC

8cc437a

[libc++] Install GCC 11 on CI builders

66781ef

[libomptarget][nfc] Move hostcall required test to rtl

df005fa

[libomptarget][nfc] Move hostcall required test to rtl Remove a global, fix minor race. First of N patches to bring up hostcall. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103058

[Toy] Update tests to pass with top-down canonicalize pass. NFC

a6a57f0

[NFC][SCUDO] Fix unittest for -gtest_repeat=10

e14696b

Reviewed By: cryptoad Differential Revision: https://reviews.llvm.org/D103122

[NFC][MLIR][TOSA] Replaced tosa linalg.indexed_generic lowerings with…

e5d227e

… linalg.index Indexed Generic should be going away in the future. Migrate to linalg.index. Reviewed By: NatashaKnk, nicolasvasilache Differential Revision: https://reviews.llvm.org/D103110

[lldb] Avoid format string in LLDB_SCOPED_TIMER

bbcb343

Pass LLVM_PRETTY_FUNCTION directly for the no-argument macro.

Revert "[lldb] Avoid format string in LLDB_SCOPED_TIMER"

564eb20

Right after pushing, I remembered that this was added to silence a GCC warning (https://reviews.llvm.org/D99120). This reverts my patch and adds a comment.

[libcxx][iterator] adds std::ranges::advance

36d0fdf

Implements part of P0896 'The One Ranges Proposal'. Implements [range.iter.op.advance]. Differential Revision: https://reviews.llvm.org/D101922

[gn build] Port 36d0fdf

dde1239

[clang-format][NFC] correctly sort StatementAttributeLike-macros' IO.map

9ef66ed

Merge remote-tracking branch 'otcshare_llvm/sycl-web' into llvmspirv_…

195b121

…pulldown

Merge commit '9ef66ed43758a575e1f53a09f07ecb7e3025aafa' into llvmspir…

1b08ff2

…v_pulldown

bb-sycl wants to merge 3164 commits into sycl from llvmspirv_pulldown

20 participants
