Skip to content

LLVM and SPIRV-LLVM-Translator pulldown (WW22)#6

Draft
bb-sycl wants to merge 3164 commits into
syclfrom
llvmspirv_pulldown
Draft

LLVM and SPIRV-LLVM-Translator pulldown (WW22)#6
bb-sycl wants to merge 3164 commits into
syclfrom
llvmspirv_pulldown

Conversation

@bb-sycl

@bb-sycl bb-sycl commented May 26, 2021

Copy link
Copy Markdown
Collaborator

RKSimon and others added 30 commits May 24, 2021 09:48
…argets

By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) as a cost of 6.
…invariant

As per the reproducer provided by Mikael Holmén in post-commit review.
…op-invariant

Given that BaseX is an incoming value when coming from the preheader,
it *should* be loop-invariant, but let's just document this assumption.
The removed code just replicated what use_llvm_tool does, plus looked
for an installed LLDB on the PATH to use. In a monorepo world, it seems
likely that if people want to run the tests that require LLDB, they
should enable and build LLDB itself. If users really want to use the
installed LLDB executable, they can specify the path to the executable
as an environment variable "LLDB".

See the discussion in https://reviews.llvm.org/D95339#2638619 for
more details.

Reviewed by: jmorse, aprantl

Differential Revision: https://reviews.llvm.org/D102680
RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913
Drop old cmake variable names that were kept around so that zorg
buildbot could be migrated, which has now happened (D102977). D102976
had fixed the inconsistent names.

Differential Revision: https://reviews.llvm.org/D102997
The trip count for a memcpy/memset will be n/16 rounded up to the
nearest integer. So (n+15)>>4. The old code was including a BIC too, to
clear one of the bits, which does not seem correct. This remove the
extra BIC.

Note that ideally this would never actually be generated, as in the
creation of a tail predicated loop we will DCE that setup code, letting
the WLSTP perform the trip count calculation. So this doesn't usually
come up in testing (and apparently the ARMLowOverheadLoops pass does not
do any sort of validation on the tripcount). Only if the generation of
the WLTP fails will it use the incorrect BIC instructions.

Differential Revision: https://reviews.llvm.org/D102629
This makes sure that the blocks created for lowering memcpy to loops end
up with branches, even if they fall through to the successor. Otherwise
IfCvt is getting confused with unanalyzable branches and creating
invalid block layouts.

The extra branches should be removed as the tail predicated loop is
finalized in almost all cases.
This reverts commit 94d5415.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
…ccessors

The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).

Differential Revision: https://reviews.llvm.org/D102747
Allow use of bit-fields as a clang extension
in OpenCL. The extension can be enabled using
pragma directives.

This fixes PR45339!

Differential Revision: https://reviews.llvm.org/D101843
This patch adds initial implementation of mayReadOrWriteMemory,
mayReadFromMemory and mayWriteToMemory to VPRecipeBase.

Used by D100258.
The input IR for @load_extract_idx_var_i64_known_valid_by_assume
and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load
has been swapped.

This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume
has the assume before the load and the other test has it after.
…n XOP/AVX2 targets

By llvm-mca analysis, Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of the AVX2 targets at 2 for both 128 and 256-bit vectors - XOP capable targets have better 128-bit vector shifts so improve the fallback in those cases.
This relands part of the UB fix in 4b074b4.
The original commit also added some additional tests that uncovered some
other issues (see D102845). I landed all the passing tests in
4878052 and this patch is now just fixing
the UB in half2float. See D102846 for a proposed rewrite of the function.

Original commit message:

  The added DumpDataExtractorTest uncovered that this is lshifting a negative
  integer which upsets ubsan and breaks the sanitizer bot. This patch just
  changes the variable we shift to be unsigned.
  CONFLICT (content): Merge conflict in clang/test/Misc/nvptx.languageOptsOpenCL.cl
…tests

At the moment nearly every test calls something similar to
`self.dbg.CreateTarget(self.getBuildArtifact("a.out"))` and them sometimes
checks if the created target is actually valid with something like
`self.assertTrue(target.IsValid(), "some useless text")`.

Beside being really verbose the error messages generated by this pattern are
always just indicating that the target failed to be created but now why.

This patch introduces a helper function `createTestTarget` to our Test class
that creates the target with the much more verbose `CreateTarget` overload that
gives us back an SBError (with a fancy error). If the target couldn't be created
the function prints out the SBError that LLDB returned and asserts for us. It
also defaults to the "a.out" build artifact path that nearly all tests are using
to avoid to hardcode "a.out" in every test.

I converted a bunch of tests to the new function but I'll do the rest of the
test suite as follow ups.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D102771
In D102771 wanted to make `test_var` global to demonstrate the a no-launch test,
but the old variable is still needed for another test. This just creates the
global var with a different name to demonstrate the no-launch functionality.
This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverse a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Continue with processing the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100258
This follows from the underlying logic for binops and min/max.
Although it does not appear that we handle this for min/max
intrinsics currently.
https://alive2.llvm.org/ce/z/Kq9Xnh
asudarsa and others added 30 commits May 25, 2021 13:26
  CONFLICT (content): Merge conflict in clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  CONFLICT (content): Merge conflict in clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Defined input/output procedures are specified in 12.6.4.8.  There are different
versions for read versus write and formatted versus unformatted, but they all
share the same basic set of dummy arguments.

I added several checking functions to check-declarations.cpp along with a test.

In the process of implementing this, I noticed and fixed a typo in
.../lib/Evaluate/characteristics.cpp.

Differential Revision: https://reviews.llvm.org/D103045
This provides a sizable compile time improvement by seeding
the worklist in an order that leads to less iterations of the
worklist.

This patch only changes the behavior of the Canonicalize pass
itself, it does not affect other passes that use the
GreedyPatternRewrite driver

Differential Revision: https://reviews.llvm.org/D103053
If the nested create_directory call fails, we'd still want to
re-report the errors with the create_directories function name,
which is what the caller called.

This fixes one aspect from MS STL's tests for std::filesystem.

Differential Revision: https://reviews.llvm.org/D102365
…D_NEGATORS.

This also provides some of the scaffolding needed by D102992 and D101729, and mops up after D101730 etc.

Differential Revision: https://reviews.llvm.org/D103055
…s exited on 1st iteration"

This reverts commit 2531fd7 due to
performance regression on the PPC buildbot.
…t has more than one member

Beside the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.

On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group.  A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.

The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.

I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).

On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).

GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103043
[libomptarget][nfc] Move hostcall required test to rtl

Remove a global, fix minor race. First of N patches to bring up hostcall.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103058
…ies. NFC.

MLIRContext holds a few special case values that occur frequently like empty
dictionary and NoneType, which allow us to avoid taking locks to get an instance
of them.  Give the empty StringAttr this treatment as well.  This cuts several
percent off compile time for CIRCT.

Differential Revision: https://reviews.llvm.org/D103117
… linalg.index

Indexed Generic should be going away in the future. Migrate to linalg.index.

Reviewed By: NatashaKnk, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D103110
When building with Clang 11 on Windows, silence the following:

[432/5643] Building C object projects\compiler-rt\lib\profile\CMakeFiles\clang_rt.profile-x86_64.dir\GCDAProfiling.c.obj
F:\aganea\llvm-project\compiler-rt\lib\profile\GCDAProfiling.c(464,13): warning: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
    if (val != (gcov_version >= 90 ? GCOV_TAG_OBJECT_SUMMARY
        ~~~ ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
When building with Clang 11 on Windows, silence the following:

F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(955,8): warning: 'Run' overrides a member function but is not marked 'override' [-Wsuggest-override]
  void Run(State& st);
       ^
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(895,16): note: overridden virtual function is here
  virtual void Run(State& state) = 0;
               ^
1 warning generated.
…tect-stack-use-after-return-mode.

Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always.

for issue: google/sanitizers#1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102462
The generic approach can still be used by musl and FreeBSD. Note: on glibc
2.31, TLS_PRE_TCB_SIZE is 0x700, larger than ThreadDescriptorSize() by 16, but
this is benign: as long as the range includes pthread::{specific_1stblock,specific}
pthread_setspecific will not cause false positives.

Note: the state before afec953 underestimated
the TLS size a lot (nearly ThreadDescriptorSize() = 1776).
That may explain why afec953 actually made some
tests pass.
Since the opaque pointer type won't contain the pointee type, we need to
separately encode the value type for an atomicrmw.

Emit this new code for atomicrmw.

Handle this new code and the old one in the bitcode reader.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103123
When the lower type test pass is invoked a second time with
DropTypeTests set to true, it expects that all remaining type tests feed
assume instructions, which are removed along with the type tests.

In some cases the llvm.assume might have been merged with another one,
i.e. from a builtin_assume instruction, in which case the type test
would actually feed a phi that in turn feeds the merged assume
instruction. In this case we can simply replace that operand of the phi
with "true" before removing the type test.

Differential Revision: https://reviews.llvm.org/D103073
Pass LLVM_PRETTY_FUNCTION directly for the no-argument macro.
Right after pushing, I remembered that this was added to silence a GCC
warning (https://reviews.llvm.org/D99120). This reverts my patch and
adds a comment.
FullTy is only necessary when we need to figure out what type an
instruction works with given a pointer's pointee type. However, we just
end up using the value operand's type, so FullTy isn't necessary.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102788
Implements part of P0896 'The One Ranges Proposal'.
Implements [range.iter.op.advance].

Differential Revision: https://reviews.llvm.org/D101922
…st (intel#3801)

When compiling with the integration footer enabled, we want to be sure
that the integration header is only pulled in once (during the preprocessing
step) and not during the full compilation of the source w/ appended
footer.
Compiler generated ITT annotations may interfere with debugging
the instrumented user code even in non-ITT mode (when ITT is disabled
via the specialization constant). This change marks the wrappers
used by the compiler generated code with always_inline so that
device compiler are able to get rid of the extra code in non-ITT mode.

Signed-off-by: Vyacheslav Zakharin <vyacheslav.p.zakharin@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.