
Conversation

@Hardcode84 (Contributor) commented Dec 28, 2025

Decompose memrefs early, materializing index calculations, LLVM GEPs, and LLVM loads/stores explicitly. This makes index calculations more amenable to integer range and (future) uniformity analyses. Add an integer narrowing pass to the pipeline; it reduces the VGPR count on the mxfp gemm.

Some operations with non-trivial lowering (buffer casts and amdgpu.gather-to-lds) are kept in memref land but converted to 0-D memrefs, so their indexing is exposed as well.

This pass is essentially an alternative memref-to-LLVM lowering, and it is compatible with the existing upstream one: for operations it does not lower, it generates MemRefDescriptor shims.

  • What's wrong with the default upstream memref-to-llvm lowering? The default lowering uses MemRefDescriptors, which hide all index calculations.
  • Why not use 1D/0D memrefs? I want the `* sizeof(T)` part of the index calculation to be materialized explicitly. An early POC used memref<?xi8> and memref.view, but it required a lot of memref casts back and forth.
  • Why not the ptr dialect? I had a ptr-dialect POC as well, but the ptr dialect is currently incomplete, and at this point it is a carbon copy of the llvm dialect, providing no useful abstraction.
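
As a rough illustration of what "materializing index calculations" means, a 2-D load might decompose along these lines (a hand-written sketch, not actual pass output; the affine map, op spellings, and function names are illustrative):

```mlir
// Before: the index arithmetic is hidden inside memref.load.
func.func @load(%m: memref<16x32xf32>, %i: index, %j: index) -> f32 {
  %v = memref.load %m[%i, %j] : memref<16x32xf32>
  return %v : f32
}

// After (sketch): the linearized byte offset (%i * 32 + %j) * sizeof(f32)
// is computed explicitly, so range/uniformity analyses can see it, and the
// access becomes a plain GEP + load on a bare pointer.
func.func @load_decomposed(%base: !llvm.ptr, %i: index, %j: index) -> f32 {
  %byteOff = affine.apply affine_map<()[s0, s1] -> ((s0 * 32 + s1) * 4)>()[%i, %j]
  %off = arith.index_cast %byteOff : index to i64
  %ptr = llvm.getelementptr %base[%off] : (!llvm.ptr, i64) -> !llvm.ptr, i8
  %v = llvm.load %ptr : !llvm.ptr -> f32
  return %v : f32
}
```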

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
This reverts commit 81d9747.

This reverts commit 548b353.

@Hardcode84 changed the title from "Decompose memref early" to "Decompose memrefs early" on Dec 28, 2025
Copilot AI left a comment

Pull request overview

This PR introduces an early memref decomposition pass that explicitly materializes index calculations, LLVM GEPs, and loads/stores. This enables better integer range and uniformity analyses by exposing index arithmetic early in the compilation pipeline. The changes demonstrate a tangible benefit with VGPR count reduction from 160 to 140 in the mxfp gemm test case.

Key changes:

  • Adds water-memref-decomposition pass that converts multi-dimensional memref operations to explicit pointer arithmetic with affine maps
  • Integrates integer narrowing pass (arith-int-range-narrowing) into the compilation pipeline to leverage exposed index calculations
  • Reorders pipeline to run memref decomposition before affine lowering, enabling better optimization opportunities

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Summary per file:

  • wave_lang/kernel/wave/water.py — Updates the compilation pipeline to add memref decomposition before affine lowering and integrates integer range optimizations
  • water/tools/water-opt/water-opt.cpp — Registers the arithmetic integer range narrowing pass
  • water/test/Transforms/memref-decomposition.mlir — Comprehensive test suite covering load/store, vector operations, reinterpret_cast, and AMD GPU-specific operations
  • water/lib/Transforms/MemrefDecomposition.cpp — Core implementation of the memref decomposition pass with type converter and pattern rewriters
  • water/lib/Transforms/CMakeLists.txt — Adds the new source file and required dependencies (AMDGPU, SCFTransforms)
  • water/include/water/Transforms/Passes.td — Defines the new pass with documentation and dialect dependencies
  • tests/kernel/wave_gemm_mxfp_test.py — Updates the expected VGPR count from 160 to 140 and adjusts waitcount expectations to reflect the optimization impact



using namespace mlir;

namespace {
Nit: there is no point in having static functions inside an anonymous namespace. LLVM style says to prefer static functions and only use namespaces for classes.


namespace {

static Value getValue(OpBuilder &rewriter, Location loc, OpFoldResult in) {
Please add documentation to all top-level entities.


static SmallVector<Value> getValues(OpBuilder &rewriter, Location loc,
                                    ArrayRef<OpFoldResult> in) {
  SmallVector<Value> result;
Nit: reserve before pushing back in a loop.

Comment on lines +51 to +57
static SmallVector<Value> flatten(ArrayRef<ValueRange> values) {
SmallVector<Value> result;
for (ValueRange value : values)
llvm::append_range(result, value);

return result;
}
Would llvm::concat<Value> work instead? It avoids allocation/copy.

Comment on lines +60 to +61
static std::tuple<LogicalResult, Value, SmallVector<OpFoldResult>,
                  SmallVector<OpFoldResult>>
It's strange to have LogicalResult as part of the tuple instead of FailureOr. I'd consider having vectors as SmallVectorImpl & operands and using null Value as error marker. Multi-element tuples tend to be unreadable at callsites.

}

/// Generate a GEP op with the given buffer and byte offset.
static Value GEP(OpBuilder &builder, Location loc, Value buffer, Value offset) {
Nit: createGEP would not break the naming style guide, and it is generally nicer to indicate when some IR may be created.

/// adjusted pointer.
static Value getFlattenMemref(OpBuilder &rewriter, Location loc, Value source,
                              Type loadType, ArrayRef<OpFoldResult> sizes,
                              unsigned typeBit, ArrayRef<OpFoldResult> strides,
What is typeBit? Is it the bitwidth of the elemental type in bits (it should be named accordingly!). Why in bits? What happens if it is not divisible by 8?

zero, sizes, strides,
getAsOpFoldResult(indices));

AffineExpr mul = rewriter.getAffineSymbolExpr(0) * (typeBit / 8);
Nit: can this use some more descriptive name?


  unsigned alignment = loadOp.getAlignment().value_or(0);
  rewriter.replaceOpWithNewOp<LLVM::LoadOp>(loadOp, loadType, ptr, alignment,
                                            /*volatile_*/ false,
Nit: can't we just carry over the volatile flag?


  unsigned alignment = storeOp.getAlignment().value_or(0);
  rewriter.replaceOpWithNewOp<LLVM::StoreOp>(storeOp, valueToStore, ptr,
                                             alignment, /*volatile_*/ false,
Ditto
