Describe the bug
bug 1
When using buddy-mlir to compile the f16-precision Qwen3-32B model, the vector.transfer_read ops in the generated subgraph0_prefill_32b.mlir file return vector<16xf32> when reading data, causing the vector.fma op to receive operands of mismatched vector types.
The error is as follows:
buddy-mlir/build/examples/BuddyQwen32/subgraph0_decode_32b.mlir:306:24: error: 'vector.fma' op failed to verify that all of {lhs, rhs, acc, result} have same type
%18686 = "vector.fma"(%18684, %18685, %arg4722) : (vector<16xf32>, vector<16xf32>, vector<16xf16>) -> vector<16xf16>
The cause is in the frontend/Python/ops/tosa.py file: when the vector.transfer_read operation is emitted, its result element type is hard-coded to f32, so the read vector becomes f32 regardless of the model's precision. I am not sure why f32 is fixed here. After replacing the fixed f32, the model compiles normally. The modification is as follows:
diff --git a/frontend/Python/ops/tosa.py b/frontend/Python/ops/tosa.py
index b11d57d..cdaba5f 100644
--- a/frontend/Python/ops/tosa.py
+++ b/frontend/Python/ops/tosa.py
@@ -3844,7 +3844,10 @@ def flash_attention_for_cpu_prefill_op(
f32 = F32Type.get()
dtype_qkv = node.tensor_meta["dtype"][0]
dtype_qkv = mlir_element_type_get(dtype_qkv)
- dtype = f32
+ # Use the same dtype as Q/K/V to keep IR type-consistent across precisions
+ # (e.g. f16 models).
+
+ dtype = dtype_qkv
vector_width = 16
v16 = ir.VectorType.get([vector_width], dtype)
v16_qkv = ir.VectorType.get([vector_width], dtype_qkv)
@@ -11313,13 +11316,13 @@ def gqa_attention_fused_op(node: GQAAttentionFusedOp, symbol_table):
with ir.InsertionPoint(vec_loop.body):
d = vec_loop.induction_variable
va = vec_loop.inner_iter_args[0]
- vec_ty = ir.VectorType.get([16], f32)
+ vec_ty = ir.VectorType.get([16], mlir_dtype)
perm_map = ir.AffineMap.get(
4, 0, [ir.AffineDimExpr.get(3)]
)
qv = vector.TransferReadOp(
- vec_ty, # vector<16xf32>
- query, # tensor<1x12x1x128xf32>
+ vec_ty,
+ query,
[b, h, q, d], # indices
perm_map, # (d0,d1,d2,d3)->(d3)
zero, # padding
@@ -11327,8 +11330,8 @@ def gqa_attention_fused_op(node: GQAAttentionFusedOp, symbol_table):
loc=loc,
).result
kv = vector.TransferReadOp(
- vec_ty, # vector<16xf32>
- k_cache, # tensor<1x2x1024x128xf32>
+ vec_ty,
+ k_cache,
[b, h_kv, k, d], # indices
perm_map, # (d0,d1,d2,d3)->(d3)
zero, # padding
@@ -11340,7 +11343,7 @@ def gqa_attention_fused_op(node: GQAAttentionFusedOp, symbol_table):
scf.YieldOp([va1.result])
red = vector.ReductionOp(
- f32, "add", vec_loop.result, loc=loc
+ mlir_dtype, "add", vec_loop.result, loc=loc
).result
acc = arith.AddFOp(prev.result, red)
@@ -11434,9 +11437,9 @@ def gqa_attention_fused_op(node: GQAAttentionFusedOp, symbol_table):
).result
pv = vector.SplatOp(
- ir.VectorType.get([16], f32), p, loc=loc
+ ir.VectorType.get([16], mlir_dtype), p, loc=loc
).result
- vec_ty = ir.VectorType.get([16], f32)
+ vec_ty = ir.VectorType.get([16], mlir_dtype)
perm_map = ir.AffineMap.get(
4, 0, [ir.AffineDimExpr.get(3)]
)
bug 2
After a successful compilation, running the program may still crash with a segmentation fault during prefill.
To Reproduce
Create a BuddyQwen32 folder under the buddy-mlir/examples path, place the attached files in it, and add the following to the CMakeLists.txt file in the same path:
if(BUDDY_QWEN32_EXAMPLES)
add_subdirectory(BuddyQwen32)
endif()
Then follow the steps in the BuddyQwen32/README.md file to compile.
buddy-qwen3-32b-main.cpp
CMakeLists.txt
import-qwen3.py
README.md
vocab.txt
Expected behavior
The f16-precision model is expected to compile and run normally.
Screenshots
Desktop (please complete the following information):
- OS: Ubuntu
- Version: 24.04.3 LTS (Noble Numbat)
Additional context