
[quantization] Quantization-Aware Shape Operation Folding #499

@dayo09

Description


From #491

Constant Folding Logic

For reference, below is the ConstProp pass's core logic.

        # propagate constant because all of its args are constant tensors.
        with torch.no_grad():
            prop_constant_tensor = node.target(*args_data, **kwargs_data)
        const_node_to_tensor[node] = prop_constant_tensor 

Quantized Operation's Constant Folding

  • reshape.default(quantized_a, shape) : Folds without issue.
  • add.Tensor(quantized_a, quantized_b) # ERR! : Any operation that performs actual computation raises an error, as shown below.
import torch

input_a = torch.quantize_per_tensor(torch.tensor([1.0, 2.0], dtype=torch.float32), 0.1, 10, torch.qint8)
input_b = torch.quantize_per_tensor(torch.tensor([3.0, 4.0], dtype=torch.float32), 0.1, 10, torch.qint8)

target_op_add = torch.ops.aten.add.Tensor
target_op_reshape = torch.ops.aten.reshape.default

try:
    with torch.no_grad():
        result = target_op_add(input_a, input_b) # ERROR!
        # Error: Could not run 'aten::add.out' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions.
        result = target_op_reshape(input_a, torch.tensor([1, 1, 2], dtype=torch.int32)) # PASSES
        print(f"Result Scale: {result.q_scale()}, ZP: {result.q_zero_point()}") 
except Exception as e:
    print(f"Error: {e}")

Therefore, the problematic cases will be caught by this exception.
I believe we could implement constant propagation with an allow-list of shape-relevant operators only.
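A minimal sketch of such an allow-list guard (the operator set and the `can_fold` helper are assumptions for illustration, not TICO's actual implementation):

```python
import torch

# Assumed allow-list of shape-only operators that are safe to fold
# even when their inputs are quantized tensors.
SHAPE_ONLY_OPS = {
    torch.ops.aten.reshape.default,
    torch.ops.aten.view.default,
    torch.ops.aten.permute.default,
    torch.ops.aten.transpose.int,
}

def can_fold(target, args_data) -> bool:
    """Fold unconditionally for non-quantized constants; for quantized
    constants, fold only the shape-only operators."""
    if any(isinstance(a, torch.Tensor) and a.is_quantized for a in args_data):
        return target in SHAPE_ONLY_OPS
    return True

q = torch.quantize_per_tensor(torch.tensor([1.0, 2.0]), 0.1, 10, torch.qint8)
print(can_fold(torch.ops.aten.reshape.default, (q, [2])))  # True
print(can_fold(torch.ops.aten.add.Tensor, (q, q)))         # False
```

This avoids relying on the exception path above: the pass decides up front whether a node is foldable instead of catching the `QuantizedCPU` backend error.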

Originally posted by @dayo09 in #491 (comment)

What?

Let's fold shape operators (reshape, view, permute, transpose, ...) with full consideration of channel-wise quantization!

Why is it needed?

When a "Quantization Boundary Break" occurs because shape-related operators are generated by a composite operation's decomposition, Circle is compiled with a weird pattern: (int, int) => Conv2d => (float). (Why was it not validated in TICO, though? 🤔) Link: #491 (comment)

Since these are "shape-only" operations, they can be pre-calculated at compile-time to simplify the graph and improve inference performance.

How?

  • precondition: Circle IR does not restrict the channel-wise quantization axis to a fixed dimension; it provides a field for it.
    • scale/zp can be aligned with the axis. We must track that the axis is preserved, i.e. not merged and not split.
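For reference, this is the per-channel metadata in question, shown with a minimal PyTorch sketch (the shapes and scales here are arbitrary illustration values):

```python
import torch

# A per-channel-quantized weight: one scale/zero-point per channel,
# with `axis=0` as the channel dimension that must be tracked when
# a shape operator is folded.
w = torch.quantize_per_channel(
    torch.randn(4, 3),
    scales=torch.tensor([0.1, 0.2, 0.3, 0.4]),
    zero_points=torch.zeros(4, dtype=torch.long),
    axis=0,
    dtype=torch.qint8,
)
print(w.q_per_channel_axis())    # 0
print(w.q_per_channel_scales())  # one scale per channel along axis 0
```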

We need to extend ConstPropPass to handle aten.reshape, aten.transpose, and aten.view for quantized tensors by synchronizing the quantization metadata.

  1. Transpose / Permute Logic
  • Metadata Update: Update the quantized_dimension (axis) to reflect the new position of the channel dimension.
  2. Reshape / View / StridedSlice / Concat Logic
  • Validation: If the channel dimension remains "independent" (not merged with other dimensions), simply update the axis index.
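The two metadata rules above could be sketched as follows (helper names and the reshape "independence" criterion are assumptions for illustration):

```python
import math

def permuted_axis(axis, dims):
    # After aten.permute(dims), output dim j reads input dim dims[j],
    # so the channel axis moves to the output position holding it.
    return list(dims).index(axis)

def reshaped_axis(axis, old_shape, new_shape):
    # The channel dim stays independent iff some output dim has the
    # same size AND the same number of elements before it; otherwise
    # the per-channel metadata would become invalid.
    before = math.prod(old_shape[:axis])
    for j, d in enumerate(new_shape):
        if d == old_shape[axis] and math.prod(new_shape[:j]) == before:
            return j
    raise ValueError("channel axis merged or split; cannot fold")

print(permuted_axis(0, [2, 0, 1]))                 # 1
print(reshaped_axis(1, (2, 4, 3), (2, 4, 3, 1)))   # 1
```

For example, reshaping (2, 4, 3) with channel axis 1 into (8, 3) raises, because the channel dimension is merged and the per-channel scales can no longer be aligned with any output axis.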
