I have a few questions. I previously quantized some models with torch.fx, and both the size and the speed of the models changed. However, I noticed that these tools are not used in academia, so I used your code to learn how to quantize a model in an academic context. With your code, there seems to be no change in the model's size or speed, and the stored data remains unchanged as well. Is this expected? And is it unnecessary to fuse operators or perform similar steps?