End-to-end inference is not accelerated. #39

Hi, thanks for your excellent work! 
I'm quite interested in your approach to speeding up ViT throughput. However, when I run ViT-B inference end to end (data input, preprocessing, and model inference), the processing time is the same with or without ToMe. I also tried different batch_size values to fill the GPU memory, but the results did not change.
Here's the result:
    - device: each row was measured on one RTX 3090 GPU
    - dataset: ImageNet-1k validation set
<img width="984" alt="end-to-end_result" src="https://github.com/facebookresearch/ToMe/assets/132449186/6e70806d-7453-48ab-8459-4bac095798e2">

For every test case, I changed only the model or batch_size. The other components (data input, preprocessing, and so on) are identical, as are the device and the code.
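To see which of these components dominates, the time spent waiting on data could be recorded separately from the time spent in the model. The following is only a minimal sketch, not my actual benchmark code: `profile_stages`, `slow_loader`, and the sleep-based stand-ins are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Iterable, Optional


def profile_stages(
    batches: Iterable,
    model_fn: Callable,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Split wall-clock time into time waiting on data and time in the model."""
    data_s = 0.0
    model_s = 0.0
    n = 0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # data input and preprocessing happen here
        except StopIteration:
            break
        t1 = time.perf_counter()
        model_fn(batch)
        if sync is not None:
            sync()  # wait for queued GPU work so it is counted as model time
        t2 = time.perf_counter()
        data_s += t1 - t0
        model_s += t2 - t1
        n += 1
    return {"batches": n, "data_s": data_s, "model_s": model_s}


def slow_loader(num_batches: int, delay: float):
    """Hypothetical stand-in for a data pipeline that is slower than the model."""
    for i in range(num_batches):
        time.sleep(delay)
        yield i


# Stand-in workload: 20 ms of "data" and 5 ms of "model" per batch (made-up numbers).
stats = profile_stages(slow_loader(5, 0.02), lambda batch: time.sleep(0.005))
print(stats)
```

With a real PyTorch model on a GPU, `sync` could be `torch.cuda.synchronize`, since CUDA kernels are launched asynchronously and would otherwise be attributed to a later step. If the loader prefetches in background workers, `data_s` only measures how long the main loop waited for a batch, not the total preprocessing work.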

My question is: why is the "Total Inference Time" of the models with ToMe similar to the baseline (no ToMe)? Doesn't throughput measure the efficiency of model inference? Even though I didn't optimize the code for data input and preprocessing, the "Total Inference Time" should still be smaller than the baseline, because ToMe reduces the time spent in model inference.
Did I misunderstand something?
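
For reference, the expectation above can be put into numbers with Amdahl's law. This is a rough sketch that assumes the stages run strictly one after another; `end_to_end_speedup` is a hypothetical helper, and the fractions below are made up for illustration, not taken from the screenshot.

```python
def end_to_end_speedup(model_fraction: float, model_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the model stage gets faster.

    model_fraction: share of the baseline total time spent in model inference (0..1).
    model_speedup:  factor by which the model stage alone is accelerated (> 0).
    """
    if not 0.0 <= model_fraction <= 1.0:
        raise ValueError("model_fraction must be between 0 and 1")
    if model_speedup <= 0.0:
        raise ValueError("model_speedup must be positive")
    return 1.0 / ((1.0 - model_fraction) + model_fraction / model_speedup)


# Hypothetical case: the model is 10% of the total time and becomes 2x faster.
print(round(end_to_end_speedup(0.1, 2.0), 3))
# Hypothetical case: the model is 100% of the total time and becomes 2x faster.
print(end_to_end_speedup(1.0, 2.0))
```

Under this assumption, a 2x faster model that accounts for only a small share of the total time changes the end-to-end number very little. If data loading overlaps with model inference (for example through prefetching), the total time is closer to the slower of the two stages, so a pipeline limited by data loading would show no change at all.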

