Hello,
I implemented the algorithm in the vision transformer architecture as follows:
# inside __init__()
self.spe = SineSPE(num_heads=head_cnt, in_features=in_dim, num_sines=5, num_realizations=64)
self.filter = SPEFilter(gated=False, code_shape=self.spe.code_shape)

# inside forward()
q, k = self.filter(q, k, self.spe(q.shape[:2]))
qp, kp = performer(...)
out = lin_attention(...)
The model I am using has 4 layers, 6 heads, embedding dimension 384, and patch_size=4.
Training for 100 epochs on CIFAR-100 converges to 42.3% with SPE and to 45.3% without it. While that gap may be expected, with SPE the training time is around 6x longer. Is that normal?
Performers + ViT takes 39 minutes
Performers + ViT + SPE takes around 4 hours
For both I am using 2 Titan XP GPUs.
This is a serious problem for me, because I was planning to scale these experiments up to ImageNet.
I would also like to know how to implement the indexing T = N^2 for images, as described in Section 2 of the paper (where did you do this in the LRA benchmark?).
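In case it clarifies my question, here is my current understanding of that indexing (this is just my guess, not something I found in your code): I assume the T = N^2 positions come from flattening the N x N patch grid in row-major order, like so:

```python
# My guess at the T = N^2 indexing: map a patch at grid position (i, j)
# on an N x N grid to a single sequence index t in [0, N^2), row-major.
def patch_index(i, j, n):
    """Row-major sequence index of patch (i, j) on an n x n grid."""
    return i * n + j

n = 8  # e.g. 32x32 CIFAR image with patch_size=4 -> 8x8 patch grid
indices = [patch_index(i, j, n) for i in range(n) for j in range(n)]
assert indices == list(range(n * n))  # covers all T = N^2 positions
```

Is this the intended mapping, or does Section 2 imply something else (e.g. treating the two axes separately)?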
Many thanks!