There are a million ways to do position encoding. It's also different for local vs. global attention. I just think this warrants more thought. There seem to be a few main approaches.
Notes:
- Something like ViT, where you just use an `nn.Embedding`. This won't really work for different domain sizes, since the indexing is obviously finite, but it is possible to interpolate for different resolutions (first sketch below).
- Rotary position encoding (second sketch below).
- Relative position bias. This is a parameterized matrix that's used as a bias inside softmax attention: `Softmax(QK^T + B)V` (third sketch below).
  - SwinV2 uses a log-spaced continuous position bias, which is supposed to help when fine-tuning on larger images (last sketch below).
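
A minimal sketch of the first option, assuming a PyTorch model on a square patch grid: a learned absolute position embedding backed by `nn.Embedding`, bicubically interpolated when the grid size changes. The `LearnedPosEmbed` name and the `grid_size`/`embed_dim` defaults are placeholders, not from any existing codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedPosEmbed(nn.Module):
    """ViT-style learned absolute position embedding with resolution interpolation."""

    def __init__(self, grid_size: int = 14, embed_dim: int = 256):
        super().__init__()
        self.grid_size = grid_size
        # One embedding per grid position; the index range is fixed at init.
        self.embed = nn.Embedding(grid_size * grid_size, embed_dim)

    def forward(self, new_grid_size: int) -> torch.Tensor:
        # (grid*grid, dim) -> (1, dim, grid, grid) so we can interpolate spatially.
        pos = self.embed.weight.reshape(self.grid_size, self.grid_size, -1)
        pos = pos.permute(2, 0, 1).unsqueeze(0)
        if new_grid_size != self.grid_size:
            pos = F.interpolate(pos, size=(new_grid_size, new_grid_size),
                                mode="bicubic", align_corners=False)
        # Back to (new_grid*new_grid, dim), ready to add to the patch tokens.
        return pos.squeeze(0).permute(1, 2, 0).reshape(new_grid_size ** 2, -1)

# Usage: tokens = tokens + LearnedPosEmbed()(new_grid_size=16)
```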
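A minimal sketch of rotary position encoding under the same assumptions (the `rope` helper is hypothetical): each channel pair of q/k is rotated by a position-dependent angle before the attention matmul, so the dot product depends only on relative offsets.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, heads, seq, dim) with dim even.
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, device=x.device) / half)          # (half,)
    angles = torch.arange(seq, device=x.device)[:, None] * freqs[None, :]     # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Applied to queries and keys before computing attention:
# q, k = rope(q), rope(k)
```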
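A minimal sketch of the relative-position-bias option, i.e. `Softmax(QK^T / sqrt(d) + B)V`, for a 1D window; the window size, head count, and class name are placeholders.

```python
import torch
import torch.nn as nn

class RelPosBiasAttention(nn.Module):
    """Windowed attention with a learned relative position bias added before softmax."""

    def __init__(self, window_size: int = 16, num_heads: int = 4):
        super().__init__()
        # One bias per head per relative offset in [-(W-1), W-1].
        self.bias_table = nn.Parameter(torch.zeros(num_heads, 2 * window_size - 1))
        coords = torch.arange(window_size)
        # rel[i, j] = i - j, shifted so it is a valid index into the table.
        rel = coords[:, None] - coords[None, :] + window_size - 1
        self.register_buffer("rel_index", rel)  # (W, W)

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, W, head_dim)
        scale = q.shape[-1] ** -0.5
        attn = (q @ k.transpose(-2, -1)) * scale      # (batch, heads, W, W)
        bias = self.bias_table[:, self.rel_index]     # (heads, W, W)
        attn = (attn + bias).softmax(dim=-1)
        return attn @ v
```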
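And a rough sketch of the SwinV2-style log-spaced continuous position bias: a small MLP maps log-scaled relative coordinates to per-head biases, so the bias can be re-evaluated at a different window size when fine-tuning on larger images. The hidden width and `log_cpb` name are placeholders, and the actual SwinV2 implementation normalizes the coordinates a bit differently.

```python
import torch
import torch.nn as nn

def log_cpb(window_size: int, num_heads: int) -> torch.Tensor:
    # In a real model this MLP would be a trained submodule of the attention
    # layer; it is freshly initialized here just to show the shapes.
    mlp = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, num_heads))
    coords = torch.arange(window_size, dtype=torch.float32)
    grid = torch.stack(torch.meshgrid(coords, coords, indexing="ij"), dim=-1)  # (W, W, 2)
    grid = grid.reshape(-1, 2)                                                 # (W*W, 2)
    rel = grid[:, None, :] - grid[None, :, :]                                  # (W*W, W*W, 2)
    # Log-spaced coordinates: sign(d) * log(1 + |d|) compresses large offsets.
    rel = torch.sign(rel) * torch.log1p(rel.abs())
    bias = mlp(rel)                                                            # (W*W, W*W, heads)
    return bias.permute(2, 0, 1)                                               # (heads, W*W, W*W)
```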