There are a million ways to do position encoding. It's also different for local vs. global attention. I just think this warrants more thought. There seem to be a few main approaches.
Notes:
- Something like ViT, where you just use an `nn.Embedding`. This won't really work for different domain sizes, since the indexing is obviously finite, but it is possible to interpolate for different resolutions (first sketch below).
- Rotary position encoding (second sketch below).
- Relative position bias. This is a parameterized matrix that's used as a bias inside softmax attention: `Softmax(QK^T + B)V` (third sketch below).
  - SwinV2 uses a log-spaced continuous position bias, which is supposed to help when fine-tuning on larger images (last sketch below).
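
A minimal sketch of the first option, assuming a PyTorch model on a square patch grid: a learned absolute position embedding backed by `nn.Embedding`, bicubically interpolated when the grid size changes. The `LearnedPosEmbed` name and the `grid_size`/`embed_dim` defaults are placeholders, not from any existing codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedPosEmbed(nn.Module):
    """ViT-style learned absolute position embedding with resolution interpolation."""

    def __init__(self, grid_size: int = 14, embed_dim: int = 256):
        super().__init__()
        self.grid_size = grid_size
        # One embedding per grid position; the index range is fixed at init.
        self.embed = nn.Embedding(grid_size * grid_size, embed_dim)

    def forward(self, new_grid_size: int) -> torch.Tensor:
        # (grid*grid, dim) -> (1, dim, grid, grid) so we can interpolate spatially.
        pos = self.embed.weight.reshape(self.grid_size, self.grid_size, -1)
        pos = pos.permute(2, 0, 1).unsqueeze(0)
        if new_grid_size != self.grid_size:
            pos = F.interpolate(pos, size=(new_grid_size, new_grid_size),
                                mode="bicubic", align_corners=False)
        # Back to (new_grid*new_grid, dim), ready to add to the patch tokens.
        return pos.squeeze(0).permute(1, 2, 0).reshape(new_grid_size ** 2, -1)

# Usage: tokens = tokens + LearnedPosEmbed()(new_grid_size=16)
```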
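A minimal sketch of rotary position encoding under the same assumptions (the `rope` helper is hypothetical): each channel pair of q/k is rotated by a position-dependent angle before the attention matmul, so the dot product depends only on relative offsets.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, heads, seq, dim) with dim even.
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, device=x.device) / half)          # (half,)
    angles = torch.arange(seq, device=x.device)[:, None] * freqs[None, :]     # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Applied to queries and keys before computing attention:
# q, k = rope(q), rope(k)
```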
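A minimal sketch of the relative-position-bias option, i.e. `Softmax(QK^T / sqrt(d) + B)V`, for a 1D window; the window size, head count, and class name are placeholders.

```python
import torch
import torch.nn as nn

class RelPosBiasAttention(nn.Module):
    """Windowed attention with a learned relative position bias added before softmax."""

    def __init__(self, window_size: int = 16, num_heads: int = 4):
        super().__init__()
        # One bias per head per relative offset in [-(W-1), W-1].
        self.bias_table = nn.Parameter(torch.zeros(num_heads, 2 * window_size - 1))
        coords = torch.arange(window_size)
        # rel[i, j] = i - j, shifted so it is a valid index into the table.
        rel = coords[:, None] - coords[None, :] + window_size - 1
        self.register_buffer("rel_index", rel)  # (W, W)

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, W, head_dim)
        scale = q.shape[-1] ** -0.5
        attn = (q @ k.transpose(-2, -1)) * scale      # (batch, heads, W, W)
        bias = self.bias_table[:, self.rel_index]     # (heads, W, W)
        attn = (attn + bias).softmax(dim=-1)
        return attn @ v
```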
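And a rough sketch of the SwinV2-style log-spaced continuous position bias: a small MLP maps log-scaled relative coordinates to per-head biases, so the bias can be re-evaluated at a different window size when fine-tuning on larger images. The hidden width and `log_cpb` name are placeholders, and the actual SwinV2 implementation normalizes the coordinates a bit differently.

```python
import torch
import torch.nn as nn

def log_cpb(window_size: int, num_heads: int) -> torch.Tensor:
    # In a real model this MLP would be a trained submodule of the attention
    # layer; it is freshly initialized here just to show the shapes.
    mlp = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, num_heads))
    coords = torch.arange(window_size, dtype=torch.float32)
    grid = torch.stack(torch.meshgrid(coords, coords, indexing="ij"), dim=-1)  # (W, W, 2)
    grid = grid.reshape(-1, 2)                                                 # (W*W, 2)
    rel = grid[:, None, :] - grid[None, :, :]                                  # (W*W, W*W, 2)
    # Log-spaced coordinates: sign(d) * log(1 + |d|) compresses large offsets.
    rel = torch.sign(rel) * torch.log1p(rel.abs())
    bias = mlp(rel)                                                            # (W*W, W*W, heads)
    return bias.permute(2, 0, 1)                                               # (heads, W*W, W*W)
```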