
Position Encoding #3

@arthurfeeney

Description


There are many ways to do position encoding, and it also differs between local and global attention. I just think this warrants more thought. There seem to be three main approaches.

Notes:

  1. Something like ViT, where you just use an nn.Embedding of learned position vectors. This doesn't directly work for different domain sizes, since the index range is fixed, but the learned table can be interpolated to a new resolution (see the first sketch after this list).
  2. Rotary position embedding (RoPE), which rotates query/key channel pairs by an angle proportional to position, so the attention scores depend on relative offsets (second sketch below).
  3. Relative position bias. This is a parameterized matrix B added to the attention logits: Softmax(QK^T + B)V (third sketch below).
    • SwinV2 uses a log-spaced continuous position bias, which is supposed to help when fine-tuning on larger images.
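A minimal sketch of option 1 in PyTorch, assuming a square token grid; `LearnedPosEmbed2d`, `grid_size`, and `embed_dim` are illustrative names, not anything from this repo. The learned table is bilinearly interpolated when queried at a different resolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedPosEmbed2d(nn.Module):
    """ViT-style learned absolute position embeddings for a square grid."""

    def __init__(self, grid_size: int, embed_dim: int):
        super().__init__()
        self.grid_size = grid_size
        # One learned vector per grid location, as in ViT.
        self.embed = nn.Embedding(grid_size * grid_size, embed_dim)

    def forward(self, grid_size: int) -> torch.Tensor:
        weight = self.embed.weight  # (self.grid_size**2, embed_dim)
        if grid_size == self.grid_size:
            return weight
        # Reshape the table to a 2D grid and bilinearly interpolate to the new size.
        d = weight.shape[-1]
        w = weight.reshape(1, self.grid_size, self.grid_size, d).permute(0, 3, 1, 2)
        w = F.interpolate(w, size=(grid_size, grid_size), mode="bilinear", align_corners=False)
        return w.permute(0, 2, 3, 1).reshape(grid_size * grid_size, d)


# Usage: train at 16x16 tokens, evaluate at 32x32 tokens.
pos = LearnedPosEmbed2d(grid_size=16, embed_dim=64)
print(pos(16).shape, pos(32).shape)  # (256, 64), (1024, 64)
```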
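A minimal 1D RoPE sketch, again with made-up names (`rotary_embed`, `base`). Queries and keys are rotated before the dot product, so the scores depend only on relative positions:

```python
import torch


def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, dim) with dim even; returns the same shape.
    b, n, d = x.shape
    half = d // 2
    # Per-pair frequencies and per-token rotation angles.
    freqs = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(n, dtype=x.dtype, device=x.device)[:, None] * freqs[None, :]  # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Standard 2D rotation applied channel-pair-wise.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Usage: rotate queries and keys before computing attention scores.
q = torch.randn(2, 10, 64)
k = torch.randn(2, 10, 64)
scores = rotary_embed(q) @ rotary_embed(k).transpose(-2, -1)  # (2, 10, 10)
```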
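And a sketch of option 3 for a single 1D window, with illustrative names (`RelPosBiasAttention`, `window`). The bias table is indexed by relative offset and added to the logits before the softmax; SwinV2's log-CPB would replace the table with a small MLP over log-spaced relative coordinates:

```python
import torch
import torch.nn as nn


class RelPosBiasAttention(nn.Module):
    """Softmax(Q K^T / sqrt(d) + B) V with a learned relative position bias B."""

    def __init__(self, dim: int, window: int):
        super().__init__()
        self.scale = dim ** -0.5
        # One learned bias per relative offset in [-(window-1), window-1].
        self.bias_table = nn.Parameter(torch.zeros(2 * window - 1))
        idx = torch.arange(window)
        # Relative offsets shifted to be valid indices into the bias table.
        self.register_buffer("rel_idx", idx[:, None] - idx[None, :] + window - 1)

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # q, k, v: (batch, window, dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (batch, window, window)
        attn = attn + self.bias_table[self.rel_idx]     # add B
        return attn.softmax(dim=-1) @ v


# Usage on a window of 8 tokens.
attn = RelPosBiasAttention(dim=32, window=8)
out = attn(torch.randn(2, 8, 32), torch.randn(2, 8, 32), torch.randn(2, 8, 32))
print(out.shape)  # (2, 8, 32)
```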
