Amazing Work!

I recommend adding a line on vision foundation models—from MaskGIT[1], to MUSE[2], to Meissonic[3]. Incorporating both vision and language perspectives will make the overall survey more comprehensive, especially in the context of Figure 1.


[1] [MaskGIT: Masked Generative Image Transformer](https://arxiv.org/abs/2202.04200)

[2] [Muse: Text-To-Image Generation via Masked Generative Transformers](https://arxiv.org/abs/2202.04200)

[3] [Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis](https://arxiv.org/abs/2410.08261)



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Amazing Work! #1

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Amazing Work! #1

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions