
Inquiry about MoE (Mixture of Experts) Training Support #272

@dyyoungg

Description


Hello VILA team!

First, thank you for open-sourcing this incredible family of Vision Language Models! The work on VILA and NVILA is truly impressive, and the focus on efficiency and deployment is particularly valuable for the community.

I have been exploring the codebase and documentation with great interest. My question is about the future development roadmap: are there any plans to support training VILA models with a Mixture of Experts (MoE) architecture (such as Qwen3-MoE or DeepSeek-MoE)?

Integrating MoE could be a powerful way to further scale the model's capacity while maintaining inference efficiency, which aligns well with the project's goals. It would be especially exciting for more complex multi-image and long-video understanding tasks. A rough sketch of the kind of layer I have in mind is below.
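For concreteness, here is a minimal sketch of a top-k routed MoE feed-forward block in the style used by models like Qwen3-MoE and DeepSeek-MoE. This is not VILA code; the class name, expert structure, and hyperparameters are all illustrative assumptions.

```python
# Minimal top-k MoE feed-forward sketch (illustrative only, not VILA's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(dim, num_experts, bias=False)
        # Each expert is an ordinary feed-forward block; only top_k run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> flatten to per-token routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # select top_k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over the selected experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Find which tokens routed to expert e and in which top-k slot.
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)
```

The appeal is that total parameters grow with the number of experts while per-token compute stays roughly at `top_k` expert forwards, which is why this seems like a natural fit for the project's efficiency focus.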

I would be very interested to know if this is a direction you are considering.
