LLaVA-based architecture

How does the Octopus dataset is organized and trained on LLaVA architecture? LLaVA doesn't support in-context learning, if we merge all subtasks into a multi-turn conversation, another problem raises: LLaVA will input all subtask's images embeddings at once, and this problem seems hard to solve.
So how do you deal with that, input no images and only use env information? could you provide a demo.json to show me how dataset is organized on LLaVA architecture? thanks a lot

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLaVA-based architecture #11

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

LLaVA-based architecture #11

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions