Experiments on LLaVA-NeXT

Hi, when I evaluate `open-llava-next-vicuna-7b` on MME-Realworld with `model_vqa_mme_realworld.py`, the model outputs a full explanatory sentence rather than a single option letter. For example: "The correct answer to the question is (A) There are three cars and two pedestrians. This is evident from the image, which shows three cars parked on the side of the road and two pedestrians walking on the sidewalk. The other options provided do not accurately reflect the contents of the image. There are no traffic cones, barriers, trucks, trailers, or construction vehicles visible in the image. Therefore, the best answer based on the image is (A)."

Any idea why this happens?
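In the meantime, a post-processing step along these lines might recover the option letter from the verbose output. This is only a rough sketch, not the repository's evaluation logic: `extract_choice` is a hypothetical helper that is not part of `model_vqa_mme_realworld.py`, and the option set A to E is an assumption.

```python
import re
from typing import Optional


# Hypothetical helper; NOT part of model_vqa_mme_realworld.py.
def extract_choice(response: str, options: str = "ABCDE") -> Optional[str]:
    """Return the option letter found in a model response, or None.

    Assumes options are labelled with single letters (A to E by default).
    """
    # Prefer the parenthesized form, e.g. "(A)". Take the last occurrence,
    # since verbose answers tend to restate the final choice at the end.
    matches = re.findall(rf"\(([{options}])\)", response)
    if matches:
        return matches[-1]
    # Fall back to a bare letter at the very start, e.g. "B" or "B. two cars".
    # A letter followed by a space is rejected so "A car ..." is not a match.
    m = re.match(rf"([{options}])(?:[.:)]|$)", response.strip())
    return m.group(1) if m else None


verbose = (
    "The correct answer to the question is (A) There are three cars and two "
    "pedestrians. Therefore, the best answer based on the image is (A)."
)
print(extract_choice(verbose))
```

This only papers over the symptom; it does not explain why the model ignores the single-choice instruction in the first place.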

Experiments on LLaVA-NeXT #13
