
Experiments on LLaVA-NeXT #13

Description

@wangzishuo029

Hi, when I evaluate open-llava-next-vicuna-7b on MME-RealWorld with model_vqa_mme_realworld.py, the model outputs a full sentence such as "The correct answer to the question is (A) There are three cars and two pedestrians. This is evident from the image, which shows three cars parked on the side of the road and two pedestrians walking on the sidewalk. The other options provided do not accurately reflect the contents of the image. There are no traffic cones, barriers, trucks, trailers, or construction vehicles visible in the image. Therefore, the best answer based on the image is (A).", rather than a single option letter.

Does anyone have an idea why this happens?
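For context, the verbose response above still contains the chosen option in parentheses. The following is a minimal post-processing sketch, assuming the options are labeled (A) through (E). `extract_choice` is a hypothetical helper for illustration only; it is not part of model_vqa_mme_realworld.py or the official MME-RealWorld scoring code, and it works around the symptom rather than explaining why the model ignores the single-letter format.

```python
import re
from typing import Optional

# Hypothetical helper (not from model_vqa_mme_realworld.py).
# Assumes answer options are labeled (A) through (E).
_PAREN_OPTION = re.compile(r"\(([A-E])\)")
_BARE_OPTION = re.compile(r"[A-E]\.?")


def extract_choice(response: str) -> Optional[str]:
    """Return a single option letter from a model response, or None if none is found."""
    text = response.strip()
    # Case 1: the model answered as intended, e.g. "A" or "A."
    if _BARE_OPTION.fullmatch(text):
        return text[0]
    # Case 2: a verbose answer such as "The correct answer ... is (A) ...";
    # take the first parenthesized option letter.
    match = _PAREN_OPTION.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    verbose = "The correct answer to the question is (A) There are three cars and two pedestrians."
    print(extract_choice(verbose))
```

Taking the first parenthesized letter is a heuristic: it would pick the wrong option if the model quoted a different option before stating its answer.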
