Thanks for your great work. In multiple-choice evaluations, the model's answers may be biased by the order in which the options are presented. Have you considered using a likelihood-based evaluation method, similar to the approach used in V* Bench?
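To illustrate the idea, here is a minimal sketch of a likelihood-based multiple-choice scorer. It assumes a hypothetical `log_prob(question, option)` callable that returns the model's summed log-probability of the option text given the question (and image). Each option is scored independently and the highest-scoring one is picked, so no option letters or list positions are shown to the model. The toy scorer and the word-count length normalization are illustrative assumptions, not taken from vstar bench or any specific codebase.

```python
import itertools
import math
from typing import Callable, Sequence

# Hypothetical stand-in for a real model: per-word probabilities for a toy question.
TOY_WORD_PROBS = {"red": 0.6, "blue": 0.3, "green": 0.1}


def toy_log_prob(question: str, option: str) -> float:
    """Toy scorer: sum of per-word log-probs; unknown words get a small floor probability."""
    return sum(math.log(TOY_WORD_PROBS.get(w, 1e-3)) for w in option.lower().split())


def pick_by_likelihood(question: str,
                       options: Sequence[str],
                       log_prob: Callable[[str, str], float],
                       length_normalize: bool = True) -> int:
    """Return the index of the option with the highest (optionally length-normalized) log-likelihood."""
    scores = []
    for opt in options:
        lp = log_prob(question, opt)
        if length_normalize:
            # Assumption: normalize by word count so longer options are not penalized.
            lp /= max(len(opt.split()), 1)
        scores.append(lp)
    return max(range(len(options)), key=lambda i: scores[i])


def is_order_invariant(question: str,
                       options: Sequence[str],
                       log_prob: Callable[[str, str], float]) -> bool:
    """Check that the chosen option text is the same under every ordering of the options."""
    baseline = options[pick_by_likelihood(question, options, log_prob)]
    for perm in itertools.permutations(options):
        if perm[pick_by_likelihood(question, list(perm), log_prob)] != baseline:
            return False
    return True


if __name__ == "__main__":
    opts = ["blue", "red", "green"]
    idx = pick_by_likelihood("What color is the car?", opts, toy_log_prob)
    print(opts[idx], is_order_invariant("What color is the car?", opts, toy_log_prob))
```

Because each option is scored on its own, the prediction does not depend on presentation order, except for exact ties, where `max` falls back to the first index.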