I think the researchers should run an evaluation on some established benchmarks, such as HumanEval, LoCoMo, and SWE-bench, just as proof that it works 😭🙋. Thank you for your work.