Hi authors, thank you for sharing this valuable work.
I have a question regarding the evaluation code.
I'm currently trying to evaluate my own model using the Python files on Hugging Face.
However, the current version of the code does not seem to be compatible with the current data annotations on Hugging Face.
Could you please update the code so that the evaluation is accurate?
Also, do you have any plans to release the ground-truth answers for the test split?
Thank you in advance.