Today `EvalAlgorithmInterface.evaluate` is typed to return `List[EvalOutput]` ("for dataset(s)", per the docstring), but its `dataset_config` argument only accepts `Optional[DataConfig]`.

It looks like most concrete eval algorithms (like `QAAccuracy` here) either take the user's `data_config` for a single dataset, or fall back to all the pre-defined `DATASET_CONFIGS` relevant to the evaluator's problem type.

So the internal logic of the evaluators already supports iterating over multiple datasets and returning multiple results, but we seem to prevent users from calling `evaluate()` with multiple datasets of their own for no particular reason?
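For illustration, here is a minimal sketch of how the annotation could be widened to accept either a single config or a sequence of them, while keeping today's single-config call pattern working. The class names mirror fmeval's `DataConfig`/`EvalOutput`/`EvalAlgorithmInterface`, but the stub definitions, the `_built_in_configs` and `_evaluate_dataset` helpers, and the trimmed-down parameter list are placeholders for this sketch, not the library's actual implementation:

```python
from typing import List, Optional, Sequence, Union


# Stubs standing in for the real fmeval classes; only the typing pattern matters here.
class DataConfig: ...
class EvalOutput: ...


class EvalAlgorithmInterface:
    def evaluate(
        self,
        dataset_config: Optional[Union[DataConfig, Sequence[DataConfig]]] = None,
    ) -> List[EvalOutput]:
        """Evaluate one or more datasets; None falls back to the built-in configs."""
        if dataset_config is None:
            configs: Sequence[DataConfig] = self._built_in_configs()
        elif isinstance(dataset_config, DataConfig):
            configs = [dataset_config]  # preserves today's single-config behavior
        else:
            configs = list(dataset_config)
        # One EvalOutput per dataset, matching the existing List[EvalOutput] return type.
        return [self._evaluate_dataset(config) for config in configs]

    # Hypothetical helpers, not part of the real interface:
    def _built_in_configs(self) -> Sequence[DataConfig]:
        raise NotImplementedError

    def _evaluate_dataset(self, config: DataConfig) -> EvalOutput:
        raise NotImplementedError
```

A caller could then pass `evaluate(dataset_config=[config_a, config_b])` and get back one `EvalOutput` per dataset, without changing the existing single-`DataConfig` or built-in-dataset code paths.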