FMBench currently runs evaluations with judge models hosted on Amazon Bedrock, invoked via litellm. To add bring-your-own-judge functionality, we will have to change this so that there is a base evaluatorClass that makes predictions and calculates their cost (similar to the FMBenchPredictor base class).
With this implementation, customers will be able to evaluate models using their own judge LLMs, configured to suit their own evaluation needs.
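
A minimal sketch of what such a base class could look like is shown here, to make the proposal concrete. Everything in it is an assumption for illustration: the names `FMBenchEvaluator`, `JudgeResponse`, `evaluate`, `calculate_cost`, and the per-1K-token pricing parameters are hypothetical and are not taken from the FMBench code base or from the FMBenchPredictor interface. The toy `KeywordJudge` subclass stands in for a customer-provided judge and makes no real LLM call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JudgeResponse:
    """Hypothetical container for one judge verdict and its token usage."""
    verdict: str
    prompt_tokens: int
    completion_tokens: int


class FMBenchEvaluator(ABC):
    """Hypothetical base class for judges (name and methods are assumptions)."""

    def __init__(self, judge_id: str,
                 input_price_per_1k: float,
                 output_price_per_1k: float) -> None:
        self.judge_id = judge_id
        self.input_price_per_1k = input_price_per_1k
        self.output_price_per_1k = output_price_per_1k

    @abstractmethod
    def evaluate(self, prompt: str) -> JudgeResponse:
        """Send the evaluation prompt to the judge and return its verdict."""

    def calculate_cost(self, response: JudgeResponse) -> float:
        """Cost of one judge call, priced per 1,000 input and output tokens."""
        input_cost = response.prompt_tokens / 1000 * self.input_price_per_1k
        output_cost = response.completion_tokens / 1000 * self.output_price_per_1k
        return input_cost + output_cost


class KeywordJudge(FMBenchEvaluator):
    """Toy stand-in for a customer judge; a real one would call an LLM here."""

    def evaluate(self, prompt: str) -> JudgeResponse:
        verdict = "correct" if "paris" in prompt.lower() else "incorrect"
        # Whitespace word count is a crude placeholder for real token counts.
        return JudgeResponse(verdict=verdict,
                             prompt_tokens=len(prompt.split()),
                             completion_tokens=1)


if __name__ == "__main__":
    judge = KeywordJudge("toy-judge", input_price_per_1k=1.0, output_price_per_1k=2.0)
    result = judge.evaluate("The capital is Paris")
    print(result.verdict, judge.calculate_cost(result))
```

Under this design, the existing Bedrock-via-litellm judge would become one subclass among several, and a customer judge would only need to implement `evaluate`; cost calculation stays in the base class unless a judge has a different pricing model and overrides it.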