- Introduces Ragas as a supported evaluation framework. The integration currently supports only the `RubricsScore` metric and OpenAI models. Users can pass in either a dataset with pre-computed `user_input`, `reference`, and `response` fields, or a dataset containing only `user_input` and `reference` along with information about a model endpoint that will be used to compute the `response` field.
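The two dataset shapes described above can be sketched as plain records. The field names come from this release note; how the dataset object is actually constructed and consumed is not specified here, so treat this as an illustration only.

```python
# Fully pre-computed dataset: responses already exist and are scored directly.
precomputed = [
    {
        "user_input": "What is the capital of France?",
        "reference": "The capital of France is Paris.",
        "response": "Paris is the capital of France.",
    },
]

# Dataset without responses: a model endpoint (configured separately) would be
# queried to fill in the `response` field before scoring.
needs_generation = [
    {
        "user_input": "What is the capital of France?",
        "reference": "The capital of France is Paris.",
    },
]

# The first shape requires all three fields; the second requires only two.
assert all({"user_input", "reference", "response"} <= set(row) for row in precomputed)
assert all({"user_input", "reference"} <= set(row) for row in needs_generation)
```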
- Adds the ability to provide a custom system prompt to the MMLU-based evaluators. When a system prompt is provided, LM-eval applies the chat template under the hood; otherwise it passes the model a bare prompt.
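A minimal sketch of how a system prompt might be supplied. The evaluator class and argument names below are illustrative assumptions, not the exact API; only the system-prompt behavior itself is described by this note.

```python
# Providing a system prompt causes LM-eval to apply the model's chat template;
# without one, the model receives a bare prompt.
system_prompt = (
    "You are a careful assistant. Answer each multiple-choice question "
    "with the letter of the correct option only."
)

# Hypothetical usage (class and parameter names assumed):
# evaluator = MMLUEvaluator(model_path="models/my-model")
# results = evaluator.run(system_prompt=system_prompt)
```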
- Adds an `extra_args` parameter to the `.run` method of all MMLU-based evaluators. This way, consumers can pass any additional arguments they want directly through to the `lm_eval.evaluators.simple_evaluate` function.
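The pass-through can be sketched as a plain dictionary of keyword arguments. The keys shown are common lm-eval options, but which arguments `simple_evaluate` accepts depends on your lm-eval version, so verify them against its documentation; the evaluator call is an assumed shape.

```python
# Extra keyword arguments forwarded verbatim to lm-eval's simple_evaluate.
extra_args = {
    "limit": 100,         # evaluate on at most 100 examples per task
    "log_samples": True,  # keep per-sample outputs in the results
}

# Hypothetical usage (evaluator construction assumed):
# results = evaluator.run(extra_args=extra_args)
```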
- Adds the ability to specify a custom HTTP client for MT-Bench.
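A custom HTTP client is typically needed to route requests through a proxy or trust an internal CA certificate. The note does not specify which client type MT-Bench expects, so the sketch below builds only a TLS context with Python's standard library and shows, in comments, where an assumed client object would plug in.

```python
import ssl

# A default TLS context; an internal CA could be added before building a client.
ctx = ssl.create_default_context()
# ctx.load_verify_locations("/path/to/internal-ca.pem")  # custom CA (assumed path)

# Hypothetical usage (client library, parameter names, and evaluator assumed):
# client = httpx.Client(verify=ctx, proxy="http://proxy.internal:3128")
# evaluator = MTBenchEvaluator(http_client=client)
```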