Releases: v7labs/benchllm

v0.3.0 - Function Mocks

20 Jul 12:55
0b9d133

Mock calls

You can now mock functions that your chain or agent might be calling:

input: I live in London, can I expect rain today?
expected: ["no"]
calls:
  - name: forecast.get_n_day_weather_forecast
    returns: It's sunny in London.
    arguments:
      location: London
      num_days: 1

This replaces get_n_day_weather_forecast in the forecast module with a mocked function that always returns "It's sunny in London."
See examples/weather_functions for complete examples.
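
The mocked call above corresponds to a Python function roughly like the following (a hypothetical sketch for illustration; the real function lives in examples/weather_functions):

def get_n_day_weather_forecast(location: str, num_days: int) -> str:
    # Hypothetical stand-in: normally this would call a real weather API.
    # During the test run, BenchLLM swaps it for a mock that returns
    # "It's sunny in London." when called with the arguments above.
    ...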

Embedding Distance

The new EmbeddingEvaluator embeds both the model output and the expected values, then compares their cosine distance.
The threshold is currently hardcoded to 0.9 but will become dynamic in a future release.

$ bench run . --evaluator embedding
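
Conceptually, the comparison step looks like the sketch below. This is not BenchLLM's actual implementation: the embedding calls are omitted, vectors are plain lists, and a similarity reading of the 0.9 threshold (higher means closer) is assumed.

import math

THRESHOLD = 0.9  # hardcoded in this release

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def passes(output_embedding: list[float], expected_embeddings: list[list[float]]) -> bool:
    # The test passes if any expected value is close enough to the output.
    return any(cosine_similarity(output_embedding, e) >= THRESHOLD for e in expected_embeddings)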

Scoring

Evaluators now return List[Evaluator.Candidate] instead of Optional[Evaluator.Match]; this makes it possible to inspect the score (for example, the cosine distance) of failed evaluations.

This is incompatible with the old caching format.
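
As an illustration, scores can now be read even for candidates that did not pass. This fragment is a sketch only; the attribute names (passed, score) are assumptions, not confirmed API.

# Hypothetical inspection of candidates returned by an evaluator.
for candidate in evaluator.evaluate_prediction(prediction):
    print(candidate.passed, candidate.score)  # e.g. cosine distance, even on failure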

Multiple test functions in the same file

You can now define multiple @benchllm.test functions in the same Python file; the function name is now also shown in the BenchLLM output.

import benchllm

def my_model(input, model=None):
    # Run the model under test; implementation omitted.
    ...

@benchllm.test(suite=".")
def gpt_3_5(input: ChatInput):
    return my_model(input)

@benchllm.test(suite=".")
def gpt_4(input: ChatInput):
    return my_model(input, model="gpt-4")

v0.2.0 - Cache

13 Jul 15:44
7a1c240

Caching

Added two new evaluators for caching: MemoryCache and FileCache.

Using the API:

evaluator = MemoryCache(SemanticEvaluator())

Using the CLI:

$ bench run --cache memory # or `file` or `none`

Caching is on by default when using the command line.
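
Conceptually, the cache evaluators wrap an inner evaluator and only delegate on a miss. Below is a minimal sketch of that pattern, not BenchLLM's actual code; the prediction fields used for the cache key are assumptions.

class MemoryCacheSketch:
    def __init__(self, inner):
        self.inner = inner
        self._cache = {}

    def evaluate_prediction(self, prediction):
        # Key on the model output and expected values; field names assumed.
        key = (prediction.output, tuple(prediction.expected))
        if key not in self._cache:
            self._cache[key] = self.inner.evaluate_prediction(prediction)
        return self._cache[key]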

Match object

Changed the return type of evaluate_prediction from bool to Optional[Evaluator.Match].
Match carries information about which of the expected values matched the output of the tested model.
This will likely be extended with even more information in the next release.
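
For illustration, a match might be inspected like this. This is a sketch only; the expected attribute name is an assumption based on the description above, not confirmed API.

match = evaluator.evaluate_prediction(prediction)
if match is not None:
    # Which expected value matched the model output; field name assumed.
    print(match.expected)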

v0.1.0

06 Jul 17:48

BenchLLM is now open source!