
JetBrains-Research/agents-eval


Agents and Planning Models Evaluation 🤖⛓

A toolkit for collecting datasets for agent and planning models and for running evaluation pipelines.

Setup

```shell
pip install -r requirements.txt
```

Evaluation Pipeline Configuration

We use the Hydra library to configure the evaluation pipeline. Each configuration is specified in a config.yaml file:

```yaml
# @package _global_
hydra:
  job:
    name: planning_${agent.model_name}
  run:
    # replace <YOUR_PATH_TO_OUTPUT_DIR> with your output directory
    dir: <YOUR_PATH_TO_OUTPUT_DIR>/${hydra:job.name}
  job_logging:
    root:
      handlers: [ console, file ]
defaults:
  - _self_
  - data_source: hf
  - env: http
  - agent: planning
```

Here you define the data source, environment, and agent you want to evaluate. Several implementations of each are provided as sub-configs (separate YAML files):

| Field | Options |
| --- | --- |
| data_source | hf.yaml |
| env | http.yaml |
| agent | vanilla.yaml, planning.yaml, reflexion.yaml, tree_of_thoughts.yaml, adapt.yaml |
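Hydra treats each config group as a directory, so selecting `agent: planning` in the defaults list loads `agent/planning.yaml`, and any group can be swapped from the command line with an override such as `agent=reflexion`. The minimal sketch below builds those paths and override strings in plain Python, assuming the option names from the table above. The `CONFIG_OPTIONS` mapping and the helpers `config_path`, `build_overrides`, `build_sweep`, and `all_runs` are hypothetical and not part of this repository.

```python
from itertools import product

# Config groups and their options, copied from the table above (file names minus ".yaml").
# This mapping is a hypothetical helper, not something the repository defines.
CONFIG_OPTIONS: dict[str, list[str]] = {
    "data_source": ["hf"],
    "env": ["http"],
    "agent": ["vanilla", "planning", "reflexion", "tree_of_thoughts", "adapt"],
}


def config_path(group: str, option: str) -> str:
    """Path, relative to the config directory, of the file Hydra loads for `group: option`."""
    if option not in CONFIG_OPTIONS.get(group, []):
        raise ValueError(f"unknown option {option!r} for config group {group!r}")
    return f"{group}/{option}.yaml"


def build_overrides(**selection: str) -> list[str]:
    """Command-line overrides for a single run, e.g. ['agent=reflexion']."""
    for group, option in selection.items():
        config_path(group, option)  # raises early on a typo in the group or option name
    return [f"{group}={option}" for group, option in selection.items()]


def build_sweep(group: str) -> str:
    """One override that lists every option of a group; Hydra expands it under --multirun."""
    return f"{group}={','.join(CONFIG_OPTIONS[group])}"


def all_runs() -> list[list[str]]:
    """Override lists for every combination of data source, environment and agent."""
    groups = list(CONFIG_OPTIONS)
    return [
        [f"{g}={o}" for g, o in zip(groups, combo)]
        for combo in product(*(CONFIG_OPTIONS[g] for g in groups))
    ]


if __name__ == "__main__":
    print(config_path("agent", "planning"))    # agent/planning.yaml
    print(build_overrides(agent="reflexion"))  # ['agent=reflexion']
    print(build_sweep("agent"))                # agent=vanilla,planning,reflexion,tree_of_thoughts,adapt
```

The sweep string is intended for Hydra's multirun mode (`-m` / `--multirun`), which evaluates every listed agent in one invocation. This README does not name the evaluation entry point, so the command itself is left out.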

Project Template Generation Evaluation

The task is to generate a project template: a small, compilable project, described in 1-5 sentences, that contains small examples of all the mentioned libraries, technologies, and functionality.
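To make the task definition concrete, here is a minimal sketch of how a single task instance might be represented. It assumes the instance consists of a description plus a list of technologies the template must demonstrate. The `TemplateTask` class, its fields, the naive sentence counter, and the substring-based coverage check are all hypothetical illustrations. They are not the dataset's actual schema or the project's evaluation metric.

```python
import re
from dataclasses import dataclass, field

# Naive sentence boundary: '.', '!' or '?' followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?](?:\s+|$)")


def count_sentences(text: str) -> int:
    """Roughly count sentences; abbreviations such as 'e.g.' will be over-counted."""
    return len([s for s in _SENTENCE_END.split(text.strip()) if s.strip()])


@dataclass
class TemplateTask:
    """Hypothetical record for one template-generation task."""

    description: str  # natural-language description, expected to be 1-5 sentences
    technologies: list[str] = field(default_factory=list)  # libraries/technologies to showcase

    def __post_init__(self) -> None:
        n = count_sentences(self.description)
        if not 1 <= n <= 5:
            raise ValueError(f"description must have 1-5 sentences, got {n}")

    def missing_technologies(self, project_files: dict[str, str]) -> list[str]:
        """Crude check: technologies never mentioned (case-insensitively) in any project file."""
        corpus = "\n".join(project_files.values()).lower()
        return [t for t in self.technologies if t.lower() not in corpus]


if __name__ == "__main__":
    task = TemplateTask(
        "A Kotlin web service. It stores data with Exposed.",
        ["Ktor", "Exposed"],
    )
    print(task.missing_technologies({"Main.kt": "import io.ktor.server.engine.*"}))  # ['Exposed']
```

A substring match only shows that a name appears somewhere in the project. It does not show that the project compiles or that the library is actually exercised, both of which the task requires.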

Dataset

A dataset of template-related repositories collected from GitHub is published on HuggingFace 🤗. Details about the dataset collection, along with its source code, are in the template_generation directory.

Agent Models

| Model | Metrics |
| --- | --- |
| ⚠️ Coming soon ⚠️ | ⚠️ Coming soon ⚠️ |
