Agents and Planning Models Evaluation 🤖⛓

Toolkit for collecting datasets for Agents and Planning models and running evaluation pipelines.

Setup

pip install -r requirements.txt

Evaluation Pipeline Configuration

We use the Hydra library to configure the evaluation pipeline. Each configuration is specified in YAML format (config.yaml):

# @package _global_
hydra:
  job:
    name: planning_${agent.model_name}
  run:
    dir: [YOUR_PATH_TO_OUTPUT_DIR]/${hydra:job.name}
  job_logging:
    root:
      handlers: [ console, file ]
defaults:
  - _self_
  - data_source: hf
  - env: http
  - agent: planning

Here you can define the data source, environment, and agent you want to evaluate. Several implementations of each are provided, defined in sub-configuration YAML files:

Field	Options
`data_source`	hf.yaml
`env`	http.yaml
`agent`	vanilla.yaml, planning.yaml, reflexion.yaml, tree_of_thoughts.yaml, adapt.yaml
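
To make the link between the `defaults` list and the sub-YAML files concrete, here is a minimal sketch of how each `group: option` entry selects a file named `<group>/<option>.yaml`. It does not call Hydra itself; it only mimics the lookup convention in plain Python. The `config_root` value and the `resolve_defaults` and `job_name` helpers are hypothetical names introduced for illustration, not part of this repository.

```python
from pathlib import PurePosixPath

# Defaults list copied from the configuration above. `_self_` refers to the
# primary config file itself rather than to a config group.
DEFAULTS = ["_self_", {"data_source": "hf"}, {"env": "http"}, {"agent": "planning"}]


def resolve_defaults(defaults, config_root="configs"):
    """Map each `group: option` entry to the sub-YAML file it selects.

    `config_root` is an assumed location of the config directory.
    """
    selected = {}
    for entry in defaults:
        if isinstance(entry, str):
            # Plain strings such as `_self_` do not select a group option.
            continue
        for group, option in entry.items():
            path = PurePosixPath(config_root) / group / f"{option}.yaml"
            selected[group] = str(path)
    return selected


def job_name(model_name):
    """Mimic the `planning_${agent.model_name}` interpolation of the job name."""
    return f"planning_{model_name}"


if __name__ == "__main__":
    for group, path in resolve_defaults(DEFAULTS).items():
        print(f"{group}: {path}")
    # "my-model" is a placeholder model name, not one used by the project.
    print(job_name("my-model"))
```

Switching to another agent amounts to changing one entry, for example `agent: reflexion` in the `defaults` list. Hydra also accepts the same choice as a command-line override of the form `agent=reflexion` appended to the launch command.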

Project Template Generation Evaluation

The challenge is to generate a project template: a small compilable project that can be described in 1-5 sentences and contains small examples of all the mentioned libraries, technologies, and functionality.

Dataset

A dataset of template-related repositories collected from GitHub is published on HuggingFace 🤗. Details about the dataset collection and its source code are located in the template_generation directory.

Agent Models

Model	Metrics
⚠️ Coming soon ⚠️	⚠️ Coming soon ⚠️

JetBrains-Research/agents-eval
