Toolkit for collecting datasets for Agents and Planning models and running evaluation pipelines.
pip install -r requirements.txt
We use the Hydra library to configure the evaluation pipeline.
Each configuration is specified in a config.yaml file in the following format:

    # @package _global_
    hydra:
      job:
        name: planning_${agent.model_name}
      run:
        dir: [YOUR_PATH_TO_OUTPUT_DIR]/${hydra:job.name}
      job_logging:
        root:
          handlers: [ console, file ]
    defaults:
      - _self_
      - data_source: hf
      - env: http
      - agent: planning

Here you can define the data source, environment, and agent you want to evaluate. Several implementations of each are provided in sub-YAML files:
field | options |
---|---|
data_source | hf.yaml |
env | http.yaml |
agent | vanilla.yaml, planning.yaml, reflexion.yaml, tree_of_thoughts.yaml, adapt.yaml |
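
To illustrate how the `defaults` list relates to the sub-YAML files listed above, here is a minimal standard-library sketch. It does not use Hydra itself; it only mimics the convention that each `group: option` entry selects a file named `<group>/<option>.yaml`. The function name `resolve_defaults` and the `configs` root directory are hypothetical and are not taken from this repository.

```python
from pathlib import PurePosixPath

# Options listed in the table above, keyed by config group.
AVAILABLE = {
    "data_source": ["hf"],
    "env": ["http"],
    "agent": ["vanilla", "planning", "reflexion", "tree_of_thoughts", "adapt"],
}


def resolve_defaults(defaults, config_root="configs"):
    """Map each `group: option` entry to the sub-YAML path it selects.

    `config_root` is a hypothetical directory name. The `_self_` entry
    refers to the primary config itself and selects no extra file.
    """
    selected = {}
    for entry in defaults:
        if entry == "_self_":
            continue
        # Each remaining entry is a single-item mapping: {group: option}.
        (group, option), = entry.items()
        if option not in AVAILABLE.get(group, []):
            raise ValueError(f"unknown option {option!r} for group {group!r}")
        selected[group] = str(PurePosixPath(config_root) / group / f"{option}.yaml")
    return selected


# The defaults list from the config.yaml example above.
defaults = ["_self_", {"data_source": "hf"}, {"env": "http"}, {"agent": "planning"}]
paths = resolve_defaults(defaults)
print(paths["agent"])
```

With Hydra, a different implementation is usually selected either by editing the `defaults` list or by appending an override such as `agent=reflexion` to the launch command.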
The challenge is to generate a project template: a small compilable project that can be described in 1-5 sentences and contains small examples of all the mentioned libraries, technologies, and functionality.
A dataset of template-related repositories collected from GitHub is published on HuggingFace 🤗. Details about the dataset collection and its source code are in the template_generation directory.
Model | Metrics |
---|---|