🦀 Crab: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents

Overview

Crab is a framework for building LLM agent benchmark environments in a Python-centric way.

🌐 Cross-platform

Build agent environments that support various deployment options, including in-memory, Docker-hosted, virtual machines, or distributed physical machines, provided they are accessible via Python functions.
Let the agent access all the environments at the same time through a unified interface.
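The idea of exposing several environments through one interface can be sketched in plain Python. This is only an illustration under stated assumptions: `Environment`, `UnifiedInterface`, and the action names are hypothetical and are not Crab's actual API. The only requirement it mirrors from the description above is that each environment is reachable through Python functions.

```python
from typing import Callable, Dict


class Environment:
    """Hypothetical: a named collection of actions, each a plain Python callable."""

    def __init__(self, name: str, actions: Dict[str, Callable[..., object]]):
        self.name = name
        self.actions = actions

    def step(self, action: str, **kwargs) -> object:
        # Look up the action by name and call it with the given arguments.
        return self.actions[action](**kwargs)


class UnifiedInterface:
    """Hypothetical: routes an (environment, action) request to the right backend."""

    def __init__(self, *envs: Environment):
        self.envs = {env.name: env for env in envs}

    def step(self, env_name: str, action: str, **kwargs) -> object:
        return self.envs[env_name].step(action, **kwargs)


# An in-memory environment backed by a dictionary.
_notes: Dict[str, str] = {}


def write_note(key: str, text: str) -> str:
    _notes[key] = text
    return "ok"


def read_note(key: str) -> str:
    return _notes.get(key, "")


# Stand-in for a remote machine; a real backend would wrap a network call here.
def remote_echo(message: str) -> str:
    return f"remote:{message}"


interface = UnifiedInterface(
    Environment("memory", {"write": write_note, "read": read_note}),
    Environment("remote", {"echo": remote_echo}),
)

interface.step("memory", "write", key="todo", text="open browser")
note = interface.step("memory", "read", key="todo")
reply = interface.step("remote", "echo", message="ping")
print(note, reply)
```

Because every backend is hidden behind a callable, the agent-facing code does not change when an environment moves from in-memory to a container or a physical machine.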

⚙️ Easy-to-use Configuration

📐 Novel Benchmarking Suite

Define tasks and the corresponding evaluators in an intuitive Python-native way.
Introduce a novel graph evaluator method that provides fine-grained metrics.
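One way to picture a graph evaluator is as a directed acyclic graph of checkpoints, where a checkpoint counts only after its prerequisites are satisfied and the score is the fraction of checkpoints reached. The sketch below is an assumption-laden illustration of that general idea, not Crab's implementation: the class `GraphEvaluator`, its methods, the state keys, and the completion-ratio metric are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set

State = Dict[str, object]


class GraphEvaluator:
    """Hypothetical checkpoint graph: each node is a predicate over the state."""

    def __init__(self) -> None:
        self.checks: Dict[str, Callable[[State], bool]] = {}
        self.parents: Dict[str, List[str]] = {}
        self.done: Set[str] = set()

    def add_node(
        self,
        name: str,
        check: Callable[[State], bool],
        parents: Sequence[str] = (),
    ) -> None:
        self.checks[name] = check
        self.parents[name] = list(parents)

    def update(self, state: State) -> None:
        # Keep sweeping until no new checkpoint can be marked as completed.
        progressed = True
        while progressed:
            progressed = False
            for name, check in self.checks.items():
                if name in self.done:
                    continue
                ready = all(p in self.done for p in self.parents[name])
                if ready and check(state):
                    self.done.add(name)
                    progressed = True

    def completion_ratio(self) -> float:
        # Fraction of checkpoints reached; 0.0 for an empty graph.
        if not self.checks:
            return 0.0
        return len(self.done) / len(self.checks)


# Hypothetical task: open a browser, load a page, then save a file.
evaluator = GraphEvaluator()
evaluator.add_node("browser_open", lambda s: s.get("browser_open") is True)
evaluator.add_node(
    "page_loaded", lambda s: s.get("url") == "example.com", parents=["browser_open"]
)
evaluator.add_node(
    "file_saved", lambda s: s.get("saved") is True, parents=["page_loaded"]
)

evaluator.update({"browser_open": True})
ratio_after_first_step = evaluator.completion_ratio()

evaluator.update({"browser_open": True, "url": "example.com"})
ratio_after_second_step = evaluator.completion_ratio()
print(ratio_after_first_step, ratio_after_second_step)
```

Compared with a single pass/fail check at the end of a task, a score like this gives partial credit for intermediate progress, which is the kind of fine-grained signal the graph evaluator is meant to provide.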

pip install "crab-framework[visual-prompt]"

You can run the examples using the following commands.

export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py

You can run the desktop example using the following commands.

export OPENAI_API_KEY=<your api key>
python examples/desktop_env.py "Open Firefox"
