Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support tool use assertions for Claude #640

Open
CamdenClark opened this issue Apr 6, 2024 · 2 comments
Open

Support tool use assertions for Claude #640

CamdenClark opened this issue Apr 6, 2024 · 2 comments

Comments

@CamdenClark
Copy link
Contributor

CamdenClark commented Apr 6, 2024

Hey, I'd like to work on support for tool use in Claude, which was just launched https://docs.anthropic.com/claude/docs/tool-use

Unfortunately, Anthropic didn't use the OpenAI pattern for supporting this and changed a few things.

Tool definition

here's the tool definition in claude's API.

[
  {
    "name": "get_stock_price",
    "description": "Get the current stock price for a given ticker symbol.",
    "input_schema": {
      "type": "object",
      "properties": {
        "ticker": {
          "type": "string",
          "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
        }
      },
      "required": ["ticker"]
    }
  }
]

Basically all that's changed here is that there's no base type of function and input_schema is changed from parameters. I think we can one to one map between anthropic and openai tool definitions. So I'm thinking we can support both the openai and anthropic tool definitions in anthropic.

Forcing tool use

Anthropic doesn't support forcing tool use unless you specifically reference the tool in a user message. Kind of a strange choice but I don't think there's anything to worry about here.

Tool responses

So, putting in the tool responses is a bit annoyingly different from the behavior of OpenAI.

They have tool_use responses as part of the content block that's sent back from Anthropic.

{
  "id": "msg_01Aq9w938a90dw8q",
  "model": "claude-3-opus-20240229",
  "stop_reason": "tool_use",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "<thinking>I need to use the get_weather, and the user wants SF, which is likely San Francisco, CA.</thinking>"
    },
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": {"location": "San Francisco, CA", "unit": "celsius"}
    }
  ]
}

Then, you have to send these as part of a content block of a user message.

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "65 degrees"
    }
  ]
}

is_error

you can also support giving an explicit flag that the tool had an error.

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "ConnectionError: the weather service API is not available (HTTP 500)",
      "is_error": true
    }
  ]
}

There's no way to map this from openai right now so I think I'll just allow it as part of an anthropic response.

I'm going to give how to approach this from a high-level some more thought over the next day or two. I'm thinking directionally promptfoo should try to support the OpenAI prompt format as the lingua franca, and then maybe we can support anthropic specific prompts too?

@typpo
Copy link
Collaborator

typpo commented Apr 6, 2024

Really appreciate the rundown of Claude tools, which I haven't had the chance to look into yet, and the thought you've put into next steps.

I'm thinking directionally promptfoo should try to support the OpenAI prompt format as the lingua franca, and then maybe we can support anthropic specific prompts too?

I agree with this approach, and in some cases we're starting to move in that direction for regular prompting too. It's still good to give the user fine-grained control over provider-specific prompts if they desire.

@CamdenClark
Copy link
Contributor Author

Really appreciate the rundown of Claude tools, which I haven't had the chance to look into yet, and the thought you've put into next steps.

I'm thinking directionally promptfoo should try to support the OpenAI prompt format as the lingua franca, and then maybe we can support anthropic specific prompts too?

I agree with this approach, and in some cases we're starting to move in that direction for regular prompting too. It's still good to give the user fine-grained control over provider-specific prompts if they desire.

Awesome, will get to work on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants