[Llama2 Inferentia]: runtime error when invoking endpoint through boto3 #4549

Open
krokoko opened this issue Jan 31, 2024 · 0 comments
krokoko commented Jan 31, 2024

Link to the notebook
https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/aws-trainium-inferentia-finetuning-deployment/llama-2-trainium-inferentia-finetuning-deployment.ipynb

Describe the bug
When a Lambda function uses boto3 to query the Neuron Llama 2 7B f model deployed on an ml.inf2.xlarge instance, the InvokeEndpoint operation fails with the following error:

{
  "errorMessage": "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message \"{\n  \"code\": 400,\n  \"type\": \"BadRequestException\",\n  \"message\": \"Parameter model_name is required.\"\n}\n\". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/testllamaneuron in account XXXXXXX for more information.",
  "errorType": "ModelError",
  "requestId": "2f2a7aa4-9eeb-42f5-9a14-6285894581bb",
  "stackTrace": [
    "  File \"/var/task/lambda.py\", line 19, in handler\n    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,\n",
    "  File \"/var/runtime/botocore/client.py\", line 530, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n",
    "  File \"/var/runtime/botocore/client.py\", line 960, in _make_api_call\n    raise error_class(parsed_response, operation_name)\n"
  ]
}

The model configuration is as follows:

  • image: 763104351884.dkr.ecr.us-east-2.amazonaws.com/djl-inference:0.24.0-neuronx-sdk2.14.1
  • env variables: (screenshot attached to the original issue, not reproduced here)
  • modelId: meta-textgenerationneuron-llama-2-7b-f
  • modelVersion: 1.0.0
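
For context, a minimal sketch of how an endpoint with this configuration can be deployed through the SageMaker JumpStart SDK (this mirrors the approach in the referenced notebook; the instance type and endpoint name are taken from this issue, and the notebook's exact code may differ):

from sagemaker.jumpstart.model import JumpStartModel

# Model id and version from the configuration above
model = JumpStartModel(
    model_id="meta-textgenerationneuron-llama-2-7b-f",
    model_version="1.0.0",
)

# Deploy to an Inferentia2 instance; the endpoint name is the one used in this issue
predictor = model.deploy(
    instance_type="ml.inf2.xlarge",
    endpoint_name="testllamaneuron",
    accept_eula=True,  # Llama 2 requires accepting the EULA
)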

To reproduce

  • Deploy the model to an endpoint
  • Create a Lambda function to query the endpoint with the following code:
import boto3
import json

def handler(event, context):
    runtime = boto3.client('runtime.sagemaker')

    ENDPOINT_NAME = 'testllamaneuron'

    dic = {
        "inputs": [
            [
                {"role": "system", "content": "You are chat bot who writes songs"},
                {"role": "user", "content": "Write a rap song about Amazon Web Services"}
            ]
        ],
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
    }

    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=json.dumps(dic),
                                       CustomAttributes="accept_eula=true")

    result = json.loads(response['Body'].read().decode())
    print(result)

    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }

Logs

Lambda Function logs:

[ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "BadRequestException",
  "message": "Parameter model_name is required."
}
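
For comparison, the referenced notebook invokes the endpoint through the SageMaker Python SDK rather than boto3. A minimal sketch of that path, assuming the endpoint name from this issue and a recent sagemaker SDK (the notebook's exact invocation code may differ):

import sagemaker
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Attach to the already-deployed endpoint (name assumed from this issue)
predictor = Predictor(
    endpoint_name="testllamaneuron",
    sagemaker_session=sagemaker.Session(),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

payload = {
    "inputs": [[
        {"role": "system", "content": "You are chat bot who writes songs"},
        {"role": "user", "content": "Write a rap song about Amazon Web Services"},
    ]],
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6},
}

# CustomAttributes carries the EULA acceptance, as in the boto3 call above
result = predictor.predict(
    payload,
    initial_args={"CustomAttributes": "accept_eula=true"},
)
print(result)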