
Cannot use Inference Endpoint: UnprocessableEntityError: Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'} #1870

Open
1 of 4 tasks
rvoak opened this issue May 8, 2024 · 1 comment


rvoak commented May 8, 2024

System Info

I am following the instructions here (https://huggingface.co/blog/llama3#inference-integrations) to deploy Llama-3 on an Inference Endpoint. I created my endpoint and, once it was set up, tried to reproduce the basic example.

However, I get the following error:

```
---------------------------------------------------------------------------
UnprocessableEntityError                  Traceback (most recent call last)
<ipython-input-14-fbe6f32d4e45> in <cell line: 11>()
      9 )
     10 
---> 11 chat_completion = client.chat.completions.create(
     12     model="tgi",
     13     messages=[

4 frames
/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in _request(self, cast_to, options, remaining_retries, stream, stream_cls)
   1018 
   1019             log.debug("Re-raising status error")
-> 1020             raise self._make_status_error_from_response(err.response) from None
   1021 
   1022         return self._process_response(

UnprocessableEntityError: Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'}
```

As I am using an Inference Endpoint, I'm not sure how or where I can modify the template. I noticed another issue with a similar problem, but it was closed by a commit that supposedly fixed it; I still get the same error when using the Endpoint. A client-side workaround sketch is below.
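
For reference, my understanding is that TGI returns this 422 when the deployed model's tokenizer config has no chat_template, which the /v1/chat/completions (Messages API) route needs in order to render the messages into a prompt. A minimal, untested workaround sketch that applies the chat template locally and calls the plain generation route instead; it assumes the endpoint exposes TGI's standard text-generation route and that my token can load the meta-llama/Meta-Llama-3-8B-Instruct tokenizer (the URL and token below are placeholders):

```
from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

ENDPOINT_URL = "https://URL"  # placeholder: your Inference Endpoint URL
TOKEN = "<<TOKEN>>"           # placeholder: your HF token

# Render the messages into a Llama-3 prompt locally, instead of relying
# on the server-side chat template that the endpoint cannot find.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", token=TOKEN
)
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

# Send the pre-templated prompt to the endpoint's raw generation route
# and stream the generated text back.
client = InferenceClient(model=ENDPOINT_URL, token=TOKEN)
for chunk in client.text_generation(prompt, max_new_tokens=500, stream=True):
    print(chunk, end="")
```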

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction


T="<<TOKEN>>"

# initialize the client but point it to TGI
client = OpenAI(
    base_url="https://URL/v1/", 
    api_key=T,  # replace with your token
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500
)

# iterate and print stream
for message in chat_completion:
    print(message.choices[0].delta.content, end="")```

Expected behavior

Unsure. But definitely not an error.

rastna12 commented May 8, 2024

I'm seeing this same template error issue as well with llama3
