### System Info

I am following the instructions here (https://huggingface.co/blog/llama3#inference-integrations) to deploy Llama 3 on an Inference Endpoint. I created my endpoint and, once it was set up, tried to reproduce the basic example. However, I get the following error:

Since I am using an Inference Endpoint, I'm not sure how or where I can modify the template. I noticed another issue describing a similar problem, but it was closed by a commit that was supposed to fix it; I still get the same error when using the Endpoint.
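For context on what "the template" refers to: a chat template is the recipe that flattens the `messages` list into the single prompt string the model actually sees. A minimal pure-Python sketch of the shape a Llama-3-style template produces (illustrative only; the real template is a Jinja template shipped in the model's `tokenizer_config.json`):

```python
def apply_llama3_style_template(messages):
    """Illustrative flattening of chat messages into a Llama-3-style prompt.

    This only mimics the shape of the official template; the real one is a
    Jinja template in the model's tokenizer_config.json.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # leave the prompt open for the assistant's reply
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is open-source software important?"},
]
print(apply_llama3_style_template(messages))
```

When the server-side template is missing or malformed, this flattening step is where the request fails, which is why the error points at the template rather than the messages themselves.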
### Information

- Docker
- The CLI directly

### Tasks

- An officially supported command
- My own modifications

### Reproduction
```python
from openai import OpenAI

T = "<<TOKEN>>"

# initialize the client but point it to TGI
client = OpenAI(
    base_url="https://URL/v1/",
    api_key=T,  # replace with your token
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500,
)

# iterate and print the stream
for message in chat_completion:
    print(message.choices[0].delta.content, end="")
```
### Expected behavior
Unsure, but definitely not an error: the basic example from the blog post should stream back a completion.