
Add support for AzureOpenAI #197

Open
rajib76 opened this issue Nov 21, 2023 · 8 comments

@rajib76 commented Nov 21, 2023:

I see that LangKit does not support Azure OpenAI. When will this be supported?

@jamie256 (Collaborator) commented:

Hi @rajib76! Thanks for opening an issue.

Are you looking for support for an Azure-deployed OpenAI LLM in LangKit's OpenAIDefault style, like what we do in the Choosing an LLM example, or something else?

I have been wanting to get better support for Azure-hosted models into LangKit soon. We could probably focus on changes to support the Azure OpenAI models (e.g. gpt-35-turbo, gpt-4) as a first iteration, if that would be helpful.

@rajib76 (Author) commented Nov 21, 2023:

Yes, I am looking for support in LangKit's OpenAIDefault. Currently, if I need to do hallucination checks, it looks like I cannot do it using Azure OpenAI; it is only supported for OpenAI. I am looking to use LangKit to evaluate responses from Azure OpenAI for hallucination, prompt injection, contextual relevance, and so on.

@jamie256 (Collaborator) commented:

Ok, working on it. If you want to try an initial dev build:
pip install langkit==0.0.26.dev0

The new class and usage look like this:

from langkit import response_hallucination
from langkit.openai import OpenAIAzure

response_hallucination.init(llm=OpenAIAzure(engine="LangKit-test-01"), num_samples=1)

You also need to set these new env vars (placeholder values shown):

import os

os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "<your-api-key>"

As referenced in this example: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python&pivots=programming-language-chat-completions#working-with-the-gpt-35-turbo-and-gpt-4-models
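
For anyone trying the dev build end to end, here is a minimal sketch of how the score might be read out. It assumes the env vars above are already set, and that the dev build exposes the same langkit.extract helper and "response.hallucination" metric name used in LangKit's hallucination example notebook; exact names may differ in 0.0.26.dev0.

from langkit import extract, response_hallucination
from langkit.openai import OpenAIAzure

# "engine" is the name of your Azure OpenAI deployment, not the base model name
response_hallucination.init(llm=OpenAIAzure(engine="LangKit-test-01"), num_samples=1)

# Score a single prompt/response pair
result = extract({
    "prompt": "Who won the 2018 FIFA World Cup?",
    "response": "France won the 2018 FIFA World Cup.",
})
print(result["response.hallucination"])  # assumed metric key; check the dev build's docs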

@rajib76 (Author) commented Nov 22, 2023:

I was able to run it, but saw one thing. I tried the example code for the hallucination score. I did not see a way to add a context and then tell the LLM to grade the response based on the question and context.

@rajib76 (Author) commented Nov 22, 2023:

I think I understand how it works. It is self-check validation: the prompt is sent to the same model to get an answer again, and that answer is then checked against the original response. Is it possible to do the following?

  1. Send a ground truth instead of having the LLM create a sample.
  2. If I need to create a sample with an LLM, can I use a different LLM than the one that actually does the hallucination check (or is the hallucination check done by the LLM or by an ML model)?

@FelipeAdachi (Contributor) commented:

Hi, @rajib76

Yes, this is exactly how response_hallucination works. To try to answer your questions:

  1. This module was designed with the zero-resource scenario in mind (no ground truth or context available), so sending a ground truth is not currently supported. This is something we could look into, though. Would it work for your use case if we had a variant where you pass the response and ground truth/context columns, thus removing the need to generate additional samples? (One LLM call would still be required for the consistency check.)

  2. Currently, both processes (creating the samples and the consistency check) use the same LLM, so it is not possible to use one LLM to create the samples and another to do the hallucination check. Can you explain your scenario a bit more?
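
To make the self-check mechanism concrete for readers following the thread, here is a rough illustrative sketch of the loop described above; it is not LangKit's actual implementation. Extra answers are drawn for the same prompt, and the same model is then asked whether each of them supports the original response. It uses the openai v1 AzureOpenAI client; the deployment name, API version, and judging prompt are placeholders.

import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2023-07-01-preview",  # placeholder API version
)
DEPLOYMENT = "LangKit-test-01"  # your Azure OpenAI deployment name

def ask(prompt: str, temperature: float = 1.0) -> str:
    out = client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return out.choices[0].message.content

def self_check_score(prompt: str, response: str, num_samples: int = 3) -> float:
    # Draw additional answers to the same prompt from the same model
    samples = [ask(prompt) for _ in range(num_samples)]
    # Ask the model whether each sampled answer supports the original response
    inconsistent = 0
    for sample in samples:
        verdict = ask(
            f"Context: {sample}\n\nSentence: {response}\n\n"
            "Is the sentence supported by the context? Answer Yes or No.",
            temperature=0.0,
        )
        if not verdict.strip().lower().startswith("yes"):
            inconsistent += 1
    return inconsistent / num_samples  # 0.0 = consistent, 1.0 = likely hallucinated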

@rajib76 (Author) commented Nov 28, 2023:

Thanks Felipe. For #1, it will work if we can just pass the response and the ground truth. But do we need an LLM to do the consistency check? Could we not have an option to do a semantic match with an embedding model and apply a threshold score?

For #2, I am planning to implement chain of verification, as I mentioned in this recording. I wanted to check whether this could be available out of the box in LangKit:

https://www.youtube.com/watch?v=qRyCmi0DeU8
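
As a concrete illustration of the embedding-based alternative proposed in #1, here is a rough sketch using sentence-transformers cosine similarity with an arbitrary threshold. This is not an existing LangKit feature; the model name and the 0.8 threshold are placeholders.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedding model could be used

def is_consistent(response: str, ground_truth: str, threshold: float = 0.8) -> bool:
    # Embed both texts and compare them with cosine similarity
    embeddings = model.encode([response, ground_truth], convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    return score >= threshold  # the threshold is arbitrary and should be tuned per use case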

@FelipeAdachi (Contributor) commented:

Thanks for the reply, @rajib76.

For #1, yes, it should be possible to perform the semantic-similarity-based consistency check without an LLM.

And #2 also makes a lot of sense for your scenario and others'.

I created two issues to reflect both topics we are discussing:

We'll plan those changes in future sprints.
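
For reference, the chain-of-verification idea mentioned in #2 roughly follows a draft, plan verification questions, answer them independently, then revise loop. Below is an illustrative sketch only, with the LLM call abstracted as a caller-supplied ask function (for example, the ask helper sketched earlier in the thread); it is not an existing LangKit feature, and the prompts are placeholders.

from typing import Callable, List

def chain_of_verification(question: str, ask: Callable[[str], str]) -> str:
    # 1. Draft a baseline answer
    baseline = ask(question)
    # 2. Plan verification questions that probe the facts in the baseline
    plan = ask(
        f"Question: {question}\nAnswer: {baseline}\n"
        "List short verification questions, one per line, that would check the facts in this answer."
    )
    verification_questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each verification question independently, without showing the baseline
    verifications = [(q, ask(q)) for q in verification_questions]
    # 4. Revise the baseline so it agrees with the independently verified facts
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in verifications)
    return ask(
        f"Original question: {question}\nDraft answer: {baseline}\n"
        f"Verified facts:\n{evidence}\n"
        "Rewrite the draft answer so it is consistent with the verified facts."
    )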
