
Rest API for inference locally #1563

Open
mohamed-alired opened this issue Apr 15, 2024 · 5 comments
Labels
type/feature Feature request

Comments

@mohamed-alired

Hi,
I have installed h2ogpt locally, but I want to build a frontend app on top of it, so I was wondering whether there is an API I can consume, such as one for ingestion and another for inference.

@pseudotensor
Collaborator

pseudotensor commented Apr 15, 2024

An extensive Gradio API exists; see readme_client.md and the examples in test code such as test_client_chat_stream_langchain_steps3.
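For example, something like this minimal sketch (the server URL and the /submit_nochat_api endpoint name are assumptions here; readme_client.md has the exact usage):

```python
# Minimal sketch of querying the Gradio API (assumed defaults: server on
# localhost:7860, endpoint name /submit_nochat_api; verify both against
# readme_client.md).
import ast

from gradio_client import Client

client = Client("http://localhost:7860")

# The nochat API takes a str(dict) payload and returns a str(dict).
kwargs = dict(instruction_nochat="Who are you?")
res = client.predict(str(kwargs), api_name="/submit_nochat_api")
print(ast.literal_eval(res)["response"])
```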

A full OpenAI-compatible chat API that is REST-capable also exists, but it does not yet support file upload or other operations. Is that what you are looking for?
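For the OpenAI-compatible route, a minimal sketch with the standard openai client (the port and model name below are assumptions; adjust them to your deployment):

```python
# Minimal sketch against the OpenAI-compatible chat API (assumed: the
# proxy listens on localhost:5000; base_url, api_key, and model are all
# deployment-specific placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="h2oai/h2ogpt-4096-llama2-7b-chat",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what h2ogpt does."}],
)
print(resp.choices[0].message.content)
```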

@pseudotensor pseudotensor reopened this Apr 15, 2024
@pseudotensor pseudotensor added the type/feature Feature request label Apr 15, 2024
@mohamed-alired
Author

mohamed-alired commented Apr 15, 2024

What I am looking for is a FastAPI REST API for the different ingestion techniques, plus a RAG completion API, so I can use H2OGPT as the RAG backend for my frontend web UI. I also wish ingestion and RAG completion accepted JSON metadata for filtering, so we could choose which files to chat with.
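To make that concrete, here is a rough sketch of the kind of endpoint shape I mean; every route, model, and field below is hypothetical, not an existing h2ogpt API:

```python
# Hypothetical FastAPI surface for the requested ingestion + RAG
# completion endpoints; nothing here exists in h2ogpt today.
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()

class RagRequest(BaseModel):
    query: str
    # JSON metadata filter, e.g. {"source": "report.pdf", "year": 2024},
    # so the caller can restrict which ingested files are searched.
    metadata_filter: dict = {}

@app.post("/ingest")
async def ingest(file: UploadFile, metadata: str = "{}"):
    # Would chunk and embed the file, storing chunks tagged with metadata.
    return {"filename": file.filename, "status": "queued"}

@app.post("/rag/completions")
async def rag_completion(req: RagRequest):
    # Would retrieve chunks matching req.metadata_filter, then generate.
    return {"answer": "", "sources": []}
```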

@abuyusif01

abuyusif01 commented Apr 17, 2024

Hi @mohamed-alired,

I am currently building something exactly like this. It's still in development, though. You can certainly fork the repo or make PRs; the foundation is there. The project extends the official FastAPI template, so scaling and deploying won't really be much of a hassle.

Check it out here: https://github.com/abuyusif01/h2ogpt-fast-api/tree/main/backend/app/h2ogpt

There are still a lot of things that need to be done, including a proper README and support for streaming the response (I planned to get this done this weekend).

Here is what we currently support:

  1. Chat with on-disk files (there's an endpoint to upload docs and retrieve what has been uploaded, so you can select which doc to ingest; see the sketch after this list)
  2. Chat with user-created pipelines (currently MongoDB streamed data)
  3. Chat with URLs
  4. Chat with publications; we use the OpenDoaj API and Sci-Hub to download the papers.
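For anyone wanting to try item 1, the flow looks roughly like this; the routes and port below are illustrative only, so check the app's OpenAPI page at /docs for the real endpoints:

```python
# Hypothetical client flow for item 1: upload a document, list what has
# been uploaded, then chat against a selected document. Route names and
# the server address are placeholders.
import requests

BASE = "http://localhost:8000/api/v1"  # assumed server address

# 1. Upload a document.
with open("paper.pdf", "rb") as f:
    requests.post(f"{BASE}/docs/upload", files={"file": f}).raise_for_status()

# 2. List uploaded documents and pick one to ingest.
docs = requests.get(f"{BASE}/docs").json()
doc_id = docs[0]["id"]

# 3. Chat against the selected document.
resp = requests.post(f"{BASE}/chat",
                     json={"doc_id": doc_id, "prompt": "Summarize this paper."})
print(resp.json())
```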

@mohamed-alired
Author

Hi @abuyusif01,
How are you? I am really busy, so if I find some time I will definitely open a PR.
But I can give you some recommendations: don't force authentication on users, because I may want to use it in my existing project. I also think you should make local inference possible, with llamaCpp or something similar, so it's completely local.

@abuyusif01

@mohamed-alired
You're right, we don't really need to enforce auth, hence its removal.
I also made local inference possible using llamaCPP.

Subsequently, I restructured the repo, wrote a README, and containerized the app. It's now easy to set up and extend.
Check it here: https://github.com/abuyusif01/h2ogpt-fast-api
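For the local inference path, the llama-cpp-python side looks roughly like this; the GGUF path is a placeholder for whatever chat model you have on disk:

```python
# Minimal llama-cpp-python sketch for fully local inference; the model
# path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="/models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Are you running locally?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```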

@pseudotensor
Since Gradio is relatively stable now, why not reference this in the README, so other people can use it as a starting point?
