
Rest API for inference locally #1563

Open
mohamed-alired opened this issue Apr 15, 2024 · 5 comments
Labels
type/feature Feature request

Comments

@mohamed-alired

Hi,
I have installed h2ogpt locally, but I want to build a frontend app on top of it, so I was wondering whether there is an API I can consume, such as one for ingestion and another for inference.

@pseudotensor
Collaborator

pseudotensor commented Apr 15, 2024

An extensive Gradio API exists; see readme_client.md and the examples in test code such as test_client_chat_stream_langchain_steps3.
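For example, something like this minimal sketch (the server URL and the /submit_nochat_api endpoint name are assumptions here; readme_client.md has the exact usage):

```python
# Minimal sketch of querying the Gradio API (assumed defaults: server on
# localhost:7860, endpoint name /submit_nochat_api; verify both against
# readme_client.md).
import ast

from gradio_client import Client

client = Client("http://localhost:7860")

# The nochat API takes a str(dict) payload and returns a str(dict).
kwargs = dict(instruction_nochat="Who are you?")
res = client.predict(str(kwargs), api_name="/submit_nochat_api")
print(ast.literal_eval(res)["response"])
```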

A full OpenAI-compatible chat API that is REST-capable also exists, but it does not yet support file upload or other operations. Is that what you are looking for?
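For the OpenAI-compatible route, a minimal sketch with the standard openai client (the port and model name below are assumptions; adjust them to your deployment):

```python
# Minimal sketch against the OpenAI-compatible chat API (assumed: the
# proxy listens on localhost:5000; base_url, api_key, and model are all
# deployment-specific placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="h2oai/h2ogpt-4096-llama2-7b-chat",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what h2ogpt does."}],
)
print(resp.choices[0].message.content)
```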

@pseudotensor pseudotensor reopened this Apr 15, 2024
@pseudotensor pseudotensor added the type/feature Feature request label Apr 15, 2024
@mohamed-alired
Author

mohamed-alired commented Apr 15, 2024

What I am looking for is a FastAPI REST API for the different ingestion techniques, plus a RAG completion API, so I can use H2OGPT as the RAG backend for my frontend web UI. I also wish ingestion and RAG completion accepted JSON metadata for filtering, so we could choose which files to chat with.
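To make that concrete, here is a rough sketch of the kind of endpoint shape I mean; every route, model, and field below is hypothetical, not an existing h2ogpt API:

```python
# Hypothetical FastAPI surface for the requested ingestion + RAG
# completion endpoints; nothing here exists in h2ogpt today.
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()

class RagRequest(BaseModel):
    query: str
    # JSON metadata filter, e.g. {"source": "report.pdf", "year": 2024},
    # so the caller can restrict which ingested files are searched.
    metadata_filter: dict = {}

@app.post("/ingest")
async def ingest(file: UploadFile, metadata: str = "{}"):
    # Would chunk and embed the file, storing chunks tagged with metadata.
    return {"filename": file.filename, "status": "queued"}

@app.post("/rag/completions")
async def rag_completion(req: RagRequest):
    # Would retrieve chunks matching req.metadata_filter, then generate.
    return {"answer": "", "sources": []}
```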

@abuyusif01

abuyusif01 commented Apr 17, 2024

Hi @mohamed-alired,

I am currently building something exactly like this. It's still in development, though. You can certainly fork the repo or make PRs; the foundation is there. The project extends the official FastAPI template, so scaling and deploying won't really be much of a hassle.

Check it out here: https://github.com/abuyusif01/h2ogpt-fast-api/tree/main/backend/app/h2ogpt

There are still a lot of things that need to be done, including a proper README and support for streaming the response (I planned to get this done this weekend).

Here is what we currently support:

  1. Chat with on-disk files (there's an endpoint to upload docs and retrieve what has been uploaded, so you can select which doc to ingest; see the sketch after this list)
  2. Chat with user-created pipelines (currently MongoDB streamed data)
  3. Chat with URLs
  4. Chat with publications; we use the OpenDoaj API and Sci-Hub to download the papers.
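For anyone wanting to try item 1, the flow looks roughly like this; the routes and port below are illustrative only, so check the app's OpenAPI page at /docs for the real endpoints:

```python
# Hypothetical client flow for item 1: upload a document, list what has
# been uploaded, then chat against a selected document. Route names and
# the server address are placeholders.
import requests

BASE = "http://localhost:8000/api/v1"  # assumed server address

# 1. Upload a document.
with open("paper.pdf", "rb") as f:
    requests.post(f"{BASE}/docs/upload", files={"file": f}).raise_for_status()

# 2. List uploaded documents and pick one to ingest.
docs = requests.get(f"{BASE}/docs").json()
doc_id = docs[0]["id"]

# 3. Chat against the selected document.
resp = requests.post(f"{BASE}/chat",
                     json={"doc_id": doc_id, "prompt": "Summarize this paper."})
print(resp.json())
```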

@mohamed-alired
Author

Hi @abuyusif01,
How are you? I am really busy, so if I find some time I will definitely open a PR.
But I can give you some recommendations: don't force authentication on users, because I may want to use it in my existing project. I also think you should make local inference possible, with llamaCpp or something similar, so it's completely local.

@abuyusif01

@mohamed-alired
You're right, we don't really need to enforce auth, hence its removal.
I also made local inference possible using llamaCPP.

Subsequently, I restructured the repo, wrote a README, and containerized the app. It's now easy to set up and extend.
Check it here: https://github.com/abuyusif01/h2ogpt-fast-api
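For the local inference path, the llama-cpp-python side looks roughly like this; the GGUF path is a placeholder for whatever chat model you have on disk:

```python
# Minimal llama-cpp-python sketch for fully local inference; the model
# path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="/models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Are you running locally?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```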

@pseudotensor
Since Gradio is relatively stable now, why not reference this in the README, so other people can use it as a starting point?
