This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Support for v1/embeddings endpoint #179

Closed
josephrocca opened this issue Apr 22, 2023 · 12 comments
Labels
wontfix This will not be worked on

Comments

@josephrocca

I'm not sure how feasible or within-scope this is, but it'd be very useful if the Basaran project were able to implement the v1/embeddings endpoint (using Hugging Face repos, like with the v1/completions endpoint).

Text embeddings are very often used alongside the completion endpoints, and we have this particular requirement for OpenCharacters so we can save and search over the character's "memories".

(And very soon we'll have the same requirement for text-to-image. If it were possible for Basaran to aim to be the OpenAI-compatible, open-source API server, that would be awesome.)
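For context, the requested endpoint has a small, well-defined response schema. A minimal sketch of shaping precomputed vectors into an OpenAI-compatible `/v1/embeddings` response (field names follow the OpenAI API; the vectors and model name below are placeholders, not real embeddings):

```python
# Sketch only: build the JSON body an OpenAI-compatible
# /v1/embeddings endpoint returns. The vectors here are
# placeholders, not real model output.
def make_embeddings_response(vectors, model_name):
    return {
        "object": "list",
        "model": model_name,
        "data": [
            {"object": "embedding", "index": i, "embedding": v}
            for i, v in enumerate(vectors)
        ],
    }

resp = make_embeddings_response([[0.1, 0.2], [0.3, 0.4]], "some-model")
```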

@peakji
Member

peakji commented Apr 23, 2023

Basaran's current goal is to ensure compatibility with both the text completion and chat completion APIs, which actually share the same model. To support embeddings, we would need another model (or a set of models), which is actually how the OpenAI API does it.

From an architecture perspective, a GPT-like decoder-only model is not the best choice for obtaining embeddings. I suggest using options like SentenceTransformers or the Universal Sentence Encoder. Users would therefore need to deploy additional services regardless, so perhaps the embeddings feature is indeed beyond the scope of the Basaran project?

@peakji peakji added the question Further information is requested label Apr 23, 2023
@Electrofried

Here is the thing: from an end-user perspective, for this to be considered a 'functional' drop-in replacement for the OpenAI API, it needs to replicate all of the API's main features. I understand that the goal is to replicate only the completion part of the API, but for the vast majority of implementations this will simply make the project unusable and force the end user to implement additional measures.

What many (most?) people want is to be able to load up this Docker image, redirect any application that currently uses the OpenAI API, and have it 'just work'. I understand it may be outside the scope of the project, but I honestly hope you will consider expanding the scope to cover the entire API. Without that, the use cases are extremely limited.

Please do not take this input the wrong way, I certainly appreciate the hard work that people have been doing on this project and hope it continues to thrive.

@josephrocca
Author

To support embeddings, we would need another model (or a set of models), which is actually how the OpenAI API does it.

Yep, of course.

What many (most?) people want is to be able to load up this Docker image, redirect any application that currently uses the OpenAI API, and have it 'just work'.

Yes, this is exactly the use case in OpenCharacters. We need a "one-stop shop" because the UX for directing users away from the closed ecosystem would otherwise be a 10-step process instead of just "install Docker and run this command". It needs to be that simple, and Basaran has been awesome for this so far. It seems like Basaran has enough momentum that it could be the open-source ML server.

Also, I just want to emphasise that the Hugging Face repo-based approach is excellent and should be maintained for any other APIs (embeddings, text-to-image, etc.).

@peakji
Member

peakji commented Apr 23, 2023

@Electrofried @josephrocca I completely understand and agree with your point of view! The current difficulty actually comes from the architecture: supporting embeddings requires deploying additional models, which are much smaller than LLMs but still require significant resources. As a result, Basaran may no longer be a simple Docker image, but rather multiple services behind a router, with increased deployment complexity. For example:

[user]  -->  [basaran-nginx]  -  /v1/completions  ->  [basaran-llm]
                              -  /v1/chat  ->         [basaran-llm]
                              -  /v1/embeddings  ->   [basaran-embedding]
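The routing in the diagram above boils down to a path-to-backend map. A minimal sketch (service names and paths are taken from the diagram; ports and URLs are illustrative):

```python
# Sketch of the router idea: dispatch each OpenAI-style path
# to a backend service. Hostnames and ports are hypothetical.
BACKENDS = {
    "/v1/completions": "http://basaran-llm:8080",
    "/v1/chat": "http://basaran-llm:8080",
    "/v1/embeddings": "http://basaran-embedding:8080",
}

def route(path):
    # Return the backend that should handle `path`, or None if
    # no backend is configured for it.
    for prefix, backend in BACKENDS.items():
        if path.startswith(prefix):
            return backend
    return None
```

In a real deployment this dispatch would live in a reverse proxy such as nginx rather than application code.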

@peakji peakji added the enhancement New feature or request label Apr 23, 2023
@peakji
Member

peakji commented Apr 23, 2023

We plan to focus on achieving compatibility with the chat API in the short term. In the long term, we may start a router project that provides a complete replacement for the OpenAI API, with Basaran as one of the backends. This would also enable model selection via the model parameter.

This could be the beginning of a whole new ecosystem!

@josephrocca
Author

This sounds perfect! I was actually going to open a separate issue about allowing multiple LLM models behind the same API (i.e. with a single Docker command but multiple user/repos specified), with the client using the model parameter to choose between them. IIUC, this would open up the possibility for something like that.

(Maybe even an option for lazily-loaded models so you don't specify MODEL in the docker command - you just make a request, and it loads whatever model you requested on-the-fly. Useful for testing, at the very least.)

Either way, I'm very excited for this - it would be awesome if the open source ML ecosystem had a full, drop-in replacement for OpenAI APIs.

@josephrocca
Author

Maybe even an option for lazily-loaded models so you don't specify MODEL in the docker command - you just make a request, and it loads whatever model you requested on-the-fly. Useful for testing, at the very least.

@peakji Actually, this might be a really important feature. I can imagine a cloud service (perhaps run by the Basaran project itself as an open-core startup) where I can just tell my OpenCharacters users to swap the api.openai.com URL in their settings for api.basaran.com and then everything works exactly the same, and you just specify the huggingface user/repo as the model parameter. This would be really awesome, because a lot of my users don't have the hardware or the know-how to run Basaran locally, but they still want to play with various open source models.
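The lazy-loading idea above can be sketched as a small cache keyed by the model parameter. This is illustrative only: `load_model` is a stand-in for something like `AutoModelForCausalLM.from_pretrained`, and none of these names come from Basaran itself:

```python
# Hypothetical sketch of on-the-fly model loading: a model is
# loaded the first time it is requested and reused afterwards.
_model_cache = {}

def get_model(repo_id, load_model):
    # `load_model` stands in for a real loader such as
    # AutoModelForCausalLM.from_pretrained (assumption, not
    # Basaran's actual API).
    if repo_id not in _model_cache:
        _model_cache[repo_id] = load_model(repo_id)
    return _model_cache[repo_id]
```

A production version would also need eviction (models are large) and locking around concurrent first requests.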

@JacoBezuidenhout

+1 for /embeddings :)

@josephrocca
Author

Related project: https://github.com/closedai-project/closedai. They're also planning to add embeddings, image generation, etc., but currently only support completion and chat completion. There might be room for some collaboration here.

@hewr1993

hewr1993 commented Jun 3, 2023

How about adding the following lines here:

def get_embeddings(self, input_ids):
    # Ensure a batch dimension: (seq_len,) -> (1, seq_len).
    if input_ids.ndim == 1:
        input_ids = input_ids[None, :]
    # Run only the base transformer (no LM head) to get hidden states.
    outs = self.model.base_model.forward(input_ids, return_dict=False)
    features = outs[0].float()
    # Mean-pool over the sequence dimension; assumes batch size 1.
    return features.mean(dim=1)[0]

Reference: https://github.com/UKPLab/sentence-transformers/blob/214498f/sentence_transformers/SentenceTransformer.py#L809
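One detail worth noting: the pooling in the linked sentence-transformers code averages only over non-padded positions, using the attention mask. A plain-Python sketch of that masked mean pooling (illustrative, not the library's actual code):

```python
# Sketch of masked mean pooling: average hidden states only over
# positions the attention mask marks as real tokens, so padding
# does not dilute the embedding.
def mean_pool(hidden_states, attention_mask):
    # hidden_states: seq_len x dim (list of lists)
    # attention_mask: seq_len list of 0/1 (1 = token, 0 = padding)
    dim = len(hidden_states[0])
    pooled = [0.0] * dim
    count = sum(attention_mask)
    for vec, mask in zip(hidden_states, attention_mask):
        if mask:
            for j in range(dim):
                pooled[j] += vec[j]
    return [x / count for x in pooled]
```

With batch size 1 and no padding the two approaches agree, but a batched endpoint would need the masked version.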

@peakji peakji pinned this issue Jun 3, 2023
@peakji
Member

peakji commented Jun 3, 2023

@hewr1993 It may not be a good idea to use a text generation model to obtain embeddings. I will explain in detail below.

@peakji peakji removed enhancement New feature or request question Further information is requested labels Jun 3, 2023
@peakji
Member

peakji commented Jun 3, 2023

We have decided not to add support for embeddings in Basaran:

Currently, Basaran is designed to serve only one model per process. Considering the capabilities of commodity hardware and the size of the latest models, we believe this is a reasonable design. The primary goal of Basaran is to provide text completion capabilities (and soon-to-be-added chat completion), which generally require a decoder-only or encoder-decoder architecture.

However, the best practice for embedding models is to use a Transformer encoder. It is therefore challenging to reuse the same model for both completion and embedding. In fact, the OpenAI embedding model, sometimes referred to as the GPT-3 embedding model, is actually a separate encoder model initialized with the weights of GPT or Codex [1].

In addition to the limitations imposed by the model structure, we currently cannot achieve full compatibility with OpenAI's embedding API using open-source models. OpenAI's latest model, text-embedding-ada-002, has successfully replaced the four previous models used for different scenarios, thereby providing a simple, unified embedding API. However, models in the open-source community are currently not as versatile: they either require different models for symmetric and asymmetric tasks or need specific instructions to adapt to different domains [2].

Therefore, considering the engineering and research limitations, we have decided not to add support for embeddings in Basaran. In the future, we may initiate a new project that acts as a router to fully support all OpenAI APIs; multiple Basaran instances (or other inference services) could then be mounted behind it to achieve load balancing and model selection.

References:

  1. Neelakantan, Arvind, et al. "Text and code embeddings by contrastive pre-training." arXiv:2201.10005 (2022).
  2. Su, Hongjin, et al. "One Embedder, Any Task: Instruction-Finetuned Text Embeddings." arXiv:2212.09741 (2022).
