Consider adding support for embeddings #3462

Open
st-pasha opened this issue May 8, 2023 · 5 comments

Comments

@st-pasha
Contributor

st-pasha commented May 8, 2023

  • New parametric data type: float32[N], and possibly float16[N]. The data type is a plain array of N floats.
  • Supported operations:
    • create from a plain list of python floats;
    • create from a list of N regular columns (which could also be numpy arrays);
    • dot-product;
    • cosine-similarity;
    • Euclidean/Manhattan distance;
    • Jaccard similarity;
    • others.
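A minimal pure-Python sketch of the proposed semantics. It assumes float32 storage via the standard `array` module (typecode `'f'`). All function names are hypothetical and are not part of any existing API. The issue does not say which Jaccard variant is meant for float vectors, so the weighted (Ruzicka) form for non-negative entries shown here is an assumption; other definitions exist.

```python
import math
from array import array


def from_floats(values, n):
    """Create a float32[n] vector from a plain list of Python floats."""
    if len(values) != n:
        raise ValueError(f"expected {n} floats, got {len(values)}")
    return array("f", values)  # typecode 'f' stores each item as a C float (float32)


def from_columns(columns):
    """Combine N equal-length columns into one float32[N] vector per row."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same number of rows")
    return [array("f", row) for row in zip(*columns)]


def _check(a, b):
    # Fixed-length vectors can only be combined when their lengths match.
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")


def dot(a, b):
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Undefined (raises ZeroDivisionError) if either vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_jaccard(a, b):
    """Weighted (Ruzicka) Jaccard similarity; assumes non-negative entries."""
    _check(a, b)
    return sum(min(x, y) for x, y in zip(a, b)) / sum(max(x, y) for x, y in zip(a, b))


u = from_floats([1.0, 2.0, 2.0], 3)
v = from_floats([2.0, 0.0, 1.0], 3)
print(dot(u, v))                 # 4.0
print(manhattan_distance(u, v))  # 4.0
print(cosine_similarity(u, v), euclidean_distance(u, v), weighted_jaccard(u, v))
```

Each helper checks the lengths first. A built-in `float32[N]` type could move that check from every call to the column type itself.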

LLMs are in fashion now, so why not add support for them?

@oleksiyskononenko
Contributor

Can’t the existing array type be used for that?

@st-pasha
Contributor Author

st-pasha commented May 9, 2023

You'd want an array of fixed size, kind of like a mathematical vector. It might be pretty similar to the existing array type in terms of implementation, though.

@oleksiyskononenko
Contributor

Yeah, but it could probably be an array[float, N] type for fixed-length vectors and just array[float] for arbitrary lengths.
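One way to read this suggestion is as a single array type with an optional length parameter. The sketch below is hypothetical: `ArrayType` and `dot_result_type` are illustrative names, not an existing API. Requiring fixed-length operands for a dot-product is an assumed design choice. The sketch shows how a known `N` lets a length mismatch be rejected when types are checked, instead of row by row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical type descriptor: array[float, N] if size is set, else array[float]."""
    elem: str = "float"
    size: Optional[int] = None  # None means arbitrary (variable) length

    def __str__(self):
        if self.size is None:
            return f"array[{self.elem}]"
        return f"array[{self.elem}, {self.size}]"

    def validate(self, row):
        """Check one row against this type; fixed-size types demand exactly N items."""
        if self.size is not None and len(row) != self.size:
            raise ValueError(f"{self} needs {self.size} items, got {len(row)}")
        return [float(x) for x in row]


def dot_result_type(lhs, rhs):
    """Type-level check for a dot-product: both operands must share a known length."""
    if lhs.size is None or rhs.size is None:
        raise TypeError("dot-product requires fixed-length operands")
    if lhs.size != rhs.size:
        raise TypeError(f"cannot take dot-product of {lhs} and {rhs}")
    return lhs.elem  # the result is a scalar of the element type


emb = ArrayType(size=3)
print(emb)                                      # array[float, 3]
print(ArrayType())                              # array[float]
print(dot_result_type(emb, ArrayType(size=3)))  # float
```

With variable-length `array[float]`, the same mismatch could only be detected at runtime, per row.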

@oleksiyskononenko
Contributor

I'm not an expert on LLMs, could you please point me to some information on the data types that they're using?

@st-pasha
Contributor Author

st-pasha commented May 9, 2023

At the most basic level, an embedding is just a vector of floats of a fixed length. For example, see here: https://www.pinecone.io/learn/vector-database/
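The typical query that such fixed-length vectors serve, as described in the vector-database article linked above, is finding the stored rows most similar to a query vector. Below is a brute-force sketch in pure Python. It assumes embeddings are equal-length lists of non-zero float vectors and uses cosine similarity, one of the operations proposed in this issue. Production vector databases typically rely on approximate nearest-neighbor indexes rather than scanning every row.

```python
import heapq
import math


def cosine_similarity(a, b):
    # Assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Brute-force scan: return (row index, score) for the k most similar rows."""
    scored = ((cosine_similarity(query, e), i) for i, e in enumerate(embeddings))
    return [(i, score) for score, i in heapq.nlargest(k, scored)]


embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
print(top_k([1.0, 0.0], embeddings, 2))
```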

2 participants