Embed media like images, audio, 3d, video or etc? #79

Open
fire opened this issue Feb 26, 2024 · 5 comments

Comments

@fire

fire commented Feb 26, 2024

Hi,

I was wondering if it was in scope to embed media?

fire changed the title from "In scope or out of scope to embed media like images, audio, 3d, video or etc?" to "Embed media like images, audio, 3d, video or etc?" on Feb 26, 2024
@emrgnt-cmplxty
Contributor

That's definitely in scope. The best way to approach this would be to introduce the necessary embedding providers and to modify or create a new pipeline that shows an example of this in action.

I'm happy to team up on this.
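As a rough illustration of the "embedding provider plus pipeline" idea, here is a minimal sketch. Every name in it (`MediaEmbeddingProvider`, `ByteHistogramProvider`, `ingest`) is hypothetical and is not taken from R2R's actual API, and the byte histogram is only a toy stand-in for a real image model.

```python
from typing import Protocol, Sequence


class MediaEmbeddingProvider(Protocol):
    """Hypothetical interface for a non-text embedding provider (not R2R's API)."""

    modality: str

    def embed(self, items: Sequence[bytes]) -> list[list[float]]:
        """Return one vector per raw media blob."""
        ...


class ByteHistogramProvider:
    """Toy stand-in for an image model: a normalized 4-bin histogram of byte values."""

    modality = "image"

    def embed(self, items: Sequence[bytes]) -> list[list[float]]:
        vectors = []
        for blob in items:
            bins = [0.0] * 4
            for byte in blob:
                bins[byte // 64] += 1.0  # byte values 0..255 fall into bins 0..3
            total = sum(bins) or 1.0  # avoid dividing by zero on empty input
            vectors.append([count / total for count in bins])
        return vectors


def ingest(
    provider: MediaEmbeddingProvider, documents: dict[str, bytes]
) -> dict[str, list[float]]:
    """Hypothetical ingestion step: embed every document and key the result by id."""
    ids = list(documents)
    vectors = provider.embed([documents[doc_id] for doc_id in ids])
    return dict(zip(ids, vectors))


embedded = ingest(ByteHistogramProvider(), {"img-1": bytes([0, 64, 128, 255])})
```

Under this assumption, swapping in a real image, video, or mesh model would only mean writing another class with the same `embed` method; the ingestion step itself would not change.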

@fire
Author

fire commented Feb 26, 2024

I have two primary use cases, plus one for later:

  1. The basic use case is taking an image and turning it into an embedding, as Stable Diffusion or the various combined vision-text models do. A few models can also handle video.
  2. My pet emerging-technology use case is to take a 3D mesh from https://github.com/lucidrains/meshgpt-pytorch and either autocomplete its vertices or search a database of other embedded meshes using the mesh token embedding.
  3. Someday, maybe: audio and speech. I am not at all familiar with these.
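The "search a database of other embedded meshes" part of the second use case could look roughly like the sketch below. It assumes the mesh embeddings already exist as plain lists of floats; the vectors, the names `cosine_similarity` and `search`, and the sample index are all made up for illustration and do not come from meshgpt-pytorch or R2R.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, index, top_k=3):
    """Brute-force nearest-neighbour search over a dict of key -> embedding."""
    scored = [(cosine_similarity(query, vector), key) for key, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:top_k]]


# Made-up 3-dimensional "mesh embeddings"; real ones would come from a model.
mesh_index = {
    "chair": [0.9, 0.1, 0.0],
    "table": [0.8, 0.2, 0.1],
    "teapot": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.0, 0.0], mesh_index, top_k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

A linear scan like this is only reasonable for small collections; a larger mesh database would presumably sit behind a vector store instead.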

@emrgnt-cmplxty
Contributor

For image embedding, do you think we can fit it into the pipeline here [https://github.com/SciPhi-AI/R2R/blob/main/r2r/pipelines/basic/ingestion.py] with a specific embedding provider, or do you think we need to fundamentally rework the structure of the codebase in some way?

I think multi-modal is an important use case and I am very interested in figuring out how to best support this.

@fire
Author

fire commented Feb 29, 2024

I don't think I can drive multi-modal too much, but I'll see what spare time I can gather.

@fire
Author

fire commented Feb 29, 2024

The obvious question is: what happens when we have two different embedding models, for example ones that use different token integers? How do we sync them?
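The thread does not settle this. One possible approach, sketched here purely as an assumption, is not to sync the models at all: keep each model's vectors in its own space, tagged with a model id and dimension, so that vectors from different models are never compared directly. The names `EmbeddingSpace` and `MultiModelStore` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddingSpace:
    """All vectors produced by one embedding model (hypothetical structure)."""

    model_id: str
    dimension: int
    vectors: dict = field(default_factory=dict)

    def add(self, key, vector):
        # Reject vectors that cannot have come from this model.
        if len(vector) != self.dimension:
            raise ValueError(
                f"{self.model_id} expects {self.dimension} dimensions, got {len(vector)}"
            )
        self.vectors[key] = list(vector)


class MultiModelStore:
    """Keeps one separate space per model so they are never mixed."""

    def __init__(self):
        self._spaces = {}

    def register(self, model_id, dimension):
        self._spaces[model_id] = EmbeddingSpace(model_id, dimension)

    def add(self, model_id, key, vector):
        # Raises KeyError if the model was never registered.
        self._spaces[model_id].add(key, vector)

    def space(self, model_id):
        return self._spaces[model_id]


store = MultiModelStore()
store.register("image-model", dimension=3)
store.register("mesh-model", dimension=2)
store.add("image-model", "photo-1", [0.1, 0.2, 0.3])
store.add("mesh-model", "mesh-1", [0.5, 0.5])
```

A query would then be embedded once per registered model and searched only within the matching space. Mapping different models into a single shared space is a separate, harder problem that this sketch does not attempt.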
