
Add embedding search #17

Open
jasonjmcghee opened this issue Dec 29, 2023 · 29 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jasonjmcghee
Owner

jasonjmcghee commented Dec 29, 2023

rem should index all text via embedding store.

We could use something like https://github.com/asg017/sqlite-vss

If we go this route we should fork / open a PR to add the extension https://github.com/stephencelis/SQLite.swift/tree/3d25271a74098d30f3936d84ec1004d6b785d6cd/Sources/SQLite/Extensions

This way we can search without needing verbatim matches.

We'll need to see what the RAM footprint and insertion times are.
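Whichever backend we pick, the core operation is the same: store one embedding per text row and rank rows by cosine similarity at query time. A minimal stdlib-only Python sketch of that idea — the 3-dimensional vectors are toy stand-ins for real model embeddings, and the table/column names are illustrative, not rem's actual schema:

```python
import math
import sqlite3
import struct

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE frames (text TEXT, embedding BLOB)")

# Toy embeddings; in practice these come from a sentence-transformer model.
rows = [
    ("open github issue", (0.9, 0.1, 0.0)),
    ("grocery list milk eggs", (0.0, 0.2, 0.9)),
]
for text, vec in rows:
    db.execute("INSERT INTO frames VALUES (?, ?)", (text, struct.pack("3f", *vec)))

def search(query_vec, k=1):
    """Brute-force nearest-neighbour search over the stored embeddings."""
    scored = [(cosine(query_vec, struct.unpack("3f", blob)), text)
              for text, blob in db.execute("SELECT text, embedding FROM frames")]
    return [text for _, text in sorted(scored, reverse=True)[:k]]

print(search((1.0, 0.0, 0.0)))  # → ['open github issue']
```

An extension like sqlite-vss replaces the brute-force loop with an ANN index inside SQLite itself, which is exactly why the RAM footprint and insertion time are worth measuring.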


More out-of-the-box solutions appear to be available now:

https://github.com/ashvardanian/SwiftSemanticSearch

We'd need to see how long insertion / index updates take, but it seems super promising.

@jasonjmcghee jasonjmcghee added the enhancement New feature or request label Dec 29, 2023
@jasonjmcghee
Owner Author

Logged an issue over in the SQLite.swift repo, but that doesn't mean we can't fork / add support / open a PR there to fulfill the issue! stephencelis/SQLite.swift#1232

@jasonjmcghee
Owner Author

Exploring embedding generation from Swift… it seems like a good candidate would be using candle (Rust) with a sentence transformer, and building a binary that takes in text and outputs embeddings.

Or explore CoreML and look into transformer or ONNX conversion.

@jasonjmcghee
Owner Author

I'm really bad at C bindings stuff, but I tried to put together a candle text -> embeddings binary that we can talk to via FFI:

https://github.com/jasonjmcghee/rust_embedding_lib

@jasonjmcghee jasonjmcghee added the help wanted Extra attention is needed label Dec 30, 2023
@roblg

roblg commented Dec 30, 2023

from rust_embedding_lib README.md

Am I crazy not to use https://github.com/huggingface/swift-transformers?

You might be. :) (edit: although it doesn't seem like there's a ton actually present in that library right now) I was noodling on this and was prepared to try to embed a Python interpreter into this binary to get access to the whole ecosystem of Python modules there; I didn't realize Swift was an option. (Also, the idea of embedding a Python interpreter into something seems kind of insane, so I just wanted to try it.)

Do you have an idea of which model embeddings you want to use for search? I've played with a couple of other projects that defaulted to bge-small-en-v1.5 -- #15 or all-mpnet-base-v2 -- #45 from HF leaderboard: https://huggingface.co/spaces/mteb/leaderboard

Both are pretty small, and "seem" good for RAG based on the limited poking I've done with them. I've never tried to use them outside of python though.

edit: n/m, I see gte-small in the rust project. That's #22 on the leaderboard!

@jasonjmcghee
Owner Author

gte-small feels like a good balance between quality and size from manual experimentation, but I'm totally open to suggestions and / or making it so people can use whatever they want.

@roblg

roblg commented Dec 30, 2023

It looks like somebody already posted a coreml conversion of gte-small: https://huggingface.co/thenlper/gte-small/tree/main/coreml/feature-extraction/float32_model.mlpackage

I have no experience w/ this, so I don't know if that's a format we can use but I found it while researching conversion options.

I also found https://github.com/huggingface/exporters, but they don't appear to support embedding models (plus, I tried to do the conversion using their tool and it fails a validation step because some math comes up with NaN).

@jasonjmcghee
Owner Author

Theoretically, what I built should work; we just need to build the Swift framework.

@roblg

roblg commented Dec 30, 2023

I guess that's a question I should have asked initially -- is the FFI bridge + rust lib the way you'd prefer to go? Or something more native like CoreML?

@jasonjmcghee
Owner Author

😅 The rust embeddings approach means any safetensors model with config and tokenizers should work, which feels like a very good thing. But if you can get CoreML working - that's awesome. I did notice they were strangely large - like double the size for gte-small.

@roblg

roblg commented Dec 30, 2023

rust embeddings approach means any safetensors model with config and tokenizers should work

Agreed. The "run anything on the internet" aspect was one of the reasons I felt like my awful embed-Python approach could almost be justifiable. I'm agnostic either way re: rust lib vs coreml - just having fun soaking all this stuff up. For my own entertainment I'll probably throw up a branch on my fork illustrating the coreml approach, but I've got no attachment to it. I've just never played w/ CoreML before.

@jasonjmcghee
Owner Author

Please! That would be awesome! Thank you- I can't wait.

@roblg

roblg commented Dec 31, 2023

Not having great luck with prebuilt coreml model. Will post more later on that.

re: rust/candle - I did notice that candle doesn't support Metal acceleration yet, only the Accelerate framework. I'm not sure if that's a concern for the embedding part, but I can imagine it will be for local LLMs.

@jasonjmcghee
Owner Author

Not having great luck with prebuilt coreml model. Will post more later on that.

You got this!

candle doesn't support metal acceleration yet

Problem for another day. Don't need the best solution, just need one that works for now.

@vkehfdl1

Hi, @jasonjmcghee
I am making RAGchain, a framework specialized for RAG.
I know you are interested in building RAG in a local Apple silicon environment, but I think it would be super cool to get data from rem, ingest it through RAGchain, and talk with an LLM about my memories.
What do you think about this? Do you prefer "no internet connection" for this project?

@jasonjmcghee
Owner Author

jasonjmcghee commented Dec 31, 2023

update (repo here: https://github.com/jasonjmcghee/ragpipe):

This script:

  • retrieves the text from the rem db
  • cleans up the text a bunch
  • embeds the selected text in 500 character chunks
  • builds an hnsw index
  • queries the hnsw index
  • executes ollama run openhermes2.5-mistral with a prompt and the text

$ ./askRem "Which GitHub issues have I read recently?" <(sqlite3 db 'select text from allText order by frameId desc limit 1000') 
Batches: 100%|███████████████████████████████| 19/19 [00:11<00:00,  1.65it/s]
You have recently read issues: #3 (dark mode icons), #9 (login item - Rem will run on boot), and #11 (icon looks kinda weird when active in dark mode).
total duration:       26.622822625s
load duration:        5.327591125s
prompt eval count:    1933 token(s)
prompt eval duration: 17.73078s
prompt eval rate:     109.02 tokens/s
eval count:           41 token(s)
eval duration:        3.554184s
eval rate:            11.54 tokens/s
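The chunking step above can be sketched in a couple of lines (fixed 500-character windows with no overlap, matching the script's description; real pipelines often overlap chunks so sentences aren't split mid-thought):

```python
def chunk(text: str, size: int = 500):
    """Split text into fixed-size character chunks for embedding."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Each chunk gets its own embedding and its own entry in the HNSW index.
print([len(c) for c in chunk("x" * 1200)])  # → [500, 500, 200]
```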

@jasonjmcghee
Owner Author

@vkehfdl1 - definitely want to make it easy to ingest from rem. You can query the sqlite file right now, which will give you the path to the ffmpeg file + frame offset too, so you can get the text and image.

I'd love to simplify this though / make it easy to just ask rem somehow / use it as a datasource

@vkehfdl1

@jasonjmcghee Great! I'd love to make a data loader from rem for RAGchain and use rem as a datasource. I'll let you know my progress.

@vkehfdl1

@jasonjmcghee
I made a loader for RAGchain and Langchain (it is compatible with Langchain).
It loads text from the sqlite3 file and converts it to the Langchain Document schema.
You can see the PR here.

Now, I'll try to make some kind of demo using rem and RAGchain together.

@seletz
Contributor

seletz commented Dec 31, 2023

@vkehfdl1 that looks very cool! Not knowing too much about RAGchain, how would the data extractor pipeline be run? Would it be beneficial if the extractor pipeline is triggered by rem at some fixed intervals?

@vkehfdl1

@vkehfdl1 that looks very cool! Not knowing too much about RAGchain, how would the data extractor pipeline be run? Would it be beneficial if the extractor pipeline is triggered by rem at some fixed intervals?

@seletz I just made a simple example running RAGchain and rem (repo here: https://github.com/vkehfdl1/rem-RAGchain).
I think it would be super cool to trigger the ingest pipeline when a new rem record is added. For now, you can run ingest.py with crontab: it runs my ingest Python script every x minutes, so new records are automatically ingested, new embeddings are made, and they can be used for talking with an LLM!

@vkehfdl1

@jasonjmcghee @seletz
Plus, here is a sample image of me running RAGchain with rem.
I was viewing this issue tab while rem recording was turned on 😁

[Screenshot: 2023-12-31 at 9:49:27 PM]

@jasonjmcghee
Owner Author

Cool!

However, answer quality is not good enough.

Did you try writing a custom prompt for the use-case?

@jasonjmcghee
Owner Author

Would it be beneficial if the extractor pipeline is triggered by rem at some fixed intervals?

Could be reading into this the wrong way, but I'd want to make sure it's a client-agnostic approach and, ideally, rem isn't facilitating outside applications consuming its data.

One of my concerns right now though is network access related stuff. Seems like the smart way (from an eng arch perspective) is to have an API for providing access to data and for talking to agents.

but that unlocks "network access" stuff in App Sandbox - which... idk I feel many folks would feel better with a "absolutely no network access" approach.

Maybe there could be 2 builds? One with network access entitlements and one without?
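For what it's worth, the difference between the two builds would likely come down to a single App Sandbox entitlement; a sketch of the networked build's entitlements (key names are from Apple's App Sandbox documentation):

```xml
<!-- Networked build: sandboxed, but allowed to open outgoing connections. -->
<key>com.apple.security.app-sandbox</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
<!-- The "absolutely no network access" build would omit the network.client
     key entirely; App Sandbox denies network access by default. -->
```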

@seletz
Contributor

seletz commented Dec 31, 2023

@jasonjmcghee @vkehfdl1 I think a "no network connection" policy is very cool. We could use triggers as mentioned in #14 for this. Maybe it would be OK for now to just call a user-provided script which gets the path to the SQLite DB as argument? The DB tables would be the API, then ...

@vkehfdl1

vkehfdl1 commented Jan 1, 2024

@jasonjmcghee

Did you try writing a custom prompt for the use-case?

I will try your great prompt! Plus, I will run some experiments to improve answer quality.
First, it would be good to use hybrid retrieval, meaning a vector DB and BM25 together. I think it's common to search for a specific word, like a person's name.
Second, I want to delete duplicated texts. rem captures the screen often, so it contains the same text many times, and that information needs to be compressed somehow. I plan to try various strategies for this.
Third, use a custom prompt.
Fourth, use a multi-modal model. Maybe that will take some time to build....
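On the first point: one simple way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion. A hedged sketch — the doc ids and the conventional k = 60 constant are illustrative, and RAGchain's actual hybrid retriever may fuse scores differently:

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids via reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d3", "d2"]    # exact keyword match, e.g. a person's name
vector_hits = ["d2", "d1", "d4"]  # semantic match from the vector DB
print(rrf([bm25_hits, vector_hits]))  # → ['d1', 'd2', 'd3', 'd4']
```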

@vkehfdl1

vkehfdl1 commented Jan 1, 2024

@seletz That would be cool! I agree it's great for rem to keep "no network connection" as the default, with users always able to access their data easily via hooks or triggers. That looks like the fastest way to build RAG with rem for now.
However, in the future, it would be cool for rem to have its own RAG pipeline, totally local, using a local embedding model and LLM.

@vkehfdl1

vkehfdl1 commented Jan 1, 2024

@jasonjmcghee
I tried your custom prompt here and the results are actually promising.
Here are some examples I tried. (I recorded the rem issue and repo pages.)

Question : Where rem should index all data?
Answer : Rem should index all data in the "allText_content" table in the "main" database.

Question : What is the rem approach for building embedding search and RAG?
Answer : The rem approach for building embedding search and RAG involves indexing all text via an embedding store and using a SQLite extension like sqlite-vss.

But I tried this with only about 2 minutes of recording. I'm now recording a few hours for real use-cases.

@vkehfdl1

vkehfdl1 commented Jan 1, 2024

Update.
Ingestion now skips duplicated documents; I used token F1 score to calculate similarity.
I also use hybrid retrieval and WeightedTimeReranker to favor the latest information.
This is my PR here. Try it!
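The token-F1 similarity used for deduplication can be sketched like this (the 0.8 threshold is illustrative, not necessarily the value the PR uses):

```python
from collections import Counter

def token_f1(a: str, b: str) -> float:
    """F1 over the shared bag of tokens; 1.0 means identical token counts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((ta & tb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ta.values())
    recall = overlap / sum(tb.values())
    return 2 * precision * recall / (precision + recall)

# Skip ingesting a capture that is nearly identical to the previous one.
previous = "Add embedding search #17 Open"
current = "Add embedding search #17"
is_duplicate = token_f1(previous, current) > 0.8  # True here (F1 ≈ 0.89)
```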

However, the raw passages (OCR results) are pretty unprocessed, so an LLM can't recognize and extract information easily.
That can be a real challenge for high-quality embedding search and QA with rem.
There is no silver bullet for now. Hopefully OCR quality will improve, or we can use multi-modal models that truly understand GUIs.

@jasonjmcghee
Owner Author

I think this looks super promising:

https://github.com/ashvardanian/SwiftSemanticSearch

@jasonjmcghee jasonjmcghee pinned this issue Apr 30, 2024
4 participants