-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Metadata filtering on Pgvector #410
Conversation
Thanks! I'll let @jmartisk review this |
We will definitely need the PR rebased onto |
yes, will cleanup after review and comments. I know i should not sync with the github site, but too late |
…store Metadata: JSON, JSONB, COLUMNS - Add indexes on metadata - use Langchain4j metadata filter test - Change tests to @QuarkusTest. QuarkusUnitTest extension does not support ParametrizedTest with custom classes as argument.
I'll try to look into it asap, I'm not sure I'll have enough time before leaving until Tuesday. |
I really do like it, very nice job! But yeah a lot of this looks like it could be moved to upstream langchain4j, which currently doesn't support metadata filtering on pgvector. |
Yeah, would be awesome to have |
ok will move it to langchain4j, and adapt this one. |
Will be happy to review this PR and also the parts moving to @langchain4j |
last commit integrates : langchain4j/langchain4j#851 , this will no build because last langchain pgvector is not yet available. just for @jmartisk review. |
- Fix asciidoctor doc generation: interfaces that extends interfaces produce duplicated parameter in doc that make the build failing. - add integration test - Add buildStep to include dependencies in index - Add default no args constructor for cdi - !!! Need to do some tests in native mode.
Thanks, I'm going to look into it ASAP |
I can't get this to work somehow...
So I tried adding
in the application's properties, but then I get
Then I tried adding
but I am back at the first error |
To reproduce this problem, for example, take the (I built upstream langchain4j from your commit fcce3f1f26da1f1bba78b51267de6066a260ff1b and quarkus-langchain4j from 2dc8653 before trying this) |
interesting, does the rag-with-pgvector integration test work for you ? |
@jmartisk Are you using a pg container that has pgvector ? It should mention it in the log if not https://github.com/quarkiverse/quarkus-langchain4j/blob/main/pgvector/runtime/src/main/java/io/quarkiverse/langchain4j/pgvector/PgVectorEmbeddingStore.java#L114 |
@sebastienblanc should be automatic now with #399 |
Right, the integration test is failing too. |
Curious they are working for me. Could you show me the error ? |
When running the integration tests I get (this is the relevant part)
|
weird:
Perhaps link to the fact it re-use a previous pg16 container where connections have been set correctly in my side. Let me remove the container. By the way could you check my review question , I think it's link to the issue. |
…roduce bad behavior during first getConnection - add vector management in LangChain4jPgVectorBaseTest#clearstore in case it's the first call to the datasource. - add pgvector artifact classloader setting in runtime pom instead of application - rename PgVectorEmbeddingStore as QuarkusPgVectorEmbeddingStore
I've change the way we handle connection settings. some weird behavior at the first usage of the store in certain condition. |
…groalpoollistener - remove other settings. - getConnection return just the connection - Add an integretion test when embeddings table created by flyway.
@jmartisk , @sebastienblanc, should be ok. If you can give a try to validate your external project. I've added 2 integrations tests for rag integration. one with automatic table creation and another with flyway. For the connection settings I use a pool listener to configure pgvector settings. Then getConnection just give datasource connection. I've tested with internal test and external project. And with fresh pg containers. First time I have this behavior. |
Thanks, I will look into it today! |
.../src/main/java/io/quarkiverse/langchain4j/pgvector/runtime/PgVectorEmbeddingStoreConfig.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/quarkiverse/langchain4j/pgvector/runtime/PgVectorEmbeddingStoreConfig.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/quarkiverse/langchain4j/pgvector/runtime/PgVectorEmbeddingStoreConfig.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/quarkiverse/langchain4j/pgvector/runtime/PgVectorEmbeddingStoreConfig.java
Outdated
Show resolved
Hide resolved
Truly outstanding work, and it works for me now! I have just a few comments. On the upstream PR, I commented that we need to avoid using GSON directly (langchain4j/langchain4j#851 (comment)) - after fixing that, could you please add an exclusion for gson, just like we have it in the OpenAI module (https://github.com/quarkiverse/quarkus-langchain4j/blob/0.10.3/openai/openai-common/runtime/pom.xml#L42) - we have a dependency on |
Works for me! |
What's the status of this? |
@geoand I have to re-review it. How urgent is it? |
Not terrible urgent from point of view, but it does seem like a very contribution |
For this one i will create a new PR when langchain4j will be released with : langchain4j/langchain4j#851 |
… JSONB and Columns (#851) Implement metadata filtering for PgVectorEmbeddingStore with 3 implementations: COLUMN_PER_KEY, COMBINED_JSON and COMBINED_JSONB Related: quarkiverse/quarkus-langchain4j#410
Waiting for langchain4J 0.31.0 |
@humcqc will be released next week |
Too many conflicts. |
Not sure if this should be in quarkus-langchain4j or in langchain4j ?
PR open for discussion.