
[BUG] Langchain & Deeplake: SelfQueryRetriever Error on querying code #2598

Open
1 task
kaan9700 opened this issue Sep 19, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@kaan9700

kaan9700 commented Sep 19, 2023

Severity

None

Current Behavior

I have a deeplake vector database containing the code chunks of a project. Given an issue, I want to find the corresponding code chunks, so I wrote a SelfQueryRetriever.
However, it throws an error exactly when the query mentions an expression like 'train.py script'. If I leave this out, I get no error. The whole thing is supposed to work automatically for any issue, so keeping such expressions out of the issues is not an option.

Steps to Reproduce

import traceback

from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores import DeepLake


def CustomRetriever(files, dataset_path, issue):
    # Metadata fields the self-query LLM may use when it builds a filter
    metadata_field_info = [
        AttributeInfo(
            name="source",
            description="The source file the chunk was extracted from",
            type="string",
        ),
        AttributeInfo(
            name="file_name",
            description="The name of the file the chunk was extracted from",
            type="string",
        ),
        AttributeInfo(
            name="chunk_id",
            description="The ID of the chunk",
            type="string",
        ),
    ]
    document_content_description = "The source code of a project"
    model = ChatOpenAI(model="gpt-4")

    embeddings = OpenAIEmbeddings(disallowed_special=())
    # Open the existing dataset read-only, using the pure-Python execution engine
    db = DeepLake(dataset_path=dataset_path, read_only=True, embedding=embeddings, exec_option='python')
    docs = db.similarity_search(query=" ", k=10000000)  # loads all chunks (not used below)
    retriever = SelfQueryRetriever.from_llm(
        model, db, document_content_description, metadata_field_info, verbose=True
    )
    try:
        # The call that raises the error
        print('TEST', retriever.get_relevant_documents(
            f"Which documents contain code to resolve the following issue? -> {issue}"))
    except ValueError:
        print(traceback.format_exc())

Here is the error:

query='CNN instead of BERT model in train.py script, handle data better, generated using Tensorflow, integrated into logic, adapted to word vectors, change code' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='source', value='train.py'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='file_name', value='train.py')]) limit=None

Traceback (most recent call last):
  File "/Users/kaanerbay/GitHub/Github_Issue_Solver/langchainLogic/retriever2.py", line 93, in CustomRetriever
    print('TEST', retriever.get_relevant_documents(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/schema/retriever.py", line 208, in get_relevant_documents
    raise e
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/schema/retriever.py", line 201, in get_relevant_documents
    result = self._get_relevant_documents(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/retrievers/self_query/base.py", line 135, in _get_relevant_documents
    docs = self.vectorstore.search(new_query, self.search_type, **search_kwargs)
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/base.py", line 121, in search
    return self.similarity_search(query, **kwargs)
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/deeplake.py", line 475, in similarity_search
    return self._search(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/deeplake.py", line 348, in _search
    return self._search_tql(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/deeplake.py", line 267, in _search_tql
    result = self.vectorstore.search(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 429, in search
    utils.parse_search_args(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/deeplake/core/vectorstore/vector_search/utils.py", line 229, in parse_search_args
    raise ValueError(
ValueError: User-specified TQL queries are not support for exec_option=python.
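The log line above shows that the LLM turned the mention of 'train.py' into a metadata filter (source == 'train.py' and file_name == 'train.py'), and the traceback shows the search then going through langchain's _search_tql into deeplake's parse_search_args, which rejects TQL queries when exec_option=python. Below is a simplified, hypothetical model of that path, not langchain's or deeplake's actual code: the Comparison/Operation classes only mirror the log output, and filter_to_tql, build_search_kwargs, check_search_args, the "tql" key and the exact TQL rendering are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical stand-ins for the Comparison/Operation objects printed in the log.
@dataclass
class Comparison:
    comparator: str  # e.g. "eq"
    attribute: str   # metadata field, e.g. "source"
    value: str


@dataclass
class Operation:
    operator: str  # e.g. "and"
    arguments: List[Union[Comparison, "Operation"]]


# Assumed mapping from structured-query names to TQL syntax.
COMPARATORS = {"eq": "==", "ne": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
OPERATORS = {"and": "and", "or": "or"}

Filter = Union[Comparison, Operation]


def filter_to_tql(node: Filter) -> str:
    """Render a filter tree as a TQL-style WHERE clause (simplified model)."""
    if isinstance(node, Comparison):
        op = COMPARATORS[node.comparator]
        return f"metadata['{node.attribute}'] {op} '{node.value}'"
    sep = f" {OPERATORS[node.operator]} "
    return "(" + sep.join(filter_to_tql(arg) for arg in node.arguments) + ")"


def build_search_kwargs(flt: Optional[Filter]) -> dict:
    """No filter: plain vector search. A filter: a TQL query string."""
    if flt is None:
        return {}
    return {"tql": f"SELECT * WHERE {filter_to_tql(flt)}"}


def check_search_args(exec_option: str, search_kwargs: dict) -> None:
    """Mimics the guard from the traceback: TQL is rejected for exec_option='python'."""
    if "tql" in search_kwargs and exec_option == "python":
        raise ValueError(
            "User-specified TQL queries are not support for exec_option=python."
        )


if __name__ == "__main__":
    # The filter the LLM produced for the issue text mentioning 'train.py'
    flt = Operation("and", [
        Comparison("eq", "source", "train.py"),
        Comparison("eq", "file_name", "train.py"),
    ])
    print(build_search_kwargs(flt)["tql"])
    check_search_args("python", build_search_kwargs(None))  # no filter: no error
    try:
        check_search_args("python", build_search_kwargs(flt))
    except ValueError as err:
        print("ValueError:", err)
```

Under this model, a query that mentions no file name produces no filter, so no TQL is generated and exec_option=python passes; that would match the observation that rephrasing 'train.py script' avoids the error.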

Here is the issue text that was used:

a CNN should be used instead of the BERT model in the train.py script, because it can handle the type of data better.
The CNN should not be too complex, but also not too simple and should be generated using Tensorflow.
The CNN should be integrated into the logic and adapted according to the word vectors used. Change the code of it, as good as you can.

Expected/Desired Behavior

If I replace the expression 'train.py script' with, for example, 'training process', the error disappears and the query executes correctly. The query should also work when the issue mentions file names such as 'train.py'.

Python Version

3.10.13

OS

MacOS Ventura 13.5.2

IDE

PyCharm

Packages

langchain==0.0.293, lark==1.1.7, deeplake==3.6.26

Additional Context

No response

Possible Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR (Thank you!)
@kaan9700 kaan9700 added the bug Something isn't working label Sep 19, 2023
@adolkhan
Contributor

Hey @kaan9700!

Thanks for sharing the error! I'm curious: have you installed the latest deeplake version? Also, have you installed deeplake[enterprise]? The problem is related to exec_option not being cast correctly, which can happen either because you're using an old deeplake version or because you haven't installed deeplake[enterprise].

To install deeplake[enterprise], please run the following command:

pip install 'deeplake[enterprise]'
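To make the point about exec_option resolution concrete, here is a hypothetical model of how an execution option might be chosen. The function names and the selection rule are assumptions, not deeplake's implementation; only the option names python, compute_engine and tensor_db are taken from deeplake's exec_option values, and the module name indra for the enterprise package is an assumption. Under this model, an explicitly requested option, such as the exec_option='python' passed in the reproduction script, is kept as-is, so installing deeplake[enterprise] alone would not change it.

```python
import importlib.util
from typing import Optional

# exec_option values used by deeplake 3.x vector stores.
VALID_EXEC_OPTIONS = {"python", "compute_engine", "tensor_db"}


def enterprise_installed() -> bool:
    """Assumption: deeplake[enterprise] ships an importable module named 'indra'."""
    return importlib.util.find_spec("indra") is not None


def resolve_exec_option(requested: Optional[str], has_enterprise: bool) -> str:
    """Hypothetical rule for picking an exec_option (not deeplake's code)."""
    if requested in (None, "auto"):
        # Automatic choice: use the compiled engine only if it is available
        return "compute_engine" if has_enterprise else "python"
    if requested not in VALID_EXEC_OPTIONS:
        raise ValueError(f"unknown exec_option: {requested!r}")
    # An explicit choice is kept as-is, even when the enterprise package is installed
    return requested


if __name__ == "__main__":
    print(resolve_exec_option("python", has_enterprise=True))
    print(resolve_exec_option(None, has_enterprise=enterprise_installed()))
```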

@kaan9700
Author

kaan9700 commented Sep 20, 2023

Hey @adolkhan
I already have deeplake[enterprise] installed. Unfortunately, that does not resolve this error, and I have the latest deeplake version installed.

I noticed that this error occurs when the query contains file names like 'train.py' or 'package.json' together with other text.

@adolkhan
Contributor

I see, thank you! I will rerun the script and get back to you.
