[Issue]: pgvector query returning bytes instead of string #2667

Open
capella-ben opened this issue May 12, 2024 · 1 comment

Describe the issue

When running the pgvector example I get the following error:

m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Trying to create collection.
2024-05-12 19:51:30,510 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Use the existing collection `flaml_collection`.
File M:\OneDrive\Documents\dev\DISCO\pgVector\..\website\docs does not exist. Skipping.
2024-05-12 19:51:30,974 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 2 chunks.
2024-05-12 19:51:30,975 - autogen.agentchat.contrib.vectordb.pgvectordb - INFO - Error executing select on non-existent table: flaml_collection. Creating it instead. Error: relation "flaml_collection" does not exist 
LINE 1: SELECT id, metadatas, documents, embedding FROM flaml_collec...
                                                        ^
2024-05-12 19:51:31,007 - autogen.agentchat.contrib.vectordb.pgvectordb - INFO - Created table flaml_collection
VectorDB returns doc_ids:  [[b'bdfbc921', b'7968cf3c']]
Traceback (most recent call last):
  File "m:\OneDrive\Documents\dev\DISCO\pgVector\autogen_pgvector_1.py", line 84, in <module>
    chat_result = ragproxyagent.initiate_chat(
  File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\agentchat\conversable_agent.py", line 1004, in initiate_chat
    msg2send = message(_chat_info["sender"], _chat_info["recipient"], kwargs)
  File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\agentchat\contrib\retrieve_user_proxy_agent.py", line 631, in message_generator
    doc_contents = sender._get_context(sender._results)
  File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\agentchat\contrib\retrieve_user_proxy_agent.py", line 426, in _get_context
    _doc_tokens = self.custom_token_count_function(doc["content"], self._model)
  File "m:\OneDrive\Documents\dev\DISCO\pgVector\.venv\lib\site-packages\autogen\token_count_utils.py", line 69, in count_token
    raise ValueError(f"input must be str, list or dict, but we got {type(input)}")
ValueError: input must be str, list or dict, but we got <class 'bytes'>    

After some investigation, it appears that psycopg (3.1.19) always returns the id and descriptions fields as bytes, while pgvectordb.py expects strings.
It is unclear why these two fields are always returned as bytes. Other fields in other tables on my Postgres server are returned as strings, as expected.

The only workaround I can find is to decode the bytes in pgvectordb.py:

[Screenshot: update to pgvectordb.py]
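
The contents of the screenshot are not available as text. As a rough sketch of the kind of decoding described, and not the actual patch, a small helper could normalize values before they reach the token counter. The helper name `to_str`, the row layout, and the sample values are assumptions for illustration, modeled on the log output above; they are not taken from pgvectordb.py.

```python
def to_str(value, encoding="utf-8"):
    """Decode bytes-like values to str; return anything else unchanged."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value).decode(encoding)
    return value


# Hypothetical row in (id, metadata, document) order, with the id and
# document arriving as bytes, as in the "VectorDB returns doc_ids" log line.
row = (b"bdfbc921", {"source": "example"}, b"some document text")

# Decode each field; the metadata dict passes through untouched.
doc_id, metadata, document = (to_str(value) for value in row)

print(type(doc_id).__name__, type(document).__name__)
```

Passing non-bytes values through unchanged keeps the helper safe to apply to every column, including ones that already come back as strings.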

Is this an issue others are facing?

Additional Information

pyautogen-0.2.27

ErikQQY commented May 26, 2024

Same issue here, stack trace:

    chat_result = ragproxyagent.initiate_chat(
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/autogen/agentchat/conversable_agent.py", line 988, in initiate_chat
    msg2send = message(_chat_info["sender"], _chat_info["recipient"], kwargs)
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py", line 631, in message_generator
    doc_contents = sender._get_context(sender._results)
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py", line 426, in _get_context
    _doc_tokens = self.custom_token_count_function(doc["content"], self._model)
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/autogen/token_count_utils.py", line 65, in count_token
    return _num_token_from_text(input, model=model)
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/autogen/token_count_utils.py", line 75, in _num_token_from_text
    encoding = tiktoken.encoding_for_model(model)
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/tiktoken/model.py", line 101, in encoding_for_model
    return get_encoding(encoding_name_for_model(model_name))
  File "/root/anaconda3/envs/qqy/lib/python3.10/site-packages/tiktoken/model.py", line 77, in encoding_name_for_model
    if model_name in MODEL_TO_ENCODING:
TypeError: unhashable type: 'dict'
