Cannot restore large index #66

Open
NicolasAlmerge opened this issue Dec 13, 2023 · 3 comments

Comments

@NicolasAlmerge

Hello,

I am using Python 3.10.9 and vectordb==0.0.20 (the latest release as of this date), and I am having trouble restoring saved data.

I have two large files, A and B. When I index, snapshot, and restore each of them separately, everything works fine.

When I read and parse both files, index all of their documents together, and save the combined index, the snapshot succeeds. However, when I try to restore the data, I get the following error:

Traceback (most recent call last):
  ...
  File "~/.local/lib/python3.10/site-packages/vectordb/db/executors/inmemory_exact_indexer.py", line 86, in restore
    self._indexer = InMemoryExactNNIndex[self._input_schema](index_file_path=snapshot_file)
  File "~/.local/lib/python3.10/site-packages/docarray/index/backends/in_memory.py", line 68, in __init__
    self._docs = DocList.__class_getitem__(
  File "~/.local/lib/python3.10/site-packages/docarray/array/doc_list/io.py", line 810, in load_binary
    return cls._load_binary_all(
  File "~/.local/lib/python3.10/site-packages/docarray/array/doc_list/io.py", line 608, in _load_binary_all
    proto.ParseFromString(d)
google.protobuf.message.DecodeError: Error parsing message

Given the tests described above, I suspect the combined index is simply too large and that this is what raises the error. Does anyone know what can be done to fix this?

@JoanFM
Member

JoanFM commented Dec 14, 2023

Hey @NicolasAlmerge, does it happen every time you try to do so?

@NicolasAlmerge
Author

Hey @JoanFM, yes it does. Small and medium files work well separately, but restoring from a large file (~3GB, built by combining multiple smaller files) crashes with this error. Maybe it is an out-of-bounds array index caused by a 32-bit integer overflow, since the maximum signed 32-bit integer is about 2.1 billion (2^31 - 1) and the snapshot is larger than that in bytes, but I do not know.
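To make the overflow guess concrete, here is a minimal sketch of the arithmetic. It only compares a byte count against the signed 32-bit maximum; it does not call vectordb, docarray, or protobuf, and it does not prove that this limit is the actual cause. The helper names and any path passed to `snapshot_exceeds_limit` are hypothetical.

```python
import os

# Largest value a signed 32-bit integer can hold: 2**31 - 1 = 2,147,483,647.
INT32_MAX = 2**31 - 1


def exceeds_int32_limit(size_bytes: int) -> bool:
    """Return True if a byte count does not fit in a signed 32-bit integer."""
    return size_bytes > INT32_MAX


def snapshot_exceeds_limit(path: str) -> bool:
    """Apply the same check to a file on disk (the path is hypothetical)."""
    return exceeds_int32_limit(os.path.getsize(path))


if __name__ == "__main__":
    approx_snapshot = 3 * 1024**3  # roughly the ~3GB size reported in this issue
    print(INT32_MAX)
    print(exceeds_int32_limit(approx_snapshot))
```

A snapshot of about 3GB is past that threshold, while each of the smaller snapshots that restore correctly would presumably be below it, which is at least consistent with the behaviour described here.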

@JoanFM
Member

JoanFM commented Dec 14, 2023

Can you provide a dummy example so I can reproduce this? What exact code are you running, and which jina and docarray versions are you using?
