
evaluate_sbert_multi_gpu - metrics.compute() unable to read cache file #134

Open
ashokrajab opened this issue Mar 20, 2023 · 1 comment · May be fixed by #155


ashokrajab commented Mar 20, 2023

I'm trying to run beir/examples/retrieval/evaluation/dense/evaluate_sbert_multi_gpu.py. Doing so, I end up with the error below.

```
Traceback (most recent call last):
  File "evaluate_sbert_multi_gpu.py", line 62, in <module>
    results = retriever.retrieve(corpus, queries)
  File "/data/user/beir/beir/retrieval/evaluation.py", line 23, in retrieve
    return self.retriever.search(corpus, queries, self.top_k, self.score_function, **kwargs)
  File "/data/user/beir/beir/retrieval/search/dense/exact_search_multi_gpu.py", line 150, in search
    cos_scores_top_k_values, cos_scores_top_k_idx, chunk_ids = metric.compute()
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/evaluate/module.py", line 433, in compute
    self._finalize()
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/evaluate/module.py", line 390, in _finalize
    self.data = Dataset(**reader.read_files([{"filename": f} for f in file_paths]))
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 260, in read_files
    pa_table = self._read_files(files, in_memory=in_memory)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 195, in _read_files
    pa_table: Table = self._get_table_from_filename(f_dict, in_memory=in_memory)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 331, in _get_table_from_filename
    table = ArrowReader.read_table(filename, in_memory=in_memory)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 352, in read_table
    return table_cls.from_file(filename)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/table.py", line 1065, in from_file
    table = _memory_mapped_arrow_table_from_file(filename)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/table.py", line 52, in _memory_mapped_arrow_table_from_file
    pa_table = opened_stream.read_all()
  File "pyarrow/ipc.pxi", line 750, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected to be able to read 80088040 bytes for message body, got 80088032
```

Command used: `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python evaluate_sbert_multi_gpu.py`
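For context on what the `OSError` means: the Arrow IPC reader found a message header promising 80,088,040 bytes but the on-disk cache file was 8 bytes short, which typically indicates a writer process died (e.g. killed by the OOM killer) before flushing its cache chunk. A stdlib-only sketch of that failure mode (`read_message_body` is an illustrative helper, not the `datasets`/`pyarrow` API):

```python
import io


def read_message_body(stream, expected_len):
    """Read exactly expected_len bytes, mirroring the IPC reader's length check."""
    data = stream.read(expected_len)
    if len(data) < expected_len:
        raise OSError(
            f"Expected to be able to read {expected_len} bytes "
            f"for message body, got {len(data)}"
        )
    return data


# A cache file truncated mid-write (here: 8 bytes short) reproduces the error shape.
truncated = io.BytesIO(b"\x00" * (80088040 - 8))
try:
    read_message_body(truncated, 80088040)
except OSError as e:
    print(e)  # Expected to be able to read 80088040 bytes for message body, got 80088032
```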

@thakur-nandan Any idea how to proceed?

thakur-nandan (Member) commented:

The reason for this error is insufficient host memory (CPU RAM). I would suggest evaluating on a larger GPU cluster, or reducing the batch size.
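Reducing the batch size helps because each worker then materializes (and writes to the cache) a smaller chunk of embeddings at a time, so peak host memory scales with the batch size rather than the corpus size. A minimal, library-free sketch of that idea (function and parameter names are illustrative, not BEIR's API; in the actual script the analogous knob is the batch size passed to the encoder):

```python
def encode_in_batches(items, encode, batch_size=16):
    """Encode items chunk by chunk so only one batch's output buffer
    is in flight at a time, keeping peak memory proportional to batch_size."""
    results = []
    for start in range(0, len(items), batch_size):
        # Each slice is encoded and appended before the next is materialized.
        results.extend(encode(items[start:start + batch_size]))
    return results
```

Halving the batch size roughly halves the transient buffer each process holds before the results are flushed, at the cost of more encode calls.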

@NouamaneTazi NouamaneTazi linked a pull request Aug 15, 2023 that will close this issue