Using BM25 for sparse embedding on a pretty big dataset (e.g. FiQA), I get the following assertion error: AssertionError: Elastic-Search Window too large, Max-Size = 10000
The function that calls BM25 is the following:
def sparse_embeddings_bm25(dataset_name, corpus, queries, qrels, k_primes):
    '''
    PURPOSE: compute the sparse embeddings using the BM25 implementation from BEIR and Elasticsearch
    ARGUMENTS:
    - dataset_name: string describing the dataset name
    - corpus: sequence of documents
    - queries: sequence of queries
    - qrels: ground truth of query-document relevance
    - k_primes: list of numbers of top k prime documents to return
    RETURN: see embeddings return values
    '''
    hostname = 'localhost'
    index_name = dataset_name
    initialize = True  # Delete any existing index with the same name and reindex all documents
    print(f'{dataset_name} - BM25')
    model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)  # Defining the BM25
    return embeddings('Sparse', model, corpus, queries, qrels, k_primes)
I've already tried creating the index before running BM25 and setting initialize = False, but then I would somehow need to pass the corpus and the queries to that index.
Note that I'm running the whole application in Google Colab Pro; I don't know whether that matters.
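For context, the assertion comes from Elasticsearch capping `from + size` at the index setting `index.max_result_window`, which defaults to 10000. One possible workaround (a sketch, not a tested fix for BEIR's BM25 wrapper) is to raise that setting on the index; the settings payload would look roughly like this, where the index name and window value are illustrative assumptions:

```python
# Sketch of an index-settings payload that raises Elasticsearch's
# default result-window cap (index.max_result_window, default 10000).
# The value 100000 here is an arbitrary example; pick one large enough
# for the dataset's corpus size.
settings = {
    "settings": {
        "index": {
            "max_result_window": 100000  # assumed value, tune to your needs
        }
    }
}

# With a live cluster, this could be applied roughly like:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   es.indices.create(index="fiqa", body=settings)  # "fiqa" is hypothetical
print(settings["settings"]["index"]["max_result_window"])
```

Whether this plays well with `initialize=True` (which deletes and recreates the index) is uncertain; the setting may need to be applied after BM25 creates the index, via an update-settings call instead of create.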