AssertionError: Elastic-Search Window too large, Max-Size = 10000 #140

Open
zuliani99 opened this issue Apr 18, 2023 · 0 comments

When using BM25 for sparse embeddings on a fairly large dataset (e.g. FiQA), I get the following assertion error:
AssertionError: Elastic-Search Window too large, Max-Size = 10000

The function that calls BM25 is the following:

from beir.retrieval.search.lexical import BM25Search as BM25  # assumed import, not shown in the original post

def sparse_embeddings_bm25(dataset_name, corpus, queries, qrels, k_primes):
  '''
  PURPOSE: compute the sparse embeddings using the BM25 implementation from beir and Elasticsearch
  ARGUMENTS:
    - dataset_name: string with the dataset name
    - corpus: sequence of documents
    - queries: sequence of queries
    - qrels: ground truth of query-document relevance
    - k_primes: list of top-k values (number of documents to return per query)
  RETURN: see the return values of embeddings (helper defined elsewhere)
  '''
  hostname = 'localhost' 
  index_name = dataset_name
  initialize = True # Delete existing index with same name and reindex all documents

  print(f'{dataset_name} - BM25')
  model = BM25(index_name=index_name, hostname=hostname, initialize=initialize) # Defining the BM25
  return embeddings('Sparse', model, corpus, queries, qrels, k_primes)

I've already tried creating the index before running BM25 and setting initialize = False, but then I need some way to pass the corpus and the queries to the index.

Note that I'm running the whole application in Google Colab Pro; I don't know whether that matters.
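One possible cause, which I have not confirmed: Elasticsearch's default `index.max_result_window` is 10000, and the assertion message matches that limit, so the error may appear whenever a value in `k_primes` asks for that many hits or more. Below is a minimal sketch of capping the requested values before they reach the model. The helper `clamp_top_k` and the `reserved=1` margin are hypothetical and not part of beir; the margin assumes the search may request one extra hit per query.

```python
ES_MAX_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def clamp_top_k(k_primes, max_window=ES_MAX_WINDOW, reserved=1):
    """Cap each requested k so that k + reserved never exceeds max_window.

    `reserved` is an assumed safety margin, in case the search layer
    asks for one more hit than the caller requested.
    """
    limit = max_window - reserved
    return [min(k, limit) for k in k_primes]


if __name__ == "__main__":
    # Values at or above the window are reduced; smaller ones pass through.
    print(clamp_top_k([10, 1000, 10000, 50000]))
```

With this in place, the call in the function above would become `embeddings('Sparse', model, corpus, queries, qrels, clamp_top_k(k_primes))`, assuming `embeddings` forwards those values as the number of hits to retrieve.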
