ZeroDivisionError while running update with new data #1075

Open
mrp3anut opened this issue Nov 27, 2023 · 0 comments

I am trying to train my model incrementally because I have over 100M vectors that I want to reduce. I get the error below when I call update.
I understand that the error has to do with the kNN calculations and the logical statement inside init_update, but I could not work out whether it is related to small batches or is simply an error case that is not being handled. My batches are around 300K x 768.

The code is very simple:

for idx, dx in tqdm(enumerate(dxs), total=len(dxs), position=0, leave=True):
    embeddings = np.load(dx)
    if idx == 0:
        umap_model.fit(embeddings)
    else:
        umap_model.update(embeddings)

Error

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[4], line 6
      4     umap_model.fit(embeddings)
      5 else:
----> 6     umap_model.update(embeddings)

File ~/.conda/envs/bertopic/lib/python3.10/site-packages/umap/umap_.py:3482, in UMAP.update(self, X, force_all_finite)
   3478 init = np.zeros(
   3479     (self._raw_data.shape[0], self.n_components), dtype=np.float32
   3480 )
   3481 init[:original_size] = self.embedding_
-> 3482 init_update(init, original_size, self._knn_indices)
   3484 if self.n_epochs is None:
   3485     n_epochs = 0

ZeroDivisionError: division by zero
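
One way the suspected logical statement could lead to a division by zero, sketched under an assumption: suppose init_update places each new point at the average of those neighbours that belong to the original data (index below original_size). A new point whose neighbours are all other new points would then have nothing to average over. The code below is not umap's own code; `rows_without_old_neighbors` and the toy array are hypothetical and only illustrate how such rows could be spotted in a kNN index array.

```python
import numpy as np


def rows_without_old_neighbors(knn_indices, original_size):
    """Return indices of new rows whose neighbours are all new rows.

    Hypothetical helper, not part of umap. Rows at or beyond
    `original_size` are the ones that were added by the update.
    """
    new_rows = np.arange(original_size, knn_indices.shape[0])
    # True where at least one neighbour index points into the original data.
    has_old = (knn_indices[new_rows] < original_size).any(axis=1)
    return new_rows[~has_old]


# Toy kNN index array: rows 0-2 are "original", rows 3-5 are "new".
toy_knn = np.array([
    [0, 1, 2],
    [1, 0, 2],
    [2, 1, 0],
    [3, 0, 4],  # has one original neighbour (row 0)
    [4, 5, 3],  # only new neighbours
    [5, 4, 3],  # only new neighbours
])

isolated = rows_without_old_neighbors(toy_knn, original_size=3)
print(isolated)  # rows that would have a neighbour count of zero
```

If the assumption holds, any row reported here would make the average divide by zero, which would point to an unhandled case rather than to the batch size as such.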
