I am trying to train my model incrementally because I have over 100M vectors to reduce. I get the error below when I call update.
I understand that the error has to do with the kNN calculations and the conditional inside init_update, but I could not work out whether it is related to small batches or is simply an edge case that is not being handled. My batches are around 300K x 768.
Very simple code
for idx, dx in tqdm(enumerate(dxs), total=len(dxs), position=0, leave=True):
    embeddings = np.load(dx)
    if idx == 0:
        umap_model.fit(embeddings)
    else:
        umap_model.update(embeddings)
Error
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[4], line 6
      4         umap_model.fit(embeddings)
      5     else:
----> 6         umap_model.update(embeddings)

File ~/.conda/envs/bertopic/lib/python3.10/site-packages/umap/umap_.py:3482, in UMAP.update(self, X, force_all_finite)
   3478 init = np.zeros(
   3479     (self._raw_data.shape[0], self.n_components), dtype=np.float32
   3480 )
   3481 init[:original_size] = self.embedding_
-> 3482 init_update(init, original_size, self._knn_indices)
   3484 if self.n_epochs is None:
   3485     n_epochs = 0

ZeroDivisionError: division by zero
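To make my guess concrete, here is a minimal pure-Python sketch of how such a division by zero could arise. It assumes that init_update places each new point at the mean of those nearest neighbors that belong to the originally fitted data, and divides by the number of such neighbors. This is not the umap-learn source: the function names `count_original_neighbors` and `average_init`, and the toy data, are hypothetical and only illustrate the suspected failure mode.

```python
def count_original_neighbors(knn_indices, n_original):
    """For each new row, count neighbors that index into the original data."""
    return [
        sum(1 for j in row if j < n_original)
        for row in knn_indices[n_original:]
    ]


def average_init(init, knn_indices, n_original):
    """Hypothetical stand-in: put each new point at the mean of its
    original-sample neighbors (assumed behavior, not the library code)."""
    dim = len(init[0])
    for i in range(n_original, len(knn_indices)):
        sums = [0.0] * dim
        n = 0
        for j in knn_indices[i]:
            if j < n_original:
                n += 1
                for d in range(dim):
                    sums[d] += init[j][d]
        # If no neighbor is an original sample, n == 0 and this division fails.
        init[i] = [s / n for s in sums]
    return init


# Toy data: 3 original points (rows 0-2) and 2 new points (rows 3-4), k = 2.
knn = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 3]]
init = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]

counts = count_original_neighbors(knn, 3)  # row 3 has one anchor, row 4 has none

try:
    average_init(init, knn, 3)
    failed = False
except ZeroDivisionError:
    failed = True
```

If this assumption holds, the error would not be about batches being too small. It would occur whenever at least one new point has all of its nearest neighbors inside the new batch, which seems plausible with batches of 300K points.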