TypeError: 'numpy.float64' object cannot be interpreted as an integer #343

Open
Gulfon opened this issue Jul 20, 2023 · 6 comments

@Gulfon

Gulfon commented Jul 20, 2023

Hi there,

When trying to run the example code I encounter the following:

from sklearn.datasets import fetch_20newsgroups
from top2vec import Top2Vec

newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)
2023-07-20 13:51:37,083 - top2vec - INFO - Pre-processing documents for training
2023-07-20 13:51:48,891 - top2vec - INFO - Creating joint document/word embedding
2023-07-20 14:01:43,811 - top2vec - INFO - Creating lower dimension embedding of documents
2023-07-20 14:02:09,146 - top2vec - INFO - Finding dense areas of documents

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 666, in __init__
    self.compute_topics(umap_args=umap_args, hdbscan_args=hdbscan_args, topic_merge_delta=topic_merge_delta)
  File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 1266, in compute_topics
    cluster = hdbscan.HDBSCAN(**hdbscan_args).fit(umap_model.embedding_)
  File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 80, in _tree_to_labels
    labels, probabilities, stabilities = get_clusters(
  File "hdbscan/_hdbscan_tree.pyx", line 659, in hdbscan._hdbscan_tree.get_clusters
  File "hdbscan/_hdbscan_tree.pyx", line 733, in hdbscan._hdbscan_tree.get_clusters

TypeError: 'numpy.float64' object cannot be interpreted as an integer

All of the libraries are updated to their latest versions. I have also tried downgrading numpy and hdbscan, with no result.

I am fairly new to Python and not sure if there's something I am doing wrong here. I did see some discussion of this error on the hdbscan issues page, but their solution there was to upgrade to the most recent version, which did not help in my case.
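For readers new to Python, the message in the traceback above means that a floating-point value reached code that requires a true integer (an index, a count, or a `range` bound). Below is a minimal sketch of the same class of error. It uses a built-in `float` instead of `numpy.float64` so that it runs without NumPy installed, and the helper name `as_count` is hypothetical, not part of hdbscan or top2vec.

```python
import operator


def as_count(value):
    """Return value as an int if it is a true integer, else raise TypeError.

    operator.index applies the same rule Python uses for indices and
    range() bounds: floats are rejected even when they are whole numbers.
    """
    return operator.index(value)


print(as_count(3))  # integers pass through unchanged

try:
    as_count(3.0)  # a float, even a whole one, is rejected
except TypeError as exc:
    print(f"TypeError: {exc}")

# An explicit conversion is what integer-only code needs to accept a float.
print(as_count(int(3.0)))
```

In this thread the last frames of the traceback are inside hdbscan's compiled `_hdbscan_tree` module, so the float appears to be produced there rather than in the calling code.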

@BobTourne

I am running into the same problem

@sieu-tran

I have the same issue. All embedding models ran into this error. Using Python 3.10 right now!

@Gulfon
Author

Gulfon commented Jul 31, 2023

So, I switched to a different method but encountered the same error there. I am using Python 3.11, so YMMV, but what helped me was installing older versions of a couple of libraries. I am not sure whether the second line is required for top2vec.

%pip install --user --no-warn-script-location --disable-pip-version-check Cython==0.29.34 numpy==1.23.5
%pip install --user --no-warn-script-location --disable-pip-version-check --no-build-isolation hdbscan==0.8.29
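To confirm that the pins actually took effect in the interpreter that runs top2vec, the installed versions can be compared against them. This is a minimal sketch using only the standard library. The `PINS` dictionary and the helper `compare_to_pins` are hypothetical names, and the version numbers are copied from the commands above as one user's workaround, not as recommended versions.

```python
from importlib import metadata

# Pins copied from the pip commands above; they are a workaround reported by
# one user, not officially recommended versions.
PINS = {"Cython": "0.29.34", "numpy": "1.23.5", "hdbscan": "0.8.29"}


def compare_to_pins(pins):
    """Map each package name to (installed_version_or_None, pinned_version)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, pinned)
    return report


for name, (installed, pinned) in compare_to_pins(PINS).items():
    status = "matches" if installed == pinned else "differs"
    print(f"{name}: installed={installed}, pinned={pinned} ({status})")
```

Note that `%pip` is the IPython/Jupyter magic; in a plain terminal the same commands start with `pip`.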

@sieu-tran

Folks, I found the problem and a "fix"! It's actually a gcc and hdbscan problem: a C/C++ compiler seems to be a build dependency for hdbscan. The fix for me was installing VC++ 2022 and adding the C++ Desktop Development package. pip install now works for hdbscan, which enables top2vec to run properly. I hope this helps!

@jvanelteren

For me this did not work. What did work was uninstalling hdbscan, then cloning the repository and installing it manually, as per scikit-learn-contrib/hdbscan#607.

@BobTourne

It is indeed a problem with HDBSCAN, related to this issue.

Updating HDBSCAN to 0.8.33 worked for me.
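A quick way to check whether an environment already has at least that release is sketched below. It uses only the standard library. The names `MINIMUM`, `parse_release`, and `is_at_least` are hypothetical, and the parser is deliberately simple: it reads leading numeric components only, so a pre-release such as `0.8.33rc1` is treated as older.

```python
from importlib import metadata

MINIMUM = (0, 8, 33)  # version reported as working in the comment above


def parse_release(version):
    """Turn '0.8.33' into (0, 8, 33); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(version, minimum):
    """True if the dotted version string is >= the minimum tuple."""
    return parse_release(version) >= minimum


try:
    installed = metadata.version("hdbscan")
except metadata.PackageNotFoundError:
    print("hdbscan is not installed")
else:
    verdict = "ok" if is_at_least(installed, MINIMUM) else "older than 0.8.33"
    print(f"hdbscan {installed}: {verdict}")
```

If the reported version is older, upgrading hdbscan in the same environment is the step that worked in the comment above; other comments in this thread describe cases where a rebuild or manual install was needed instead.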
