
dist_metrics error with default settings #139

Open
architec997 opened this issue Nov 16, 2017 · 14 comments


@architec997

Producing a simple dataframe via

import numpy as np
import pandas as pd

x = np.linspace(0, 100, 200)
y = np.arange(0, 200)
xy, _ = np.meshgrid(x, y)
noise = 0.3 * np.random.random((200, 200))
series = np.sin(xy + 5 * noise) + noise
series[0, :] += 10 * np.random.random(200)
data = pd.DataFrame(series)

I try to run HDBSCAN clustering with the default arguments

import hdbscan

clusterer = hdbscan.HDBSCAN().fit(data)

And get the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-31e012db38f8> in <module>()
      1 from sklearn.cluster import DBSCAN
----> 2 clusterer = hdbscan.HDBSCAN().fit(data)

~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in fit(self, X, y)
    814          self._condensed_tree,
    815          self._single_linkage_tree,
--> 816          self._min_spanning_tree) = hdbscan(X, **kwargs)
    817 
    818         if self.prediction_data:

~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
    534                     _hdbscan_prims_kdtree)(X, min_samples, alpha,
    535                                            metric, p, leaf_size,
--> 536                                            gen_min_span_tree, **kwargs)
    537             else:
    538                 (single_linkage_tree, result_min_span_tree) = memory.cache(

~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py in __call__(self, *args, **kwargs)
    360 
    361     def __call__(self, *args, **kwargs):
--> 362         return self.func(*args, **kwargs)
    363 
    364     def call_and_shelve(self, *args, **kwargs):

~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in _hdbscan_prims_kdtree(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
    168 
    169     # TO DO: Deal with p for minkowski appropriately
--> 170     dist_metric = DistanceMetric.get_metric(metric, **kwargs)
    171 
    172     # Get distance to kth nearest neighbour

TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'

I also tried explicitly specifying other metrics (metric='manhattan', etc.); that did not help.
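For what it's worth, this kind of TypeError is what CPython raises when a C-level method descriptor is called through the class with the wrong type as its first argument, which suggests get_metric lost its classmethod binding in the affected build. A minimal illustration of the mechanism using a builtin type (not hdbscan itself):

```python
# A C-level method descriptor accessed on the class treats its first
# positional argument as `self`; passing anything else raises the same
# family of TypeError seen in the traceback above.
try:
    str.upper(5)
except TypeError as exc:
    message = str(exc)
    # Exact wording varies by Python version, e.g.:
    # "descriptor 'upper' requires a 'str' object but received a 'int'"
    print(message)
```

If get_metric in the installed hdbscan.dist_metrics ended up as a plain method descriptor instead of a classmethod, calling DistanceMetric.get_metric('euclidean') would fail in exactly this way.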

@lmcinnes
Collaborator

I suspect this is an arg order issue in the code somewhere, possibly due to additions. This is a little disconcerting. Let me see if I can track this down later today.

@lmcinnes
Collaborator

Sorry, I ran out of time today. I'll have to try and get to this a little later. My apologies for the delay.

@farfan92

Also getting "TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'", even when just using the simple case from the documentation.

@farfan92

Error occurs with RobustSingleLinkage as well.

To avoid get_metric receiving the string 'euclidean', 'manhattan', etc. instead of the expected object, I used a precomputed distance matrix. Now I get:

clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=None, metric='precomputed').fit(gower_df)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
in <module>()
----> 1 clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=None, metric='precomputed').fit(D)

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in fit(self, X, y)
    814          self._condensed_tree,
    815          self._single_linkage_tree,
--> 816          self._min_spanning_tree) = hdbscan(X, **kwargs)
    817
    818         if self.prediction_data:

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
    526                     _hdbscan_generic)(X, min_samples,
    527                                       alpha, metric, p, leaf_size,
--> 528                                       gen_min_span_tree, **kwargs)
    529         elif metric in KDTree.valid_metrics:
    530             # TO DO: Need heuristic to decide when to go to boruvka;

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\memory.py in __call__(self, *args, **kwargs)
    281             return _load_output(self._output_dir, _get_func_fullname(self.func),
    282                                 timestamp=self.timestamp,
--> 283                                 metadata=self.metadata, mmap_mode=self.mmap_mode,
    284                                 verbose=self.verbose)
    285

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in _hdbscan_generic(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
     85                                            min_samples, alpha)
     86
---> 87     min_spanning_tree = mst_linkage_core(mutual_reachability)
     88
     89     # mst_linkage_core does not generate a full minimal spanning tree

hdbscan/_hdbscan_linkage.pyx in hdbscan._hdbscan_linkage.mst_linkage_core (hdbscan/_hdbscan_linkage.c:2894)()

hdbscan/_hdbscan_linkage.pyx in hdbscan._hdbscan_linkage.mst_linkage_core (hdbscan/_hdbscan_linkage.c:2281)()

NameError: name 'np' is not defined
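For reference, the precomputed route itself expects nothing exotic: just a dense, symmetric array of pairwise distances. A minimal sketch building one with scipy (the hdbscan call is left as a comment, since the whole point of this thread is that the install is broken):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# 50 random points in 3-D; pdist returns condensed pairwise distances,
# squareform expands them into the dense symmetric matrix that a
# metric='precomputed' clusterer expects.
X = np.random.RandomState(0).rand(50, 3)
D = squareform(pdist(X, metric='euclidean'))

# With a working install this feeds straight into HDBSCAN:
#   import hdbscan
#   labels = hdbscan.HDBSCAN(min_cluster_size=5, metric='precomputed').fit_predict(D)
print(D.shape)  # (50, 50)
```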

@lmcinnes
Collaborator

Sorry, I'm having trouble reproducing this. Can you tell me a little more about your setup?

@architec997
Author

I also checked: using a precomputed distance matrix, I get the same error as farfan92.

Ubuntu 17.10, Anaconda 5.0.1, Python 3.6. The packages installed in the venv I'm using are:
packages in environment at /home/vladimir/anaconda3/envs/py36:

  • asn1crypto 0.22.0 py36h265ca7c_1
  • bleach 2.0.0 py36h688b259_0
  • ca-certificates 2017.08.26 h1d4fec5_0
  • certifi 2017.7.27.1 py36h8b7b77e_0
  • cffi 1.10.0 py36had8d393_1
  • chardet 3.0.4 py36h0f667ec_1
  • chardet 3.0.4
  • click 6.7
  • clickclick 1.2.2
  • connexion 1.1.16
  • cryptography 2.0.3 py36ha225213_1
  • cycler 0.10.0 py36h93f1223_0
  • cython 0.26.1 py36h21c49d0_0
  • dbus 1.10.22 h3b5a359_0
  • decorator 4.1.2 py36hd076ac8_0
  • entrypoints 0.2.3 py36h1aec115_2
  • expat 2.2.4 h6ea4f2b_2
  • fastdtw 0.3.2
  • Flask 0.12.2
  • fontconfig 2.12.4 h88586e7_1
  • freetype 2.8 h52ed37b_0
  • glib 2.53.6 h5d9569c_2
  • gmp 6.1.2 hb3b607b_0
  • gst-plugins-base 1.12.2 he3457e5_0
  • gstreamer 1.12.2 h4f93127_0
  • h5py 2.7.0 py36he81ebca_1
  • hdbscan 0.8.11 py36_0 conda-forge
  • hdf5 1.10.1 hb0523eb_0
  • html5lib 0.999999999 py36h2cfc398_0
  • icu 58.2 h211956c_0
  • idna 2.6 py36h82fb2a8_1
  • idna 2.6
  • inflection 0.3.1
  • intel-openmp 2018.0.0 h15fc484_7
  • ipykernel 4.6.1 py36hbf841aa_0
  • ipython 6.1.0 py36hc72a948_1
  • ipython_genutils 0.2.0 py36hb52b0d5_0
  • ipywidgets 7.0.0 py36h7b55c3a_0
  • itsdangerous 0.24
  • jedi 0.10.2 py36h552def0_0
  • jinja2 2.9.6 py36h489bce4_1
  • jpeg 9b h024ee3a_2
  • jsonschema 2.6.0 py36h006f8b5_0
  • jupyter 1.0.0 py36h9896ce5_0
  • jupyter_client 5.1.0 py36h614e9ea_0
  • jupyter_console 5.2.0 py36he59e554_1
  • jupyter_core 4.3.0 py36h357a921_0
  • keras 2.0.8 py36hc0b6f7c_0
  • libedit 3.1 heed3624_0
  • libffi 3.2.1 hd88cf55_4
  • libgcc 7.2.0 h69d50b8_2
  • libgcc-ng 7.2.0 h7cc24e2_2
  • libgfortran 3.0.0 1
  • libgfortran-ng 7.2.0 h9f7466a_2
  • libgpuarray 0.6.9 0
  • libiconv 1.15 h63c8f33_5
  • libpng 1.6.32 hda9c8bc_2
  • libprotobuf 3.4.0 0
  • libsodium 1.0.13 h31c71d8_2
  • libstdcxx-ng 7.2.0 h7a57d05_2
  • libxcb 1.12 h84ff03f_3
  • libxml2 2.9.4 h6b072ca_5
  • mako 1.0.7 py36h0727276_0
  • markupsafe 1.0 py36hd9260cd_1
  • matplotlib 2.1.0 py36hba5de38_0
  • mistune 0.8.1 py36h3d5977c_0
  • mkl 2018.0.0 hb491cac_4
  • mkl-service 1.1.2 py36h17a0993_4
  • nbconvert 5.3.1 py36hb41ffb7_0
  • nbformat 4.4.0 py36h31c9010_0
  • ncurses 6.0 h9df7e31_2
  • networkx 2.0 py36h7e96fb8_0
  • nose 1.3.7 py36hcdf7029_2
  • notebook 5.2.1 py36h690a4eb_0
  • numpy 1.13.3
  • numpy 1.12.1 py36he24570b_1
  • openssl 1.0.2m h8cfc7e7_0
  • pandas 0.21.0 py36h78bd809_1
  • pandoc 1.19.2.1 hea2e7c5_1
  • pandocfilters 1.4.2 py36ha6701b7_1
  • path.py 10.3.1 py36he0c6f6d_0
  • pathlib 1.0.1
  • patsy 0.4.1 py36ha3be15e_0
  • pcre 8.41 hc71a17e_0
  • pexpect 4.2.1 py36h3b9d41b_0
  • pickleshare 0.7.4 py36h63277f8_0
  • pip 9.0.1 py36h6c6f9ce_4
  • plotly 2.1.0 py36h56a57e5_0
  • prompt_toolkit 1.0.15 py36h17d85b1_0
  • protobuf 3.4.0 py36_0
  • ptyprocess 0.5.2 py36h69acd42_0
  • pycparser 2.18 py36hf9f622e_1
  • pygments 2.2.0 py36h0d3125c_0
  • pygpu 0.6.9 py36_0
  • pyopenssl 17.2.0 py36h5cc804b_0
  • pyparsing 2.2.0 py36hee85983_1
  • pyqt 5.6.0 py36h0386399_5
  • pysocks 1.6.7 py36hd97a5b1_1
  • python 3.6.3 h1284df2_4
  • python-dateutil 2.6.1 py36h88d3b88_1
  • pytz 2017.2 py36hc2ccc2a_1
  • pyyaml 3.12 py36hafb9ca4_1
  • pyzmq 16.0.2 py36h3b0cf96_2
  • qt 5.6.2 h974d657_12
  • qtconsole 4.3.1 py36h8f73b5b_0
  • readline 7.0 ha6073c6_4
  • requests 2.18.4 py36he2e5f8d_1
  • requests 2.18.4
  • scikit-learn 0.19.1 py36h7aa7ec6_0
  • scipy 1.0.0 py36hbf646e7_0
  • scipy 1.0.0
  • seaborn 0.8.0 py36h197244f_0
  • seasonal 0.3.1
  • setuptools 36.5.0 py36he42e2e1_0
  • simplegeneric 0.8.1 py36h2cb9092_0
  • sip 4.18.1 py36h51ed4ed_2
  • six 1.11.0 py36h372c433_1
  • sqlite 3.20.1 hb898158_2
  • statsmodels 0.8.0 py36h8533d0b_0
  • swagger-spec-validator 2.1.0
  • tensorflow 1.1.0 np112py36_0
  • terminado 0.6 py36ha25a19f_0
  • testpath 0.3.1 py36h8cadb63_0
  • theano 0.9.0 py36_0
  • tk 8.6.7 hc745277_3
  • tornado 4.5.2 py36h1283b2a_0
  • traitlets 4.3.2 py36h674d592_0
  • tslearn 0.1.7.2
  • typing 3.6.2
  • urllib3 1.22
  • urllib3 1.22 py36hbe7ace6_0
  • wcwidth 0.1.7 py36hdf4376a_0
  • webencodings 0.5.1 py36h800622e_1
  • werkzeug 0.12.2 py36hc703753_0
  • wheel 0.29.0 py36he7f4e38_1
  • widgetsnbextension 3.0.2 py36hd01bb71_1
  • xz 5.2.3 h55aa19d_2
  • yaml 0.1.7 h014fa73_2
  • zeromq 4.2.2 hbedb6e5_2
  • zlib 1.2.11 ha838bed_2

@farfan92

farfan92 commented Nov 22, 2017

Updating packages (numpy and scikit-learn, specifically) seems to have removed the NameError.
It must have been a compatibility issue introduced after installing some other packages.

@lmcinnes
Collaborator

I'm glad at least one of you got this resolved. Hopefully refreshing/updating packages might work twice? I am honestly at a little bit of a loss here.

@danielhelf

Getting the exact same error message here (descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str') despite updating the packages.

@Vanwalleghem

Got the same error message on an Ubuntu virtual machine with Python 2.7 and a Windows PC with Python 3.6.4, both running the latest version of Anaconda with hdbscan installed through conda-forge. I may try to install it another way tomorrow.

@Vanwalleghem

Alright, I actually had some time, so I tested that. On the same machine, pip install hdbscan worked immediately (after I removed the conda-forge version). Hope this helps you narrow it down and/or fix it for others.

@linwoodc3

I also had this error, but it was only present in the conda-forge installed version of hdbscan, not the pip-installed version. I removed the conda-forge version, ran pip install hdbscan in my conda environment, and hdbscan works fine.

@lmcinnes
Collaborator

lmcinnes commented Sep 1, 2018

@linwoodc3 That's a little weird; the conda-forge version gets synced with the pip version regularly. Perhaps a conda upgrade hdbscan would have done the job? Regardless, you have a working version now, and that's what counts. Thanks for the report; I'll keep an eye out for something amiss like this somewhere along the line.

@kevinafra

I just got this same error (descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'). I installed hdbscan just yesterday via pip.

When I first tried to import it, I got an error about 'numpy.core.multiarray failed to import' with no further explanation; after importing numpy.core.multiarray manually, I was able to import hdbscan. I don't know whether that is a related problem. But attempting to fit data that I had just fit successfully with sklearn.cluster.DBSCAN failed with the above error when I tried it with hdbscan.

I have Python 2.7.13 and numpy 1.11.2, and 'pip check' doesn't find any broken dependencies. What else can I try? I would really like to use hdbscan, as I have data whose clusters are certain to have variable density. Does hdbscan perhaps require Python 3.x, along with the corresponding versions of numpy, Cython, etc.?
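The "numpy.core.multiarray failed to import" message typically means a compiled extension was built against a newer numpy C-API than the numpy actually installed, so upgrading numpy and then reinstalling hdbscan (forcing its Cython modules to rebuild) is the usual fix. A hypothetical pre-flight check along those lines; the (1, 12) floor is an assumption for illustration, not a documented requirement:

```python
import numpy as np

def version_tuple(version):
    """Parse a 'major.minor.patch' string into a comparable (major, minor) tuple."""
    return tuple(int(part) for part in version.split('.')[:2])

# Assumed floor: the hdbscan builds of this era shipped against numpy >= 1.12,
# so older runtimes hit C-API import failures like the one described above.
MIN_NUMPY = (1, 12)

if version_tuple(np.__version__) < MIN_NUMPY:
    print("numpy %s is older than %s; try: pip install -U numpy && "
          "pip install --no-cache-dir --force-reinstall hdbscan"
          % (np.__version__, '.'.join(map(str, MIN_NUMPY))))
else:
    print("numpy %s looks recent enough" % np.__version__)
```

Since the reporter is on numpy 1.11.2, a check like this would flag the upgrade path before the import ever fails.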
