KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported #147

Open
acombos opened this issue Feb 11, 2020 · 2 comments


acombos commented Feb 11, 2020

I have X as a sparse matrix and y as a pandas Series.

I then proceed with the following code:
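from hpsklearn import HyperoptEstimator, any_sparse_classifier
from hyperopt import tpe
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
estim = HyperoptEstimator(classifier=any_sparse_classifier('clf'),
                          preprocessing=[],
                          algo=tpe.suggest,
                          max_evals=100,
                          trial_timeout=120)
estim.fit(X_train, y_train)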

I got the following error:
Scikit-learn - ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

After that, I updated all my conda packages and reinstalled hyperopt-sklearn. Now I get the following error:
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

Note that the same thing happens when using passive_aggressive as well. Also note that when I run sklearn's PassiveAggressiveClassifier with the same training data, it works fine.

I have tested both my sparse matrix as well as my target values (y) for NaN, infinity or too large numbers. No such entries exist.
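A check of this kind might look like the sketch below. It is not the code from the original report: the helper name `all_finite` and the toy data are hypothetical, and it assumes X is a SciPy sparse matrix and y is a numeric pandas Series.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def all_finite(X, y):
    """Hypothetical helper: True if neither X nor y holds NaN or infinity."""
    # A sparse matrix keeps its explicitly stored values in X.data;
    # the implicit zeros are finite by definition.
    x_ok = np.isfinite(X.data).all()
    y_ok = np.isfinite(np.asarray(y, dtype=float)).all()
    return bool(x_ok and y_ok)


# Toy stand-ins for the real data (assumed shapes and values).
X = sparse.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0], [0.0, 3.0]]))
y = pd.Series([0, 1, 0])

clean = all_finite(X, y)
dirty = all_finite(X, pd.Series([0.0, np.nan, 1.0]))
```

A check like this only inspects the values that are stored; it says nothing about how the Series index lines up with the rows of X.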

Interestingly, fitting on the full data set runs without any problems:
estim.fit(X, y)
So I also checked X_train and y_train for NaN, infinity, or overly large values (in case something is wrong with sklearn's train_test_split), but again everything seems fine.

Author

acombos commented Feb 11, 2020

I have now tried converting y_train to a NumPy array before fitting, with the following line:
y_train = y_train.to_numpy()

Everything seems to work fine now.
I am not familiar with your procedures; please advise if you want me to close the issue.

Member

bjkomer commented Feb 12, 2020

Unfortunately this project wasn't originally built with pandas in mind and doesn't explicitly support it. Now that sklearn has more support for pandas it definitely would be useful to add it here as well. In the meantime I could add some type checks and do the conversion inside of fit. That should hopefully help with most cases.
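As a rough illustration of what such a type check could look like, here is a minimal sketch. The helper name `to_plain_array` is hypothetical and this is not the project's actual implementation; it only assumes that pandas objects should become NumPy arrays while everything else (including sparse matrices) passes through unchanged.

```python
import numpy as np
import pandas as pd


def to_plain_array(data):
    """Hypothetical helper: convert pandas input to a NumPy array."""
    if isinstance(data, (pd.Series, pd.DataFrame)):
        # to_numpy() drops the index, so later positional access is safe.
        return data.to_numpy()
    # Leave NumPy arrays, lists, and sparse matrices untouched.
    return data


# A Series whose labels are not 0..n-1, as after a shuffled split.
y_train = pd.Series([1, 0, 1], index=[7, 2, 5])
y_arr = to_plain_array(y_train)
untouched = to_plain_array([1, 2])
```

Until something like this lives inside fit, calling `.to_numpy()` on the Series before passing it in has the same effect.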

Related to #122

Side note: I believe it fails only after calling train_test_split because the indices get split across the two objects, so the usual NumPy-style positional access no longer works on the resulting pandas Series. By default, reindex fills the missing indices with NaN, which would explain the first error.
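The effect can be reproduced without hyperopt-sklearn at all. The sketch below assumes pandas 1.0 or newer and uses made-up data; the hard-coded `train_positions` only mimics what a shuffled split does to the index and is not the real output of train_test_split.

```python
import pandas as pd

y = pd.Series([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])

# Mimic a shuffled split: keep 8 of the 10 rows (rows 4 and 7 are left out).
# The surviving rows keep their original index labels.
train_positions = [9, 1, 6, 3, 8, 5, 2, 0]
y_train = y.iloc[train_positions]

# NumPy-style code expects positions 0..7 to be valid.
positions = list(range(len(y_train)))

try:
    # On an integer index this is a label lookup, and labels 4 and 7 are missing.
    y_train[positions]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# reindex does not raise; it fills the missing labels with NaN instead.
n_missing = int(y_train.reindex(positions).isna().sum())

# Dropping the index restores plain positional access.
by_position = y_train.to_numpy()[positions]
```

On the full, unsplit Series the labels 0..n-1 all exist, so the same lookup succeeds, which matches the observation that estim.fit(X, y) runs normally.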
