Any Advice on Avoiding 'NaN' errors #189
Comments
Hey there, could you provide me with a sample of your training data?
I just got the same error during a HyperoptEstimator search. I hope the following log helps:
2|hpsklearnBlack | job exception: Input contains NaN.
98%|█████████▊| 48/49 [00:02<?, ?trial/s, best loss=?]
2|hpsklearnBlack | Traceback (most recent call last):
2|hpsklearnBlack | File "/home/ubuntu/python/painter/Test/playground_gbm.py", line 48, in <module>
2|hpsklearnBlack | find_best_model(X_train, y_train, X_test, y_test)
2|hpsklearnBlack | File "/home/ubuntu/python/painter/Test/playground_gbm.py", line 29, in find_best_model
2|hpsklearnBlack | estimator.fit(x, y)
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hpsklearn/estimator/estimator.py", line 464, in fit
2|hpsklearnBlack | fit_iter.send(increment)
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hpsklearn/estimator/estimator.py", line 339, in fit_iter
2|hpsklearnBlack | hyperopt.fmin(_fn_with_timeout,
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/fmin.py", line 540, in fmin
2|hpsklearnBlack | return trials.fmin(
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/base.py", line 671, in fmin
2|hpsklearnBlack | return fmin(
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/fmin.py", line 586, in fmin
2|hpsklearnBlack | rval.exhaust()
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/fmin.py", line 364, in exhaust
2|hpsklearnBlack | self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/fmin.py", line 300, in run
2|hpsklearnBlack | self.serial_evaluate()
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/fmin.py", line 178, in serial_evaluate
2|hpsklearnBlack | result = self.domain.evaluate(spec, ctrl)
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hyperopt/base.py", line 892, in evaluate
2|hpsklearnBlack | rval = self.fn(pyll_rval)
2|hpsklearnBlack | File "/home/ubuntu/python/painter/venv/lib/python3.10/site-packages/hpsklearn/estimator/estimator.py", line 311, in _fn_with_timeout
2|hpsklearnBlack | raise fn_rval[1]
2|hpsklearnBlack | ValueError: Input contains NaN.
According to the result of df.isnull().sum(), there are no NaN values in the data. Hence, the NaN error appears to occur during parameter injection.
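One caveat on the check above: df.isnull().sum() only counts missing values. It does not flag infinities, or finite values that are too large for a narrower float type. A minimal sketch of a broader check follows, assuming the data converts cleanly to a float NumPy array. The helper name `report_non_finite` is hypothetical and not part of hpsklearn or scikit-learn.

```python
import numpy as np


def report_non_finite(data, name="X"):
    """Count NaN, infinite, and float32-overflowing entries in an array-like."""
    arr = np.asarray(data, dtype=np.float64)
    n_nan = int(np.isnan(arr).sum())
    n_inf = int(np.isinf(arr).sum())
    # Finite in float64 but too large for float32. Tree-based scikit-learn
    # estimators convert the feature matrix to float32 internally.
    f32_max = np.finfo(np.float32).max
    n_too_big = int((np.isfinite(arr) & (np.abs(arr) > f32_max)).sum())
    return {"name": name, "nan": n_nan, "inf": n_inf, "exceeds_float32": n_too_big}


if __name__ == "__main__":
    # Made-up sample data, only to exercise the helper.
    sample = np.array([[1.0, 2.0], [np.inf, 1e39], [np.nan, 3.0]])
    print(report_non_finite(sample))
```

If all three counts are zero for both the features and the target, the input itself is unlikely to be the source of the error, which is consistent with the conclusion above.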
@RaistlinTAO Could I get a snippet of your code, please?
Of course, happy to help:
from hpsklearn import HyperoptEstimator, gradient_boosting_regressor
from hyperopt import tpe

def find_best_model(x, y, test_x, test_y):
    # Search gradient boosting hyperparameters with TPE.
    estimator = HyperoptEstimator(
        regressor=gradient_boosting_regressor("T"),
        algo=tpe.suggest,
        max_evals=800,
        trial_timeout=300)
    estimator.fit(x, y)
    print('HyperoptEstimator Score: ')
    print(estimator.score(test_x, test_y))
    print('Best Model: ')
    print(estimator.best_model())

find_best_model(X_train, y_train, X_test, y_test)
Update: With the same code and the same settings, the error goes away after another 3-5 tries. I think it is related to the hyperparameter combination. Sometimes the search simply skips the bad combination or preprocessing step.
Thanks a lot @RaistlinTAO, that is what I was thinking as well. I've noticed this behaviour before. I will work on a fix for this. For the time being, retrying a few times should bypass the error.
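The retry workaround can be automated. The following is a minimal sketch, assuming the failure surfaces as a ValueError whose message mentions NaN, as in the traceback in this thread. The helper `fit_with_retries` and its parameters are hypothetical names, not part of hpsklearn.

```python
def fit_with_retries(make_estimator, x, y, max_attempts=5):
    """Fit a freshly built estimator, retrying when fitting raises a NaN ValueError."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        estimator = make_estimator()  # build a new estimator for every attempt
        try:
            estimator.fit(x, y)
            return estimator
        except ValueError as exc:
            # Only retry the specific failure discussed in this thread.
            if "NaN" not in str(exc):
                raise
            last_error = exc
            print(f"attempt {attempt} failed: {exc}")
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Here `make_estimator` would be a zero-argument callable, for example a lambda that returns a new HyperoptEstimator configured as in the snippet posted earlier. Building a new estimator on each attempt is a design choice so that no partially completed search state is reused.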
After several debugging attempts in my local environment, I have located the root cause. In sklearn\ensemble\_gb_losses.py:
def _update_terminal_region(...):
    ...
    diff_minus_median = diff - median
    ...
# and
def update_terminal_regions(...):
    raw_predictions[:, k] += learning_rate * tree.predict(X).ravel()
SAMPLE OUTPUT
E:\Projects\hyperopt-sklearn\venv\lib\site-packages\sklearn\ensemble\_gb_losses.py:231: RuntimeWarning: overflow encountered in square
* np.sum(sample_weight * ((y - raw_predictions.ravel()) ** 2))
E:\Projects\hyperopt-sklearn\venv\lib\site-packages\sklearn\ensemble\_gb_losses.py:231: RuntimeWarning: overflow encountered in square
* np.sum(sample_weight * ((y - raw_predictions.ravel()) ** 2))
E:\Projects\hyperopt-sklearn\venv\lib\site-packages\sklearn\ensemble\_gb_losses.py:288: RuntimeWarning: overflow encountered in multiply
raw_predictions[:, k] += learning_rate * tree.predict(X).ravel()
E:\Projects\hyperopt-sklearn\venv\lib\site-packages\sklearn\ensemble\_gb_losses.py:288: RuntimeWarning: invalid value encountered in add
raw_predictions[:, k] += learning_rate * tree.predict(X).ravel()
97%|█████████▋| 32/33 [00:06<?, ?trial/s, best loss=?]
job exception: Input contains NaN.
These two different outputs indicate that diff, median, or learning_rate becomes NaN under certain circumstances.
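The warnings in the sample output suggest a plausible chain: a value first overflows to infinity, and a later addition of infinities with opposite signs produces NaN. The toy sketch below reproduces that chain with plain NumPy arrays. It is not scikit-learn's code, and the learning rate and tree outputs are made-up values chosen to force the overflow.

```python
import numpy as np


def overflow_then_nan():
    """Show float64 overflow turning into NaN in a later addition."""
    # Silence the RuntimeWarnings that each step below would otherwise emit.
    with np.errstate(over="ignore", invalid="ignore"):
        residual = np.array([1e200, -1e200])
        squared = residual ** 2  # overflow in square -> [inf, inf]

        learning_rate = 10.0
        tree_output = np.array([1e308, -1e308])
        raw_predictions = np.zeros(2)
        # First update: overflow in multiply -> [inf, -inf]
        raw_predictions += learning_rate * tree_output
        # Second update with opposite signs: inf + (-inf) is invalid -> NaN
        raw_predictions += learning_rate * (-tree_output)
    return squared, raw_predictions


if __name__ == "__main__":
    squared, raw_predictions = overflow_then_nan()
    print(squared, raw_predictions)
```

If the real failure follows the same pattern, the NaN would originate in the accumulated raw predictions rather than in the training data.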
Hello,
Stack trace :
Well my suggestions are:
Pick any solution that fits your needs, and have a good one.
Sometimes when I run this:
I get an error
ValueError: Input contains NaN.
during training. It doesn't happen every time, and I know that the data has no NaNs, infinities, or duplicates. This leads me to believe one of the operations is creating a NaN. Is there any way to skip these operations or deduce which operation is causing this?
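One way to deduce which operation creates the NaN is to promote NumPy's floating-point warnings to exceptions, so that the traceback points at the offending line. A minimal sketch follows. The helper name `first_floating_point_error` is hypothetical. `np.errstate` only affects NumPy operations in the calling thread, so whether it reaches the code that hpsklearn runs for each trial has not been verified here. It may be simpler to wrap a plain scikit-learn fit that uses the suspect hyperparameters.

```python
import numpy as np


def first_floating_point_error(fn, *args, **kwargs):
    """Run fn with NumPy float warnings promoted to exceptions.

    Returns None when fn finishes cleanly, otherwise the FloatingPointError
    whose traceback points at the offending NumPy operation.
    """
    with np.errstate(over="raise", invalid="raise", divide="raise"):
        try:
            fn(*args, **kwargs)
        except FloatingPointError as exc:
            return exc
    return None


if __name__ == "__main__":
    # Made-up example: squaring 1e200 overflows float64.
    error = first_floating_point_error(lambda: np.array([1e200]) ** 2)
    print(repr(error))
```

Passing the returned exception to `traceback.print_exception` would then show the file and line where the overflow or invalid value first appeared.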