
train/valid loss = NAN when loss = "mean_squared_error" due to exploding values #9

Open · OmaymaS opened this issue Jan 28, 2020 · 2 comments


OmaymaS commented Jan 28, 2020

This issue is based on a question raised during the workshop.

Problem

Train and validation loss values appear as nan when the loss parameter is set to "mean_squared_error", as shown below.

model %>% compile(
  optimizer = optimizer_sgd(lr = 0.1),
  loss = "mean_squared_error",
  metrics = "mae"
)

history <- model %>% fit(
  x_train,
  y_train,
  batch_size = 16,
  validation_split = 0.2
)
Train on 1640 samples, validate on 411 samples
Epoch 1/10
1640/1640 [==============================] - 1s 397us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 2/10
1640/1640 [==============================] - 1s 447us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 3/10
1640/1640 [==============================] - 0s 268us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 4/10
1640/1640 [==============================] - 1s 382us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan

Reason

This is probably related to the range of the target value (sale price), which reaches ~755000. When the error gets squared, the sum explodes and the gradients blow up, so the loss ends up as nan. That's why msle was recommended in the instructions rather than mean_squared_error.
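
A minimal sketch of that recommendation (my own illustration, not from the thread; it assumes the same model, x_train, and y_train objects as above):

model %>% compile(
  optimizer = optimizer_sgd(lr = 0.1),
  loss = "msle",    # mean squared logarithmic error: log-transforms before squaring
  metrics = "mae"
)

history <- model %>% fit(
  x_train,
  y_train,
  batch_size = 16,
  validation_split = 0.2
)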

@OmaymaS OmaymaS changed the title train/valid loss = NAN when loss = "mean_squared_error" train/valid loss = NAN when loss = "mean_squared_error" due to exploding values Jan 28, 2020
bradleyboehmke (Collaborator) commented

Yeah, this seems to be an issue with regression models. See this Stack Overflow discussion: https://stackoverflow.com/questions/37232782/nan-loss-when-training-regression-network.

I think there are a few things to consider here:

  1. Since you used MSE instead of MSLE, the target values have a large, unbounded range. Using neural nets on large, unbounded regression problems can be prone to exploding gradients. Consequently, when we use MSLE we somewhat bound our target values, since we log transform prior to computing the loss (which gets used for computing the gradient). This would be a good reason to scale the response if you prefer to use MSE (see the sketch after this list).

  2. The optimizer can help control this. For example, using RMSProp or Adam with MSE actually works fine, so they seem to help control the gradient descent process. SGD, on the other hand, has a harder time. However, I could control the exploding gradients with SGD by using an extremely small learning rate (0.000001).
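
A minimal sketch of both points (my own illustration; the use of scale() and the specific optimizer calls are assumptions, not the workshop's exact code):

# Point 1: bound the target range by scaling the response before using MSE.
y_train_scaled <- scale(y_train)   # center and standardize the sale prices

# Point 2: keep MSE but swap the optimizer, or shrink the SGD learning rate.
model %>% compile(
  optimizer = optimizer_rmsprop(),             # optimizer_adam() also works;
  # optimizer = optimizer_sgd(lr = 0.000001),  # or SGD with an extremely small lr
  loss = "mean_squared_error",
  metrics = "mae"
)

history <- model %>% fit(
  x_train,
  y_train_scaled,
  batch_size = 16,
  validation_split = 0.2
)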


dougmet commented Jan 28, 2020

One group changed the units to millions of dollars (dividing by 1e6), which kept the context but had the benefit of scaling.
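
For example, a sketch of that approach (hypothetical; it assumes y_train holds sale prices in dollars):

# Express the sale price in millions of dollars; smaller scale, same interpretation.
y_train_millions <- y_train / 1e6

history <- model %>% fit(
  x_train,
  y_train_millions,
  batch_size = 16,
  validation_split = 0.2
)

# Predictions are then in millions of dollars; multiply by 1e6 to recover dollars.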
