This issue is based on a question raised during the workshop.

Problem

Loss values appear to be nan when the loss parameter is set to "mean_squared_error", as shown below (dl-keras-tf/materials/03-recipe/02-mini-project-ames.Rmd, Line 224 in 3dca65d).
model %>% compile(
  optimizer = optimizer_sgd(lr = 0.1),
  loss = "mean_squared_error",
  metrics = "mae"
)
history <- model %>% fit(
  x_train,
  y_train,
  batch_size = 16,
  validation_split = 0.2
)
Train on 1640 samples, validate on 411 samples
Epoch 1/10
1640/1640 [==============================] - 1s 397us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 2/10
1640/1640 [==============================] - 1s 447us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 3/10
1640/1640 [==============================] - 0s 268us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Epoch 4/10
1640/1640 [==============================] - 1s 382us/sample - loss: nan - mae: nan - val_loss: nan - val_mae: nan
Reason
This is probably related to the range of the predicted value (sale price), which reaches ~755,000. When the error gets squared, the sum explodes! That's why msle was recommended in the instructions rather than mean_squared_error.
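For reference, a minimal sketch of the fix the instructions point to: swapping the loss for MSLE while leaving everything else from the snippet above unchanged ("mean_squared_logarithmic_error" is the standard Keras loss string; "msle" also works as an alias).

model %>% compile(
  optimizer = optimizer_sgd(lr = 0.1),
  # MSLE log-transforms predictions and targets before squaring,
  # which keeps the loss (and hence the gradients) bounded
  loss = "mean_squared_logarithmic_error",
  metrics = "mae"
)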
OmaymaS changed the title from 'train/valid loss = NAN when loss = "mean_squared_error"' to 'train/valid loss = NAN when loss = "mean_squared_error" due to exploding values' on Jan 28, 2020.
Since you used MSE instead of MSLE, the target values have a large, unbounded range. Neural nets on large, unbounded regression targets can be prone to exploding gradients. MSLE somewhat bounds the target values because it log transforms them prior to computing the loss (which is what gets used to compute the gradient). This would be a good reason to scale the response if you preferred to use MSE.

The optimizer can also help control this. For example, RMSProp and Adam with MSE actually work fine, so they seem to keep the gradient descent process in check. SGD, on the other hand, has a harder time. However, I could control the exploding gradients with SGD by using an extremely small learning rate (0.000001).
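A minimal sketch of the workarounds mentioned above, assuming the same model, x_train, and y_train objects as in the original snippet (the log transform of the response is illustrative, not code from the issue):

# Option 1: keep MSE but switch to an adaptive optimizer such as RMSProp
model %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = "mean_squared_error",
  metrics = "mae"
)

# Option 2: keep SGD but shrink the learning rate drastically
model %>% compile(
  optimizer = optimizer_sgd(lr = 0.000001),
  loss = "mean_squared_error",
  metrics = "mae"
)

# Option 3: scale the response instead, e.g. log-transform the sale price,
# then back-transform predictions with exp() when interpreting them
y_train_scaled <- log(y_train)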