Custom loss defined as a class instance vs function #19601
I can make it work with a basic class instance, without subclassing:

```python
class QuantileLoss:
    def __init__(self, quantile: float = 0.5):
        self.quantile = quantile

    def __call__(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum(self.quantile * error, (self.quantile - 1) * error)
        return ops.mean(loss)


model.compile(loss=QuantileLoss(quantile=0.5))
```

Is this the way to go?
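For comparison, the function-defined variant mentioned in this thread can be checked in isolation. The sketch below uses plain NumPy instead of `keras.ops` (an assumption made here so the pinball-loss math is easy to verify without a Keras backend):

```python
import numpy as np


def quantile_loss(y_true, y_pred, quantile=0.5):
    """Pinball (quantile) loss: with error = y_pred - y_true, over-prediction
    is weighted by `quantile` and under-prediction by `1 - quantile`."""
    error = y_pred - y_true
    return float(np.mean(np.maximum(quantile * error, (quantile - 1) * error)))
```

For `quantile=0.5` this reduces to half the mean absolute error, which is a quick sanity check on the formula.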
The code looks fine; what is the error you encounter?
My code is run by a JupyterLab server (using the latest official Docker images). The crash is caused by the [...]. According to this link, the root cause could be a buggy installation of TensorFlow/PyTorch due to mixing [...].
I reproduced the bug with the latest image.

Run the official image:

```shell
docker run -it --rm tensorflow/tensorflow bash
```

Install the dependencies, then copy, save, and run the code below:

```shell
apt-get update && apt-get install vim
pip install pandas
vim test.py  # copy and save the code below
python test.py
```

Python code:

```python
import numpy as np
import pandas as pd

from keras import ops
from keras.layers import Dense, Input
from keras.losses import Loss
from keras.models import Model


class QuantileLoss(Loss):
    def __init__(
        self,
        name: str = "quantile",
        quantile: float = 0.5,
        reduction="sum_over_batch_size",
    ) -> None:
        super().__init__(name=name, reduction=reduction)
        self.quantile = quantile

    def call(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum(self.quantile * error, (self.quantile - 1) * error)
        return ops.mean(loss)


X = np.random.random((100000, 100))
y = pd.Series(np.random.random((100000,)))

features = Input(shape=(X.shape[1],))
layers = Dense(200, activation="relu")(features)
labels = Dense(1, activation=None)(layers)

model = Model(features, labels)
model.compile(optimizer="adam", loss=QuantileLoss(quantile=0.5))
model.fit(
    X,
    y.to_numpy(),  # Working well with just `y`
    verbose=True,
    epochs=50,
    batch_size=10000,
)
```

Training time and memory usage are very different depending on the type of the target passed to `fit` (pandas `Series` vs. NumPy array).
My code runs on CPU (i7-9750H) / Ubuntu 23.10 / Docker 24.0.5 with Keras 3.0.5 and TensorFlow 2.16.1.
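One low-effort diagnostic, independent of Keras: normalize the target to a contiguous `float32` NumPy array before calling `fit`, so any pandas-specific handling is removed from the comparison. A minimal sketch (the array size here is illustrative, not taken from the reproduction script):

```python
import numpy as np
import pandas as pd

y = pd.Series(np.random.random(1000))

# Convert once, up front: a contiguous float32 array is the least surprising
# input a backend can receive, so any timing difference that remains is not
# caused by the input type.
y_np = np.ascontiguousarray(y.to_numpy(dtype=np.float32))
```

If the slowdown persists with `y_np`, the input type can be ruled out as the cause.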
I cannot reproduce with image [...]. Cross reference: [...]. Code run on CPU (Intel(R) Xeon(R) Silver 4210R) / GPU (Quadro RTX 4000, Compute Capability 7.5) / Ubuntu 22.04 (Container) with Keras 3.3.3, NumPy 1.26.4 and TensorFlow 2.16.1.
This strange behavior may be CPU-specific. Could you reproduce the bug using only the CPU, without CUDA?
No. I cannot reproduce with image [...]. Cross reference: [...]. Code run on CPU (Intel(R) Xeon(R) Silver 4210R) / Ubuntu 22.04 (Container) with Keras 3.3.3, NumPy 1.26.4 and TensorFlow 2.16.1.
Thanks @benz0li for testing it! @sachinprasadhs: Now that we know this issue is not easily reproducible, is there something else I should look at and/or test to better diagnose it?
P.S.: On my machine, I cannot reproduce the bug with the latest [...].
Yes: Output of [...].
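To make version reports consistent across machines, a small helper that collects versions without hard-failing on missing packages could look like this (a sketch; the package list is an assumption based on the libraries discussed in this thread):

```python
import importlib
import platform


def collect_versions(packages=("numpy", "pandas", "keras", "tensorflow")):
    """Return a {name: version} map, marking packages that are not importable."""
    report = {"python": platform.python_version()}
    for pkg in packages:
        try:
            report[pkg] = importlib.import_module(pkg).__version__
        except ImportError:
            report[pkg] = "not installed"
    return report
```

Pasting the resulting dictionary into a comment gives every participant the same environment snapshot.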
I confirm that my issue happens on CPU with the latest versions of TensorFlow 2.16.1 / Keras 3.3.3 / NumPy 1.26.4 / Pandas 2.2.2. It only happens when using my CPU (it is working well on my GPU with [...]).

Running:

```
root@0a0414c2c84b:/# python test.py
2024-05-07 22:02:20.541375: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
2024-05-07 22:02:23.140812: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.300856: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.451243: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.637732: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.816659: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
Epoch 1/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 8s 663ms/step - loss: 0.1569
Epoch 2/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 645ms/step - loss: 0.1397
Epoch 3/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 657ms/step - loss: 0.1344
Epoch 4/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 663ms/step - loss: 0.1313
Epoch 5/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 658ms/step - loss: 0.1293
[...]
```

Running:

```
root@0a0414c2c84b:/# python test.py
2024-05-07 22:04:24.869910: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1597
Epoch 2/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1419
Epoch 3/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1360
Epoch 4/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.1326
Epoch 5/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1298
[...]
```

My Docker version is 24.0.5. I haven't tested with the latest version of Docker, but I could try it next week if necessary.
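To quantify the roughly 663 ms/step vs. 8 ms/step gap on any machine, a tiny timing harness around `fit` can help. `time_call` below is a hypothetical helper, and the commented usage assumes the `model`, `X`, and `y` names from the reproduction script above:

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times and return the best wall-clock duration in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage against the reproduction script:
#   t_series = time_call(model.fit, X, y, epochs=1, verbose=0)
#   t_numpy = time_call(model.fit, X, y.to_numpy(), epochs=1, verbose=0)
```

Taking the best of several runs reduces noise from one-time graph tracing and warm-up costs, which matters when comparing the two input types.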
When migrating my `keras 2` custom loss to `keras 3`, I noticed a weird behavior in `keras 3`. My class-defined loss crashes my Jupyter kernel, while my function-defined loss is working well. What am I doing wrong when subclassing `keras.losses.Loss`?

This is not working: [...]

This is working: [...]

Thanks in advance.