Model fails to train with Linux and Keras 3.3.2 #19623

jonbry · 2024-04-26T15:33:24Z

The following code from Deep Learning with Python, Second Edition fails to train when using Keras 3.3.2 and TensorFlow 2.16.1 on a Linux machine (Ubuntu 20.04):

import keras
from keras import layers

import pathlib
from keras.utils import image_dataset_from_directory

new_base_dir = pathlib.Path("cats_vs_dogs_small")

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)


inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)

x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)

for size in [32, 64, 128, 256, 512]:
    residual = x

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    residual = layers.Conv2D(
        size, 1, strides=2, padding="same", use_bias=False)(residual)
    x = layers.add([x, residual])

x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)


model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
              
history = model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset)

The accuracy over 100 epochs hovers around 50%.
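One way to check for this symptom programmatically is sketched below. It is only an illustration: `stuck_at_chance` is a hypothetical helper, and the 0.5 chance level, the 0.05 tolerance, and the 10-epoch window are assumptions chosen for a two-class problem, not values from this report.

```python
def stuck_at_chance(accuracies, chance=0.5, tolerance=0.05, last_n=10):
    """Return True if the most recent accuracy values all sit near chance level.

    accuracies: per-epoch accuracy values, e.g. history.history["val_accuracy"].
    chance:     accuracy expected from random guessing (assumed 0.5 here).
    tolerance:  how far from `chance` a value may be and still count as stuck.
    last_n:     number of trailing epochs to inspect.
    """
    recent = list(accuracies)[-last_n:]
    # An empty history tells us nothing, so report "not stuck".
    return bool(recent) and all(abs(a - chance) <= tolerance for a in recent)
```

With the `history` object returned by `model.fit` above, this could be called as `stuck_at_chance(history.history["val_accuracy"])`.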

The same results were reproduced on different Linux machines, regardless of whether the code ran on the GPU or the CPU, and also when using the JAX backend.
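For anyone repeating the backend comparison: Keras 3 reads the `KERAS_BACKEND` environment variable when it is first imported, so the backend has to be chosen before that import. A minimal sketch, where `select_backend` is a hypothetical helper name and not part of Keras:

```python
import os
import sys


def select_backend(name):
    """Choose the Keras 3 backend; must run before the first `import keras`."""
    if "keras" in sys.modules:
        # Keras has already picked its backend, so changing the variable
        # now would have no effect in this process.
        raise RuntimeError("keras is already imported; restart the interpreter")
    os.environ["KERAS_BACKEND"] = name


select_backend("jax")
# From here on, `import keras` picks up the JAX backend.
```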

What is strange about this issue is that the model trains successfully with the following configurations:

Linux with Keras 2.15 and TensorFlow 2.15
M1 Mac with Keras 3.0.5 and TensorFlow 2.16.1
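When comparing configurations like these, it can help to record the platform and package versions in the same way on every machine. The snippet below is a sketch using only the standard library; `environment_report` and `package_version` are hypothetical names, and the default package list is an assumption based on the libraries mentioned in this report.

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("keras", "tensorflow", "jax")):
    """Collect OS, CPU architecture, Python and package versions in one dict."""
    report = {
        "os": platform.system(),        # e.g. "Linux" or "Darwin"
        "machine": platform.machine(),  # e.g. "x86_64" or "arm64"
        "python": platform.python_version(),
    }
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```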

Any advice on what may be causing the issue? Let me know if there is any information that I can provide to help troubleshoot the issue.

Thank you!

fchollet · 2024-04-26T17:32:34Z

> Any advice on what may be causing the issue? Let me know if there is any information that I can provide to help troubleshoot the issue.

This code is known to work, so it's likely a bad initialization. Some common steps you can take:

Just restart and try again (with a different random seed)
Lower the learning rate by 2x
Lower dropout rate (0.5 -> 0.25)
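The three steps above could be applied to the original script roughly as follows. This is an illustration, not code from the thread: `retry_settings` and `seed_and_make_optimizer` are hypothetical helpers, the base learning rate of 1e-3 is the Keras default for RMSprop (which `optimizer="rmsprop"` uses), and Keras is imported lazily so the arithmetic can be checked without a backend installed.

```python
def retry_settings(seed, base_lr=1e-3, base_dropout=0.5):
    """Return adjusted hyperparameters: new seed, halved LR, halved dropout."""
    return {
        "seed": seed,
        "learning_rate": base_lr / 2,      # "lower the learning rate by 2x"
        "dropout_rate": base_dropout / 2,  # 0.5 -> 0.25
    }


def seed_and_make_optimizer(settings):
    """Seed the RNGs and build an RMSprop optimizer with the lowered rate.

    Call this BEFORE constructing the model, so that the new seed affects
    the weight initialization.
    """
    import keras  # lazy import: only needed when actually training

    keras.utils.set_random_seed(settings["seed"])
    return keras.optimizers.RMSprop(learning_rate=settings["learning_rate"])


settings = retry_settings(seed=1234)  # the seed value is arbitrary
```

In the script from the report, the returned optimizer would replace `optimizer="rmsprop"` in `model.compile`, and `layers.Dropout(settings["dropout_rate"])` would replace `layers.Dropout(0.5)`.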

t-kalinowski · 2024-04-26T17:38:05Z

@fchollet I am able to reproduce this. I haven't had a chance to dig into the root cause yet, but I can confirm that this is a bug in Keras 3; the same code produces a model that trains just fine w/ TF 2.15 + Keras 2.

fchollet · 2024-04-26T21:46:53Z

Looking into it.

fchollet · 2024-04-26T23:24:02Z

I have fixed a related issue with dataset shuffling. Can you try installing v3.3.3 and checking if your code works with that version?
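To confirm that an environment actually has the release mentioned above, the installed version can be compared against 3.3.3. The sketch below uses only the standard library; `parse_release` and `includes_fix` are hypothetical names, and the parsing is deliberately simple (it keeps the leading digits of the first three dot-separated fields).

```python
from importlib import metadata

FIXED_IN = (3, 3, 3)  # release named in this thread


def parse_release(version):
    """Turn '3.3.3' (or '3.4.0rc1') into a tuple of ints such as (3, 3, 3)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def includes_fix(version):
    """True if `version` is at least the release named in this thread."""
    return parse_release(version) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("keras")
        print(installed, "OK" if includes_fix(installed) else "upgrade needed")
    except metadata.PackageNotFoundError:
        print("keras is not installed")
```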

t-kalinowski · 2024-04-27T01:17:55Z

Thanks! Looks like it's fixed now. I can confirm the model trains fine with Keras v3.3.3

jonbry · 2024-04-27T15:23:54Z

Looks like v3.3.3 fixed the issue. Thanks for all of your help!

t-kalinowski · 2024-04-30T20:47:16Z

By the way, I just noticed that the GitHub release tagged v3.3.3 has a typo in its title (Kears vs Keras): Kears 3.3.3

Maybe this is the reason v3.3.2 is still listed as the "latest release" on the repo landing page?

sachinprasadhs · 2024-04-30T21:09:15Z

@t-kalinowski, I just updated the latest release tag on the landing page.

github-actions bot assigned sachinprasadhs Apr 26, 2024

jonbry mentioned this issue Apr 26, 2024

Mini Xception model differences when keras and keras3 packages are installed t-kalinowski/deep-learning-with-R-2nd-edition-code#16

Closed

sachinprasadhs added type:Bug keras-team-review-pending Pending review by a Keras team member. labels Apr 26, 2024

jonbry closed this as completed Apr 27, 2024
