Model fails to train with Linux and Keras 3.3.2 #19623

Closed
jonbry opened this issue Apr 26, 2024 · 9 comments
Labels: keras-team-review-pending (pending review by a Keras team member), type:Bug

Comments

@jonbry

jonbry commented Apr 26, 2024

The following code from Deep Learning with Python, Second Edition fails to train when using Keras 3.3.2 and TensorFlow 2.16.1 on a Linux machine (Ubuntu 20.04):

import keras
from keras import layers

import pathlib
from keras.utils import image_dataset_from_directory

new_base_dir = pathlib.Path("cats_vs_dogs_small")

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)


inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)

x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)

for size in [32, 64, 128, 256, 512]:
    residual = x

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    residual = layers.Conv2D(
        size, 1, strides=2, padding="same", use_bias=False)(residual)
    x = layers.add([x, residual])

x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)


model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
              
history = model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset)

The accuracy over 100 epochs hovers around 50%:
[Image: mini_xception_keras3_linux]

The same results were reproduced on different Linux machines, regardless of whether the code ran on the GPU or the CPU, and also with the JAX backend.

What is strange about this issue is that the model trains successfully with the following configurations:

  • Linux with Keras 2.15 and TensorFlow 2.15
  • M1 Mac with Keras 3.0.5 and TensorFlow 2.16.1

Any advice on what may be causing the issue? Let me know if there is any information that I can provide to help troubleshoot the issue.

Thank you!
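
For anyone comparing configurations like the ones listed above, a small environment report can make the differences easier to spot. This is only a sketch using the Python standard library; `package_version` and `environment_report` are hypothetical helper names, and the package list is an assumption based on the libraries mentioned in this issue.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("keras", "tensorflow", "jax")):
    """Collect OS, CPU architecture, Python and package versions in one dict."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        report[name] = package_version(name)
    return report


# Print one "key: value" line per entry so the output can be pasted into an issue.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this on each machine and comparing the output side by side shows at a glance which versions differ between a working and a failing setup.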

@fchollet
Member

> Any advice on what may be causing the issue? Let me know if there is any information that I can provide to help troubleshoot the issue.

This code is known to work, so it's likely a bad initialization. Some common steps you can take:

  • Just restart and try again (with a different random seed)
  • Lower the learning rate by 2x
  • Lower dropout rate (0.5 -> 0.25)
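
As a rough illustration of the three steps above, the sketch below applies them to a tiny stand-in model rather than the full network from the issue. The seed value 1234 is arbitrary, and the learning rate of 0.0005 assumes the Keras default of 0.001 for RMSprop, so halving it gives that number. Treat the whole thing as an assumption-laden example, not a verified fix.

```python
import keras
from keras import layers

# Step 1: use a different random seed on each attempt (1234 is arbitrary).
keras.utils.set_random_seed(1234)

# Step 2: halve the learning rate. RMSprop defaults to 0.001 in Keras,
# so an explicit optimizer with 0.0005 replaces optimizer="rmsprop".
optimizer = keras.optimizers.RMSprop(learning_rate=0.0005)

# Step 3: lower the dropout rate from 0.5 to 0.25.
dropout = layers.Dropout(0.25)

# Tiny stand-in model; in the original script only the Dropout rate
# and the optimizer argument would change.
inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1.0 / 255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)
x = layers.GlobalAveragePooling2D()(x)
x = dropout(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(loss="binary_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])
```

Changing one of the three settings at a time makes it easier to tell which one, if any, affects training.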

@t-kalinowski
Contributor

t-kalinowski commented Apr 26, 2024

@fchollet I am able to reproduce this. I haven't had a chance to dig into the root cause yet, but I can confirm that this is a bug in Keras 3; the same code produces a model that trains just fine w/ TF 2.15 + Keras 2.

@sachinprasadhs added the type:Bug and keras-team-review-pending labels Apr 26, 2024
@fchollet
Member

Looking into it.

@fchollet
Member

I have fixed a related issue with dataset shuffling. Can you try installing v3.3.3 and checking if your code works with that version?
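
After upgrading (for example with `pip install keras==3.3.3`, assuming a pip-managed environment), it can help to confirm which version the environment actually picked up. The sketch below uses only the standard library; `parse_version` and `at_least_3_3_3` are hypothetical helper names, not part of Keras.

```python
from importlib import metadata

TARGET = (3, 3, 3)  # the release mentioned in this thread


def parse_version(text):
    """Turn '3.3.3' into (3, 3, 3), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def at_least_3_3_3(version_text):
    """True if the given version string is 3.3.3 or newer."""
    return parse_version(version_text) >= TARGET


try:
    installed = metadata.version("keras")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("keras is not installed in this environment")
else:
    print(f"keras {installed}: 3.3.3 or newer = {at_least_3_3_3(installed)}")
```

Note that the check only compares version numbers. A Keras 2.x install will report `False` even though, per this thread, Keras 2.15 was not affected.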

@t-kalinowski
Contributor

Thanks! Looks like it's fixed now. I can confirm the model trains fine with Keras v3.3.3

@jonbry
Author

jonbry commented Apr 27, 2024

Looks like v3.3.3 fixed the issue. Thanks for all of your help!

@jonbry closed this as completed Apr 27, 2024

@t-kalinowski
Contributor

t-kalinowski commented Apr 30, 2024

By the way, I just noticed that the GitHub release tagged v3.3.3 has a typo in its title ("Kears" instead of "Keras"): Kears 3.3.3

Maybe this is the reason v3.3.2 is still listed as the "latest release" on the repo landing page?

@sachinprasadhs
Collaborator

@t-kalinowski, I just updated the latest release tag on the landing page.
