
I ran COVIDNet-CXR-2 on Kaggle and the loss explodes after a few epochs #211

Open
homerdiaz opened this issue Feb 3, 2022 · 2 comments

@homerdiaz

Quick question before giving the details:
Is the COVIDNet-CXR-2 model already trained? I'm asking because I trained the model on Kaggle for 3 epochs and the loss seems to explode. Thanks! Details below.

I ran the COVIDNet-CXR-2 model on Kaggle using the benchmark dataset:
Benchmark dataset: https://www.kaggle.com/andyczhao/covidx-cxr2

Before the first epoch, I got the same results you reported:
Output: ./COVIDNet-lr0.0002
13992 16490
Sens Negative: 0.970, Positive: 0.955
PPV Negative: 0.956, Positive: 0.970

After 3 epochs the loss explodes as you can see below:
Training started
1749/1749 [==============================] - 1538s 877ms/step
Epoch: 0001 Minibatch loss= 6443.208007812
[[ 1. 199.]
[ 1. 199.]]
Sens Negative: 0.005, Positive: 0.995
PPV Negative: 0.500, Positive: 0.500
Saving checkpoint at epoch 1
1749/1749 [==============================] - 3029s 837ms/step
Epoch: 0002 Minibatch loss= 39658.683593750
[[ 0. 200.]
[ 0. 200.]]
Sens Negative: 0.000, Positive: 1.000
PPV Negative: 0.000, Positive: 0.500
Saving checkpoint at epoch 2
1749/1749 [==============================] - 4489s 819ms/step
Epoch: 0003 Minibatch loss= 122188.039062500
[[ 0. 200.]
[ 2. 198.]]
Sens Negative: 0.000, Positive: 0.990
PPV Negative: 0.000, Positive: 0.497
Saving checkpoint at epoch 3
Optimization Finished!
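
For reference, the Sens/PPV lines can be reproduced from the printed confusion matrices. A minimal sketch, assuming rows are true classes and columns are predicted classes, both ordered [negative, positive] (this convention matches the epoch-3 numbers above):

```python
import numpy as np

# Epoch-3 confusion matrix from the log above.
cm = np.array([[0., 200.],   # true negative row: [pred neg, pred pos]
               [2., 198.]])  # true positive row

sens = np.diag(cm) / cm.sum(axis=1)  # per-class recall (sensitivity)
ppv = np.diag(cm) / cm.sum(axis=0)   # per-class precision (PPV)

print(f"Sens Negative: {sens[0]:.3f}, Positive: {sens[1]:.3f}")
# Sens Negative: 0.000, Positive: 0.990
print(f"PPV Negative: {ppv[0]:.3f}, Positive: {ppv[1]:.3f}")
# PPV Negative: 0.000, Positive: 0.497
```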

@haydengunraj
Collaborator

Hi @homerdiaz,

These models are in fact already trained. As for the loss instability, one thing you can try is reducing the learning rate. The default learning rate in our scripts may be too high, since these models are highly optimized versions of much larger baseline models and were not trained from scratch in their current forms.
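
As a rough illustration of that suggestion (a sketch only, not the actual COVID-Net training code; the toy model and the reduced rate below are assumptions), lowering the learning rate in TensorFlow 1.x-style code amounts to passing a smaller value to the optimizer:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

LR_DEFAULT = 2e-4   # default implied by the run directory name COVIDNet-lr0.0002
LR_FINETUNE = 2e-5  # 10x smaller: a common first step when fine-tuning diverges

# Toy two-class model standing in for the real network.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 2])
w = tf.Variable(tf.zeros([4, 2]))
b = tf.Variable(tf.zeros([2]))
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))

# The only change: use the smaller learning rate so updates to the
# already-optimized weights stay gentle.
train_op = tf.train.AdamOptimizer(learning_rate=LR_FINETUNE).minimize(loss)
```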

@homerdiaz
Author

homerdiaz commented Feb 17, 2022

Thanks, @haydengunraj!!
