
Sudden failure mode after nth epoch in Text Style Transfer #286

Open

rafiyajaved opened this issue Jan 1, 2020 · 1 comment
Labels: question (Further information is requested), topic: examples (Issue about examples)

rafiyajaved commented Jan 1, 2020

Hello all. Thank you for your work on this repo and apologies for opening another issue so soon. This issue has to do with the text style transfer example that @swapnull7 is in the process of adding.

Using the text style transfer example from the open PR, I'm finding that after a certain number of epochs, loss_g_ae suddenly starts to increase rapidly and the model produces garbage results like these:

Original: by the way , matt is a great and professional server !
Decoded: i not not this this this this this this this this this this this this this this this this this

Original: their patio is awesome , very soothing atmosphere with a nice .
Decoded: their was the , the , the , the , the , the , the , the , the ,

Original: unfortunately we had to learn this tasteless lesson on such a special night .
Decoded: best such , and and and and and a and a a a a and and a and a a

I reproduced the error several times on two separate machines, and the epoch after which it starts is not consistent (sometimes after the 6th epoch, sometimes the 10th, sometimes the 17th, etc.). It happens even when no epochs use the joint autoencoder + discriminator loss; in other words, it usually occurs during the first n-2 pre-train epochs (autoencoder only), so I don't think it has to do with the discriminator term.
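
For what it's worth, sudden loss explosions like this are often tamed by gradient clipping. Below is a minimal, self-contained PyTorch sketch of where a clip would go in the pre-training step; the toy model and loss are placeholders so the snippet runs on its own, not the example's actual API:

```python
import torch
import torch.nn as nn

# Toy stand-in for the example's autoencoder; the real model comes from
# the style-transfer PR, this is only here so the snippet is runnable.
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(batch, max_grad_norm=5.0):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(batch), batch)  # placeholder for loss_g_ae
    loss.backward()
    # Clip the gradient norm before stepping; this standard PyTorch utility
    # often prevents the kind of sudden loss blow-up described above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()

print(train_step(torch.randn(4, 8)))
```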

The loss from a recent run looks like this:

epoch: 1, loss_g_ae: 1.0822
...
epoch: 11, loss_g_ae: 0.2434
epoch: 12, loss_g_ae: 0.2385
epoch: 13, loss_g_ae: 0.2323
epoch: 14, loss_g_ae: 0.2054
epoch: 15, loss_g_ae: 0.7022
epoch: 16, loss_g_ae: 0.6007
epoch: 17, loss_g_ae: 0.3948
epoch: 18, loss_g_ae: 8.4702
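
Since the failing epoch varies across runs, it may also help to checkpoint the last healthy weights and stop as soon as the loss spikes, so the pre-collapse model can be inspected. A minimal sketch, assuming a hypothetical train_one_epoch wrapper around the example's per-epoch loop:

```python
import torch

def run_with_rollback(model, optimizer, train_one_epoch, num_epochs,
                      spike_factor=2.0, ckpt_path="pre_collapse.pt"):
    """Stop (with a saved checkpoint) as soon as loss_g_ae spikes.

    `train_one_epoch` is a hypothetical callable wrapping the example's
    training loop; everything else here is standard PyTorch.
    """
    best = float("inf")
    for epoch in range(1, num_epochs + 1):
        loss = train_one_epoch(model, optimizer)
        print(f"epoch: {epoch}, loss_g_ae: {loss:.4f}")
        if loss < best:
            best = loss
            # Keep the last healthy weights for post-mortem inspection.
            torch.save(model.state_dict(), ckpt_path)
        elif loss > spike_factor * best:
            print(f"loss spiked at epoch {epoch}; keeping {ckpt_path}")
            break
```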

@gpengzhi added the question (Further information is requested) and topic: examples (Issue about examples) labels on Jan 2, 2020
gpengzhi (Collaborator) commented Jan 2, 2020

No worries. Thanks for your interest in texar-pytorch! We will look into this issue asap.
