
Training problem #8

Open
b789 opened this issue Oct 27, 2017 · 7 comments

Comments

b789 commented Oct 27, 2017

Hi,
I've tried to train an example model using the example_cornell.yml configuration, but after 6800 steps the model responds to every request with the same 'sentence': ..............
It seems the training isn't doing much, since at step 200 it prints:
training loss = 5.797; training perplexity = 329.45
Validation loss = 5.764; val perplexity = 318.69
and at step 6800 the values are almost the same:
training loss = 5.597; training perplexity = 269.70
Validation loss = 5.692; val perplexity = 296.40
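As a quick sanity check on the numbers above: the reported perplexities are just exp(loss), and the check below confirms both that the values are internally consistent and how little the loss moved over 6600 steps.

```python
import math

# Sanity check: each reported perplexity should equal exp(loss).
reported = [
    (5.797, 329.45),  # step 200, training
    (5.764, 318.69),  # step 200, validation
    (5.597, 269.70),  # step 6800, training
    (5.692, 296.40),  # step 6800, validation
]
for loss, perplexity in reported:
    assert abs(math.exp(loss) - perplexity) < 1.0
```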

What could be the cause of this? How can a reasonable output be obtained?

mckinziebrandon (Owner) commented Oct 27, 2017

That's definitely not the expected behavior. It's been some time since I last ran the project, so let me try and reproduce your results over the weekend. I'll let you know if I see the same issue and how you may be able to resolve it.

In the meantime, could you try deleting all the extra files made by the model and running it again? By extra files, I mean the vocab, tfrecords, and *.ids files. I remember there being some subtle issues with them being reloaded that didn't occur when they were generated on a first run. I thought I had fixed those issues but this sounds related. I'll be able to provide more info after I try to reproduce the issue myself.
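A minimal sketch of that cleanup step, assuming the vocab, tfrecords, and *.ids files all live in one data directory (the directory path and glob patterns are guesses; adjust them to your setup):

```shell
# Hypothetical helper: delete the regenerated artifacts (vocab, tfrecords,
# *.ids) so the next run rebuilds them from scratch instead of reloading.
# The directory layout and file patterns are assumptions, not the repo's
# documented structure.
clean_artifacts() {
    dir="$1"
    rm -f "$dir"/vocab* "$dir"/*.tfrecords "$dir"/*.ids
}
```

For example, `clean_artifacts data/cornell` before re-running training.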

mckinziebrandon (Owner) commented
Hi @b789 , I was (unfortunately) able to reproduce your results! This is pretty surprising/annoying, but thanks for bringing my attention to it. My main GPU has been tied up with other work this weekend, so I wasn't able to explore much, but my best guess is that the AttentionDecoder is to blame. It uses a custom attention implementation I wrote back in the days of tf version ~1.1 when the API for doing that was very fragile and actually broke between minor releases.

With that said, and after skimming my loss plots in the wiki, you can get reasonable outputs if you just change AttentionDecoder to BasicDecoder in example_cornell.yml. I have confirmed on a smaller machine of mine that losses decrease as expected in that case.
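The exact key depends on the config schema, but the suggested change might look like this (the fragment below is an assumption, not copied from the repo; only the AttentionDecoder → BasicDecoder swap is from the comment above):

```yaml
# example_cornell.yml (hypothetical fragment; key names are assumptions)
model:
  decoder: BasicDecoder   # was: AttentionDecoder
```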

I'm definitely not satisfied with this...hopefully I have some time soon to see what is going on with AttentionDecoder. It used to work! And of course, if you happen to find a fix for it, contributions are more than welcome! Let me know if I can help with anything else, and if I do find out what's going on with AttentionDecoder soon, I will post updates here (and push the corrections to master).

mckinziebrandon (Owner) commented
It's okay @Shaptic, everything will be ok.

b789 (Author) commented Oct 31, 2017

Thanks @mckinziebrandon for the advice to use the BasicDecoder; with that change, training is actually working.
I'll try to look into what's wrong with the AttentionDecoder if I can.

mckinziebrandon (Owner) commented Nov 8, 2017

@b789 or anyone interested: BasicEncoder does not return its full set of states over time (it returns _, final_output only, I believe), while BidirectionalEncoder returns both the set of states and the final output. When designed, this was a feature, but it is now a bug! Happy to accept a PR that makes the returned tuple of BasicEncoder the same form as the one returned by BidirectionalEncoder.
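A minimal sketch of the contract mismatch described above, using stand-in classes (the class names echo the comment, but the bodies and the run_rnn helper are illustrative, not the project's actual API):

```python
# Illustrative stand-ins for the return-shape mismatch; NOT the project's
# real classes, just a sketch of the differing contracts.

def run_rnn(inputs):
    """Hypothetical RNN: returns (per-step outputs, final state)."""
    outputs = [f"h{t}" for t in range(len(inputs))]
    return outputs, "final_state"

class BasicEncoder:
    def __call__(self, inputs):
        _, final_state = run_rnn(inputs)
        return final_state              # bug: per-step outputs are dropped

class BidirectionalEncoder:
    def __call__(self, inputs):
        outputs, final_state = run_rnn(inputs)
        return outputs, final_state     # attention needs the per-step outputs

# The proposed fix: make BasicEncoder return (outputs, final_state)
# in the same form as BidirectionalEncoder.
```

Attention scores every per-step output, so an encoder that discards them silently starves the AttentionDecoder of everything but the final state.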

DmLitov4 commented

@mckinziebrandon thank you for this wonderful library!
Can you elaborate on the training process a little more? The config file requires 'train_from.txt', 'train_to.txt', 'valid_from.txt', and 'valid_to.txt' files, but I don't really understand what they should contain. I have one .txt file with the Cornell Movies Corpus; it contains rows that look like "question, answer". How should I split it?
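For reference, if each line really is a "question, answer" pair, splitting it into parallel from/to files might look like the sketch below (the separator and file layout are assumptions about the expected format; the repo's own preprocessing and linked data are the authority):

```python
def split_pairs(src_path, from_path, to_path, sep=", "):
    """Split lines of 'question<sep>answer' into two parallel files.

    Hypothetical helper: the separator and one-pair-per-line layout
    are assumptions, not the repo's documented format.
    """
    with open(src_path) as src, \
            open(from_path, "w") as f_from, \
            open(to_path, "w") as f_to:
        for line in src:
            question, answer = line.rstrip("\n").split(sep, 1)
            f_from.write(question + "\n")
            f_to.write(answer + "\n")
```

Line N of the "from" file then pairs with line N of the "to" file, which is the usual parallel-corpus convention for seq2seq training.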

DmLitov4 commented

@mckinziebrandon sorry, I missed the link to your Mega drive account; now I can see all your files.
