
Scheduled Sampling #152

Open
umgupta opened this issue Jul 2, 2018 · 7 comments

Comments

@umgupta

umgupta commented Jul 2, 2018

The scheduled sampling paper mentions that training performs worse if you toss a coin once and decide whether to feed the predicted output for the whole sequence. Instead, the decision between the correct token and the model's prediction should be made at each time step (see the footnote on p. 3 of the paper). Yet in this decoder, teacher forcing is either enabled for the entire sequence or disabled entirely, and I don't think that works.
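
For illustration, here is a minimal sketch of the per-step decision in PyTorch. This is not this repo's `DecoderRNN`; the class and argument names are hypothetical. The point is that the coin is flipped at every time step, not once per sequence:

```python
import random
import torch
import torch.nn as nn

class PerStepSamplingDecoder(nn.Module):
    """Hypothetical minimal decoder illustrating per-step scheduled sampling."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.cell = nn.GRUCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, targets, hidden, teacher_forcing_ratio=0.5):
        # targets: (batch, seq_len) ground-truth token ids; targets[:, 0] is <sos>
        seq_len = targets.size(1)
        token = targets[:, 0]                    # first input is <sos>
        all_logits = []
        for t in range(1, seq_len):
            hidden = self.cell(self.embedding(token), hidden)
            logits = self.out(hidden)
            all_logits.append(logits)
            # Coin flip *per step*: feed the ground truth or the model's
            # own previous prediction, per the paper's footnote.
            if random.random() < teacher_forcing_ratio:
                token = targets[:, t]            # teacher forcing
            else:
                token = logits.argmax(dim=1)     # model's own output
        return torch.stack(all_logits, dim=1), hidden
```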

@AtmaHou

AtmaHou commented Jul 12, 2018

You might be right, but the teacher forcing here can really improve performance by 1–2 points~

@umgupta
Author

umgupta commented Jul 12, 2018

@AtmaHou Did you mean the kind of teacher forcing that is implemented here? I tried that and it actually doesn't improve performance (in agreement with the scheduled sampling paper).

@AtmaHou

AtmaHou commented Jul 15, 2018

Yep~~ You could try tuning the teacher forcing rate (the default is 0); 0.5 is worth trying. I found that neither 0 nor 1 helps.
emmmmm..... From my point of view, scheduled sampling is just a trick to let the model see its own output at some random rate, and both methods achieve this.
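
For reference, the scheduled sampling paper does not use a fixed rate; it anneals the probability of feeding the ground-truth token as training progresses. A sketch of the paper's three decay schedules, where the constants chosen here are hypothetical:

```python
import math

# Decay schedules from Bengio et al. (2015) for eps_i, the probability
# of feeding the ground-truth token at training step i. The constants
# k, c, and eps_min below are illustrative, not values from the paper.

def linear_decay(i, k=1.0, c=1e-4, eps_min=0.1):
    return max(eps_min, k - c * i)

def exponential_decay(i, k=0.9999):
    return k ** i                        # requires k < 1

def inverse_sigmoid_decay(i, k=500.0):
    return k / (k + math.exp(i / k))     # requires k >= 1
```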

@umgupta
Author

umgupta commented Jul 15, 2018

@AtmaHou My experience with this kind of teacher forcing on non-trivial tasks has not been good so far; it sometimes worsens my results. The scheduled sampling method works better, though.

Since this repo has so many stars and I was using it as a reference implementation at one point, I thought I should point this out.

@AtmaHou

AtmaHou commented Jul 18, 2018

@umgupta Ha~ Your post has also deepened my understanding of teacher forcing.
Maybe I should implement the kind of teacher forcing you pointed out, which could further improve my model's results.

@umgupta
Author

umgupta commented Jul 18, 2018

@AtmaHou Sure, do so and let me know :).

Also, I am fairly new to sequence learning. Do you happen to know a toy problem for checking the sanity of an algorithm (like MNIST for images)? The sequence-reversal task in this repo is too trivial: any kind of teacher forcing works fine on it, and even a buggy implementation can still get good results.

@AtmaHou
Copy link

AtmaHou commented Jul 18, 2018

@umgupta The machine translation problem in the PyTorch tutorial is quite simple; it might suit you.
