
[tf_clean] CTC logits/posteriors/etc for augmented input buggy in tf_test.py #194

Open
efosler opened this issue Aug 18, 2018 · 6 comments
efosler commented Aug 18, 2018

Augmented streams (which have stacked frames shifted) create n copies of the input. At test time the logit streams are averaged together. However, this is buggy for CTC-trained models, as the blank label can dominate the other labels in the averaged stream. Documented more fully under #193.

Proposed fix: change the test code so that it does not create shifted copies after stacking. I also have proposed changes to training that allow dumping the logit stream during the CV pass; these will output the first encountered stream instead of averaging. Discussion welcome.
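
To make the failure mode concrete, here is a toy numpy sketch (not code from this repo) of how frame-wise averaging over shifted copies lets the blank label win even when every individual copy has a clear non-blank spike:

```python
# Toy illustration only: CTC posteriors are "spiky", and the non-blank spike
# lands on different frames in different shifted copies.  Averaging frame-wise
# smears the spikes, so blank (strong on most frames in every copy) wins.
import numpy as np

# Columns: [blank, label].  Copy A fires the label on frame 1,
# the shifted copy B fires it on frame 2.
copy_a = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.9, 0.1]])
copy_b = np.array([[0.9, 0.1],
                   [0.9, 0.1],
                   [0.2, 0.8]])

avg = (copy_a + copy_b) / 2
print(copy_a.argmax(axis=1))  # [0 1 0] -> a single stream keeps its spike
print(avg.argmax(axis=1))     # [0 0 0] -> after averaging, blank wins every frame
```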

efosler commented Aug 19, 2018

I'm mulling over different solutions for this. Input welcome, particularly from @ramonsanabria and/or @fmetze:

  • The simplest solution is to print a warning in tf_test if subsampling > 1 (--roll should also be discouraged). The current arguments do support stacking without subsampling, but they can get you into trouble if you don't know what you're doing (as I didn't).
  • A more extreme option is to disallow subsampling > 1 at test time.
  • The most involved option would be an explicit combination-scheme argument that defaults to "use first" (which, in combination with --roll, would give you a random shift) but could also select the "average" scheme; a rough sketch follows this list. It's just not clear to me that averaging makes sense under CTC (unlike regular frame-based systems), so whether the averaging technique should even be preserved is an open question.
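A rough sketch of what that third option could look like; the flag name (--combine_subsampled) and helper (combine_streams) are hypothetical, not the actual tf_test.py interface:

```python
# Hypothetical sketch of a combination-scheme flag; names are made up and the
# real tf_test.py / decode-script wiring may differ.
import argparse
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument("--combine_subsampled", choices=["first", "average"],
                    default="first",
                    help="how to combine logit streams from shifted/subsampled copies")

def combine_streams(logit_streams, scheme="first"):
    """logit_streams: list of (T, num_labels) arrays, one per shifted copy."""
    if scheme == "first":
        # Use the first encountered copy; combined with --roll this amounts
        # to a random shift of the stacked frames.
        return logit_streams[0]
    if scheme == "average":
        # Frame-wise average over copies -- the behaviour this issue flags
        # as problematic under CTC.
        return np.mean(np.stack(logit_streams, axis=0), axis=0)
    raise ValueError("unknown combination scheme: %s" % scheme)
```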

fmetze commented Aug 19, 2018 via email

efosler commented Aug 20, 2018

Last night's run, using only the SWB grammar and giving the forward pass --subsampled_utt 1 (which basically selects the first copy it sees), resulted in 20.0 WER, which is the best that I've seen on this pipeline. So turning off averaging really did help.

@fmetze were you suggesting that there was another averaging method you used that did work (outside the code base), or that you used this particular code and got reasonable results? What's in the code looked reasonable to me as I worked through it.

Given that there have been widely varying experiences with subsampling combination, I think the best thing is to do the right thing: add another flag to the decode portion and make averaging an option (but not the default). I can code that today; I'm hoping to put this bit to rest, clean up the code a bit, and then submit pull changes for the baseline recipe. If I'm feeling feisty I might even add comments so that the next sojourner has some signposts... :-)

efosler commented Aug 20, 2018

Side note: I just realized that some of my questions (e.g., the role of nnet.py) arose because I had broken stuff out of decode_ctc_lat_tf.sh and put it directly in run_ctc_phn.sh when I started this, and forgot that I had done so.

@ramonsanabria

Sorry for disappearing from this thread. We had (and still have) some evaluation going on.

Regarding averaging vs. taking one frame: in CharRNN decoding we found exactly the same thing, that taking one frame works better than averaging. However, @fmetze is right in the sense that we did find that performing ROVER over different decoding strategies helped (I don't remember on which dataset or experiments). But yes, @efosler is right: I think the best way to go is to have averaging as a flag rather than the default.

We can clearly remove --roll; it is something that never worked for me.

@ramonsanabria

Thank you again for this, Eric. This is great!
