
Question: Decoding with Zamia Speech's German wav2letter model using wav2letter Decoder executable #104

Open
realbaker1967 opened this issue May 7, 2020 · 8 comments


@realbaker1967

First of all, really nice work!

I am interested in your German acoustic model for a benchmark.

Based on here, I assume you suggest using wav2letter's Decoder executable to decode audio with your German acoustic model.

If we use that executable from wav2letter, we need to tune a certain set of parameters, which they mention in their Decoder executable documentation.
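
To be concrete, I mean the beam-search decoding flags; a minimal sketch of such a flags file is shown below. The flag names are taken from the wav2letter decoder documentation of that time, and the paths and values are untuned placeholders, not your settings:

```
# decode.cfg -- illustrative only; flag names follow the wav2letter
# decoder documentation of that era, values are untuned placeholders
--am=/path/to/acoustic_model.bin
--tokens=/path/to/tokens.txt
--lexicon=/path/to/lexicon.txt
--lm=/path/to/lm_de_6gram.bin
--lmweight=2.0
--wordscore=1.0
--silweight=-0.5
--beamsize=2500
--beamthreshold=25
--smearing=max
```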

If possible, could you please share with us your tuned parameters for decoding?

Or, do we need to use the parameters in w2l_run_decode.sh.template?

Regards

@gooofy
Owner

gooofy commented May 7, 2020

Check out

data/src/speech/w2l_run_decode.sh.template

The script I used to run the decoder was based on this template. Please be aware that it is quite likely w2l has moved on from the state it was in back when I trained and used that model; command-line options and/or file formats may have changed since then.
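
Schematically, that template is just a thin wrapper around the Decoder binary, something like the sketch below; the paths are placeholders, not the exact contents of the template, and the binary/flag names may differ in current wav2letter:

```sh
#!/bin/bash
# schematic decode run, loosely following w2l_run_decode.sh.template;
# paths are placeholders and flag names may differ in current wav2letter
WAV2LETTER_BUILD=/path/to/wav2letter/build

"$WAV2LETTER_BUILD"/Decoder \
    --flagsfile=decode.cfg \
    --minloglevel=0 \
    --logtostderr=1
```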

@realbaker1967
Author

realbaker1967 commented May 7, 2020

Thank you for your fast answer!

Do you remember which commit of wav2letter you used when you trained and tested the model?

One more thing: do you remember which language model you used? I am assuming you used the larger order-6 model with less pruning.
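
For reference, I would expect an order-6 KenLM with light pruning to be built roughly like this; the corpus path and pruning thresholds are my guesses, not your actual settings:

```sh
# illustrative KenLM build -- corpus path and pruning thresholds are guesses
lmplz -o 6 --prune 0 0 1 -S 40% < corpus_de_sentences.txt > lm_de_6gram.arpa
build_binary lm_de_6gram.arpa lm_de_6gram.bin
```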

I am trying to test your model on my audio files with the exact configuration you used to achieve the result reported here.

@lagidigu

@realbaker1967 I used the same decoder configuration as in the template file, as well as the order-6 LM. Unfortunately, I cannot reproduce the reported WER of 3.97%.

It's probably due to the update of w2l, I guess...

@realbaker1967
Author

In my case, the model decoded well except at the beginnings and endings of the audio files. For example:

Annotation: Sie pflegten die Kranken und verbanden die Verwundeten.
Hypothesis: pflegt eine kranken und verwandten die verwundeten en

Note that the word Sie is omitted and en is added.

I observe these two problems very frequently, especially the insertion of non-existent words at the end.
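
One way to quantify how much of the gap comes from these boundary deletions and insertions is to score with sclite from NIST SCTK, which splits WER into substitutions, deletions and insertions; the file names below are placeholders:

```sh
# illustrative scoring with NIST SCTK's sclite; ref.trn and hyp.trn hold
# one utterance per line in trn format: "transcript text (utterance_id)"
sclite -r ref.trn trn -h hyp.trn trn -i rm -o all stdout
```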

Did you observe similar problems @lagidigu ?

@lagidigu

lagidigu commented May 12, 2020

@realbaker1967 I get the same results after applying the template. This is strange. I will have to look into how the beam-search decoder works exactly and report back on whether I make any progress.

@lagidigu

@realbaker1967 Unfortunately, I couldn't troubleshoot the issue. @gooofy do you know what might have changed in the decoder? The WER is a lot higher than 3.97% :/

@gooofy
Owner

gooofy commented May 14, 2020

@lagidigu No idea what exactly has changed, but as I mentioned earlier, I am not surprised that wav2letter has moved on from the state it was in when I ran my experiments. Actually, I suspect it is good news that wav2letter continues to be developed and improved.

If you're serious about wav2letter, I would suggest training your own model from scratch using their current codebase. All training material from Zamia Speech is freely available, as are the scripts used to train the model, so that should give you a head start.
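
Schematically, a training run with the wav2letter++ binaries of that era looked like the line below; the binary name, mode argument and flags may well differ in the current codebase, and the config file is a placeholder:

```sh
# schematic training invocation (wav2letter++ convention of that era);
# train_de.cfg would point at the freely available zamia speech material
/path/to/wav2letter/build/Train train --flagsfile=train_de.cfg
```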

@realbaker1967
Author

@gooofy I am only interested in running a benchmark, so there is no need to train from scratch.

For that, it would be really helpful if you could tell us which commit of wav2letter you used, if possible of course.

In that case, I could safely run the decoder with your template.

Thanks
