
How to calculate 500ms_context from am_500ms_future_context.arch? #985

Open
yuseungwoo opened this issue Aug 19, 2021 · 4 comments

yuseungwoo commented Aug 19, 2021

Question

Thank you in advance for reading my question.

The paper https://research.fb.com/wp-content/uploads/2020/01/Scaling-up-online-speech-recognition-using-ConvNets.pdf
and the example recipe https://github.com/flashlight/wav2letter/tree/master/recipes/streaming_convnets/librispeech

both state that the am_500ms_future_context.arch model has 500 ms of future context, but I don't understand why.

Could you explain how the model ends up with 500 ms of future context, given the architecture in am_500ms_future_context.arch?

Best regards,

Seung Woo


tlikhomanenko (Contributor) commented

Hey,

You need to calculate the receptive field of your convolutional network, i.e. work out which future/past frames are used in the computation of a particular output frame.

I believe in our code this is done automatically: we define a function for each conv layer that computes its receptive field from the conv parameters and then propagates it to the next layer. cc @vineelpratap if I am wrong.
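
For reference, a minimal sketch of that layer-by-layer propagation. This is not the wav2letter/flashlight code, and the layer parameters below are made-up placeholders rather than the values from am_500ms_future_context.arch; it only illustrates the standard receptive-field recursion for stacked 1-D convolutions.

```python
# Minimal sketch of receptive-field propagation for stacked 1-D convolutions.
# NOTE: not the wav2letter implementation; layer parameters are hypothetical.

def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation), ordered input -> output."""
    rf = 1      # receptive field of one output frame, in input frames
    jump = 1    # distance between adjacent output frames, in input frames
    for kernel, stride, dilation in layers:
        rf += dilation * (kernel - 1) * jump
        jump *= stride
    return rf

# Made-up example: with a 10 ms feature frame shift, rf frames -> rf * 10 ms.
layers = [(7, 2, 1), (9, 1, 1), (11, 1, 1)]
frames = receptive_field(layers)
print(f"receptive field: {frames} frames ~= {frames * 10} ms")
```

Plugging the actual kernel sizes, strides, and dilations from the .arch file into something like this should give the total receptive field; how much of it is future context then depends on how each layer's padding is split between left and right.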


airlab-byeol commented Aug 23, 2021

@tlikhomanenko
Thank you for picking up on a good point.
I calculated the receptive field for one particular output frame:

[image: receptive-field calculation]

Based on my math, it has a receptive field of about 1.5 s. Is this related to the 500 ms in any way?
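
If it helps, one possible reading (my own assumption, not taken from the recipe): the ~1.5 s is the total receptive field, covering both past and future frames, while the advertised 500 ms only counts the frames to the right of the current output frame, which is controlled by how each layer's padding/look-ahead is split between left and right. A rough sketch with made-up numbers:

```python
# Rough sketch (assumption, not from the recipe): the future context is the
# sum of each layer's look-ahead frames, scaled by the cumulative stride.

def future_context(layers):
    """layers: list of (future_frames, stride) per layer, ordered input -> output."""
    future = 0
    jump = 1  # cumulative stride, in input frames
    for future_frames, stride in layers:
        future += future_frames * jump
        jump *= stride
    return future

# Hypothetical split of a ~1.5 s total receptive field: only ~50 input frames
# (~500 ms at 10 ms per frame) lie in the future, the rest is past context.
layers = [(10, 2), (10, 1), (10, 1)]   # made-up look-aheads and strides
frames = future_context(layers)
print(f"future context: {frames} frames ~= {frames * 10} ms")
```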

yuseungwoo (Author) commented Aug 23, 2021

Dear @tlikhomanenko

Thank you for your contribution to this paper and for answering my question.

I'm impressed by your work and am studying your model, am_500ms_future_context.arch.

In particular, I'm interested in reducing the model size and in its inference speed.

I'd like to ask you something.

According to your paper, the 250 ms future-context architecture is nearly as good as the 500 ms one. However, I can't find it.

Where can I find this model, or could you provide it?

Sincerely,

Seung Woo

@nguyenhuy1209

> @tlikhomanenko Thank you for picking up on a good point. I calculated the receptive field for one particular output frame. [image] Based on my math, it has a receptive field of about 1.5 s. Is this related to the 500 ms in any way?

Hi @airlab-byeol, would you mind explaining how you came up with the figure? I feel like it is really close to the answer, but I don't understand why you used 100 frames as input. Thank you.
