Can the MHANet run in real time #52

Open
hopkin-ghp opened this issue Feb 1, 2023 · 4 comments

Comments

@hopkin-ghp

hopkin-ghp commented Feb 1, 2023

Hi,

I am unsure whether MHANet can run in real time. As I understand it, the masked attention only makes the model causal, which may not be enough for it to be usable in real time.

Best Regards, looking forward to your reply.

@anicolson
Owner

I did not get the chance to develop the model to run on a real-time system.

It would need some more development, but I assume it's possible. You could reuse past keys and values for the attention mechanism to speed up processing, and choose a window of time-steps that lets the model run fast enough on a device to be real time. So I assume a few compromises would need to be made. Also, a device with a GPU would make things much easier.

Maybe a paper like this could give you some ideas: https://arxiv.org/abs/2010.11395

I could be wrong, but I think it is quite possible with some modifications.

Aaron.

@hopkin-ghp
Author

Yes, I also think it is possible for the model to run on a real-time system.

a) For a masked attention matrix (full history, 0 lookahead), like
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
I think it behaves differently in training and in inference.

b) For a masked attention matrix (N history, 0 lookahead), where N is the window size, if N=3 we get
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
0 1 1 1 0 0
0 0 1 1 1 0
0 0 0 1 1 1
But I am not sure whether it is suitable for real-time systems. Specifically, can a model trained with method (b) be applied to inference on streaming audio?

Thanks.
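
For reference, the two masks in (a) and (b) can be generated with a short sketch like the one below. This is only an illustration in plain Python: the function name `causal_mask` and its `history` parameter are hypothetical and are not taken from the MHANet code. A 1 means the frame in that row may attend to the frame in that column.

```python
def causal_mask(num_frames, history=None):
    """Return a num_frames x num_frames 0/1 mask with zero lookahead.

    history=None: each frame attends to all past frames (case a).
    history=N:    each frame attends to itself and the N-1 frames before it (case b).
    """
    mask = []
    for i in range(num_frames):        # i: current (query) frame
        row = []
        for j in range(num_frames):    # j: attended (key) frame
            not_future = j <= i
            in_window = history is None or i - j < history
            row.append(1 if not_future and in_window else 0)
        mask.append(row)
    return mask


full = causal_mask(6)                  # case (a): full history
windowed = causal_mask(6, history=3)   # case (b): N = 3

for row in windowed:
    print(" ".join(str(v) for v in row))
```

With `history=3` the printed rows match the matrix in (b). The point of the window is that each output frame then depends on a fixed number of past frames, so the per-frame cost and memory stay bounded however long the stream runs.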

@anicolson
Owner

Sounds like an interesting problem to investigate :) I am sure it could work with some constraints. Consider things like using previously computed keys to speed up processing, e.g., this is done with language models when generating text to speed up decoding: https://github.com/huggingface/transformers/blob/820c46a707ddd033975bc3b0549eea200e64c7da/src/transformers/models/gpt2/modeling_gpt2.py#L984
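
To make the key-reuse idea concrete, here is a minimal sketch, assuming a single attention head in isolation with made-up random inputs. It compares attention over a whole utterance under a windowed mask (N history, 0 lookahead) with frame-by-frame attention that keeps only the last N keys and values in a cache. All names (`attend`, `offline_windowed`, `streaming_windowed`) are hypothetical; none of this comes from the MHANet or Hugging Face code, and it says nothing about whether the rest of the network (normalisation, positional encoding, and so on) is streamable.

```python
import math
import random
from collections import deque


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over lists of keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def offline_windowed(queries, keys, values, history):
    """Whole-utterance attention under the windowed mask.

    Masked positions get zero weight after the softmax, so slicing to the
    allowed positions gives the same result as applying the mask.
    """
    outputs = []
    for i, q in enumerate(queries):
        start = max(0, i - history + 1)
        outputs.append(attend(q, keys[start:i + 1], values[start:i + 1]))
    return outputs


def streaming_windowed(queries, keys, values, history):
    """Frame-by-frame attention that caches only the last `history` keys/values."""
    key_cache = deque(maxlen=history)    # oldest entry is dropped automatically
    value_cache = deque(maxlen=history)
    outputs = []
    for q, k, v in zip(queries, keys, values):   # one new frame arrives at a time
        key_cache.append(k)
        value_cache.append(v)
        outputs.append(attend(q, list(key_cache), list(value_cache)))
    return outputs


random.seed(0)
frames, dim, history = 6, 4, 3


def random_frames():
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(frames)]


q, k, v = random_frames(), random_frames(), random_frames()
offline = offline_windowed(q, k, v, history)
streaming = streaming_windowed(q, k, v, history)

# Largest element-wise difference between the two ways of computing the output.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(offline, streaming)
               for a, b in zip(row_a, row_b))
print(max_diff)
```

In this toy setting the two outputs agree, which suggests that a layer trained with the windowed mask can be evaluated on a stream by caching the last N keys and values rather than recomputing them for every frame.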

@hopkin-ghp
Author

Thanks, I will read up on this.
