
Confusion about computation in paper 'Simple Recurrent Units for Highly Parallelizable Recurrence' ? #157

Open
liziru opened this issue Jan 22, 2021 · 4 comments

liziru commented Jan 22, 2021

Thanks for your work. @taoleicn @taolei87

In the paper 'Simple Recurrent Units for Highly Parallelizable Recurrence', I found the following computation:
[image: the SRU light recurrence equations from the EMNLP paper, including the term v ⊙ c_{t−1}]

To facilitate parallelization, our light recurrence component uses a point-wise multiplication v ⊙ c_{t−1} instead.
With this simplification, each dimension of the state vectors becomes independent and hence parallelizable.

I don't see how the computation becomes parallelizable by using a point-wise multiplication over the dimensions of the state vector.
Since c_{t−1} is still used, I think this SRU cannot be parallelized.

Besides, I also found the formulation in 'Training RNNs as Fast as CNNs', which I think can be parallelized:
[image: the SRU equations from the arXiv paper 'Training RNNs as Fast as CNNs']

Also, what are the differences between these two papers? I also posted this question on Zhihu.

Looking forward to your reply.

@liziru liziru changed the title Confusion about computation in paper 'Simple Recurrent Units for Highly Parallelizable Recurrence' Confusion about computation in paper 'Simple Recurrent Units for Highly Parallelizable Recurrence' ? Jan 23, 2021

taolei87 commented Jan 23, 2021

Hi @liziru, thanks for posting these great questions.

(1) Difference between the two versions. In our first arXiv version, we didn't include the element-wise hidden-to-hidden connection (v ⊙ c_{t−1}) for simplicity; it is easier to derive the backward computation in that case. We added the hidden-to-hidden connection to obtain additional modeling capacity and better empirical results in our second version (the EMNLP submission). See, for example, the comparison of the two variants in Table 5 and Figure 5 of the EMNLP paper.
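
For concreteness, here is a minimal sketch (illustrative only, not the library code) of the forget gate in the two versions for a single time step; `Wf_x` is an assumed stand-in for the pre-computed projection W_f x_t:

```python
import torch

# Hidden size d; Wf_x = W_f x_t is computed once for all time steps as part
# of one large matmul, so only the element-wise recurrence part differs.
d = 4
Wf_x   = torch.randn(d)   # input projection W_f x_t
c_prev = torch.randn(d)   # previous internal state c_{t-1}
v_f    = torch.randn(d)   # per-dimension parameter vector (second version only)
b_f    = torch.zeros(d)

f_v1 = torch.sigmoid(Wf_x + b_f)                 # arXiv v1: no hidden-to-hidden term
f_v2 = torch.sigmoid(Wf_x + v_f * c_prev + b_f)  # EMNLP v2: adds v_f * c_{t-1}
```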

(2) Parallelization. There are two different kinds of parallelization here. First, the matrix multiplications in (1)-(4) can be grouped into a single one and therefore parallelized across both time steps and hidden dimensions. Second, you are correct that the point-wise computation cannot be parallelized across time steps, but it can be parallelized across the hidden dimension, since each dimension is independent. In other words, to compute c[t][i] you first need c[t-1][i], but you can compute c[*][i] and c[*][j] in parallel for i != j. Note that this second kind of parallelization is not available in LSTM / GRU, since in those RNNs the hidden-to-hidden connections are fully connected.
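
As an illustration of the second point, here is a minimal sketch of the element-wise recurrence c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ (W x_t) (not the actual CUDA kernel):

```python
import torch

# The gates and transformed inputs for all T steps come from one batched
# matmul, so the only sequential part is the loop over t below. Within each
# step, every one of the d hidden dimensions is updated independently, which
# is what a kernel can parallelize (e.g., one thread per dimension).
T, d = 8, 4
f = torch.sigmoid(torch.randn(T, d))  # forget gates f_t for all time steps
u = torch.randn(T, d)                 # transformed inputs W x_t for all steps
c = torch.zeros(d)
for t in range(T):                    # sequential over time...
    c = f[t] * c + (1 - f[t]) * u[t]  # ...element-wise over the d dimensions
```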

Hope this helps.


liziru commented Jan 25, 2021

Thanks for your reply. @taolei87
Is this GitHub implementation the first arxiv version or the second version (EMNLP submission)?
I want to run some tests on the first version for certain reasons. Where can I find the implementation of the first version?
I would appreciate it very much.

taolei87 commented

@liziru It is version 2 by default, but you can test v1 by passing v1=True to the SRU or SRUCell constructor.
See https://github.com/asappresearch/sru/blob/master/sru/modules.py#L444
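
For reference, a usage sketch based on the suggestion above (the sizes here are arbitrary examples; see the linked code for the exact constructor signature):

```python
import torch
from sru import SRU

rnn = SRU(input_size=128, hidden_size=128, num_layers=2, v1=True)  # v1=True selects the first arXiv formulation
x = torch.randn(50, 16, 128)   # (seq_len, batch, input_size)
output, state = rnn(x)         # output: (seq_len, batch, hidden_size)
```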


liziru commented Jan 25, 2021

Great! Thank you very much.
