Paper, Table 1, Convolution number of parameters #128

Open
ndvbd opened this issue Jan 17, 2024 · 2 comments

ndvbd commented Jan 17, 2024

Hi, a few things in Table 1 are not fully clear to me.

  1. It says convolution has LH parameters. How can that be when the A matrix alone, which is learnable, has shape LxL? Is it because A is diagonalizable plus low rank, and we learn only the diagonal and neglect the low-rank part?

  2. In Section 3.1, it says:
    image

    Shouldn't the time complexity be O(N^3L)?

  3. In Table 1, why is the number of S4 parameters H^2 and not LH? After all, Section 3.4 says the number of parameters is L == N, and we need H dimensions, which makes it LH.

albertfgu (Contributor) commented

  1. The convolution column of the table does not refer to an SSM convolution, but to directly parameterizing the convolution kernel's elements (like a standard convolution). (This is mentioned in the footnote.) See this work for an example of people attempting this in practice: https://hazyresearch.stanford.edu/blog/2023-02-15-long-convs

  2. It's a matrix-vector multiplication, not matrix-matrix, so it costs $O(N^2)$ per step for each of the $L$ steps, i.e. $O(N^2 L)$ in total.

  3. I think you have misread something. S4's parameterization does not depend on the sequence length, and I don't see anything in Section 3.4 that implies it does.
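
Points 1 and 2 can be illustrated with a minimal sketch in plain Python. It assumes a single-input, single-output SSM with state size N, unrolled as x_k = A x_{k-1} + B u_k, y_k = C x_k. The helper names (`long_conv_param_count`, `matvec`, `ssm_recurrence`) and the toy matrices are hypothetical and not taken from the S4 codebase; the counter tracks only the multiplications in the A x product, which dominate the cost.

```python
def long_conv_param_count(L, H):
    """Parameters when every kernel entry is learned directly:
    one length-L kernel for each of the H channels (assumed layout)."""
    return L * H


def matvec(A, x):
    """Dense matrix-vector product; returns (result, multiplication count)."""
    y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    return y, len(A) * len(x)


def ssm_recurrence(A, B, C, u):
    """Unroll x_k = A x_{k-1} + B u_k, y_k = C . x_k over the input u.
    Returns the outputs and the multiplications spent on A x."""
    n = len(B)
    x = [0.0] * n
    ys, mults = [], 0
    for u_k in u:
        Ax, m = matvec(A, x)  # matrix-vector, not matrix-matrix: N^2 multiplications
        mults += m
        x = [Ax[i] + B[i] * u_k for i in range(n)]
        ys.append(sum(C[i] * x[i] for i in range(n)))
    return ys, mults


# Toy sizes and matrices, chosen only for illustration.
N, L, H = 4, 16, 8
A = [[0.5 if i == j else 0.1 for j in range(N)] for i in range(N)]
B = [1.0] * N
C = [1.0 / N] * N
u = [1.0] + [0.0] * (L - 1)  # unit impulse

ys, mults = ssm_recurrence(A, B, C, u)
print(long_conv_param_count(L, H))  # 128, i.e. L * H
print(mults == N * N * L)           # True: N^2 per step, L steps
```

The count grows linearly in L because each step reuses the previous state vector; nothing in the loop multiplies two N x N matrices.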

ndvbd commented Jan 24, 2024

Thank you @albertfgu,

  1. Thank you, I understand: it's like a regular convolution kernel whose size is the sequence length (L), multiplied by the number of dimensions (H).

  2. But in equation (5), in order to compute the kernel, you need to raise the matrix A to the power L, and A is NxN. Am I missing something?

  3. I see, thank you.
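
On point 2, here is a minimal sketch of one way to look at the cost, assuming the kernel convention K_k = C A^k B for k = 0..L-1 (for the paper's discretized matrices, substitute those). The entries can be generated by repeatedly applying A to a vector, starting from B, so no explicit matrix power or matrix-matrix product is required. This is only the naive computation, not the algorithm S4 itself uses, and all function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product, O(N^2) multiplications."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def kernel_by_matvec(A, B, C, L):
    """K[k] = C A^k B for k = 0..L-1, one matrix-vector product per entry."""
    K, v = [], list(B)      # v holds A^k B
    for _ in range(L):
        K.append(dot(C, v))
        v = matvec(A, v)    # advance A^k B to A^(k+1) B
    return K


def causal_conv(K, u):
    """y_k = sum_{j<=k} K[j] * u[k-j], computed directly."""
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


def recurrence(A, B, C, u):
    """y_k from x_k = A x_{k-1} + B u_k, y_k = C . x_k, with x_{-1} = 0."""
    x = [0.0] * len(B)
    ys = []
    for u_k in u:
        Ax = matvec(A, x)
        x = [Ax[i] + B[i] * u_k for i in range(len(B))]
        ys.append(dot(C, x))
    return ys


# Toy system, chosen only for illustration.
A = [[0.5, 0.25], [0.0, 0.5]]
B = [1.0, 1.0]
C = [1.0, 0.0]
u = [1.0, 2.0, 3.0, 4.0]

K = kernel_by_matvec(A, B, C, len(u))
y_conv = causal_conv(K, u)
y_rec = recurrence(A, B, C, u)
print(all(abs(a - b) < 1e-12 for a, b in zip(y_conv, y_rec)))  # True
```

Each loop iteration costs O(N^2), so the naive kernel costs O(N^2 L) in total, the same order as the recurrence.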

Do you know where I can find an explanation of the training time of Convolution and S4:

image
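
As general background for training-time entries of this kind, and not a derivation of the table: a length-L causal convolution can be evaluated with FFTs in O(L log L) operations per channel instead of the O(L^2) of the direct sum. The sketch below, in plain Python, assumes a simple recursive radix-2 FFT; the function names are hypothetical and the kernel values are arbitrary.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_causal_conv(K, u):
    """Causal convolution of kernel K with input u via zero-padded FFTs."""
    L = len(u)
    n = 1
    while n < 2 * L:        # pad so circular convolution equals linear convolution
        n *= 2
    FK = fft(list(K) + [0.0] * (n - len(K)))
    Fu = fft(list(u) + [0.0] * (n - L))
    y = fft([a * b for a, b in zip(FK, Fu)], invert=True)
    return [(v / n).real for v in y[:L]]  # divide by n to normalize the inverse


def direct_causal_conv(K, u):
    """Reference O(L^2) implementation of the same convolution."""
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


K = [1.0, 0.75, 0.5, 0.25]  # arbitrary kernel, for illustration only
u = [1.0, 2.0, 3.0, 4.0]

y_fft = fft_causal_conv(K, u)
y_direct = direct_causal_conv(K, u)
print(all(abs(a - b) < 1e-9 for a, b in zip(y_fft, y_direct)))  # True
```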
