
Question about temporal dependencies and feature correlations captured by DMSA #11

Closed
Stinger-Wiz opened this issue Mar 20, 2023 · 9 comments


@Stinger-Wiz

Hello, I have a question about the self-attention mechanism in your paper. The N×N self-attention matrix Q·Kᵀ represents attention relations along a single dimension of length N. Yet the paper states, "Such a mechanism makes DMSA able to capture the temporal dependencies and feature correlations between time steps in the high dimensional space with only one attention operation", meaning that one DMSA attention matrix simultaneously captures attention across two kinds of dimensions. How does a single attention operation manage to capture both types of attention?

@WenjieDu
Owner

Hi there,

Thank you so much for your attention to SAITS! If you find SAITS helpful to your work, please star⭐️ this repository. Your star is a form of recognition that helps others notice SAITS. It matters and is definitely a kind of contribution.

I have received your message and will respond ASAP. Thank you again for your patience! 😃

Best,
Wenjie

@How-Will

I have a time series dataset where each column represents a different time series.
My understanding is that feature correlation describes the relationship between different time series, while temporal dependence describes the correlation between different time points within the same time series.

I am also puzzled about how a single attention operation can capture both temporal dependence and feature correlation.

Looking forward to your reply.

@WenjieDu
Owner

WenjieDu commented Mar 20, 2023

Hi, first of all, thank you both @Stinger-Wiz @Will-Hor for raising this discussion.

In [1], BRITS utilizes an LSTM to produce a history-based estimation and builds another component to produce a feature-based estimation (please refer to Section 4.3 in [1]), then combines both of them to form the final imputation.
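For contrast, here is a rough sketch of my reading of BRITS' feature-based estimation from Section 4.3 in [1]: a per-step regression that predicts each variable from the other variables of the same step, with the diagonal of the weight matrix zeroed so a variable never predicts itself. This is illustrative code, not code from the BRITS or SAITS repositories, and the names are my own.

```python
import torch
import torch.nn as nn

class FeatureRegression(nn.Module):
    """Sketch of a feature-regression module operating in the original n_features space."""
    def __init__(self, n_features):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_features, n_features) * 0.01)
        self.b = nn.Parameter(torch.zeros(n_features))

    def forward(self, x_t):  # x_t: (batch, n_features), values of ONE time step
        # Zero the diagonal so each variable is estimated only from the other variables.
        mask = 1.0 - torch.eye(self.W.size(0), device=x_t.device)
        return x_t @ (self.W * mask).t() + self.b
```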

We claim DMSA can capture the temporal dependencies and feature correlations between time steps with only one attention operation because, unlike BRITS, we only need one DMSA block. The attention map already embeds the temporal dependencies between time steps. With the diagonal masks applied, as introduced in Section 3.2.1 of [2], input values at the t-th step cannot see themselves and are prohibited from contributing to their own estimations; consequently, the estimations of the t-th step depend only on input values from the other steps. It is worth mentioning that BRITS' feature-based estimation component is specially built to consider correlations between the features of each time step, and its input is the imputed data of the current step from the LSTM cell, i.e. that component works on the feature dimension. DMSA, in contrast, works on the time dimension (this is why the captured temporal dependencies and feature correlations are both between time steps). Because DMSA's input has already been projected into a high-dimensional space (the features are fused) and SAITS does not make the imputation at this stage, DMSA does not need BRITS' component.
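For illustration only, here is a minimal sketch (not the actual SAITS implementation) of diagonally-masked self-attention over the time dimension, assuming the input has already been projected to d_model and ignoring multi-head splitting and separate Q/K/V projections; the function name and shapes are my own assumptions.

```python
import torch
import torch.nn.functional as F

def diag_masked_self_attention(x):
    """x: (batch, T, d_model) -- feature-fused representation of T time steps."""
    T, d = x.size(1), x.size(2)
    scores = torch.matmul(x, x.transpose(-2, -1)) / d ** 0.5  # (batch, T, T)

    # Diagonal mask: step t cannot attend to itself, so its estimation
    # depends only on the other T-1 steps.
    diag = torch.eye(T, dtype=torch.bool, device=x.device)
    scores = scores.masked_fill(diag, float("-inf"))

    attn = F.softmax(scores, dim=-1)   # T x T map holding the temporal dependencies
    return torch.matmul(attn, x), attn
```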

If you guys have new findings, you're welcome to share them with me 😊 Many thanks!

Footnotes

  1. Cao, W., Wang, D., Li, J., Zhou, H., Li, L., & Li, Y. (2018). BRITS: Bidirectional Recurrent Imputation for Time Series. NeurIPS 2018.

  2. Du, W., Côté, D., & Liu, Y. (2023). SAITS: Self-Attention-based Imputation for Time Series. Expert Systems with Applications.

@Stinger-Wiz
Author

How should we understand the phrase "DMSA works on the time dimension"? If the attention maps represent the attention between time steps, where is the correlation between features reflected?

Looking forward to your reply 🌹

@WenjieDu
Owner

WenjieDu commented Apr 3, 2023

Hi, thank you for your patience. The input of DMSA is fused information from the features in a d_model-dimensional space (here we ignore multi-head splitting for simplicity). "Fused" means the information from different features is mixed together. Please note this is the key point, and it differs from BRITS' feature-regression module, which operates in the original space with n_features dimensions. Therefore, while DMSA is working, it extracts correlations between features from the other T-1 steps to estimate the missing part as best as possible. Note that this feature correlation is different from the temporal dependency: the latter represents the temporal correlations between time steps and is embedded in the attention map, whereas the former means that DMSA manipulates the fused representation and exploits the mixing of feature information to extract the correlations. Such feature-correlation extraction is implicit rather than explicit like the attention map or BRITS' feature regression, so it may be confusing to some of our readers. I'd like to thank you again for raising this issue. You can validate my explanation above by explicitly appending a feature-regression module to the DMSA block; in my earlier experiments, it brought no accuracy improvement, only extra parameters.
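To make the fusion point concrete, here is a minimal sketch (my own illustration, not code from the SAITS repository) of how a linear input projection mixes all n_features into every channel of the d_model representation before attention is applied; all variable names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

batch, T, n_features, d_model = 2, 48, 5, 64
embed = nn.Linear(n_features, d_model)   # input projection into the high-dimensional space

x = torch.randn(batch, T, n_features)    # original feature space
h = embed(x)                             # (batch, T, d_model): every channel of h at step t
                                         # is a weighted mix of all n_features at that step,
                                         # so feature information is already fused here.
```

Because each element of `h` at step t already depends on every original feature of that step, attending from step t to the other steps implicitly carries feature correlations along with the temporal dependencies.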

@WenjieDu WenjieDu pinned this issue Apr 6, 2023
@WenjieDu WenjieDu changed the title from "关于注意力捕获的疑问" ("Question about attention capture") to "Question about temporal dependencies and feature correlations captured by DMSA" Apr 6, 2023
@WenjieDu
Owner

Hi, guys @Stinger-Wiz @Will-Hor, does my previous reply sound reasonable to you? If you have any other questions about this issue, feel free to tell me :-)

@Stinger-Wiz
Author

Thank you for your detailed explanation; my question about this research is now resolved. Thank you for your assistance :-)

@WenjieDu
Owner

WenjieDu commented Apr 10, 2023

@Stinger-Wiz My pleasure. Also thank you very much for your attention to SAITS! If you think it is inspiring or helpful to your work, please star🌟 the repo to help more people notice this work. Also please take a look at our new work PyPOTS which may be useful. 😃 Many thanks for your contribution again!

@How-Will

Thank you very much. Your reply really helped me a lot.
