
Questions about the attention map #19

Open
HelenMao opened this issue May 29, 2021 · 2 comments

Comments


HelenMao commented May 29, 2021

Hi, I am trying your model on the AFHQ dataset and find that it preserves the background of the source image very well. I think this is thanks to the attention map; when I visualize the attention map, I find it does learn the mask.

However, when I copy the attention module into my own framework (this paper), it does not work at all and fails to learn the mask. The main differences between my framework and yours lie in the use of the mapping network and in the absence of a KL/MMD-related loss between the random noise distribution and the reference encoder embedding distribution (I also tried directly replacing your generator, and it still failed to learn the mask).

I am wondering whether you have some experience with your attention map design. Under what conditions do you think it can learn the mask? It would be really great if you could share some experience with me, thanks a lot!

Looking forward to your reply!

imlixinyang (Owner) commented May 29, 2021

I've also tried the AFHQ dataset and found that HiSD only manipulates the shape while maintaining the background and color, which will be presented in the camera-ready supplemental material.

I think there are a few key reasons why HiSD succeeds in learning the mask without any extra objective: 1. a separate translator for each tag (semantic); 2. no diversification loss; and 3. applying the mask on the feature rather than on the image (which means that both the channel-wise and spatial-wise dimensions are important).

In previous works, a regularization objective is always needed; I think the reason is that a spatial-wise-only mask is hard for the generator to learn.
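For reference, the feature-level masking described in point 3 above can be sketched roughly as follows. This is a minimal PyTorch sketch, not the actual HiSD code: the `MaskedTranslator` module, its layer sizes, and the single-conv mask head are all made up for illustration; the point is only that the mask has shape (B, C, H, W), so it gates every channel at every spatial location, rather than being a single spatial map applied to the output image.

```python
import torch
import torch.nn as nn

class MaskedTranslator(nn.Module):
    """Illustrative sketch: translate an intermediate feature map under a
    learned attention mask applied on the *feature* (B x C x H x W),
    i.e. both channel-wise and spatial-wise, unlike a spatial-only
    1 x H x W mask applied on the output image."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical translation branch (stand-in for the real translator).
        self.translate = nn.Conv2d(channels, channels, 3, padding=1)
        # Mask head: one sigmoid gate per channel AND per spatial location.
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        mask = self.mask_head(feat)  # (B, C, H, W), values in [0, 1]
        # Blend: translated content where mask ~ 1, untouched source feature
        # elsewhere, so the background can pass through unchanged.
        return mask * self.translate(feat) + (1.0 - mask) * feat
```

A spatial-wise-only variant would instead produce a mask of shape (B, 1, H, W) and broadcast it over channels, which is the harder-to-learn setting the comment above refers to.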

HelenMao (Author)

I think the per-tag translator design may not be the cause, since I use only one tag when running the AFHQ dataset.

The diversification loss may have some influence, and I need to do more experiments.

I directly copied your generator (including both the translator and the decoder) into my own framework. I believe your generator does use both channel-wise and spatial-wise attention maps, yet it still cannot learn the mask. Therefore, I think that may not be the main reason.
