
It looks like z-sem is not being trained #77

Open
Kim-Sung-Hun opened this issue Mar 22, 2024 · 9 comments
Comments

@Kim-Sung-Hun

Kim-Sung-Hun commented Mar 22, 2024

Hi, thank you for your excellent research!
While performing inference through the autoencoder, I consistently obtain the same output regardless of the input image (the output depends only on x_T).
I tried training both on my own data and on the FFHQ dataset, and the same phenomenon occurred in both cases.
I think it might be related to the issue of the gradient of z_sem becoming zero, which was raised by someone else; since there was no response to that post, I decided to raise it again.
(#63)
Thank you.

@phizaz
Owner

phizaz commented Mar 22, 2024

Can you show me a minimal working example?

@Kim-Sung-Hun
Author

Sure, here is the code (model and input_image are defined earlier in the notebook):

import matplotlib.pyplot as plt

cond = model.encode(input_image)                       # semantic code z_sem
xT = model.encode_stochastic(input_image, cond, T=50)  # stochastic code x_T
pred = model.render(noise=xT, cond=cond, T=20)         # decode back to an image

pred = (pred + 1) / 2                        # map from [-1, 1] to [0, 1]
pred = pred[0]                               # take the first image in the batch
pred = pred.permute(1, 2, 0).cpu().numpy()   # CHW -> HWC for matplotlib

plt.imsave('image.png', pred)

When I run this code, the image is generated successfully, but the problem is that when I change input_image to a different image, the same result is produced.
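
For reference, here is a minimal sketch that compares the semantic codes of two images directly, assuming the same model and two preprocessed image tensors image1 and image2 (the variable names are illustrative):

import torch

# Encode two different face images with the semantic encoder.
cond1 = model.encode(image1)
cond2 = model.encode(image2)

# If the encoder is trained, distinct faces should give clearly distinct codes.
print('||cond1 - cond2|| =', (cond1 - cond2).norm().item())
print('identical codes?', torch.allclose(cond1, cond2))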

@phizaz
Owner

phizaz commented Mar 22, 2024

I cannot reproduce your problem. Can you provide the whole notebook, including the results of encoding both images?

@Kim-Sung-Hun
Author

Kim-Sung-Hun commented Mar 23, 2024

All right, I'll show you the whole process in detail.

First, I used the model trained for 98 epochs with run_ffhq128.py. As you know, the file is divided into four parts, and I executed only the first part so that only the autoencoder is trained.

# first stage of run_ffhq128.py: train only the autoencoder
from templates import *  # provides ffhq128_autoenc_130M and train in the repo

gpus = [0, 1, 2, 3, 4, 5, 6, 7]
conf = ffhq128_autoenc_130M()
train(conf, gpus=gpus)

Second, I used images of people's faces found on Google as input.

To show the problem I was talking about, I conducted a total of four experiments (a code sketch follows the list):

  1. cond and x_T both come from image1 (1.png): image1 is reconstructed (saved as image1)
  2. cond and x_T both come from image2 (2.png): image2 is reconstructed (saved as image2)
  3. cond comes from image2 (2.png) and x_T comes from image1 (1.png): image1 is reconstructed, and the result is exactly the same as in case 1 (saved as image3)
  4. cond comes from image1 (1.png) and x_T comes from image2 (2.png): image2 is reconstructed, and the result is exactly the same as in case 2 (saved as image4)
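
Here is a minimal sketch of those four runs, assuming the same model API as in the snippet above (the reconstruct helper is illustrative, not a function from the repo):

import matplotlib.pyplot as plt

def reconstruct(cond_img, noise_img, out_path):
    """Decode with the semantic code of one image and the noise of another."""
    cond = model.encode(cond_img)  # z_sem from cond_img
    # x_T is inverted from noise_img using its own semantic code
    xT = model.encode_stochastic(noise_img, model.encode(noise_img), T=50)
    pred = model.render(noise=xT, cond=cond, T=20)
    img = ((pred[0] + 1) / 2).permute(1, 2, 0).cpu().numpy()
    plt.imsave(out_path, img)

reconstruct(image1, image1, 'image1.png')  # case 1
reconstruct(image2, image2, 'image2.png')  # case 2
reconstruct(image2, image1, 'image3.png')  # case 3: cond from image2, x_T from image1
reconstruct(image1, image2, 'image4.png')  # case 4: cond from image1, x_T from image2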

According to the above results, cond has no effect on the output image at all; the result is affected only by x_T. This doesn't make sense, because according to the paper, z_sem (cond) should have more influence on the resulting image than x_T.
I'll attach the inference code, the input images, and the result images that I used. I couldn't attach the model because of its size.
Thank you.
attachment.zip

@phizaz
Owner

phizaz commented Mar 24, 2024

  1. Do you mean this problem happens with your model trained from scratch on your own dataset?
  2. After looking at your attachment, I see some artifacts suggesting that you are encoding images that the model was not trained on. If you use the checkpoint provided by us, it will definitely not work with your images, because they are not "aligned" (FFHQ images are aligned in a particular way!). If you use your own checkpoint, make sure that you don't assume some kind of alignment in your training dataset.

@Kim-Sung-Hun
Author

  1. That's right, and the same problem occurs not only when the model is trained on my own data but also when it is trained on FFHQ data. In fact, the model used in the attachment was trained on FFHQ data (run_ffhq128.py, autoencoder part only).
  2. Thank you for your advice, but I still think it is strange that the model's results are not affected by the condition (z_sem) at all. Even if new data is fed into the model, there is no reason why the condition should have no effect whatsoever. As I said before, when I ran run_ffhq128.py, I ran only the autoencoder part (the rest is commented out), so do you think that is related to this problem?

@phizaz
Owner

phizaz commented Mar 24, 2024

  1. The artifacts in image1 and image2 shouldn't be there if the model is trained and used correctly.
  2. z_sem can only influence the parts of the output that are NOT already encoded in the noise (x_T).
  3. It is usually the case that, when your training images and test images don't share the same properties (such as being aligned in the same way), most of the information ends up in x_T (because the semantic encoder has no clue how to encode the image). Then, even when you change z_sem, you won't see any meaningful change in the output, because most of the information was kept in x_T in the first place.
  4. As a good exercise, you may also plot x_T to see what information is in there (see the sketch below).
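
A minimal sketch of point 4, assuming xT was obtained from model.encode_stochastic as in the snippets above:

import matplotlib.pyplot as plt

# x_T has the same shape as the image; rescale it to [0, 1] for display.
xT_img = xT[0].permute(1, 2, 0).cpu().numpy()
xT_img = (xT_img - xT_img.min()) / (xT_img.max() - xT_img.min())
plt.imsave('xT.png', xT_img)

# If training worked, x_T should look like near-structureless noise; visible
# face structure means most of the information is stored in x_T, not z_sem.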

@Kim-Sung-Hun
Author

Thank you again for your advice. Additionally, I confirmed that the values of some encoder-related parameters in the checkpoint I used were zero (a sketch of the check is below).
So my understanding is that if I preprocess my training data and test data to have the same properties (such as being aligned in the same way), it will help solve this problem, right?
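
A minimal sketch of that parameter check, assuming a PyTorch Lightning checkpoint; the path and the 'encoder' key filter are illustrative and should be adjusted to your checkpoint:

import torch

# Load the checkpoint on CPU and scan encoder-related weights for all-zeros.
ckpt = torch.load('checkpoints/last.ckpt', map_location='cpu')
state = ckpt.get('state_dict', ckpt)
for name, p in state.items():
    if 'encoder' in name and torch.count_nonzero(p) == 0:
        print('all-zero parameter:', name, tuple(p.shape))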

@Saranga7

> After looking at your attachment, I see some artifacts suggesting that you are encoding images that the model was not trained on. If you use the checkpoint provided by us, it will definitely not work with your images, because they are not "aligned" (FFHQ images are aligned in a particular way!).

What does "aligned" mean?
