
Observations on eyeglasses and textures #2

Open
yaseryacoob opened this issue Apr 14, 2021 · 7 comments

Comments

@yaseryacoob

Thanks for sharing your ideas and code. It is rather fun to compare it to StyleGAN2. I'm wondering about a few things:

  1. Why does your algorithm do poorly with eyeglasses?
    [comparison image: yycomp_5]

  2. There is a certain blockiness to the images (almost like JPEG artifacts). I'm not sure why it is so common.

  3. I would have expected hair to be rendered better with your architecture, but for some odd reason it (especially facial hair) is more iffy, almost as if there is too much regularity in the wavelet directions.

Thanks for any information you can share. It is an interesting architecture you propose, and I am missing the intuition behind it.

@bes-dev
Owner

bes-dev commented Apr 14, 2021

@yaseryacoob hey, thanks for your feedback!

Yes, we observed problems with glasses generation; a detailed exploration of this problem has been postponed to future work. Our main hypothesis is that this behavior is related to an "undersampling" problem: by design, our pipeline generates training data on the fly using a teacher network, so if the probability of sampling a face with glasses is small, it is hard for the student to generalize to this class of samples.
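To illustrate the undersampling hypothesis, here is a toy sketch in pure Python. The 3% glasses frequency is a made-up number for illustration only; the point is just that when training pairs are drawn on the fly from the teacher, rare attributes make up a tiny fraction of the distillation data.

```python
import random

random.seed(0)

# Assumed frequency of faces with glasses in the teacher's output
# distribution (made-up number, for illustration only).
p_glasses = 0.03
n_batches, batch_size = 1000, 32
total = n_batches * batch_size

# The distillation pipeline draws z ~ N(0, I) and renders it with the
# teacher, so attribute frequencies simply mirror the teacher's outputs.
glasses_seen = sum(1 for _ in range(total) if random.random() < p_glasses)

print(f"{glasses_seen}/{total} distillation samples contain glasses "
      f"({100 * glasses_seen / total:.2f}%)")
```

Roughly 3 in every 100 student updates ever see glasses, so that mode gets far less gradient signal than the common ones.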

As for 2-3, I think this problem is related to general "underfitting" of the network. In the late stages of training we observed some symptoms of "overfitting" in the discriminator, which forced us to stop training. This was partially fixed by using differentiable augmentations, but some artifacts still remain in the generated images.
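For context, differentiable augmentation works by showing the discriminator only augmented views of both real and fake images, with gradients flowing through the augmentation back to the generator. A minimal NumPy sketch of the data flow (this is not the actual DiffAugment implementation; `np.roll` stands in for a differentiable translation):

```python
import numpy as np

rng = np.random.default_rng(0)

def diff_augment(batch, rng, max_shift=2):
    """Apply one random translation to the whole batch.

    Stand-in for a DiffAugment-style policy: in a real pipeline the op is
    implemented differentiably (e.g. in torch) so generator gradients flow
    through it; np.roll here only illustrates the data flow.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(batch, shift=(int(dy), int(dx)), axis=(1, 2))

real = rng.standard_normal((4, 8, 8, 3))  # toy "images": N, H, W, C
fake = rng.standard_normal((4, 8, 8, 3))

# The discriminator only ever sees augmented views of BOTH real and fake
# images, which makes memorizing the finite real set much harder.
real_aug = diff_augment(real, rng)
fake_aug = diff_augment(fake, rng)
```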

Yes, we noticed the same problems in our model. But our work is just a first step toward fast, production-ready style-based generative models. We will try to fix the known issues in future work and bring the final quality closer to the original network.

@yaseryacoob
Author

I figured you must have seen these, and I fully understand the complexity of research code and the challenge involved. I actually mean to be encouraging of the approach and architecture overall, and I will look into your code as well, as much as I can, to see what else might improve the generation. Here is another question: for the example above I used compare.py, which injects a 512-dimensional latent. Can I inject a W+ (18x512) latent instead? Obviously StyleGAN2 has no issue with that, but from the paper I couldn't tell whether your architecture supports more than W. Can you clarify?
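For reference, a single 512-dimensional W latent corresponds to the degenerate W+ in which every layer receives the same style. A sketch of that relationship (the layer count of 18 assumes StyleGAN2 at 1024x1024):

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers = 18  # StyleGAN2 style inputs at 1024x1024

w = rng.standard_normal(512)          # single latent from the mapping network
w_plus = np.tile(w, (num_layers, 1))  # degenerate W+: same row at every layer

# A 512-d input like the one compare.py injects corresponds to this
# degenerate W+; a "true" W+ lets each of the rows differ independently.
```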

Also, you can email me directly at yaser@umd.edu if you would rather not discuss this in public.

@bes-dev
Owner

bes-dev commented Apr 14, 2021

@yaseryacoob I hope my code is more "engineering" than "research" because I'm an engineer, not a research scientist 😂 But if you find anything difficult, please feel free to ask about it and I'll try to explain. It would be great if my work helps further research in GANs!

As for the latent space: W+ is not interchangeable between StyleGAN2 and MobileStyleGAN, because:

  1. MobileStyleGAN has one building block fewer than StyleGAN2 (the reason relates to operating in the wavelet domain, as described in the article).
  2. A MobileStyleGAN building block has more skip connections to the style vector than a StyleGAN2 block. You can see the difference between the two building blocks in the code.
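A back-of-the-envelope check on the style count. The formula below gives the commonly cited 18 styles for StyleGAN2 at 1024x1024; the assumption that dropping one building block removes two styles is mine, based on point 1 above, not a statement from the paper:

```python
import math

def stylegan2_num_styles(resolution):
    # Two styles per resolution step from 4x4 up to the output resolution;
    # this yields the commonly cited 18 rows of a StyleGAN2 W+ at 1024x1024.
    return 2 * int(math.log2(resolution)) - 2

print(stylegan2_num_styles(1024))  # 18

# If MobileStyleGAN drops one building block and a block consumes two
# styles (my assumption), its W+ would have 16 rows instead, so an 18x512
# StyleGAN2 W+ cannot be injected directly.
mobile_styles = stylegan2_num_styles(1024) - 2
```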

@yaseryacoob
Author

Actually, research and engineering blend when significant architecture changes are made. The changes you made can and should lead to outcomes different from StyleGAN2, but there are a number of open issues, like the student/teacher framework. StyleGAN2 is of course a good teacher, but when distillation should stop, and whether a student better than the teacher can emerge, is a matter of deeper analysis. I obviously can't tell, without further tinkering with the code, whether the student can beat the teacher given the architecture. The example above shows that for a specific W you do a bit worse than StyleGAN2, but that doesn't mean a W+delta can't match or beat StyleGAN2; the delta can be optimized directly or learned by an encoder. There are some interesting questions to answer here.
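The "optimize a delta" idea can be sketched with a linear toy model. The names (`student`, `target`) are illustrative stand-ins, not the real networks: plain gradient descent on a per-sample delta added to w, minimizing the distance to what the teacher produced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear toy stand-ins (illustrative names, not the real networks):
A = rng.standard_normal((8, 8))   # "student": latent -> output
w = rng.standard_normal(8)        # latent the student starts from
target = rng.standard_normal(8)   # what the "teacher" produced

def student(latent):
    return A @ latent

init_err = np.linalg.norm(student(w) - target)

# Gradient descent on a per-sample delta: minimize
# 0.5 * ||student(w + delta) - target||^2 over delta.
delta = np.zeros(8)
lr = 0.01
for _ in range(2000):
    residual = student(w + delta) - target
    delta -= lr * (A.T @ residual)

final_err = np.linalg.norm(student(w + delta) - target)
```

The same delta could instead be predicted by a small learned encoder, amortizing the per-image optimization.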

@bes-dev
Owner

bes-dev commented Apr 14, 2021

@yaseryacoob It's a pretty cool idea to train an external network that predicts the delta between student and teacher! If it works, and the combined computational cost of both networks is less than the teacher's, that would be a pretty good result. I saw a similar idea about iterative estimation of the output image in the ReStyle paper: https://yuval-alaluf.github.io/restyle-encoder/ 🤔
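A toy version of that iterative-refinement loop, with a damped pseudo-inverse step standing in for the learned ReStyle encoder (everything here is an illustrative stand-in, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((16, 8))   # stand-in generator: latent -> "image"
target = rng.standard_normal(16)   # image we want to reproduce
A_pinv = np.linalg.pinv(A)         # fixed "encoder" (illustration only)

w = np.zeros(8)
err0 = np.linalg.norm(A @ w - target)
for _ in range(5):
    recon = A @ w
    # The encoder looks at (target, current reconstruction) and predicts a
    # latent residual; here that prediction is a damped pseudo-inverse step.
    w = w + 0.5 * (A_pinv @ (target - recon))

err = np.linalg.norm(A @ w - target)
```

Each iteration shrinks the reachable part of the reconstruction error, which is the same feedback structure ReStyle uses with a learned encoder.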

@yaseryacoob
Author

Yeah, I forgot about the ReStyle approach! I discussed things with them last week; you can see the thread there. We are missing the last 5-10% in quality, and I am not sure where it will come from: architecture, latent representation, optimization, or god knows what. So testing your architecture by pushing it to 100% is worth trying, so we learn where the potential lies.

My gut feeling is that the architecture is the first hurdle that needs improvement: to capture all the frequencies adequately, it has to be just "right". I can't prove it for now... I was hoping your experiment would teach us something.

@bes-dev
Owner

bes-dev commented Apr 15, 2021

@yaseryacoob I added an experiment with iterative refinement to my to-do list. I'll try it when I have free time :) It would be great if you dug into our work more deeply and tried to improve it too 💪
