Questions and thoughts on model size and performance #81

jackylu0124 · 2022-03-22T01:23:14Z

jackylu0124
Mar 22, 2022

First of all, thank you for this incredible project!

I would like to hear about some of your insights on the trade-off between model performance and model quality, especially with regards to DiOr. One thing that I have observed during some profiling experiments of the DiOr pipeline is that some of the models are relatively large and require a relatively large number of FLOPs during inference, which makes it unsuitable for some application scenarios. For example, among all models some of the largest would be the generator network (~16.5 million parameters) at the end of pipeline, the flow network (~6.6 million parameters), and the segment encoder network (~1.2 million parameters).

I dug into the aforementioned models’ architecture a bit, and at first glance I found that convolutions with large kernel sizes (5 or 7) are used at quite a few locations, such as inside the ContentEncoder and Decoder in the generator, and the ADGANEncoder component in the segment encoder network. Do you think it would be a good idea to change these convolution layers with large kernel sizes into a series/stack of smaller ones in order to boost model performance and quality? For example, a stack of 3x3 convolution layers with stride of 1 would have the same receptive field as one 7x7 convolution layer, and not only do they have fewer parameters (3*(3^2C^2)) than the 7x7 convolution layer (7^2C^2), they will also give the model more non-linearities due to the greater depth.

Another idea that I have been toying around is the possibility of using separable convolutions like the ones used in MobileNets in order to reduce model size and latency. Nevertheless, I don’t know if using the separation convolution strategy would have lead a noticeable impact on the quality of the results produced by the DiOr models. I would really love to hear your opinions and thoughts on this.

Thanks a lot for the great work again!

cuiaiyu · 2022-03-23T00:04:14Z

cuiaiyu
Mar 23, 2022
Maintainer

Hi! This is a very interesting topic and thank you for the thoughtful question!

First, for the convolution specs (e.g., the [7,1,3], [4,2,1] kenel/stride/padding) in encoders and decoder, we directly inherited those hyperparameters from the prior work ADGAN for a fair comparison, so we have never tuned them. Therefore, they are not necessary (and very likely not) the optimal choices for the convolution layers.

From my intuition and limited experiments for those kernel sizes, replacing those with stacked 3x3 convolution kernels would not change the performance too much if there is any, because the convolution kernel size in the encoders/decoder is not the crucial part for DiOr to get its performance. Changing the channel size (--ngf) or the resnet blocks in the decoder would have a more direct impact on the performance.

For MobileNets-like convolutions, I personally never play around with them before, so it's hard to tell the behaviors before any experiments. However, I think this would be a very meaningful follow-up and I would like to know if it could make the model more computationally efficient as well! :)

0 replies

jackylu0124 · 2022-03-23T15:17:13Z

jackylu0124
Mar 23, 2022
Author

Thank you very much for your reply and the valuable insights! You brought up a very good point, the convolution layers with the large kernel sizes aren’t the ones that make up the core/bulk of the encoder and decoder, and modifying the channel size or the res blocks might lead to a more noticeable improvement to the inference performance.

In terms of using strategies from MobileNets in GANs, I investigated a bit online and found that there is certainly more literature that focus on techniques for improving performance in the traditional CV deep learning space (detection, segmentation, etc.) than in the relatively nascent “generative” deep learning space. Perhaps it’s because GANs are still famous for their power and the wow factor in generating entirely original photorealistic images, and that’s likely the key aspect that many have been working on. It would be very interesting like you said to explore the possibilities of integrating some of the techniques from the “traditional” CV deep learning space into GANs. There are now more and more practical application scenarios for GANs in real life, and the demand for high-quality yet fast results has never been greater.

Again, thank you very much for your insights and I cannot wait to see what you come up with next!

0 replies

imr555 · 2022-04-11T08:56:45Z

imr555
Apr 11, 2022

@jackylu0124 , If you are interested in effcient Depthwise Seperable Convolution based image generation in the GAN Space, you might find this work quite interesting. They provide comparisons with favorable GAN architectures before them

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
https://arxiv.org/abs/2101.04775

It was published at ICLR21 and the infamous lucidrains created an awesome working repository based upon the conclusions in the paper in the following repo.
https://github.com/lucidrains/lightweight-gan

0 replies

cuiaiyu · 2022-04-11T17:32:29Z

cuiaiyu
Apr 11, 2022
Maintainer

By the way, this gan-compression paper introduces a distilling method to make an "equivalently" performing but much more efficient network copy for inference purpose, which might also be helpful for reducing FLOPs.

https://arxiv.org/pdf/2003.08936.pdf

0 replies

jackylu0124 · 2022-04-17T03:01:55Z

jackylu0124
Apr 17, 2022
Author

Hi Ifty and Aiyu,

Thank you very much for sharing the papers! I had some more time today and read them in details, and they are indeed very intriguing! It's interesting that the two papers choose two quite different routes in tackling the performance issues commonly found in GAN.

Based on my preliminary impressions of these two papers thus far, Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis seems to focus more on "human-driven" architectural design change, such as the introduction of the Skip-Layer channel-wise Excitation (SLE) module, and with an emphasis on unconditional GAN. On the other hand, GAN Compression: Efficient Architectures for Interactive Conditional GANs focuses on using distillation techniques as well as "machine-oriented" architecture search for improving the model's inference performance.

Both of them are very rich in technical details and ideas, and I definitely need to spend some more time to digest them. Thank you so much again for sharing these interesting works with me!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Questions and thoughts on model size and performance #81

{{title}}

Replies: 5 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Questions and thoughts on model size and performance #81

jackylu0124 Mar 22, 2022

Replies: 5 comments

cuiaiyu Mar 23, 2022 Maintainer

jackylu0124 Mar 23, 2022 Author

imr555 Apr 11, 2022

cuiaiyu Apr 11, 2022 Maintainer

jackylu0124 Apr 17, 2022 Author

jackylu0124
Mar 22, 2022

cuiaiyu
Mar 23, 2022
Maintainer

jackylu0124
Mar 23, 2022
Author

imr555
Apr 11, 2022

cuiaiyu
Apr 11, 2022
Maintainer

jackylu0124
Apr 17, 2022
Author