
Detailed architecture of dynamic instance normalization #18

Open

zy-xc opened this issue Dec 22, 2020 · 21 comments

Comments

@zy-xc

zy-xc commented Dec 22, 2020

Hello @ycjing
Thanks for your brilliant work! I am interested in the paper "Dynamic Instance Normalization for Arbitrary Style Transfer", but I don't know the detailed architecture of DIN and can't find the supplementary material.
Would you please provide the detailed network architecture of the paper?
Thank you!

@ycjing
Owner

ycjing commented Dec 23, 2020

Hi @zy-xc

Thank you for your interest in our work. Here is the link for the corresponding supplement: https://drive.google.com/file/d/1sBFXqWaWOeMuaaVHMM-ddBssKr3OmutW/view?usp=sharing

Please feel free to contact me if you have any other questions. Thank you!

Best,
Yongcheng

@zy-xc
Author

zy-xc commented Dec 23, 2020

Thank you for your reply!

I am a bit confused about the size of the weight generated by the Weight/Bias network. Is the dynamic convolution layer set with groups = 64 (the number of channels of the content feature)?

It seems that the style image would have to be large if we set groups = 1. For example, consider standard DIN with kernel_size = 1. The weight generated by the Weight Net would have size 64 * 64 * 1 * 1, so the VGG feature of the style image would need to be at least 64 * 64 * 64 (C * H * W), and the style image itself would need to be at least 512 * 512. To train standard DIN with kernel_size = 3, the style image would then need to be at least 1536 * 1536.

Or does standard DIN set groups = 64, so that the generated weight has size 64 * kernel_size * kernel_size?

Thank you!

@ycjing
Owner

ycjing commented Dec 24, 2020

Hi @zy-xc

Thank you for your interest in our work! Regarding your question: yes, we indeed set the number of groups equal to the number of feature channels, as indicated in the "Architecture Details" section of the supplement. Also, please note that the size of the generated weight and bias is not correlated with the input size, since we use an adaptive pooling layer in the corresponding weight and bias networks. You can set the desired size of the weight and bias by configuring the adaptive pooling layer.
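
For concreteness, here is a minimal PyTorch sketch of why setting groups equal to the channel count keeps the generated weight small and independent of the input size. The shapes and the pooling placement are illustrative assumptions, not the paper's exact configuration:

import torch
import torch.nn.functional as F

N, C, k = 1, 64, 1                      # one sample, C channels, k x k kernel
content = torch.rand(N, C, 32, 32)
style_feat = torch.rand(N, C, 48, 48)   # arbitrary spatial size

# With groups == C, each channel needs only one 1 x k x k filter, so the
# weight net has to output just C * k * k values. Adaptive pooling maps any
# input resolution to that fixed size (a stand-in for the real weight net).
weight = F.adaptive_avg_pool2d(style_feat, (k, k)).view(C, 1, k, k)
bias = style_feat.mean(dim=(2, 3)).view(C)   # stand-in for the bias net

out = F.conv2d(content, weight, bias, padding=k // 2, groups=C)
print(out.shape)   # torch.Size([1, 64, 32, 32])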

Please let me know if you have any other questions. Thank you.

Best,
Yongcheng

@sonnguyen129
Contributor

I find the supplementary details confusing to implement. Has anyone implemented this in PyTorch yet? Can you help me?
Thank you so much

@ycjing
Owner

ycjing commented Dec 2, 2021

Hi @sonnguyen129

Thank you for your interest in our work! Could you please elaborate on which part exactly is confusing? I am more than happy to clarify it. Also, if you would like our source code, please drop me an email to apply for the permission required by the company. Thanks!

Best,
Yongcheng

@sonnguyen129
Contributor

Hi @ycjing
I sent you an email. I hope to hear from you as soon as possible.
Thank you.

@sonnguyen129
Contributor

Hi @ycjing
I have a few questions as follows:

1. As I understand it, this is the correspondence between the proposed architecture and the illustration. Am I correct? (Sorry for the bad drawing.)

[image attached]

2. The Res layer and upsampling layer are quite lacking in information, and I don't know where they are in the illustration.
3. For the DIN module, during training, is the input each style image fed separately, or a batch? If a batch, does the batch size need to match the content dataset?
4. Shilei Wen's email in the paper (wenshilei@baidu.com) is currently incorrect.
I hope you can help. Thanks very much.

@ycjing
Owner

ycjing commented Dec 2, 2021

Hi @sonnguyen129

1. Yes.
2. Since it would be quite redundant to show the residual connections in the figure, I just use blocks to represent the corresponding residual modules. Our residual blocks are no different from those used in other tasks, just the most common ones (a generic sketch follows below).
3. We follow the settings in AdaIN. Please refer to https://github.com/naoto0804/pytorch-AdaIN
4. As I already mentioned in my email, you can alternatively contact Dr. Errui Ding. The other information is also provided in that email.
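
A generic residual block of this kind, as a minimal PyTorch sketch (illustrative; the paper's exact layer configuration may differ):

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)   # identity shortcut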

Thanks again for your interest! Please feel free to reach out if anything else is unclear.

Cheers,
Yongcheng

@ycjing
Owner

ycjing commented Dec 3, 2021

Hi @sonnguyen129

Could you please provide the detailed log information? Thanks!

Best,

@sonnguyen129
Contributor

sonnguyen129 commented Dec 3, 2021

Here is my test case:

c = torch.rand(8,64,224,224)
s = torch.rand(8,64,224,224)
out = DIN(3)(c, s)
print(out)

Logs:

Traceback (most recent call last):
  File "model.py", line 136, in <module>
    out = DIN(3)(c, s)
  File "model.py", line 70, in __init__
    self.weight_bias = WeightAndBias(inp = inp)
  File "model.py", line 49, in __init__
    self.dwconv1 = DepthWiseConv2d(inp, 128, 3, 128, 2)
  File "model.py", line 10, in __init__
    groups = groups, stride = stride, padding = 1)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 432, in __init__
    False, _pair(0), groups, bias, padding_mode, **factory_kwargs)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 84, in __init__
    raise ValueError('in_channels must be divisible by groups')
ValueError: in_channels must be divisible by groups

@ycjing
Owner

ycjing commented Dec 3, 2021

Hi @sonnguyen129

As shown in the log, the number of groups is wrong; it should be equal to in_channels.

Best,
Yongcheng

@sonnguyen129
Contributor

Hi @ycjing
I have two questions:

1. Can you provide information about the AdaptivePooling layer, specifically the target size?
2. Is the add method in Fig. 4 a channel concatenation, or just like a basic residual block?

Thank you so much.

@sonnguyen129
Contributor

Hi @ycjing
I got an error.

Traceback (most recent call last):
  File "model.py", line 197, in <module>
    out = WeightAndBias(512)(out)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "model.py", line 79, in forward
    out = self.dwconv2(out)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "model.py", line 25, in forward
    out = self.pointwise(out)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/modules/instancenorm.py", line 59, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2325, in instance_norm
    _verify_spatial_size(input.size())
  File "/home/truongson/.local/bin/.virtualenvs/dl4cv/lib/python3.6/site-packages/torch/nn/functional.py", line 2292, in _verify_spatial_size
    raise ValueError("Expected more than 1 spatial element when training, got input size {}".format(size))
ValueError: Expected more than 1 spatial element when training, got input size torch.Size([8, 64, 1, 1])

Here is my code:

import torch
import torch.nn as nn
from torchvision.models import vgg19

class DepthWiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups, stride):
        super(DepthWiseConv2d, self).__init__()
        self.depthwise = nn.Sequential(
                nn.Conv2d(in_channels, in_channels, kernel_size = kernel_size,
                    groups = groups, stride = stride, padding = 1),
                nn.InstanceNorm2d(in_channels),
                nn.ReLU(True)
        )
        # Note: a true pointwise conv would use kernel_size=1 and stride=1.
        # Reusing kernel_size=3 and stride=2 here halves the spatial size a
        # second time per block, which is what collapses the feature maps to
        # 1x1 and triggers the InstanceNorm error in the traceback above.
        self.pointwise = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size = kernel_size,
                    stride = stride),
                nn.InstanceNorm2d(out_channels),
                nn.ReLU(True)
        )

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out

class VGGEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True).features
        self.slice1 = vgg[: 2]
        self.slice2 = vgg[2: 7]
        self.slice3 = vgg[7: 12]
        self.slice4 = vgg[12: 21]
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, images, output_last_feature=False):
        h1 = self.slice1(images)
        h2 = self.slice2(h1)
        h3 = self.slice3(h2)
        h4 = self.slice4(h3)
        if output_last_feature:
            return h4
        else:
            return h1, h2, h3, h4

class WeightAndBias(nn.Module):
    """Weight/Bias Network"""

    def __init__(self, in_channels = 512):
        super(WeightAndBias,self).__init__()
        self.dwconv1 = DepthWiseConv2d(in_channels, 128, 3, 128, 2)
        self.dwconv2 = DepthWiseConv2d(128, 64, 3, 64, 2)
        # self.adapool1 = nn.AdaptiveMaxPool2d()  # note: AdaptiveMaxPool2d requires a target output_size
        self.dwconv3 = DepthWiseConv2d(64, 64, 3, 64, 2)
        # self.adapool2 = nn.AdaptiveMaxPool2d()

    def forward(self, x):
        out = self.dwconv1(x)
        out = self.dwconv2(out)
        print(out.shape)
        # out = self.adapool1(out)
        out = self.dwconv3(out)
        # out = self.adapool2(out)
        return out
#test case
s = torch.rand(8,3,256,256)
out = VGGEncoder()(s, True)
out = WeightAndBias(512)(out)
print(out.shape)

I hope you can help me. Thank you so much.

@ycjing
Owner

ycjing commented Dec 5, 2021

> 1. Can you provide information about the AdaptivePooling layer, specifically the target size?
> 2. Is the add method in Fig. 4 a channel concatenation, or just like a basic residual block?

1. Our adaptive pooling layer is defined as follows:
   nn.AdaptiveAvgPool2d((1,1))

2. Please note that the 'add' operation is not part of the residual blocks. It simply adds the output feature maps from the first few layers to those from the last few layers.
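
As a minimal sketch of how this pooling and the 'add' might fit together (the layer sizes here are assumptions for illustration, not the paper's exact configuration):

import torch
import torch.nn as nn

class WeightNetSketch(nn.Module):
    def __init__(self, in_channels=512, out_channels=64):
        super().__init__()
        self.early = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.late = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d((1, 1))  # fixes the output size

    def forward(self, x):
        early = self.early(x)
        late = self.late(early)
        # 'add' = element-wise sum of the early and late feature maps,
        # not a channel concatenation
        return self.pool(early + late)            # (N, out_channels, 1, 1)

w = WeightNetSketch()(torch.rand(2, 512, 32, 32))
print(w.shape)   # torch.Size([2, 64, 1, 1])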

@ycjing
Owner

ycjing commented Dec 5, 2021

> [quoted: the traceback and code from the previous comment]

Hi @sonnguyen129

Please refer to my previous reply and be careful about the output dimensions.

Best,

@sonnguyen129
Contributor

Hi @ycjing
Thanks for your reply; with it I fixed the error. Despite reading the paper quite carefully, I still don't understand how the Weight/Bias network generates the weight and bias. How do I get that weight and bias in PyTorch?
Thank you so much

@ycjing
Owner

ycjing commented Dec 7, 2021

Hi @sonnguyen129

Thank you for your interest. From your code, I think you have already got the point, i.e., dynamically predicting the weight and bias via the weight and bias networks. Could you please elaborate on your question further? Thanks!

Best,

@sonnguyen129
Contributor

Hi @ycjing
Sorry for my unclear question. As I understand it, the style image, after being encoded by VGG, goes through the weight and bias networks. Are the generated weight and bias the weights and biases of the last conv layer of the weight/bias network (dwconv3 in my code)?
Thank you so much.

@ycjing
Owner

ycjing commented Dec 8, 2021

Hi @sonnguyen129

No problem. The weight and bias are actually the outputs of the corresponding weight/bias networks, somewhat similar to the dynamic filter network (https://arxiv.org/abs/1605.09673).
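
A minimal sketch of applying such dynamically predicted filters in PyTorch, folding the batch into the channel dimension so each sample is convolved with its own weight and bias (pred_weight and pred_bias below stand in for the outputs of the weight and bias networks):

import torch
import torch.nn.functional as F

N, C, k = 4, 64, 1
content = torch.rand(N, C, 32, 32)
pred_weight = torch.rand(N, C, k, k)   # output of the weight net
pred_bias = torch.rand(N, C)           # output of the bias net

# One grouped convolution applies each sample's own filters.
x = content.reshape(1, N * C, 32, 32)
w = pred_weight.reshape(N * C, 1, k, k)
b = pred_bias.reshape(N * C)
out = F.conv2d(x, w, b, padding=k // 2, groups=N * C).reshape(N, C, 32, 32)
print(out.shape)   # torch.Size([4, 64, 32, 32])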

Cheers,
Yongcheng

@sonnguyen129
Contributor

Hi @ycjing
I have already read the dynamic filter network paper. However, if the weight and bias are both outputs of the same network, their values would be the same, right? But from what I have read about dynamic convolution in PyTorch, the weight and bias should be different. I hope you can answer. Thank you so much.
[images attached]

@ycjing
Owner

ycjing commented Dec 14, 2021

Hi @sonnguyen129

Thank you for your interest. The values are, in fact, not the same. As demonstrated in the figure and explained in the paper, we use a separate weight net and bias net to produce the corresponding weight and bias.

Best,
Yongcheng
