
About Pixel Shuffle #21

Open
chautuankien opened this issue Nov 24, 2022 · 2 comments

Comments

@chautuankien

It is very interesting that you use Pixel Shuffle and Channel Attention for motion estimation without estimating optical flow.

In the paper you said that Pixel Shuffle is used to maintain a large receptive field. Could you explain how PS achieves that?

One more question: in VFI, I usually see that people reuse the input images to reconstruct the colors of the intermediate frame. How can you synthesize the middle frame just by applying Up Shuffle?

Thank you.

@myungsub
Owner

Hi @chautuankien, thanks for your interest in our work.

> In the paper you said that Pixel Shuffle is used to maintain a large receptive field. Could you explain how PS achieves that?

The down-shuffle (inverse PixelShuffle) operation reduces the spatial resolution (H x W) and increases the channel dimension (C), so applying a convolution with the same kernel size covers a larger region of the original image.

For instance, if you apply a 3x3(xC) kernel to an H x W x C tensor, the receptive field is just 3 x 3; but if you "down-shuffle" the data to H/2 x W/2 x 4C and apply a 3x3(x4C) kernel, the receptive field becomes twice as large in each spatial dimension.
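
The down-shuffle rearrangement can be sketched in a few lines of numpy (torch.nn.PixelUnshuffle performs the equivalent operation on NCHW tensors; the helper name below is just for illustration):

```python
import numpy as np

def down_shuffle(x, r=2):
    """Space-to-depth: rearrange an (H, W, C) array into (H/r, W/r, C*r*r).

    Each non-overlapping r x r spatial block is folded into the channel
    dimension, so every input value is preserved (unlike pooling).
    """
    H, W, C = x.shape
    assert H % r == 0 and W % r == 0
    # Split each spatial axis into (block index, within-block offset) ...
    x = x.reshape(H // r, r, W // r, r, C)
    # ... then move the within-block offsets into the channel axis.
    return x.transpose(0, 2, 1, 3, 4).reshape(H // r, W // r, C * r * r)

x = np.arange(8 * 8 * 3).reshape(8, 8, 3).astype(np.float32)
y = down_shuffle(x)
print(y.shape)  # (4, 4, 12)
```

After down-shuffling, a 3x3 convolution on the (4, 4, 12) output covers a 6x6 region of the original 8x8 input, which is the "twice larger receptive field" described above.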

> One more question: in VFI, I usually see that people reuse the input images to reconstruct the colors of the intermediate frame. How can you synthesize the middle frame just by applying Up Shuffle?

From what I understand, you're talking about optical-flow-based models that use the input images for warping. Our model focuses on direct synthesis without flow-based warping, so the method is quite different. There are pros and cons to each approach, but flow-based works are more popular these days, to be frank.

@chautuankien
Author

chautuankien commented Nov 28, 2022

Thank you so much for your reply.

So, for the first question, PS works like a pooling layer, right? For example, Max Pooling with stride 2 picks the maximum value in each 2x2 grid, down-sampling H x W to H/2 x W/2, so the receptive field becomes twice as large.
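
For reference, the Max Pooling example above can be sketched in numpy like this (the helper name is just for illustration); note that pooling keeps only one value out of each 2x2 block, whereas down-shuffling keeps all four by moving them into channels:

```python
import numpy as np

def max_pool_2x2(x):
    """Max pooling with a 2x2 window and stride 2 on an (H, W) array."""
    H, W = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then take each block's max.
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]], dtype=np.float32)
print(max_pool_2x2(x))
# [[ 6.  8.]
#  [14. 16.]]
```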

For the second question, if I understand correctly, your method is CNN-based: you use CNNs to directly synthesize the intermediate frame.

Another question: why did you choose to down-shuffle only once and not more (like an encoder-decoder network, where pooling layers are applied several times to down-sample the data)?
