
frame malposition using transformed video #264

Open
lujingqiao opened this issue Feb 26, 2021 · 1 comment

@lujingqiao

When I transform a video, the resulting video has misaligned frames; see the two pictures below.

[screenshots: two output frames showing the misalignment]

Has anyone else run into this?

@hude-as commented Jan 11, 2022

Heya!

I had the same issue and addressed it with a quick fix.
I don't have enough background in neural networks to fix it properly.

TL;DR: make sure the output of the transform method has the same dimensions as your output video.

Context and debug

The issue comes from the shape reduction and augmentation in the src/transform.py file, which produces a different output size.

For example, with a 640x338 video you will get output frames of 640x340. If you debug the network layer by layer:
conv1 shape = (4, 338, 640, 32)
conv2 shape = (4, 169, 320, 64)
conv3 shape = (4, 85, 160, 128)
resid1 shape = (4, 85, 160, 128)
resid2 shape = (4, 85, 160, 128)
resid3 shape = (4, 85, 160, 128)
resid4 shape = (4, 85, 160, 128)
resid5 shape = (4, 85, 160, 128)
conv_t1 shape = (4, 170, 320, 64)
conv_t2 shape = (4, 340, 640, 32)

You can see the shape of each layer and notice the difference between the first and the last (640x338 vs 640x340):
338 --> 169 --> 85 (instead of 84.5) --> 170 --> 340
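The mismatch comes from rounding. A minimal sketch of the size arithmetic, assuming the downsampling layers are SAME-padded stride-2 convolutions (output = ceil(input / 2)) and the upsampling layers are stride-2 transposed convolutions (output = input * 2):

```python
import math

def conv_out(h, stride=2):
    # SAME-padded strided convolution: output size rounds up,
    # so an odd input size loses its fractional half.
    return math.ceil(h / stride)

def conv_t_out(h, stride=2):
    # Stride-2 transposed convolution simply doubles the size.
    return h * stride

h0 = 338
h1 = conv_out(h0)     # 338 -> 169
h2 = conv_out(h1)     # 169 -> 85 (84.5 rounded up)
h3 = conv_t_out(h2)   # 85  -> 170
h4 = conv_t_out(h3)   # 170 -> 340
print(h1, h2, h3, h4)
```

Doubling 85 twice can never return to 338, which is why any input height not divisible by 4 comes back slightly larger.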

In the evaluate.py file, when the video is written, the output file is created with frames of the same size as the original clip. Nevertheless, the network outputs 640x340 images, and this slightly shifts your video frame by frame (in my example, two rows at a time).

Quick fix:
Compute the video output size from the preds shape instead of the original clip:

def ffwd_video(path_in, path_out, checkpoint_dir, device_t='/gpu:0', batch_size=1):
        [...]
        preds_size = [preds.shape[2], preds.shape[1]]
        video_writer = ffmpeg_writer.FFMPEG_VideoWriter(path_out, preds_size, video_clip.fps, codec="libx264",
                                                        preset="slow", bitrate="20000k",
                                                        audiofile=path_in, threads=None,
                                                        ffmpeg_params=None)
        [...]
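For reference, a sketch of why the indices above are 2 and 1, assuming preds follows the (batch, height, width, channels) layout shown in the layer dump, and that the video writer's size argument is ordered (width, height):

```python
# Hypothetical shape for a 640x338 input clip, matching the layer dump above:
# (batch, height, width, channels)
preds_shape = (4, 340, 640, 3)

# The writer presumably expects (width, height), hence the swapped indices.
preds_size = [preds_shape[2], preds_shape[1]]
print(preds_size)  # [640, 340]
```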

A probably better way to deal with this would be to keep the same output size as the input inside the transform.net function.
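As a sketch of that alternative (a hypothetical helper, not code from this repo): since the network only ever adds rows/columns, you could crop its output back to the original frame size before writing, assuming preds is a (batch, height, width, channels) array:

```python
import numpy as np

def crop_to_input(preds, height, width):
    # Hypothetical helper: trim the extra rows/columns the network added
    # so each output frame matches the original clip size.
    return preds[:, :height, :width, :]

preds = np.zeros((1, 340, 640, 3))   # network output for a 640x338 clip
frames = crop_to_input(preds, 338, 640)
print(frames.shape)  # (1, 338, 640, 3)
```

Cropping keeps the writer and the original clip size in agreement, at the cost of discarding a couple of border rows.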

Hope this helps someone in the future.
