Point-to-Point Video Generation

(teaser figure)

paper | project page | video

Tsun-Hsuan Wang*, Yen-Chi Cheng*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun (* indicates equal contribution)

IEEE International Conference on Computer Vision (ICCV), 2019

This repo is the PyTorch implementation of our ICCV 2019 paper "Point-to-Point Video Generation".

Paper: arXiv, CVF Open Access

Point-to-Point (P2P) Video Generation. Given a pair of (orange) start- and (red) end-frames in the video and 3D skeleton domains, our method generates videos with smooth transitional frames of various lengths.

Results

Generation with various lengths.


Generation with multiple control points.


Loop generation.


Getting Started

Requirements

  • OS: Ubuntu 16.04
  • NVIDIA GPU + CUDA
  • Python 3.6
  • PyTorch 1.0
  • TensorFlow (for Tensorboard)
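
A minimal environment setup might look like the following; the conda workflow and the pinned versions are illustrative assumptions, and any environment that satisfies the list above will work:

conda create -n p2pvg python=3.6
conda activate p2pvg
pip install torch==1.0.0 torchvision tensorflow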

Prepare dataset

First clone this repo:

git clone https://github.com/yccyenchicheng/p2pvg.git
cd p2pvg

Then create a directory data_root. For each of the datasets we used:

  • MovingMNIST. The testing sequences are created on the fly, so there is no need to preprocess or prepare anything for this dataset.

  • Weizmann. We crop each frame based on the bounding boxes from this url. You can download the dataset from that url and preprocess it yourself, or download our preprocessed version from this link. Extract the downloaded .zip file and put it under data_root.

  • Human 3.6M. First, download the dataset from this url, then put it under data_root/processed/.

  • BAIR Robot Pushing. Download the dataset from this url (~30 GB), then follow the steps below:

    • Create a directory data_root/bair, put the downloaded .tar file under data_root/bair, and extract it:
    tar -xvf data_root/bair/bair_robot_pushing_dataset_v0.tar -C data_root/bair
    
    • Then use the script data/convert_bair.py implemented in this repo to convert the data:
    python data/convert_bair.py --data_dir data_root/bair
    

    This will create the directory data_root/bair/preprocessed_data and store the training data under it (a rough sketch of the conversion is given below).
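
For reference, the conversion roughly amounts to iterating over the TFRecords, decoding each 64x64 RGB frame, and writing the frames out as images. The sketch below is only an illustration: it assumes TensorFlow 1.x and the image_aux1/encoded feature key from the public BAIR release; data/convert_bair.py in this repo is the authoritative version.

# Illustrative sketch only; see data/convert_bair.py for the real conversion.
import os
import tensorflow as tf
from PIL import Image

def convert_record(record_path, out_dir, seq_len=30):
    # Each serialized example in the record holds one trajectory (a frame sequence).
    for i, serialized in enumerate(tf.python_io.tf_record_iterator(record_path)):
        example = tf.train.Example()
        example.ParseFromString(serialized)
        seq_dir = os.path.join(out_dir, 'traj_{}'.format(i))
        os.makedirs(seq_dir, exist_ok=True)
        for t in range(seq_len):
            # Assumed feature key; the public release stores raw 64x64 RGB bytes here.
            key = '{}/image_aux1/encoded'.format(t)
            raw = example.features.feature[key].bytes_list.value[0]
            Image.frombytes('RGB', (64, 64), raw).save(os.path.join(seq_dir, '{}.png'.format(t)))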

Usage

Training

To train with Stochastic MovingMNIST, run

python train.py --dataset mnist --channels 1 --num_digits 2 --max_seq_len 30 --n_past 1 \
--weight_cpc 100 --weight_align 0.5 --skip_prob 0.5 --batch_size 100 \
--backbone dcgan --beta 0.0001 --g_dim 128 --z_dim 10 --rnn_size 256

and the results, model checkpoints, and .event files will be stored in logs/. To visualize the training, run

tensorboard --logdir logs

and go to 127.0.0.1:6006 in your browser to see the visualization. To train with other datasets, replace --dataset <other_dataset>, set the corresponding number of channels with --channels <n_channels>, and adjust other parameters of your choice in the command.
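
For example, a BAIR run might look like the command below. The dataset string bair and the 3-channel setting are assumptions, and the remaining hyperparameters are simply carried over from the MovingMNIST example; check the argument parser in train.py for the exact names and sensible values.

python train.py --dataset bair --channels 3 --max_seq_len 30 --n_past 1 \
--weight_cpc 100 --weight_align 0.5 --skip_prob 0.5 --batch_size 100 \
--backbone dcgan --beta 0.0001 --g_dim 128 --z_dim 10 --rnn_size 256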

P2P Generate

Given a video and a trained model, perform p2p generation via the following command:

python generate.py --ckpt <model.pth> --video <your_video.mp4>

and the output will be stored in gen_outputs/.
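
If generation fails to load the model, it can help to sanity-check the checkpoint passed to --ckpt first. The snippet below is a generic inspection sketch; the file name model.pth is a placeholder and the dictionary layout depends on how train.py saves its checkpoints.

import torch

# Load the checkpoint on CPU and peek at its structure.
ckpt = torch.load('model.pth', map_location='cpu')  # placeholder path
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimizer state, saved options
else:
    print(type(ckpt))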

Citation

@article{p2pvg2019,
  title={Point-to-Point Video Generation},
  author={Wang, Tsun-Hsuan and Cheng, Yen-Chi and Hubert Lin, Chieh and Chen, Hwann-Tzong and Sun, Min},
  journal={arXiv preprint},
  year={2019}
}

@inproceedings{p2pvg2019,
  title={Point-to-Point Video Generation},
  author={Wang, Tsun-Hsuan and Cheng, Yen-Chi and Hubert Lin, Chieh and Chen, Hwann-Tzong and Sun, Min},
  booktitle={The IEEE International Conference on Computer Vision (ICCV)},
  month={October},
  year={2019}
}

Acknowledgments

This code borrows heavily from SVG. We also adapt code from VideoPose3D for the preprocessing of Human 3.6M. A huge thanks to them! :D