synthesize.AI

An implementation of Video-to-Video Synthesis for real-time synthesis of realistic image sequences from a depth image stream, designed to enable more robust robotics development in simulated environments.

This project spans two repositories: this one, which contains general-purpose scripts and documentation, and a forked version of the vid2vid repository that has been modified to support 1-channel depth images as input. The presentation slides for this project are provided as Google Slides.

Prerequisites

  • Ubuntu 16.04 LTS
  • Python 3
  • NVIDIA GPU (compute capability 6.0+) with CUDA and cuDNN
  • PyTorch 0.4 or higher (a quick environment check is sketched below)
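
To confirm the environment before installing anything, the following is a minimal sketch (a hypothetical helper, not part of this repository) that checks the Python version, the PyTorch version, and that a CUDA-capable GPU of sufficient compute capability is visible:

    # check_env.py - hypothetical helper, not included in this repository.
    # Checks the prerequisites listed above: Python 3, PyTorch >= 0.4, and an
    # NVIDIA GPU with compute capability 6.0 or higher.
    import sys
    import torch

    assert sys.version_info[0] == 3, "Python 3 is required"
    major, minor = (int(v) for v in torch.__version__.split("+")[0].split(".")[:2])
    assert (major, minor) >= (0, 4), "PyTorch 0.4 or higher is required"
    assert torch.cuda.is_available(), "no CUDA-capable GPU detected"
    capability = torch.cuda.get_device_capability(0)  # e.g. (7, 0) for a Tesla V100
    assert capability[0] >= 6, "compute capability 6.0+ is required"
    print("OK:", torch.cuda.get_device_name(0), "compute capability", capability)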

Setup

Installation

  • Install the required Python libraries:
    pip install dominate requests streamlit
  • Clone this repository to your home folder:
    cd ~
    git clone https://github.com/fniroui/synthesizeAI.git
    cd synthesizeAI
  • Clone the forked version of the vid2vid repository which has been modified for this project:
    git clone https://github.com/fniroui/vid2vid.git
    cd vid2vid
  • Download and compile a snapshot of FlowNet2 by running:
    python scripts/download_flownet2.py
    
  • Download the FlowNet2 checkpoint:
    python scripts/download_models_flownet2.py
    

Dataset

  • The SceneNet RGB-D dataset is used in this project. Download the complete or partial training dataset.
  • Navigate to the synthesizeAI directory and run:
    python scripts/data/sceneNet_format.py --dir "sceneNet directory"
    
    replacing "sceneNet directory" with the path of the downloaded dataset; this moves and formats the data into ./vid2vid/datasets/sceneNet (an optional layout check is sketched below).
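
After formatting, the sketch below can serve as an optional sanity check. It is not part of the repository, and it assumes, based on the test_A/test_B folders used in the Testing section, that vid2vid expects paired train_A (depth) and train_B (RGB) sequence folders with matching sequence names and frame counts:

    # check_dataset.py - hypothetical sanity check, not included in this repository.
    # Assumes a paired layout: one subfolder per sequence under train_A (depth)
    # and train_B (RGB), with matching sequence names and frame counts.
    from pathlib import Path

    root = Path("vid2vid/datasets/sceneNet")
    depth_seqs = sorted(p.name for p in (root / "train_A").iterdir() if p.is_dir())
    rgb_seqs = sorted(p.name for p in (root / "train_B").iterdir() if p.is_dir())
    assert depth_seqs == rgb_seqs, "train_A and train_B must hold the same sequences"

    for seq in depth_seqs:
        n_depth = len(list((root / "train_A" / seq).iterdir()))
        n_rgb = len(list((root / "train_B" / seq).iterdir()))
        assert n_depth == n_rgb, "frame count mismatch in sequence " + seq
    print("OK: {} paired sequences found".format(len(depth_seqs)))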

Testing

  • Download the model and extract it to the ./vid2vid/checkpoints folder:
    https://drive.google.com/open?id=1ppXTHXsFaGB-vrNjJlPswWuVDrMka3zg
    
  • To run the model on the provided test sequence located at ./vid2vid/datasets/sceneNet/test_A and test_B (a sketch for inspecting the output follows these steps), run:
    bash scripts/test/test_320.bash
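
Once the test script has finished, the synthesized frames can be stitched into an animated GIF for quick inspection. The helper below is a sketch, not part of the repository, and it requires Pillow (pip install Pillow); pass it whichever folder the test script writes its synthesized images to, since that output location is not fixed here:

    # make_gif.py - hypothetical helper, not included in this repository.
    # Stitches a folder of synthesized frames into an animated GIF.
    import argparse
    from pathlib import Path
    from PIL import Image

    parser = argparse.ArgumentParser()
    parser.add_argument("--frames_dir", required=True,
                        help="folder containing the synthesized frames")
    parser.add_argument("--out", default="synthesized.gif")
    args = parser.parse_args()

    # Adjust the extension if the frames are saved as .png instead of .jpg.
    frames = [Image.open(p) for p in sorted(Path(args.frames_dir).glob("*.jpg"))]
    assert frames, "no frames found in " + args.frames_dir
    frames[0].save(args.out, save_all=True, append_images=frames[1:],
                   duration=100, loop=0)
    print("wrote {} ({} frames)".format(args.out, len(frames)))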
    

Training

  • Download the dataset and format it by following the above instructions.
  • If you have a single GPU, run bash scripts/train/train_g1_320.sh or:
    cd ~/synthesizeAI/vid2vid
    python train.py --name depth2room_320_0 --dataroot datasets/sceneNet --input_nc 1 --loadSize 320 --n_downsample_G 2 --n_frames_total 2 --n_scales_spatial 2 --num_D 3 --max_frames_per_gpu 4 --max_dataset_size 20 --tf_log --display_freq 10
  • For multi-GPU training, run bash scripts/train/train_320.sh or:
    cd ~/synthesizeAI/vid2vid
    python train.py --name depth2room_320_8g --dataroot datasets/sceneNet --input_nc 1 --loadSize 320 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 4 --n_frames_total 6 --niter_step 2 --niter_fix_global 8 --num_D 3 --n_scales_spatial 2 --tf_log --display_freq 100 --max_dataset_size 50

Analysis

The current model, trained on 50 sequences, generates 2 synthetic images per second on a single NVIDIA Tesla V100 GPU. The synthesized surfaces show some texture, and shadows are reproduced.
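
A sketch of how such a throughput number can be measured is shown below. The generate_frame function is a placeholder standing in for the actual vid2vid generator forward pass and a real depth input tensor, so its name and inputs are assumptions:

    # benchmark.py - hypothetical timing sketch, not included in this repository.
    # generate_frame is a placeholder for the vid2vid generator forward pass.
    import time
    import torch

    def generate_frame(model, depth_frame):
        with torch.no_grad():
            return model(depth_frame)

    def frames_per_second(model, depth_frame, n_frames=20):
        torch.cuda.synchronize()  # make sure pending GPU work is not counted
        start = time.time()
        for _ in range(n_frames):
            generate_frame(model, depth_frame)
        torch.cuda.synchronize()  # wait for the queued frames to finish
        return n_frames / (time.time() - start)

    # Example: fps = frames_per_second(netG, depth_tensor.cuda())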

License

This project is licensed under the MIT License. See the LICENSE.md file for details, as well as the licenses of the other projects used within this repository.

Attribution

Thank you to Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Guilin Liu¹, Andrew Tao¹, Jan Kautz¹, and Bryan Catanzaro¹ for their fantastic work on Video-to-Video Synthesis.

¹NVIDIA Corporation, ²MIT CSAIL
