
Towards Good Practice for CNN Based Monocular Depth Estimation

This codebase is an official PyTorch implementation of the system described in the paper:
Towards Good Practice for CNN Based Monocular Depth Estimation
Zhicheng Fang, Xiaoran Chen, Yuhua Chen, Luc Van Gool. In WACV 2020.

Preamble

This codebase was developed and tested with PyTorch 0.4.1, CUDA 9.1 and Ubuntu 16.04. It is built on top of the SfMLearner PyTorch version.

Prerequisites

pip install -r requirements.txt

or install the following packages manually:

pytorch >= 0.4.1
imageio
scipy
argparse
tensorboardX
blessings
progressbar2
path.py
tqdm
torchvision
scikit-image

It is also advised to have Python 3 bindings for OpenCV for the tensorboard visualizations.

Preparing training data

Preparation is roughly the same as in the SfMLearner PyTorch version.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command. The --with-depth option will save resized copies of the ground truth to help you set hyperparameters. The --with-pose option will dump the sequence poses in the same format as the Odometry dataset (see pose evaluation).

python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 128 --num-threads 8 [--static-frames data/static_frames.txt] [--with-depth] [--with-pose]

For NYU, first download the dataset using this script provided on the official website, then follow the instructions below; the corresponding files such as process_raw.m are saved in data/nyudepth_preparation.

How to process the training dataset:
1.) Extract the RAW dataset into a folder A (the name is not important)
2.) Download the NYU Depth v2 toolbox from http://cs.nyu.edu/~silberman/code/toolbox_nyu_depth_v2.zip
3.) Extract the scripts from the toolbox to a folder 'tools' in folder A
4.) Write a script using the functions supplied by the toolbox (see process_raw.m) in folder A and run it
5.) Run python nyud_raw_train_to_npy.py (modify the paths in that file to point to the correct dirs; the resolution of the training images can also be changed there)

How to process the testing dataset:
1.) Download the labeled NYU Depth v2 dataset from http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
2.) Download splits.mat containing the official train/test split from http://horatio.cs.nyu.edu/mit/silberman/indoor_seg_sup/splits.mat
3.) Place all downloaded files into a single folder (a quick sanity check of these files is sketched after this list)
4.) Run the script nyud_test_to_npy.py (modify the paths in that file to point to the correct dirs)
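Before running nyud_test_to_npy.py, it can help to verify the downloaded files. The sketch below is not part of the repository; it assumes the key names of the official NYU Depth v2 release ('images' and 'depths' in the labeled .mat, 'testNdxs' in splits.mat):

# Quick sanity check of the downloaded NYU test files (illustrative, not a repo script).
# nyu_depth_v2_labeled.mat is a MATLAB v7.3 (HDF5) file, splits.mat is a classic .mat file.
import h5py
import scipy.io

with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
    print('images:', f['images'].shape)   # RGB frames of the labeled subset
    print('depths:', f['depths'].shape)   # corresponding depth maps in meters

splits = scipy.io.loadmat('splits.mat')
test_idx = splits['testNdxs'].ravel()     # 1-based MATLAB indices of the official test split
print('test images:', len(test_idx))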

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python3 train.py /path/to/the/formatted/data/ -b4 -m0.0 -s0.0 --epoch-size 3000 --sequence-length 3 --log-output [--with-gt] --network disp_vgg_BN [--pretrained-encoder] [--imagenet-normalization] --loss L1 --dataset nyu [--pretrained-disp /path/to/the/existing_weights/]

You can then start a tensorboard session in this folder by

tensorboard --logdir=checkpoints/

and visualize the training progress by opening http://localhost:6006 in your browser. If everything is set up properly, you should start seeing reasonable depth predictions after ~30K iterations when training on KITTI. For NYU Depth, images are not saved to tensorboard and training takes around 40 epochs; here an epoch means one pass over the whole dataset, and since our NYU Depth training set consists of 67837 images with a batch size of 4, the corresponding epoch size is 17459.

Evaluation

Disparity map generation can be done with run_inference.py

python3 run_inference.py --pretrained /path/to/dispnet --dataset-dir /path/pictures/dir --output-dir /path/to/output/dir --network disp_vgg_BN [--imagenet-normalization]

This will run inference on all pictures inside dataset-dir and save a jpg of the disparity (or depth) to output-dir for each one; see the script help (-h) for more options.
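If you need depth values rather than a disparity visualization, note that networks in this family predict disparity, and depth is usually taken as its inverse. A minimal sketch of that convention on a raw prediction array (the array here is a stand-in, not something produced by run_inference.py):

# Illustration of the usual disparity-to-depth convention (depth = 1 / disparity).
# `disp` is a hypothetical prediction array; clamp it to avoid division by zero.
import numpy as np

disp = np.random.rand(128, 416).astype(np.float32)   # stand-in for a network output
depth = 1.0 / np.clip(disp, 1e-3, None)
print('depth range:', depth.min(), depth.max())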

Disparity evaluation is available:

python3 test_disp.py --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list --network disp_vgg_BN [--imagenet-normalization]

Note that --imagenet-normalization is quite important if the encoder is pretrained on the ImageNet dataset.
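For context, ImageNet normalization means standardizing the RGB input with the ImageNet channel statistics. A minimal torchvision sketch is shown below; the exact preprocessing behind the flag in this repo may differ:

# Sketch of ImageNet-style input normalization (illustrative, not the repo's exact pipeline).
import torchvision.transforms as T

imagenet_normalize = T.Compose([
    T.ToTensor(),                              # HWC uint8 in [0, 255] -> CHW float in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet channel means
                std=[0.229, 0.224, 0.225]),    # ImageNet channel stds
])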

The test file list is available in the kitti eval folder.

Pretrained Nets

dataset | specification            | Link
kitti   | disp_vgg_BN with L1 loss | Download
nyu     | disp_vgg_BN with L1 loss | Download

KITTI Depth Results

specification            | Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3
disp_vgg_BN with L1 loss | 0.105   | 0.723  | 4.537 | 0.186     | 0.873 | 0.959 | 0.983

NYU Depth Results

specification            | Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3
disp_vgg_BN with L1 loss | 0.102   | 0.075  | 0.410 | 0.157     | 0.868 | 0.962 | 0.988
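For reference, these columns are the standard monocular depth evaluation metrics, with Acc.1/2/3 presumably the delta < 1.25 / 1.25^2 / 1.25^3 accuracies. A minimal NumPy sketch of how such numbers are typically computed from valid ground-truth and predicted depths (test_disp.py is the authoritative evaluation code):

# Standard monocular depth metrics (illustrative; not the repo's exact evaluation script).
import numpy as np

def depth_metrics(gt, pred):
    """gt, pred: 1-D arrays of valid ground-truth / predicted depths in meters."""
    thresh = np.maximum(gt / pred, pred / gt)
    acc1 = (thresh < 1.25).mean()
    acc2 = (thresh < 1.25 ** 2).mean()
    acc3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, acc1, acc2, acc3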

Reference

If you find our work useful in your research please consider citing our paper:

@inproceedings{fang2020towards,
  title     = {Towards Good Practice for CNN Based Monocular Depth Estimation},
  author    = {Zhicheng Fang and Xiaoran Chen and Yuhua Chen and Luc Van Gool},
  booktitle = {Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2020}
}
