Orthographic Feature Transform for Monocular 3D Object Detection

This is a PyTorch implementation of the OFTNet network from the paper Orthographic Feature Transform for Monocular 3D Object Detection. The code currently supports training the network from scratch on the KITTI dataset, and intermediate results can be visualised using TensorBoard. The current version of the code is intended primarily as a reference and does not yet support decoding the network outputs into bounding boxes via non-maximum suppression; this will be added in a future update. Note also that there are some slight implementation differences from the original code used in the paper.
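As described in the paper, the orthographic feature transform fills a 3D voxel grid by averaging the image features inside the region each voxel projects to, and it uses an integral (summed-area) image so that each average costs a constant number of lookups. The sketch below is a toy, single-channel illustration of that averaging step in plain Python, not code from this repository; the helper names and the (x0, y0, x1, y1) pixel-box representation of a projected voxel are assumptions.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def integral_image(feat: Grid) -> Grid:
    """Summed-area table padded with a leading row and column of zeros.

    S[y][x] is the sum of feat over rows < y and columns < x.
    """
    h, w = len(feat), len(feat[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += feat[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def box_mean(S: Grid, x0: int, y0: int, x1: int, y1: int) -> float:
    """Mean of feat over columns [x0, x1) and rows [y0, y1), using four lookups."""
    if x1 <= x0 or y1 <= y0:
        return 0.0  # empty box, e.g. a voxel that projects outside the image
    total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


def voxel_features(feat: Grid, boxes: Sequence[Tuple[int, int, int, int]]) -> List[float]:
    """Hypothetical helper: one averaged feature per voxel's projected box.

    Boxes are assumed to be already clipped to the image bounds.
    """
    S = integral_image(feat)
    return [box_mean(S, *box) for box in boxes]
```

Because each average touches only four entries of S, the cost per voxel does not grow with the size of its projected box, which keeps averaging over large nearby voxels cheap.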

Setup

This repository has been updated to run in the PyTorch container from NVIDIA NGC. Install the dependencies by running setup.sh:

sh setup.sh

If the data is already in the data/kitti/objects folder, you can skip the download step; otherwise, run download_data.sh to download and unzip the data. You can then start training by following the steps below or by running train.sh. That script assumes a single node with 8 GPUs, and its batch size and number of data loader workers have been adjusted so that data loading keeps up with the GPUs.
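Before skipping the download, a quick check that the expected folders exist can save a failed training run. This minimal sketch assumes the standard layout of the unpacked KITTI object-detection archives (a training folder with image_2, calib and label_2 subfolders under data/kitti/objects); the exact subfolders the data loader needs are an assumption, so adjust REQUIRED_SUBDIRS if your copy differs.

```python
from pathlib import Path
from typing import List

# Assumed layout of the unpacked KITTI object-detection archives.
REQUIRED_SUBDIRS = ("training/image_2", "training/calib", "training/label_2")


def missing_kitti_dirs(root: str = "data/kitti/objects") -> List[str]:
    """Hypothetical helper: return the required subfolders absent under root."""
    root_path = Path(root)
    return [sub for sub in REQUIRED_SUBDIRS if not (root_path / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_kitti_dirs()
    if missing:
        print("Missing:", ", ".join(missing), "- run download_data.sh")
    else:
        print("KITTI data found; the download step can be skipped.")
```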

Training

Run train.py with the name of the experiment as a required positional argument:

python train.py name-of-experiment --gpu 0
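For readers adapting the script, the interface described above (one required positional experiment name plus a --gpu option) can be sketched with argparse. This is a hypothetical reconstruction of just those two arguments, not the actual train.py parser; in particular, accepting several GPU ids after --gpu is an assumption, and train.py itself documents the real options.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train OFTNet on KITTI (sketch)")
    # Required positional argument: the experiment name.
    parser.add_argument("name", help="name of the experiment")
    # Hypothetical: one or more integer GPU ids, defaulting to GPU 0.
    parser.add_argument("--gpu", type=int, nargs="+", default=[0],
                        help="GPU id(s) to train on")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Under this sketch, `python train.py name-of-experiment --gpu 0` yields `args.name == "name-of-experiment"` and `args.gpu == [0]`, while omitting the name makes argparse exit with a usage error.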

By default, data is read from data/kitti/objects and model checkpoints are saved to experiments. The model is trained on the KITTI 3D object detection benchmark, which can be downloaded from the KITTI website. See train.py for the full list of training options.
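The infer.py example in this README takes a checkpoint with a .pth.gz extension, which suggests gzip-compressed PyTorch files. Assuming that, a small standard-library helper can unpack one so it can be loaded in your own scripts. The function name and output naming are hypothetical, and this helper is not part of the repository.

```python
import gzip
import shutil
from pathlib import Path


def decompress_checkpoint(path) -> Path:
    """Hypothetical helper: unpack checkpoint.pth.gz to checkpoint.pth.

    Assumes the file is a plain gzip stream wrapping a torch.save() output.
    """
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dst = src.with_suffix("")  # strips only the trailing .gz
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

The returned path can then be passed to `torch.load(dst, map_location="cpu")` to inspect the checkpoint on a machine without a GPU.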

Inference

To decode the network predictions and visualise the resulting bounding boxes, run the infer.py script with the path to the model checkpoint you wish to visualise:

python infer.py /path/to/checkpoint.pth.gz --gpu 0
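Decoding relies on non-maximum suppression over the predicted confidence map: only cells that are local maxima and exceed a score threshold are kept as detections. The pure-Python sketch below shows that idea on a single 2D bird's-eye-view score grid with a 3×3 neighbourhood. The grid representation, neighbourhood size and threshold are assumptions, and this is not the decoder used by infer.py.

```python
from typing import List, Tuple


def peak_nms(scores: List[List[float]],
             threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for 3x3 local maxima at or above threshold.

    A cell is kept if no neighbour is strictly greater, so flat plateaus
    can produce several peaks.
    """
    h, w = len(scores), len(scores[0])
    peaks = []
    for i in range(h):
        for j in range(w):
            s = scores[i][j]
            if s < threshold:
                continue
            # Neighbours within the 3x3 window, clipped at the grid edges.
            neighbours = (
                scores[ni][nj]
                for ni in range(max(i - 1, 0), min(i + 2, h))
                for nj in range(max(j - 1, 0), min(j + 2, w))
                if (ni, nj) != (i, j)
            )
            if all(s >= n for n in neighbours):
                peaks.append((i, j, s))
    # Highest-confidence detections first.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks
```

A full decoder would then read the regressed box parameters at each surviving cell to build the 3D bounding boxes; that step is omitted here.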

Citation

If you find this work useful, please cite the paper using the citation below.

@article{roddick2018orthographic,  
  title={Orthographic feature transform for monocular 3d object detection},  
  author={Roddick, Thomas and Kendall, Alex and Cipolla, Roberto},  
  journal={British Machine Vision Conference},  
  year={2019}  
}
