MVM3Det: A Novel Framework for Multi-view Monocular 3D Object Detection[arXiv].

Introduction

We propose MVM3Det, a novel multi-view monocular 3D detection network, and MVM3D, a dataset for multi-view detection in occlusion scenarios.

Row 1 shows our results, row 2 shows the ground truth, and row 3 shows the MVDet results.

Table of contents

MVM3Det: A Novel Framework for Multi-view Monocular 3D Object Detection[arXiv].
MVM3D dataset

MVM3Det

Code Preparations

Clone this repository into your local folder.
Prepare the MVM3D dataset; please refer to Downloads for detailed instructions.
Open /codes/EX_CONST.py and set the data_root variable: data_root = '/your_dataset_path/MVM3D'

Training

In progress...

Inference

Please make sure you have finished Code Preparations.

The best MVM3Det model for the MVM3D dataset can be downloaded from BaiduNetDisk (pwd: 09t1). Download the ppn.pth and mbon.pth models into the folder /codes/pretrained_models.
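A quick pre-flight check can catch a missing checkpoint before inference is started. The sketch below relies only on the two file names and the folder given above; the helper name missing_checkpoints is hypothetical and not part of this repository, and the relative path assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the download instructions above.
EXPECTED_CHECKPOINTS = ("ppn.pth", "mbon.pth")


def missing_checkpoints(model_dir):
    """Return the expected checkpoint names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_CHECKPOINTS
            if not (model_dir / name).is_file()]


# Assumption: the current working directory is the repository root.
missing = missing_checkpoints("codes/pretrained_models")
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints found.")
```

If either file is reported as missing, download it again before running the inference script.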

cd codes
python infer.py

The inference program should automatically report results close to the 95.9% MODA, 49.0% AP (IoU = 0.5) and 45.5% AOS (IoU = 0.5) reported in the paper.

Credit

Our code mainly builds on three repos: MVDet, simple-faster-rcnn-pytorch and 3D-BoundingBox. Some of the original code still exists in our implementation.

MVM3D dataset

The MVM3D dataset is designed for multi-view 3D detection in occlusion scenarios. Monocular 3D detection datasets and multi-view monocular detection datasets have thrived in recent years. However, few algorithms and datasets focus on 3D detection under occlusion. Inspired by WildTrack, MultiviewX and the RoboMaster University AI Challenge, we developed a multi-view monocular sentry detection dataset, the MVM3D dataset.

The dataset is based on the IEEE ICRA 2021 RoboMaster AI Challenge, with battle robot cars as detection targets and block obstacles as occluders. The battleground is a 4.49 m by 8 m ground plane containing 9 fixed blocks as obstacles. The images are captured by 2 synchronized cameras at a resolution of 640 by 480, with exposure times varying from 15000 to 30000 microseconds. The frame rate is 10 frames per second, and each frame contains 1 to 4 robot cars as targets.

Label information

Images from left and right cameras.
Ground truth label for:
- Robot car world coordinates, denoted as [x, y], using ij indexing.
- Robot car orientations, in radians.
- Robot car classification labels, denoted as [0, 1, 2, 3].
- 2D/3D/BEV bounding boxes.
- Robot car 3D measurements.
Camera calibration matrix, including intrinsics and extrinsics.
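To illustrate how the calibration matrices relate the world-coordinate labels to image pixels, here is a minimal pinhole projection sketch in pure Python. It assumes the common convention X_cam = R * X_world + t followed by the intrinsic matrix K. The matrix values are made-up placeholders rather than the dataset's calibration, and the units and storage layout used in the dataset files may differ.

```python
def project_point(K, R, t, point_world):
    """Project a 3D world point to pixel coordinates with a pinhole model."""
    # World frame -> camera frame: X_cam = R @ X_world + t
    cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
           for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Camera frame -> homogeneous pixel coordinates: uvw = K @ X_cam
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


# Placeholder calibration (hypothetical values for a 640 x 480 image).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

u, v = project_point(K, R, t, [0.5, -0.25, 2.0])
print(u, v)
```

With the real intrinsics and extrinsics substituted in, the same function could be used to overlay a labeled robot position on either camera view.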

Downloads

Part of the data can be downloaded from Baidu Netdisk (Pwd: 0wp5) and OneDrive.

Extract the downloaded zip file called MVM3D.zip into /your_dataset_path/.

The dataset folder structure should look like this:

your_dataset_path
└─ MVM3D

Note that this is NOT the final version of the dataset; more images and annotations are in progress.

Toolkits

Please refer to the DRL-CASIA/NeuronsDataset repo for more details. The program is supposed to visualize the ground-truth per-view 2D/3D boxes and the bird's-eye-view robot locations and orientations.

Evaluation

For an integrated metric evaluator, please see Metric Calculator.

Or separately:

For localization performance, we use the same evaluation metrics as the MultiviewX and WildTrack multi-view pedestrian detection datasets: MODA, MODP, Precision and Recall. The evaluation toolkit can be found here.
For 3D detection, we use the AP, AOS and OS metrics introduced in the KITTI benchmark. The evaluation toolkit can be found here.
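As a rough illustration of what these metrics measure, the sketch below computes MODA, Precision and Recall from aggregate counts, and the per-pair orientation similarity (1 + cos(delta)) / 2 that underlies AOS in the KITTI benchmark. It is not the referenced toolkits' code: the function names are hypothetical, the counts are assumed to come from a separate matching step between detections and ground truth, MODP is omitted because it needs per-match localization errors, and the full AOS additionally averages over recall levels and penalizes unmatched detections.

```python
import math


def detection_metrics(num_gt, true_positives, false_positives):
    """Compute MODA, Precision and Recall from aggregate detection counts."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    false_negatives = num_gt - true_positives
    # MODA = 1 - (misses + false positives) / number of ground-truth objects
    moda = 1.0 - (false_negatives + false_positives) / num_gt
    detections = true_positives + false_positives
    precision = true_positives / detections if detections else 0.0
    recall = true_positives / num_gt
    return {"MODA": moda, "Precision": precision, "Recall": recall}


def orientation_similarity(pred_angles, gt_angles):
    """Mean of (1 + cos(delta)) / 2 over matched pairs, angles in radians."""
    if len(pred_angles) != len(gt_angles):
        raise ValueError("angle lists must have the same length")
    if not pred_angles:
        return 0.0
    total = sum((1.0 + math.cos(p - g)) / 2.0
                for p, g in zip(pred_angles, gt_angles))
    return total / len(pred_angles)


# Made-up counts and angles, for illustration only.
print(detection_metrics(num_gt=100, true_positives=95, false_positives=3))
print(orientation_similarity([0.0, math.pi / 2], [0.0, -math.pi / 2]))
```

A perfectly aligned pair scores 1 and a pair pointing in opposite directions scores 0, so the similarity is insensitive to the 2*pi wrap-around of the angle labels.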
