
add Stereo and KM3D
improve documentation
Owen-Liuyuxuan committed Mar 18, 2021
1 parent 920f6f8 commit b3f8f6e
Showing 44 changed files with 4,745 additions and 52 deletions.
31 changes: 24 additions & 7 deletions README.md
@@ -3,11 +3,11 @@
This repo aims to provide flexible and reproducible visual 3D detection on the KITTI dataset. Scripts are expected to be run from the repo's root directory, and ./visualDet3D is treated as a package that we can modify and test directly, rather than as an installed library. Several useful scripts are provided in the main directory for easy usage.

We believe that visual tasks are interconnected, so we make this library extensible to more experiments.
The package uses a registry to register datasets, models, processing functions and more, allowing new tasks/models to be inserted easily without interfering with existing ones.
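As a rough illustration of this registry pattern (a minimal sketch with hypothetical names, not the actual visualDet3D API):

```python
# Minimal registry sketch -- names here are illustrative only.
class Registry:
    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Store the class under its own name so configs can refer to it by string.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, type_name, **kwargs):
        # Instantiate a registered class from a config entry.
        return self._modules[type_name](**kwargs)

DETECTORS = Registry('detectors')

@DETECTORS.register_module
class MyNewDetector:
    def __init__(self, num_classes):
        self.num_classes = num_classes

# A config can then select the model by its string name,
# without modifying any existing detector code:
model = DETECTORS.build('MyNewDetector', num_classes=3)
```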

## Related Paper:

This repo contains the official implementation of the 2021 *RAL* & *ICRA* paper [**Ground-aware Monocular 3D Object Detection for Autonomous Driving**](https://ieeexplore.ieee.org/document/9327478) ([arXiv page](https://arxiv.org/abs/2102.00690)). Pretrained models can be found on the [release page](https://github.com/Owen-Liuyuxuan/visualDet3D/releases/tag/1.0).
```
@ARTICLE{9327478,
  author={Y. {Liu} and Y. {Yuan} and M. {Liu}},
  journal={IEEE Robotics and Automation Letters},
  title={Ground-aware Monocular 3D Object Detection for Autonomous Driving},
  year={2021},
  doi={10.1109/LRA.2021.3052442}}
```

This repo is also the official implementation of the 2021 *ICRA* paper [**YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection**](https://arxiv.org/abs/2103.09422). Pretrained models can be found on the [release page](https://github.com/Owen-Liuyuxuan/visualDet3D/releases/tag/1.1).
```
@inproceedings{liu2021yolostereo3d,
  title={YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection},
  author={Yuxuan Liu and Lujia Wang and Ming Liu},
  booktitle={2021 International Conference on Robotics and Automation (ICRA)},
  year={2021},
  organization={IEEE}
}
```

We further incorporate an *unofficial* re-implementation of **Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training** (KM3D) as a reference for how to integrate with other frameworks. (Note that the code comes from the [original official repo](https://github.com/Banconxuan/RTM3D), and we **DO NOT** guarantee a complete re-implementation.)

## Key Features

- **SOTA Performance** State-of-the-art results on visual 3D detection.
@@ -26,7 +40,7 @@ This repo contains the official implementation of 2021 *RAL* paper [**Ground-awa
- **Global Path-based IMDB** Data does not need to be placed inside the repo folder, which is convenient for managing data and code separately.


We provide start-up solutions for [Mono3D](docs/mono3d.md), [Stereo3D](docs/stereo3d.md), [Depth Predictions](docs/monoDepth.md) and more (with further publications to come).

Reference: this repo borrows code and ideas from [retinanet](https://github.com/yhenon/pytorch-retinanet),
[mmdetection](https://github.com/open-mmlab/mmdetection),
@@ -44,13 +58,13 @@ pip3 install -r requirement.txt
or manually check dependencies.

```bash
# build ops (deform convs and iou3d); we will not install these operations into the system environment
./make.sh
```

## Start Training

Please check the corresponding task: [Mono3D](docs/mono3d.md), [Stereo3D](docs/stereo3d.md), [Depth Predictions](docs/monoDepth.md). More demos will become available through contributions and further paper submissions.

### Config and Path setup.

@@ -78,12 +92,15 @@ Please check the template's comments and other comments in codes to fully exploi
## Other Resources

- [RAM-LAB](https://www.ram-lab.com)
- [Collections of Papers and Readings](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/)
    - [Collection for Mono3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/RecentCollectionForMono3D/); [Ground-Aware 3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/GroundAwareConvultion/)
    - [Collection for Stereo3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/RecentCollectionForStereo3D/); [YOLOStereo3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/YOLOStereo3D/)

## Related Codes

- [MMDetection](https://github.com/open-mmlab/mmdetection)
- [M3D-RPN](https://github.com/garrickbrazil/M3D-RPN)
- [Retinanet](https://github.com/yhenon/pytorch-retinanet)
- [DORN](https://github.com/dontLoveBugs/SupervisedDepthPrediction)
- [det3](https://github.com/pyun-ram/FL3D)
- [RTM3D](https://github.com/Banconxuan/RTM3D)
165 changes: 165 additions & 0 deletions config/KM3D_example
@@ -0,0 +1,165 @@
from easydict import EasyDict as edict
import os
import numpy as np

cfg = edict()
cfg.obj_types = ['Car', 'Pedestrian', 'Cyclist']
cfg.anchor_prior = False
## trainer
trainer = edict(
    gpu = 0,
    max_epochs = 200,
    disp_iter = 50,
    save_iter = 20,
    test_iter = 20,
    cudnn = True,
    training_func = "train_rtm3d",
    test_func = "test_mono_detection",
    evaluate_func = "evaluate_kitti_obj",
)

cfg.trainer = trainer

## path
path = edict()
path.data_path = "/home/kitti_obj/training"
path.test_path = "/home/kitti_obj/testing"
path.visualDet3D_path = "/home/stereo_kitti/visualDet3D"
path.project_path = "/home/stereo_kitti/workdirs"

if not os.path.isdir(path.project_path):
    os.mkdir(path.project_path)
path.project_path = os.path.join(path.project_path, 'RTM3D')
if not os.path.isdir(path.project_path):
    os.mkdir(path.project_path)

path.log_path = os.path.join(path.project_path, "log")
if not os.path.isdir(path.log_path):
    os.mkdir(path.log_path)

path.checkpoint_path = os.path.join(path.project_path, "checkpoint")
if not os.path.isdir(path.checkpoint_path):
    os.mkdir(path.checkpoint_path)

path.preprocessed_path = os.path.join(path.project_path, "output")
if not os.path.isdir(path.preprocessed_path):
    os.mkdir(path.preprocessed_path)

path.train_imdb_path = os.path.join(path.preprocessed_path, "training")
if not os.path.isdir(path.train_imdb_path):
    os.mkdir(path.train_imdb_path)

path.val_imdb_path = os.path.join(path.preprocessed_path, "validation")
if not os.path.isdir(path.val_imdb_path):
    os.mkdir(path.val_imdb_path)

cfg.path = path

## optimizer
optimizer = edict(
    type_name = 'adam',
    keywords = edict(
        lr = 1.25e-4,
        weight_decay = 0,
    ),
    clipped_gradient_norm = 35.0
)
cfg.optimizer = optimizer
## scheduler
scheduler = edict(
    type_name = 'MultiStepLR',
    keywords = edict(
        milestones = [90, 120]
    )
)
cfg.scheduler = scheduler

## data
data = edict(
    batch_size = 32,
    num_workers = 4,
    rgb_shape = (384, 1280, 3),
    train_dataset = "KittiRTM3DDataset",
    val_dataset = "KittiMonoDataset",
    test_dataset = "KittiMonoTestDataset",
    train_split_file = os.path.join(cfg.path.visualDet3D_path, 'data', 'kitti', 'chen_split', 'train.txt'),
    val_split_file = os.path.join(cfg.path.visualDet3D_path, 'data', 'kitti', 'chen_split', 'val.txt'),
    max_occlusion = 4,
    min_z = 3,
)

data.augmentation = edict(
    rgb_mean = np.array([0.485, 0.456, 0.406]),
    rgb_std = np.array([0.229, 0.224, 0.225]),
    cropSize = (data.rgb_shape[0], data.rgb_shape[1]),
)
data.train_augmentation = [
    edict(type_name='ConvertToFloat'),
    edict(type_name='RandomWarpAffine', keywords=edict(output_w=data.augmentation.cropSize[1], output_h=data.augmentation.cropSize[0])),
    #edict(type_name='Resize', keywords=edict(size=data.augmentation.cropSize)),
    edict(type_name="Shuffle", keywords=edict(
        aug_list=[
            edict(type_name="RandomBrightness", keywords=edict(distort_prob=1.0)),
            edict(type_name="RandomContrast", keywords=edict(distort_prob=1.0, lower=0.6, upper=1.4)),
            edict(type_name="Compose", keywords=edict(
                aug_list=[
                    edict(type_name="ConvertColor", keywords=edict(transform='HSV')),
                    edict(type_name="RandomSaturation", keywords=edict(distort_prob=1.0, lower=0.6, upper=1.4)),
                    edict(type_name="ConvertColor", keywords=edict(current='HSV', transform='RGB')),
                ]
            ))
        ]
    )),
    edict(type_name='RandomEigenvalueNoise', keywords=edict(alphastd=0.1)),
    edict(type_name='RandomMirror', keywords=edict(mirror_prob=0.5)),
    edict(type_name="FilterObject"),
    edict(type_name='Normalize', keywords=edict(mean=data.augmentation.rgb_mean, stds=data.augmentation.rgb_std))
]
data.test_augmentation = [
    edict(type_name='ConvertToFloat'),
    #edict(type_name='CropTop', keywords=edict(crop_top_index=data.augmentation.crop_top)),
    edict(type_name='Resize', keywords=edict(size=data.augmentation.cropSize)),
    edict(type_name='Normalize', keywords=edict(mean=data.augmentation.rgb_mean, stds=data.augmentation.rgb_std))
]
cfg.data = data

## networks
detector = edict()
detector.obj_types = cfg.obj_types
detector.name = 'KM3D'
detector.backbone = edict(
    depth=18,
    pretrained=True,
    frozen_stages=-1,
    num_stages=4,
    out_indices=(3, ),
    norm_eval=False,
    dilations=(1, 1, 1, 1),
)
head_loss = edict(
    gamma=2.0,
    rampup_length = 100,
    output_w = data.rgb_shape[1] // 4
)
head_test = edict(
    score_thr=0.3,
)

head_layer = edict(
    input_features=256,
    head_features=64,
    # Output channels per prediction head (RTM3D/CenterNet-style convention):
    # 'hm' class heatmaps, 'wh' 2D box size, 'hps' offsets to the 9 projected
    # 3D keypoints (9 x 2), 'rot' multi-bin orientation, 'dim' 3D dimensions,
    # 'prob' 3D-estimation confidence, 'reg' center sub-pixel offset,
    # 'hm_hp' keypoint heatmaps, 'hp_offset' keypoint sub-pixel offset.
    head_dict={'hm': len(cfg.obj_types), 'wh': 2, 'hps': 18,
               'rot': 8, 'dim': 3, 'prob': 1,
               'reg': 2, 'hm_hp': 9, 'hp_offset': 2}
)
detector.head = edict(
    num_classes = len(cfg.obj_types),
    num_joints = 9,
    max_objects = 32,
    layer_cfg = head_layer,
    loss_cfg = head_loss,
    test_cfg = head_test
)
detector.loss = head_loss
cfg.detector = detector
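For reference, a config module like this one is ultimately just a Python file exposing a `cfg` EasyDict. Below is a minimal sketch of how it could be loaded; this is illustrative only, assuming the file is saved as `config/KM3D_example.py` and the `config` directory is importable, and the actual entry points in visualDet3D may differ:

```python
# Illustrative loader sketch -- not the repo's actual training entry point.
import importlib

def load_config(module_name):
    # e.g. module_name = "config.KM3D_example"
    return importlib.import_module(module_name).cfg

cfg = load_config("config.KM3D_example")
print(cfg.detector.name)          # 'KM3D'
print(cfg.trainer.training_func)  # 'train_rtm3d'
```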
