SECOND for Lyft 3d object detection challenge

This is the source code for my 19th place solution in Kaggle's Lyft 3d Object Detection Challenge.

I started from the original second.pytorch and modified it to work for the Lyft competition.

Modifications:

Support for Lyft's level 5 dataset.
Some small tweaks to get the nuscenes version working for lyft.
The evaluation code is modified to include the competition's evaluation metric, which uses a range of IoU thresholds for mAP (unlike the original metric, which used a range of distance thresholds).
second/notebooks/*.ipynb files contain my submission and inference testing code. They need some cleanup.
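
To make the idea of "mAP over a range of IoU thresholds" concrete, here is a minimal sketch. It is not the evaluation code from this repository. The threshold range (0.5 to 0.95 in steps of 0.05) is my assumption about the competition setup, and ap_at_threshold is a hypothetical callable standing in for whatever computes AP at one threshold.

```python
# Assumed threshold range: 0.5, 0.55, ..., 0.95 (10 values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap_over_iou_thresholds(ap_at_threshold, thresholds=IOU_THRESHOLDS):
    """Average a per-threshold AP function over a range of IoU thresholds.

    ap_at_threshold is a hypothetical callable: given an IoU threshold, it
    returns the AP obtained when a prediction only counts as a match if its
    IoU with a ground-truth box reaches that threshold.
    """
    values = [ap_at_threshold(t) for t in thresholds]
    return sum(values) / len(values)
```

A detector whose boxes are only loosely aligned scores well at 0.5 but poorly at 0.9, so the averaged metric rewards tight localization more than a single-threshold metric does.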

Only Python 3.6+ and PyTorch 1.0.0+ are supported. Tested on Ubuntu 16.04/18.04 and Windows 10.

Install

1. Clone code

git clone https://github.com/pyaf/second.pytorch.git
cd ./second.pytorch/second

2. Install dependencies

It is recommended to use the Anaconda package manager.

conda install scikit-image scipy numba pillow matplotlib

pip install fire tensorboardX protobuf opencv-python

Follow the instructions in the spconv repository to install spconv.

If you want to train with fp16 mixed precision (training is faster on RTX series, Titan V/RTX and Tesla V100 GPUs, but I only have a 1080Ti), you need to install apex.

3. Add second.pytorch/ to PYTHONPATH

Add the following line to your .bashrc, updating the path accordingly:

export PYTHONPATH="${PYTHONPATH}:/media/ags/DATA/CODE/kaggle/lyft-3d-object-detection/second.pytorch"
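
After opening a new shell, a quick way to confirm the PYTHONPATH change took effect is to ask Python whether it can locate the package. This is a small sketch using only the standard library. The helper name is_importable is mine, and "second" is assumed to be the top-level package name because it is the package directory in the cloned repository.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the top-level module on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "second" is the package directory inside the cloned repository.
    print("second importable:", is_importable("second"))
```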

Prepare dataset

Lyft Dataset preparation

Download Lyft dataset:

└── LYFT_TRAINVAL_DATASET_ROOT
       ├── lidar         <-- lidar files
       ├── maps          <-- unused
       ├── images        <-- unused
       ├── data          <-- metadata and annotations
       └── v1.0-trainval <-- softlink to `data`

└── LYFT_TEST_DATASET_ROOT
       ├── lidar         <-- lidar files
       ├── maps          <-- unused
       ├── images        <-- unused
       ├── data          <-- metadata and annotations
       └── v1.0-test     <-- softlink to `data`

NOTE: The v1.0-* folders in the train/test roots are soft links to the corresponding data folders.
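
The soft links can be created by hand. As an alternative, the sketch below creates one from Python. The function name link_version_dir is hypothetical, and it assumes the data folder already exists under the dataset root as in the layout above.

```python
import os


def link_version_dir(dataset_root, version):
    """Create <dataset_root>/<version> as a soft link to <dataset_root>/data."""
    target = os.path.join(dataset_root, "data")
    link = os.path.join(dataset_root, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"missing data folder: {target}")
    if not os.path.lexists(link):
        # Relative link target, so the dataset folder can be moved as a whole.
        os.symlink("data", link, target_is_directory=True)
    return link
```

For example, link_version_dir("/path/to/train_root", "v1.0-trainval") and link_version_dir("/path/to/test_root", "v1.0-test") would cover both roots. On Windows, creating symlinks may require elevated privileges or developer mode.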

python create_data.py nuscenes_data_prep --root_path=LYFT_TRAINVAL_DATASET_ROOT  --version="v1.0-trainval" --dataset_name="NuScenesDataset" --max_sweeps=10
python create_data.py nuscenes_data_prep --root_path=LYFT_TEST_DATASET_ROOT  --version="v1.0-test" --dataset_name="NuScenesDataset" --max_sweeps=10

LYFT_TRAINVAL_DATASET_ROOT is the full path to the train set of the dataset, and similarly LYFT_TEST_DATASET_ROOT is the full path to the test set.

Prepare the gt_data_train.json/gt_data_val.json files using prepare.ipynb; follow the comments in the notebook.

The rest of this README is from the original SECOND implementation.

Modify config file

Some paths need to be configured in the config file:

train_input_reader: {
  ...
  database_sampler {
    database_info_path: "/path/to/dataset_dbinfos_train.pkl"
    ...
  }
  dataset: {
    dataset_class_name: "DATASET_NAME"
    kitti_info_path: "/path/to/dataset_infos_train.pkl"
    kitti_root_path: "DATASET_ROOT"
  }
}
...
eval_input_reader: {
  ...
  dataset: {
    dataset_class_name: "DATASET_NAME"
    kitti_info_path: "/path/to/dataset_infos_val.pkl"
    kitti_root_path: "DATASET_ROOT"
  }
}
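
A forgotten placeholder path tends to surface only once training starts, so a quick scan of the config can save time. The sketch below does not parse the config with protobuf. It assumes only that path fields look like name_path: "value", as in the snippet above, and the helper names are mine.

```python
import os
import re

# Matches fields such as: database_info_path: "/path/to/file.pkl"
PATH_FIELD = re.compile(r'(\w+_path)\s*:\s*"([^"]*)"')


def find_path_fields(config_text):
    """Return (field_name, value) pairs for every *_path field in the text."""
    return PATH_FIELD.findall(config_text)


def missing_paths(config_text):
    """Return the path fields whose values do not exist on disk."""
    return [(name, value) for name, value in find_path_fields(config_text)
            if not os.path.exists(value)]
```

Reading the config file into a string and passing it to missing_paths lists every path that still needs to be filled in.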

Usage

train with single GPU

python ./pytorch/train.py train --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir

train with multiple GPUs (needs testing, I only have one GPU)

Assume you have 4 GPUs and want to train with 3 GPUs:

CUDA_VISIBLE_DEVICES=0,1,3 python ./pytorch/train.py train --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir --multi_gpu=True

Note: The batch_size and num_workers in the config file are per-GPU; if you use multiple GPUs, they will be multiplied by the number of GPUs. Don't modify them manually.

You need to modify the total number of steps in the config file. For example, 50 epochs = 15500 steps for car.lite.config on a single GPU; if you use 4 GPUs, you need to divide steps and steps_per_eval by 4.
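
The step arithmetic above can be written out as a small helper. This is only a sketch: the function name is hypothetical, and rounding up when the division is not exact is my assumption rather than something the training code prescribes.

```python
def scale_steps_for_gpus(single_gpu_steps, num_gpus):
    """Divide a single-GPU step count by the number of GPUs, rounding up."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Ceiling division without floating point.
    return -(-single_gpu_steps // num_gpus)


if __name__ == "__main__":
    # 50 epochs = 15500 single-GPU steps, i.e. 310 steps per epoch.
    print(scale_steps_for_gpus(15500, 4))
```

The same helper applies to steps_per_eval, since both values count optimizer steps and each step now consumes num_gpus times as many samples.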

train with fp16 (mixed precision)

Modify the config file: set enable_mixed_precision to true.

Make sure "/path/to/model_dir" doesn't exist if you want to train a new model. A new directory will be created if model_dir doesn't exist; otherwise the checkpoints in it will be loaded.
The training process uses batchsize=6 by default for a 1080Ti; reduce batchsize if your GPU has less memory.
Currently only single-GPU training is supported, but training a model takes only 20 hours (165 epochs) on a single 1080Ti, and only 50 epochs are needed to reach 78.3 AP with super convergence on car moderate 3D in the KITTI validation dataset.

evaluate

python ./pytorch/train.py evaluate --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir --measure_time=True --batch_size=1

The detection result will be saved as a result.pkl file in model_dir/eval_results/step_xxx, or in the official KITTI label format if you use --pickle_result=False.
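
To inspect the saved detections afterwards, the pickle file can be loaded directly. The sketch below assumes only the directory layout described above. The helper name load_detections is mine, and the structure of the loaded object is not documented here, so inspect it before relying on any field names.

```python
import pickle
from pathlib import Path


def load_detections(model_dir, step_dir):
    """Load result.pkl from <model_dir>/eval_results/<step_dir>."""
    result_path = Path(model_dir) / "eval_results" / step_dir / "result.pkl"
    with open(result_path, "rb") as f:
        return pickle.load(f)
```

Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.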

pretrained model

You can download pretrained models from Google Drive. The car_fhd model corresponds to car.fhd.config.

Note that this pretrained model was trained before a bug in sparse convolution was fixed, so the eval result may be slightly worse.

Try Kitti Viewer Web

I've modified the original Kitti viewer to work for Lyft inference; do give it a try after training.

Major steps

Run python ./kittiviewer/backend/main.py main --port=xxxx on your server or local machine.
Run cd ./kittiviewer/frontend && python -m http.server to launch a local web server.
Open your browser and enter your frontend URL (e.g. http://127.0.0.1:8000, the default).
Input the backend URL (e.g. http://127.0.0.1:16666).
Input the root path, info path and det path (optional).
Click load, then loadDet (optional), input an image index at the bottom center of the screen and press Enter.

Inference steps

First, the load button must be clicked and the data must load successfully.

Input checkpointPath and configPath.
Click buildNet.
Click inference.

Try Kitti Viewer (Deprecated)

Use the kitti viewer, which is based on pyqt and pyqtgraph, to check data before training.

Run python ./kittiviewer/viewer.py and check the following picture to see how to use the kitti viewer:

Concepts

Kitti lidar box

A kitti lidar box consists of 7 elements: [x, y, z, w, l, h, rz], see figure.

All training and inference code uses the KITTI box format, so other formats need to be converted to KITTI format before training.

Kitti camera box

A kitti camera box consists of 7 elements: [x, y, z, l, h, w, ry].
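
The two layouts differ in the order of the size fields, which is easy to get wrong when indexing raw arrays. The sketch below names the fields of each 7-element layout. The class and function names are mine, not from the codebase, and it only reorders the dimensions: converting the center and the rotation between the lidar and camera frames needs calibration data and is not shown.

```python
from typing import NamedTuple


class LidarBox(NamedTuple):
    """Fields in kitti lidar box order: [x, y, z, w, l, h, rz]."""
    x: float
    y: float
    z: float
    w: float
    l: float
    h: float
    rz: float


class CameraBox(NamedTuple):
    """Fields in kitti camera box order: [x, y, z, l, h, w, ry]."""
    x: float
    y: float
    z: float
    l: float
    h: float
    w: float
    ry: float


def lidar_dims_in_camera_order(box):
    """Return the size of a LidarBox as (l, h, w), the camera box order."""
    return (box.l, box.h, box.w)
```

A raw 7-element row can be wrapped with LidarBox(*row), after which box.l and box.w read unambiguously regardless of which layout the row came from.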
