YOLOv3 for Object Detection

TensorFlow implementation of YOLOv3 for object detection.
Both inference and training pipelines are implemented.
For inference using pre-trained model, the model stored in .weights file is first downloaded from official YOLO website (Section 'Performance on the COCO Dataset', YOLOv3-416 link), then converted to .npy file and finally loaded by the TensorFlow model for prediction.
For training, the pre-trained DarkNet-53 is used as the feature extractor and the YOLO prediction layers at three scales are trained from scratch. Data augmentation such as random flipping, cropping, resize, affine transformation and color change (hue, saturation, brightness) are applied. Anchor clustering and multiple scale training (rescale training images every 10 epochs) are implemented as well.

TODO

Convert pre-trained .weights model to .npy file (detail).
Pre-trained DarkNet-53 for image classification (detail).
Object detection using pre-trained YOLOv3 trained on COCO dataset (detail).
YOLOv3 training pipeline
Train on VOC dataset (detail).
Performance evaluation.
Train on custom dataset.

Requirements

Python 3.0
TensorFlow 1.12.0+
Numpy
Scipy
imageio
Matplotlib

Use pre-trained model for object detection (80 classes)

Download pre-trained model

Download the pre-trained model yolov3.npy from here. This model is converted from the .weights file from here (Section 'Performance on the COCO Dataset', YOLOv3-416 link).
More details for converting models can be found here.

Setup configuration

Modified the config file configs/config_path.cfg with the following content:
```
 [path]
 coco_pretrained_npy = DIRECTORY/TO/MODEL/yolov3.npy
 save_path = DIRECTORY/TO/SAVE/RESULT/
 test_image_path = DIRECTORY/OF/TEST/IMAGE/
 test_image_name = .jpg
```
- Put the converted pretrained model yolov3.npy in coco_pretrained_npy.
- Put testing images in test_image_path.
- Part of testimg image names is specified by test_image_name.
- Result images will be saved in save_path.
Use obj_score_thresh and nms_iou_thresh in config file configs/coco80.cfg to setup the parameters of non-maximum suppression to remove multiple bounding boxes for one detected object.
- obj_score_thresh is the threshold for deciding if a bounding box detects an object class based on the score. Default is 0.8.
- nms_iou_thresh is the threshold for deciding if two bounding boxes overlap too much based on the IoU. Default is 0.45.

Prediction

Put testing images in test_image_path in pretrain_coco_path.cfg and go to experiment\, run
```
python yolov3.py --detect
```
Testing images are rescaled to 416 * 416 fed into the network.
Result images are saved in save_path setting in configs/pretrain_coco_path.cfg.

Sample results

Train on VOC2012 dataset (20 classes)

Prepare dataset and pre-trained feature extractor

Download VOC2012 training/validation data from here (2GB tar file).
Download the pre-trained Darknet-53 yolov3_feat.npy from here. This model is converted from the .weights file from here (Section 'Pre-Trained Models', Darknet53 448x448 link).
More details for converting models can be found here.

Setup configuration

Modified the config file configs/config_path.cfg with the following content:
```
 [path]
 yolo_feat_pretraind_npy = DIRECTORY/TO/MODEL/yolov3_feat.npy
 train_data_path = DIRECTORY/OF/TRAINING/SET/
 save_path = DIRECTORY/TO/SAVE/RESULT/
```
- Put the converted pretrained model yolov3_feat.npy in yolo_feat_pretraind_npy.
- train_data_path is the parent directory JPEGImages and Annotations for training/validation set.
- Tensorboard summary and trained model will be saved in save_path.
Use config file configs/voc.cfg to setup the hyper-parameters for training on VOC2012. Default values are the current setting. anchor are the 9 anchors (width and height) obtained from anchor clustering in ascending order. obj_weight and nobj_weight are the weights of object loss and non-object loss. multiscale is the set of scales used for training.

Training

Go to experiment\, run
```
python yolov3.py --train
```
The entire dataset is randomly divided into 14556 training samples (85%) and 2568 validation images (15%).
Data augmentation (flipping, cropping, resize, affine transformation and color change) is applied to the training set. The training images are rescaled every 10 epochs (randomly picked from multiscale in configs/voc.cfg).
Validation image are all rescaled to 416 * 416 without augmentation for validation.
The learning rate schedule needs to be further tuned, but the current setting is: 0.1 (1-50 epochs), 0.01 (51-100 epochs) and 0.001 (101-150 epochs).
Tensorboard summary includes losses and sample predictions for both training set (every 100 steps) and validation set (every epoch) are saved in save_path in configs/config_path.cfg. Note that non-maximum suppression does not used in sample predictions and only top 20 predicted bounding boxes based on class score are shown. You can see how the model is doing during training:

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
data		data
docs		docs
experiment		experiment
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

docs

docs

experiment

experiment

src

src

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Repository files navigation

YOLOv3 for Object Detection

TODO

Requirements

Use pre-trained model for object detection (80 classes)

Download pre-trained model

Setup configuration

Prediction

Sample results

Train on VOC2012 dataset (20 classes)

Prepare dataset and pre-trained feature extractor

Setup configuration

Training

Sample results

Reference code

Author

About

Releases

Packages

Languages

License

conan7882/yolov3

Folders and files

Latest commit

History

Repository files navigation

YOLOv3 for Object Detection

TODO

Requirements

Use pre-trained model for object detection (80 classes)

Download pre-trained model

Setup configuration

Prediction

Sample results

Train on VOC2012 dataset (20 classes)

Prepare dataset and pre-trained feature extractor

Setup configuration

Training

Sample results

Reference code

Author

About

Topics

Resources

License

Stars

Watchers

Forks

Languages