
Bag chair detection model

Machine learning lives here 🤖

I’m going to train an object detector that finds free and occupied bag chairs in an image or video for the sandbags monitor project. For this purpose, I will use the deep learning technique called transfer learning, with the help of the Tensorflow Object Detection API.

What is Transfer Learning

In most cases, training a convolutional neural network from scratch is a difficult and time-consuming process that requires a lot of computing power and data. Both components are easy to find these days - ImageNet (among others) as a data library and Google Colab (among others) as compute. However, cloud computation can cost the user a pretty penny.

Therefore, to speed up the process and spare the wallet, people use transfer learning: an already trained (usually called pre-trained) convolutional network is taken as the starting point for their own model, i.e. the pre-trained weights serve as the initial weights of the new model.

The whole process can be divided into three large steps (this is, in fact, how any model training works):

  • Collect data - This may be data already collected by someone (such as ImageNet) or data gathered manually (my case: I have not found any large collection of bag chair images beyond what can be found in Google Images).

  • Annotate the data - In short, this is the process of marking the locations of objects in the data and specifying their classes.

  • Fine-tune the net - Re-train the weights of the ConvNet using regular backpropagation.
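
To make the idea concrete, here is a minimal transfer learning sketch in Keras (illustrative only - the actual pipeline of this project goes through the Object Detection API, described below):

import tensorflow as tf

# Take MobileNetV2 pre-trained on ImageNet, freeze it, and train a new head on top.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the original ImageNet classification head
    weights='imagenet')  # pre-trained weights become the initial weights of our model
base.trainable = False   # freeze the convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation='softmax'),  # e.g. empty / occupied
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# model.fit(my_dataset, epochs=...)  # fine-tune on your own (small) dataset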

Tensorflow Object Detection API

Not long ago, the Tensorflow developers released an Object Detection API that simplifies the process of fine-tuning a pre-trained model. The API is provided as a set of scripts which, with minor modifications, can be used for your own purposes.

Next, I will describe my own experience and approach to using the above methods.

  1. I collected images and annotated them. There are several tools for annotating datasets on the Internet - I used MakeSense. It is important to note that there are several annotation formats - COCO, Pascal VOC and YOLO - and the downstream code depends on the chosen format. I used Pascal VOC, which stores each image's annotation in an XML file like the one below. You should also create a label map file (.pbtxt) for future processing.
<annotation>
	<folder>images</folder>
	<filename>image0.jpg</filename>
	<path>download_data/downloads/images/image0.jpg</path>
	<source>
		<database>Unspecified</database>
	</source>
	<size>
		<width>522</width>
		<height>481</height>
		<depth>3</depth>
	</size>
	<object>
		<name>occupied_bagchair</name>
		<pose>Unspecified</pose>
		<truncated>Unspecified</truncated>
		<difficult>Unspecified</difficult>
		<bndbox>
			<xmin>4</xmin>
			<ymin>2</ymin>
			<xmax>521</xmax>
			<ymax>479</ymax>
		</bndbox>
	</object>
</annotation>

The pascal_label_map.pbtxt file:

item {
  id: 1
  name: 'empty_bagchair'
}

item {
  id: 2
  name: 'occupied_bagchair'
}

More info about annotation formats: Image data labeling and annotation
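
For reference, Pascal VOC XML files like the one above are easy to read with the Python standard library (a quick sketch; the file path is a placeholder):

import xml.etree.ElementTree as ET

# Read one Pascal VOC annotation file and print its bounding boxes.
tree = ET.parse('data/annotations/image0.xml')  # placeholder path
root = tree.getroot()
size = root.find('size')
width = int(size.find('width').text)
height = int(size.find('height').text)

for obj in root.findall('object'):
    name = obj.find('name').text  # e.g. 'occupied_bagchair'
    box = obj.find('bndbox')
    xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
    xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
    print(name, (xmin, ymin, xmax, ymax), 'in image of size', (width, height))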

  2. Create TF Records. I took the script from the API as a basis and changed it a little (rather, simplified it). It is worth mentioning why this format is needed: TFRecord is Tensorflow's own binary storage format, and using it to store a dataset can have a significant impact on the performance of the input pipeline and, consequently, on training. More info: Tensorflow Records? What they are and how to use them
python create_tfrecords_from_xml.py `
     --image_dir=data\images `
     --annotations_dir=data\annotations `
     --label_map_path=data\label_map\pascal_label_map.pbtxt `
     --output_path=tf_data\
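
Under the hood, the script builds one tf.train.Example per image, roughly in the format the Object Detection API expects (a simplified sketch - the field names follow the API's convention, but the helper below is illustrative rather than a copy of my script):

import tensorflow as tf

# Build one detection Example; box coordinates are normalized to [0, 1].
def make_example(encoded_jpeg, width, height, xmins, xmaxs, ymins, ymaxs,
                 class_names, class_ids):
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpeg])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=class_names)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=class_ids)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# with tf.io.TFRecordWriter('tf_data/train.record') as writer:  # placeholder path
#     writer.write(make_example(...).SerializeToString())
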
  3. Choose and download a pre-trained model. In our main project, we are planning to use a single-board computer, the Raspberry Pi 4, for model inference. Therefore, models adapted to work on mobile devices were considered as a basis for training. MobileNet is a good example of such a model: its creators achieved great speed by using depthwise separable convolutions (see the sketch below). As a result, my choice fell on a model called SSD MobileNet-v2; MobileNet-v2 is an improved version of MobileNet-v1. The pre-trained model can be downloaded from the Tensorflow Detection Model Zoo.
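
To see where the speedup comes from, compare the parameter counts of a regular convolution and a depthwise separable one (a quick Keras sketch with arbitrary shapes):

import tensorflow as tf

# A regular 3x3 convolution vs a depthwise separable one, 32 -> 64 channels.
inp = tf.keras.Input(shape=(96, 96, 32))
regular = tf.keras.layers.Conv2D(64, 3, padding='same')(inp)
separable = tf.keras.layers.SeparableConv2D(64, 3, padding='same')(inp)

print(tf.keras.Model(inp, regular).count_params())    # 3*3*32*64 + 64 = 18496
print(tf.keras.Model(inp, separable).count_params())  # 3*3*32 + 32*64 + 64 = 2400, ~8x fewer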

  4. Fill in the required fields of the configuration file. Typically, such a file is called pipeline.config. In it, you must specify the paths to the train/test TFRecord files, the number of classes (in my case, 2), the path to the label map file and the path to the checkpoint of the downloaded model.
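
The relevant fragments of pipeline.config look roughly like this (an illustrative excerpt in the same text-proto format as the label map above; all paths are placeholders for your own):

model {
  ssd {
    num_classes: 2  # empty_bagchair and occupied_bagchair
  }
}
train_config {
  fine_tune_checkpoint: "pre_trained_model/checkpoint/ckpt-0"  # placeholder path
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "data/label_map/pascal_label_map.pbtxt"
  tf_record_input_reader { input_path: "tf_data/train.record" }
}
eval_input_reader {
  label_map_path: "data/label_map/pascal_label_map.pbtxt"
  tf_record_input_reader { input_path: "tf_data/test.record" }
}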

  5. Train the model. I used Google Colab to speed up my training process: it provides the user with a powerful GPU for free (as I remember, for ~9 hours per session). I prepared this notebook for transfer learning using the Tensorflow Object Detection API. It is worth noting that even with a powerful graphics accelerator, the training process can take a fair amount of time.
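
Outside the notebook, training is launched with the API's model_main_tf2.py script (paths below are placeholders):

python model_main_tf2.py `
     --pipeline_config_path=pre_trained_model\pipeline.config `
     --model_dir=training `
     --alsologtostderr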

  6. Export the frozen graph. This part is also included in the training notebook.
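
Outside the notebook, the export is done with the API's exporter_main_v2.py script (again, placeholder paths):

python exporter_main_v2.py `
     --input_type=image_tensor `
     --pipeline_config_path=pre_trained_model\pipeline.config `
     --trained_checkpoint_dir=training `
     --output_directory=exported_model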

  7. Convert the model to TF Lite format (optional). I prepared this notebook for tflite model conversion. You can use this repository to run your tflite model on a Raspberry Pi or Android device.
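
The core of the conversion is tf.lite.TFLiteConverter (a sketch; for detection models the API's export_tflite_graph_tf2.py script should be run first to obtain a TFLite-friendly SavedModel, and all paths are placeholders):

import tensorflow as tf

# Convert the exported SavedModel to a .tflite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model('exported_tflite_model/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)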

  8. Start using your model. I prepared this notebook with my results.
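
Inference with the exported model then boils down to a few lines (a sketch; the output keys follow the Object Detection API's exported SavedModel signature, and paths are placeholders):

import numpy as np
import tensorflow as tf

# Load the exported detector and run it on a single image.
detect_fn = tf.saved_model.load('exported_model/saved_model')

image = tf.io.decode_jpeg(tf.io.read_file('test.jpg'))
input_tensor = tf.expand_dims(image, 0)  # add a batch dimension (uint8)

detections = detect_fn(input_tensor)
boxes = detections['detection_boxes'][0].numpy()    # [ymin, xmin, ymax, xmax], normalized
classes = detections['detection_classes'][0].numpy().astype(np.int64)
scores = detections['detection_scores'][0].numpy()

for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:  # arbitrary confidence threshold
        print(cls, score, box)  # 1 = empty_bagchair, 2 = occupied_bagchair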

Results

After completing model training (honestly, because my free Colab session time expired 👽), I tested the model on a few photos:

case 1

case 2

case 3

However, there are small flaws ...
case 4

At the moment I have a couple of ideas on how to improve the quality of the model; they all relate to data preparation.

to be continued...
