Object Detection

Using TAO Detection Models

NVIDIA's TAO Toolkit includes highly accurate, high-resolution object detection models that are optimized, pruned, and quantized for INT8 precision. jetson-inference supports TAO models based on the DetectNet_v2 DNN architecture, including the following pre-trained models:

| Model                  | CLI argument     | Object classes          |
|------------------------|------------------|-------------------------|
| TAO PeopleNet          | peoplenet        | person, bag, face       |
| TAO PeopleNet (pruned) | peoplenet-pruned | person, bag, face       |
| TAO DashCamNet         | dashcamnet       | person, car, bike, sign |
| TAO TrafficCamNet      | trafficcamnet    | person, car, bike, sign |
| TAO FaceDetect         | facedetect       | face                    |
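For scripting, the model table can be kept as a small lookup structure. The following is a minimal sketch with hypothetical names (`TAO_MODELS`, `models_for_class()`, and `detectnet_command()` are illustrative, not part of jetson-inference). It lists which `--model` arguments cover a given object class and builds a `detectnet` command line using only the `--model` flag shown on this page.

```python
# Hypothetical lookup table mirroring the pre-trained TAO models listed above.
# Keys are the --model CLI arguments; values are the object classes each detects.
TAO_MODELS = {
    "peoplenet": ("person", "bag", "face"),
    "peoplenet-pruned": ("person", "bag", "face"),
    "dashcamnet": ("person", "car", "bike", "sign"),
    "trafficcamnet": ("person", "car", "bike", "sign"),
    "facedetect": ("face",),
}


def models_for_class(object_class):
    """Return the --model arguments whose class list includes object_class."""
    return [name for name, classes in TAO_MODELS.items() if object_class in classes]


def detectnet_command(model, input_uri, output_uri, use_python=False):
    """Build a detectnet / detectnet.py command line (as an argv list) for a model above."""
    if model not in TAO_MODELS:
        raise ValueError(f"unknown TAO model: {model}")
    program = "detectnet.py" if use_python else "detectnet"
    return [program, f"--model={model}", input_uri, output_uri]


if __name__ == "__main__":
    print(models_for_class("face"))  # ['peoplenet', 'peoplenet-pruned', 'facedetect']
    print(" ".join(detectnet_command("peoplenet", "pedestrians.mp4", "pedestrians_peoplenet.mp4")))
```

On a Jetson, the returned list could be passed to `subprocess.run(..., check=True)` to launch the sample.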

Although a section below covers how to load your own TAO models, let's take a look at using the pre-trained models first.

PeopleNet

PeopleNet is a high-resolution 960x544 model with up to ~90% accuracy for detecting people, bags, and faces. It's based on DetectNet_v2 with a ResNet-34 backbone. Launching detectnet/detectnet.py with --model=peoplenet will run the TAO PeopleNet model with INT8 precision on platforms that support it (FP16 otherwise). There's also the peoplenet-pruned model which is faster and slightly less accurate.
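The input resolution determines how large the network's outputs are. A DetectNet_v2 model divides the image into a grid and predicts two things for each cell: a coverage (confidence) value per class, and four box coordinates per class. These are exported as the `output_cov/Sigmoid` and `output_bbox/BiasAdd` layers. The sketch below computes those tensor shapes. It assumes the 16-pixel output stride described in NVIDIA's DetectNet_v2 documentation, and `detectnet_v2_output_shapes()` is a hypothetical helper, not a jetson-inference API.

```python
def detectnet_v2_output_shapes(input_dims, num_classes, stride=16):
    """Return (coverage_shape, bbox_shape) for a "C,H,W" DetectNet_v2 input.

    Assumes an output stride of 16, i.e. each grid cell covers a 16x16 pixel patch.
    """
    _, height, width = (int(v) for v in input_dims.split(","))
    grid_h, grid_w = height // stride, width // stride
    coverage = (num_classes, grid_h, grid_w)   # one confidence map per class
    bbox = (4 * num_classes, grid_h, grid_w)   # four box coordinates per class
    return coverage, bbox


if __name__ == "__main__":
    # PeopleNet: 960x544 input, 3 classes (person, bag, face)
    print(detectnet_v2_output_shapes("3,544,960", num_classes=3))  # ((3, 34, 60), (12, 34, 60))
    # FaceDetect: 736x416 input, 1 class (face)
    print(detectnet_v2_output_shapes("3,416,736", num_classes=1))  # ((1, 26, 46), (4, 26, 46))
```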

# Download test video
wget https://nvidia.box.com/shared/static/veuuimq6pwvd62p9fresqhrrmfqz0e2f.mp4 -O pedestrians.mp4

# C++
$ detectnet --model=peoplenet pedestrians.mp4 pedestrians_peoplenet.mp4

# Python
$ detectnet.py --model=peoplenet pedestrians.mp4 pedestrians_peoplenet.mp4

You can also adjust the --confidence and --clustering thresholds. Because these TAO models are more accurate, they don't seem to introduce many false positives at lower thresholds. The Flask webapp is a convenient tool for experimenting with these settings interactively.
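The sketch below gives an intuition for the two thresholds. It filters detections by confidence, then greedily merges overlapping boxes of the same class. This is a generic illustration and not jetson-inference's actual implementation. The `Detection` type, the intersection-over-union (IoU) overlap metric, and the rule of merging into the union box are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical detection record: class ID, confidence, and box corners in pixels.
    class_id: int
    confidence: float
    left: float
    top: float
    right: float
    bottom: float

    def area(self):
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


def iou(a, b):
    """Intersection-over-union of two boxes (0.0 when they don't overlap)."""
    iw = max(0.0, min(a.right, b.right) - max(a.left, b.left))
    ih = max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))
    inter = iw * ih
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def filter_and_cluster(detections, confidence=0.5, clustering=0.5):
    """Drop boxes below `confidence`, then merge same-class boxes whose IoU exceeds `clustering`."""
    # Highest-confidence boxes first, so each cluster keeps its best score.
    kept = sorted((d for d in detections if d.confidence >= confidence),
                  key=lambda d: d.confidence, reverse=True)
    clusters = []
    for det in kept:
        for c in clusters:
            if c.class_id == det.class_id and iou(c, det) > clustering:
                # Grow the existing cluster to the union of both boxes.
                c.left, c.top = min(c.left, det.left), min(c.top, det.top)
                c.right, c.bottom = max(c.right, det.right), max(c.bottom, det.bottom)
                break
        else:
            # No matching cluster: start a new one (copied, so inputs aren't mutated).
            clusters.append(Detection(det.class_id, det.confidence,
                                      det.left, det.top, det.right, det.bottom))
    return clusters
```

In this sketch, lowering `confidence` admits more boxes. Raising `clustering` requires boxes to overlap more before they are merged, so more separate detections survive.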

DashCamNet

Like PeopleNet, DashCamNet is a 960x544 detector based on DetectNet_v2 and ResNet-34. Its intended use is detecting people and vehicles from street-level viewpoints and first-person perspectives. TrafficCamNet is similar, but intended for imagery taken from a higher vantage point.

# C++
$ detectnet --model=dashcamnet input.mp4 output.mp4

# Python
$ detectnet.py --model=dashcamnet input.mp4 output.mp4

note: you can run this with any input/output from the Camera Streaming and Multimedia page

FaceDetect

FaceDetect is a TAO model for detecting only faces. It achieves up to ~85% accuracy and was trained on a dataset of more than 1.8M samples taken from a variety of camera angles. It has an input resolution of 736x416 and uses DetectNet_v2 with a ResNet-18 backbone.

# C++
$ detectnet --model=facedetect "images/humans_*.jpg" images/test/facedetect_humans_%i.jpg

# Python
$ detectnet.py --model=facedetect "images/humans_*.jpg" images/test/facedetect_humans_%i.jpg
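The input glob is quoted so the shell passes it through unexpanded. Together with the `%i` in the output path, this lets one command process a whole image sequence. The sketch below shows how such a pattern could expand. It assumes that `%i` is replaced printf-style with a zero-based image index and that matched files are processed in sorted order. Both are assumptions about jetson-utils' behavior, and `pair_outputs()` / `expand_io_pattern()` are hypothetical helpers.

```python
import glob


def pair_outputs(input_paths, output_pattern):
    """Pair each input path with an output path.

    Assumes (hypothetically) that inputs are processed in sorted order and that
    %i in output_pattern is replaced with a zero-based index.
    """
    return [(path, output_pattern % idx) for idx, path in enumerate(sorted(input_paths))]


def expand_io_pattern(input_glob, output_pattern):
    """Expand an input glob such as "images/humans_*.jpg" and pair it with output names."""
    return pair_outputs(glob.glob(input_glob), output_pattern)


if __name__ == "__main__":
    for src, dst in expand_io_pattern("images/humans_*.jpg",
                                      "images/test/facedetect_humans_%i.jpg"):
        print(src, "->", dst)
```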

Importing Your Own TAO Detection Models

Although jetson-inference can automatically download, convert, and load the pre-trained TAO detection models above, you may wish to use a different version of those models, or your own DetectNet_v2 model that you trained or fine-tuned with TAO. To do so, copy your trained ETLT model to your Jetson along with the appropriate version of the tao-converter tool. Then, depending on your model's configuration (details are typically found on the model card), run a script like the one below to generate a TensorRT engine from the ETLT:

# model config
MODEL_DIR="peoplenet_deployable_quantized_v2.6.1"
MODEL_INPUT="$MODEL_DIR/resnet34_peoplenet_int8.etlt"
MODEL_OUTPUT="$MODEL_INPUT.engine"

INPUT_DIMS="3,544,960"
OUTPUT_LAYERS="output_bbox/BiasAdd,output_cov/Sigmoid"
MAX_BATCH_SIZE="1"

WORKSPACE="4294967296" # 4GB (default)
PRECISION="int8"       # fp32, fp16, int8
CALIBRATION="$MODEL_DIR/resnet34_peoplenet_int8.txt"

ENCRYPTION_KEY="tlt_encode"

# generate TensorRT engine
tao-converter \
	-k $ENCRYPTION_KEY \
	-d $INPUT_DIMS \
	-o $OUTPUT_LAYERS \
	-m $MAX_BATCH_SIZE \
	-w $WORKSPACE \
	-t $PRECISION \
	-c $CALIBRATION \
	-e $MODEL_OUTPUT \
	$MODEL_INPUT
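If you convert several models, it can help to generate the `tao-converter` invocation from a config instead of editing the script by hand. Below is a hypothetical Python sketch (`TaoConverterConfig` and `tao_converter_argv()` are illustrative names) that uses only the flags from the script above and builds the argument list without running it. It passes `-c` only for INT8, because the calibration cache is what INT8 quantization needs. For other precisions it omits the flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaoConverterConfig:
    # Hypothetical config mirroring the shell variables in the script above.
    model_input: str
    input_dims: str = "3,544,960"
    output_layers: str = "output_bbox/BiasAdd,output_cov/Sigmoid"
    max_batch_size: int = 1
    workspace: int = 4 * 1024**3        # 4294967296 bytes = 4 GiB
    precision: str = "int8"             # fp32, fp16, int8
    calibration: Optional[str] = None   # INT8 calibration cache
    encryption_key: str = "tlt_encode"

    @property
    def model_output(self):
        # Same naming convention as the script: append .engine to the ETLT path.
        return self.model_input + ".engine"


def tao_converter_argv(cfg):
    """Return the tao-converter command line as an argv list (it is not executed)."""
    if cfg.precision not in ("fp32", "fp16", "int8"):
        raise ValueError(f"unsupported precision: {cfg.precision}")
    argv = [
        "tao-converter",
        "-k", cfg.encryption_key,
        "-d", cfg.input_dims,
        "-o", cfg.output_layers,
        "-m", str(cfg.max_batch_size),
        "-w", str(cfg.workspace),
        "-t", cfg.precision,
    ]
    if cfg.precision == "int8":
        if cfg.calibration is None:
            raise ValueError("int8 precision needs a calibration cache (-c)")
        argv += ["-c", cfg.calibration]
    argv += ["-e", cfg.model_output, cfg.model_input]
    return argv


if __name__ == "__main__":
    cfg = TaoConverterConfig(
        model_input="peoplenet_deployable_quantized_v2.6.1/resnet34_peoplenet_int8.etlt",
        calibration="peoplenet_deployable_quantized_v2.6.1/resnet34_peoplenet_int8.txt",
    )
    print(" ".join(tao_converter_argv(cfg)))
```

On the Jetson, the list could be handed to `subprocess.run(argv, check=True)` so that a failed conversion raises an error.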

After converting it, you can load it with detectnet/detectnet.py like so:

$ detectnet \
	--model=$MODEL_DIR/resnet34_peoplenet_int8.etlt.engine \
	--labels=$MODEL_DIR/labels.txt \
	--input-blob=input_1 \
	--output-cvg=output_cov/Sigmoid \
	--output-bbox=output_bbox/BiasAdd \
	input.mp4 output.mp4

note: only TAO DetectNet_v2 models are currently supported in jetson-inference, as it is set up for that network's pre/post-processing

In your own applications, you can also load them directly from C++ or Python by using the extended form of the detectNet API.
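Here is a minimal Python sketch of that extended form. It assumes the `detectNet` constructor's `model`, `labels`, `input_blob`, `output_cvg`, `output_bbox`, and `threshold` keyword arguments (used for custom ONNX models elsewhere in the jetson-inference docs) accept a TAO engine the same way the command-line flags above do. The helper `tao_detectnet_kwargs()`, the default engine name, and the 0.5 threshold are illustrative assumptions, so adjust them to your model.

```python
import os
import sys


def tao_detectnet_kwargs(model_dir, engine_name="resnet34_peoplenet_int8.etlt.engine", threshold=0.5):
    """Keyword arguments for the extended detectNet constructor, mirroring the CLI flags above."""
    return {
        "model": os.path.join(model_dir, engine_name),
        "labels": os.path.join(model_dir, "labels.txt"),
        "input_blob": "input_1",
        "output_cvg": "output_cov/Sigmoid",
        "output_bbox": "output_bbox/BiasAdd",
        "threshold": threshold,
    }


def run(model_dir, input_uri="input.mp4", output_uri="output.mp4"):
    """Run the converted TAO engine over a video stream (requires the Jetson bindings)."""
    try:
        from jetson_inference import detectNet
        from jetson_utils import videoSource, videoOutput
    except ImportError:
        print("jetson-inference Python bindings not found; run this on the Jetson")
        return
    net = detectNet(**tao_detectnet_kwargs(model_dir))
    source = videoSource(input_uri)
    output = videoOutput(output_uri)
    while source.IsStreaming() and output.IsStreaming():
        img = source.Capture()
        if img is None:  # capture timeout, try again
            continue
        net.Detect(img)  # runs inference; by default also draws the detections onto img
        output.Render(img)


if __name__ == "__main__":
    run(sys.argv[1] if len(sys.argv) > 1 else "peoplenet_deployable_quantized_v2.6.1")
```

Keeping the constructor arguments in one helper makes it easy to swap in a different engine, labels file, or layer names without touching the processing loop.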


© 2016-2023 NVIDIA