[Bug]: Failure to allocate cuda resources when using face_recognition library with "cnn" on Jetson Nano #2948

marcjasner · 2024-04-18T01:06:59Z

What Operating System(s) are you seeing this problem on?

Linux (aarch64)

dlib version

19.24

Python version

3.6.9

Compiler

gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)

Expected Behavior

I'm attempting to test face detection and landmark generation on a Jetson Nano (running Ubuntu 18.04 and JetPack 4.6) using dlib (via the face_recognition library), with this simple Python script to measure timings:

#!/usr/bin/python3
import cv2
import numpy as np
import face_recognition as faceRegLib
import time

def current_milli_time():
    return round(time.time() * 1000)

# load_image_file() already returns an RGB array, so no BGR-to-RGB conversion is needed
img_rgb = faceRegLib.load_image_file('willsmith.jpg')

for i in range(10):
  startTime=current_milli_time()
  face = faceRegLib.face_locations(img_rgb, model="cnn")[0]
  locTime=current_milli_time()-startTime
  demo_encode = faceRegLib.face_encodings(img_rgb)[0]
  elapsedTime = current_milli_time()-startTime
  print("Elapsed Time = {}ms    detection Time = {}ms" .format(elapsedTime, locTime))

When face_locations() is called with model="hog", things work fine: face detection and landmark generation together take about 2.5 seconds on average.
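For comparing the "hog" and "cnn" timings, a small helper along these lines may give steadier numbers. It is only a sketch: `time_calls` is a hypothetical name, not part of face_recognition or dlib, and the untimed warm-up call is there on the assumption that the first GPU call carries one-off setup cost. It uses `time.perf_counter()` because that clock is monotonic, unlike `time.time()`.

```python
import time


def time_calls(fn, repeats=10, warmup=1):
    """Call fn() `warmup` times untimed, then `repeats` times timed.

    Returns a list with one duration in milliseconds per timed call.
    """
    for _ in range(warmup):
        fn()
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


if __name__ == "__main__":
    # Stand-in workload; in the script above this would be something like
    # lambda: faceRegLib.face_locations(img_rgb, model="hog")
    samples = time_calls(lambda: sum(range(100000)), repeats=5)
    print("mean = {:.2f} ms".format(sum(samples) / len(samples)))
```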

When I changed the argument to model="cnn", which calls

cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)

I expected it to work as well, but much faster, since dlib (the latest GitHub source: 19.24) is compiled with CUDA support enabled.
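Whether the dlib module that Python imports was really built with CUDA can be checked from Python itself. The sketch below reads `dlib.DLIB_USE_CUDA` and calls `dlib.cuda.get_num_devices()`; the helper name `describe_dlib_build` is made up for illustration, and it takes the module as an argument so it can also be tried on a machine without dlib.

```python
def describe_dlib_build(dlib_module):
    """Summarize CUDA-related attributes of an already imported dlib module."""
    info = {
        "version": getattr(dlib_module, "__version__", "unknown"),
        "use_cuda": bool(getattr(dlib_module, "DLIB_USE_CUDA", False)),
        "num_devices": None,
    }
    cuda = getattr(dlib_module, "cuda", None)
    if info["use_cuda"] and cuda is not None:
        try:
            info["num_devices"] = cuda.get_num_devices()
        except RuntimeError as exc:
            # A broken CUDA runtime may fail here; record the message instead
            info["num_devices_error"] = str(exc)
    return info


if __name__ == "__main__":
    try:
        import dlib
    except ImportError:
        print("dlib is not importable in this interpreter")
    else:
        print(describe_dlib_build(dlib))
```

If `use_cuda` comes back `False`, the "cnn" model would run on the CPU, and the CUDA error in this report would point to a different installed build than the one being inspected.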

Current Behavior

Running the script with model="cnn" results in the following errors:

Traceback (most recent call last):
  File "./faceboxes.py", line 16, in <module>
    face = faceRegLib.face_locations(img_rgb, model="cnn")[0]
  File "/home/marc/.local/lib/python3.6/site-packages/face_recognition/api.py", line 119, in face_locations
    return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
  File "/home/marc/.local/lib/python3.6/site-packages/face_recognition/api.py", line 103, in _raw_face_locations
    return cnn_face_detector(img, number_of_times_to_upsample)
RuntimeError: Error while calling cudnnFindConvolutionForwardAlgorithm( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), num_possible_algorithms, &num_algorithms, perf_results.data()) in file /home/marc/src/dlib_github/dlib/dlib/cuda/cudnn_dlibapi.cpp:827. code: 2, reason: CUDA Resources could not be allocated.
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated

The cudaFree() and cudaFreeHost() errors repeat many times. Syslog contains a lot of messages like:

Apr 17 20:41:54 elroy kernel: [ 737.438691] 504-gm20b, pid 7436, refs 4:
Apr 17 20:41:54 elroy kernel: [ 737.438693] channel status: not in use pending busy
Apr 17 20:41:54 elroy kernel: [ 737.438699] RAMFC : TOP: 8000001f0005f8c0 PUT: 00000001001dbb3c GET: 00000001001dbb28 FETCH: 00000201001dbb3c

Steps to Reproduce

On a Jetson Nano (4 GB), use git to clone the latest dlib repo, then build and install it with the following steps:

cd dlib/
sed -i 's,forward_algo = forward_best_algo;,//forward_algo = forward_best_algo;,g' dlib/cuda/cudnn_dlibapi.cpp
mkdir build
cd build
cmake .. -DDLIB_USE_CUDA=1
cmake --build .
cd ..
sudo python3 setup.py install
pip3 install face_recognition

Step 2 (the sed command) fixes a known issue on Jetson Nano devices.

Then run the python code above.

Anything else?

No response


davisking · 2024-04-27T12:47:58Z

Those cmake commands have no effect on the resulting Python install; cmake outputs are not used when running setup.py. I would not do that sed command either. It is not recommended.
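Since the cmake build directory and the setup.py install are separate, it can help to confirm which copy of dlib the interpreter actually picks up. A minimal sketch using the standard library's import machinery follows; `module_origin` is a hypothetical helper name, and for a package the reported path is its `__init__.py` rather than the compiled extension.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # A path under the cmake build/ directory here would be unexpected;
    # an installed copy normally lives under site-packages.
    print("dlib would be imported from:", module_origin("dlib"))
```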

Anyway, your CUDA install is probably not done correctly. The CUDA toolkit is very particular about being installed just right. If you don't follow exactly the instructions NVIDIA gives for your platform, it will often fail like this. That is a common problem for people and not something dlib has any control over.

marcjasner · 2024-04-28T00:45:19Z

Thanks for the reply. I'll rebuild it without the sed change.

CUDA was pre-installed by NVIDIA in the Jetson Nano OS image. I'll double-check the installation and see if anything sticks out, though. Thanks again.

alexandreofbh · 2024-05-14T23:49:03Z

In 2022 I used the Jetson Nano 4GB. I remember that cuDNN was not correctly installed in the image provided by NVIDIA, and there were problems when I compiled and ran programs written in C++. After reinstalling cuDNN, everything started working perfectly. I think you need to reinstall cuDNN.
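Before reinstalling, a quick way to see whether a cuDNN shared library can be found and loaded at all is to call its `cudnnGetVersion()` function through ctypes. This is only a sketch: `cudnn_version` is a hypothetical helper, it assumes the library is discoverable under the name "cudnn" by the system's library lookup, and a successful load does not prove that the installation matches the CUDA toolkit or the build dlib was compiled against.

```python
import ctypes
import ctypes.util


def cudnn_version(library_name="cudnn"):
    """Return the integer reported by cudnnGetVersion(), or None if cuDNN cannot be loaded."""
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        get_version = lib.cudnnGetVersion
    except (OSError, AttributeError):
        # Library missing, unloadable, or not exporting the expected symbol
        return None
    get_version.restype = ctypes.c_size_t
    get_version.argtypes = []
    return int(get_version())


if __name__ == "__main__":
    version = cudnn_version()
    if version is None:
        print("cuDNN shared library not found or not loadable")
    else:
        # The integer packs major/minor/patch; the exact encoding depends on the cuDNN release
        print("cudnnGetVersion() returned", version)
```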
