[Bug]: Failure to allocate cuda resources when using face_recognition library with "cnn" on Jetson Nano #2948

Open
marcjasner opened this issue Apr 18, 2024 · 3 comments

Comments

@marcjasner

What Operating System(s) are you seeing this problem on?

Linux (aarch64)

dlib version

19.24

Python version

3.6.9

Compiler

gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)

Expected Behavior

I'm attempting to test face detection and landmark generation on a Jetson Nano (running Ubuntu 18.04 and JetPack 4.6) using dlib (via the face_recognition library), with this simple Python script to measure timings:

#!/usr/bin/python3
import cv2
import numpy as np
import face_recognition as faceRegLib
import time

def current_milli_time():
    return round(time.time() * 1000)

# load_image_file() already returns an RGB array, so no cv2 color conversion is needed.
img_rgb = faceRegLib.load_image_file('willsmith.jpg')

for i in range(10):
    startTime = current_milli_time()
    face = faceRegLib.face_locations(img_rgb, model="cnn")[0]
    locTime = current_milli_time() - startTime
    demo_encode = faceRegLib.face_encodings(img_rgb)[0]
    elapsedTime = current_milli_time() - startTime
    print("Elapsed Time = {}ms    detection Time = {}ms".format(elapsedTime, locTime))

When face_locations() is called with model="hog", things work fine, averaging about 2.5 seconds for face detection and landmark generation.
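A minimal sketch of how the timing loop could be factored into a reusable helper, so the first call (which often includes one-time setup) is reported separately from later calls. The names time_calls_ms and summarize are hypothetical and not part of face_recognition or dlib.

```python
import time

def time_calls_ms(fn, repeats=10):
    """Call fn() `repeats` times and return each call's duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations

def summarize(durations):
    """Report the first call separately from the mean of the remaining calls."""
    first, rest = durations[0], durations[1:]
    steady = sum(rest) / len(rest) if rest else first
    return {"first_ms": first, "steady_mean_ms": steady}
```

With the script above, this could be used as summarize(time_calls_ms(lambda: faceRegLib.face_locations(img_rgb, model="hog"))).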

When I change the argument to model="cnn", which calls

cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)

I expect it to work just as well, but much faster, since dlib (the latest GitHub source, 19.24) is compiled with CUDA support enabled.

Current Behavior

Running the script with model="cnn" results in the following errors:

Traceback (most recent call last):
  File "./faceboxes.py", line 16, in <module>
    face = faceRegLib.face_locations(img_rgb, model="cnn")[0]
  File "/home/marc/.local/lib/python3.6/site-packages/face_recognition/api.py", line 119, in face_locations
    return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
  File "/home/marc/.local/lib/python3.6/site-packages/face_recognition/api.py", line 103, in _raw_face_locations
    return cnn_face_detector(img, number_of_times_to_upsample)
RuntimeError: Error while calling cudnnFindConvolutionForwardAlgorithm( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), num_possible_algorithms, &num_algorithms, perf_results.data()) in file /home/marc/src/dlib_github/dlib/dlib/cuda/cudnn_dlibapi.cpp:827. code: 2, reason: CUDA Resources could not be allocated.
cudaStreamDestroy() failed. Reason: the launch timed out and was terminated
cudaFree() failed. Reason: the launch timed out and was terminated
cudaFreeHost() failed. Reason: the launch timed out and was terminated

The cudaStreamDestroy(), cudaFree(), and cudaFreeHost() errors repeat many times. Syslog contains a lot of messages like:

Apr 17 20:41:54 elroy kernel: [ 737.438691] 504-gm20b, pid 7436, refs 4:
Apr 17 20:41:54 elroy kernel: [ 737.438693] channel status: not in use pending busy
Apr 17 20:41:54 elroy kernel: [ 737.438699] RAMFC : TOP: 8000001f0005f8c0 PUT: 00000001001dbb3c GET: 00000001001dbb28 FETCH: 00000201001dbb3c

Steps to Reproduce

On a Jetson Nano (4 GB), use git to clone the latest dlib repo, then compile it using the following steps:

  1. cd dlib/
  2. sed -i 's,forward_algo = forward_best_algo;,//forward_algo = forward_best_algo;,g' dlib/cuda/cudnn_dlibapi.cpp
  3. mkdir build
  4. cd build
  5. cmake .. -DDLIB_USE_CUDA=1
  6. cmake --build .
  7. cd ..
  8. sudo python3 setup.py install
  9. pip3 install face_recognition

Step 2 fixes a known issue on Jetson Nano devices.

Then run the Python code above.
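For reference, the sed expression in step 2 is a plain global text substitution that comments out one assignment. A minimal Python sketch of the same substitution follows. The sample line is a hypothetical stand-in for illustration, not a quote from cudnn_dlibapi.cpp, and the helper name is invented here.

```python
OLD = "forward_algo = forward_best_algo;"
NEW = "//forward_algo = forward_best_algo;"

def comment_out_forward_algo(source_text):
    """Apply the same replacement as the sed expression to a string of C++ source."""
    return source_text.replace(OLD, NEW)

# Hypothetical input line, used only to show the effect of the substitution.
sample = "    forward_algo = forward_best_algo;\n"
print(comment_out_forward_algo(sample), end="")
```

Because the replacement text still contains the search text, applying it a second time would add another pair of slashes, just as rerunning the sed command would.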

Anything else?

No response

@davisking
Owner

Those cmake commands have no effect on the resulting Python install. cmake outputs are not used when running setup.py. I would not run that sed command either; it is not recommended.
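One way to see what the installed Python module was actually built with, independent of any separate cmake build directory, is to query it at runtime. This is a rough sketch: it assumes the installed build exposes dlib.DLIB_USE_CUDA and dlib.cuda.get_num_devices(), and falls back gracefully if it does not. The function name dlib_cuda_status is hypothetical.

```python
def dlib_cuda_status():
    """Report whether the importable dlib module was built with CUDA support."""
    try:
        import dlib
    except ImportError:
        return {"installed": False}
    status = {
        "installed": True,
        "version": getattr(dlib, "__version__", "unknown"),
        "path": getattr(dlib, "__file__", "unknown"),
        # Assumed attribute; treated as False if this build does not expose it.
        "use_cuda": bool(getattr(dlib, "DLIB_USE_CUDA", False)),
    }
    if status["use_cuda"]:
        try:
            status["num_devices"] = dlib.cuda.get_num_devices()
        except Exception as exc:  # a broken CUDA setup may fail here
            status["device_error"] = str(exc)
    return status

print(dlib_cuda_status())
```

The reported path shows which installed copy of dlib Python is importing, which helps when more than one build is present on the system.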

Anyway, your CUDA install is probably not done correctly. The CUDA toolkit is very particular about being installed just right. If you don't follow exactly the instructions NVIDIA gives for your platform, it will often fail like this. This is a common problem for people and not something dlib has any control over.

@marcjasner
Author

Thanks for the reply. I'll rebuild it without the sed change.

CUDA was pre-installed by NVIDIA in the Jetson Nano OS image. I'll double-check the installation and see if anything stands out, though. Thanks again.

@alexandreofbh

I used the Jetson Nano 4GB in 2022, and I remember that cuDNN was not correctly installed in the image provided by NVIDIA; there were problems when I compiled and ran programs written in C++. After reinstalling cuDNN, everything started working perfectly. I think you need to reinstall cuDNN.
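A quick smoke test for whether a cuDNN shared library can be loaded at all is to call cudnnGetVersion() through ctypes. This is only a sketch: it assumes the library can be found under the name "cudnn" (or libcudnn.so) on the loader path, and a successful load does not prove the installation is otherwise correct. The function name cudnn_version is hypothetical.

```python
import ctypes
import ctypes.util

def cudnn_version():
    """Return cuDNN's version as an integer, or None if the library cannot be loaded."""
    # Assumed library name; adjust if cuDNN is installed under a different name.
    name = ctypes.util.find_library("cudnn") or "libcudnn.so"
    try:
        lib = ctypes.CDLL(name)
        get_version = lib.cudnnGetVersion
    except (OSError, AttributeError):
        return None
    get_version.restype = ctypes.c_size_t  # cudnnGetVersion() returns size_t
    return int(get_version())

print(cudnn_version())
```

A result of None suggests the dynamic loader cannot find cuDNN, which would be consistent with the reinstall suggested above.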
