Google Colab: GPUs: None detected #1644

talmo · 2023-12-19T18:56:37Z

TLDR: Google Colab no longer works with TensorFlow <2.15.

This is an issue since some of our dependencies break with TensorFlow >2.11ish.

This is likely because of the CUDA/cuDNN versions on the Colab runtime. As of Dec 19, 2023, nvidia-smi reports:

Driver Version: 535.104.05
CUDA Version: 12.2

Potential workarounds:

- Use Paperspace as an alternative to Google Colab.
- Run !apt update && apt install cuda-11-8 before installing SLEAP [source]. Note: tested to work with SLEAP v1.3.3, but takes ~5-10 minutes to install.
- Tools -> Command palette -> type in and select 'use fallback runtime'. Unfortunately, this will only work until early Jan 2024 [source].
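
For the cuda-11-8 workaround, one way to confirm that the install took effect is to ask the dynamic loader for the same libraries that TensorFlow reports as missing in the log further down. This is only a sketch: `check_libs` is a hypothetical helper, the library names are copied from that log, and it assumes the CUDA 11.8 libraries end up on the loader's search path, which may not hold on every runtime.

```python
import ctypes

# Library names taken from the "Could not load dynamic library" warnings in this issue.
CUDA11_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcusparse.so.11",
]


def check_libs(names):
    """Map each shared library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in check_libs(CUDA11_LIBS).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is still reported as missing after the install, TensorFlow 2.8 will most likely keep skipping GPU registration.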

Proper fix: Update usage of dependencies to work with Python 3.10 + TensorFlow 2.15 while maintaining backwards compatibility with at least TF 2.10 for Windows support.
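
As a rough illustration of what supporting a range of TensorFlow versions could involve, the sketch below gates on the installed version string. It is not taken from the SLEAP codebase: `parse_major_minor`, `is_supported`, and the `MIN_SUPPORTED` constant are hypothetical names, and the lower bound of 2.10 simply mirrors the target stated above.

```python
import re

# Assumed lower bound, mirroring the "at least TF 2.10" goal stated in this issue.
MIN_SUPPORTED = (2, 10)


def parse_major_minor(version):
    """Extract (major, minor) as integers from a version string such as '2.15.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version, minimum=MIN_SUPPORTED):
    """Return True if the version is at or above the minimum (major, minor) pair."""
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    for candidate in ("2.8.4", "2.10.1", "2.15.0"):
        print(candidate, is_supported(candidate))
```

In practice the string would come from `tf.__version__`, and version-specific code paths could branch on the parsed tuple.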

Discussed in #1642

Originally posted by delaroob December 17, 2023
Hi everyone,

I'm trying to continue training a SLEAP network in Colab. I've done the process (importing the same stuff, running the same code blocks, etc.) several times in the past few days without any problems. However, it now seems like I can't connect to any GPUs.

As a matter of fact, I can't run anything in Colab right now except for saving variables, importing packages, and other stuff that doesn't really require much compute power. DeepLabCut doesn't work either: the runtime collapses and restarts without further information.

In the runtime settings, Python 3 with a V100 GPU is selected and I still have 122 compute units available.

Thanks in advance for any help and let me know if additional information is required to solve the issue!

Here is the stuff I run (it's basically the demo notebook):

!pip uninstall -qqq -y opencv-python opencv-contrib-python
!pip install -qqq "sleap[pypi]>=1.3.3"

from google.colab import drive
drive.mount('/content/drive/')

(I've already done the next "iteration" of training yesterday, so I skipped the unzip and training part, since I just wanted to run inference and predict instances.)

!sleap-track "/content/drive/MyDrive/sleap/colab2/male.mp4" -m "/content/drive/MyDrive/sleap/colab2/models/231213_081111.single_instance"

output:

INFO:numexpr.utils:NumExpr defaulting to 8 threads.
2023-12-17 16:30:34.863435: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:34.863471: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Started inference at: 2023-12-17 16:30:37.969681
Args:
{
│   'data_path': '/content/drive/MyDrive/sleap/colab2/male.mp4',
│   'models': ['/content/drive/MyDrive/sleap/colab2/models/231213_081111.single_instance'],
│   'frames': '',
│   'only_labeled_frames': False,
│   'only_suggested_frames': False,
│   'output': None,
│   'no_empty_frames': False,
│   'verbosity': 'rich',
│   'video.dataset': None,
│   'video.input_format': 'channels_last',
│   'video.index': '',
│   'cpu': False,
│   'first_gpu': False,
│   'last_gpu': False,
│   'gpu': 'auto',
│   'max_edge_length_ratio': 0.25,
│   'dist_penalty_weight': 1.0,
│   'batch_size': 4,
│   'open_in_gui': False,
│   'peak_threshold': 0.2,
│   'max_instances': None,
│   'tracking.tracker': None,
│   'tracking.max_tracking': None,
│   'tracking.max_tracks': None,
│   'tracking.target_instance_count': None,
│   'tracking.pre_cull_to_target': None,
│   'tracking.pre_cull_iou_threshold': None,
│   'tracking.post_connect_single_breaks': None,
│   'tracking.clean_instance_count': None,
│   'tracking.clean_iou_threshold': None,
│   'tracking.similarity': None,
│   'tracking.match': None,
│   'tracking.robust': None,
│   'tracking.track_window': None,
│   'tracking.min_new_track_points': None,
│   'tracking.min_match_points': None,
│   'tracking.img_scale': None,
│   'tracking.of_window_size': None,
│   'tracking.of_max_levels': None,
│   'tracking.save_shifted_instances': None,
│   'tracking.kf_node_indices': None,
│   'tracking.kf_init_frame_count': None
}

2023-12-17 16:30:37.999611: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-17 16:30:37.999983: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.000129: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.000255: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.000375: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.045719: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.046198: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Versions:
SLEAP: 1.3.3
TensorFlow: 2.8.4
Numpy: 1.22.4
Python: 3.10.12
OS: Linux-6.1.58+-x86_64-with-glibc2.35

System:
GPUs: None detected.

Video: /content/drive/MyDrive/sleap/colab2/male.mp4
2023-12-17 16:30:38.122476: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?2023-12-17 16:30:41.717931: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -36 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -18 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 8 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -40 } dim { size: -41 } dim { size: 1 } } }
Predicting... ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   5% ETA: 0:49:26 4.0 FPS
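
The warnings in the output above can be summarized by pulling out the library names TensorFlow failed to load. The following is a minimal sketch, with `missing_libraries` being a hypothetical helper and the sample text abridged from the log above (timestamps and LD_LIBRARY_PATH omitted).

```python
import re

# Matches the dso_loader warning format seen in the log in this issue.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names TensorFlow could not load, unique, in first-seen order."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Abridged from the output above; the first warning appears twice in the real log.
sample = "\n".join([
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory",
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory",
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory",
])

if __name__ == "__main__":
    print(missing_libraries(sample))
```

Every missing name in the full log is a CUDA 11 era library, which fits the diagnosis that TensorFlow 2.8.4 expects CUDA 11 while the runtime now ships CUDA 12.2.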


NeuTTH · 2024-03-26T17:04:00Z

Following up on this. I'm facing the same issue when using SLEAP on Google Colab.

talmo · 2024-04-10T18:08:00Z

Hi @amblypatty,

Did you try installing the older version of cuda first with !apt update && apt install cuda-11-8?

Thanks!

Talmo
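
After trying the cuda-11-8 install, a quick way to see whether TensorFlow now detects the GPU is to list its physical devices from a notebook cell. This is a sketch rather than an official SLEAP check: `visible_gpus` is a hypothetical helper, and it relies only on `tf.config.list_physical_devices`, which is part of the public TensorFlow 2 API.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow imported fine but registered no GPU devices.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif not gpus:
        print("GPUs: None detected.")
    else:
        print("GPUs:", ", ".join(gpus))
```

An empty list here corresponds to the "GPUs: None detected." line that SLEAP prints in the report above.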

talmo pinned this issue Dec 19, 2023

roomrys assigned roomrys and unassigned roomrys Jan 5, 2024

coderabbitai bot mentioned this issue Mar 29, 2024: Update to new TensorFlow conda package #1726 (Merged, 11 tasks)

talmo mentioned this issue May 21, 2024: Update documentation / guides #1778 (Open, 17 tasks)
