Potential workarounds:
- Run !apt update && apt install cuda-11-8 before installing SLEAP [source]. Note: tested to work with SLEAP v1.3.3, but it takes ~5-10 minutes to install.
- Tools -> Command palette -> type in and select 'use fallback runtime'. Unfortunately, this will only work until early January 2024 [source].
Proper fix: Update usage of dependencies to work with Python 3.10 + TensorFlow 2.15 while maintaining backwards compatibility with at least TF 2.10 for Windows support.
Originally posted by delaroob December 17, 2023
Hi everyone,
I'm trying to continue training a SLEAP network in Colab. I've run the same process (same imports, same code blocks, etc.) several times in the past few days without any problems, but now it seems I can't connect to any GPUs.
As a matter of fact, I can't run anything in Colab right now except things like saving variables and importing packages, which don't require much compute. DeepLabCut doesn't work either; the runtime collapses and restarts without further information.
In the runtime settings, Python 3 with a V100 GPU is selected, and I still have 122 compute units available.
Thanks in advance for any help and let me know if additional information is required to solve the issue!
Here is the stuff I run (it's basically the demo notebook):
from google.colab import drive
drive.mount('/content/drive/')
(I've already done the next "iteration" of training yesterday, so I skipped the unzip and training part, since I just wanted to run inference and predict instances.)
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
2023-12-17 16:30:34.863435: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:34.863471: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Started inference at: 2023-12-17 16:30:37.969681
Args:
{
│ 'data_path': '/content/drive/MyDrive/sleap/colab2/male.mp4',
│ 'models': ['/content/drive/MyDrive/sleap/colab2/models/231213_081111.single_instance'],
│ 'frames': '',
│ 'only_labeled_frames': False,
│ 'only_suggested_frames': False,
│ 'output': None,
│ 'no_empty_frames': False,
│ 'verbosity': 'rich',
│ 'video.dataset': None,
│ 'video.input_format': 'channels_last',
│ 'video.index': '',
│ 'cpu': False,
│ 'first_gpu': False,
│ 'last_gpu': False,
│ 'gpu': 'auto',
│ 'max_edge_length_ratio': 0.25,
│ 'dist_penalty_weight': 1.0,
│ 'batch_size': 4,
│ 'open_in_gui': False,
│ 'peak_threshold': 0.2,
│ 'max_instances': None,
│ 'tracking.tracker': None,
│ 'tracking.max_tracking': None,
│ 'tracking.max_tracks': None,
│ 'tracking.target_instance_count': None,
│ 'tracking.pre_cull_to_target': None,
│ 'tracking.pre_cull_iou_threshold': None,
│ 'tracking.post_connect_single_breaks': None,
│ 'tracking.clean_instance_count': None,
│ 'tracking.clean_iou_threshold': None,
│ 'tracking.similarity': None,
│ 'tracking.match': None,
│ 'tracking.robust': None,
│ 'tracking.track_window': None,
│ 'tracking.min_new_track_points': None,
│ 'tracking.min_match_points': None,
│ 'tracking.img_scale': None,
│ 'tracking.of_window_size': None,
│ 'tracking.of_max_levels': None,
│ 'tracking.save_shifted_instances': None,
│ 'tracking.kf_node_indices': None,
│ 'tracking.kf_init_frame_count': None
}
2023-12-17 16:30:37.999611: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-17 16:30:37.999983: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.000129: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.000255: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.000375: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.045719: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-12-17 16:30:38.046198: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Versions:
SLEAP: 1.3.3
TensorFlow: 2.8.4
Numpy: 1.22.4
Python: 3.10.12
OS: Linux-6.1.58+-x86_64-with-glibc2.35
System:
GPUs: None detected.
Video: /content/drive/MyDrive/sleap/colab2/male.mp4
2023-12-17 16:30:38.122476: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-12-17 16:30:41.717931: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -36 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -18 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 8 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -18 } dim { size: -40 } dim { size: -41 } dim { size: 1 } } }
Predicting... ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5% ETA: 0:49:26 4.0 FPS
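For anyone triaging a similar log, the missing CUDA libraries can be pulled out of the dso_loader warnings with a short script. This is only a sketch: the function name missing_cuda_libs is made up for illustration, and the pattern assumes the warning wording shown in the log above (TensorFlow 2.8.4), which may differ in other versions.

```python
import re

# Assumes the warning text seen in the TensorFlow 2.8.4 log above.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_cuda_libs(log_text: str) -> list[str]:
    """Return library names TensorFlow failed to load, in first-seen order, without duplicates."""
    seen: list[str] = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Two shortened lines taken from the log above, with the first one repeated.
sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcudart.so.11.0'; dlerror: "
    "libcudart.so.11.0: cannot open shared object file\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcudart.so.11.0'; dlerror: "
    "libcudart.so.11.0: cannot open shared object file\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcublas.so.11'; dlerror: "
    "libcublas.so.11: cannot open shared object file\n"
)

print(missing_cuda_libs(sample))  # ['libcudart.so.11.0', 'libcublas.so.11']
```

In the full log above, every missing name is a CUDA 11-era library (libcudart.so.11.0, libcublas.so.11, libcublasLt.so.11, libcufft.so.10, libcusparse.so.11), which is why TensorFlow skips registering GPU devices and inference falls back to the CPU at about 4 FPS.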
TLDR: Google Colab no longer works with TensorFlow <2.15.
This is an issue since some of our dependencies break with TensorFlow >2.11ish.
This is likely because of the CUDA/cuDNN versions. As of Dec 19, 2023, nvidia-smi reports:
Here's a notebook for testing.
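One way to check the CUDA-version hypothesis from inside a runtime, without importing TensorFlow, is to try loading the same shared libraries directly. The sketch below makes a few assumptions: the library list is copied from the dso_loader warnings in the original post (TensorFlow 2.8.4), the names unloadable and TF_2_8_CUDA_LIBS are illustrative, and ctypes.CDLL only approximates how TensorFlow searches for libraries.

```python
import ctypes

# Library names copied from the dso_loader warnings logged by TensorFlow 2.8.4.
TF_2_8_CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcusparse.so.11",
]


def unloadable(names, loader=ctypes.CDLL):
    """Return the subset of names that the loader cannot open.

    The loader is injectable so the function can be exercised without CUDA installed.
    """
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:  # raised by ctypes.CDLL when the library cannot be opened
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list would suggest the CUDA 11 libraries are visible to the process.
    print(unloadable(TF_2_8_CUDA_LIBS))
```

If the list is non-empty on a fresh Colab runtime and empty after the CUDA 11.8 install, that would be consistent with the diagnosis above, though it does not by itself prove that TensorFlow will register the GPU.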
Discussed in #1642