Getting a graph execution error / JIT compilation failed with latest versions #99

bluenote10 opened this issue Jan 8, 2024 · 0 comments

Using the latest versions of

  • crepe==0.0.14
  • tensorflow==2.15.0.post1

I'm getting a graph execution error ("JIT compilation failed") with the following reproduction code:

import crepe
import numpy as np

sr = 16000     # sample rate in Hz
step_size = 4  # hop size in milliseconds

# One second of Gaussian noise as a minimal test input.
signal = np.random.normal(0.0, 0.1, size=sr)

times, f0, f0_conf, _ = crepe.predict(
    signal,
    sr,
    step_size=step_size,
    verbose=1,
)

The full output is somewhat lengthy:

2024-01-08 22:51:49.982191: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-08 22:51:49.982227: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-08 22:51:49.983128: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-08 22:51:49.988289: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-08 22:51:50.702792: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-08 22:51:51.233335: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.279192: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.279367: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.279894: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.280020: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.280132: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.323194: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.323360: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.323500: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-08 22:51:51.323593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2881 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 980, pci bus id: 0000:01:00.0, compute capability: 5.2
2024-01-08 22:51:51.947998: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8902
2024-01-08 22:51:52.719045: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:504] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-12.2
  /usr/local/cuda
  /home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2024-01-08 22:51:53.513126: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.59GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-01-08 22:51:53.529700: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.59GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-01-08 22:51:53.607953: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:542] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
error: libdevice not found at ./libdevice.10.bc
2024-01-08 22:51:53.608118: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:207] INTERNAL: Generating device code failed.
2024-01-08 22:51:53.608752: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: JIT compilation failed.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/me/git/AudioML/audio_ml/experiments/debug/debug_crepe.py", line 13, in <module>
    times, f0, f0_conf, _ = crepe.predict(
  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/crepe/core.py", line 255, in predict
    activation = get_activation(audio, sr, model_capacity=model_capacity,
  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/crepe/core.py", line 212, in get_activation
    return model.predict(frames, verbose=verbose)
  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node model/classifier/Sigmoid defined at (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main

  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code

  File "/home/me/git/AudioML/audio_ml/experiments/debug/debug_crepe.py", line 13, in <module>

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/crepe/core.py", line 255, in predict

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/crepe/core.py", line 212, in get_activation

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/training.py", line 2655, in predict

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/training.py", line 2440, in predict_function

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/training.py", line 2425, in step_function

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/training.py", line 2413, in run_step

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/training.py", line 2381, in predict_step

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/training.py", line 590, in __call__

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/functional.py", line 515, in call

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/functional.py", line 672, in _run_internal_graph

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/layers/core/dense.py", line 255, in call

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/activations.py", line 400, in sigmoid

  File "/home/me/.virtualenvs/ddsp_pytorch/lib/python3.10/site-packages/keras/src/backend.py", line 5915, in sigmoid

JIT compilation failed.
	 [[{{node model/classifier/Sigmoid}}]] [Op:__inference_predict_function_759]

crepe used to work fine on my machine with older versions, so this may be a regression.
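
Based on the hint in the log ("setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work"), pointing XLA at a CUDA installation that actually contains nvvm/libdevice might work around this. A minimal, untested sketch — the CUDA path is machine-specific and just an assumption here:

import os

# Assumption: /usr/local/cuda contains nvvm/libdevice/libdevice.10.bc.
# The variable must be set before TensorFlow is imported/initialized.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/local/cuda"

# Alternative workaround: hide the GPU entirely so the failing GPU JIT
# path is never taken (also before importing TensorFlow).
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import crepe
import numpy as np

sr = 16000
signal = np.random.normal(0.0, 0.1, size=sr)
times, f0, f0_conf, _ = crepe.predict(signal, sr, step_size=4, verbose=1)

Either of these only sidesteps the failure, of course; judging from the search paths in the log, the underlying problem seems to be that the pip-installed tensorflow==2.15.0.post1 cannot locate libdevice on its own.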
