cuDNN fMHA failures return unhelpful error messages #11129

thomasjoerg · 2024-04-02T08:52:44Z

XLA GPU calls into cuDNN for fused multi-headed attention (fMHA) when it matches specific HLO patterns. When an error occurs while the cuDNN graph is being constructed, the XLA GPU user sees only a RuntimeError with minimal information. For example, using bfloat16 types can produce the following error message:

RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM

To be more helpful, the error message should state that the error happened during cuDNN graph construction / finalization and that it can be worked around with --xla_gpu_enable_cudnn_fmha=false. Ideally, the error message would also include a serialized cuDNN graph to help debug the issue.
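
Until the message is improved, here is a minimal sketch of how a user might apply the workaround from Python. It assumes the flag is passed through the XLA_FLAGS environment variable and that the variable is set before the XLA backend is initialized (for example, before importing jax). The helper names disable_cudnn_fmha and explain_cudnn_error are hypothetical and not part of XLA; the second one only illustrates the kind of message proposed above.

```python
import os

# Workaround flag named in this issue.
FMHA_FLAG = "--xla_gpu_enable_cudnn_fmha=false"
FMHA_PREFIX = "--xla_gpu_enable_cudnn_fmha="


def disable_cudnn_fmha(xla_flags: str = "") -> str:
    """Return xla_flags with cuDNN fMHA disabled (hypothetical helper).

    Any earlier setting of the same flag is dropped so that only one
    value for it remains in the result.
    """
    kept = [f for f in xla_flags.split() if not f.startswith(FMHA_PREFIX)]
    return " ".join(kept + [FMHA_FLAG])


def explain_cudnn_error(message: str) -> str:
    """Append context to a raw cudnnFinalize error (hypothetical helper).

    Messages that do not mention cudnnFinalize are returned unchanged.
    """
    if "cudnnFinalize" not in message:
        return message
    return (
        f"{message}\n"
        "This error occurred during cuDNN graph construction / finalization. "
        f"As a workaround, try {FMHA_FLAG}."
    )


# Assumption: XLA_FLAGS must be set before the XLA backend starts up.
os.environ["XLA_FLAGS"] = disable_cudnn_fmha(os.environ.get("XLA_FLAGS", ""))
```

Dropping any earlier value of the flag avoids relying on how duplicate flags are resolved.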

jprabhas added the NVIDIA-GPU XLA on Nvidia GPU label Apr 15, 2024
