xpu: provide a way to debug explicit CPU fallback #126488

dvrogozh · 2024-05-17T00:54:09Z

@fengyuan14 - The commit intel/torch-xpu-ops@5bf9e0c muted debug logs of "explicit" CPU fallbacks. This complicates debugging for third-party contributors trying to evaluate XPU backend capabilities: I am now forced to revert that commit to understand which operations are not currently implemented by XPU. Please:

1. Explain what "explicit CPU fallback" means. This seems to be a classification internal to the XPU team, and it is unclear and confusing.
2. Extend PYTORCH_DEBUG_XPU_FALLBACK=1 to track any CPU fallback happening in the XPU backend. Note: I am fine if "explicit" fallback will be muted by default, but I really need a way to be able to track it.

commit 5bf9e0cc768f7a3b13d829118683275f324399f1 (origin/meng_max_2d)
Author: Feng Yuan <feng1.yuan@intel.com>
Date:   Mon Apr 29 13:05:51 2024 +0800

    Register operator's implementation lazily. (#177)

    1. Avoid dangling operator's implementation (m.impl(torchvision::nms) is
    ahead of `import torchvision` sometime)
    2. Mute debug log of explicit CPU fallback.
    3. Add torchvision.roi_align/_roi_align_backward example case

CC: @jgong5 @mingfeima @XiaobingSuper @ashokei @jingxu10 @gujinghui @EikanWang @fengyuan14 @guangyey

cc @gujinghui @EikanWang @fengyuan14 @guangyey

dvrogozh · 2024-05-17T00:55:01Z

Also filed intel/torch-xpu-ops#262

dvrogozh · 2024-05-17T01:09:39Z

> Note: I am fine if "explicit" fallback will be muted by default, but I really need a way to be able to track it.

I still want to comment on that. Personally, I am fine with fallback logs being muted by default, because I know that a number of operations are not yet implemented in XPU. However, I argue that for people who have just discovered the XPU backend, want to try it, and have limited knowledge of it, such muted behavior might be a problem. They will notice immediately that the XPU backend significantly underperforms, sometimes even compared to CPU, and they will have no obvious explanation at hand. Log messages warning that a CPU fallback is happening were quite handy here: they set the correct expectation that the XPU backend might currently underperform.

My recommendation is to always print a debug message that CPU fallback is happening, regardless of whether it's explicit (whatever this means) or implicit.
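
To illustrate the recommendation, here is a minimal sketch in Python. It is not the XPU backend's code: the `NATIVE_OPS` and `CPU_OPS` registries and the `dispatch` function are hypothetical stand-ins, used only to show a fallback path that warns every time, with no explicit/implicit distinction.

```python
import warnings

# Hypothetical registry of operators that have a native device implementation.
NATIVE_OPS = {"add": lambda a, b: a + b}

# Hypothetical CPU reference implementations used as the fallback path.
CPU_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def dispatch(op_name, *args):
    """Run op_name natively if possible; otherwise warn and fall back to CPU."""
    native = NATIVE_OPS.get(op_name)
    if native is not None:
        return native(*args)
    # Warn on every fallback so the user can see why performance may suffer.
    warnings.warn(
        f"{op_name}: no native implementation, falling back to CPU",
        RuntimeWarning,
    )
    return CPU_OPS[op_name](*args)
```

With this shape, a user who sees poor performance gets at least one message naming the operator that ran on CPU.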

fengyuan14 · 2024-05-17T02:43:24Z

Got your requirement. In my understanding, the log is not informative for DL workload customers. It should be a debugging requirement.

As to release build, we would keep existing implementation. I think, we could add the feature in debug build.

fengyuan14 · 2024-05-17T02:47:07Z

@EikanWang Please comment.

dvrogozh · 2024-05-17T13:26:48Z

> As to release build, we would keep existing implementation. I think, we could add the feature in debug build.

Can you please have this feature controlled by an environment variable, say the same one as before, PYTORCH_DEBUG_XPU_FALLBACK=1? That way it can be disabled by default for Release builds and enabled by default for Debug builds, and the end user can decide via the environment variable whether to enable it for a Release build or disable it for a Debug build.
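
The requested behavior can be sketched as a small decision function. This is an illustration in Python, not the backend's implementation: the function name and the `debug_build` flag are hypothetical, and treating any value other than "1" as "disabled" is an assumption, since the thread only mentions PYTORCH_DEBUG_XPU_FALLBACK=1.

```python
import os

ENV_VAR = "PYTORCH_DEBUG_XPU_FALLBACK"  # variable name taken from the thread


def fallback_logging_enabled(debug_build, environ=None):
    """Decide whether CPU-fallback messages should be printed.

    debug_build: hypothetical flag standing in for the build type.
    environ: mapping to read from; defaults to the process environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(ENV_VAR)
    if value is None:
        # Variable not set: use the build-type default
        # (on for Debug builds, off for Release builds).
        return debug_build
    # Variable set: the user's choice overrides the build default.
    # Assumption: only "1" enables logging; any other value disables it.
    return value.strip() == "1"
```

For example, `fallback_logging_enabled(False, {"PYTORCH_DEBUG_XPU_FALLBACK": "1"})` models a Release build where the user opted in to the messages.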

dvrogozh · 2024-05-24T18:37:43Z

I opened intel/torch-xpu-ops#318 with the implementation I propose (which is: always warn on CPU fallback :) ). Let's continue the discussion in the PR.

EikanWang · 2024-05-30T02:11:58Z

We will close the issue once the PR has landed.

dvrogozh mentioned this issue May 17, 2024

Provide a way to debug explicit CPU fallback intel/torch-xpu-ops#262

Open

guangyey added the "module: xpu" (Intel XPU related issues) and "triaged" (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels May 17, 2024

dvrogozh mentioned this issue May 24, 2024

Always warn on cpu fallback intel/torch-xpu-ops#318

Open

EikanWang assigned dvrogozh May 30, 2024
