[E2E] Basic/event_profiling_info.cpp seems flaky #13591

Open
uditagarwal97 opened this issue Apr 30, 2024 · 2 comments
Labels
bug Something isn't working confirmed

Comments

@uditagarwal97
Contributor

Describe the bug

Failed run: https://github.com/intel/llvm/actions/runs/8886566095/job/24401423571?pr=13588
Successful run: https://github.com/intel/llvm/actions/runs/8886566095/job/24406513670

I observed this behavior on an L0 GPU on Windows, but I'm not sure whether the flakiness also reproduces on Linux or on other devices.

FAIL: SYCL :: Basic/event_profiling_info.cpp (220 of 2017)
******************** TEST 'SYCL :: Basic/event_profiling_info.cpp' FAILED ********************
Exit Code: 3221226505

Command Output (stdout):
--
# RUN: at line 2
D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe   -fsycl -fsycl-targets=spir64 D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Basic\event_profiling_info.cpp -o D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out
# executed command: D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 'D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Basic\event_profiling_info.cpp' -o 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out'
# RUN: at line 4
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out'
# .---command stderr------------
# | Assertion failed: Submit <= Start, file D:/github/actions-runner/_work/llvm/llvm/llvm/sycl/test-e2e/Basic/event_profiling_info.cpp, line 30
# `-----------------------------
# error: command failed with exit status: 0xc0000409

To reproduce

DPC++ commit: c2cc3a1

Environment

OS: Windows
Device: L0 Gen12

sycl-ls --verbose

Platform [#2]:
    Version  : 1.3
    Name     : Intel(R) Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 1.3
        Name       : Intel(R) Iris(R) Xe Graphics
        Vendor     : Intel(R) Corporation
        Driver     : 1.3.28044
        Aspects    : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca
        info::device::sub_group_sizes: 8 16 32

Additional context

No response

@steffenlarsen
Contributor

Tag @againull for awareness. Could this be due to the known timing approximation issues?

@aarongreig
Contributor

I'm observing a similar problem with Basic/submit_time.cpp on Linux/CL. I've found you need a bit of system load and a lot of runs to reproduce, but it's consistently doable within 20 or so iterations. An interesting data point would be whether this reproduces on cuda/hip.

On l0 and cl this could be explained by a discrepancy between two timing sources: the common DeviceAndHostTimer implementation they both share, which is used to cache the event's submit time, and the separate mechanism each adapter has for retrospectively querying an event's start time.
