[E2E] Basic/event_profiling_info.cpp seems flaky #13591

uditagarwal97 · 2024-04-30T05:01:07Z

Describe the bug

Failed run: https://github.com/intel/llvm/actions/runs/8886566095/job/24401423571?pr=13588
Successful run: https://github.com/intel/llvm/actions/runs/8886566095/job/24406513670

I observed this behavior on an L0 GPU on Windows, but I'm not sure whether it also reproduces on Linux or on other devices.

FAIL: SYCL :: Basic/event_profiling_info.cpp (220 of 2017)
******************** TEST 'SYCL :: Basic/event_profiling_info.cpp' FAILED ********************
Exit Code: 3221226505

Command Output (stdout):
--
# RUN: at line 2
D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe   -fsycl -fsycl-targets=spir64 D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Basic\event_profiling_info.cpp -o D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out
# executed command: D:/github/actions-runner/_work/llvm/llvm/install/bin/clang++.exe -fsycl -fsycl-targets=spir64 'D:\github\actions-runner\_work\llvm\llvm\llvm\sycl\test-e2e\Basic\event_profiling_info.cpp' -o 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out'
# RUN: at line 4
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu 'D:\github\actions-runner\_work\llvm\llvm\build-e2e\Basic\Output\event_profiling_info.cpp.tmp.out'
# .---command stderr------------
# | Assertion failed: Submit <= Start, file D:/github/actions-runner/_work/llvm/llvm/llvm/sycl/test-e2e/Basic/event_profiling_info.cpp, line 30
# `-----------------------------
# error: command failed with exit status: 0xc0000409

To reproduce

DPC++ commit: c2cc3a1

Environment

OS: Windows
Device: L0 Gen12

sycl-ls --verbose

Platform [#2]:
    Version  : 1.3
    Name     : Intel(R) Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 1.3
        Name       : Intel(R) Iris(R) Xe Graphics
        Vendor     : Intel(R) Corporation
        Driver     : 1.3.28044
        Aspects    : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca
        info::device::sub_group_sizes: 8 16 32

Additional context

No response

steffenlarsen · 2024-05-08T06:38:13Z

Tag @againull for awareness. Could this be due to the known timing approximation issues?

aarongreig · 2024-05-24T11:24:54Z

I'm observing a similar problem with Basic/submit_time.cpp on Linux/OpenCL. Reproducing it takes a bit of system load and a lot of runs, but it consistently shows up within 20 or so iterations. An interesting data point would be whether this reproduces on CUDA/HIP.
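For anyone who wants to try the repeated-run approach, here is a minimal sketch of a loop that stops at the first failing iteration. The helper name `first_failing_iteration` is hypothetical and not part of llvm-lit or the SYCL tooling, and the system load has to be generated separately.

```cpp
#include <functional>

// Hypothetical helper, not part of llvm-lit or the SYCL tooling.
// Calls run_once up to max_iterations times and returns the 1-based
// iteration at which it first returned a non-zero exit code, or 0 if
// every iteration succeeded.
int first_failing_iteration(const std::function<int()> &run_once,
                            int max_iterations) {
  for (int i = 1; i <= max_iterations; ++i) {
    if (run_once() != 0)
      return i;
  }
  return 0;
}
```

A caller could pass a lambda that launches the compiled test, for example one that returns `std::system(...)` (from `<cstdlib>`) with the path to the test binary, and use 20 as the iteration limit.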

On L0 and OpenCL this could be explained by a discrepancy between two time sources: the common DeviceAndHostTimer implementation that both adapters share, which is used to cache the event's submit time, and the separate mechanism each adapter has for retrospectively querying an event's start time.
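To make the suspected mechanism concrete, here is a simplified model of it. This is not the DPC++ runtime code: the struct, the function names and the drift parameter are all hypothetical, and the model only assumes that the submit time is extrapolated from a cached (device, host) timestamp pair while the start time comes from the device clock directly.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the suspected mechanism; this is not the DPC++
// runtime code. All names below are hypothetical.

// A (device, host) timestamp pair captured at the same moment and cached.
struct CachedSync {
  std::int64_t device_ns;
  std::int64_t host_ns;
};

// Submit time as estimated from the cached pair plus elapsed host time.
std::int64_t estimated_submit_ns(const CachedSync &sync,
                                 std::int64_t host_now_ns) {
  assert(host_now_ns >= sync.host_ns);
  return sync.device_ns + (host_now_ns - sync.host_ns);
}

// Start time as the device clock itself would report it, if the device
// clock runs drift_ppm parts per million faster (positive) or slower
// (negative) than the host clock.
std::int64_t device_start_ns(const CachedSync &sync, std::int64_t host_now_ns,
                             std::int64_t drift_ppm) {
  assert(host_now_ns >= sync.host_ns);
  const std::int64_t elapsed = host_now_ns - sync.host_ns;
  return sync.device_ns + elapsed + elapsed * drift_ppm / 1000000;
}
```

In this model, take a pair cached as `{1000000, 0}`, a submission one second later (host time 1000000000 ns) and a kernel that starts 5 µs after that. The estimated submit time is 1001000000 ns. With no drift the device start time is 1001005000 ns and Submit <= Start holds. With the device clock 10 ppm slow it is 1000995000 ns, so the same assertion fails. Whether the real clocks diverge by this much is not established in this thread.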

uditagarwal97 added the bug label Apr 30, 2024

uditagarwal97 mentioned this issue Apr 30, 2024

[NFC] Copy sycl::vec implementation into a dedicated preview header #13588

Merged

KornevNikita added the confirmed label May 17, 2024
