Add CI jobs with alpaka_DEBUG=2 #2130

Open
bernhardmgruber opened this issue Sep 5, 2023 · 10 comments · May be fixed by #2133

Comments

@bernhardmgruber
Member

Currently, all Debug CI jobs (except one analysis job) run with alpaka_DEBUG=0. This means that the extra debugging code is never tested by the CI. We should add at least a few CI runs that test different debug levels.

@SimeonEhrig
Member

I will fix it. When I created the job generator, I was not aware of the CMake flag.

SimeonEhrig linked a pull request on Sep 5, 2023 that will close this issue
@bernhardmgruber
Member Author

Thx a lot! It could be that this causes some tests to become very noisy, so we may need to look at the logs to see whether this is still manageable.

@SimeonEhrig
Member

You are right: https://gitlab.com/hzdr/crp/alpaka/-/jobs/5021535754

Do you have an idea how to handle it?

@bernhardmgruber
Member Author

Well, we could have a discussion whether anything but the 4th line here is meaningful to anyone:

29: [-] BufCpuImpl
29: [-] allocBuf
29: [+] operator()
29: printDebug e: (1) ewb: 1 de: (1) dptr: 0x55ba59952570 dpitchb: (1) se: (1) sptr: 0x55ba59962ba0 spitchb: (1)
29: [-] operator()
29: [+] ~BufCpuImpl
29: [-] ~BufCpuImpl
29: [+] ~BufCpuImpl
29: [-] ~BufCpuImpl
29: [+] getDevByIdx
29: [+] getDevCount
29: [-] getDevCount
29: [-] getDevByIdx
29: [+] getDevByIdx
29: [+] getDevCount
29: [-] getDevCount
29: [-] getDevByIdx
29: [+] QueueGenericThreadsBlocking
29: [-] QueueGenericThreadsBlocking
29: [+] allocBuf
29: [+] BufCpuImpl

and just reduce the amount of output. In particular, all the simple queries like getDevCount etc. are just noise IMO. I can see a use for the logs when buffer implementations are destroyed, because the shared pointers make it harder to understand lifetimes.

But in general, we should have only very few CI runs with alpaka_DEBUG=2. I don't know whether you can steer that in the job generator.

@SimeonEhrig
Member

If I add the BUILD_TYPES values CMAKE_DEBUG1 and CMAKE_DEBUG2 and map them to alpaka_DEBUG=1 and alpaka_DEBUG=2, both levels will be distributed across roughly the same number of jobs. But I can also implement custom rules, such as setting alpaka_DEBUG=2 for one job per device compiler version and alpaka_DEBUG=1 for the rest.
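
To make the custom rule concrete, here is a minimal sketch of "one job per device compiler version gets alpaka_DEBUG=2, the rest get alpaka_DEBUG=1". It is not the job generator's code: the `Job` struct and `assignDebugLevels` are hypothetical names invented for this illustration, and the sketch is written in C++ only to keep it self-contained.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical job description; the real job generator has its own data model.
struct Job
{
    std::string deviceCompiler; // e.g. "nvcc"
    std::string version; // e.g. "12.0"
    int debugLevel = 0; // value that would be passed as alpaka_DEBUG
};

// Assign debug level 2 to the first job seen for each (compiler, version)
// pair and debug level 1 to every other job.
inline void assignDebugLevels(std::vector<Job>& jobs)
{
    std::set<std::pair<std::string, std::string>> seen;
    for(auto& job : jobs)
    {
        // emplace().second is true only the first time a pair is inserted.
        bool const firstOfItsKind = seen.emplace(job.deviceCompiler, job.version).second;
        job.debugLevel = firstOfItsKind ? 2 : 1;
    }
}
```

With a rule like this, the number of alpaka_DEBUG=2 jobs grows with the number of distinct device compiler versions rather than with the total job count, which should keep the noisy logs limited to a handful of runs.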

@psychocoderHPC
Member

Maybe we should think about adding a debug level 3 where all entries and exits of alpaka functions are visible, and disabling this output for debug level 2.
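
A rough sketch of how such a split could look, assuming informative messages stay at level 2 and the `[+]`/`[-]` scope tracing moves to level 3. All names here (`logsMessages`, `logsScopes`, `ScopeTrace`, `traceCopy`) are hypothetical and only illustrate the idea; alpaka's actual logging macros are not shown.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Assumed level semantics for this sketch only:
// level >= 2 prints informative messages, level >= 3 also traces scopes.
constexpr bool logsMessages(int level)
{
    return level >= 2;
}

constexpr bool logsScopes(int level)
{
    return level >= 3;
}

// RAII helper: prints "[+] name" on construction and "[-] name" on
// destruction, but only if the level enables scope tracing.
template<int TLevel>
class ScopeTrace
{
public:
    ScopeTrace(std::ostream& os, std::string name) : m_os(os), m_name(std::move(name))
    {
        if constexpr(logsScopes(TLevel))
            m_os << "[+] " << m_name << '\n';
    }

    ~ScopeTrace()
    {
        if constexpr(logsScopes(TLevel))
            m_os << "[-] " << m_name << '\n';
    }

    ScopeTrace(ScopeTrace const&) = delete;
    ScopeTrace& operator=(ScopeTrace const&) = delete;

private:
    std::ostream& m_os;
    std::string m_name;
};

// Hypothetical stand-in for an operation that logs one informative line.
template<int TLevel>
std::string traceCopy()
{
    std::ostringstream os;
    {
        ScopeTrace<TLevel> trace(os, "operator()");
        if constexpr(logsMessages(TLevel))
            os << "printDebug e: (1)\n";
    } // trace is destroyed here, before the stream is read
    return os.str();
}
```

In this sketch `traceCopy<2>()` yields only the `printDebug` line, while `traceCopy<3>()` wraps it in the `[+] operator()` and `[-] operator()` lines.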

@j-stephan
Member

But then we will need CI jobs for alpaka_DEBUG=3...

@AuroraPerego
Contributor

FYI in debug mode with the CUDA back-end I see:

93% tests passed, 2 tests failed out of 30

Total Test time (real) = 316.42 sec

The following tests FAILED:
          3 - mandelbrotTest (ILLEGAL)
          4 - matMulTest (ILLEGAL)

Both tests failed with: 'cudaErrorLaunchOutOfResources': 'too many resources requested for launch'!

@SimeonEhrig
Member

> But then we will need CI jobs for alpaka_DEBUG=3...

Actually, yes and no. It only tests whether a std::cout works, so this is not critical. On the other hand, we know that printing to the terminal can change the execution order in a parallel program. But a std::cout should not fix our application. Therefore, I'm in favor of an alpaka_DEBUG=3. This level is only for human developers ;-)

@bernhardmgruber
Member Author

> > But then we will need CI jobs for alpaka_DEBUG=3...
>
> Actually, yes and no. It only tests whether a std::cout works, so this is not critical.

The std::cout << getWidth(extent) ...; was exactly what was broken, so we must include it in the tests.
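
To illustrate why a broken debug statement can go unnoticed, here is a minimal sketch assuming the debug output is guarded by a preprocessor check. `SKETCH_DEBUG`, `Extent`, `describeCopy`, and this `getWidth` are hypothetical stand-ins defined only for the example, not alpaka's real macros or functions.

```cpp
#include <sstream>
#include <string>

// Hypothetical debug-level macro; defaults to 0 like most CI jobs today.
#ifndef SKETCH_DEBUG
#    define SKETCH_DEBUG 0
#endif

// Hypothetical stand-ins for an extent type and its width accessor.
struct Extent
{
    int width;
};

inline int getWidth(Extent const& extent)
{
    return extent.width;
}

inline std::string describeCopy([[maybe_unused]] Extent const& extent)
{
    std::ostringstream os;
#if SKETCH_DEBUG >= 2
    // This statement is only seen by the compiler at level 2 or higher.
    // If it stopped compiling, a level-0 build would never report it.
    os << "width: " << getWidth(extent) << '\n';
#endif
    os << "copy done";
    return os.str();
}
```

Built with the default level 0, the guarded line is removed by the preprocessor, so only a job that actually defines the higher level would compile and exercise it.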
