Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assert error message is not usable #254

Open
XL64 opened this issue Mar 24, 2022 · 8 comments
Open

assert error message is not usable #254

XL64 opened this issue Mar 24, 2022 · 8 comments

Comments

@XL64
Copy link
Contributor

XL64 commented Mar 24, 2022

When I run geosx I happen to have assert errors in LVArray that are not easy to understand as strings are not correctly built:
ArraySlice.hpp:294: std::enable_if<<expression>, T &>::type LvArray::ArraySlice<T, NDIM_TPARAM, USD_TPARAM, INDEX_TYPE>::operator[](INDEX_TYPE) const [with int U = 1; T = const int; int NDIM_TPARAM = 1; int USD_TPARAM = -1; INDEX_TYPE = int]: block: [31,0,0], thread: [4,0,0] Assertion `false && "EXP = " "index < 0 || index >= m_dims[ 0 ]" "MSG = " "\"Array Bounds Check Failed: index=\" << index << \" m_dims[0]=\" << m_dims[0]"` failed.

@corbett5
Copy link
Collaborator

You're getting an out of bounds error on the GPU . In this case it's coming from an ArraySlice< int const, 1 >

FYI this is about as nice an error as you can possibly get out of a GPU...

@XL64
Copy link
Contributor Author

XL64 commented Mar 25, 2022

Indeed i understood that. It's hard to follow as cuda gdb does not stop on it. I understand this is due ti the face we are on device section. Message could have been prettier if using printf as in the NDEBUG section. And even prettier if MSG was a FORMAT string that could be added to the printf argument and some variables (here index/dim) passed as argument to printf. But it is a large change as i understand from the code.

@corbett5
Copy link
Collaborator

My bad, I thought you were confused about the meaning of the message.

When NDBUG is defined LVARRAY_ERROR uses the standard assert macro, why cuda gdb doesn't stop on this is beyond me. Personally I've found that good old fashioned print debugging is often a decent way to do things for the GPU.

We could add printf but assert already prints out information so we would double the output for little gain other than that you get a nice error and location header. But you still have STRINGIZE( EXP ) and STRINGIZE( MSG ) which is producing that un-formatted output.

As you mentioned short of rewriting all our errors to use printf there's not much we can do about this. That said if someone were to add a printf set of error macros and switch the checks in host-device functions to use that I'd happily approve it.

@XL64
Copy link
Contributor Author

XL64 commented Mar 25, 2022

I tried some patch but it became complicated when trying to use my macro in void checkIndices(...), need to convert printDimsAndIndices to method constructing a char[givensize]... which is not really C++ like... It is not that simple.

@XL64
Copy link
Contributor Author

XL64 commented Mar 25, 2022

Basically something like this, maybe I could have 2 macros, one with printf compatible msg and the original one... :

#if defined(__CUDA_ARCH__)
#  define CUDA_INFORMATION_FMT "***** Block: [%u, %u, %u]\n***** Thread: [%u, %u, %u]\n"
#  define CUDA_INFORMATION_PARAMS blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y, threadIdx.z
#  define CALL_ERROR_HANDLER do { asm ( "trap;" ); } while ( false )
#else
#  define CUDA_INFORMATION_FMT
#  define CUDA_INFORMATION_PARAMS
#  define CALL_ERROR_HANDLER do { LvArray::system::callErrorHandler(); } while ( false )
#endif

#define NUMARGS(...)  (sizeof((int[]){__VA_ARGS__})/sizeof(int))
/**
 * @brief Abort execution if @p EXP is true.
 * @param EXP The expression to check.
 * @param MSG_FMT The format of the message to associate with the error, can be anything understood by printf().
 * @note This macro can be used in both host and device code.
 * @note Tries to provide as much information about the location of the error
 *       as possible. On host this should result in the file and line of the error
 *       and a stack trace along with the provided message. On device none of this is
 *       guaranteed. In fact it is only guaranteed to abort the current kernel.
 */
#define LVARRAY_ERROR_IF( EXP, MSG_FMT, ... )                           \
  do                                                                    \
  {                                                                     \
    if( EXP )                                                           \
    {                                                                   \
      constexpr char const * formatString = "***** ERROR\n"             \
        "***** LOCATION: " LOCATION "\n"                                \
        CUDA_INFORMATION_FMT                                            \
        "***** Controlling expression (should be false): "              \
        STRINGIZE( EXP ) "\n"                                           \
        "***** MSG: " MSG_FMT "\n\n";                                   \
      if (NUMARGS(__VA_ARGS__) == 0)                                    \
      {                                                                 \
        printf( formatString, CUDA_INFORMATION_PARAMS);                 \
      }                                                                 \
      else                                                              \
      {                                                                 \
        printf( formatString, CUDA_INFORMATION_PARAMS, __VA_ARGS__);    \
      }                                                                 \
      CALL_ERROR_HANDLER();                                             \
    }                                                                   \
  } while( false )

@corbett5
Copy link
Collaborator

Yeah we'd definitely want two macros. The original one is used all over the place (all the GEOSX macros flow through LvArray), and while streams can be annoying they're less prone to error and more flexible than printf.

@rrsettgast
Copy link
Member

Well...we don't really do any error checking inside a kernel except for range checking...do we?

@XL64
Copy link
Contributor Author

XL64 commented Mar 25, 2022

I don't really now, I just modified the macro and followed the called path, but indeed maybe I should not change all. I seem to have something working, I'll see if it's helpfull or not and submit a PR if it's worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants