Data Race due to num_error variable #585

Open
4 of 11 tasks
ghost opened this issue Jul 5, 2021 · 3 comments
ghost commented Jul 5, 2021

What type of issue is this?

  • Bug in the code or other problem
  • Inadequate/incorrect documentation
  • Feature request

If this is a bug report, please use the following template.
Otherwise, please delete the rest of the template.

Where does this bug appear?

Check all that apply:

  • MacOS
  • Linux
  • Cray
  • GCC
  • Clang
  • Intel compiler
  • MPICH and derivatives (MVAPICH2, Intel MPI, Cray MPI, etc.)
  • Open-MPI

Operating system

What is the output of uname -a?
Linux 299fdde96882 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Compiler

What is the output of ${COMPILER} -v or ${COMPILER} --version?
clang version 10.0.1

PRK build information

Please attach or inline make.defs.

#name of MPI C compiler, e.g. mpiicc, mpicc
MPICC=

#name of C compiler, e.g. icc, xlc, gcc
CC=clang-10

#name of MPI Fortran compiler, e.g. mpifort, mpif90
MPIF90=

#name of Fortran compiler, e.g. ifort, xlf_r, gfortran
FC=

#name of compile line flag enabling OpenMP, e.g. -openmp, -qopenmp, -fopenmp
OPENMPFLAG=-fopenmp
OFFLOADFLAG=

#default compiler optimization flags
DEFAULT_OPT_FLAGS:=

Output showing problem

I detected a data race in all of the OpenMP kernels except Refcount. Every affected kernel has the same race on the num_error variable: one thread may write num_error = 1 while another thread reads it in the call bail_out(num_error). An example from branch:

 #pragma omp parallel private(i, my_ID, iter, aux, nfunc, rank) reduction(+:total)
  {
  int * RESTRICT vector; int * RESTRICT index;

  #pragma omp master
  {
  nthread = omp_get_num_threads();
  if (nthread != nthread_input) {
    num_error = 1;
    printf("ERROR: number of requested threads %d does not equal ",
           nthread_input);
    printf("number of spawned threads %d\n", nthread);
  }
  else {
    printf("Number of threads          = %d\n", nthread_input);
    printf("Vector length              = %d\n", vector_length);
    printf("Number of iterations       = %d\n", iterations);
    printf("Branching type             = %s\n", branch_type);
#if RESTRICT_KEYWORD
    printf("No aliasing                = on\n");
#else
    printf("No aliasing                = off\n");
#endif
  }
  }
  bail_out(num_error);

The data race occurs between lines 9 and 26 in this snippet, or lines 207 and 224 of branch.c. I found this data race using the Coderrect Scanner https://coderrect.com/
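
One possible way to remove the reported race, as a minimal sketch rather than the project's actual fix: the master construct has no implied barrier, so an explicit barrier between the write and the read orders the two accesses. The helper name all_threads_agree is hypothetical and not part of PRK; it only checks that every thread observes the value the master wrote.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not in PRK): repeats the thread-count check from the
   snippet above, with an explicit barrier before num_error is read.
   Returns 1 if every thread read the value the master wrote. */
int all_threads_agree(int nthread_input)
{
  int num_error = 0;  /* shared error flag, as in branch.c */
  int nthread   = 1;  /* serial fallback when built without OpenMP */
  int sum_seen  = 0;  /* sum of the flag values read by all threads */

  #pragma omp parallel
  {
    #pragma omp master
    {
#ifdef _OPENMP
      nthread = omp_get_num_threads();
#endif
      if (nthread != nthread_input) num_error = 1;
    }
    /* master has no implied barrier; this one orders the write above
       before the reads below */
    #pragma omp barrier

    int seen = num_error;  /* the read that bail_out(num_error) would do */
    #pragma omp atomic
    sum_seen += seen;
  }
  /* all threads saw 1 if an error was set, and 0 otherwise */
  return sum_seen == nthread * num_error;
}
```

Calling all_threads_agree(-1) forces the error path, because no team has -1 threads. Build with the OpenMP flag from make.defs (-fopenmp); without it the pragmas are ignored and the code runs serially.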

Please do not attach screenshots of your terminal.

@jeffhammond
Member

Okay, I'll try to fix soon but that might still be a while.

@AtlantaPepsi
Contributor

Technically this is a race condition, since there could be a write-after-read on num_error. Functionally, though, it makes no difference: the master thread always sees the error value after the master construct and exits. The other threads wait at the barrier inside the bail_out function, immediately after the call, until all threads, including the master, have confirmed valid inputs.

@jeffhammond what do you think? shall we close this?
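
The argument above can be illustrated with a small sketch. It assumes, based only on the description in this comment, that bail_out begins with a barrier; the real bail_out is not shown in this thread. Both would_exit and master_catches_error are hypothetical names, and would_exit returns the decision instead of calling exit() so the outcome can be inspected.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for bail_out (assumed shape: barrier, then check).
   Returns the exit decision instead of terminating the program. */
static int would_exit(int error)
{
  #pragma omp barrier
  return error != 0;
}

/* Models the worst case of the race: every non-master thread reads
   num_error before the master writes it and therefore passes a stale 0. */
int master_catches_error(void)
{
  int master_decision = 0;  /* written by the master thread only */

  #pragma omp parallel
  {
    int is_master = 1;  /* serial fallback when built without OpenMP */
#ifdef _OPENMP
    is_master = (omp_get_thread_num() == 0);
#endif
    /* master passes its own freshly written 1, the others a stale 0 */
    int error_arg = is_master ? 1 : 0;
    int decision  = would_exit(error_arg);
    if (is_master) master_decision = decision;
  }
  return master_decision;
}
```

Even in this worst case the master's own argument is 1, so it decides to exit, and in the real code exit() would end the whole process. This says nothing about whether the access pattern is a data race in the formal OpenMP sense, which is what the scanner reports.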

tgmattso commented Sep 9, 2021 via email
