Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defect: Spurious coarray bounds error with -fcheck=bounds #748

Open
1 task done
nncarlson opened this issue Jan 27, 2022 · 0 comments
Open
1 task done

Defect: Spurious coarray bounds error with -fcheck=bounds #748

nncarlson opened this issue Jan 27, 2022 · 0 comments

Comments

@nncarlson
Copy link

  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: caf version 2.9.2-13-g235167d

  • Fortran Compiler: GNU Fortran (GCC) 11.2.0

  • C compiler used for building lib: gcc (GCC) 11.2.0

  • Installation method: make; make install

  • All flags & options passed to the installer: cmake -DCMAKE_BUILD_TYPE=Release

  • Output of uname -a: Linux thelio.indiana 5.14.18-100.fc33.x86_64 #1 SMP Fri Nov 12 17:38:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

  • MPI library being used: MPICH 3.3.2

  • Machine architecture and number of physical cores: 12 core AMD Threadripper 2920X

  • Version of CMake: 3.19.7

To help us debug your issue please explain:

What you were trying to do (and why)

I am developing code to perform a halo exchange associated with domain decomposition methods for PDEs and have encountered a problem. To debug the problem I've extracted the problem loop into a subroutine and wrapped it with a main program that initializes some structure arrays with data from a debugging data set for 4 images. The subroutine loop is performing the equivalent of an MPI_Alltoallv or MPI_Neighbor_alltoall.After calling the subroutine, the program compares the result of the subroutine with reference data to check for a successful completion. Here is the problem reproducer:

program main

  type :: index_map
    integer, allocatable :: offP_index(:)
    integer, allocatable :: recv_displs(:), recv_counts(:)
    integer, allocatable :: send_displs(:), send_image(:), send_index(:)
    integer, allocatable :: send_index_ref(:)
  end type
  type(index_map) :: imap

  select case (this_image())
  case (1)
    imap%send_image  = [2, 3, 4]
    imap%recv_counts = [3, 5, 4]
    imap%recv_displs = [0, 3, 8]
    imap%send_displs = [0, 0, 0]
    imap%offP_index = [8, 10, 12, 13, 14, 16, 18, 19, 21, 23, 25, 27]
    imap%send_index_ref = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
  case (2)
    imap%send_image  = [1, 3, 4]
    imap%recv_counts = [6, 2, 4]
    imap%recv_displs = [0, 6, 8]
    imap%send_displs = [0, 5, 4]
    imap%offP_index = [1, 2, 3, 4, 5, 6, 15, 16, 20, 21, 24, 25]
    imap%send_index_ref = [8, 10, 12, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12]
  case (3)
    imap%send_image  = [1, 2, 4]
    imap%recv_counts = [5, 4, 6]
    imap%recv_displs = [0, 5, 9]
    imap%send_displs = [6, 3, 8]
    imap%offP_index = [2, 3, 4, 5, 6, 9, 10, 11, 12, 21, 23, 24, 25, 26, 27]
    imap%send_index_ref = [13, 14, 16, 18, 19, 15, 16, 13, 14, 15, 16, 17, 18, 19]
  case (4)
    imap%send_image  = [1, 2, 3]
    imap%recv_counts = [6, 6, 7]
    imap%recv_displs = [0, 6, 12]
    imap%send_displs = [11, 7, 7]
    imap%offP_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    imap%send_index_ref = [21, 23, 25, 27, 20, 21, 24, 25, 21, 23, 24, 25, 26, 27]
  end select
  allocate(imap%send_index, mold=imap%send_index_ref)
  imap%send_index = 0

  call sub(imap)

  if (any(imap%send_index /= imap%send_index_ref)) error stop

contains

  subroutine sub(this)

    type(index_map), intent(inout), target :: this

    integer :: j, k
    type :: box
      integer, pointer :: array(:)
    end type
    type(box), allocatable :: buffer[:]
  
    allocate(buffer[*])
    buffer%array => this%send_index
    sync all
    do j = 1, size(this%recv_counts)
      associate (i => this%send_image(j), n => this%recv_counts(j), &
                 s0 => this%send_displs(j), r0 => this%recv_displs(j))
        do k = 1, n
          !write(*,'(i0,a,5(1x,i0))') this_image(), ': ', j, i, &
          !    lbound(buffer[i]%array), s0+k, ubound(buffer[i]%array)
          buffer[i]%array(s0+k) = this%offP_index(r0+k)
        end do
      end associate
    end do
    sync all
  end subroutine

end program

What happened (include command output, screenshots, logs, etc.)

When compiled with -fcheck=bounds the runtime reports a spurious out-of-bounds error:

At line 69 of file gfortran-20220127.f90
Fortran runtime error: Index '15' of dimension 1 of array 'buffer%array' above upper bound of 14

Error termination. Backtrace:
#0  0x402b81 in ???
#1  0x4050b5 in ???
#2  0x40545e in ???
#3  0x7f325e2981e1 in ???
#4  0x40253d in ???
#5  0xffffffffffffffff in ???
Error: Command:
   `/opt/lib/gcc-11_2/mpich/3.3.2/bin/mpiexec -n 4 ./a.out`
failed to run.

What you expected to happen

The program should run to completion without error. That there is no actual out-of-bounds reference can be confirmed by uncommenting the write statement and verifying the index in each iteration of the loop lies within the array bounds -- until it aborts on the following line. Also the NAG and Intel compilers report no out-of-bounds access.

Step-by-step reproduction instructions to reproduce the error/bug

caf -fcheck=bounds reproducer.f90
cafrun -n 4 ./a.out
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant