Problem with push_slice #125

Open
airmler opened this issue Jul 15, 2021 · 2 comments

@airmler

airmler commented Jul 15, 2021

Dear Edgar,

The following code snippet leads to an MPI deadlock when run on 72 MPI ranks.

#include <ctf.hpp>

int main(int argc, char ** argv){

  MPI_Init(&argc, &argv);
  CTF::World dw;

  int64_t n(278);
  int64_t n2(2);
  int syms[] = {NS};

  int64_t m[] = {n};
  int64_t k[] = {n2};
  // e is a long vector distributed over all 72 ranks,
  // f is a short vector mapped onto only a few ranks
  CTF::Tensor<double> e(1, m, syms, dw, "e");
  CTF::Tensor<double> f(1, k, syms, dw, "f");

  int epsiStart[] = { 0 }, epsiEnd[] = { 2 };

  // copy the first two elements of e into f; this hangs on 72 ranks
  f = e.slice(epsiStart, epsiEnd);

  MPI_Finalize();
  return 0;
}
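
For completeness, the snippet is built against CTF and run on 72 ranks roughly as follows (CTF_DIR is a placeholder for the CTF installation prefix; additional BLAS/LAPACK link flags may be needed depending on the build):

mpicxx -std=c++11 repro.cxx -I${CTF_DIR}/include -L${CTF_DIR}/lib -lctf -o repro
mpirun -np 72 ./repro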

The reason for the deadlock seems to be the following: the two tensors have different mappings.

The tensor e has the mapping e[p72(0)c0], whereas f reports
f topo (, 2, 2, 2, 3, 3); CTF: Tensor mapping is ETXS01[p2(0)c0]

This leads to wrong numbers in src/redistribution/slice.cxx:190, which in turn produces an incorrect MPI_Sendrecv_replace operation in src/redistribution/slice.cxx:207 (only ranks 2-71 reach the MPI_Sendrecv_replace call, with send/recv destination 0 or 1).
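
To illustrate the kind of hang this produces, here is a minimal standalone sketch (not CTF code, just the communication pattern described above): ranks 2-71 post an MPI_Sendrecv_replace with partner 0 or 1, but ranks 0 and 1 never post a matching call, so the exchange can never complete.

#include <mpi.h>
#include <cstdio>

int main(int argc, char ** argv){
  MPI_Init(&argc, &argv);
  int rank, np;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);

  double buf = rank;
  if (np > 2 && rank >= 2) {
    int partner = rank % 2;  // send/recv destination 0 or 1, as in slice.cxx:207
    // ranks 0 and 1 never enter this call, so these ranks block forever
    MPI_Sendrecv_replace(&buf, 1, MPI_DOUBLE, partner, 0, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }
  std::printf("rank %d done\n", rank);

  MPI_Finalize();
  return 0;
}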

The crudest workaround would be to change the if statement in src/tensor/untyped_tensor.cxx:1043. Is there a better solution, or am I generally missing something here?

Best,
Andreas

@solomonik
Collaborator

@airmler I was able to reproduce this, thanks for reporting. Please try out the fix on branch fix_slice. The problem is that the new optimized slice code appears to be erroneous when a tensor is not distributed over all processors, as is the case with the small vector here. I simply turned off fast slice for such cases, which should hopefully be a robust fix. If the performance of slice becomes an issue as a result, please let me know; it should be possible to extend the optimized code to handle such cases as well, it will just require more work.
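
For readers who find this issue later, here is a hypothetical, self-contained sketch of the dispatch described above; the names and structure are illustrative only and are not the actual CTF internals:

#include <cstdio>

struct TensorInfo {
  int mapped_procs;  // number of processors the tensor is distributed over
};

// stand-ins for the two slice code paths
void fast_push_slice(const TensorInfo &) { std::puts("fast slice"); }
void general_slice(const TensorInfo &)   { std::puts("general slice"); }

void slice_dispatch(const TensorInfo & t, int np) {
  if (t.mapped_procs == np)
    fast_push_slice(t);   // optimized path: tensor spans the whole processor grid
  else
    general_slice(t);     // robust fallback, e.g. for the length-2 vector above
}

int main() {
  slice_dispatch({72}, 72);  // large tensor: takes the fast path
  slice_dispatch({2},  72);  // small tensor: falls back to general slice
  return 0;
}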

@airmler
Author

airmler commented Jul 19, 2021

Thanks for the fast fix. This works perfectly for us.
In our calculations the large tensors are fully distributed, which is why this change is satisfactory.
I cannot see a case where the extension would really be necessary.
