Problem with push_slice #125

Open
airmler opened this issue Jul 15, 2021 · 2 comments

@airmler

airmler commented Jul 15, 2021

Dear Edgar,

The following code snippet leads to an MPI deadlock when run on 72 MPI ranks.

#include <ctf.hpp>

int main(int argc, char ** argv){

  MPI_Init(&argc, &argv);
  CTF::World dw;

  int64_t n(278);
  int64_t n2(2);
  int syms[] = {NS};

  int64_t m[] = {n};
  int64_t k[] = {n2};
  // e is a long vector distributed over all 72 ranks,
  // f is a short vector mapped onto only a few ranks
  CTF::Tensor<double> e(1, m, syms, dw, "e");
  CTF::Tensor<double> f(1, k, syms, dw, "f");

  int epsiStart[] = { 0 }, epsiEnd[] = { 2 };

  // copy the first two elements of e into f; this hangs on 72 ranks
  f = e.slice(epsiStart, epsiEnd);

  MPI_Finalize();
  return 0;
}
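
For completeness, the snippet is built against CTF and run on 72 ranks roughly as follows (CTF_DIR is a placeholder for the CTF installation prefix; additional BLAS/LAPACK link flags may be needed depending on the build):

mpicxx -std=c++11 repro.cxx -I${CTF_DIR}/include -L${CTF_DIR}/lib -lctf -o repro
mpirun -np 72 ./repro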

The reason for the deadlock seems to be the following: the two tensors have different mappings.

The tensor e has the mapping e[p72(0)c0], whereas f reports
f topo (, 2, 2, 2, 3, 3); CTF: Tensor mapping is ETXS01[p2(0)c0]

This leads to wrong numbers in src/redistribution/slice.cxx:190, which in turn produces an incorrect MPI_Sendrecv_replace operation in src/redistribution/slice.cxx:207 (only ranks 2-71 reach the MPI_Sendrecv_replace call, with send/recv destination 0 or 1).
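
To illustrate the kind of hang this produces, here is a minimal standalone sketch (not CTF code, just the communication pattern described above): ranks 2-71 post an MPI_Sendrecv_replace with partner 0 or 1, but ranks 0 and 1 never post a matching call, so the exchange can never complete.

#include <mpi.h>
#include <cstdio>

int main(int argc, char ** argv){
  MPI_Init(&argc, &argv);
  int rank, np;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);

  double buf = rank;
  if (np > 2 && rank >= 2) {
    int partner = rank % 2;  // send/recv destination 0 or 1, as in slice.cxx:207
    // ranks 0 and 1 never enter this call, so these ranks block forever
    MPI_Sendrecv_replace(&buf, 1, MPI_DOUBLE, partner, 0, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }
  std::printf("rank %d done\n", rank);

  MPI_Finalize();
  return 0;
}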

The crudest workaround would be to change the if statement in src/tensor/untyped_tensor.cxx:1043. Is there a better solution, or am I generally missing something here?

Best,
Andreas

@solomonik
Collaborator

@airmler I was able to reproduce this, thanks for reporting. Please try out the fix on branch fix_slice. The problem is that the new optimized slice code appears to be erroneous when a tensor is not distributed over all processors, as is the case with the small vector here. I simply turned off fast slice for such cases, which should hopefully be a robust fix. If the performance of slice becomes an issue as a result, please let me know; it should be possible to extend the optimized code to handle such cases as well, it will just require more work.
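
For readers who find this issue later, here is a hypothetical, self-contained sketch of the dispatch described above; the names and structure are illustrative only and are not the actual CTF internals:

#include <cstdio>

struct TensorInfo {
  int mapped_procs;  // number of processors the tensor is distributed over
};

// stand-ins for the two slice code paths
void fast_push_slice(const TensorInfo &) { std::puts("fast slice"); }
void general_slice(const TensorInfo &)   { std::puts("general slice"); }

void slice_dispatch(const TensorInfo & t, int np) {
  if (t.mapped_procs == np)
    fast_push_slice(t);   // optimized path: tensor spans the whole processor grid
  else
    general_slice(t);     // robust fallback, e.g. for the length-2 vector above
}

int main() {
  slice_dispatch({72}, 72);  // large tensor: takes the fast path
  slice_dispatch({2},  72);  // small tensor: falls back to general slice
  return 0;
}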

@airmler
Author

airmler commented Jul 19, 2021

Thanks for the fast fix. This works perfectly for us.
In our calculations the large tensors are fully distributed, which is why this change is satisfactory.
I cannot see a case where the extension would really be necessary.
