segfault executing sparse inner product #138

Open
rohany opened this issue Jan 25, 2022 · 18 comments

@rohany
Contributor

rohany commented Jan 25, 2022

The following code raises different segfaults depending on the process count (on a single node) when run on the nell-2 tensor.

void innerprod(int nIter, int warmup, std::string filename, std::string tensorC, std::vector<int> dims, World& dw) {
  Tensor<double> B(3, true /* is_sparse */, dims.data(), dw);
  Tensor<double> C(3, true /* is_sparse */, dims.data(), dw);
  Scalar<double> a(dw);

  B.read_sparse_from_file(filename.c_str());
  C.read_sparse_from_file(filename.c_str());
  
  a[""] = B["ijk"] * C["ijk"];
}

When run with a single process, it segfaults with the following backtrace:

/g/g15/yadav2/ctf/src/redistribution/sparse_rw.cxx:948 (discriminator 7)
/g/g15/yadav2/ctf/src/tensor/untyped_tensor.cxx:1302
/g/g15/yadav2/ctf/examples/../include/../src/interface/tensor.cxx:609
/g/g15/yadav2/ctf/examples/../include/../src/interface/tensor.cxx:940
/g/g15/yadav2/ctf/examples/../include/../src/interface/tensor.cxx:952
/g/g15/yadav2/ctf/examples/spbench.cxx:199
/g/g15/yadav2/ctf/examples/spbench.cxx:317 (discriminator 7)

When run with 40 processes (1 process per core on my system), it segfaults with the following backtrace:

/g/g15/yadav2/ctf/src/contraction/contraction.cxx:119 (discriminator 3)
/g/g15/yadav2/ctf/src/interface/term.cxx:983
/g/g15/yadav2/ctf/src/interface/idx_tensor.cxx:227
/g/g15/yadav2/ctf/examples/../include/../src/interface/idx_tensor.h:262
/g/g15/yadav2/ctf/examples/spbench.cxx:209 (discriminator 6)
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/std_function.h:299
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/std_function.h:687
/g/g15/yadav2/ctf/examples/spbench.cxx:9 (discriminator 2)
/g/g15/yadav2/ctf/examples/spbench.cxx:208 (discriminator 1)
/g/g15/yadav2/ctf/examples/spbench.cxx:317 (discriminator 7)
??:0
??:0

Both of the "segfaults" appear to be internal assertion failures.

@raghavendrak
Collaborator

The nell-2 tensor dimensions specified are 12092 x 9184 x 28818. I am assuming you are passing the same in dims.data(), but if you look at the indices in the tensor file, there are entries with index 28818, which is out of range for a 0-indexed dimension of 28818. Using 12093 x 9185 x 28819 will fix this.

@rohany
Contributor Author

rohany commented Feb 2, 2022

The .tns file format encodes all tensor indices in 1-indexed format. Does the CTF read operation assume they are zero indexed?

@solomonik
Collaborator

Yes, the documentation I think is consistent with that.

@solomonik
Collaborator

One fix is to just read a tensor with dims larger by 1 and take a slice starting from 1, I think we did that to preprocess to get results elsewhere
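
A minimal sketch of that workaround, assuming CTF's Tensor::slice(int const * offsets, int const * ends) overload with exclusive upper bounds (the exact signature may differ):

// Read the 1-indexed .tns file into a tensor padded by one in every mode,
// then slice away the unused index-0 plane of each mode.
std::vector<int> padded(dims);              // true extents, e.g. 12092 x 9184 x 28818
for (auto & d : padded) d += 1;             // padded to 12093 x 9185 x 28819

Tensor<double> Bpad(3, true /* is_sparse */, padded.data(), dw);
Bpad.read_sparse_from_file(filename.c_str());

int offs[3] = {1, 1, 1};                    // start the slice at index 1 in each mode
Tensor<double> B = Bpad.slice(offs, padded.data());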

@rohany
Contributor Author

rohany commented Feb 2, 2022

If CTF is supposed to read the coordinates correctly, why is incrementing the dimensions necessary? Either way, I'll give it a try.

@solomonik
Collaborator

.tns files are just one standard.

@rohany
Contributor Author

rohany commented Feb 3, 2022

I tested this by incrementing all of my dimensions by 1, and I'm still running into a segfault on both 1 and 40 processes.

@raghavendrak
Collaborator

Are you seeing the segfault when reading the tensor?
(I tried running your code and was able to read the tensors on 1 process.)

@rohany
Contributor Author

rohany commented Feb 3, 2022

No, it seems to be after the tensors load.

To replicate my exact setup, try running this code: https://github.com/rohany/ctf/blob/master/examples/spbench.cxx (and edit line 287 to dims.push_back(atoi(it.c_str()) + 1);).

Then, run the binary with arguments:
spbench -tensor <path to tns> -dims 12092,9184,28818 -n 20 -warmup 10 -bench spinnerprod -tensorC <path to tns>

@raghavendrak
Collaborator

The segmentation fault is because CTF runs out of memory for the contraction. Can you try higher node counts?
Also, this specific operation (if B == C) can be achieved by computing the Frobenius norm, i.e., B.norm2(norm).
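
For the B == C case, a minimal sketch using the norm2 routine named above (the equivalence is simply that the inner product of B with itself is the squared Frobenius norm):

// When B == C, sum_{ijk} B_{ijk} * C_{ijk} equals the squared Frobenius norm,
// so a single norm computation replaces the full contraction.
double nrm;
B.norm2(nrm);               // nrm = sqrt(sum_{ijk} B_{ijk}^2)
double inner = nrm * nrm;   // same value as a[""] = B["ijk"] * B["ijk"]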

@rohany
Contributor Author

rohany commented Feb 5, 2022

I'm skeptical that memory usage is the problem (I usually get a signal 9 from the job scheduler when a process OOMs). I tried running with up to 8 nodes and saw segfaults each time.

Also, this specific operation (if B == C) can be achieved by computing the Forbenius norm i.e., B.norm2(norm).

I'm running the case where B != C.

@raghavendrak
Collaborator

CTF calculates the memory usage a priori. If the contraction cannot be performed, an assert is triggered and the computation is aborted. Can you recompile and run your code with -DDEBUG=4 and -DVERBOSE=4?
I was under the assumption that both B and C are loaded from the same tensor file (filename.c_str()), based on the code you first posted here [the nell-2 tensor].

@rohany
Contributor Author

rohany commented Feb 5, 2022

I don't see anything interesting output with those flags on.

The output before the crash is:

CTF: Running with 4 threads
CTF: Total amount of memory available to process 0 is 170956357632
12093
9185
28819
debug:untyped_tensor.cxx:440 Created order 3 tensor ETXS03, is_sparse = 1, allocated = 1
debug:untyped_tensor.cxx:440 Created order 3 tensor AILI03, is_sparse = 1, allocated = 1
debug:untyped_tensor.cxx:440 Created order 0 tensor OBNO00, is_sparse = 0, allocated = 1
New tensor OBNO00 defined of size 1 elms (8 bytes):
printing lens of dense tensor OBNO00:
printing mapping of dense tensor OBNO00
CTF: OBNO00 mapped to order 4 topology with dims: 2  2  2  5
CTF: Tensor mapping is OBNO00[]
printing mapping of sparse tensor ETXS03
CTF: ETXS03 mapped to order 3 topology with dims: 10  2  2
CTF: Tensor mapping is ETXS03[p2(1)c0,p2(2)c0,p10(0)c0]
Read 76879419 non-zero entries from the file.
printing mapping of sparse tensor AILI03
CTF: AILI03 mapped to order 3 topology with dims: 10  2  2
CTF: Tensor mapping is AILI03[p2(1)c0,p2(2)c0,p10(0)c0]
Read 76879419 non-zero entries from the file.

and the backtrace is

/g/g15/yadav2/ctf/src/contraction/contraction.cxx:119 (discriminator 3)
/g/g15/yadav2/ctf/src/interface/term.cxx:983
/g/g15/yadav2/ctf/src/interface/idx_tensor.cxx:227
/g/g15/yadav2/ctf/examples/../include/../src/interface/idx_tensor.h:262
/g/g15/yadav2/ctf/examples/spbench.cxx:209 (discriminator 6)
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/std_function.h:299
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/std_function.h:687
/g/g15/yadav2/ctf/examples/spbench.cxx:9 (discriminator 2)
/g/g15/yadav2/ctf/examples/spbench.cxx:208 (discriminator 1)
/g/g15/yadav2/ctf/examples/spbench.cxx:323 (discriminator 7)
??:0
??:0

I was under the assumption that both B and C are loaded with the same tensor (filename.c_str()) (based on your code mentioned first here) [nell-2 tensor].

That was a typo. The load of C should have used a different input filename.

@raghavendrak
Collaborator

So if I have to reproduce this, what are the two tensor files I need to use?
(I see that both tensors ETXS03 and AILI03 have the same number of non-zero entries: 76879419?)

@rohany
Contributor Author

rohany commented Feb 5, 2022

I'm currently running it with the same tensor files (nell-2 and nell-2), but I aim to use it for different tensor files once we can resolve the segfault.

@raghavendrak
Collaborator

raghavendrak commented Feb 11, 2022

CTF runs out of memory for this contraction (with the nell-2 tensor as input for both B and C). I tried up to 128 nodes with no luck. There is also a possibility of a bug in CTF. With -DDEBUG=4 and -DVERBOSE=4 you should be able to see output similar to the below:

debug:contraction.cxx:2942 [EXH] Not enough memory available for topo 2047 with order 1 memory 1778101471/1183301216
ERROR: Failed to map contraction!

@rohany
Contributor Author

rohany commented Feb 11, 2022

Is this related to the shape of the tensor, or will tensors of similar and greater size also fail? Specifically, will the other larger tensors in the FROSTT suite fail?

@raghavendrak
Collaborator

My guess is that it has to do with the size and the contraction type. We might have to try other tensors with this contraction to be able to conclude.
