Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slicing: compare number of processors correctly #76

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alejandrogallo
Copy link
Member

Hello,

we updated our version of CTF and were having some issues with regard
to the performance of the slicing in our code.

After some sniffing around we found this line in the slice method.
In version 1.4.1 the if statement had the < operator in it.
Some time after that this was changed into <= which according to my
understanding renders the else codeblock useless, i.e.

...
} else {
      tsr_A->read_local_nnz(&sz_A, &all_data_A);
//      printf("sz_A+%ld\n",sz_A);
}
...

This means that when the number of processors where tsr_B is
distributed among is equal to tsr_A, there is also a checking of the
dimensions and padding for A, and this means that CTF has to read
the data from A,

...
      tsr_A->write(blk_sz_B, sr->mulid(), sr->addid(), blk_data_B, 'r');
...

which makes this block slower.

If this is true, this pull request would be a fix to the problem. We have
certainly tested it and it confirmed our suspicion. For slices of big tensors
the difference between the < and the <= version is up to 50% in time,
according to our benchmarks. However, for small tensors it appears to be
roughly equivalent, which makes sense.

Thank you very much for your great project!

@solomonik
Copy link
Collaborator

Thanks a lot Alejandro!

These changes seem to make the python tests fail though, likely due to some subtleties involving sparsity, so I will need to investigate/adjust a bit before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants