
tanh doesn't use all cores #2136

Closed · hughperkins opened this issue Jul 18, 2017 · 9 comments

@hughperkins (Contributor)

When I run this script:

import torch

a = torch.rand(1000, 10000)
while True:
    print('.')
    a.tanh_()

and then open htop, I expect to see all 8 cores running at 100%, but only 4 seem to be running:

[screenshot: htop, 2017-07-17 9:01 am]

In addition, the cores that are running are only at ~30-40%.

(Note that I'm not submitting a fix for this, just flagging it)

@jekbradbury (Contributor)

What happens when you fiddle around with the pertinent environment variables (some subset of OPENBLAS_NUM_THREADS, MKL_NUM_THREADS, and OMP_NUM_THREADS)?
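
For example (a minimal sketch; torch.get_num_threads / torch.set_num_threads are the Python-level knobs for the number of CPU threads PyTorch uses, while the environment variables above are typically read at process startup):

import torch

# Report how many CPU threads PyTorch currently uses for intra-op work,
# then pin it to the number of cores you expect to be busy.
print(torch.get_num_threads())
torch.set_num_threads(8)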

@hughperkins (Contributor, Author)

I think this should really be handled automatically. At least, I don't see anything in the docs I've read suggesting one needs to do this. So either the docs need to be updated, or this should be handled automatically, I reckon.

@hughperkins (Contributor, Author)

That said, with MKL_NUM_THREADS=8, no change:
[screenshot: htop, 2017-07-17 10:24 pm]

OPENBLAS_NUM_THREADS=8 MKL_NUM_THREADS=8 OMP_NUM_THREADS=8 python test_tanh.py: no change:

[screenshot: htop, 2017-07-17 10:25 pm]

@hughperkins (Contributor, Author)

(tanh shouldn't be using BLAS, should it? I would think it uses some combination of SSE and OpenMP?)
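
A quick way to see the distinction in htop (a rough sketch, assuming torch.mm is backed by OpenBLAS/MKL on this build): the matrix multiply should respond to the BLAS thread variables, while the elementwise tanh_ goes through TH and should not.

import torch

a = torch.rand(2000, 2000)
b = torch.rand(2000, 2000)

# BLAS-backed op: OPENBLAS_NUM_THREADS / MKL_NUM_THREADS should change
# how many cores this keeps busy.
for _ in range(100):
    torch.mm(a, b)

# Elementwise TH kernel: the BLAS thread settings should make no difference here.
for _ in range(100):
    a.tanh_()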

@jekbradbury (Contributor)

Oh of course, sorry. Yeah, this is a good question then!

@apaszke (Contributor) commented Jul 18, 2017

It seems that tanh doesn't use OpenMP at all. It calls into TH, which uses TH_TENSOR_APPLY2, and that macro is single-threaded. TH could use some multi-core optimizations, but we don't have enough hands to do it right now.

We could provide some guidance if anyone wants to take a stab. We already have macros that extend TH_TENSOR_APPLY to make use of all cores in the contiguous case, so at least this improvement should be easy to add.
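
In the meantime, a minimal sketch for checking whether the op scales with the thread count at all (if tanh_ really is single-threaded, the timings should be roughly the same for every setting):

import time
import torch

a = torch.rand(1000, 10000)

# Time tanh_ at several thread counts; a flat curve means the kernel
# is effectively single-threaded.
for n in (1, 2, 4, 8):
    torch.set_num_threads(n)
    start = time.time()
    for _ in range(20):
        a.tanh_()
    print('%d threads: %.3fs' % (n, time.time() - start))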

@ruotianluo (Contributor)

I submitted a PR; check if it looks like what's expected. @apaszke

@fmassa (Member) commented Jul 18, 2017

If one wants to use OMP on TH_TENSOR_APPLY2, one could look into improving on top of torch/torch7#395. But I think that having better support for AVX/SSE would be best.

@soumith soumith added this to Uncategorized in Issue Status Aug 23, 2017
@soumith soumith moved this from Uncategorized to High Priority in Issue Status Aug 23, 2017
@soumith soumith added this to Performance in Issue Categories Aug 30, 2017
@weiyangfb (Contributor)

Closing since this has been fixed via #2792.
