Parallel execution may be slower than serial #3

Open
ingenieroariel opened this issue Feb 19, 2014 · 1 comment

Comments

@ingenieroariel

@madeleineudell Thanks for this package. Here is the output of my initial testing:

Sample program:

using ParallelSparseRegression

# Problem size and density of the random test matrix.
m,n,p = 2048,1024,.1
A = sprand(m,n,p)
x0 = Base.shmem_randn(n)   # shared-memory random vector
b = A*x0

# Solver parameters.
rho = 1
lambda = 1
quiet = false
maxiters = 100

params = Params(rho,quiet,maxiters)

# Lasso
@time z_lasso = lasso(A,b,lambda; params=params)
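
For the parallel runs, worker processes are added before the package is loaded. A minimal sketch of that setup (assuming addprocs is called at the top of the script; starting julia with -p N would be equivalent):

# Parallel variant: add workers first, then load the package on all of them.
addprocs(3)                    # 3 or 7, depending on the run

using ParallelSparseRegression # rest of the script is unchanged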

Running the program with different addprocs values gives the following results:

Output without addprocs:

1000 : 1.76e+00 1.27e-01 5.54e-03 4.09e+01
elapsed time: 24.422318823 seconds (6440755392 bytes allocated)

Output with addprocs(3):

1000 : 2.15e+00 1.12e-01 6.07e-03 4.65e+01
elapsed time: 90.979009048 seconds (12805856436 bytes allocated)

Output with addprocs(7):

1000 : 1.75e+00 1.47e-01 5.74e-03 4.21e+01
elapsed time: 228.324713722 seconds (28927210844 bytes allocated)

Full output with values for every iteration:

https://gist.github.com/ingenieroariel/9095001

@madeleineudell
Owner

@ingenieroariel, thanks for running those tests. My guess is that the slowdown is caused by repeatedly allocating shared memory, which is a fairly slow process. The code in prox.jl, admm.jl, and possibly lsqr.jl in IterativeSolvers will have to be modified to overwrite previously allocated memory rather than allocate new memory. For example, right now we call lsqr when we should call lsqr!; lsqr may call A_mul_B instead of A_mul_B!; and the various prox functions don't yet overwrite their inputs. We may even want to do in-place summation: if a and b are shared arrays, a + b is a normal array, so a new shared array has to be allocated when we multiply a + b by a shared matrix. We can do better by accumulating b into a (a.s += b) and then multiplying a by the shared matrix.
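
To make the allocating-versus-in-place distinction concrete, here is a minimal sketch of the summation case, assuming a and b are shared arrays created with Base.shmem_randn; the explicit loop is just one way to update a's buffer without allocating anything new:

n = 4
a = Base.shmem_randn(n)
b = Base.shmem_randn(n)

# Allocating pattern (current behavior): a + b returns an ordinary Array,
# so multiplying the result by a shared matrix forces a fresh shared allocation.
c = a + b

# In-place pattern (proposed): accumulate b into a's existing shared buffer
# and reuse a in the subsequent multiply; no new shared memory is set up.
for i in 1:length(a)
    a.s[i] += b.s[i]
end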
