
What's best practice for setup cluster? #3180

Open
cmsxbc opened this issue Jul 5, 2022 · 3 comments

Comments


cmsxbc commented Jul 5, 2022

Environment: Python 3.7.11, Mars 0.9.0

import mars
import mars.tensor as mt

n = 20000                  # matrix dimension
n_worker = 1               # number of Mars workers
n_cpu = 1                  # number of CPUs
mem_bytes = 20 * 2 ** 30   # 20 GiB memory limit

# start a local Mars cluster
mars.new_session(init_local=True, n_worker=n_worker, n_cpu=n_cpu, mem_bytes=mem_bytes)

X = mt.random.RandomState(0).rand(n, n)
invX = mt.linalg.inv(X).execute()

While executing the script above, the speed correlates positively with n_cpu and negatively with n_worker.
Is this the expected result, or did I do something wrong?
If it is expected, does that mean I should always choose 1 worker with multiple CPUs instead of multiple workers with 1 CPU each?
And what should I do if I can run on multiple machines, but each machine has only a few CPU cores?

Samples: execution time in seconds.

| Workers \ CPUs | 1 | 2 | 4 | 8 | 16 |
| --- | --- | --- | --- | --- | --- |
| 1 | 1100 | 546 | 291 | 184 | 124 |
| 2 | \ | 857 | 360 | 211 | 150 |
| 4 | \ | \ | 836 | 383 | 234 |
| 8 | \ | \ | \ | 417 | 310 |
| 16 | \ | \ | \ | \ | 555 |
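To quantify the trend in the table, here is a rough parallel-efficiency calculation in plain Python (my own illustration, not part of the original report), taking the 1 worker × 1 CPU run (1100 s) as the baseline and comparing three 16-core layouts:

```python
# Parallel efficiency relative to the 1 worker x 1 CPU baseline,
# using the timings from the table above. Total cores = workers * cpus.
baseline = 1100.0  # seconds, 1 worker x 1 CPU

timings = {
    (1, 16): 124.0,  # 1 worker with 16 CPUs
    (16, 1): 555.0,  # 16 workers with 1 CPU each
    (4, 4): 836.0,   # 4 workers with 4 CPUs each
}

for (workers, cpus), secs in timings.items():
    cores = workers * cpus
    efficiency = baseline / (secs * cores)
    print(f"{workers} workers x {cpus} CPUs: {efficiency:.0%} efficiency")
```

All three configurations use 16 cores in total, yet efficiency drops sharply as the same cores are spread over more workers, which matches the observation that n_worker correlates negatively with speed.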
@cmsxbc changed the title to "What's best practice for setup cluster?" on Jul 5, 2022
Collaborator

qinxuye commented Jul 6, 2022

Nice question. Performance may be related to many factors; it seems the implementation of mt.linalg.inv does not scale well as the number of workers grows.

We need to do some investigation to see what happened.

Author

cmsxbc commented Jul 7, 2022

@qinxuye I also found very heavy network traffic. When running mt.linalg.inv on a 20k square matrix (about 2.9 GiB), roughly 40 to 50 GiB of data is transmitted between nodes, across cluster sizes from 2 workers to 16 workers.
I deployed the cluster on the Ray backend. As far as I know, Ray's background network throughput is only about 10-20 KiB/s.
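For reference, one way to cross-check traffic numbers like these is to sample the host's interface counters before and after the execute() call. This is a Linux-only sketch of my own (it reads /proc/net/dev directly and is not part of Mars or Ray); it would need to be run on each node:

```python
def read_net_bytes():
    """Return (rx_bytes, tx_bytes) summed over all interfaces (Linux only)."""
    rx_total, tx_total = 0, 0
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:      # skip the two header lines
            fields = line.split(":")[1].split()
            rx_total += int(fields[0])      # received bytes
            tx_total += int(fields[8])      # transmitted bytes
    return rx_total, tx_total

before_rx, before_tx = read_net_bytes()
# ... run the Mars computation here, e.g. mt.linalg.inv(X).execute() ...
after_rx, after_tx = read_net_bytes()
print(f"received {after_rx - before_rx} B, sent {after_tx - before_tx} B")
```

The deltas include all traffic on the host during the run, so the measurement is an upper bound rather than an exact per-operation figure.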

Collaborator

qinxuye commented Jul 7, 2022

@cmsxbc Do you mind joining our Slack? https://join.slack.com/t/mars-computing/shared_invite/zt-1c39tdh83-K1AT9FmtKkUgOzmM6~Nwbg

That way we can learn more about the problem you are solving.
