Running kmeans executable in distributed Environment #159

Open
luca-filipponi opened this issue Sep 12, 2014 · 2 comments

Comments

@luca-filipponi

GraphLab offers a kmeans executable to perform clustering. I've tried it on a single node and it works perfectly. My question is: how can I run it in a distributed environment?
I've created two virtual machines on the same network, and each one's IP address is reachable from the other (I tested this with ping). Each machine has the kmeans executable compiled from the GraphLab source.
I've seen in the official documentation that the command for running kmeans in distributed mode is:

mpiexec -n [N machines] --hostfile [host file] ./kmeans ....

What should the hostfile look like?
Has anyone ever run kmeans using MPI?

Thanks in advance for the help.

@unmeshvrije

The host file should contain the IP addresses of the nodes you intend to form the cluster with.
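As a rough illustration of the above, here is a minimal sketch that writes a two-node host file and prints the matching mpiexec command. The addresses 192.168.1.10 and 192.168.1.11 and the file name hostfile are placeholders, not values from this thread, and the one-address-per-line layout is an assumption based on the plain host file format that common MPI launchers accept; check your MPI implementation's documentation for extras such as per-host slot counts.

```shell
# Placeholder addresses for the two nodes; replace with your own.
printf '%s\n' 192.168.1.10 192.168.1.11 > hostfile

# One process per listed machine, so -n matches the number of lines.
N=$(wc -l < hostfile | tr -d ' ')

# Print the command rather than run it; the kmeans arguments are left elided.
echo "mpiexec -n $N --hostfile hostfile ./kmeans ...."
```

The command is only echoed here so the snippet can be tried without an MPI installation; drop the echo and fill in the kmeans options to actually launch the job.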

@luca-filipponi
Author

OK, I didn't think it was that easy. I created a host file with the two IP addresses and used the command:

mpiexec -n 2 --hostfile input&result/hostfile ./kmeans --data=input&result/tfidf --clusters=5 --output-clusters=input&result/centroid.txt --output-data=input&result/clusteredPoint.txt --sparse=1 --id=1

And it all works perfectly, thank you.
