Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel Computation of Transition Matrix #19

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

benassa-de-glassa
Copy link
Contributor

can compute the Transition Matrix in Parallel
requires the pathos library

can compute the Transition Matrix in Parallel
requires pathos
Copy link
Collaborator

@verginer verginer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix undefined variable n_cores.


edges_length= len(self.edges)

n_chunks = n_cores
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will raise an error, since n_cores is not defined in this scope, I guess you wanted to write n_chunks = self.cores

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right, thank you

@verginer
Copy link
Collaborator

verginer commented Nov 20, 2017

@benassa-de-glassa thanks for the nice contribution. Do you have any idea what the speedup is and at what scale (e.g. density, number of nodes, ...)?

I have run it on a toy model and due to the overhead it actually slowed down.

Copy link
Contributor Author

@benassa-de-glassa benassa-de-glassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed n_cores to self.cores


edges_length= len(self.edges)

n_chunks = n_cores
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right, thank you

@verginer
Copy link
Collaborator

Does pathos add some features, which are not available in the standard multiprocessing library? Or would it be possible to avoid this dependency?

@benassa-de-glassa
Copy link
Contributor Author

@verginer It actually slows down for smaller networks. But for the Entropy Growth Rate Ratio, at some point the transition matrix of size 16k x 16k is calculated. In that case, when using 6 cores instead of 1, the time to calculate went from ~15 minutes to ~6.5 minutes

@verginer
Copy link
Collaborator

verginer commented Nov 20, 2017

To handle small and large networks it might be good to have a heuristic (e.g. number of edges) to choose whether to compute it sequentially or in parallel if the network is sufficiently large.

@benassa-de-glassa
Copy link
Contributor Author

pathos improves on multiprocessing, at least, that's what they say on their pypi site.
But I mostly used it because I'm somewhat familiar with it. I'm right now reading up on the standard library and it looks similiar enough to be interchangeable

@benassa-de-glassa
Copy link
Contributor Author

you're certainly right, one should calculate smaller matrices sequentially. It's on my todo list

removed pathos, so that the standard multiprocessing library can be used, thereby removing the dependency
@IngoScholtes
Copy link
Owner

Thanks, Benedikt!

Awesome work, that is something we were actually missing. We will see how to best integrate it without creating problems in deployment.

@benassa-de-glassa
Copy link
Contributor Author

Simon just made me aware, that I made a mistake while importing some files. This means, I tested the wrong files and that the way I implemented the Pool using standard multiprocessing does not work due to a pickling issue and will still require pathos.
I'm very sorry for this mistake

@verginer
Copy link
Collaborator

verginer commented Nov 25, 2017

What is the problem exactly? I am able to use the multiprocessing version using the tests in tests/test_estimation.py (with 3 cores) and it correctly executes with the expected result? Do you mean that the pickling does not work for large objects?

@benassa-de-glassa
Copy link
Contributor Author

benassa-de-glassa commented Nov 25, 2017

I ran the test too, and got no error.
Until I changed the the file in the site-packages. I believe that the import in test_estimation.py uses the site-package of pathpy instead of the package in the folder which it is saved in, which is why I had not noticed until Simon pointed it out to me

I have attached the output from running pytest with the new file in the site-package

pytest.txt

@verginer
Copy link
Collaborator

verginer commented Nov 25, 2017

Ok, I have found the problem. Your implementation is correct and yes there is a problem with pickling. However that is only a problem when using python 3.4 or earlier on 3.5 and 3.6 it works.
I think the older pickling protocol 3 or 2 is used which has some problems.

The question now is if we want to support 3.4 which is no longer officially supported by the python core developers.

For a detail comparison of the tests under 3.4, 3.5 and 3.6 you can see the output from travsi here:

https://travis-ci.org/verginer/pathpy-1/builds/307186422

correction:
deprecation of 3.4 is planned for January 2018

not closing the pool lets the processes stay alive, even after they finished their task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants