New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parallel Computation of Transition Matrix #19
base: master
Are you sure you want to change the base?
parallel Computation of Transition Matrix #19
Conversation
can compute the Transition Matrix in Parallel requires pathos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix undefined variable n_cores
.
pathpy/HigherOrderNetwork.py
Outdated
|
||
edges_length= len(self.edges) | ||
|
||
n_chunks = n_cores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will raise an error, since n_cores
is not defined in this scope, I guess you wanted to write n_chunks = self.cores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you're right, thank you
@benassa-de-glassa thanks for the nice contribution. Do you have any idea what the speedup is and at what scale (e.g. density, number of nodes, ...)? I have run it on a toy model and due to the overhead it actually slowed down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed n_cores to self.cores
pathpy/HigherOrderNetwork.py
Outdated
|
||
edges_length= len(self.edges) | ||
|
||
n_chunks = n_cores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you're right, thank you
Does |
@verginer It actually slows down for smaller networks. But for the Entropy Growth Rate Ratio, at some point the transition matrix of size 16k x 16k is calculated. In that case, when using 6 cores instead of 1, the time to calculate went from ~15 minutes to ~6.5 minutes |
To handle small and large networks it might be good to have a heuristic (e.g. number of edges) to choose whether to compute it sequentially or in parallel if the network is sufficiently large. |
pathos improves on multiprocessing, at least, that's what they say on their pypi site. |
you're certainly right, one should calculate smaller matrices sequentially. It's on my todo list |
removed pathos, so that the standard multiprocessing library can be used, thereby removing the dependency
Thanks, Benedikt! Awesome work, that is something we were actually missing. We will see how to best integrate it without creating problems in deployment. |
Simon just made me aware, that I made a mistake while importing some files. This means, I tested the wrong files and that the way I implemented the Pool using standard multiprocessing does not work due to a pickling issue and will still require pathos. |
What is the problem exactly? I am able to use the multiprocessing version using the tests in |
I ran the test too, and got no error. I have attached the output from running pytest with the new file in the site-package |
Ok, I have found the problem. Your implementation is correct and yes there is a problem with pickling. However that is only a problem when using python 3.4 or earlier on 3.5 and 3.6 it works. The question now is if we want to support 3.4 which is no longer officially supported by the python core developers. For a detail comparison of the tests under 3.4, 3.5 and 3.6 you can see the output from travsi here: https://travis-ci.org/verginer/pathpy-1/builds/307186422 correction: |
not closing the pool lets the processes stay alive, even after they finished their task
can compute the Transition Matrix in Parallel
requires the pathos library