A faster way to add edges and build the network in the Python API? #108
Great suggestion to support passing a list of lists or a numpy array directly to C++. It doesn't seem trivial, so we will come back to this later!
No stress, but looking at your commit from 27 days ago, @antoneri, it seems you have made some progress on this. My colleagues and I are using Infomap as a component in infostop to do stop location detection on human mobility data (a COVID-19 project assessing mobility and social distancing). It involves clustering some pretty big, dense networks with Infomap, and this single step is a huge bottleneck (at least a 50% slowdown). So if there's any chance you have time to work on this in the near future, I would be really grateful.
Thanks for reminding us, @ulfaslak!
Can you give an example of your intended usage, @ulfaslak?
Sure. I'm using Infomap for spatial clustering in my "Infostop" module. Repository here. I have a function
Check the implementation here. While it would be awesome if there was a way to bulk-add this sparse adjacency matrix object, I understand that it's not super general and therefore probably not a sensible thing to implement. BUT what I do think should exist is an
If that existed, I could probably do some numpy optimization magic to quickly convert my sparse adjacency matrix into an edge list like that and then bulk-add my edges to create an
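One way to do that conversion without a Python loop is sketched below. It assumes the adjacency matrix is a SciPy CSR matrix `A` whose `A.indptr`, `A.indices` and `A.data` arrays are passed in; the helper name `csr_to_edgelist` is hypothetical. Row `i` owns the slice `indptr[i]:indptr[i+1]`, so repeating each row index once per stored entry in that row yields the source column.

```python
import numpy as np


def csr_to_edgelist(indptr, indices, data):
    """Hypothetical helper: turn CSR arrays (as stored by scipy.sparse.csr_matrix)
    into (sources, targets, weights) arrays without a Python-level loop."""
    indptr = np.asarray(indptr)
    targets = np.asarray(indices)
    weights = np.asarray(data, dtype=float)
    n_rows = len(indptr) - 1
    # np.diff(indptr) is the number of stored entries per row; repeating each
    # row index that many times gives the source node of every entry.
    sources = np.repeat(np.arange(n_rows), np.diff(indptr))
    return sources, targets, weights


# 3x3 matrix with entries (0,1)=2.0, (0,2)=0.5 and (2,0)=1.0; row 1 is empty.
src, dst, w = csr_to_edgelist([0, 2, 2, 3], [1, 2, 0], [2.0, 0.5, 1.0])
```

For a SciPy matrix, `A.tocoo()` exposes `row`, `col` and `data` arrays directly, which is an equally valid route to the same three columns.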
:(
Sorry, I had forgotten about this, the

```python
for link in links:
    self.add_link(*link)
```

That would work on a 2D numpy array too, and might be good enough if you can export an edge list as you mentioned, @ulfaslak?
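A minimal sketch of that loop as a standalone helper, assuming only that the network object has an `add_link(source, target[, weight])` method as used above; the name `add_edges` is hypothetical. Calling `tolist()` on a numpy array first turns its rows into plain Python lists, which are cheaper to iterate than array rows, and the explicit `int`/`float` casts guard against float-typed node IDs coming out of a float array. It still makes one call per link, so it does not remove the per-call overhead into the C++ module.

```python
import numpy as np


def add_edges(network, edges):
    """Hypothetical helper: call network.add_link(source, target[, weight]) per row.

    `edges` may be a list of lists/tuples or a 2-D numpy array with 2 or 3 columns.
    Returns the number of links added.
    """
    if isinstance(edges, np.ndarray):
        # Plain Python lists iterate faster than numpy rows and hold
        # Python ints/floats rather than numpy scalar types.
        edges = edges.tolist()
    for row in edges:
        source, target, *rest = row
        # Node IDs must be integers even if the array dtype was float.
        network.add_link(int(source), int(target), *map(float, rest))
    return len(edges)
```

With an `Infomap` instance `im`, usage would be `add_edges(im, edges)` before calling `im.run()`.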
Hi Daniel!
UPDATED: Hi ulfaslak! I ran into the same problem and tried adding the links one at a time in a C++ loop, simply by adding a new interface to the C++ source code and recompiling it. I tested the modified function on a graph with 62,627,641 links. On my server, the default
That's what I did to speed up adding links. Please correct me if there are any errors. :)
Hi @xiangyh9988! This is nice. Thanks for reporting the execution times! Looking closer at
Thanks for the kind words. At first I didn't notice that there was already an official implementation with a C++ loop in #103, so I tried to implement it myself. Since that work is done, there seems to be no need to open a pull request now.
Following these steps, the times for adding 3,127,126 links were as follows: infomap: 2.6.1
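To reproduce comparisons like these on your own data, a small timing harness may help. This is a sketch, not the benchmark used above: `compare_insertion` is a hypothetical helper, and each strategy is any callable taking a fresh network and the edge list, for example a per-link `add_link` loop versus a single `im.add_links(edges)` call (assuming `add_links` accepts an iterable of link tuples, as the issue description suggests).

```python
import time


def compare_insertion(make_network, edges, strategies):
    """Hypothetical helper: time each insertion strategy on a fresh network.

    make_network: zero-argument callable returning an empty network.
    strategies: mapping of name -> callable(network, edges).
    Returns a dict of name -> elapsed wall-clock seconds.
    """
    timings = {}
    for name, insert in strategies.items():
        network = make_network()  # fresh network so runs don't interfere
        start = time.perf_counter()
        insert(network, edges)
        timings[name] = time.perf_counter() - start
    return timings
```

Single runs are noisy, so repeating each strategy a few times and taking the minimum gives a fairer comparison.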
When clustering big networks using the Python API, I often find that creating the network (adding links by `for`-looping over my edge list and inserting them into the network with `im.add_link` or `im.addLink`) takes more time than actually running Infomap.

I assume it is slow because every `im.addLink` call sends data to the `_infomap` C++ module. The `im.add_links` method doesn't help much; it basically just loops over `add_link`. There is a method of `_infomap` called `StateNetwork_addLink`, which is called when `addLink` is called. Without knowing much about the workings of the C++ core: could there not be a method called `StateNetwork_addLinks` that lets us pass, e.g., a list of lists or a numpy array of edges into the `_infomap` object in C++, so that a bulk of edges can be added more efficiently?

For the data I'm working with right now, this would give about a 2x speedup.
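`StateNetwork_addLinks` is only proposed here, so the sketch below illustrates just the Python side of the idea. It assumes a bulk C++ method would want one contiguous `(m, 3)` array of `[source, target, weight]` rows; the helper `to_edge_array` is hypothetical. It accepts either a list of lists or a 2-D numpy array and fills missing weights with a default.

```python
import numpy as np


def to_edge_array(links, default_weight=1.0):
    """Hypothetical helper: normalize links into a contiguous (m, 3) float64 array.

    Accepts [source, target] or [source, target, weight] rows, as a list of
    lists or a 2-D numpy array. Missing weights are filled with default_weight.
    """
    arr = np.asarray(links, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] not in (2, 3):
        raise ValueError("expected an (m, 2) or (m, 3) array of links")
    if arr.shape[1] == 2:
        # Append a constant weight column for unweighted edge lists.
        weights = np.full((arr.shape[0], 1), default_weight)
        arr = np.hstack((arr, weights))
    # A single C-contiguous buffer is what a bulk binding could read in one pass.
    return np.ascontiguousarray(arr)
```

Storing node IDs as `float64` is exact only up to 2**53, which covers any realistic node count; a real binding might prefer separate integer ID arrays and a float weight array.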