This issue was moved to a discussion.

Reproduce detectCommunity #1219

Closed
ychen983384 opened this issue May 14, 2024 · 5 comments

Comments

@ychen983384

Repeated runs on the same graph generate different results. Is there a way to make the results reproducible?

first run

Communities = nk.community.detectCommunities(GData, algo=nk.community.PLM(G=GData,refine=True, gamma=0.5))
Communities detected in 1.21094 [s]
solution properties:


communities 10
min community size 12562
max community size 41579
avg. community size 24544.8
imbalance 1.69399
edge cut 404216
edge cut (portion) 0.054895
modularity 0.801823

second run

Communities = nk.community.detectCommunities(GData, algo=nk.community.PLM(G=GData,refine=True, gamma=0.5))
Communities detected in 1.18332 [s]
solution properties:


communities 11
min community size 13536
max community size 43623
avg. community size 22313.5
imbalance 1.95496
edge cut 426229
edge cut (portion) 0.0578845
modularity 0.808974

@clstaudt
Collaborator

@ychen983384 Have you tried turning off parallelism?

PLM(G, par="none")

@ychen983384
Author

ychen983384 commented May 22, 2024

> @ychen983384 Have you tried turning off parallelism?
>
> PLM(G, par="none")

@clstaudt
I have not tried that. I tried NetworKit's PLM because I have a large graph to work with (~15 million vertices) and hoped the parallelism would shorten the running time of community detection. I will test turning off parallelism. Are you suggesting that the results are not reproducible with parallelism on?

@fabratu
Member

fabratu commented May 23, 2024

Yes, with parallelism the order of execution (in terms of nodes moving from one cluster to another) is not deterministic. Without parallelism you get the same result for consecutive runs:

import networkit as nk

# Random test graph: Barabasi-Albert generator with k=5 and 100 nodes
G = nk.generators.BarabasiAlbertGenerator(5, 100).generate()

# Run PLM twice per parallelization strategy and compare the label vectors
for par_strat in ["none", "balanced"]:
    plm = nk.community.PLM(G, par=par_strat)
    plm.run()
    par1 = plm.getPartition().getVector()

    plm = nk.community.PLM(G, par=par_strat)
    plm.run()
    par2 = plm.getPartition().getVector()

    print(par1 == par2)

The output is True for "none" and False for "balanced".
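Note that comparing the raw vectors is a strict check: two runs that group the nodes identically but number the communities differently would still compare as unequal. A small sketch of a comparison that ignores the label values (the helper names `canonical_labels` and `same_partition` are hypothetical, not NetworKit API):

```python
def canonical_labels(vector):
    """Renumber community ids by order of first appearance (0, 1, 2, ...)."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in vector]

def same_partition(vec_a, vec_b):
    """True if both vectors group the nodes identically, whatever the ids are."""
    return canonical_labels(vec_a) == canonical_labels(vec_b)

# Same grouping, different community ids
print(same_partition([5, 5, 2, 2, 7], [1, 1, 0, 0, 3]))
# Different grouping
print(same_partition([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]))
```

Such a check can be applied to the vectors returned by `getPartition().getVector()` to separate a pure relabeling from a real change in the communities.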

@clstaudt
Collaborator

@ychen983384 As @fabratu has explained, the parallelism makes it nondeterministic. Turning off parallelism for your graph kind of defeats the purpose of using this particular algorithm. We never considered determinism to be an important property when building it.
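As a toy illustration of why the visiting order matters, here is a sketch in plain Python. It uses a single sweep of asynchronous label propagation on a four-node path, which is a deliberately simpler stand-in and not the PLM implementation; the function names are hypothetical. With a fixed order the result repeats, while a different order can yield a different partition:

```python
from collections import Counter

def sweep(adjacency, order):
    """One asynchronous label-propagation sweep, visiting nodes in `order`."""
    labels = {node: node for node in adjacency}  # every node starts alone
    for node in order:
        counts = Counter(labels[nb] for nb in adjacency[node])
        best = max(counts.values())
        # Ties go to the smallest label, so only the order can change the result
        labels[node] = min(lab for lab, c in counts.items() if c == best)
    return labels

def as_groups(labels):
    """Return the partition as a set of frozensets, ignoring label values."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return {frozenset(g) for g in groups.values()}

# Path graph 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

run_a = as_groups(sweep(path, [0, 1, 2, 3]))
run_b = as_groups(sweep(path, [0, 1, 2, 3]))  # same order again
run_c = as_groups(sweep(path, [0, 3, 1, 2]))  # different order

print(run_a == run_b, run_a == run_c)  # prints: True False
```

In a parallel run the effective order is decided by thread scheduling, so it can differ from one execution to the next.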

@ychen983384
Author

> @ychen983384 As @fabratu has explained, the parallelism makes it nondeterministic. Turning off parallelism for your graph kind of defeats the purpose of using this particular algorithm. We never considered determinism to be an important property when building it.

@clstaudt @fabratu Thank you both for the clear explanation and for the great tools you have developed. The scientific community and journals increasingly emphasize the reproducibility of both computational and experimental research, which may make determinism more important. With determinism, the parallel tools you have developed, such as PLM and parLeiden, would be more attractive to the scientific community, since single-cell multi-omics technologies produce huge data sets. Is it feasible to implement these community detection algorithms (such as Louvain and Leiden) with both parallelism and determinism, or is it theoretically impossible?

@networkit networkit locked and limited conversation to collaborators May 24, 2024
@fabratu fabratu converted this issue into discussion #1227 May 24, 2024
