Arguments for running Louvain clustering #915

Open
sg0 opened this issue Jun 10, 2022 · 4 comments
Labels
❓ question Usage or code base related questions.

Comments

sg0 commented Jun 10, 2022

I am trying to run Louvain for the default 10 phases and a maximum of 100 iterations/phase. I am passing the following options:

$BIN_PATH/./louvain --advance-mode=ALL_EDGES --max-iters=100 --iter-th=1e-6 --pass-th=1e-6 --1st-th=1e-6 --graph-type=market --graph-file=$file

I expected the program to stop once the criteria are satisfied, that is, to exit a phase when the iteration threshold is met and to exit entirely when the change between the previous and current phase is less than or equal to 1E-06. Instead, it seems the code runs for 100 phases and 100 iterations per phase. A snapshot of the output is provided below; please advise.

--------------------------
Run 0 elapsed: 98370.742188 ms, q = 0.661231
==============================================
64bit-VertexT=false 64bit-SizeT=false 64bit-ValueT=true undirected=false unify-segments=0 advance-mode=ALL_EDGES omp-threads=0 1st-th=1e-6 neighborcomm-th=-1
__________________________
#threads = 16, 1st-th = 0.000001
--------------------------
Run 0 elapsed: 12942.558594 ms, q = 0.674211
Community Validity: PASS
Computed: #communities = 19.000000, modularity = 0.674211
Reference: #communities = 43.000000, modularity = 0.661231
Using advance mode ALL_EDGES
Using filter mode CULL
__________________________
--------------------------
Run 0 elapsed: 258337.101936 ms, #passes = 100
Community Validity: PASS
Computed: #communities = 4740.000000, modularity = 0.162310
Reference: #communities = 43.000000, modularity = 0.661231
[louvain] finished.
 avg. elapsed: 258337.101936 ms
 iterations: 100
 min. elapsed: 258337.101936 ms
 max. elapsed: 258337.101936 ms
 load time: 8893.48 ms
 preprocess time: 2305.970000 ms
 postprocess time: 2059.674978 ms
 total time: 262710.861921 ms
-----------------------------------
-----------------------------------
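
For reference, the stopping behavior I expected can be sketched as two nested loops, each with its own cap and its own gain threshold. This is only a minimal illustration of the expectation described above, not Gunrock's enactor code: StopConfig, RunStats, run_louvain_loop and the gain_of callback are hypothetical names, and the choice to count the final sub-threshold gain is an assumption.

```cpp
#include <cassert>
#include <functional>

// Hypothetical settings mirroring the options in the command above.
struct StopConfig {
  int max_passes = 10;          // assumed default number of passes (phases)
  int max_iters = 100;          // cap on iterations within one pass
  double iter_threshold = 1e-6; // stop iterating when one iteration gains less
  double pass_threshold = 1e-6; // stop entirely when one pass gains less
};

struct RunStats {
  int passes;        // passes actually executed
  int total_iters;   // iterations summed over all passes
  double modularity; // accumulated gain (illustrative only)
};

// gain_of(pass, iter) is a stand-in for one local-move iteration and
// returns the modularity gain that iteration produced.
RunStats run_louvain_loop(const StopConfig& cfg,
                          const std::function<double(int, int)>& gain_of) {
  assert(cfg.max_passes > 0 && cfg.max_iters > 0);
  RunStats stats{0, 0, 0.0};
  for (int pass = 0; pass < cfg.max_passes; ++pass) {
    double pass_gain = 0.0;
    for (int iter = 0; iter < cfg.max_iters; ++iter) {
      const double gain = gain_of(pass, iter);
      ++stats.total_iters;
      pass_gain += gain;
      if (gain < cfg.iter_threshold) break; // iteration-level exit
    }
    ++stats.passes;
    stats.modularity += pass_gain;
    if (pass_gain < cfg.pass_threshold) break; // pass-level exit
  }
  return stats;
}
```

Under this sketch, a run whose gains dry up early finishes well before either cap is reached, and max_iters never controls the number of passes.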

Also, another question: if I add --device=4 to the arguments above, will Louvain run on 4 devices (as set by CUDA_VISIBLE_DEVICES)?

@sg0 sg0 added the ❓ question Usage or code base related questions. label Jun 10, 2022
Contributor

crozhon commented Jun 10, 2022

It looks like it's trying to do what you want (the comparison uses max-iters and threshold). You'll have to dig into the code to see what happens.

Looks like it's around here?

iter_num >= enactor.max_iters ||

Author

sg0 commented Jun 10, 2022

OK, but why does #passes = 100 show up in the output above? Shouldn't it be the default of 10?
Aside from that, Louvain should stop once the user-specified threshold criterion is met (otherwise, what is the point of passing a threshold?).

(enactor.neighborcomm_threshold > 0 && iter_num != 1 &&

Instead of enactor.neighborcomm_threshold > 0 above, it seems it should be enactor.neighborcomm_threshold >= iter-th.

Contributor

crozhon commented Jun 10, 2022

Yes, I only mean it's not intended behavior.

Instead of enactor.neighborcomm_threshold > 0 above, it should be enactor.neighborcomm_threshold >= iter-th it seems.

Thanks for catching.

Author

sg0 commented Jun 10, 2022

However, the suggestion above will likely save only a few initial passes, if any (by exiting the iteration loop early). This condition, on the other hand, should work most of the time:

if ((pass_num != 0 && iter_gain[0] < enactor.iter_gain_threshold) ||

Looking at the output below, here are possible reasons why Louvain runs for all the passes (note that #iter is high only for the first few phases and diminishes in later passes, as expected):

  1. Something is wrong with the pass/phase exit criteria (a phase exit condition like if (pass_gain - iter_gain[0] <= pass_gain_threshold) seems alright). Since the gains are accumulated, I understand that the q value in the output below is erroneous (it must be within 1), which is fine. However, the output below was produced with all thresholds set to 1E-4 (and --max-iters=100; I am not sure why max-iters is treated as the maximum number of passes). Subtracting the pass gains of the 98th and 99th passes gives about 3E-5, so the run should not have reached this point if the pass exit criterion were correct.

  2. Slow gain growth (which does not match the CPU reference), meaning there is still a noticeable difference in gains across iterations. Why is iter_gain[0] doubled?

...
pass 90, #v = 4280 -> 4279, #e = 27634800 -> 27526013, #iter = 2, q = 101.243059, pass_gain = 0.001350, elapsed = 539.814949
pass 91, #v = 4279 -> 4278, #e = 27526013 -> 27450275, #iter = 2, q = 101.244358, pass_gain = 0.001299, elapsed = 537.569046
pass 92, #v = 4278 -> 4277, #e = 27450275 -> 27347230, #iter = 2, q = 101.245642, pass_gain = 0.001284, elapsed = 535.786867
pass 93, #v = 4277 -> 4276, #e = 27347230 -> 27274500, #iter = 2, q = 101.246925, pass_gain = 0.001283, elapsed = 536.940098
pass 94, #v = 4276 -> 4275, #e = 27274500 -> 27204445, #iter = 2, q = 101.248201, pass_gain = 0.001277, elapsed = 533.104181
pass 95, #v = 4275 -> 4274, #e = 27204445 -> 27095860, #iter = 2, q = 101.249450, pass_gain = 0.001249, elapsed = 530.736923
pass 96, #v = 4274 -> 4273, #e = 27095860 -> 27023434, #iter = 2, q = 101.250696, pass_gain = 0.001246, elapsed = 528.774023
pass 97, #v = 4273 -> 4272, #e = 27023434 -> 26908417, #iter = 2, q = 101.251941, pass_gain = 0.001245, elapsed = 529.470921
pass 98, #v = 4272 -> 4271, #e = 26908417 -> 26784030, #iter = 2, q = 101.253179, pass_gain = 0.001238, elapsed = 523.810148
pass 99, #v = 4271 -> 4270, #e = 26784030 -> 26715493, #iter = 2, q = 101.254386, pass_gain = 0.001207, elapsed = 521.795034
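
To make the arithmetic in point 1 concrete, here is a small sketch that applies two possible readings of a pass threshold to the pass_gain column logged above: an absolute check (the pass gain itself falls below the threshold) and a delta check (the change in pass gain between consecutive passes falls below it). The function names are hypothetical, and which reading, if either, matches the actual exit condition in the code is an assumption I have not verified.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// pass_gain values copied from the log excerpt above (passes 90 to 99).
std::vector<double> logged_pass_gains() {
  return {0.001350, 0.001299, 0.001284, 0.001283, 0.001277,
          0.001249, 0.001246, 0.001245, 0.001238, 0.001207};
}

// Absolute reading: index of the first pass whose gain is below the
// threshold, or -1 if no logged pass qualifies.
int first_below_absolute(const std::vector<double>& gains, double threshold) {
  assert(threshold > 0.0);
  for (std::size_t i = 0; i < gains.size(); ++i) {
    if (gains[i] < threshold) return static_cast<int>(i);
  }
  return -1;
}

// Delta reading: index of the first pass whose gain differs from the
// previous pass's gain by less than the threshold, or -1 if none does.
int first_below_delta(const std::vector<double>& gains, double threshold) {
  assert(threshold > 0.0);
  for (std::size_t i = 1; i < gains.size(); ++i) {
    if (std::fabs(gains[i] - gains[i - 1]) < threshold) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With a threshold of 1E-4, first_below_absolute returns -1 for these ten passes (every logged pass_gain is above 1E-3), while first_below_delta returns 1, which corresponds to pass 91. So the run reaching pass 99 is consistent with an absolute check, but not with a delta check.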
