Trouble converging pagerank algorithm on any size dataset #13

Open
kulkarpo opened this issue Mar 12, 2020 · 7 comments

Comments

@kulkarpo

Hello,
There is no error that I can log here, but I have followed the documentation as-is.
Test machine: a small Ubuntu instance with 4.9 GB RAM, 2 vCPUs, and a 10 GB SSD.

I have tried with LiveJournal and Zachary's karate dataset (a tiny dataset with 35 vertices):
sudo ./build/bin/pagerank ./data/zachary_grid 1 1

The PageRank program does not converge or terminate.
Do you have an idea of what I could be missing?

@coolerzxw
Member

Hi. Can you check whether the grid formatted data is correct or not? You may do this by printing inside the UDF of stream_edges.

@kulkarpo
Author

I'm not entirely sure how to validate the grid-formatted data, but I did the following:
-> checked that all the partitions are being read (begin and end vertex of each partition) in the UDF of stream_edges,
-> printed the pre and post (begin_vid, end_vid) pairs.
PageRank never reaches "post_source_window" to print the latter (graph.cpp:422).

Another point of validation: bfs and wcc work as expected with the same grid.

Looks like pagerank is stuck while waiting for threads (in stream_edges) to terminate (graph.cpp:419).

-Thanks

@coolerzxw
Member

> looks like pagerank is stuck while waiting for threads (in stream_edges) to terminate (graph.cpp:419)

What is the CPU usage when this happens?

@kulkarpo
Author

> > looks like pagerank is stuck while waiting for threads (in stream_edges) to terminate (graph.cpp:419)
>
> What is the CPU usage when this happens?

100%

@kulkarpo
Author

Just a thought: could an inf value be causing the issue?

In my dataset, a few vertices have degree 0.
Because of this, the pagerank initialisation values (pagerank.cpp:52) contain inf.

Initialising the pagerank vector to (1/vertices) solves my issue.

Correct me if I'm wrong: as far as I know, this initialisation scheme is standard (it makes the pagerank values of the nodes sum to 1).
But if this is indeed the case, it's strange that it has worked so far.

Please let me know your thoughts on this.
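
The effect described in this comment can be reproduced outside the project. The following is a minimal, self-contained sketch, not the project's code: `init_by_degree`, `init_uniform`, and `has_non_finite` are hypothetical helpers, and it assumes the original initialisation divides 1 by each vertex's degree on a platform with IEEE 754 floats. It only shows that a zero degree yields inf while a uniform 1/vertices start stays finite; it does not show why the program then fails to terminate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: degree-based initial rank, as described in the thread.
// With IEEE 754 floats, dividing by a zero degree produces +inf.
std::vector<float> init_by_degree(const std::vector<long>& degree) {
    std::vector<float> rank(degree.size());
    for (std::size_t i = 0; i < degree.size(); ++i) {
        rank[i] = 1.0f / static_cast<float>(degree[i]);
    }
    return rank;
}

// Hypothetical helper: uniform initial rank, 1/vertices for every vertex,
// so the initial ranks sum to (approximately) 1.
std::vector<float> init_uniform(std::size_t vertices) {
    assert(vertices > 0);
    return std::vector<float>(vertices, 1.0f / static_cast<float>(vertices));
}

// Returns true if any value is inf or NaN.
bool has_non_finite(const std::vector<float>& values) {
    for (float v : values) {
        if (!std::isfinite(v)) return true;
    }
    return false;
}
```

Calling `has_non_finite(init_by_degree(degree))` on the real degree array before the first iteration would be one way to confirm whether a given grid is affected.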

@coolerzxw
Member

It appears that the initialization scheme causes the issue. Thanks for reporting this.

@dusollee22

Is there a version that includes a fix for this issue? I have exactly the same problem.
There is no error if I add the code below before 'stream_vertices()' (pagerank.cpp:50):
for(long i=0; i<(long)graph.vertices; i++) { degree[i] = (long)graph.vertices; }

Am I doing this right?
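
The loop above overwrites every vertex's degree, including the non-zero ones. A narrower variant, sketched here as an assumption rather than a confirmed fix, would replace only the zero degrees that the thread identifies as the source of inf. `patch_zero_degrees` is a hypothetical helper and not part of the project; whether the fallback value is appropriate depends on how the rest of pagerank.cpp uses `degree`, which the thread does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replace only zero degrees with a fallback value,
// leaving real non-zero degrees untouched. Returns how many were patched.
std::size_t patch_zero_degrees(std::vector<long>& degree, long fallback) {
    assert(fallback > 0);  // a zero fallback would reintroduce the division by zero
    std::size_t patched = 0;
    for (long& d : degree) {
        if (d == 0) {
            d = fallback;
            ++patched;
        }
    }
    return patched;
}
```

With the vertex count as the fallback, this matches the loop above on zero-degree vertices and keeps the original degree everywhere else.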
