Application gets stuck #8

Open
kartiklakhotia opened this issue Apr 28, 2018 · 2 comments

Comments

@kartiklakhotia

I am trying to use Gemini for graph processing on a single server. It gives this warning:

"A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces"

After the warning, the program usually gets stuck. Occasionally it proceeds and runs to completion, but that happens only about 1 in 10 times. I want to run multiple experiments from scripts, which is difficult when the application hangs more often than not. Please let me know how to resolve this.
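
Until the cause of the hang is found, scripted experiments could be guarded with a timeout and a retry. The following is only a sketch of that stopgap, not a fix for the hang itself. The function name `run_with_retries` and its parameters are hypothetical, and the time limits are placeholders to be tuned per graph. It sends SIGTERM before SIGKILL on the assumption that the launcher cleans up its ranks when asked to terminate.

```python
import subprocess
import sys


def run_with_retries(cmd, timeout_s, max_attempts=3, grace_s=5.0):
    """Run cmd, retrying when an attempt exceeds timeout_s seconds.

    Returns the exit code of the first attempt that finishes in time,
    or None if every attempt had to be stopped. A non-zero exit code
    (for example from a crash) is returned as-is and is not retried.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(cmd)
        try:
            return proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(
                f"attempt {attempt}/{max_attempts} exceeded {timeout_s}s; stopping it",
                file=sys.stderr,
            )
            # Ask politely first so the launcher gets a chance to clean up.
            proc.terminate()
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                # Still alive after the grace period: force it down and reap it.
                proc.kill()
                proc.wait()
    return None
```

A script would call it with the launch command as a list, for example `run_with_retries(["mpirun", "-n", "2", "./pagerank", graph_path, num_vertices, iterations], timeout_s=600)`, where the three argument names and the 600-second limit are placeholders. A return value of `None` means every attempt hung.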

@coolerzxw
Member

Hi. Can you run other, simpler MPI programs correctly (e.g. do some arithmetic and AllReduce the sum)?

@xuyinghai

xuyinghai commented May 13, 2018

My simple MPI hello world program with AllReduce works, but the following run does not. Is it because the thread support level provided is MPI_THREAD_SERIALIZED?

$ mpirun -n 2 ./pagerank ../../inputs/soc-LiveJournal1.txt.bsnap 4847571 1
thread support level provided by MPI: MPI_THREAD_SERIALIZED
|V| = 4847571, |E| = 68475391
|V'_0| = 1261568 |E^dense_0| = 43518651
|V'_1| = 3586003 |E^dense_1| = 24956740
|V'_0_0| = 1261568 |E^dense_0_0| = 43518651
|V'_1_0| = 3586003 |E^dense_1_0| = 24956740
[yuxing-desk:01757] *** Process received signal ***
[yuxing-desk:01757] Signal: Segmentation fault (11)
[yuxing-desk:01757] Signal code: Address not mapped (1)
[yuxing-desk:01757] Failing at address: 0x14ec650
[yuxing-desk:01757] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff406b0d390]
[yuxing-desk:01757] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x59)[0x7ff3f98dbc19]
[yuxing-desk:01757] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x93)[0x7ff3fa344813]
[yuxing-desk:01757] [ 3] /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so(+0x3abe)[0x7ff3fa344abe]
[yuxing-desk:01757] [ 4] /usr/lib/libopen-pal.so.13(opal_progress+0x4a)[0x7ff40602d1ea]
[yuxing-desk:01757] [ 5] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4c5)[0x7ff3f98d5745]
[yuxing-desk:01757] [ 6] /usr/lib/libmpi.so.12(PMPI_Send+0x14b)[0x7ff40753ccdb]
[yuxing-desk:01757] [ 7] ./pagerank[0x4151dd]
[yuxing-desk:01757] [ 8] ./pagerank[0x409db3]
[yuxing-desk:01757] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff406752830]
[yuxing-desk:01757] [10] ./pagerank[0x409f79]
[yuxing-desk:01757] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1757 on node yuxing-desk exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


3 participants