Segmentation fault in 3D classification in relion #1078

Open
SepidehV opened this issue Feb 6, 2024 · 5 comments

SepidehV commented Feb 6, 2024

This is a template for reporting bugs. Please fill in as much information as you can.

I keep getting a segmentation fault when running 3D classification in RELION 5. I have tried different nodes and varied the number of GPUs and MPI processes I request. I have also tried with and without Blush regularization, but the error is always the same. 2D classification and 3D refinement run on the same particles without any issues. The segmentation fault occurs as soon as 3D classification reaches iteration 2. Sometimes iteration 2 does start, but the estimated time for that iteration is 30-45 hours.

Environment:

  • OS: Linux m3q005 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • MPI runtime:
  • RELION version: relion5_beta
  • Memory: 400 or 800 GB requested
  • GPU: A16 GPUs

Dataset:

  • Box size: 300 binned to 64
  • Pixel size: 1.07
  • Number of particles: 2268196
  • Description: A membrane protein in detergent, around 200 kDa
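
For scale, the dataset numbers above can be turned into a rough size estimate. This is only a back-of-the-envelope sketch: it assumes the particles are stored as 32-bit floats and that 1.07 is the unbinned pixel size in Å, neither of which is stated above. It says nothing about how much memory relion_refine_mpi actually allocates.

```python
# Rough size estimate from the dataset numbers in this report.
# Assumptions (not stated in the report): 32-bit float pixels, and
# 1.07 is the pixel size in Angstrom before binning.
N_PARTICLES = 2268196
ORIGINAL_BOX = 300
BINNED_BOX = 64
ORIGINAL_PIXEL_SIZE = 1.07
BYTES_PER_PIXEL = 4  # assumed float32

# Binning 300 -> 64 scales the pixel size by 300/64.
binned_pixel_size = ORIGINAL_PIXEL_SIZE * ORIGINAL_BOX / BINNED_BOX
nyquist_resolution = 2 * binned_pixel_size

# Raw size of all binned particle images.
stack_bytes = N_PARTICLES * BINNED_BOX * BINNED_BOX * BYTES_PER_PIXEL

print(f"binned pixel size: {binned_pixel_size:.2f} A")
print(f"Nyquist limit: {nyquist_resolution:.2f} A")
print(f"binned particle stack: {stack_bytes / 1e9:.1f} GB")
```

Under these assumptions the binned particle images alone come to roughly 37 GB, well below the 400 or 800 GB requested.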

Job options:

  • Type of job: 3D classification
  • Number of MPI processes: I tried 5, 9, and 17.
  • Number of threads: 1, because this is how we have been told to run jobs on the cluster. I also tried 5 MPI processes with 6 threads. I have access to 56 CPUs, 16 GPUs, and 800 GB of memory.
  • Job submission: via a bash script, requesting 400 or 800 GB of memory.
  • Full command
    `which relion_refine_mpi` --o Class3D/job105/run --i Select/job088/particles.star --ref ../Project directory/InitialModel/job056/initial_model.mrc --ini_high 60 --dont_combine_weights_via_disc --pool 3 --pad 1  --ctf --iter 25 --tau2_fudge 4 --particle_diameter 200 --K 5 --flatten_solvent --zero_mask --blush  --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale  --j 1 --gpu ""  --pipeline_control Class3D/job105/
    

Error message:

*** Error in `/usr/local/relion/5.0_beta-20240104-openmpi1.10.7-mlx/bin/relion_refine_mpi': free(): invalid next size (fast): 0x000000000ccaf5b0 ***
[m3q005:11354:0] Caught signal 11 (Segmentation fault)
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x7fba8f1e6329]
/lib64/libcuda.so.1(+0x268493)[0x7fba8127d493]
/lib64/libcuda.so.1(+0x273c6a)[0x7fba81288c6a]
/lib64/libcuda.so.1(+0x262e46)[0x7fba81277e46]
/lib64/libpthread.so.0(+0x7ea5)[0x7fba9020eea5]
/lib64/libc.so.6(clone+0x6d)[0x7fba8f263b0d]
======= Memory map: ========
00400000-00c63000 r-xp 00000000 00:2a 12276904190 /usr/local/relion/5.0_beta-20240104-openmpi1.10.7-mlx/bin/relion_refine_mpi
00e63000-00e69000 r--p 00863000 00:2a 12276904190 /usr/local/relion/5.0_beta-20240104-openmpi1.10.7-mlx/bin/relion_refine_mpi
00e69000-00e6b000 rw-p 00869000 00:2a 12276904190 /usr/local/relion/5.0_beta-20240104-openmpi1.10.7-mlx/bin/relion_refine_mpi
00e6b000-00f42000 rw-p 00000000 00:00 0
0265c000-8b8e2000 rw-p 00000000 00:00 0 [heap]
200000000-200200000 ---p 00000000 00:00 0
200200000-200400000 rw-s 00000000 00:05 16656 /dev/nvidiactl
200400000-200600000 rw-s 00000000 00:05 68874 /dev/nvidia6
200600000-203600000 rw-s 00000000 00:05 16656 /dev/nvidiactl
203600000-203a00000 ---p 00000000 00:00 0
203a00000-203c00000 rw-s 00000000 00:05 16656 /dev/nvidiactl
203c00000-203e00000 rw-s 00000000 00:05 16656 /dev/nvidiactl
203e00000-204000000 rw-s 203e00000 00:05 91231 /dev/nvidia-uvm
204000000-204200000 rw-s 00000000 00:05 16656 /dev/nvidiactl
204200000-204400000 ---p 00000000 00:00 0
204400000-204600000 rw-s 00000000 00:05 16656 /dev/nvidiactl
204600000-a00200000 ---p 00000000 00:00 0
10000000000-10004000000 ---p 00000000 00:00 0
7fba36000000-7fba3a400000 ---p 00000000 00:00 0
7fba3a400000-7fba3a600000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba3a600000-7fba3a800000 rw-s 00000000 00:04 19572740 /dev/zero (deleted)
7fba3a800000-7fba3aa00000 rw-s 00000000 00:04 19572741 /dev/zero (deleted)
7fba3aa00000-7fba3b000000 ---p 00000000 00:00 0
7fba3b000000-7fba3b200000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba3b200000-7fba3b400000 rw-s 00000000 00:04 19572743 /dev/zero (deleted)
7fba3b400000-7fba3b733000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba3b733000-7fba3c000000 ---p 00000000 00:00 0
7fba3d63d000-7fba60000000 rw-p 00000000 00:00 0
7fba60000000-7fba70000000 ---p 00000000 00:00 0
7fba70e13000-7fba70e20000 rw-p 00000000 00:00 0
7fba714a7000-7fba714ce000 rw-p 00000000 00:00 0
7fba714d7000-7fba714fe000 rw-p 00000000 00:00 0
7fba71506000-7fba71520000 rw-p 00000000 00:00 0
7fba71530000-7fba7153d000 rw-p 00000000 00:00 0
7fba71550000-7fba7155d000 rw-p 00000000 00:00 0
7fba7157d000-7fba715be000 rw-p 00000000 00:00 0
7fba715c4000-7fba715de000 rw-p 00000000 00:00 0
7fba715e5000-7fba71619000 rw-p 00000000 00:00 0
7fba7161b000-7fba71705000 rw-p 00000000 00:00 0
7fba7170a000-7fba71765000 rw-p 00000000 00:00 0
7fba7193e000-7fba71958000 rw-p 00000000 00:00 0
7fba7195b000-7fba71968000 rw-p 00000000 00:00 0
7fba7196f000-7fba719bd000 rw-p 00000000 00:00 0
7fba719be000-7fba719cb000 rw-p 00000000 00:00 0
7fba719d2000-7fba71a06000 rw-p 00000000 00:00 0
7fba71a0b000-7fba71a4c000 rw-p 00000000 00:00 0
7fba71a4f000-7fba71a90000 rw-p 00000000 00:00 0
7fba71a97000-7fba71ad8000 rw-p 00000000 00:00 0
7fba71ad9000-7fba71af3000 rw-p 00000000 00:00 0
7fba71af9000-7fba71b20000 rw-p 00000000 00:00 0
7fba71b28000-7fba71b76000 rw-p 00000000 00:00 0
7fba71b81000-7fba71bc2000 rw-p 00000000 00:00 0
7fba71bc5000-7fba71c20000 rw-p 00000000 00:00 0
7fba71c22000-7fba71cbe000 rw-p 00000000 00:00 0
7fba71cc1000-7fba71d36000 rw-p 00000000 00:00 0
7fba71d41000-7fba71d9c000 rw-p 00000000 00:00 0
7fba71d9e000-7fba71e06000 rw-p 00000000 00:00 0
7fba71e0b000-7fba71e25000 rw-p 00000000 00:00 0
7fba71e2f000-7fba71ebe000 rw-p 00000000 00:00 0
7fba71ec3000-7fba71ed0000 rw-p 00000000 00:00 0
7fba71eda000-7fba71f35000 rw-p 00000000 00:00 0
7fba71f3d000-7fba71f71000 rw-p 00000000 00:00 0
7fba71f7a000-7fba71f94000 rw-p 00000000 00:00 0
7fba71fa0000-7fba71fd4000 rw-p 00000000 00:00 0
7fba71fdf000-7fba71fec000 rw-p 00000000 00:00 0
7fba71fee000-7fba7208a000 rw-p 00000000 00:00 0
7fba7208c000-7fba72142000 rw-p 00000000 00:00 0
7fba72144000-7fba7219f000 rw-p 00000000 00:00 0
7fba721a3000-7fba721d7000 rw-p 00000000 00:00 0
7fba721e3000-7fba72217000 rw-p 00000000 00:00 0
7fba72221000-7fba72248000 rw-p 00000000 00:00 0
7fba7224d000-7fba7229b000 rw-p 00000000 00:00 0
7fba722a6000-7fba722e7000 rw-p 00000000 00:00 0
7fba722f1000-7fba72318000 rw-p 00000000 00:00 0
7fba7231d000-7fba72385000 rw-p 00000000 00:00 0
7fba7238a000-7fba723b1000 rw-p 00000000 00:00 0
7fba723b4000-7fba723db000 rw-p 00000000 00:00 0
7fba723dd000-7fba7241e000 rw-p 00000000 00:00 0
7fba7241f000-7fba72439000 rw-p 00000000 00:00 0
7fba7243b000-7fba72462000 rw-p 00000000 00:00 0
7fba7246e000-7fba724a2000 rw-p 00000000 00:00 0
7fba724a6000-7fba724da000 rw-p 00000000 00:00 0
7fba724e1000-7fba724ee000 rw-p 00000000 00:00 0
7fba724f5000-7fba7251c000 rw-p 00000000 00:00 0
7fba7251d000-7fba7256b000 rw-p 00000000 00:00 0
7fba72574000-7fba72581000 rw-p 00000000 00:00 0
7fba72587000-7fba725e2000 rw-p 00000000 00:00 0
7fba725e8000-7fba725f5000 rw-p 00000000 00:00 0
7fba725f7000-7fba72679000 rw-p 00000000 00:00 0
7fba72681000-7fba72710000 rw-p 00000000 00:00 0
7fba72716000-7fba72730000 rw-p 00000000 00:00 0
7fba72738000-7fba72745000 rw-p 00000000 00:00 0
7fba72746000-7fba727c8000 rw-p 00000000 00:00 0
7fba727cb000-7fba72833000 rw-p 00000000 00:00 0
7fba72834000-7fba72841000 rw-p 00000000 00:00 0
7fba7284d000-7fba728c2000 rw-p 00000000 00:00 0
7fba728cc000-7fba728d9000 rw-p 00000000 00:00 0
7fba728e1000-7fba72970000 rw-p 00000000 00:00 0
7fba72974000-7fba729e9000 rw-p 00000000 00:00 0
7fba729ef000-7fba72a09000 rw-p 00000000 00:00 0
7fba72a12000-7fba72a6d000 rw-p 00000000 00:00 0
7fba72a79000-7fba72aad000 rw-p 00000000 00:00 0
7fba72ab5000-7fba72ac2000 rw-p 00000000 00:00 0
7fba72ac3000-7fba72add000 rw-p 00000000 00:00 0
7fba72ae3000-7fba72b31000 rw-p 00000000 00:00 0
7fba72b39000-7fba72b7a000 rw-p 00000000 00:00 0
7fba72b81000-7fba72bc2000 rw-p 00000000 00:00 0
7fba72bc4000-7fba72c60000 rw-p 00000000 00:00 0
7fba72c6a000-7fba72c91000 rw-p 00000000 00:00 0
7fba72c9c000-7fba72d11000 rw-p 00000000 00:00 0
7fba72d17000-7fba72d3e000 rw-p 00000000 00:00 0
7fba72d47000-7fba74d89000 rw-p 00000000 00:00 0
7fba74d8f000-7fba74da9000 rw-p 00000000 00:00 0
7fba74dad000-7fba74de1000 rw-p 00000000 00:00 0
7fba74de7000-7fba74e01000 rw-p 00000000 00:00 0
7fba74e0c000-7fba74e4d000 rw-p 00000000 00:00 0
7fba74e54000-7fba74e61000 rw-p 00000000 00:00 0
7fba74e68000-7fba74e9c000 rw-p 00000000 00:00 0
7fba74e9f000-7fba74ee0000 rw-p 00000000 00:00 0
7fba74ee2000-7fba74f71000 rw-p 00000000 00:00 0
7fba74f78000-7fba74f92000 rw-p 00000000 00:00 0
7fba74f9a000-7fba74fa7000 rw-p 00000000 00:00 0
7fba74fae000-7fba74fe2000 rw-p 00000000 00:00 0
7fba74fe6000-7fba7501a000 rw-p 00000000 00:00 0
7fba7501c000-7fba75029000 rw-p 00000000 00:00 0
7fba75030000-7fba7504a000 rw-p 00000000 00:00 0
7fba7504b000-7fba75058000 rw-p 00000000 00:00 0
7fba7505f000-7fba750ad000 rw-p 00000000 00:00 0
7fba750af000-7fba75158000 rw-p 00000000 00:00 0
7fba75161000-7fba751a2000 rw-p 00000000 00:00 0
7fba751a3000-7fba75266000 rw-p 00000000 00:00 0
7fba75267000-7fba75274000 rw-p 00000000 00:00 0
7fba7527a000-7fba75294000 rw-p 00000000 00:00 0
7fba7529d000-7fba75346000 rw-p 00000000 00:00 0
7fba75348000-7fba753b0000 rw-p 00000000 00:00 0
7fba753b5000-7fba753c2000 rw-p 00000000 00:00 0
7fba753c7000-7fba753ee000 rw-p 00000000 00:00 0
7fba753f2000-7fba7544d000 rw-p 00000000 00:00 0
7fba75453000-7fba75487000 rw-p 00000000 00:00 0
7fba7548c000-7fba754a6000 rw-p 00000000 00:00 0
7fba754aa000-7fba754f8000 rw-p 00000000 00:00 0
7fba75502000-7fba7550f000 rw-p 00000000 00:00 0
7fba75518000-7fba75525000 rw-p 00000000 00:00 0
7fba7552e000-7fba75589000 rw-p 00000000 00:00 0
7fba75589000-7fba7558a000 ---p 00000000 00:00 0
7fba7558a000-7fba77d8b000 rw-p 00000000 00:00 0
7fba77d92000-7fba77d9f000 rw-p 00000000 00:00 0
7fba77daa000-7fba77dc4000 rw-p 00000000 00:00 0
7fba77dc8000-7fba77e16000 rw-p 00000000 00:00 0
7fba77e1e000-7fba77e2b000 rw-p 00000000 00:00 0
7fba77e2f000-7fba77e3c000 rw-p 00000000 00:00 0
7fba77e42000-7fba77e5c000 rw-p 00000000 00:00 0
7fba77e67000-7fba77e74000 rw-p 00000000 00:00 0
7fba77e7b000-7fba77eaf000 rw-p 00000000 00:00 0
7fba77eb6000-7fba77ec3000 rw-p 00000000 00:00 0
7fba77ecc000-7fba78b79000 rw-p 00000000 00:00 0
7fba78b84000-7fba78bb8000 rw-p 00000000 00:00 0
7fba78bbc000-7fba78c31000 rw-p 00000000 00:00 0
7fba78c38000-7fba78c93000 rw-p 00000000 00:00 0
7fba78c9c000-7fba78ca9000 rw-p 00000000 00:00 0
7fba78cb2000-7fba78ce6000 rw-p 00000000 00:00 0
7fba78cee000-7fba78d08000 rw-p 00000000 00:00 0
7fba78d0a000-7fba78d4b000 rw-p 00000000 00:00 0
7fba78d52000-7fba78d6c000 rw-p 00000000 00:00 0
7fba78d6f000-7fba78da3000 rw-p 00000000 00:00 0
7fba78daf000-7fba78ee0000 rw-p 00000000 00:00 0
7fba78eea000-7fba7917c000 rw-p 00000000 00:00 0
7fba79181000-7fba7929c000 rw-p 00000000 00:00 0
7fba7929c000-7fba7949c000 rw-s 00000000 00:04 19572742 /dev/zero (deleted)
7fba7949c000-7fba79660000 rw-p 00000000 00:00 0
7fba79662000-7fba79689000 rw-p 00000000 00:00 0
7fba7968b000-7fba79700000 rw-p 00000000 00:00 0
7fba79700000-7fba797d5000 rw-p 00000000 00:00 0
7fba797d5000-7fba79800000 ---p 00000000 00:00 0
7fba79806000-7fba79861000 rw-p 00000000 00:00 0
7fba7993c000-7fba799a0000 rw-p 00000000 00:00 0
7fba799a9000-7fba799b6000 rw-p 00000000 00:00 0
7fba799ba000-7fba799d4000 rw-p 00000000 00:00 0
7fba799d9000-7fba799f3000 rw-p 00000000 00:00 0
7fba799f5000-7fba79a29000 rw-p 00000000 00:00 0
7fba79a2b000-7fba79aa0000 rw-p 00000000 00:00 0
7fba79ba1000-7fba79c5d000 rw-p 00000000 00:00 0
7fba79c63000-7fba79c70000 rw-p 00000000 00:00 0
7fba79cb9000-7fba79cc8000 rw-p 00000000 00:00 0
7fba79cc8000-7fba79cc9000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79cc9000-7fba79cca000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79cca000-7fba79ccb000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79ccb000-7fba79ccc000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79ccc000-7fba79ccd000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79ccd000-7fba79cce000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79cce000-7fba79ccf000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79ccf000-7fba79cd0000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fba79cd0000-7fba79d52000 rw-p 00000000 00:00 0
7fba79d52000-7fba7a573000 rw-s 00000000 00:04 11370765 /SYSV00000000 (deleted)
7fba7a573000-7fba7a773000 rw-s 00000000 00:04 11370762 /SYSV00000000 (deleted)
7fba7a773000-7fba7a774000 ---p 00000000 00:00 0
7fba7a774000-7fba7b015000 rw-p 00000000 00:00 0
7fba7b015000-7fba81015000 ---p 00000000 00:00 0
7fba81015000-7fba823f3000 r-xp 00000000 fd:01 859126 /usr/lib64/libcuda.so.510.85.02
7fba823f3000-7fba825f3000 ---p 013de000 fd:01 859126 /usr/lib64/libcuda.so.510.85.02
7fba825f3000-7fba826ef000 r--p 013de000 fd:01 859126 /usr/lib64/libcuda.so.510.85.02
7fba826ef000-7fba827fa000 rw-p 014da000 fd:01 859126 /usr/lib64/libcuda.so.510.85.02
7fba827fa000-7fba82860000 rw-p 00000000 00:00 0
7fba82860000-7fba83081000 rw-s 00000000 00:04 11370726 /SYSV00000000 (deleted)
7fba83081000-7fba83281000 rw-s 00000000 00:04 11370721 /SYSV00000000 (deleted)
7fba83281000-7fba83aa2000 rw-s 00000000 00:04 11370714 /SYSV00000000 (deleted)
7fba83aa2000-7fba83ca2000 rw-s 00000000 00:04 11370710 /SYSV00000000 (deleted)
7fba83ca2000-7fba844c3000 rw-s 00000000 00:04 11370761 /SYSV00000000 (deleted)
7fba844c3000-7fba846c3000 rw-s 00000000 00:04 11370759 /SYSV00000000 (deleted)
7fba846c3000-7fba84ee4000 rw-s 00000000 00:04 11370749 /SYSV00000000 (deleted)
7fba84ee4000-7fba850e4000 rw-s 00000000 00:04 11370747 /SYSV00000000 (deleted)
7fba850e4000-7fba85905000 rw-s 00000000 00:04 11370783 /SYSV00000000 (deleted)
7fba85905000-7fba85b05000 rw-s 00000000 00:04 11370780 /SYSV00000000 (deleted)
7fba85b05000-7fba86326000 rw-s 00000000 00:04 11370751 /SYSV00000000 (deleted)
7fba86326000-7fba86b47000 rw-s 00000000 00:04 11370735 /SYSV00000000 (deleted)
7fba86b47000-7fba86d47000 rw-s 00000000 00:04 11370750 /SYSV00000000 (deleted)
7fba86d47000-7fba87568000 rw-s 00000000 00:04 11370754 /SYSV00000000 (deleted)
7fba87568000-7fba87768000 rw-s 00000000 00:04 11370753 /SYSV00000000 (deleted)
7fba87768000-7fba87968000 rw-s 00000000 00:04 11370729 /SYSV00000000 (deleted)
7fba87968000-7fba88189000 rw-s 00000000 00:04 11370720 /SYSV00000000 (deleted)
7fba88189000-7fba88389000 rw-s 00000000 00:04 11370715 /SYSV00000000 (deleted)
7fba88389000-7fba88baa000 rw-s 00000000 00:04 11370746 /SYSV00000000 (deleted)
7fba88baa000-7fba893cb000 rw-s 00000000 00:04 11370758 /SYSV00000000 (deleted)
7fba893cb000-7fba895cb000 rw-s 00000000 00:04 11370755 /SYSV00000000 (deleted)
7fba895cb000-7fba897cb000 rw-s 00000000 00:04 11370744 /SYSV00000000 (deleted)
7fba897cb000-7fba899cc000 rw-p 00000000 00:00 0
7fba899cc000-7fba8a1ed000 rw-s 00000000 00:04 11370774 /SYSV00000000 (deleted)
7fba8a1ed000-7fba8a3ed000 rw-s 00000000 00:04 11370769 /SYSV00000000 (deleted)
7fba8a3ed000-7fba8a40f000 r-xp 00000000 fd:01 814445 /usr/lib64/libmlx4-rdmav2.so
7fba8a40f000-7fba8a60f000 ---p 00022000 fd:01 814445 /usr/lib64/libmlx4-rdmav2.so
7fba8a60f000-7fba8a610000 r--p 00022000 fd:01 814445 /usr/lib64/libmlx4-rdmav2.so
7fba8a610000-7fba8a612000 rw-p 00023000 fd:01 814445 /usr/lib64/libmlx4-rdmav2.so
7fba8a612000-7fba8a61c000 r-xp 00000000 fd:01 809295 /usr/lib64/libnuma.so.1.0.0
7fba8a61c000-7fba8a81c000 ---p 0000a000 fd:01 809295 /usr/lib64/libnuma.so.1.0.0
7fba8a81c000-7fba8a81d000 r--p 0000a000 fd:01 809295 /usr/lib64/libnuma.so.1.0.0
7fba8a81d000-7fba8a81e000 rw-p 0000b000 fd:01 809295 /usr/lib64/libnuma.so.1.0.0
7fba8a81e000-7fba8a877000 r-xp 00000000 fd:01 814449 /usr/lib64/libmlx5.so.1.0.0
7fba8a877000-7fba8aa76000 ---p 00059000 fd:01 814449 /usr/lib64/libmlx5.so.1.0.0
7fba8aa76000-7fba8aa77000 r--p 00058000 fd:01 814449 /usr/lib64/libmlx5.so.1.0.0
7fba8aa77000-7fba8aa79000 rw-p 00059000 fd:01 814449 /usr/lib64/libmlx5.so.1.0.0
7fba8aa79000-7fba8aa7a000 ---p 00000000 00:00 0
7fba8aa7a000-7fba8b27a000 rw-p 00000000 00:00 0
7fba8b27a000-7fba8bae5000 r--s 00000000 fd:00 1966458 /var/lib/sss/mc/passwd
7fba8bae5000-7fba8baed000 r-xp 00000000 fd:01 815006 /usr/lib64/libnss_sss.so.2
7fba8baed000-7fba8bcec000 ---p 00008000 fd:01 815006 /usr/lib64/libnss_sss.so.2
7fba8bcec000-7fba8bced000 r--p 00007000 fd:01 815006 /usr/lib64/libnss_sss.so.2
7fba8bced000-7fba8bcee000 rw-p 00008000 fd:01 815006 /usr/lib64/libnss_sss.so.2
7fba8bcee000-7fba8bcfa000 r-xp 00000000 fd:01 805303 /usr/lib64/libnss_files-2.17.so
7fba8bcfa000-7fba8bef9000 ---p 0000c000 fd:01 805303 /usr/lib64/libnss_files-2.17.so
7fba8bef9000-7fba8befa000 r--p 0000b000 fd:01 805303 /usr/lib64/libnss_files-2.17.so
7fba8befa000-7fba8befb000 rw-p 0000c000 fd:01 805303 /usr/lib64/libnss_files-2.17.so
7fba8befb000-7fba8bf01000 rw-p 00000000 00:00 0
7fba8bf01000-7fba8bf02000 ---p 00000000 00:00 0
7fba8bf02000-7fba8c702000 rw-p 00000000 00:00 0
7fba8c702000-7fba8c718000 r-xp 00000000 fd:01 805313 /usr/lib64/libresolv-2.17.so
7fba8c718000-7fba8c918000 ---p 00016000 fd:01 805313 /usr/lib64/libresolv-2.17.so
7fba8c918000-7fba8c919000 r--p 00016000 fd:01 805313 /usr/lib64/libresolv-2.17.so
7fba8c919000-7fba8c91a000 rw-p 00017000 fd:01 805313 /usr/lib64/libresolv-2.17.so
7fba8c91a000-7fba8c91c000 rw-p 00000000 00:00 0
7fba8c91c000-7fba8caea000 r-xp 00000000 fd:01 1722627 /opt/slurm-23.11.1/lib/slurm/libslurm_pmi.so
7fba8caea000-7fba8cce9000 ---p 001ce000 fd:01 1722627 /opt/slurm-23.11.1/lib/slurm/libslurm_pmi.so
7fba8cce9000-7fba8ccec000 r--p 001cd000 fd:01 1722627 /opt/slurm-23.11.1/lib/slurm/libslurm_pmi.so
7fba8ccec000-7fba8ccf9000 rw-p 001d0000 fd:01 1722627 /opt/slurm-23.11.1/lib/slurm/libslurm_pmi.so
7fba8ccf9000-7fba8ccff000 rw-p 00000000 00:00 0
7fba8ccff000-7fba8cd1d000 r-xp 00000000 fd:01 812392 /usr/lib64/libnl-3.so.200.23.0
7fba8cd1d000-7fba8cf1d000 ---p 0001e000 fd:01 812392 /usr/lib64/libnl-3.so.200.23.0
7fba8cf1d000-7fba8cf1f000 r--p 0001e000 fd:01 812392 /usr/lib64/libnl-3.so.200.23.0
7fba8cf1f000-7fba8cf20000 rw-p 00020000 fd:01 812392 /usr/lib64/libnl-3.so.200.23.0
7fba8cf20000-7fba8cf84000 r-xp 00000000 fd:01 809303 /usr/lib64/libnl-route-3.so.200.23.0
7fba8cf84000-7fba8d183000 ---p 00064000 fd:01 809303 /usr/lib64/libnl-route-3.so.200.23.0
7fba8d183000-7fba8d186000 r--p 00063000 fd:01 809303 /usr/lib64/libnl-route-3.so.200.23.0
7fba8d186000-7fba8d18b000 rw-p 00066000 fd:01 809303 /usr/lib64/libnl-route-3.so.200.23.0
7fba8d18b000-7fba8d18d000 rw-p 00000000 00:00 0
7fba8d18d000-7fba8d195000 r-xp 00000000 fd:01 814457 /usr/lib64/libibumad.so.3.2.0
7fba8d195000-7fba8d395000 ---p 00008000 fd:01 814457 /usr/lib64/libibumad.so.3.2.0
7fba8d395000-7fba8d396000 r--p 00008000 fd:01 814457 /usr/lib64/libibumad.so.3.2.0
7fba8d396000-7fba8d397000 rw-p 00009000 fd:01 814457 /usr/lib64/libibumad.so.3.2.0
7fba8d397000-7fba8d3a0000 r-xp 00000000 fd:01 813213 /usr/lib64/libjbig.so.2.0
7fba8d3a0000-7fba8d59f000 ---p 00009000 fd:01 813213 /usr/lib64/libjbig.so.2.0
7fba8d59f000-7fba8d5a0000 r--p 00008000 fd:01 813213 /usr/lib64/libjbig.so.2.0
7fba8d5a0000-7fba8d5a3000 rw-p 00009000 fd:01 813213 /usr/lib64/libjbig.so.2.0
7fba8d5a3000-7fba8d5a5000 r-xp 00000000 fd:01 805319 /usr/lib64/libutil-2.17.so
7fba8d5a5000-7fba8d7a4000 ---p 00002000 fd:01 805319 /usr/lib64/libutil-2.17.so
7fba8d7a4000-7fba8d7a5000 r--p 00001000 fd:01 805319 /usr/lib64/libutil-2.17.so
7fba8d7a5000-7fba8d7a6000 rw-p 00002000 fd:01 805319 /usr/lib64/libutil-2.17.so
7fba8d7a6000-7fba8d7af000 r-xp 00000000 fd:01 813810 /usr/lib64/libpciaccess.so.0.11.1
7fba8d7af000-7fba8d9ae000 ---p 00009000 fd:01 813810 /usr/lib64/libpciaccess.so.0.11.1
7fba8d9ae000-7fba8d9af000 r--p 00008000 fd:01 813810 /usr/lib64/libpciaccess.so.0.11.1
7fba8d9af000-7fba8d9b0000 rw-p 00009000 fd:01 813810 /usr/lib64/libpciaccess.so.0.11.1
7fba8d9b0000-7fba8db7e000 r-xp 00000000 fd:01 1722619 /opt/slurm-23.11.1/lib/libslurm.so.40.0.0
7fba8db7e000-7fba8dd7d000 ---p 001ce000 fd:01 1722619 /opt/slurm-23.11.1/lib/libslurm.so.40.0.0
7fba8dd7d000-7fba8dd80000 r--p 001cd000 fd:01 1722619 /opt/slurm-23.11.1/lib/libslurm.so.40.0.0
7fba8dd80000-7fba8dd8d000 rw-p 001d0000 fd:01 1722619 /opt/slurm-23.11.1/lib/libslurm.so.40.0.0
7fba8dd8d000-7fba8dd93000 rw-p 00000000 00:00 0
7fba8dd93000-7fba8dd98000 r-xp 00000000 fd:01 1722940 /opt/slurm-23.11.1/lib/libpmi.so.0.0.0
7fba8dd98000-7fba8df97000 ---p 00005000 fd:01 1722940 /opt/slurm-23.11.1/lib/libpmi.so.0.0.0
7fba8df97000-7fba8df98000 r--p 00004000 fd:01 1722940 /opt/slurm-23.11.1/lib/libpmi.so.0.0.0
7fba8df98000-7fba8df99000 rw-p 00005000 fd:01 1722940 /opt/slurm-23.11.1/lib/libpmi.so.0.0.0
7fba8df99000-7fba8e075000 r-xp 00000000 00:2a 4759252814 /usr/local/openmpi/1.10.7-mlx/lib/libopen-pal.so.13.0.4
7fba8e075000-7fba8e275000 ---p 000dc000 00:2a 4759252814 /usr/local/openmpi/1.10.7-mlx/lib/libopen-pal.so.13.0.4
7fba8e275000-7fba8e279000 r--p 000dc000 00:2a 4759252814 /usr/local/openmpi/1.10.7-mlx/lib/libopen-pal.so.13.0.4
7fba8e279000-7fba8e27f000 rw-p 000e0000 00:2a 4759252814 /usr/local/openmpi/1.10.7-mlx/lib/libopen-pal.so.13.0.4
7fba8e27f000-7fba8e285000 rw-p 00000000 00:00 0
7fba8e285000-7fba8e363000 r-xp 00000000 00:2a 4759252817 /usr/local/openmpi/1.10.7-mlx/lib/libopen-rte.so.12.0.4
7fba8e363000-7fba8e562000 ---p 000de000 00:2a 4759252817 /usr/local/openmpi/1.10.7-mlx/lib/libopen-rte.so.12.0.4
7fba8e562000-7fba8e564000 r--p 000dd000 00:2a 4759252817 /usr/local/openmpi/1.10.7-mlx/lib/libopen-rte.so.12.0.4
7fba8e564000-7fba8e56d000 rw-p 000df000 00:2a 4759252817 /usr/local/openmpi/1.10.7-mlx/lib/libopen-rte.so.12.0.4
7fba8e56d000-7fba8e56f000 rw-p 00000000 00:00 0
7fba8e56f000-7fba8e587000 r-xp 00000000 fd:01 814253 /usr/lib64/libibverbs.so.1.0.0
7fba8e587000-7fba8e786000 ---p 00018000 fd:01 814253 /usr/lib64/libibverbs.so.1.0.0
7fba8e786000-7fba8e787000 r--p 00017000 fd:01 814253 /usr/lib64/libibverbs.so.1.0.0
7fba8e787000-7fba8e788000 rw-p 00018000 fd:01 814253 /usr/lib64/libibverbs.so.1.0.0
7fba8e788000-7fba8e79d000 r-xp 00000000 fd:01 805591 /usr/lib64/libz.so.1.2.7
7fba8e79d000-7fba8e99c000 ---p 00015000 fd:01 805591 /usr/lib64/libz.so.1.2.7
7fba8e99c000-7fba8e99d000 r--p 00014000 fd:01 805591 /usr/lib64/libz.so.1.2.7
7fba8e99d000-7fba8e99e000 rw-p 00015000 fd:01 805591 /usr/lib64/libz.so.1.2.7
7fba8e99e000-7fba8eb13000 r-xp 00000000 fd:01 1707330 /opt/mellanox/mxm/lib/libmxm.so.2.0.32
7fba8eb13000-7fba8ed12000 ---p 00175000 fd:01 1707330 /opt/mellanox/mxm/lib/libmxm.so.2.0.32
7fba8ed12000-7fba8ed25000 r--p 00174000 fd:01 1707330 /opt/mellanox/mxm/lib/libmxm.so.2.0.32
7fba8ed25000-7fba8ed32000 rw-p 00187000 fd:01 1707330 /opt/mellanox/mxm/lib/libmxm.so.2.0.32
7fba8ed32000-7fba8ed38000 rw-p 00000000 00:00 0
7fba8ed38000-7fba8ed48000 r-xp 00000000 fd:01 814594 /usr/lib64/libosmcomp.so.3.1.0
7fba8ed48000-7fba8ef47000 ---p 00010000 fd:01 814594 /usr/lib64/libosmcomp.so.3.1.0
7fba8ef47000-7fba8ef48000 r--p 0000f000 fd:01 814594 /usr/lib64/libosmcomp.so.3.1.0
7fba8ef48000-7fba8ef49000 rw-p 00010000 fd:01 814594 /usr/lib64/libosmcomp.so.3.1.0
7fba8ef49000-7fba8ef63000 r-xp 00000000 fd:01 814513 /usr/lib64/librdmacm.so.1.0.0
7fba8ef63000-7fba8f162000 ---p 0001a000 fd:01 814513 /usr/lib64/librdmacm.so.1.0.0
7fba8f162000-7fba8f163000 r--p 00019000 fd:01 814513 /usr/lib64/librdmacm.so.1.0.0
7fba8f163000-7fba8f164000 rw-p 0001a000 fd:01 814513 /usr/lib64/librdmacm.so.1.0.0
7fba8f164000-7fba8f165000 rw-p 00000000 00:00 0
7fba8f165000-7fba8f329000 r-xp 00000000 fd:01 805285 /usr/lib64/libc-2.17.so
7fba8f329000-7fba8f528000 ---p 001c4000 fd:01 805285 /usr/lib64/libc-2.17.so
7fba8f528000-7fba8f52c000 r--p 001c3000 fd:01 805285 /usr/lib64/libc-2.17.so
7fba8f52c000-7fba8f52e000 rw-p 001c7000 fd:01 805285 /usr/lib64/libc-2.17.so
7fba8f52e000-7fba8f533000 rw-p 00000000 00:00 0
7fba8f533000-7fba8f54a000 r-xp 00000000 00:2a 19499663793 /usr/local/gcc/8.1.0/lib64/libgcc_s.so.1
7fba8f54a000-7fba8f749000 ---p 00017000 00:2a 19499663793 /usr/local/gcc/8.1.0/lib64/libgcc_s.so.1
7fba8f749000-7fba8f74a000 r--p 00016000 00:2a 19499663793 /usr/local/gcc/8.1.0/lib64/libgcc_s.so.1
7fba8f74a000-7fba8f74b000 rw-p 00017000 00:2a 19499663793 /usr/local/gcc/8.1.0/lib64/libgcc_s.so.1
7fba8f74b000-7fba8f778000 r-xp 00000000 00:2a 19499663800 /usr/local/gcc/8.1.0/lib64/libgomp.so.1.0.0
7fba8f778000-7fba8f977000 ---p 0002d000 00:2a 19499663800 /usr/local/gcc/8.1.0/lib64/libgomp.so.1.0.0
7fba8f977000-7fba8f978000 r--p 0002c000 00:2a 19499663800 /usr/local/gcc/8.1.0/lib64/libgomp.so.1.0.0
7fba8f978000-7fba8f979000 rw-p 0002d000 00:2a 19499663800 /usr/local/gcc/8.1.0/lib64/libgomp.so.1.0.0
7fba8f979000-7fba8fa7a000 r-xp 00000000 fd:01 805293 /usr/lib64/libm-2.17.so
7fba8fa7a000-7fba8fc79000 ---p 00101000 fd:01 805293 /usr/lib64/libm-2.17.so
7fba8fc79000-7fba8fc7a000 r--p 00100000 fd:01 805293 /usr/lib64/libm-2.17.so
7fba8fc7a000-7fba8fc7b000 rw-p 00101000 fd:01 805293 /usr/lib64/libm-2.17.so
7fba8fc7b000-7fba8fdef000 r-xp 00000000 00:2a 19499716756 /usr/local/gcc/8.1.0/lib64/libstdc++.so.6.0.25
7fba8fdef000-7fba8ffef000 ---p 00174000 00:2a 19499716756 /usr/local/gcc/8.1.0/lib64/libstdc++.so.6.0.25
7fba8ffef000-7fba8fff9000 r--p 00174000 00:2a 19499716756 /usr/local/gcc/8.1.0/lib64/libstdc++.so.6.0.25
7fba8fff9000-7fba8fffb000 rw-p 0017e000 00:2a 19499716756 /usr/local/gcc/8.1.0/lib64/libstdc++.so.6.0.25
7fba8fffb000-7fba8ffff000 rw-p 00000000 00:00 0
7fba8ffff000-7fba90006000 r-xp 00000000 fd:01 805315 /usr/lib64/librt-2.17.so
7fba90006000-7fba90205000 ---p 00007000 fd:01 805315 /usr/lib64/librt-2.17.so
7fba90205000-7fba90206000 r--p 00006000 fd:01 805315 /usr/lib64/librt-2.17.so
7fba90206000-7fba90207000 rw-p 00007000 fd:01 805315 /usr/lib64/librt-2.17.so
7fba90207000-7fba9021e000 r-xp 00000000 fd:01 805311 /usr/lib64/libpthread-2.17.so
7fba9021e000-7fba9041d000 ---p 00017000 fd:01 805311 /usr/lib64/libpthread-2.17.so
7fba9041d000-7fba9041e000 r--p 00016000 fd:01 805311 /usr/lib64/libpthread-2.17.so
7fba9041e000-7fba9041f000 rw-p 00017000 fd:01 805311 /usr/lib64/libpthread-2.17.so
7fba9041f000-7fba90423000 rw-p 00000000 00:00 0
7fba90423000-7fba90466000 r-xp 00000000 fd:01 811397 /usr/lib64/libjpeg.so.62.1.0
7fba90466000-7fba90666000 ---p 00043000 fd:01 811397 /usr/lib64/libjpeg.so.62.1.0
7fba90666000-7fba90667000 r--p 00043000 fd:01 811397 /usr/lib64/libjpeg.so.62.1.0
7fba90667000-7fba90668000 rw-p 00044000 fd:01 811397 /usr/lib64/libjpeg.so.62.1.0
7fba90668000-7fba90678000 rw-p 00000000 00:00 0
7fba90678000-7fba906a1000 r-xp 00000000 fd:01 811586 /usr/lib64/libpng15.so.15.13.0
7fba906a1000-7fba908a1000 ---p 00029000 fd:01 811586 /usr/lib64/libpng15.so.15.13.0
7fba908a1000-7fba908a2000 r--p 00029000 fd:01 811586 /usr/lib64/libpng15.so.15.13.0
7fba908a2000-7fba908a3000 rw-p 0002a000 fd:01 811586 /usr/lib64/libpng15.so.15.13.0
7fba908a3000-7fba94423000 r-xp 00000000 00:2a 13958194804 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcurand.so.10.2.5.100
7fba94423000-7fba94623000 ---p 03b80000 00:2a 13958194804 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcurand.so.10.2.5.100
7fba94623000-7fba94629000 r--p 03b80000 00:2a 13958194804 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcurand.so.10.2.5.100
7fba94629000-7fba95a1a000 rw-p 03b86000 00:2a 13958194804 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcurand.so.10.2.5.100
7fba95a1a000-7fba96052000 rw-p 00000000 00:00 0
7fba96052000-7fba96166000 r-xp 00000000 00:2a 10867571108 /usr/local/fftw/3.3.4-gcc/lib/libfftw3f.so.3.4.4
7fba96166000-7fba96365000 ---p 00114000 00:2a 10867571108 /usr/local/fftw/3.3.4-gcc/lib/libfftw3f.so.3.4.4
7fba96365000-7fba9636c000 r--p 00113000 00:2a 10867571108 /usr/local/fftw/3.3.4-gcc/lib/libfftw3f.so.3.4.4
7fba9636c000-7fba9636d000 rw-p 0011a000 00:2a 10867571108 /usr/local/fftw/3.3.4-gcc/lib/libfftw3f.so.3.4.4
7fba9636d000-7fba96485000 r-xp 00000000 00:2a 10867571102 /usr/local/fftw/3.3.4-gcc/lib/libfftw3.so.3.4.4
7fba96485000-7fba96684000 ---p 00118000 00:2a 10867571102 /usr/local/fftw/3.3.4-gcc/lib/libfftw3.so.3.4.4
7fba96684000-7fba9668b000 r--p 00117000 00:2a 10867571102 /usr/local/fftw/3.3.4-gcc/lib/libfftw3.so.3.4.4
7fba9668b000-7fba9668c000 rw-p 0011e000 00:2a 10867571102 /usr/local/fftw/3.3.4-gcc/lib/libfftw3.so.3.4.4
7fba9668c000-7fba966fc000 r-xp 00000000 fd:01 813216 /usr/lib64/libtiff.so.5.2.0
7fba966fc000-7fba968fb000 ---p 00070000 fd:01 813216 /usr/lib64/libtiff.so.5.2.0
7fba968fb000-7fba968fc000 r--p 0006f000 fd:01 813216 /usr/lib64/libtiff.so.5.2.0
7fba968fc000-7fba968ff000 rw-p 00070000 fd:01 813216 /usr/lib64/libtiff.so.5.2.0
7fba968ff000-7fba96900000 rw-p 00000000 00:00 0
7fba96900000-7fba96902000 r-xp 00000000 fd:01 805291 /usr/lib64/libdl-2.17.so
7fba96902000-7fba96b02000 ---p 00002000 fd:01 805291 /usr/lib64/libdl-2.17.so
7fba96b02000-7fba96b03000 r--p 00002000 fd:01 805291 /usr/lib64/libdl-2.17.so
7fba96b03000-7fba96b04000 rw-p 00003000 fd:01 805291 /usr/lib64/libdl-2.17.so
7fba96b04000-7fba96d70000 r-xp 00000000 00:2a 4759252287 /usr/local/openmpi/1.10.7-mlx/lib/libmpi.so.12.0.7
7fba96d70000-7fba96f6f000 ---p 0026c000 00:2a 4759252287 /usr/local/openmpi/1.10.7-mlx/lib/libmpi.so.12.0.7
7fba96f6f000-7fba96f72000 r--p 0026b000 00:2a 4759252287 /usr/local/openmpi/1.10.7-mlx/lib/libmpi.so.12.0.7
7fba96f72000-7fba96f92000 rw-p 0026e000 00:2a 4759252287 /usr/local/openmpi/1.10.7-mlx/lib/libmpi.so.12.0.7
7fba96f92000-7fba97027000 rw-p 00000000 00:00 0
7fba97027000-7fba9703f000 r-xp 00000000 00:2a 4759252802 /usr/local/openmpi/1.10.7-mlx/lib/libmpi_cxx.so.1.1.3
7fba9703f000-7fba9723e000 ---p 00018000 00:2a 4759252802 /usr/local/openmpi/1.10.7-mlx/lib/libmpi_cxx.so.1.1.3
7fba9723e000-7fba97240000 r--p 00017000 00:2a 4759252802 /usr/local/openmpi/1.10.7-mlx/lib/libmpi_cxx.so.1.1.3
7fba97240000-7fba97241000 rw-p 00019000 00:2a 4759252802 /usr/local/openmpi/1.10.7-mlx/lib/libmpi_cxx.so.1.1.3
7fba97241000-7fba98bd0000 r-xp 00000000 00:2a 13958194800 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcufft.so.10.5.1.100
7fba98bd0000-7fba98dd0000 ---p 0198f000 00:2a 13958194800 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcufft.so.10.5.1.100
7fba98dd0000-7fba98ddf000 r--p 0198f000 00:2a 13958194800 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcufft.so.10.5.1.100
7fba98ddf000-7fbaaccd1000 rw-p 0199e000 00:2a 13958194800 /usr/local/cuda/11.4/targets/x86_64-linux/lib/libcufft.so.10.5.1.100
7fbaaccd1000-7fbaacec1000 rw-p 00000000 00:00 0
7fbaacec1000-7fbaacee3000 r-xp 00000000 fd:01 797561 /usr/lib64/ld-2.17.so
7fbaacee3000-7fbaacee4000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaacee4000-7fbaacee5000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaacee5000-7fbaacee6000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaacee6000-7fbaacee7000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaacee7000-7fbaacee8000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaacee8000-7fbaacee9000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaacee9000-7fbaaceea000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaaceea000-7fbaacf3b000 rw-p 00000000 00:00 0
7fbaacf3b000-7fbaacf4b000 -w-s 00000000 00:05 77032 /dev/nvidia13
7fbaacf4b000-7fbaacf5b000 -w-s 00000000 00:05 69038 /dev/nvidia12
7fbaacf5b000-7fbaacf6b000 -w-s 00000000 00:05 69017 /dev/nvidia9
7fbaacf6b000-7fbaacf7b000 -w-s 00000000 00:05 68875 /dev/nvidia8
7fbaacf7b000-7fbaacf8b000 -w-s 00000000 00:05 19350 /dev/nvidia7
7fbaacf8b000-7fbaacf9b000 -w-s 00000000 00:05 68874 /dev/nvidia6
7fbaacf9b000-7fbaacfab000 -w-s 00000000 00:05 19349 /dev/nvidia5
7fbaacfab000-7fbaacfbb000 -w-s 00000000 00:05 68872 /dev/nvidia4
7fbaacfbb000-7fbaad0b8000 rw-p 00000000 00:00 0
7fbaad0b8000-7fbaad0b9000 rw-s 00000000 00:05 16656 /dev/nvidiactl
7fbaad0b9000-7fbaad0d4000 rw-p 00000000 00:00 0
7fbaad0d4000-7fbaad0d5000 rw-p 00000000 00:00 0
7fbaad0d5000-7fbaad0d6000 -w-s 00207000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0d6000-7fbaad0d7000 -w-s 00206000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0d7000-7fbaad0d8000 -w-s 00205000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0d8000-7fbaad0d9000 -w-s 00204000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0d9000-7fbaad0da000 -w-s 00203000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0da000-7fbaad0db000 -w-s 00202000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0db000-7fbaad0dc000 -w-s 00201000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0dc000-7fbaad0dd000 -w-s 00200000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0dd000-7fbaad0de000 r--s 0ff01000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0de000-7fbaad0df000 r--s 0fb00000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0df000-7fbaad0e2000 rw-p 00000000 00:00 0
7fbaad0e2000-7fbaad0e3000 r--p 00021000 fd:01 797561 /usr/lib64/ld-2.17.so
7fbaad0e3000-7fbaad0e4000 rw-p 00022000 fd:01 797561 /usr/lib64/ld-2.17.so
7fbaad0e4000-7fbaad0e5000 rw-p 00000000 00:00 0
7fbaad0e5000-7fbaad0e6000 rw-s 0fc0c000 00:05 11454 /dev/infiniband/uverbs0
7fbaad0e8000-7fbaad0ec000 rw-s 0fc0e000 00:05 11454 /dev/infiniband/uverbs0
7fbaad100000-7fbaad111000 rw-s 0fd11000 00:05 11454 /dev/infiniband/uverbs0
7fbaad120000-7fbaad138000 rw-s 0fd11000 00:05 11454 /dev/infiniband/uverbs0
7fbaad180000-7fbaad200000 rw-s 0fc13000 00:05 11454 /dev/infiniband/uverbs0
7fbaad200000-7fbaad310000 rw-s 0fd15000 00:05 11454 /dev/infiniband/uverbs0
7fbaad400000-7fbaad811000 rw-s 00116000 00:05 11454 /dev/infiniband/uverbs0
7fbaada00000-7fbaadb41000 rw-s 00115000 00:05 11454 /dev/infiniband/uverbs0
7fbaadc00000-7fbaaee26000 rw-s 00116000 00:05 11454 /dev/infiniband/uverbs0
7fbaaf000000-7fbaaf800000 rw-s 0fc16000 00:05 11454 /dev/infiniband/uverbs0
7fff66e6d000-7fff66e91000 rw-p 00000000 00:00 0 [stack]
7fff66ef2000-7fff66ef4000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[m3q005:11356] *** Process received signal ***
[m3q005:11356] Signal: Aborted (6)
[m3q005:11356] Signal code: (-6)
[m3q005:11356] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7fba90216630]
[m3q005:11356] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fba8f19b387]
[m3q005:11356] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fba8f19ca78]
[m3q005:11356] [ 3] /lib64/libc.so.6(+0x78f67)[0x7fba8f1ddf67]
[m3q005:11356] [ 4] /lib64/libc.so.6(+0x81329)[0x7fba8f1e6329]
[m3q005:11356] [ 5] /lib64/libcuda.so.1(+0x268493)[0x7fba8127d493]
[m3q005:11356] [ 6] /lib64/libcuda.so.1(+0x273c6a)[0x7fba81288c6a]
[m3q005:11356] [ 7] /lib64/libcuda.so.1(+0x262e46)[0x7fba81277e46]
[m3q005:11356] [ 8] /lib64/libpthread.so.0(+0x7ea5)[0x7fba9020eea5]
[m3q005:11356] [ 9] /lib64/libc.so.6(clone+0x6d)[0x7fba8f263b0d]
[m3q005:11356] *** End of error message ***
==== backtrace ====
2 0x000000000006c54c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.7.3112/src/mxm/util/debug/debug.c:641
3 0x000000000006ca9c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.7.3112/src/mxm/util/debug/debug.c:616
4 0x0000000000036400 killpg() ??:0
5 0x0000000000080e80 _int_free() malloc.c:0
6 0x0000000000268493 cuGetErrorString() ??:0
7 0x0000000000273c6a cuGetErrorString() ??:0
8 0x0000000000262e46 cuGetErrorString() ??:0
9 0x0000000000007ea5 start_thread() pthread_create.c:0
10 0x00000000000feb0d __clone() ??:0

@biochem-fan
Member

The program appears to be dying inside the CUDA runtime. Very puzzling indeed.
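
One way to see where the crash sits is to list which shared library each frame of the Open MPI signal trace belongs to. The snippet below is only a rough sketch: the function name `frame_libraries` and the regular expression are illustrative assumptions based on the frame format shown in the log above, not part of RELION or Open MPI.

```python
import re

# Matches frames such as:
#   [m3q005:11356] [ 5] /lib64/libcuda.so.1(+0x268493)[0x7fba8127d493]
# Assumption: every frame follows the "[ N] path(symbol+offset)[address]" shape seen in the log.
FRAME = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\(([^)]*)\)\[(0x[0-9a-f]+)\]")


def frame_libraries(log_text):
    """Return (frame_index, library_file_name, symbol_or_offset) for each trace frame."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if match:
            index = int(match.group(1))
            library = match.group(2).rsplit("/", 1)[-1]  # drop the directory part
            frames.append((index, library, match.group(3)))
    return frames


if __name__ == "__main__":
    sample = """\
[m3q005:11356] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fba8f19ca78]
[m3q005:11356] [ 5] /lib64/libcuda.so.1(+0x268493)[0x7fba8127d493]
[m3q005:11356] [ 8] /lib64/libpthread.so.0(+0x7ea5)[0x7fba9020eea5]
"""
    for index, library, symbol in frame_libraries(sample):
        print(index, library, symbol)
```

Applied to the trace above, frames 5 to 7 resolve to `libcuda.so.1`, directly below the `abort` call in `libc`.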

For a quick check, make a subset with very few particles (say, 1000) and try the following:

  • Run as usual (to make sure the problem is reproducible with fewer particles)
  • Run without MPI (e.g. 1 process, 16 threads)
  • Run without CUDA
  • Run without MPI and CUDA
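
A small subset for these tests could be cut from the particle STAR file with a few lines of Python. This is a minimal sketch, not a RELION tool: it assumes the file has a `data_particles` block whose `loop_` is followed by `_rln...` label lines and then one particle per line, and the function name `subset_particles` and the sample data are made up for illustration.

```python
import random


def subset_particles(star_text, n, seed=0):
    """Keep at most n randomly chosen rows of the data_particles loop.

    Everything outside the particle rows (optics block, labels) is copied unchanged.
    """
    rng = random.Random(seed)
    head, rows, tail = [], [], []
    state = "before"  # before -> block -> labels -> rows -> after
    for line in star_text.splitlines():
        s = line.strip()
        if state == "before":
            head.append(line)
            if s == "data_particles":
                state = "block"
        elif state == "block":
            head.append(line)
            if s == "loop_":
                state = "labels"
        elif state == "labels":
            if s == "" or s.startswith("_") or s.startswith("#"):
                head.append(line)
            else:
                state = "rows"
                rows.append(line)
        elif state == "rows":
            if s == "":
                state = "after"  # a blank line ends the particle table
                tail.append(line)
            else:
                rows.append(line)
        else:
            tail.append(line)
    if len(rows) > n:
        keep = sorted(rng.sample(range(len(rows)), n))  # preserve original order
        rows = [rows[i] for i in keep]
    return "\n".join(head + rows + tail) + "\n"


if __name__ == "__main__":
    # Made-up example input; a real run would read the job's particle STAR file instead.
    sample = "data_particles\n\nloop_\n_rlnImageName #1\n" + "".join(
        f"{i}@stack.mrcs\n" for i in range(1, 11)
    )
    print(subset_particles(sample, 3))
```

Fixing the seed keeps the subset identical across the four runs, so any difference in behaviour comes from the MPI and CUDA settings rather than from the particles chosen.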

@SepidehV
Author

SepidehV commented Feb 8, 2024 via email

@biochem-fan
Member

Thanks for testing. This indicates that the problem really is with CUDA.
If you can share this small subset with us privately, we will investigate further.

@SepidehV
Author

SepidehV commented Feb 8, 2024 via email

@biochem-fan
Member

Please make the test case as small as possible. Make a new directory, copy in the minimum required files (STAR file, particles, mask, reference, etc.), and make sure the problem can be reproduced there. Compress the folder, upload it to Google Drive or Dropbox, and send the link to Takanori Nakane, whose email address can be found in the CCPEM mailing list. Don't forget to include the full command line needed to reproduce the issue.
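
The packaging step could look roughly like the sketch below, which uses only Python's standard `tarfile` module. The helper name `pack_test_case`, the `COMMAND.txt` file name, and the directory layout are hypothetical choices for illustration; the command string is a placeholder to be replaced with the real command line.

```python
import tarfile
from pathlib import Path


def pack_test_case(case_dir, command_line, archive_path):
    """Store the reproducing command next to the data and compress the directory.

    case_dir     : directory holding the minimal files (STAR file, particles, mask, reference)
    command_line : the full command used to reproduce the crash
    archive_path : where to write the .tar.gz (keep it outside case_dir)
    """
    case_dir = Path(case_dir)
    # Hypothetical file name; any plain-text note with the command would do.
    (case_dir / "COMMAND.txt").write_text(command_line + "\n")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory name, not the full local path.
        tar.add(case_dir, arcname=case_dir.name)
    return Path(archive_path)


if __name__ == "__main__":
    import tempfile

    # Demo on a throwaway directory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        case = Path(tmp) / "class3d_testcase"
        case.mkdir()
        (case / "particles.star").write_text("data_particles\n")
        archive = pack_test_case(
            case, "<full command line goes here>", Path(tmp) / "class3d_testcase.tar.gz"
        )
        with tarfile.open(archive) as tar:
            print(tar.getnames())
```

Writing the archive outside the test directory avoids the archive being added to itself.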
