Can't run multiple queues on one node with error: mpirun noticed .... exited on signal 15 (Terminated) #748

Open
jialay opened this issue Feb 20, 2022 · 0 comments

Please submit help issues to:
https://matsci.org/atomate

VASP jobs run fine under my SGE queue system, and a single job also runs fine through atomate. The problem appears when multiple queue jobs run on one node: the jobs are submitted successfully, but then fail with an mpirun error.
The vasp.out file shows: "mpirun noticed that process rank 3 with PID 57743 on node node3 exited on signal 15 (Terminated)"
This error never appears when SGE runs VASP directly with "mpirun -np n vasp".

I think this may be a bug in atomate or custodian.
So far I have only figured out that the VASP process is started, and its PID assigned, by the call "self.pid = _posixsubprocess.fork_exec()" in custodian.
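
For reference, signal 15 is SIGTERM, so the message suggests that something outside VASP asked the process to stop rather than VASP crashing by itself. Below is a minimal sketch of that behaviour using only the Python standard library. It is not atomate or custodian code: the function name `run_and_terminate` is made up for illustration, and the sleeping child is only a stand-in for the real "mpirun -np n vasp" command. It assumes a POSIX system such as the Linux nodes used here.

```python
import signal
import subprocess
import sys


def run_and_terminate():
    """Start a child process, send it SIGTERM, and return its exit code."""
    # Stand-in child that just sleeps; the real case would be the mpirun/VASP command.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Simulate an outside actor (another job, a wrapper, the scheduler)
    # terminating the process.
    child.send_signal(signal.SIGTERM)
    child.wait(timeout=10)
    # On POSIX, a negative return code -N means "killed by signal N".
    return child.returncode


if __name__ == "__main__":
    code = run_and_terminate()
    print("return code:", code)
    if code < 0:
        print("killed by:", signal.Signals(-code).name)
```

If the same kind of termination is what happens here, the open question is which process sends the SIGTERM when a second job starts on the same node.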
