MPI error from python bindings #81

Open
trontrytel opened this issue Jul 1, 2020 · 3 comments
@trontrytel (Collaborator):

I'm trying to run the parcel model in the Singularity image provided with UWLCM, but I'm running into the following error. Any suggestions on how I should switch off MPI for parcel?

```
Singularity sng_ubuntu_18_04_cuda_10_0_python3.sif:~/clones/parcel> python3 parcel.py
both r_0 and RH_0 negative, using default r_0 = 0.022
Traceback (most recent call last):
  File "parcel.py", line 544, in <module>
    parcel(**args)
  File "parcel.py", line 375, in parcel
    micro = _micro_init(aerosol, opts, state, info)
  File "parcel.py", line 103, in _micro_init
    micro = lgrngn.factory(lgrngn.backend_t.serial, opts_init)
RuntimeError: The Python bindings of libcloudph++ Lagrangian microphysics can't be used in MPI runs.
```
@trontrytel (Collaborator, Author):

I was able to check that OMPI_COMM_WORLD_RANK, LAMRANK, and MV2_COMM_WORLD_RANK are unset, but

PMI_RANK=0 and PMI_SIZE=1

are set. I'm guessing that this is what causes the problem.
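For illustration only (this is a standalone sketch, not code from parcel or libcloudph++): an environment variable set to "0" still shows up as present to `std::getenv`, so a presence-only check would treat such a job as an MPI run.

```cpp
#include <cstdlib>
#include <iostream>

int main()
{
  // simulate the environment the cluster apparently sets for a single-node job
  // (POSIX setenv, available via <cstdlib> on Linux)
  setenv("PMI_RANK", "0", 1);
  setenv("PMI_SIZE", "1", 1);

  // a presence-only check does not look at the value, so PMI_RANK=0 is
  // enough for the process to be treated as running under MPI
  if (std::getenv("PMI_RANK") != NULL)
    std::cout << "PMI_RANK is set -> looks like an MPI run\n";

  return 0;
}
```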

@trontrytel (Collaborator, Author):

aaaand then typing `unset PMI_RANK` in the terminal solves my issue.

I'm guessing it's something specific to the settings of the cluster I'm running on.

@pdziekan (Contributor) commented Jul 2, 2020:

Good job solving it yourself :)

libcloudph++ checks if a simulation uses MPI by looking at some env vars:

```cpp
bool ran_with_mpi()
{
  return (
    // mpich
    std::getenv("PMI_RANK") != NULL ||
    // openmpi
    std::getenv("OMPI_COMM_WORLD_RANK") != NULL ||
    // lam
    std::getenv("LAMRANK") != NULL ||
    // mvapich2
    std::getenv("MV2_COMM_WORLD_RANK") != NULL
  );
}
```

These are set by the mpirun command.
I think that the problem is that when you run a single-node job on the cluster, you still use mpirun.

We could try using mpi_rank >= 1 for detecting MPI jobs, as sketched below.
I made a branch with such a change in my libcloud repo: https://github.com/pdziekan/libcloudphxx/tree/mpi_detection
Please test it on your cluster.
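A rough sketch of what such a check might look like (an approximation based on the comment above, not the actual code in the mpi_detection branch):

```cpp
#include <cstdlib>

// Sketch: treat the job as an MPI run only if one of the launcher-specific
// rank variables is set AND its value is >= 1, so a single-rank job with
// PMI_RANK=0 no longer triggers MPI detection.
bool ran_with_mpi()
{
  const char *rank_vars[] = {
    "PMI_RANK",              // mpich
    "OMPI_COMM_WORLD_RANK",  // openmpi
    "LAMRANK",               // lam
    "MV2_COMM_WORLD_RANK"    // mvapich2
  };
  for (const char *var : rank_vars)
  {
    const char *val = std::getenv(var);
    if (val != NULL && std::atoi(val) >= 1)
      return true;
  }
  return false;
}
```

With a check on the value, a single-rank launch via mpirun that sets PMI_RANK=0 would no longer be classified as an MPI run.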
