This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

OpenMPI and Docker containers #378

Open
andrealati opened this issue Jul 28, 2022 · 0 comments


Problem Description

I am trying to run a simple OpenMPI test using mpi4py and this Docker image (aalati/mpi_ex_mit). The container includes a Python script that uses mpi4py to check communication between the nodes, and a shell script that passes the Python script to mpiexec.

When I submit the job to the pool with Batch Shipyard, I get the following error:

Error response from daemon: Cannot kill container: simjob-aalati-mpi_ex_mit: No such container: simjob-aalati-mpi_ex_mit
Error: No such container: simjob-aalati-mpi_ex_mit
Warning: Permanently added '[10.0.0.5]:23' (ECDSA) to the list of known hosts.
Warning: Permanently added '[10.0.0.6]:23' (ECDSA) to the list of known hosts.

**********************************************************

Open MPI does not support recursive calls of mpiexec

**********************************************************

I am not sure whether the problem comes from the pool or job configuration or from how the container is built (the latter seems less likely, as the container works as expected when I run it locally). I have attached the Dockerfile and the jobs configuration below in case they are useful.
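
Since the shell script itself calls mpiexec, one check that might help narrow this down is whether the script is already running under an Open MPI launcher when the task starts. The following is a minimal sketch and not part of the image: the helper names are hypothetical, and it assumes that Open MPI exports OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_RANK to the processes it launches.

```python
import os

# Assumption: Open MPI sets these variables for every process it launches.
OMPI_MARKERS = ("OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_RANK")


def launched_under_open_mpi(env=None):
    """Return True if the environment looks like an Open MPI-launched process."""
    env = os.environ if env is None else env
    return any(name in env for name in OMPI_MARKERS)


def describe(env=None):
    """Build a one-line report that can be printed at the top of a task."""
    env = os.environ if env is None else env
    if launched_under_open_mpi(env):
        rank = env.get("OMPI_COMM_WORLD_RANK", "?")
        size = env.get("OMPI_COMM_WORLD_SIZE", "?")
        return "already under mpiexec: rank {} of {}".format(rank, size)
    return "not under mpiexec"


if __name__ == "__main__":
    print(describe())
```

If this reports "already under mpiexec" when run as the task command, the mpiexec call inside the shell script would be a second, nested launch, which would be consistent with the error above.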

I would appreciate any advice on the issue. Thank you very much for your help.

Batch Shipyard Version

I am using the version available in Azure Cloud Shell for now.

Redacted Configuration

jobs.yaml

job_specifications:
- auto_complete: true
  id: simjob
  tasks:
  - docker_image: aalati/mpi_ex_mit
    additional_docker_run_options: [-w /root/codes] 
    multi_instance:
      num_instances: pool_current_dedicated
      mpi:
        runtime: openmpi
        executable_path: /usr/bin/mpiexec
        processes_per_node: nproc
        options: 
        - -mca btl_base_warn_component_unused 0
    command:  /bin/bash -c "bash -i ./mpi_example.sh"

Dockerfile

# Filename: Dockerfile

FROM ubuntu:18.04 

COPY ssh_config /root/.ssh/config

RUN apt-get -y update && \ 
	apt-get -y install gcc gfortran g++ libopenmpi-dev wget openssh-server openssh-client \
	&& apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    # configure ssh server and keys
    && mkdir /var/run/sshd \
    && ssh-keygen -A \
    && sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
    && sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd \
    && ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
    && chmod 600 /root/.ssh/config \
    && chmod 700 /root/.ssh \
    && cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /root/miniconda.sh && \
	bash ~/miniconda.sh -b && \
	export PATH="/root/miniconda3/bin:$PATH" && \
	conda init bash 

RUN . /root/.bashrc && \ 
	conda create --name py37 python=3.7 -y && \
	conda activate py37 && \
	conda install numpy scipy matplotlib tabulate seaborn statsmodels -y && \
	#conda install -c conda-forge interpolation fredapi time bdw-gc r-operator.tools r-sys -y && \
	wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-3.0.3.tar.gz -O /root/mpi4py-3.0.3.tar.gz && \
	tar -zxf /root/mpi4py-3.0.3.tar.gz && \ 
	cd mpi4py-3.0.3 && \
	which mpicc python && \
	python setup.py build && \
	python setup.py install && \
	conda install -c anaconda openpyxl && \
	conda install -c conda-forge interpolation fredapi time -y


COPY codes /root/codes

RUN chmod u+x /root/codes/mpi_example.sh

WORKDIR /root/codes

RUN echo "conda activate py37" >> /root/.bashrc

#ENTRYPOINT ["bash","-i","./mpi_example.sh"]
# make sshd listen on 23 and run by default
EXPOSE 23
CMD ["/usr/sbin/sshd", "-D", "-p", "23"]

ssh_config

Host 10.*
  Port 23
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null