
Galaxy outside Kubernetes

This installation implies that Galaxy runs as a process on a machine that shares a filesystem with a Kubernetes cluster.

Additional requirements

  • A shared filesystem between the machine(s) where Kubernetes runs and the machine where Galaxy will run.
  • This is covered as long as the Persistent Volume created earlier is backed by a disk that Galaxy can access under the same path naming (see the sketch below this list).
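As a concrete illustration, the following is a minimal hostPath-based PersistentVolume/PersistentVolumeClaim sketch, assuming a single-node (or uniformly mounted) cluster where /Users/jdoe/galaxy_data is the shared path; the claim name galaxy-pvc, the volume name and the capacity are placeholders chosen to match the examples further down this page:

# Sketch only: a hostPath PV exposing the shared directory, and a claim that binds to it
kind: PersistentVolume
apiVersion: v1
metadata:
  name: galaxy-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /Users/jdoe/galaxy_data
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: galaxy-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi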

Installation and setup for Galaxy

The Kubernetes runner has been incorporated into Galaxy as of stable version 16.07. The main requirement in terms of Python code for the Kubernetes runner is pykube (a Python client library for the Kubernetes REST API), which is still an optional Galaxy install. To make sure it is installed in Galaxy's virtual environment, activate that virtual environment and execute the following pip command:

. /srv/galaxy/venv/bin/activate
pip install pykube==0.15.0
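
Optionally, you can verify that the package landed in the activated virtual environment before starting Galaxy:

# Optional sanity check: should report the pykube version installed above
pip show pykube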

Galaxy job_conf.xml setup

Now you need to set up the Kubernetes runner to talk to your Kubernetes installation. For this you need to add the following to your config/job_conf.xml file in Galaxy:

  • In the <job_conf><plugins>...</plugins> section, add:
<plugin id="k8s" type="runner" load="galaxy.jobs.runners.kubernetes:KubernetesJobRunner">
   <param id="k8s_config_path">/Users/jdoe/.kube/config</param>
   <param id="k8s_persistent_volume_claim_name">galaxy-pvc</param>
   <!-- The following mount path needs to be the initial part of the "file_path" and "new_file_path" paths
    set in galaxy.ini (or the equivalent general Galaxy config file); see the galaxy.ini section below.
   -->
   <param id="k8s_persistent_volume_claim_mount_path">/Users/jdoe/galaxy_data</param>
   <param id="k8s_namespace">default</param>
   <!-- Allows pods to retry up to this number of times before the Job is marked as failed -->
   <param id="k8s_pod_retrials">1</param>
</plugin>

Then you need to make Galaxy aware of the containers where your tools will run. For that, we add the following for each desired container within the <destinations>...</destinations> section of the same job_conf.xml:

<destination id="iso2flux-container" runner="k8s">
      <param id="docker_repo_override">container-registry.phenomenal-h2020.eu</param>
      <param id="docker_owner_override">phnmnl</param>
      <param id="docker_image_override">iso2flux</param>
      <param id="docker_tag_override">latest</param>
      <param id="max_pod_retrials">1</param>
      <param id="docker_enabled">true</param>
</destination>
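With the override parameters above, every job sent to this destination should resolve to the image reference container-registry.phenomenal-h2020.eu/phnmnl/iso2flux:latest (registry/owner/image:tag), regardless of what the tool itself declares.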

Please note that the Kubernetes runner is also able to trust the Docker container set by the tool itself, but the destination placeholder is still needed. If you want the runner to prefer the tool's own container, then instead of using override in all the parameters, such as <param id="docker_repo_override" ...>, use default instead (see the sketch below). The configured repo needs to match the registry where your Docker image is available; to use Docker Hub, simply remove the line for docker_repo_[override|default]. In the same manner, the owner and the tag are optional as well. Without an owner we would be talking about an official image, like the official Perl or WordPress images, but this is not something that makes much sense in our context. Please note as well that in the destination definition line, the runner attribute needs to point to the Kubernetes runner ID as set in the plugins section (k8s in this example).
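As a sketch of the default variant just described (the destination id here is a placeholder, and the registry, owner, image and tag are reused from the previous example; with default they act only as fallbacks that a tool-declared container takes precedence over):

<destination id="iso2flux-container-default" runner="k8s">
      <param id="docker_repo_default">container-registry.phenomenal-h2020.eu</param>
      <param id="docker_owner_default">phnmnl</param>
      <param id="docker_image_default">iso2flux</param>
      <param id="docker_tag_default">latest</param>
      <param id="max_pod_retrials">1</param>
      <param id="docker_enabled">true</param>
</destination>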

Up to this point we have made Galaxy aware of a Kubernetes installation and of containers, but now we need to link Galaxy tools to the destinations associated with those containers. For this we add, again in the same file, a tool entry in the <tools>...</tools> section, like this:

<tool id="iso2flux" destination="iso2flux-container"/>

This essentially links the Galaxy tool with ID iso2flux to the Docker container set in the destination with ID iso2flux-container. This of course means that Galaxy has a tool with that ID, i.e. an .xml wrapper within the tools directories and an entry in config/tool_conf.xml.
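For reference, such a tool_conf.xml entry could look like the following sketch; the section name and the wrapper file path are hypothetical and depend on where the iso2flux wrapper actually lives in your tools directory:

<section id="phenomenal" name="PhenoMeNal">
    <tool file="phenomenal/iso2flux/iso2flux.xml" />
</section>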

Galaxy galaxy.ini setup

In this file we need to make sure that dataset files and temporary files live within the shared filesystem that both Galaxy and Kubernetes see, and that they match the settings used in the k8s PV, k8s PVC and Galaxy runner (plugin) setup described above:

# -- Files and directories

# Dataset files are stored in this directory.
file_path = /Users/jdoe/galaxy_data/files

# Temporary files are stored in this directory.
new_file_path = /Users/jdoe/galaxy_data/tmp

Run Galaxy

Finally, you can run Galaxy by issuing ./run.sh in the Galaxy root directory; it should expose your tools and allow you to run them through Kubernetes.
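For example, assuming Galaxy lives under /srv/galaxy (as suggested by the virtualenv path used earlier), that would be:

cd /srv/galaxy
./run.sh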
