Supporting multi-node, containerised deployments #3198

Open
h4l opened this issue May 13, 2024 · 1 comment

h4l commented May 13, 2024

Recently I reworked Cambridge University's ArchivesSpace deployment to address performance and reliability/availability problems that it'd been struggling with. We now deploy ArchivesSpace from a bespoke container image I created. We run each component (frontend, backend, public UI, indexer, etc.) in a separate container, and run 3 instances of the frontend, backend and public UI.

This approach has been really successful. It's been in production for a few months and we've had no downtime (other than one or two unrelated infra problems), and performance is consistent. Previously (with the conventional Unix service deployment in one JVM process) we were seeing the service become unresponsive quite regularly (sometimes every few days, usually every week or two). This was initially due to the JVM's memory not being appropriately tuned, but even after I did some tuning and scaled up the server a lot, it would still run out of memory periodically.

We can also do zero-downtime deploys now, which is very handy. We use a single-VM Docker Swarm cluster to run the containers, but this approach would work with Kubernetes or any orchestrator.

We'd like to share the work we've done to enable ArchivesSpace to deploy like this. It seems like it should be helpful for other people to be able to deploy from containers, even if they run everything from a single container. It would also help us if ArchivesSpace had official support for this, to reduce the differences between our own deployment and vanilla ArchivesSpace.

So this issue is really to talk about multi-node and containerised deployments, whether ArchivesSpace would like to have official support for them, and whether I can help with that.

(I've got a related PR in the oauth plugin to contribute a change I made to that plugin to support multi-node deployments: lyrasis/aspace-oauth#33 )

To build our ArchivesSpace container image I created a standalone repo (not a fork of the main ArchivesSpace source repo) that contains a generic container image build for ArchivesSpace, allowing anyone to build an ArchivesSpace image customised with whichever plugins they use. That is here: https://gitlab.developers.cam.ac.uk/lib/dev/ams/archivesspace-container

Then we have another repo that uses archivesspace-container to build the image we deploy — it configures our plugins and has the CI config to build our images. So it's basically an example of using archivesspace-container in practice. That is here: https://gitlab.developers.cam.ac.uk/lib/dev/ams/infra/

I can say more about the changes I made to get this working, but none of it was anything significant; most things were already in place. I'll leave it there for now, as I don't want to write too much!


h4l commented May 14, 2024

There are a couple of scenarios/approaches I can imagine here:

  1. ArchivesSpace wants to keep the current single-JVM deployment, without
    official container deployment support

    • If I could contribute the core changes below, we could maintain our
      container build separately without needing any weird hacks
  2. ArchivesSpace wants to support multi-node container deployments

    • If you're happy with the general approach I've taken, I could work to
      integrate the container build as part of the core ArchivesSpace repo, or we
      could keep the container build as a separate repo, but under the
      ArchivesSpace project.
    • If you'd rather do things differently to some extent, maybe I can help with
      those changes so that we at Cambridge University can stay as close to
      vanilla ArchivesSpace as possible.

The following are the main changes/additions I made to support multi-node,
containerised deployments:

Core ArchivesSpace changes

  • Upgrade JRuby

  • Allow components to be enabled but not running (in the local process)

  • Allow changing the temp directory

    • By default it's inside the data directory, which isn't ideal for a
      container, as the data dir will be a persistent volume. I currently
      override it by changing the java.io.tmpdir system property from
      launcher_rc.rb (see the sketch just after this list).
  • Shut down if an error occurs during startup

    • In order for container orchestrators to notice that a deployment has failed
      and automatically roll back, the container process needs to exit if an error
      occurs at startup (e.g. due to a configuration problem).
    • By default both Jetty and jruby-rack leave the server running after an
      error at startup. I reconfigure Jetty and jruby-rack in launcher_rc.rb.
    • Similarly, we also shut down the process if an uncaught exception occurs
      (with the expectation that the container is auto-restarted).
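
To make the temp directory and fail-fast bullets above more concrete, here is
a minimal launcher_rc.rb sketch. The envar and path names are made up for
illustration, it's not our actual code, and the real Jetty/jruby-rack
reconfiguration is more involved than this:

    # launcher_rc.rb -- minimal, illustrative sketch (not the exact code we run).
    require 'java'

    # Keep temp files out of the persistent data volume. ASPACE_TMPDIR and
    # /aspace/tmp are hypothetical names used only for this example.
    java.lang.System.set_property('java.io.tmpdir',
                                  ENV.fetch('ASPACE_TMPDIR', '/aspace/tmp'))

    # Fail fast: if a thread dies with an uncaught exception, abort the whole
    # process so the orchestrator restarts the container rather than leaving a
    # half-working server running.
    Thread.abort_on_exception = true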

Extra, somewhat container-specific features

These are extra things the container image supports which are not essential.
They could reasonably remain container-specific things, done in a custom
launcher_rc.rb and/or config.rb file.

  • Support for configuring via environment variables.

    • Containers are often configured via environment variables rather than by
      editing config files, and the image supports this by defining a custom
      config.rb that sets the common settings (e.g. database connection details)
      from envars. There are some utility functions to help with this, including
      support for loading secrets from files via *_FILE envars (see the sketch
      just after this list).

    • This functionality could be equally useful for a systemd service definition
      or similar.

  • Support logging the resolved configuration at startup

    • If the ASPACE_SHOW_CONFIG envar is set, ArchivesSpace logs the config it's
      running with as it starts. This is very helpful when configuring via
      environment variables, as there isn't a single config.rb file to look at to
      determine the config.
  • Allow plugins to run code early at startup, at the same point as
    launcher_rc.rb.

    • Currently, plugins can define init code, but it runs later than
      launcher_rc.rb does. I mainly added this because otherwise there wouldn't
      be a way to customise launcher_rc.rb behaviour for a customised container
      image, since the core image uses launcher_rc.rb to enable all the changes
      listed above.
  • Healthcheck support.

    • In order to support container healthchecks, I have ArchivesSpace write a
      text file for each started component, containing the port the component
      is listening on.

    • The container image healthcheck script uses these files to a) verify that
      the component started (i.e. the file must exist) and b) check that the
      component can respond to an HTTP request.
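
As a concrete example of the envar-driven configuration mentioned above, a
config.rb along these lines would work; the helper and envar names here are
simplified/hypothetical, not the exact ones the image defines:

    # config.rb -- simplified sketch of envar-driven configuration.
    # The helper and ASPACE_DB_* names are illustrative, not the image's real ones.

    # Read a value from VAR, or from the file named by VAR_FILE (useful for
    # mounted secrets), falling back to a default.
    def env_or_file(name, default = nil)
      return File.read(ENV["#{name}_FILE"]).strip if ENV["#{name}_FILE"]
      ENV.fetch(name, default)
    end

    db_host = env_or_file('ASPACE_DB_HOST', 'db')
    db_name = env_or_file('ASPACE_DB_NAME', 'archivesspace')
    db_user = env_or_file('ASPACE_DB_USER', 'as')
    db_pass = env_or_file('ASPACE_DB_PASSWORD')

    AppConfig[:db_url] = "jdbc:mysql://#{db_host}:3306/#{db_name}" \
      "?user=#{db_user}&password=#{db_pass}&useUnicode=true&characterEncoding=UTF-8"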

Plugin configuration

To install a plugin for a conventional ArchivesSpace deployment, people would
copy in the plugin files and edit their config.rb as needed.

I needed to install plugins into the image at build time and provide a way to
configure them, ideally without forking the container image repo, as that would
be a maintenance burden for anyone using the image.

To support this, I allow the image build to be configured to include a dir
containing plugin definition config files. The config files list things like a
git repo and branch/tag to install from, gems that the plugin needs, config.rb
settings, code to run at launcher_rc.rb time, and a shell script to run from
the container entrypoint script.

For example, here's the config file we use to install the oauth plugin:
https://gitlab.developers.cam.ac.uk/lib/dev/ams/infra/-/blob/main/archivesspace-plugins/cam_customisations/plugin.toml

I wrote a small CLI program that works with these config files to lock the git
branch/tag ref to a specific commit hash, and install the plugin files during
the container build.

My go-to dynamic language is Python rather than Ruby, so I wrote this tool in
Python to get it done quickly. That could be an issue, but on the other hand
it's pretty straightforward and not very large. It's in a subdir of the
archivesspace-container repo.

Container image itself

The container image can either run one or more ArchivesSpace components, or
run the maintenance scripts (e.g. to run a db migration). It's basically a
fairly normal container image, but I've worked out various small kinks, like
cleaning up tmp files on restart so they don't accumulate, setting up
healthchecks in the entrypoint, etc.
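
As a rough illustration of the healthcheck mechanism described earlier (a port
file per started component, plus an HTTP probe): the real script knows which
components are expected and where the port files live, whereas the paths below
are assumptions for this example, but it boils down to something like:

    #!/usr/bin/env ruby
    # healthcheck.rb -- illustrative sketch of the port-file + HTTP probe idea.
    require 'net/http'

    port_files = Dir.glob('/aspace/run/*.port')
    # a) each started component should have written a file containing its port
    abort 'no components have reported a port yet' if port_files.empty?

    port_files.each do |path|
      port = File.read(path).strip.to_i
      begin
        # b) the component must respond to an HTTP request on that port
        response = Net::HTTP.get_response('127.0.0.1', '/', port)
        puts "#{File.basename(path)}: HTTP #{response.code}"
      rescue StandardError => e
        abort "#{File.basename(path)}: #{e.message}"
      end
    end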
