Supporting multi-node, containerised deployments #3198

Open
h4l opened this issue May 13, 2024 · 1 comment

h4l commented May 13, 2024

Recently I reworked Cambridge University's ArchivesSpace deployment to address performance and reliability/availability problems that it'd been struggling with. We now deploy ArchivesSpace from a bespoke container image I created. We run each component (frontend, backend, public UI, indexer, etc.) in a separate container, and run 3 instances of the frontend, backend and public UI.

This approach has been really successful. It's been in production for a few months and we've had no downtime (other than one or two unrelated infra problems), and performance is consistent. Previously (with the conventional Unix service deployment in one JVM process) we were seeing the service become unresponsive quite regularly (sometimes every few days, usually every week or two). This was initially due to the JVM's memory not being appropriately tuned, but even after I did some tuning and scaled up the server a lot, it would still run out of memory periodically.

We can also do zero-downtime deploys now, which is very handy. We use a single-VM Docker Swarm cluster to run the containers, but this approach would work with Kubernetes or any orchestrator.

We'd like to share the work we've done to enable ArchivesSpace to deploy like this. It seems like it should be helpful for other people to be able to deploy from containers, even if they run everything from a single container. It would also help us if ArchivesSpace had official support for this, to reduce the differences between our own deployment and vanilla ArchivesSpace.

So this issue is really to talk about multi-node and containerised deployments, whether ArchivesSpace would like to have official support for them, and whether I can help with that.

(I've got a related PR in the oauth plugin to contribute a change I made to that plugin to support multi-node deployments: lyrasis/aspace-oauth#33 )

To build our ArchivesSpace container image I created a standalone repo (not a fork of the main ArchivesSpace source repo) that contains a generic container image build for ArchivesSpace, allowing anyone to build an ArchivesSpace image customised with whichever plugins they use. That is here: https://gitlab.developers.cam.ac.uk/lib/dev/ams/archivesspace-container

Then we have another repo that uses archivesspace-container to build the image we deploy — it configures our plugins and has the CI config to build our images. So it's basically an example of using archivesspace-container in practice. That is here: https://gitlab.developers.cam.ac.uk/lib/dev/ams/infra/

I can say more about the changes I made to get this working, but none of it was anything significant; most things were already in place. I'll leave it there for now, as I don't want to write too much!


h4l commented May 14, 2024

There are a couple of scenarios/approaches I can imagine here:

  1. ArchivesSpace wants to keep the current single-JVM deployment, without
    official container deployment support

    • If I could contribute the core changes below, we could maintain our
      container build separately without needing any weird hacks
  2. ArchivesSpace wants to support multi-node container deployments

    • If you're happy with the general approach I've taken, I could work to
      integrate the container build as part of the core ArchivesSpace repo, or we
      could keep the container build as a separate repo, but under the
      ArchivesSpace project.
    • If you'd rather do things differently to some extent, maybe I can help with
      those changes so that we at Cambridge University can stay as close to
      vanilla ArchivesSpace as possible.

The following are the main changes/additions I made to support multi-node,
containerised deployments:

Core ArchivesSpace changes

  • Upgrade JRuby

  • Allow components to be enabled but not running (in the local process)

  • Allow changing the temp directory

    • By default it's inside the data directory, which isn't ideal for a
      container, as the data dir will be a persistent volume. I currently
      override it by changing the java.io.tmpdir system property from
      launcher_rc.rb (see the sketch just after this list).
  • Shut down if an error occurs during startup

    • In order for container orchestrators to notice that a deployment has failed
      and automatically roll back, the container process needs to exit if an error
      occurs at startup (e.g. due to a configuration problem).
    • By default both Jetty and jruby-rack leave the server running after an
      error at startup. I reconfigure Jetty and jruby-rack in launcher_rc.rb.
    • Similarly, we also shut down the process if an uncaught exception occurs
      (with the expectation that the container is auto-restarted).
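
To make the temp directory and fail-fast bullets above more concrete, here is
a minimal launcher_rc.rb sketch. The envar and path names are made up for
illustration, it's not our actual code, and the real Jetty/jruby-rack
reconfiguration is more involved than this:

    # launcher_rc.rb -- minimal, illustrative sketch (not the exact code we run).
    require 'java'

    # Keep temp files out of the persistent data volume. ASPACE_TMPDIR and
    # /aspace/tmp are hypothetical names used only for this example.
    java.lang.System.set_property('java.io.tmpdir',
                                  ENV.fetch('ASPACE_TMPDIR', '/aspace/tmp'))

    # Fail fast: if a thread dies with an uncaught exception, abort the whole
    # process so the orchestrator restarts the container rather than leaving a
    # half-working server running.
    Thread.abort_on_exception = true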

Extra, somewhat container-specific features

These are extra things the container image supports which are not essential.
They could reasonably remain container-specific things, done in a custom
launcher_rc.rb and/or config.rb file.

  • Support for configuring via environment variables.

    • Containers are often configured via environment variables rather than by
      editing config files, and the image supports this by defining a custom
      config.rb that sets the common settings (e.g. database connection details)
      from envars. There are some utility functions to help with this, including
      support for loading secrets from files via *_FILE envars (see the sketch
      just after this list).

    • This functionality could be equally useful for a systemd service definition
      or similar.

  • Support logging the resolved configuration at startup

    • If the ASPACE_SHOW_CONFIG envar is set, ArchivesSpace logs the config it's
      running with as it starts. This is very helpful when configuring via
      environment variables, as there isn't a single config.rb file to look at to
      determine the config.
  • Allow plugins to run code early at startup, at the same point as
    launcher_rc.rb.

    • Currently, plugins can define init code, but it runs later than
      launcher_rc.rb does. I mainly added this because otherwise there wouldn't
      be a way to customise launcher_rc.rb behaviour for a customised container
      image, since the core image uses launcher_rc.rb to enable all the changes
      listed above.
  • Healthcheck support.

    • In order to support container healthchecks, I have ArchivesSpace write a
      text file for each started component, containing the port the component
      is listening on.

    • The container image healthcheck script uses these files to a) verify that
      the component started (i.e. the file must exist) and b) check that the
      component can respond to an HTTP request.
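
As a concrete example of the envar-driven configuration mentioned above, a
config.rb along these lines would work; the helper and envar names here are
simplified/hypothetical, not the exact ones the image defines:

    # config.rb -- simplified sketch of envar-driven configuration.
    # The helper and ASPACE_DB_* names are illustrative, not the image's real ones.

    # Read a value from VAR, or from the file named by VAR_FILE (useful for
    # mounted secrets), falling back to a default.
    def env_or_file(name, default = nil)
      return File.read(ENV["#{name}_FILE"]).strip if ENV["#{name}_FILE"]
      ENV.fetch(name, default)
    end

    db_host = env_or_file('ASPACE_DB_HOST', 'db')
    db_name = env_or_file('ASPACE_DB_NAME', 'archivesspace')
    db_user = env_or_file('ASPACE_DB_USER', 'as')
    db_pass = env_or_file('ASPACE_DB_PASSWORD')

    AppConfig[:db_url] = "jdbc:mysql://#{db_host}:3306/#{db_name}" \
      "?user=#{db_user}&password=#{db_pass}&useUnicode=true&characterEncoding=UTF-8"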

Plugin configuration

To install a plugin for a conventional ArchivesSpace deployment, people would
copy in the plugin files and edit their config.rb as needed.

I needed to install plugins into the image at build time and provide a way to
configure them, ideally without forking the container image repo, as that would
be a maintenance burden for anyone using the image.

To support this, I allow the image build to be configured to include a dir
containing plugin definition config files. The config files list things like a
git repo and branch/tag to install from, gems that the plugin needs, config.rb
settings, code to run at launcher_rc.rb time, and a shell script to run from
the container entrypoint script.

For example, here's the config file we use to install the oauth plugin:
https://gitlab.developers.cam.ac.uk/lib/dev/ams/infra/-/blob/main/archivesspace-plugins/cam_customisations/plugin.toml

I wrote a small CLI program that works with these config files to lock the git
branch/tag ref to a specific commit hash, and install the plugin files during
the container build.

My go-to dynamic language is Python rather than Ruby, so I wrote this tool in
Python to get it done quickly. That could be an issue, but on the other hand
it's pretty straightforward and not very large. It's in a subdir of the
archivesspace-container repo.

Container image itself

The container image can either run one or more ArchivesSpace components, or
run the maintenance scripts (e.g. to run a db migration). It's basically a
fairly normal container image, but I've worked out various small kinks, like
cleaning up tmp files on restart so they don't accumulate, setting up
healthchecks in the entrypoint, etc.
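
As a rough illustration of the healthcheck mechanism described earlier (a port
file per started component, plus an HTTP probe): the real script knows which
components are expected and where the port files live, whereas the paths below
are assumptions for this example, but it boils down to something like:

    #!/usr/bin/env ruby
    # healthcheck.rb -- illustrative sketch of the port-file + HTTP probe idea.
    require 'net/http'

    port_files = Dir.glob('/aspace/run/*.port')
    # a) each started component should have written a file containing its port
    abort 'no components have reported a port yet' if port_files.empty?

    port_files.each do |path|
      port = File.read(path).strip.to_i
      begin
        # b) the component must respond to an HTTP request on that port
        response = Net::HTTP.get_response('127.0.0.1', '/', port)
        puts "#{File.basename(path)}: HTTP #{response.code}"
      rescue StandardError => e
        abort "#{File.basename(path)}: #{e.message}"
      end
    end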
