Proposal: Pre-Build images in Github #249

Open
wants to merge 4 commits into base: master

mattelacchiato

This is a proposal as requested by @yanokwa at
https://forum.getodk.org/t/host-odk-central-docker-images-on-github-container-registry/29812/19?u=mattelacchiato

It introduces GitHub Actions to build the Docker images on GitHub
and host them at ghcr.io (the GitHub Container Registry). For open
source projects, this infrastructure is free.
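For readers new to GitHub Actions, a workflow of this kind might look roughly like the following. This is a minimal sketch, not the workflow in this PR: the file name, the list of images, and the `<name>.dockerfile` naming are assumptions. It uses the public docker/login-action, docker/metadata-action and docker/build-push-action, and a build matrix so that every image is built from one job definition instead of repeating the same steps per image.

```yaml
# Hypothetical file: .github/workflows/publish-images.yml
name: Publish Docker images

on:
  push:
    branches: [ master ]
    tags: [ 'v*.*.*' ]

env:
  REGISTRY_WITH_PATH: ghcr.io/${{ github.repository_owner }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # lets the built-in GITHUB_TOKEN push to ghcr.io
    strategy:
      matrix:
        image: [ service, nginx, secrets ]   # hypothetical list of images
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive   # assumption: the build needs the git submodules
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY_WITH_PATH }}/${{ matrix.image }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
      - uses: docker/build-push-action@v5
        with:
          context: .
          file: ${{ matrix.image }}.dockerfile   # assumes <name>.dockerfile naming
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
```

Each matrix entry runs as a separate job, so a failure in one image does not hide the result of the others.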

Additionally, I've removed the unnamed (anonymous) volumes from
docker-compose.yml, because anonymous volumes are hard to back up.
Currently, the Enketo secrets are not part of the backup, which is
really bad if you have to recover your installation: all public
links stop working for submitters.

For the same reason (easier backups at the OS level), I also moved
the Postgres and Redis data folders to local mounts.
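The difference between an anonymous volume and a local mount can be sketched in compose terms. The excerpt below is illustrative only: the service names, the container path /etc/secrets, and the host directories are assumptions, not the project's actual layout. /var/lib/postgresql/data and /data are the default data directories of the official postgres and redis images.

```yaml
# Hypothetical docker-compose.yml excerpt (names and paths are illustrative).
services:
  secrets:
    volumes:
      # Anonymous volume: Docker generates a random name for it, so it is easy
      # to leave out of backups, and after `docker-compose down` the next `up`
      # starts with a fresh, empty volume.
      # - /etc/secrets
      # Bind mount: the data lives in a host directory that OS-level backups
      # already cover.
      - ./files/secrets:/etc/secrets
  postgres:
    volumes:
      - ./files/postgres:/var/lib/postgresql/data
  redis:
    volumes:
      - ./files/redis:/data
```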

I switched the base image of the secrets container from node to
alpine, which is much more lightweight. We don't need Node just to
run a simple shell script.

This setup will increase the benefits for experienced administrators
without removing any convenience for beginners.

@issa-tseng
Member

hello!
i think this is a great idea in general, and thank you so much for the thoughtful pr.

please understand you are the docker expert out of the two of us, so everything i'm saying here is from a docker-naïve maintainer's perspective, not authority:

  1. please help me understand the implications for existing installations of moving all these containers and volumes around.
  2. please also help me understand what the action itself does. when is it run, what is published, and what are the implications of that publication? how does it, for example, decide on a version number? i realize i could learn this by reading your config and the documentation, but i'm hoping you can just summarize it for me off the top of your head.
  3. last, i hear your desire to move secrets to alpine. in the version 1.3 release, i made an effort to move all our images to alpine as part of going to node 14. i had trouble with our nginx image, because we have a lua script that inflates compressed POST request bodies. because of this, i pulled back on moving anything to alpine, reasoning that we have a lot of deployments in areas with awful connections and i'd rather spare them from downloading one more base image, even if it is more efficient on disk and to run. of course, maybe we already depend on alpine somewhere and i'm being silly. but that was my previous reasoning for not wanting to touch alpine at all. maybe if you can show i'm being silly, or resolve the nginx issue i had, we can move some or all of the images to alpine. it'd be nice.

@issa-tseng
Member

related to volumes and naming: we eventually decided that this previous attempt to fix the anonymous volume problem was too risky to implement for what we'd get out of it. it's probably useful context for your thinking, i hope.

build:
  context: .
  dockerfile: service.dockerfile
image: ghcr.io/mattelacchiato/service:latest
Member

is this the intentional target?

Author

No, it should be ghcr.io/getodk/service:LATEST_TAG, where LATEST_TAG is the latest version you've tagged in git.

Author

This needs to be fixed before you release a new version. I only pointed it at my own repository because that's the one I have access to, so I could test the GitHub Actions workflow.
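Once a release is tagged, the corrected entry might look like the following compose excerpt. It is a sketch: the service key and the version 1.3.1 are placeholders, and it assumes the tag comes from a docker/metadata-action rule of the form `type=semver,pattern={{version}}`, which drops the leading "v" from a git tag such as v1.3.1.

```yaml
# Hypothetical docker-compose.yml excerpt; "1.3.1" is a placeholder for the
# most recent git tag (v1.3.1) without its leading "v".
services:
  service:
    image: ghcr.io/getodk/service:1.3.1
```

Without a `build:` section, docker-compose pulls this image from ghcr.io instead of building it locally.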

@mattelacchiato
Author

mattelacchiato commented Oct 25, 2021

related to volumes and naming: we eventually decided that this previous attempt to fix the anonymous volume problem was too risky to implement for what we'd get out of it. it's probably useful context for your thinking, i hope.

Thanks for the hint! It led me to read the docker-compose down documentation, and yes: named volumes are not deleted by this command. But we still have the problem of "backupability".

But anyway I want to say:

  1. I thought about cutting the volume changes out of this PR, since they are not required for the initial motivation: publishing pre-built Docker images. So if we conclude that the volume changes are not useful, we can still use the other changes.
  2. I'm not a real Docker expert; I've only been using it for a while in our projects. Therefore I've asked my colleagues for feedback on this volume question.

By the way: docker-compose down will also destroy the SSL certificates generated via Let's Encrypt. Although you can re-create them, this a) takes a few minutes, since the Diffie-Hellman parameters need to be regenerated as well, and b) will lock you out of Let's Encrypt for a few days once you hit their rate limit. I've hit this at least twice while playing around with the setup.

@mattelacchiato
Author

mattelacchiato commented Oct 25, 2021

  1. please help me understand the implications for existing installations that these containers and volumes have all been moved around.

Sadly, I didn't think about existing installations. Could we solve this with a manual step described in the documentation and release notes?

  2. please also help me understand what the action itself does. when is it run, what is published, and what are the implications of that publication?

This is the first time I've worked with GitHub Actions, so I'm completely new to this, too. That is also why I couldn't manage to remove the code duplication.

But I'll do my best to explain what I've understood so far, copying the relevant parts here:

on:
  push:
    branches: [ master ]
    # Publish semver tags as releases.
    tags: [ 'v*.*.*' ]

The workflow runs on every push to master and on every pushed tag matching v*.*.*. (Previously it also ran for pull requests against master, but that's not what we want, so I deleted that trigger.)

REGISTRY_WITH_PATH: ghcr.io/${{ github.repository_owner }}

The Docker images are pushed to GitHub's package storage under the repository owner's account (for you, "getodk"). In my case, you can find the images here: https://github.com/mattelacchiato?tab=packages&repo_name=central
I'm not sure what you mean by "implications of that publication"...?

how does it, for example, decide on a version number?

          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}.{{patch}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
            type=sha

These are the version tags that are created:

  • the current branch name (currently "master" only)
  • the complete git tag version (e.g. "1.3.1")
  • broader versions like "1.3" and "1" (useful if the admin always wants the latest release in a line, e.g. "1.3" for all "1.3.x" versions and "1" for all "1.x" versions)
  • the short hash of the commit that triggered the build, prefixed with "sha-" (e.g. sha-2f2565f)
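The tag rules can be checked against a concrete case. The fragment below is a sketch of the metadata step alone, assuming docker/metadata-action v5 with its default settings and a hypothetical image name; the comments list the tags these rules should yield. It leaves out the `{{major}}.{{minor}}.{{patch}}` rule, which for a plain release tag produces the same string as `{{version}}`.

```yaml
# Sketch of the metadata step only; the image name is assumed.
# Expected tags, assuming the action's defaults:
#   push to master      -> master, sha-<short commit hash>
#   push of tag v1.3.1  -> 1.3.1, 1.3, 1, sha-<short commit hash>
#                          (with the default flavor, possibly also latest)
# type=semver rules only produce tags on tag pushes, and type=ref,event=branch
# only on branch pushes.
- id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository_owner }}/service
    tags: |
      type=ref,event=branch
      type=semver,pattern={{version}}
      type=semver,pattern={{major}}.{{minor}}
      type=semver,pattern={{major}}
      type=sha
```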
  3. last, i hear your desire to move secrets to alpine. in the version 1.3 release, i made an effort to move all our images to alpine as part of going to node 14. i had trouble with our nginx image, because we have a lua script that inflates compressed POST request bodies. because of this, i pulled back on moving anything to alpine, reasoning that we have a lot of deployments in areas with awful connections and i'd rather spare them from downloading one more base image, even if it is more efficient on disk and to run. of course, maybe we already depend on alpine somewhere and i'm being silly. but that was my previous reasoning for not wanting to touch alpine at all.

I chose alpine because the secrets image was previously based on node 12 (not node 14), so it would have added a whole separate Node base image to the download. We could switch it to node 14 instead, but I'd assume that 5.6 MB (the size of alpine) wouldn't make much of a difference compared to the 1.17 GB of the service image...

maybe if you can show i'm being silly, or resolve the nginx issue i had, we can move some or all of the images to alpine. it'd be nice.

I could take a look, but my time is very limited...

I hope I was able to answer all of your questions.

@mattelacchiato
Author

Regarding the volume question, here is what some people on Reddit say: https://www.reddit.com/r/docker/comments/dzhlhs/why_use_named_volumes_instead_of_host_volume/?utm_source=share&utm_medium=web2x&context=3

I'm fine with keeping the volumes as they are and removing those changes from this PR. In our own installations, at least, I will keep the secrets in a local mount rather than a named volume, for easier backup and restore.

@issa-tseng
Member

unless you can provide a lot of automation and foolproofing around it, i think asking people to run commands during upgrade to prevent data loss is too much to ask. not everybody reads the instructions when they upgrade. as mentioned, we share these concerns, but we stopped going down this road because we tried very hard to come up with an answer we felt we could trust and came up short. we are interested in naming these volumes, but i need to see a really good answer for deployment/migration.

my suggestion would be to drop that element from the pr and open a new one focused on that if you're interested in tackling it. if you are, know that i'm also interested in moving to postgres 14, which seems like a related problem area (locate and perform the upgrade on the volume and associate it back to a 14.x image).

as for alpine, that's fine. the number of connections can be just as bad as (or worse than) download size, and i'd still rather stick to the minimum we can offer for the sake of those with poor connections, but if you feel strongly about including alpine, i guess i don't care that much. thanks for catching that it was still on 12.

@mattelacchiato
Author

Yeah, I will shrink this PR down. Hopefully within the next 2 weeks ;-)

mattelacchiato and others added 4 commits February 18, 2022 15:02
@mattelacchiato
Author

Sorry for the long wait! I've just pushed a new version without the volume changes. I hope I didn't miss anything...

My project (where we use ODK) suddenly died, so I no longer have a test system where I could easily try this. But I'm sure you have one, so feel free to check whether it works for you =)

@lcalisto

Any update on this?

@tobiasmcnulty

I started a slimmed-down version of these changes in #546 (based on this PR, thank you @mattelacchiato!), in case it could be merged in smaller steps.

4 participants