Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add initial container file and github build action #54575

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

BlaineEXE
Copy link
Contributor

@BlaineEXE BlaineEXE commented Nov 20, 2023

This is only an initial implementation. It is not expected to work for any branch except for Ceph 'main'. The build action is also disabled for any pull request except ones that modify the container build files themselves to make sure that any issues do not cause confusion with ongoing or future feature PRs.

This PR focuses on creating a good starting point for a "Containerfile" definition, tooling for building multi-architecture images using it, and pushing the image to a temporary repository: quay.io/ceph/ceph-staging.

There are many TODO items noted in the files that should be followed up on in future PRs.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions github-actions bot added the CI Continuous Integration label Nov 20, 2023
@BlaineEXE BlaineEXE force-pushed the add-container-file branch 7 times, most recently from 919c509 to 13f476f Compare November 20, 2023 18:45
@BlaineEXE
Copy link
Contributor Author

BlaineEXE commented Nov 20, 2023

Note: the Containerfile is not in the root of the repo and is instead placed into ./container for some practical reasons:

  1. for developers using non-Linux systems to build the container image (i.e., MacOS users), the build context needs to copy the entire ceph repo into a VM for the build, which is unnecessary and takes a truly massive amount of time (for no benefit)
  2. it provides a space for adding scripts/makefile to help with the container build into the ./container dir, keeping any container tooling in the same place without cluttering the repo root

Comment on lines 101 to 86
# Copr repos
# TODO: afaict, scikit is for mgr-diskpredition-local, asyncssh for cephadm, but why do we
# encode these as deps only in container instead of including them as a vendored
# submodule in the ceph repo or something to be included with ceph packages?
# @tchaikov @torchiaf @adk3798 @ktdreyer?
# ref: https://github.com/ceph/ceph-container/pull/1821
# note: scikit-learn packages are not included in ubi8 in ceph-container
RUN \
dnf install -y --setopt=install_weak_deps=False dnf-plugins-core && \
dnf copr enable -y tchaikov/python-scikit-learn && \
dnf copr enable -y tchaikov/python3-asyncssh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guits @adk3798 @torchiaf @tchaikov do any of you have any thoughts here?

Copy link
Contributor

@adk3798 adk3798 Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For asyncssh at least, the cephadm mgr module needs it. I think we couldn't find any packaging of it in official repos so Kefu had this copr repo set up. We should maybe host this under the copr of somebody more involved in the project at this point, but we still need it somewhere.

Edit - didn't read the full comment first.

I assume keeping it in a copr repo is just easier than having it as a submodule, but I didn't really look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for weighing in. It's not a big issue, but it seemed worth asking since I'm not sure how hard it is to set up and maintain copr repos. Regardless, it'll be something to changed later down the line if changing it makes sense.

Comment on lines 119 to 120
# General
PACKAGES="ca-certificates" && \
# Ceph
# TODO: I removed these packages that were included in ganesha packages:
# gcc, lua-devel, luarocks
# They are listed as BuildRequires in ceph.spec.in, so I assume they are unnecessary
# @mattbenjamin please advise about whether these are necessary for runtime
PACKAGES="$PACKAGES \
ceph-common${CEPH_PACKAGE_VERSION} \
ceph-exporter${CEPH_PACKAGE_VERSION} \
ceph-grafana-dashboards${CEPH_PACKAGE_VERSION} \
ceph-immutable-object-cache${CEPH_PACKAGE_VERSION} \
ceph-mds${CEPH_PACKAGE_VERSION} \
ceph-mgr-cephadm${CEPH_PACKAGE_VERSION} \
ceph-mgr-dashboard${CEPH_PACKAGE_VERSION} \
ceph-mgr-diskprediction-local${CEPH_PACKAGE_VERSION} \
ceph-mgr-k8sevents${CEPH_PACKAGE_VERSION} \
ceph-mgr-rook${CEPH_PACKAGE_VERSION} \
ceph-mgr${CEPH_PACKAGE_VERSION} \
ceph-mon${CEPH_PACKAGE_VERSION} \
ceph-osd${CEPH_PACKAGE_VERSION} \
ceph-radosgw${CEPH_PACKAGE_VERSION} \
ceph-volume${CEPH_PACKAGE_VERSION} \
cephfs-mirror${CEPH_PACKAGE_VERSION} \
cephfs-top${CEPH_PACKAGE_VERSION} \
kmod \
libradosstriper1${CEPH_PACKAGE_VERSION} \
rbd-mirror${CEPH_PACKAGE_VERSION} \
$(cat crimson-packages.txt) \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattbenjamin could you advise if gcc, lua-devel, or luarocks packages are needed in the ceph container image?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lua-devel and luarocks are runtime dependencies for radosgw, and #52931 (cc @yuvalif) is moving them from BuildRequires to Requires

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added those back in here. Left gcc removed for now.

Comment on lines +172 to +156
# Ceph-CSI
# TODO: coordinate with @Madhu-1 to have Ceph-CSI install these itself if unused by ceph
# @adk3798 does cephadm use these?
PACKAGES="$PACKAGES \
attr \
ceph-fuse${CEPH_PACKAGE_VERSION} \
rbd-nbd${CEPH_PACKAGE_VERSION} \
" && \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1 @adk3798 let me know if you have thoughts around this. It seems to me like keeping the ceph-fuse and rbd-nbd drivers in the ceph image makes sense, even for non-CSI images. But if you have differing thoughts, let's discuss.

Also, @Madhu-1 is attr still needed, and if so, do you think it's broadly needed or only for Ceph-CSI?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cephadm doesn't use these directly at least. Can't say with certainty for any of the daemons cephadm deploys that use the ceph container image though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving these for now.

Comment on lines 187 to 170
# TODO: Per discussion, sg3_utils should be moved to ceph.spec.in with libstoragemgmt; @ktdreyer
# ref: https://github.com/ceph/ceph-container/pull/2013#issuecomment-1248606472
PACKAGES="$PACKAGES \
gdisk \
hostname \
procps-ng \
sg3_utils \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktdreyer FYI, it seems like there will be a follow-up after this to mark sg3_utils as an RPM "Requires".

Comment on lines 25 to 34

# TODO: Is this still used? Can we get rid of this? @adk3798 do you know?
# ref: https://github.com/ceph/ceph-container/pull/1604
# Is a ceph container ?
LABEL ceph="True"

# TODO: This doesn't seem to be used by rook or ceph. Can we get rid of this? @guits do you know?
# ref: https://github.com/ceph/ceph-container/pull/1969
ENV I_AM_IN_A_CONTAINER 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adk3798 @guits do you see a need to keep these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cephadm doesn't use these directly. It might link somehow to the CEPH_USE_RANDOM_NONCE stuff though. Cephadm's use of that was just recently removed in #50344

Copy link
Contributor Author

@BlaineEXE BlaineEXE Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a search of the ceph repo, I didn't find any references to I_AM_IN_A_CONTAINER, but I'm not sure if there could be usages I'd miss by searching like that.

Would @jdurgin know? Or maybe know who I should be asking to find out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might have been just for humans and/or custom scripting in the container. It's good to be able to query, although maybe there's some more-standard way?...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are new org.opencontainers standardized labels added now that should be good for querying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a process running inside a container query the container labels?

tags: ${{ steps.container-build.outputs.tags }}
registry: quay.io/ceph/ceph-staging
username: ceph+ceph_staging_ci
password: ${{ secrets.CEPH_STAGING_QUAY_TOKEN }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guits this variable seems to be empty, causing this build-push to fail. Is this token misspelled maybe, or is it not created as an "Actions" secret in the repo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's well an action secret CEPH_STAGING_QUAY_TOKEN, i've just reset its value to make sure.

Comment on lines +213 to +203
# CLEAN UP!
RUN set -x && \
dnf clean all && \
rm -rf /var/cache/dnf/* && \
rm -rf /var/lib/dnf/* && \
rm -f /var/lib/rpm/__db* && \
# remove unnecessary files with big impact
rm -rf /etc/selinux /usr/share/{doc,man,selinux} && \
# don't keep compiled python binaries
find / -xdev \( -name "*.pyc" -o -name "*.pyo" \) -delete

# Verify that the packages installed haven't been accidentally cleaned, then
# clean the package list and re-clean unnecessary RPM database files
RUN rpm -q $(cat packages.txt) && rm -f /var/lib/rpm/__db* && rm -f *packages.txt
Copy link
Contributor Author

@BlaineEXE BlaineEXE Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revising the clean up from ceph-container, I am seeing an ARM image that is 1.07GB on my machine, versus 1.25GB to 1.28GB for other quay.io/ceph images I've pulled.

Of note, the parens in the find ... -delete command fix a bug where *.pyc files weren't being deleted before. Plus removing unnecessary DNF and RPM database files.

I decided to leave bash completion files (which have changed locations apparently) in the image because they are less than 1MB in total.

@cbodley
Copy link
Contributor

cbodley commented Nov 21, 2023

the squid release will be dropping support for centos 8, and there's a need a build containers based on centos 9. there's been some discussions around this in #53901 and elsewhere on slack. if we're starting from scratch here, it would be wonderful if we could start with centos 9 instead

@BlaineEXE
Copy link
Contributor Author

BlaineEXE commented Nov 27, 2023

the squid release will be dropping support for centos 8, and there's a need a build containers based on centos 9. there's been some discussions around this in #53901 and elsewhere on slack. if we're starting from scratch here, it would be wonderful if we could start with centos 9 instead

I agree that it would be nice, but I (or someone) would need to spend time making sure all the el9 locations are ready. Ideally quincy, reef, squid, and main all have el9 builds, and I'm not sure that's true today or not.

Can we at least get this merged and then convert to el9 later? This PR is clean and good to go from my end, but I'm waiting to hear back from @guits about checking the secret so that the quay login works.

#54575 (comment)

[update]

Just realized that this is probably not a configuration issue, but a security thing. GitHub doesn't allow secrets to be used from pull requests so that users can't "exfiltrate" secrets by making arbitrary PRs.
https://stackoverflow.com/questions/73866908/github-actions-main-repository-secret-not-picked-up-from-pull-request-build

This means that the GH action is good for builds from committed PRs, but not viable for building/publishing images related to pull requests. I guess the shaman system would be good for that, and it seems like something is already set up to build the quay.io/ceph-ci images like that. @dmick does that sound right?

@BlaineEXE BlaineEXE marked this pull request as ready for review November 27, 2023 21:49
@BlaineEXE BlaineEXE force-pushed the add-container-file branch 2 times, most recently from a98c31c to 3361853 Compare November 27, 2023 22:04
Copy link
Member

@ktdreyer ktdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not viable for building/publishing images related to pull requests

Yes, that's dangerous in general to run privileged code automatically on anyone's forked PR.

I think what you have here is fine and we should merge it.

@cbodley
Copy link
Contributor

cbodley commented Dec 8, 2023

this has an approval. does it need any further review/testing to merge? afaik nothing depends on this yet so there's little risk to merging

what are the next steps?

@cbodley cbodley mentioned this pull request Dec 8, 2023
2 tasks
@dmick
Copy link
Member

dmick commented Dec 8, 2023

"This means that the GH action is good for builds from committed PRs, but not viable for building/publishing images related to pull requests. I guess the shaman system would be good for that, and it seems like something is already set up to build the quay.io/ceph-ci images like that. @dmick does that sound right?"

I can't speak to the GH actions; I don't understand anything about the context they run in, or how one might possibly handle secrets safely. I know that things like those secrets are stored in Jenkins, which is what does the CI builds (shaman and chacra are just storage/db for the build results). I do want to point out however that we generally don't push CI images to quay.io, but rather our own quay.ceph.io (quay.io has usage limits and we do a lot of builds).

This is only an initial implementation. It is not expected to work for
any branch except for Ceph 'main'. The build action is also disabled for
any pull request except ones that modify the container build files
themselves to make sure that any issues do not cause confusion with
ongoing or future feature PRs.

This PR focuses on creating a good starting point for a "Containerfile"
definition, tooling for building multi-architecture images using it, and
pushing the image to a temporary repository: quay.io/ceph/ceph-staging.

There are many TODO items noted in the files that should be followed up
on in future PRs.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
@BlaineEXE BlaineEXE force-pushed the add-container-file branch 2 times, most recently from 2d46335 to 9b5837b Compare December 15, 2023 17:16
RUN \
dnf install -y --setopt=install_weak_deps=False dnf-plugins-core && \
dnf copr enable -y tchaikov/python-scikit-learn && \
dnf copr enable -y tchaikov/python3-asyncssh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the tchaikov/python3-asyncssh copr here (and the corresponding mention in the comment above). The python-asyncssh package is in epel8 and epel9 now: https://src.fedoraproject.org/rpms/python-asyncssh

In fact the EPEL 9 version will have an important bug fix (rhbz#2255043 / CVE-2023-48795) soon.

wmfgerrit pushed a commit to wikimedia/operations-docker-images-production-images that referenced this pull request Mar 21, 2024
Use of cephadm requires a suitable container image to be available;
upstream's are based on centos. This is an attempt at a minimal set
for our intended purposes (i.e. no ganesha / mds etc.)

Based on upstream work in ceph/ceph#54575
which in turn is a move to simplify how upstream builds their
containers.

This is for the PoC ceph/reef multisite cluster, codenamed "apus".

Bug: T279621
Change-Id: I256acec29b5c50eff266690e15db066470038aa7
@dmick
Copy link
Member

dmick commented Mar 21, 2024

I've been working on github.com/dmick/ceph-buildah which uses the buildah utility to build the containers. The huge advantage is that it allows a series of shell commands to make incremental changes to a working container, and then commits the final result, so all the conditional logic can be coded in the toplevel shell command, which is a huge help at bug diagnosis. I think I very much want to go this direction rather than a Dockerfile for the revamp. I've been building containers of centos8 and centos9 x86 and arm and having some luck; I want to do that formally and run some formal container-based tests (probably the rbd iscsi tests to catch "is the iscsi inclusion good as well), and then I think I'd like to propose integrating that into ceph.git rather than this.

@ktdreyer
Copy link
Member

Red Hat's build system (OSBS) input is a single Dockerfile per container, and I think a Dockerfile is generally the input format for other build systems (like Tekton). I think a Dockerfile is going to be easier to maintain.

@ktdreyer
Copy link
Member

For debugging, I like to use the podman build --force-rm=false option to avoid deleting intermediate stages. (it leaves a lot behind, so podman image prune cleans up.)

@dmick
Copy link
Member

dmick commented Mar 28, 2024

oh well that might be a good argument for dockerfiles I guess. crap. Maybe modern dockerfiles are less painful to do this ridiculous handstanding in.


# Other labels are set automatically by container/build github action
# See: https://github.com/opencontainers/image-spec/blob/main/annotations.md
LABEL org.opencontainers.image.authors="Guillaume Abrioux <gabrioux@redhat.com>" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guits are you the maintainer of this image?

@phlogistonjohn
Copy link
Contributor

@dmick I was pointed at this conversation during the most recent infra meeting. I am also interested in the topic and am happy to discuss any challenges you are hitting wrt building images: esp. wrt "this ridiculous handstanding" might entail :-)

I also am a fan of buildah, but as Ken notes it's not so widespread and more niche than Dockerfiles.

@dmick
Copy link
Member

dmick commented Apr 4, 2024

Mostly it's "if somehow we end up with another 200-line continued bash statement I will descend upon the author with a mighty and terrible force" but I think multiple RUN statements and --squash are the way out of that horror. ARG and short conditional bash are probably maintainable/usable.

@phlogistonjohn
Copy link
Contributor

Mostly it's "if somehow we end up with another 200-line continued bash statement I will descend upon the author with a mighty and terrible force"

lol :-)

but I think multiple RUN statements and --squash are the way out of that horror. ARG and short conditional bash are probably maintainable/usable.

That seems fine. I also dislike having a bunch of logic on a single RUN line. If the logic is necessary I'll often tend to farm the work out to a real script outside the {Dock,Contain}erfile.

@teckert
Copy link

teckert commented May 6, 2024

In our project we've run into several problems around CentOS Stream, causing us to look for Debian based Ceph container images. As we could not find suitable ones, we are now looking at building them ourselves and would like to use the code from this PR as base instead of the one based on Make (I arrived here through issue 2171).

So I would like to ask what state this PR is in? I see some open questions / work-in-progress-notes while I do not find anything which, at least to me, looks like testing, e.g. compared with the previous target test.staging (though I only just started reading through that code, so no idea what it actually does/tests, yet).

@dmick
Copy link
Member

dmick commented May 6, 2024

More work is needed; I don't know how far this is from final form yet. But container building isn't particularly magic, and if you end up at a slightly different place than we do (besides the obvious 'based on Debian' thing, which will involve a fair bit of package name munging as I'm sure you know), as long as you have the dependencies you need it'll probably be fine. I have a separate repo I've been working in, dmick/container.git, that might or might not be a better starting place.

@@ -0,0 +1,213 @@
FROM quay.io/centos/centos:stream8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use :stream9 as CentOS 8 Stream is EOL at the end of this month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
9 participants