Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Simplify Dockerfile #6320

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

polarathene
Copy link

@polarathene polarathene commented Sep 15, 2023

1. Why is this pull request needed and what does it do?

TL;DR:

  • Enforcing the NET_BIND_SERVICE capability for any user to bind privileged ports isn't required with Docker unless running with --network host?
  • You could have a much simpler Dockerfile and allow those who enforce dropping NET_BIND_SERVICE to still run the coredns binary on unprivileged ports which is more secure of a choice to support. setcap applying the capability with effective + permitted set is enforcing a requirement for privileges that aren't always necessary. You can still run as non-root without this.

Introduced in March 2023 to support non-root feature. It seems largely unnecessary though?

#! /bin/bash

# Custom config:
cat > /tmp/Corefile <<EOF
.:5353 {
  whoami
}
EOF


# No problems binding with non-root to port >1024
# in container, and < 1024 on host:
docker run --rm \
  --sysctl 'net.ipv4.ip_unprivileged_port_start=1024' \
  --cap-drop NET_BIND_SERVICE \
  --user 1000:1000
  --workdir / --volume '/tmp/Corefile:/Corefile' \
  -p 53:5353/udp
  coredns/coredns:1.10.1

On a host network, or perhaps a different container run-time you may need to opt for the setcap approach? However due to the change in Docker 20.10.0 (Dec 2020), non-root users can already bind to ports below 1024:

# --cap-add NET_BIND_SERVICE is default capability,
# Dropping it has no impact due to Docker modifying sysctl:
$ docker run --rm --cap-drop NET_BIND_SERVICE coredns/coredns:1.10.1
$ docker run --rm --cap-drop NET_BIND_SERVICE --user 1000:1000 coredns/coredns:1.10.1

.:53
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3

Restore the sysctl restriction and the capability will have an effect when not applied:

# --cap-add NET_BIND_SERVICE is default capability, reset sysctl to 1024:
$ docker run --rm --sysctl 'net.ipv4.ip_unprivileged_port_start=1024'  coredns/coredns:1.10.1
.:53
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3


# Root (drop capability) and user (does not inherit?) - Both fail to bind now:
$ docker run --rm --sysctl 'net.ipv4.ip_unprivileged_port_start=1024' --cap-drop NET_BIND_SERVICE  coredns/coredns:1.10.1
$ docker run --rm --sysctl 'net.ipv4.ip_unprivileged_port_start=1024'  --user '1000:1000'  coredns/coredns:1.10.1
Listen: listen tcp :53: bind: permission denied

The only reason to use the setcap call in Dockerfile is to make the /coredns binary treated as a "capability-dumb binary":

# `setcap cap_net_bind_service=ep` via 1.11.1 tag,
# Capability requirement is enforced, sysctl has no relevance:
$ docker run --rm \
  --sysctl 'net.ipv4.ip_unprivileged_port_start=1024'  \
  --cap-add NET_BIND_SERVICE \
  coredns/coredns:1.11.1

.:53
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2

# Capability dropped, fails check and refuses to run binary:
$ docker run --rm \
  --sysctl 'net.ipv4.ip_unprivileged_port_start=1024' \
  --cap-drop NET_BIND_SERVICE \
  coredns/coredns:1.11.1

exec /coredns: operation not permitted


# Even root:
$ docker run --rm \
  --cap-drop NET_BIND_SERVICE \
  --user '0:0'
  coredns/coredns:1.11.1

exec /coredns: operation not permitted
  • Not a helpful error message to encounter.
  • Root user does not necessarily need to bind port lower than 1024 (and with Docker can regardless by default)
  • When the capability is dropped, not even the root user can bypass the restriction.
  • setcap is only in use to bypass the non-root restriction being respected (for any user in the container to leverage, with any port).

2. Which issues (if any) are related?

Last issue raised requesting non-root feature:

Reference of unexpected problems from users affected by the non-root approach (which also changed default WORKDIR):

Earlier attempts to apply the change:

NOTE: PR #4528 at least included a comment providing better context of the capability intent, but it's not been necessary since Docker 20.10.0 (Dec 2020) which for Docker managed networks drops the <=1024 privileged ports.

3. Which documentation changes (if any) need to be made?

N/A?

4. Does this introduce a backward incompatible change or deprecation?

  • Reverts back to 1.10.1 Docker image expectation without mandatory capability enforced. The entire build stage is not needed, it's only purpose was to apply setcap to a pre-built coredns binary.
  • Has not reverted the expectation of 1.10.1 working directory (I could include that, should just be WORKDIR /).
  • The USER nonroot:nonroot directive shouldn't be necessary as that's already what the image should run with, I left it alone for the visibility. EXPOSE likewise is not necessary and is effectively documentation.

Everything else was to support `setcap` which should not be required to support running non-root.
@chrisohaver
Copy link
Member

Please look into the failed ci/circleci kubernetes-test. On the surface, the change appears to have rendered coredns unresponsive to dns queries, but could of course be some flaw or inadequacy with the test.

@polarathene
Copy link
Author

Please look into the failed ci/circleci kubernetes-test.

I don't have experience with k8s.

srv_test.go:52: Could not load corefile: timeout waiting for coredns to be ready. coredns log: Listen: listen tcp :53: bind: permission denied

I am familiar with the error though. I would say it's due to the non-root user binding below port 1024. This wouldn't normally be allowed on the host with CoreDNS would it? The binary isn't built/packaged with the capability enforced?

It is fine for Docker due to their internal bridge networks relaxing the port restriction.


My concern is from their comment in k8s by @vinayakankugoyal , where perhaps they relied on the container to bind to :53 internally for convenience and their network/runtime does not relax the privileged ports (which is what the capability would bypass).

It would seem that you can only keep one group happy 😅

  • For DNS I can understand wanting to favor binding to :53 and needing that ability in other contexts beyond Dockers internal bridge networks.
  • For those that bind to a > 1024 port internally and then publish that via :53, it's also warranted to not require NET_BIND_SERVICE capability in the image, since non-root user still works fine for that.

So I guess it's your choice as maintainer which one to favour. I don't have much input for k8s as I don't know how flexible that is with DNS configuration. Docker doesn't seem to allow changing the DNS port for containers with --dns for example. but is otherwise functional (like dig with -p which can choose the custom port to connect to).

@SuperQ
Copy link
Collaborator

SuperQ commented Sep 15, 2023

This needs a DCO sign-off. You can use git commit -s --amend to add it.

@polarathene
Copy link
Author

This needs a DCO sign-off.

I'm aware, but in light of the k8s test failure I'll wait for maintainer decision on which route the Docker image should cater to.

If setcap is kept, I'll just raise a new PR (or adjust this one if preferred) to handle the WORKDIR and remove the ca-certificates.crt copy.

@chrisohaver
Copy link
Member

#5969 to support non-root feature. It seems largely unnecessary though?

Perhaps we should roll that back. Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants