refactor: Simplify `Dockerfile` #6320

polarathene · 2023-09-15T09:59:24Z

1. Why is this pull request needed and what does it do?

TL;DR:

Enforcing the NET_BIND_SERVICE capability for any user to bind privileged ports isn't required with Docker unless running with --network host?
You could have a much simpler Dockerfile and allow those who enforce dropping NET_BIND_SERVICE to still run the coredns binary on unprivileged ports which is more secure of a choice to support. setcap applying the capability with effective + permitted set is enforcing a requirement for privileges that aren't always necessary. You can still run as non-root without this.

Introduced in March 2023 to support non-root feature. It seems largely unnecessary though?

Everything removed was to support setcap mandating NET_BIND_SERVICE on the /coredns binary.
There is no need to COPY for ca-certificates.crt, the image base static-debian11:nonroot already has that (unless the BASE reference ARG is changed). I assume that the base image won't be changing much due to the expectation of USER nonroot:nonroot to be successful, thus value of overwriting the contents seems questionable?
- EDIT: Originally this feature was added when the scratch image was used for the final stage, it was more compatible then, but was not merged until the switch to nonroot image which added the USER directive.
- EDIT: The ca-certificates.crt is also from earlier as an improvement for the final stage with scratch of the image build. Thus no longer important.

#! /bin/bash

# Custom config:
cat > /tmp/Corefile <<EOF
.:5353 {
  whoami
}
EOF


# No problems binding with non-root to port >1024
# in container, and < 1024 on host:
docker run --rm \
  --sysctl 'net.ipv4.ip_unprivileged_port_start=1024' \
  --cap-drop NET_BIND_SERVICE \
  --user 1000:1000
  --workdir / --volume '/tmp/Corefile:/Corefile' \
  -p 53:5353/udp
  coredns/coredns:1.10.1

On a host network, or perhaps a different container run-time you may need to opt for the setcap approach? However due to the change in Docker 20.10.0 (Dec 2020), non-root users can already bind to ports below 1024:

# --cap-add NET_BIND_SERVICE is default capability,
# Dropping it has no impact due to Docker modifying sysctl:
$ docker run --rm --cap-drop NET_BIND_SERVICE coredns/coredns:1.10.1
$ docker run --rm --cap-drop NET_BIND_SERVICE --user 1000:1000 coredns/coredns:1.10.1

.:53
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3

Restore the sysctl restriction and the capability will have an effect when not applied:

# --cap-add NET_BIND_SERVICE is default capability, reset sysctl to 1024:
$ docker run --rm --sysctl 'net.ipv4.ip_unprivileged_port_start=1024'  coredns/coredns:1.10.1
.:53
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3


# Root (drop capability) and user (does not inherit?) - Both fail to bind now:
$ docker run --rm --sysctl 'net.ipv4.ip_unprivileged_port_start=1024' --cap-drop NET_BIND_SERVICE  coredns/coredns:1.10.1
$ docker run --rm --sysctl 'net.ipv4.ip_unprivileged_port_start=1024'  --user '1000:1000'  coredns/coredns:1.10.1
Listen: listen tcp :53: bind: permission denied

The only reason to use the setcap call in Dockerfile is to make the /coredns binary treated as a "capability-dumb binary":

# `setcap cap_net_bind_service=ep` via 1.11.1 tag,
# Capability requirement is enforced, sysctl has no relevance:
$ docker run --rm \
  --sysctl 'net.ipv4.ip_unprivileged_port_start=1024'  \
  --cap-add NET_BIND_SERVICE \
  coredns/coredns:1.11.1

.:53
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2

# Capability dropped, fails check and refuses to run binary:
$ docker run --rm \
  --sysctl 'net.ipv4.ip_unprivileged_port_start=1024' \
  --cap-drop NET_BIND_SERVICE \
  coredns/coredns:1.11.1

exec /coredns: operation not permitted


# Even root:
$ docker run --rm \
  --cap-drop NET_BIND_SERVICE \
  --user '0:0'
  coredns/coredns:1.11.1

exec /coredns: operation not permitted

Not a helpful error message to encounter.
Root user does not necessarily need to bind port lower than 1024 (and with Docker can regardless by default)
When the capability is dropped, not even the root user can bypass the restriction.
setcap is only in use to bypass the non-root restriction being respected (for any user in the container to leverage, with any port).

2. Which issues (if any) are related?

Last issue raised requesting non-root feature:

coredns should run as a non-root user #5968

Reference of unexpected problems from users affected by the non-root approach (which also changed default WORKDIR):

Earlier attempts to apply the change:

NOTE: PR #4528 at least included a comment providing better context of the capability intent, but it's not been necessary since Docker 20.10.0 (Dec 2020) which for Docker managed networks drops the <=1024 privileged ports.

3. Which documentation changes (if any) need to be made?

N/A?

4. Does this introduce a backward incompatible change or deprecation?

Reverts back to 1.10.1 Docker image expectation without mandatory capability enforced. The entire build stage is not needed, it's only purpose was to apply setcap to a pre-built coredns binary.
Has not reverted the expectation of 1.10.1 working directory (I could include that, should just be WORKDIR /).
The USER nonroot:nonroot directive shouldn't be necessary as that's already what the image should run with, I left it alone for the visibility. EXPOSE likewise is not necessary and is effectively documentation.

Everything else was to support `setcap` which should not be required to support running non-root.

chrisohaver · 2023-09-15T12:08:48Z

Please look into the failed ci/circleci kubernetes-test. On the surface, the change appears to have rendered coredns unresponsive to dns queries, but could of course be some flaw or inadequacy with the test.

polarathene · 2023-09-15T12:37:35Z

Please look into the failed ci/circleci kubernetes-test.

I don't have experience with k8s.

srv_test.go:52: Could not load corefile: timeout waiting for coredns to be ready. coredns log: Listen: listen tcp :53: bind: permission denied

I am familiar with the error though. I would say it's due to the non-root user binding below port 1024. This wouldn't normally be allowed on the host with CoreDNS would it? The binary isn't built/packaged with the capability enforced?

It is fine for Docker due to their internal bridge networks relaxing the port restriction.

My concern is from their comment in k8s by @vinayakankugoyal , where perhaps they relied on the container to bind to :53 internally for convenience and their network/runtime does not relax the privileged ports (which is what the capability would bypass).

It would seem that you can only keep one group happy 😅

For DNS I can understand wanting to favor binding to :53 and needing that ability in other contexts beyond Dockers internal bridge networks.
For those that bind to a > 1024 port internally and then publish that via :53, it's also warranted to not require NET_BIND_SERVICE capability in the image, since non-root user still works fine for that.

So I guess it's your choice as maintainer which one to favour. I don't have much input for k8s as I don't know how flexible that is with DNS configuration. Docker doesn't seem to allow changing the DNS port for containers with --dns for example. but is otherwise functional (like dig with -p which can choose the custom port to connect to).

SuperQ · 2023-09-15T12:44:37Z

This needs a DCO sign-off. You can use git commit -s --amend to add it.

polarathene · 2023-09-15T22:37:52Z

This needs a DCO sign-off.

I'm aware, but in light of the k8s test failure I'll wait for maintainer decision on which route the Docker image should cater to.

If setcap is kept, I'll just raise a new PR (or adjust this one if preferred) to handle the WORKDIR and remove the ca-certificates.crt copy.

chrisohaver · 2023-12-14T15:19:04Z

#5969 to support non-root feature. It seems largely unnecessary though?

Perhaps we should roll that back. Thoughts?

Simplify Dockerfile

522ea68

Everything else was to support `setcap` which should not be required to support running non-root.

polarathene requested review from bradbeam, chrisohaver, dilyevsky, jameshartig, greenpau, Isolus, johnbelamaric, miekg, pmoroney, rajansandeep, stp-ip, SuperQ, yongtang and Tantalor93 as code owners September 15, 2023 09:59

polarathene mentioned this pull request Sep 16, 2023

Docker working directory changed #6249

Open

This was referenced Dec 15, 2023

Run coredns as non root. #5969

Merged

Use a UID rather than "nonroot" in the Dockerfile #6416

Open

prezha mentioned this pull request Mar 17, 2024

add default sysctls to allow privileged ports with no capabilities kubernetes/minikube#18421

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Simplify `Dockerfile` #6320

refactor: Simplify `Dockerfile` #6320

polarathene commented Sep 15, 2023 •

edited

chrisohaver commented Sep 15, 2023

polarathene commented Sep 15, 2023

SuperQ commented Sep 15, 2023

polarathene commented Sep 15, 2023

chrisohaver commented Dec 14, 2023

refactor: Simplify Dockerfile #6320

Are you sure you want to change the base?

refactor: Simplify Dockerfile #6320

Conversation

polarathene commented Sep 15, 2023 • edited

1. Why is this pull request needed and what does it do?

2. Which issues (if any) are related?

3. Which documentation changes (if any) need to be made?

4. Does this introduce a backward incompatible change or deprecation?

chrisohaver commented Sep 15, 2023

polarathene commented Sep 15, 2023

SuperQ commented Sep 15, 2023

polarathene commented Sep 15, 2023

chrisohaver commented Dec 14, 2023

refactor: Simplify `Dockerfile` #6320

refactor: Simplify `Dockerfile` #6320

polarathene commented Sep 15, 2023 •

edited