Dockerfile cleanup: reduce image size 3x #1212

Open: wants to merge 3 commits into master

@sa7mon commented Mar 15, 2024

Background

The Docker image built from the master branch is currently large: 5.91 GB. With a few Dockerfile optimizations, this can be reduced to less than a third of that (1.77 GB) without losing any functionality.

Optimizations

1. Baseline

$ docker images
REPOSITORY                                         TAG                  IMAGE ID       CREATED         SIZE
docker.pkg.github.com/yogeshojha/rengine/rengine   latest               41abe57a3889   9 minutes ago   5.91GB

2. Combine install layers (-290 MB)

-RUN wget https://golang.org/dl/go1.21.4.linux-amd64.tar.gz
-RUN tar -xvf go1.21.4.linux-amd64.tar.gz
-RUN rm go1.21.4.linux-amd64.tar.gz
-RUN mv go /usr/local
+RUN wget https://golang.org/dl/go1.21.4.linux-amd64.tar.gz && \
+    tar -xvf go1.21.4.linux-amd64.tar.gz go/bin/go --strip-components=2 && \
+    rm go1.21.4.linux-amd64.tar.gz && \
+    mv go /usr/local/bin/
 
 # Download geckodriver
-RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.32.0/geckodriver-v0.32.0-linux64.tar.gz
-RUN tar -xvf geckodriver-v0.32.0-linux64.tar.gz
-RUN rm geckodriver-v0.32.0-linux64.tar.gz
-RUN mv geckodriver /usr/bin
+RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.32.0/geckodriver-v0.32.0-linux64.tar.gz && \
+    tar -xvf geckodriver-v0.32.0-linux64.tar.gz -C /usr/bin/ && \
+    rm geckodriver-v0.32.0-linux64.tar.gz

Each RUN instruction in a Dockerfile creates a new image layer, so if you add files in one RUN instruction and delete them in a later one, the deletion frees no space: the files are still stored in the earlier layer. Instead, add and delete the files in the same RUN instruction. More info about this is in the Docker docs here. The commands are split across multiple lines with trailing backslashes for readability.
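A toy model of why this works: image layers are stored as tar archives of each step's filesystem changes, and a file deleted in a later step is recorded in the new layer as an empty "whiteout" marker (a .wh.-prefixed file, per the OCI image spec) rather than being erased from the earlier layer. The sketch below builds those archives by hand. It is a simplification, not what Docker actually runs, and the 5 MiB file is a placeholder for the Go tarball.

```shell
#!/bin/sh
# Simplified simulation: each "layer" is a tarball holding only the files
# that changed in that step. Not Docker's real build code.
set -eu

work=$(mktemp -d)
mkdir "$work/root"

# Separate RUN steps: step 1 downloads a 5 MiB placeholder archive...
head -c 5242880 /dev/zero > "$work/root/go.tar.gz"
tar -cf "$work/step1.tar" -C "$work/root" go.tar.gz
# ...step 2 deletes it, which the new layer records as an empty whiteout file.
rm "$work/root/go.tar.gz"
: > "$work/root/.wh.go.tar.gz"
tar -cf "$work/step2.tar" -C "$work/root" .wh.go.tar.gz
rm "$work/root/.wh.go.tar.gz"
separate=$(( $(wc -c < "$work/step1.tar") + $(wc -c < "$work/step2.tar") ))

# Combined RUN step: download and delete before the layer is captured,
# so the single layer contains no file data at all.
head -c 5242880 /dev/zero > "$work/root/go.tar.gz"
rm "$work/root/go.tar.gz"
tar -cf "$work/combined.tar" -C "$work/root" .
combined=$(( $(wc -c < "$work/combined.tar") ))

printf 'separate RUN steps: %s bytes\n' "$separate"
printf 'combined RUN step:  %s bytes\n' "$combined"
rm -rf "$work"
```

Running it shows the separate-steps total still carrying the full 5 MiB, while the combined step yields a near-empty archive.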

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
rengine      latest   fca7bccdbf57   26 seconds ago   5.62GB
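The new tar calls also change what gets unpacked. For Go, naming the member go/bin/go and adding --strip-components=2 extracts only that one file with its go/bin/ prefix removed, so it lands as ./go, ready for the mv; for geckodriver, -C /usr/bin/ unpacks straight into the destination. A sketch with a tiny stand-in archive (placeholder contents, uncompressed for simplicity; behavior as in GNU tar):

```shell
#!/bin/sh
# Build a tiny archive shaped like the Go release tarball (go/bin/go plus
# go/src/...) and extract only go/bin/go the way the diff does.
# The file contents are placeholders, not the real release.
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/go/bin" "$work/src/go/src/fmt"
echo 'placeholder go binary' > "$work/src/go/bin/go"
echo 'package fmt' > "$work/src/go/src/fmt/print.go"
tar -cf "$work/go.tar" -C "$work/src" go

mkdir "$work/out"
# The member name go/bin/go is matched first; then its first two path
# components (go/ and bin/) are stripped from the extracted name.
tar -xf "$work/go.tar" -C "$work/out" --strip-components=2 go/bin/go

extracted=$(cd "$work/out" && find . -type f)
printf '%s\n' "$extracted"   # only ./go is extracted
rm -rf "$work"
```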

3. Don't cache pip packages (-80 MB)

-    pip3 install -r /tmp/requirements.txt
+    pip3 install -r /tmp/requirements.txt --no-cache-dir

When pip installs Python packages, it caches downloaded packages and locally built wheels to speed up future pip install calls. That cache serves no purpose inside the image, so --no-cache-dir tells pip not to keep it at all.

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
rengine      latest   82bbc887dabe   About a minute ago   5.54GB

4. Delete go module cache (-1.9 GB)

When go install runs, Go downloads every module dependency into the module cache, which in this image is /go/pkg/mod. The cache isn't needed once the tools are built, so clearing it saves significant space:

root@9c870efaeab3:/usr/src/app# du -sh /go/pkg/*
2.0G    /go/pkg/mod
20K     /go/pkg/sumdb
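Finding targets like this comes down to ranking directories by size. A small sketch of that check, using a hypothetical helper named largest_dirs and a throwaway demo tree rather than a real container filesystem:

```shell
#!/bin/sh
# largest_dirs is a hypothetical helper, not part of the PR.
# Usage: largest_dirs DIR [COUNT]
# Lists DIR's immediate children by disk usage, largest first.
largest_dirs() {
    # du -sk prints "<size in KiB><TAB><path>"; sort numerically, biggest first.
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Demo tree standing in for /go/pkg (random bytes so compression can't hide them).
demo=$(mktemp -d)
mkdir -p "$demo/mod" "$demo/sumdb"
head -c 3145728 /dev/urandom > "$demo/mod/cache.bin"   # 3 MiB
head -c 4096 /dev/urandom > "$demo/sumdb/lookup"       # 4 KiB

top=$(largest_dirs "$demo" 1)
printf '%s\n' "$top"
rm -rf "$demo"
```

Inside the container shell shown above, largest_dirs /go/pkg would give a KiB version of that listing, and largest_dirs /root/.cache is where the Go build cache from section 5 would show up.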

As with the optimization in section 2, the modules must be installed and the cache cleared in the same RUN instruction. To do this, printf emits one module path per line and pipes it into xargs -L1, which calls go install once per module.
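The printf | xargs pattern in the diff can be tried in isolation. In this sketch echo stands in for go install, so it needs neither Go nor network access, and only three of the module paths are used:

```shell
#!/bin/sh
set -u

# Same quoting as the Dockerfile: backslash-newline inside the double quotes
# joins the lines, and printf turns each \n into a real newline.
out=$(printf "\
    github.com/jaeles-project/gospider@latest\n\
    github.com/tomnomnom/gf@latest\n\
    github.com/sa7mon/s3scanner@latest\n" | \
    xargs -L1 echo go install -v)
# One command per input line; xargs drops the leading spaces.
printf '%s\n' "$out"

# If any invocation fails, xargs still runs the rest but exits non-zero
# (123 with GNU xargs), so a following "&& rm -rf ..." would be skipped.
status=0
printf 'ok\nfail\nok\n' | xargs -L1 sh -c 'test "$0" = ok' || status=$?
printf 'xargs exit status: %s\n' "$status"
```

So a failed go install fails the whole RUN step rather than being silently ignored.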

-RUN go install -v github.com/jaeles-project/gospider@latest
-RUN go install -v github.com/tomnomnom/gf@latest
...
-RUN go install -v github.com/dwisiswant0/crlfuzz/cmd/crlfuzz@latest
-RUN go install -v github.com/sa7mon/s3scanner@latest
+RUN printf "\
+    github.com/jaeles-project/gospider@latest\n\
+    github.com/tomnomnom/gf@latest\n\
...
+    github.com/dwisiswant0/crlfuzz/cmd/crlfuzz@latest\n\
+    github.com/sa7mon/s3scanner@latest\n" | \
+    xargs -L1 go install -v && \
+    rm -rf /go/pkg/*

$ docker images
REPOSITORY   TAG             IMAGE ID       CREATED          SIZE
rengine      latest          4d9dffd6e952   16 minutes ago   3.68GB

5. Omit go debug symbols and remove build cache (-1.9 GB)

-    xargs -L1 go install -v && \
-    rm -rf /go/pkg/*
+    xargs -L1 go install -ldflags="-s -w" -v && \
+    rm -rf /go/pkg/* && rm -rf /root/.cache/go-build

go install runs a go build on the source code it pulls down, so it accepts the usual build flags. Passing -ldflags="-s -w" tells the linker to omit the symbol table and debug information (-s) and the DWARF symbol table (-w), which squeezes out a bit of extra space. See the Go docs here.
Additionally, Go caches build output in ~/.cache/go-build (here /root/.cache/go-build), and in this case that cache is significant: almost 2 GB.

$ docker images
REPOSITORY   TAG             IMAGE ID       CREATED          SIZE
rengine      latest          1c239a350799   30 seconds ago   1.77GB

Other Changes

  • Removed an extra apt update
  • Removed the ProjectDiscovery -up (self-update) calls. The tools are installed 3 lines earlier in the Dockerfile, so they are already up to date.
  • Removed the httpx alias. httpx works just fine without it.

@AnonymousWP (Collaborator)

Nice work! Has everything been tested?

@sa7mon (Author) commented Mar 16, 2024

@AnonymousWP I have verified that pip install and go install, as well as all the installed tools, work just fine.

@AnonymousWP AnonymousWP requested a review from psyray March 17, 2024 19:41
@yogeshojha (Owner)

Excellent Changes! 🚀

Reviewing this @sa7mon
