Best practices when building docker images

Carlos Morales • 2021-02-03

Dockerfiles are very popular: they describe how to assemble Docker images, and they come with a great caching mechanism. After some years of working with them, this is the list of best practices I use.

I grouped them in the following categories:

Performance
Reproducibility
Security

1. Performance

1.1 Enable BuildKit to build

Docker images created with BuildKit are compatible with the legacy build process and get automatic performance improvements. It is quite easy to enable, and the only drawback is that only Linux containers are supported (as of today).
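As a rough sketch of what enabling it looks like: the build command in the comments sets the DOCKER_BUILDKIT environment variable, and the optional first line selects the Dockerfile frontend. The image name my-app, the base image, and index.js are placeholders chosen only for illustration.

```dockerfile
# syntax=docker/dockerfile:1
# Build with BuildKit enabled (my-app is a placeholder image name):
#   DOCKER_BUILDKIT=1 docker build -t my-app .
FROM node:14.16-slim
WORKDIR /app
COPY . .
# index.js is a hypothetical entry point
CMD ["node", "index.js"]
```

Note that the syntax line only takes effect when it is the very first line of the file, before any other comment.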

1.2 Incremental build time

The order of the instructions in the Dockerfile is very important. Each instruction is an incremental step that is cached, and once one step changes, every step after it is rebuilt. Move the most stable steps to the very top and the ones that change most often to the bottom. E.g. the step that copies the local project files should be near the bottom.
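A minimal sketch of this ordering for a Node.js project follows. The file names (package.json, package-lock.json, src/index.js) are assumptions about the project layout, not something this article prescribes.

```dockerfile
FROM node:14.16-slim
WORKDIR /app

# Stable: the dependency manifests change rarely, so this layer
# and the install step below are usually served from the cache
COPY package.json package-lock.json ./
RUN npm ci --production

# Volatile: the source code changes on almost every build,
# so it is copied last and only invalidates the steps below it
COPY src ./src
CMD ["node", "src/index.js"]
```

With this layout, editing a source file reruns only the last COPY, while the dependency install stays cached.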

1.3 Prevent unneeded changes

Be as specific as possible. If you copy a whole directory, any file change in it will bust the cache. If you copy a single file instead, the cache is much more likely to be reused for that step.

BAD

COPY target/ /app

BETTER

COPY target/app.jar /app/

If you still need to copy the whole directory, use a .dockerignore file to list all the files and folders that should be excluded.

1.4 Identify cacheable units

If you use a package manager, run the whole process (update the index, install all dependencies, and clean up) in a single RUN step, so it is cached as one unit and temporary files never end up in a layer.

# compile the NGINX binary in one step
RUN CONFIG="\
  --prefix=/etc/nginx \
  --sbin-path=/usr/sbin/nginx \
  " \
  && microdnf install gzip gcc make curl \
  && cd /usr/src/nginx-$NGINX_VERSION \
  && ./configure $CONFIG \
  && make install \
  && microdnf remove patch gcc make  \
  && microdnf clean all

1.5 Reduce extra weight

When installing dependencies, install only the required ones: skip debugging tools and packages that are merely recommended. E.g. in apt-get install use the --no-install-recommends flag, or in npm install use the --production flag to skip devDependencies.
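For a Debian-based image, this could look roughly like the sketch below. The base image and the two packages are arbitrary examples; replace them with what your application actually needs.

```dockerfile
FROM debian:buster-slim

# Update, install only the required packages, and remove the
# package index in the same step so it never ends up in a layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
```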

Use common base images for all applications; this reduces the time spent on maintenance. On top of this, images built on the same base share layers when running in a cluster.

1.7 Use the correct official images

When depending on publicly available base images, prefer the minimal flavors over the bigger ones. Most of the time you do not need the whole set of tools included in the bigger ones. This not only saves disk space and data transfer, it also reduces the attack surface by reducing the number of binaries installed.

BAD

FROM node:14.16

GOOD

FROM node:14.16-slim

With this change alone, the image goes down from around 400MB to 70MB.

This will translate to faster deployments (less transfer), and lower disk space requirements.

1.8 Use cache folders

Most package managers use a local folder to store all downloaded artifacts. You could mount a cache folder so that all builds reuse it. This is done by adding the --mount=type=cache flag between RUN and the commands to run (it requires BuildKit).

Example of a Debian distribution package manager

RUN --mount=type=cache,target=/var/cache/apt \
   apt-get update && \
   apt-get install -yqq --no-install-recommends \
   # ... list of libraries to install

Some of the common folders used by different package managers for caching:

apt: /var/cache/apt/
npm: ~/.npm
maven: ~/.m2
pip: ~/.cache/pip
go: ~/.cache/go-build
go-modules: $GOPATH/pkg/mod
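The same idea applied to pip, as a sketch: it assumes the build runs as root (so the pip cache lives in /root/.cache/pip) and that the project has a requirements.txt file, which is a hypothetical name here.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .

# The cache mount persists between builds but is not part of the image,
# so packages that were already downloaded are not fetched again
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```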

2. Reproducibility

2.1 Specify dependencies

We want to have a reproducible build process. When defining dependencies, never omit the specific version or use the latest tag; this especially applies to the FROM base image. If you are not specific about the version, newer builds will pull newer versions, because latest changes over time. In other words, your build will differ from past builds, and these new dependencies may break your build or (worse) the application at runtime.
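A small sketch of pinned versions follows. The specific image tag and library version are only examples, and the digest in the comment is a placeholder to be filled with the real value of the image you use.

```dockerfile
# Avoid: "FROM python" or "FROM python:latest", which change over time
FROM python:3.9.1-slim

# An even stricter option is to pin the image digest (placeholder below):
#   FROM python@sha256:<digest>

# Pin library versions too, instead of installing whatever is newest
RUN pip install --no-cache-dir requests==2.25.1
```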

2.2 Create the build environment with a multi-stage build

Dockerfiles describe the build process, so you could use Docker to create the build environment too. Because we do not want to ship all the development tools (required to create the build environment) to production, we could use multi-stage builds.

Example of a multi-stage build for a Java based application:

FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package

FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/app.jar /app/
CMD ["java", "-jar", "/app/app.jar"]

In the previous example:

The first stage creates a Maven environment and builds a Java jar file.
The second stage uses the file created in the previous stage.
The resulting docker image built in the second stage is the one that goes into production.

You could follow the same pattern to create different build steps: linting, running unit tests, integration tests, release artifacts, etc. If any of these stages fails, the whole build process fails. BuildKit can run independent stages in parallel.
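As an illustration of an extra step, the sketch below adds a separate stage that runs the unit tests. The stage name test and the decision to skip tests in the packaging step are assumptions made for this example, not part of the original build.

```dockerfile
FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
# Tests are skipped here because they run in their own stage below
RUN mvn -e -B package -DskipTests

# Hypothetical extra stage that only runs the unit tests
FROM builder AS test
RUN mvn -e -B test

FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/app.jar /app/
CMD ["java", "-jar", "/app/app.jar"]
```

Keep in mind that BuildKit only builds the stages the requested target depends on, so the test stage has to be requested explicitly, for example with docker build --target test.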

More info on Using multi-stage builds and Advanced Dockerfiles: Faster Builds and Smaller Images Using BuildKit and Multistage Builds.

3. Security

3.1 Secrets

Our application may use some values that must be kept secret: private keys, database passwords, etc. Never add those secrets to the Dockerfile; you do not want anybody to read them in the repository. It is not OK either to pass them as arguments in the Docker build process, as anyone can easily find these values with docker history or in the bash history.

The solution for adding those secrets heavily depends on the infrastructure. All cloud providers offer a way to manage secrets: Kubernetes secrets, Secret Manager in Google Cloud, AWS Secrets Manager, etc.
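For secrets that are needed only during the build (for example, credentials for a private package registry), BuildKit offers secret mounts. The sketch below assumes a private npm registry whose credentials live in a local .npmrc file; the secret id npmrc is a name made up for this example.

```dockerfile
# syntax=docker/dockerfile:1
# Build with (the id must match the one used in the RUN step):
#   DOCKER_BUILDKIT=1 docker build --secret id=npmrc,src=$HOME/.npmrc .
FROM node:14.16-slim
WORKDIR /app
COPY package.json package-lock.json ./

# The file is available only while this step runs;
# it is not stored in any image layer or in the build history
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --production
```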

3.2 Sign images

You may use a Docker image in production that an attacker tampered with, adding some malicious binaries. To prevent this, you must sign all images you create and verify their signatures at runtime.

More info at Signing Images with Docker Content Trust

3.3 Use trusted base images

Always use trusted official base images. And as previously stated, if you do not require any specific utilities, choose a smaller image, as this reduces the attack surface. A popular choice is to use leaner and smaller OS distributions like Alpine (a security-oriented distro) or slim variants.

If security is a serious concern, you may need to build from scratch those base images, compiling the dependencies from the source code. This is the practice we use at my company.

3.4 Specify the user

By default, the user running the processes in a Docker container is root, so they run with root privileges. This is especially dangerous, as a compromised process could attack the host, not only the container itself. Always specify a dedicated user and group in the Dockerfile.

This risk is mitigated when running in Kubernetes clusters, as you could specify at the cluster level which user runs the container, independently of what was specified in the Dockerfile.
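A minimal sketch of a dedicated user and group in the Dockerfile, for a Debian-based image. The user and group name app and the entry point index.js are placeholders for this example.

```dockerfile
FROM node:14.16-slim

# Create an unprivileged system group and user (the name "app" is arbitrary)
RUN groupadd --system app \
    && useradd --system --gid app --no-create-home app

WORKDIR /app
# Give the application files to the unprivileged user
COPY --chown=app:app . .

# Everything from here on, including the running container, uses this user
USER app
CMD ["node", "index.js"]
```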

3.5 Scan for vulnerabilities

Multiple services scan images for vulnerabilities. Your Docker image may include vulnerable dependencies that you should not ship. As modern distributed applications include many dependencies, it is impossible to manage them manually. Automate the process with one of those services.

More info at Vulnerability scanning for Docker local images or the Snyk guide on Docker Security Scanning.

Conclusions

The most important takeaways:

Docker images are built on top of a layer mechanism. Leverage it: narrow the layers down, cache them, and reuse them.
Dockerfiles are very descriptive; focus on reproducible processes.
Security is built into the system, not as something external. Consider security topics when building Docker images.

For more updated information on best practices, follow this link to official Docker documentation on best practices.