
Dockerfile Deep Dive

This chapter explains Dockerfile syntax, instructions, and best practices so that you can write efficient, secure Dockerfiles for building custom images.

Dockerfile Basics

What is a Dockerfile?

A Dockerfile is a text file containing a series of instructions that automate the building of Docker images. Instructions that change the filesystem (RUN, COPY, ADD) each create a new image layer; the others, such as ENV, LABEL, and CMD, only record metadata.

Basic Structure of a Dockerfile

dockerfile
# Comment
FROM base_image:tag
LABEL maintainer="your-email@example.com"
RUN command
COPY source destination
WORKDIR /app
EXPOSE 8080
CMD ["executable", "param1", "param2"]

Build Context

The build context is the set of files and directories that the docker build command sends to the Docker daemon:

bash
# Current directory as build context
docker build -t myapp:v1.0 .

# Specify build context
docker build -t myapp:v1.0 /path/to/context

# Build from Git repository
docker build -t myapp:v1.0 https://github.com/user/repo.git

Dockerfile Instructions in Detail

FROM - Base Image

dockerfile
# Basic usage
FROM ubuntu:20.04

# Use multi-stage builds
FROM node:16 AS builder
FROM nginx:alpine AS runtime

# Use ARG variable
ARG BASE_IMAGE=node:16
FROM ${BASE_IMAGE}

# Specify platform
FROM --platform=linux/amd64 node:16

Best Practices:

  • Use specific tags instead of latest
  • Prefer official images
  • Use lightweight base images (like Alpine)

RUN - Execute Commands

dockerfile
# Shell form (recommended for complex commands)
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/*

# Exec form (JSON array; runs the command directly, without shell processing)
RUN ["apt-get", "update"]

# Multi-line commands
RUN apt-get update \
    && apt-get install -y curl \
    && curl -sL https://deb.nodesource.com/setup_16.x | bash - \
    && apt-get install -y nodejs \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Use heredoc (requires BuildKit and Dockerfile syntax 1.4 or later)
RUN <<EOF
set -e
apt-get update
apt-get install -y curl
apt-get clean
rm -rf /var/lib/apt/lists/*
EOF

Best Practices:

  • Combine related commands into a single RUN instruction to reduce the number of layers
  • Clean up caches and temporary files in the same layer that created them
  • Chain commands with && so the build stops at the first failure
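The cleanup advice above matters because a file deleted in a later RUN instruction still occupies space in the earlier layer that created it. A minimal sketch of the difference, assuming a Debian-based image and using curl purely as an illustrative package:

```dockerfile
FROM ubuntu:20.04

# Less effective: the package index is stored in the first layer, so
# removing it in a later layer does not shrink the image.
# RUN apt-get update
# RUN apt-get install -y curl
# RUN rm -rf /var/lib/apt/lists/*

# Better: install and clean up in one layer; && stops at the first failure.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

The commented-out variant produces three layers, and the final rm only hides the index files rather than reclaiming their space.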

COPY and ADD - Copy Files

dockerfile
# COPY basic usage
COPY app.js /usr/src/app/
COPY . /usr/src/app/

# COPY with ownership
COPY --chown=node:node . /usr/src/app/

# ADD (auto-extracts local tar archives; remote URLs are downloaded but not extracted)
ADD app.tar.gz /usr/src/app/
ADD https://example.com/file.tar.gz /tmp/

Key Differences:

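  • COPY copies files from the build context, or from another stage or image with --from
  • ADD also extracts local tar archives automatically and can download from URLs (downloaded files are not extracted)
  • Prefer COPY unless you need ADD's special features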
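The differences can be seen side by side in a small sketch. It assumes that a file named app.tar.gz exists in the build context; the destination directory names are illustrative only:

```dockerfile
FROM alpine:3.14
WORKDIR /usr/src/app

# COPY transfers the archive unchanged: /usr/src/app/archive/app.tar.gz
COPY app.tar.gz ./archive/

# ADD unpacks a local tar archive into the destination directory
ADD app.tar.gz ./unpacked/

# COPY --from reads from another image (or an earlier build stage)
# instead of the build context
COPY --from=nginx:alpine /etc/nginx/nginx.conf /tmp/nginx.conf
```

Because ADD changes behavior depending on what the source is, COPY is usually the more predictable choice when no extraction is wanted.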

WORKDIR - Working Directory

dockerfile
# Set working directory
WORKDIR /usr/src/app

# Use with environment variables
ENV APP_HOME=/usr/src/app
WORKDIR $APP_HOME

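# Can be used multiple times; a relative path resolves against the previous WORKDIR
# (the commands below run in /usr/src/app/test)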
WORKDIR /usr/src/app
WORKDIR test
RUN npm test

ENV - Environment Variables

dockerfile
# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
ENV APP_VERSION=1.0.0

# Set multiple variables
ENV NODE_ENV=production \
    PORT=3000 \
    HOST=0.0.0.0

# Use variables in other instructions
RUN echo "Building for $NODE_ENV"

EXPOSE - Expose Ports

dockerfile
# Document a listening port (EXPOSE does not publish it; use docker run -p for that)
EXPOSE 8080

# Expose multiple ports
EXPOSE 80 443

# Specify protocol
EXPOSE 80/tcp
EXPOSE 53/udp

CMD and ENTRYPOINT - Container Startup Commands

dockerfile
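# CMD (default command; overridden by arguments to docker run)
# Only the last CMD in a Dockerfile takes effect; these are alternatives
CMD ["nginx", "-g", "daemon off;"]
CMD ["node", "app.js"]
CMD echo "Hello World"

# ENTRYPOINT (replaced only with docker run --entrypoint)
# As with CMD, only the last ENTRYPOINT takes effect
ENTRYPOINT ["docker-entrypoint.sh"]
ENTRYPOINT ["node", "app.js"]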

# Combining ENTRYPOINT and CMD
ENTRYPOINT ["node"]
CMD ["app.js"]

Key Differences:

  • CMD provides a default command that arguments to docker run replace
  • ENTRYPOINT sets the main executable; it can be replaced only with docker run --entrypoint
  • When both exist (in exec form), CMD supplies the default arguments to ENTRYPOINT
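How the two instructions interact is easiest to see with a tiny image. The following is a sketch only; the image tag greeter used in the comments is a hypothetical name chosen for illustration:

```dockerfile
FROM alpine:3.14

# ENTRYPOINT fixes the executable; CMD supplies its default arguments
ENTRYPOINT ["echo", "Hello"]
CMD ["World"]

# Assuming the image was built with: docker build -t greeter .
#   docker run greeter                    prints "Hello World"
#   docker run greeter Docker             prints "Hello Docker" (CMD replaced)
#   docker run --entrypoint date greeter  runs date instead of echo
```

This pattern suits images that wrap a single tool: users pass only arguments, while the executable itself stays fixed unless they override it explicitly.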

USER - Set User

dockerfile
# Set user by name
USER node

# Set user by ID
USER 1000:1000

# Create user first
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

VOLUME - Data Volumes

dockerfile
# Create data volume
VOLUME ["/data"]

# Create multiple volumes
VOLUME ["/data", "/logs"]

# Plain-string form (equivalent to the JSON array form)
VOLUME /var/lib/postgresql/data

ARG - Build Arguments

dockerfile
# Define build argument
ARG VERSION=latest
ARG BUILD_NUMBER

# Use in FROM instruction
ARG BASE_IMAGE=ubuntu:20.04
FROM ${BASE_IMAGE}

# Use in other instructions
ARG VERSION
RUN echo "Building version ${VERSION}"

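# Predefined platform arguments such as TARGETPLATFORM are set automatically by BuildKit;
# declare them without a default to use them inside a stage
ARG TARGETPLATFORM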

LABEL - Metadata

dockerfile
# Single label
LABEL maintainer="developer@example.com"

# Multiple labels
LABEL version="1.0" \
      description="My application" \
      vendor="My Company"

HEALTHCHECK - Health Monitoring

dockerfile
# Basic health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost/ || exit 1

# Using built-in command
HEALTHCHECK --interval=1m --timeout=3s \
  CMD pg_isready -U postgres || exit 1

# Disable health check
HEALTHCHECK NONE

Multi-stage Builds

Basic Multi-stage Build

dockerfile
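# Build stage
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; build tools usually live in devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only production modules are copied later
RUN npm run build && npm prune --omit=dev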

# Production stage
FROM node:16-alpine AS production
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./package.json
EXPOSE 3000
CMD ["node", "dist/app.js"]

Advanced Multi-stage Build

dockerfile
# Dependencies stage
FROM node:16-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

# Build stage (installs devDependencies too, which build tools usually need)
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:16-alpine AS runtime
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=deps --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./package.json
USER appuser
EXPOSE 3000
CMD ["node", "dist/app.js"]

Build Optimization

Layer Caching

dockerfile
# Good: Copy only package files first
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Bad: copying everything first means any source change invalidates the cached npm ci layer
COPY . .
RUN npm ci
RUN npm run build

Reduce Image Size

dockerfile
# Use Alpine base image
FROM node:16-alpine

# Install packages without keeping the package index (Alpine uses apk, not apt-get)
RUN apk add --no-cache build-base

# Use .dockerignore to exclude files

.dockerignore File

dockerignore
# Exclude node_modules
node_modules

# Exclude git files
.git
.gitignore

# Exclude logs
*.log
logs/

# Exclude temp files
tmp/
*.tmp

# Exclude development files
.env.local
.env.development

Security Best Practices

Non-root User

dockerfile
FROM node:16-alpine

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Set ownership
WORKDIR /app
COPY --chown=appuser:appgroup . .

# Switch to non-root user
USER appuser

Minimal Permissions

dockerfile
# Use minimal base image
FROM alpine:3.14

# Install only necessary packages
RUN apk add --no-cache \
    ca-certificates \
    && update-ca-certificates

# Remove package manager
RUN apk del --purge apk-tools

Secrets Management

dockerfile
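# Use build secrets (requires BuildKit, available since Docker 18.09)
# docker build --secret id=mysecret,src=/local/secret .
FROM alpine
# The secret is mounted only for this RUN step and is not stored in an image layer
RUN --mount=type=secret,id=mysecret \
    cat /run/secrets/mysecret

# Use build-time arguments for non-sensitive data only (ARG values remain visible in docker history)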
ARG APP_VERSION
ENV APP_VERSION=${APP_VERSION}
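The example above reads the secret with cat only to show where it is mounted; in a real build the secret is normally consumed by a command rather than printed to the build log. A minimal sketch, assuming BuildKit is enabled and using a hypothetical secret id npmrc that holds an .npmrc file with a private registry token:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:16-alpine
WORKDIR /app
COPY package*.json ./

# The secret file appears at /root/.npmrc only while this RUN step executes;
# it is never written to an image layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci

# Build with (the source path is a placeholder):
#   docker build --secret id=npmrc,src=$HOME/.npmrc .
```

Passing the same token through ARG or ENV would leave it recoverable from the image metadata, which is why a secret mount is the safer choice here.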

Build Optimization Commands

bash
# Build with no cache
docker build --no-cache -t myapp .

# Build with specific platform
docker build --platform linux/amd64 -t myapp .

# Build with build arguments
docker build --build-arg VERSION=1.0 -t myapp .

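# Use BuildKit (available since Docker 18.09; the default builder since Docker Engine 23.0)
DOCKER_BUILDKIT=1 docker build -t myapp .

# Parallel builds: docker build has no --parallel flag;
# BuildKit builds independent stages in parallel automatically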

Common Patterns

Node.js Application

dockerfile
FROM node:16-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

# Builder installs devDependencies too, which build tools usually need
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:16-alpine AS runtime
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=deps --chown=appuser:appgroup /app/node_modules ./node_modules
USER appuser
EXPOSE 3000
CMD ["node", "dist/app.js"]

Python Application

dockerfile
FROM python:3.9-slim AS base
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

FROM base AS deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM base AS runtime
COPY --from=deps /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY --from=deps /usr/local/bin /usr/local/bin
COPY . .
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

Chapter Summary

This chapter covered Dockerfiles in depth:

Key Points:

  • Basic Instructions: FROM, RUN, COPY, WORKDIR, etc.
  • Build Optimization: Layer caching, multi-stage builds
  • Security Best Practices: Non-root users, minimal base images
  • Advanced Features: Build arguments, health checks
  • Common Patterns: Application-specific Dockerfiles

Best Practices:

  • Use specific image tags
  • Implement multi-stage builds
  • Optimize layer caching
  • Use .dockerignore
  • Run as non-root user
  • Keep images small and secure

In the next chapter, we will learn about image management best practices and optimization techniques.
