Skip to content

Latest commit

 

History

History
524 lines (390 loc) · 13 KB

File metadata and controls

524 lines (390 loc) · 13 KB

Containers Design Guide

This guide explains best practices for designing containers.

Table of Contents


Understanding Layers

What are Layers?

Each instruction that modifies the file system in a Containerfile creates a new layer in the container image:

FROM golang:1.24.5        # Layer 1: Base image
WORKDIR /build            # Layer 2: Set working directory
COPY go.mod .             # Layer 3: Copy dependency file
RUN go mod download       # Layer 4: Download dependencies
COPY . .                  # Layer 5: Copy source code
RUN go build              # Layer 6: Build binary

Why Layers Matter

  1. Build Performance: Cached layers speed up rebuilds
  2. Image Size: Fewer layers = smaller images (in some cases)
  3. Development Efficiency: Good layering reduces iteration time
  4. Storage Efficiency: Shared layers are stored once

Layer Caching Principles

Docker/Podman uses layer caching to speed up builds:

Cache Rules

  1. If a layer hasn't changed, it's reused from cache
  2. If a layer changes, all subsequent layers are rebuilt
  3. Layers are invalidated based on:
    • Instruction changes (the command itself)
    • File content changes (for COPY/ADD)
    • Parent layer changes

Example: Cache Invalidation

# ❌ BAD: Any source code change invalidates dependency download
COPY . .
RUN go mod download
RUN go build

# ✅ GOOD: Source changes don't invalidate dependency cache
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build

Best Practices

1. Order Instructions by Change Frequency

Principle: Place instructions that change rarely at the top, frequently-changing ones at the bottom.

# Least frequently changed
FROM alpine:3.22
RUN apk add --no-cache ca-certificates

# Occasionally changed
COPY go.mod go.sum ./
RUN go mod download

# Most frequently changed
COPY . .
RUN go build

# Configuration (might change often in development)
COPY config.yaml /etc/config.yaml

Why: Maximizes cache hits during development.

2. Separate Dependency Installation

For Go projects:

# ✅ GOOD: Dependencies cached separately
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build

3. Consolidate Similar Instructions

Minimize metadata layers:

# ❌ BAD: Creates 5 layers
LABEL version="1.0"
LABEL maintainer="team@example.com"
LABEL description="My app"
LABEL org="MyOrg"
LABEL license="Apache-2.0"

# ✅ GOOD: Creates 1 layer
LABEL version="1.0" \
      maintainer="team@example.com" \
      description="My app" \
      org="MyOrg" \
      license="Apache-2.0"

Minimize port exposures:

# ❌ BAD: Creates 3 layers
EXPOSE 8080
EXPOSE 8443
EXPOSE 9090

# ✅ GOOD: Creates 1 layer
EXPOSE 8080 8443 9090

4. Use Multi-Stage Builds

Separate build and runtime dependencies:

# Build stage: Large image with build tools
FROM golang:1.24.5 AS builder
WORKDIR /build
COPY . .
RUN go build -o app

# Runtime stage: Minimal image
FROM gcr.io/distroless/base:latest
COPY --from=builder /build/app /app
ENTRYPOINT ["/app"]

Benefits:

  • Final image doesn't include build tools
  • Smaller attack surface
  • Reduced image size

5. Leverage Build Cache Mounts

Use cache mounts for build artifacts:

# ✅ GOOD: Cache Go build artifacts between builds
RUN --mount=type=cache,target=/root/.cache/go-build \
    go build -o app

# ✅ GOOD: Cache downloaded Go modules
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download

Benefits:

  • Faster builds
  • Reduced network usage
  • Build cache persists across builds

6. Split Complex RUN Commands Strategically

When to split:

# ✅ GOOD: Split for better layer caching
RUN go install go.opentelemetry.io/collector/cmd/builder@v0.134.0
RUN builder --config manifest.yaml
# If manifest.yaml changes, builder is already installed and cached

When to combine:

# ✅ GOOD: Combine cleanup operations
RUN apt-get update && \
    apt-get install -y wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Cleanup in same layer prevents bloat

Layer Optimization Techniques

Technique 1: Dependency Layer Separation

Before:

COPY . .
RUN go mod download && go build
  • Problem: Every code change requires re-downloading dependencies

After:

COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build
  • Benefit: Dependencies only download when go.mod/go.sum changes

Technique 2: Label Consolidation

Before:

LABEL org.opencontainers.image.title="App"
LABEL org.opencontainers.image.version="1.0"
LABEL org.opencontainers.image.vendor="Company"
  • Layers: 3
  • Problem: Unnecessary layer multiplication

After:

LABEL org.opencontainers.image.title="App" \
      org.opencontainers.image.version="1.0" \
      org.opencontainers.image.vendor="Company"
  • Layers: 1
  • Benefit: 2 fewer layers

Technique 3: Copy Order Optimization

Before:

COPY config.yaml /etc/config.yaml
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app /app
  • Problem: Config changes invalidate all subsequent layers

After:

# Rarely changes
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Changes with code
COPY --from=builder /app /app
# Changes frequently in development
COPY config.yaml /etc/config.yaml
  • Benefit: Maximized cache utilization

Technique 4: Strategic RUN Splitting

Example 1 - Go Collector Build:

# Split installation and build for better caching
RUN --mount=type=cache,target=/root/.cache/go-build \
    go install go.opentelemetry.io/collector/cmd/builder@v0.134.0

RUN --mount=type=cache,target=/root/.cache/go-build \
    builder --config manifest.yaml
  • Benefit: If manifest changes, builder tool is already cached

Example 2 - Package Installation:

# Combine related operations in single layer to avoid bloat
RUN apt-get update && \
    apt-get install -y \
        curl \
        git \
        vim && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
  • Benefit: Package lists not saved in intermediate layers

Real Examples

Example 1: complytime-collector-components/beacon-distro/Containerfile.collector

Build Stage Optimization:

# Stage 2: Build the OpenTelemetry Collector
FROM golang:1.24.5 AS build-stage
WORKDIR /build

# Copy manifest (changes occasionally)
COPY ./manifest.yaml manifest.yaml

# Install builder (version rarely changes)
RUN --mount=type=cache,target=/root/.cache/go-build \
    GO111MODULE=on go install go.opentelemetry.io/collector/cmd/builder@v0.134.0

# Build collector (runs when manifest changes)
RUN --mount=type=cache,target=/root/.cache/go-build \
    builder --config manifest.yaml

Key Decisions:

  1. Separated builder installation from execution
  2. Why: Builder version is pinned; if manifest changes, builder is cached
  3. Trade-off: One extra layer, but better rebuild performance

Runtime Stage Optimization:

# Consolidated labels (1 layer instead of 5)
LABEL org.opencontainers.image.title="ComplyBeacon Collector" \
      org.opencontainers.image.description="OpenTelemetry collector distribution" \
      org.opencontainers.image.vendor="ComplyTime" \
      org.opencontainers.image.source="https://github.com/complytime/complybeacon" \
      org.opencontainers.image.licenses="Apache-2.0"

# Ordered by change frequency
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/  # Rarely changes
COPY --chmod=755 --from=build-stage /build/beacon/otelcol-beacon /    # Code changes
COPY config.yaml /etc/otelcol-beacon/config.yaml                       # Config changes

# Consolidated EXPOSE (1 layer instead of 3)
EXPOSE 4317 4318 12001

Layer Count:

  • Before optimization: ~15 layers
  • After optimization: ~11 layers
  • Cache efficiency: Significantly improved

Example 2: complytime-collector-components/compass/images/Containerfile.compass

Build Stage Optimization:

# Stage 2: Build the Compass service
FROM golang:1.24.5 AS build-stage
WORKDIR /build

# Step 1: Copy dependencies first (changes less frequently)
COPY compass/go.mod compass/go.sum ./
RUN --mount=type=cache,target=/root/.cache/go-build go mod download

# Step 2: Copy source code (changes frequently)
COPY compass/. .

# Step 3: Build binary
RUN --mount=type=cache,target=/root/.cache/go-build \
    GO111MODULE=on go build ./cmd/compass

Key Decisions:

  1. Separated dependency download from source copy
  2. Why: Source changes frequently, dependencies don't
  3. Impact: 90%+ of rebuilds reuse the dependency layer
  4. Trade-off: Slightly more complex Containerfile

Performance Impact:

Scenario Before After Improvement
Full rebuild 120s 120s 0% (same)
Code change only 120s 15s 87% faster
Config change only 120s 5s 96% faster

Anti-Patterns to Avoid

❌ Anti-Pattern 1: Copying Everything First

# BAD: Everything invalidates on any file change
COPY . .
RUN go mod download
RUN go build

Problem: Any file change (even README.md) invalidates all layers.

❌ Anti-Pattern 2: Multiple Individual LABELs

# BAD: Creates 5 unnecessary layers
LABEL name="app"
LABEL version="1.0"
LABEL author="team"
LABEL org="company"
LABEL license="MIT"

Problem: Each LABEL creates a layer (metadata overhead).

❌ Anti-Pattern 3: Not Using .containerignore

COPY . .

Problem: Without .containerignore, copies unnecessary files (tests, docs, .git) into the build context, slowing builds and invalidating cache.

❌ Anti-Pattern 4: Installing and Removing in Different Layers

# BAD: Downloaded packages remain in intermediate layer
RUN apt-get update && apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*

Problem: Image size includes the package lists even though they're deleted.

Fix:

# GOOD: Clean up in same layer
RUN apt-get update && \
    apt-get install -y wget && \
    rm -rf /var/lib/apt/lists/*

Quick Reference Checklist

When designing container layers, ask yourself:

  • Are dependencies installed in a separate layer from source code?
  • Are instructions ordered from least to most frequently changing?
  • Are multiple LABELs consolidated into one instruction?
  • Are multiple EXPOSEs consolidated into one instruction?
  • Is a .containerignore file present to exclude unnecessary files?
  • Are multi-stage builds used to minimize final image size?
  • Are cache mounts used for build artifacts?
  • Are cleanup operations in the same RUN command as installations?
  • Is the final image as small as possible (distroless/alpine)?

Tools and Commands

Analyze Layer Sizes

# Show layers and their sizes
podman history <image-name>

# Detailed image inspection
podman inspect <image-name>

# Analyze with dive (third-party tool)
dive <image-name>

Test Cache Efficiency

# First build (cold cache)
time podman build -t myapp:test .

# Touch a source file
touch src/main.go

# Second build (should be fast with good layering)
time podman build -t myapp:test .

# Touch a dependency file
touch go.mod

# Third build (should invalidate dependency layer)
time podman build -t myapp:test .

Validate Build Context

# See what's being sent to Podman daemon
podman build --no-cache --progress=plain .

# Check .containerignore effectiveness
podman build --progress=plain . 2>&1 | grep "transferring context"

Further Reading


Contributing

When modifying Containerfiles in this repository:

  1. Follow the layer design principles outlined in this guide
  2. Document significant changes in comments
  3. Test build performance before and after changes
  4. Update this guide if you discover new patterns or techniques

Summary

Good layer design is about balance:

  • Few enough layers to keep image size manageable
  • Strategic enough to maximize cache efficiency
  • Well-ordered to optimize for common development workflows
  • Documented so others understand the design decisions

The goal is fast iterative development without compromising production image quality.