Which package is bloating your Docker image?

Published: May 25, 2026 at 07:24 AM EDT
Source: Dev.to

layer‑blame: git blame for image layers

layer‑blame attributes every byte in a Docker image to the package that introduced it.

Example: Alpine

docker save alpine:3.20 -o alpine.tar
layer-blame alpine.tar
Image total: 8.4 MB across 1 layers  ·  package attribution: 100%

Layer 0  8.4 MB  ADD alpine-minirootfs-3.20.10-aarch64.tar.gz /
      4.8 MB  libcrypto3                      pkg   ← largest line highlighted
    911.7 KB  libssl3                         pkg
    906.0 KB  busybox                         pkg
    706.5 KB  musl                            pkg
    327.1 KB  apk-tools                       pkg

Every byte in the layer now has an owner; libcrypto3 accounts for 4.8 MB of the 8.4 MB.

Comparison with other tools

Tool            What it shows
docker history  Layer sizes, but not contents
dive            Interactive file‑by‑file view of a layer
layer‑blame     Package‑level attribution of every byte (non‑interactive report)

dive is great for browsing; layer‑blame is great for attribution. Use them together.

Example: python:3.12‑slim

docker save python:3.12-slim -o py.tar
layer-blame --top 5 py.tar
Image total: 137.7 MB across 4 layers  ·  package attribution: 69%

Layer 0  95.8 MB  # debian.sh --arch 'arm64' out/ 'trixie' ...
     22.5 MB  libc6                           pkg
      9.1 MB  coreutils                       pkg
      7.4 MB  perl-base                       pkg
      7.3 MB  libssl3t64                      pkg
      6.6 MB  util-linux                      pkg

Layer 2  38.2 MB  RUN /bin/sh -c set -eux; savedAptMark="$(apt-mark showmanual)"; …
      6.3 MB  /usr/local/lib/libpython3.12.so.1.0                          file
      1.8 MB  /usr/local/lib/python3.12/ensurepip/_bundled/pip-...whl       file
      1.1 MB  /usr/local/lib/python3.12/lib-dynload/unicodedata...so        file

In Layer 2 the biggest contributors are reported as files because the image builds Python from source, leaving those bytes unowned by any dpkg package. The “package attribution: 69%” header tells you that 69 % of the image maps cleanly to packages; the rest deserves a closer look.
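To make the header figure concrete, here is a minimal sketch of how such a percentage could be derived from the rows of a report. The row layout, the function name, and the byte counts are all hypothetical, chosen only to illustrate the arithmetic; the source does not say how layer‑blame rounds the value.

```python
from typing import Iterable, Tuple

# (name, kind, size in bytes); kind is "pkg" for package-owned bytes
# and "file" for bytes that no package claims. Hypothetical layout.
Row = Tuple[str, str, int]


def attribution_percent(rows: Iterable[Row]) -> float:
    """Share of all bytes that are owned by some package, in percent."""
    total = 0
    owned = 0
    for _name, kind, size in rows:
        total += size
        if kind == "pkg":
            owned += size
    return 100.0 * owned / total if total else 0.0


# Made-up numbers, not taken from the python:3.12-slim report.
rows = [
    ("libfoo", "pkg", 600),
    ("barutils", "pkg", 90),
    ("/opt/app/blob.bin", "file", 310),
]
print(f"package attribution: {attribution_percent(rows):.0f}%")
```

A low percentage is not a problem in itself; it only says how much of the image has to be explained by something other than the distribution's package manager.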

Usage

docker save <image> -o image.tar
layer-blame [flags] image.tar

No Docker daemon is required: layer‑blame parses the saved tarball (or a plain OCI layout) directly, which makes it CI‑friendly.

Flags

Flag        Default  Meaning
--top N     5        Show the top N contributors per layer
--no-color  false    Disable ANSI color (also respects NO_COLOR and non‑TTY output)
--version   false    Print version, commit, build date

How it works (five deterministic steps)

  1. Load the tarball / OCI layout via go-containerregistry, normalizing Docker‑save, BuildKit, and containerd/OCI formats.
  2. Walk each layer’s filesystem diff, recording every added file and its size. Whiteout (deletion) markers are ignored.
  3. Build a file→package index from the image’s own package databases:
    • Alpine – /lib/apk/db/installed
    • Debian/Ubuntu – /var/lib/dpkg/info/*.list
  4. Group added bytes by owning package. Files with no owner are reported individually, ensuring large unattributed artifacts surface by name.
  5. Map each layer back to the Dockerfile instruction (created_by from the image config) and print the table.

The novel part is step 4 – the JOIN between added bytes and the package database. Everything else is plumbing.
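A minimal sketch of that JOIN, assuming a Debian‑style image: the index is built from the text of dpkg `.list` files (one path per line, package name taken from the file name), and added bytes are then grouped by owner. The function names, the sample packages, and the sizes are hypothetical, and the exact‑path lookup ignores complications such as symlinked directories; it illustrates the idea rather than the tool's implementation.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List, Tuple


def dpkg_index(list_files: Dict[str, str]) -> Dict[str, str]:
    """Build {file path: package} from {'.list' file name: its text}."""
    index: Dict[str, str] = {}
    for list_name, content in list_files.items():
        # "libfoo:arm64.list" -> "libfoo"
        base = posixpath.basename(list_name)[: -len(".list")]
        package = base.split(":")[0]
        for line in content.splitlines():
            path = line.strip()
            if path:
                index.setdefault(path, package)
    return index


def blame(added: Dict[str, int], index: Dict[str, str],
          top: int = 5) -> List[Tuple[int, str, str]]:
    """Group added bytes by owning package; unowned files stay individual."""
    by_package: Dict[str, int] = defaultdict(int)
    rows: List[Tuple[int, str, str]] = []
    for path, size in added.items():
        package = index.get(path)
        if package is None:
            rows.append((size, path, "file"))  # surfaces by name
        else:
            by_package[package] += size
    rows.extend((size, pkg, "pkg") for pkg, size in by_package.items())
    rows.sort(key=lambda r: (-r[0], r[1]))  # largest first
    return rows[:top]


# Hypothetical package database and layer contents.
lists = {
    "/var/lib/dpkg/info/libfoo:arm64.list":
        "/.\n/usr\n/usr/lib\n/usr/lib/libfoo.so.1\n"
        "/usr/share/doc/libfoo/copyright\n",
    "/var/lib/dpkg/info/barutils.list": "/usr/bin/bar\n",
}
added = {
    "/usr/lib/libfoo.so.1": 5000,
    "/usr/share/doc/libfoo/copyright": 200,
    "/usr/bin/bar": 900,
    "/usr/local/lib/libcustom.so": 3000,
}
for size, name, kind in blame(added, dpkg_index(lists)):
    print(f"{size:>8}  {name:<32}  {kind}")
```

Directory entries in the `.list` files are harmless here, because only regular files from the layer are ever looked up.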

Limitations & Caveats

  • Scratch / distroless images have no package database, so attribution falls back to file level.
  • Multi‑stage COPY --from files lose their origin package information and appear unattributed in the destination layer.
  • Currently supports only apk (Alpine) and dpkg (Debian/Ubuntu). RPM and language‑specific managers (npm, pip, etc.) are not yet implemented.
  • It is a single Go binary, MIT‑licensed.

Installation

Prebuilt binary (Linux/macOS/Windows, amd64 + arm64)

curl -sSfL https://github.com/mk668a/layer-blame/releases/latest/download/layer-blame__linux_amd64.tar.gz \
  | tar -xz layer-blame

Build with Go

go install github.com/mk668a/layer-blame@latest

Repository: https://github.com/mk668a/layer-blame

When to use it

Next time a PR inflates your image and a reviewer asks “why is this 800 MB?”, run a single command and get a package name attached to the offending bytes, which can save an afternoon of manual digging with dive. If you encounter surprising attribution on a quirky image, open an issue; edge cases (RPM, cross‑stage COPY) are where the tool gets most interesting.
