A Docker Builder Recipe for Cypress & Playwright in CI
Source: Dev.to
Containerized e2e is the difference between a green build and a four-hour debugging session over Slack DM. “Works on my machine” stops being funny the third time it happens. This guide gives you a battle-tested Dockerfile and docker-compose.yml for running Cypress and Playwright in CI, plus the YoBox plumbing that keeps disposable inboxes and webhook receivers out of your container image. The Docker Builder tool scaffolds the baseline; this article explains the why behind every line.

A solid e2e container has five properties:

Deterministic: same image, same result, this year and next.

Both Playwright and Cypress publish official images. They are large, but they save you from chasing missing Chromium libs at 1 AM. Use them.

`FROM mcr.microsoft.com/playwright:v1.49.0-jammy`

`FROM cypress/included:14.0.0`

A production Dockerfile
```dockerfile
WORKDIR /app
# (ENV instruction truncated in the source; only its tail survives)
#   ...https://yobox.dev/api \
#   PWDEBUG=0
ENTRYPOINT ["npx", "playwright", "test"]
```
Key choices:
npm ci not npm install — reproducible installs.
```yaml
services:
  app:
    build: ./web
    ports: ["3000:3000"]

  e2e:
    build: ./tests
    environment:
      - BASE_URL=http://app:3000
      - YOBOX=https://yobox.dev/api
    depends_on:
      - app
    command: ["--shard=1/1"]
```
That's it. No mail server. No webhook tunnel. The tests reach YoBox over the public internet, which is exactly what CI does too — meaning your local run matches CI byte-for-byte.
GitHub Actions, four shards:
jobs:
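The workflow is cut off in the source after `jobs:`. As a rough sketch only, a four-shard matrix could look like the following. The job name `e2e`, the `ubuntu-latest` runner, and the checkout step are assumptions rather than the author's original; the `e2e` compose service and its `npx playwright test` entrypoint come from the files shown earlier.

```yaml
# Hypothetical sketch: job name, runner, and step layout are assumptions.
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let the other shards finish if one fails
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      # Arguments after the service name replace the compose `command`,
      # so this overrides the default --shard=1/1 for each matrix entry.
      - run: docker compose run --rm e2e --shard=${{ matrix.shard }}/4
```

Because the image's entrypoint is already `npx playwright test`, each job only needs to pass the shard flag.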
Image pulls are the single biggest cost in containerized e2e. Two tricks:
```yaml
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: ./tests
    cache-from: type=gha
    cache-to: type=gha,mode=max
    load: true
    tags: e2e:latest
```
The GitHub Actions cache backend (`type=gha`) turns a cold 4-minute build into a 20-second warm build.
Mount an output volume and upload on failure:
```yaml
- run: docker compose run --rm -v "$PWD/test-results:/app/test-results" e2e
- if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: traces-${{ matrix.shard }}
    path: test-results
```
Pair Playwright's trace: "on-first-retry" with this and every flaky failure ships you a viewable trace.
| Layer           | Size       | Notes                               |
| --------------- | ---------- | ----------------------------------- |
| Playwright base | ~1.2 GB    | Includes Chromium, Firefox, WebKit. |
| Cypress base    | ~1.4 GB    | Includes Electron + Xvfb.           |
| node_modules    | 200–500 MB | Cacheable separately.               |
| Source          | …          | …                                   |
Never run npm install in CI. Always npm ci. Always.
Can I run this on ARM (M-series Macs)? Yes — both Playwright and Cypress publish multi-arch images.
How do I avoid pulling the image every run? Use a self-hosted runner with a Docker volume, or the GitHub Actions cache backend (`type=gha`).
Should I bake node_modules into the image? Yes for CI, no for local dev where you bind-mount source.
Where do I store test reports? Upload as an artifact and link from the PR. Don't commit them.
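The "bake for CI, bind-mount for local dev" answer above can be sketched as a compose override. This is a minimal illustration, not the author's file: the name `docker-compose.override.yml` relies on Compose merging that file automatically, and the paths assume the `./tests` build context and `/app` working directory used earlier.

```yaml
# docker-compose.override.yml (hypothetical local-dev override; keep it out of CI)
services:
  e2e:
    volumes:
      - ./tests:/app          # bind-mount test source so edits apply without a rebuild
      - /app/node_modules     # anonymous volume so the image's node_modules is not hidden
```

In CI the override file is simply absent, so the tests run against the node_modules baked into the image.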
Containerized e2e is non-negotiable for any team running tests across more than one machine. The recipe above — official base image, npm ci, sharded compose, GHA cache, YoBox-backed inboxes and webhooks — gets you to a green pipeline in an afternoon. Generate your starting Dockerfile from the Docker Builder and customize from there.
See also: The Only docker-compose.yml Pattern You Need, Cypress + YoBox, Playwright + YoBox.
For self-hosted runners with bandwidth caps, a multi-stage build keeps only the runtime needed for tests:
```dockerfile
FROM mcr.microsoft.com/playwright:v1.49.0-jammy AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

FROM mcr.microsoft.com/playwright:v1.49.0-jammy
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ENTRYPOINT ["npx", "playwright", "test"]
```
`v1.49.0-jammy` is reproducible across years; `latest` is reproducible for about 12 hours. Pin in CI, float in personal sandboxes.
The GitHub Actions cache is fast but capped per repo. Self-hosted runners with a persistent Docker volume win above ~50 e2e jobs per day. Below that, GHA cache is simpler.
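For the self-hosted case, a job might look roughly like this. It is a sketch under assumptions: the `self-hosted` label, the job name, and the step layout are illustrative, and it presumes the runner's Docker daemon keeps its layer cache between jobs.

```yaml
# Hypothetical sketch: runner label and job name are assumptions.
jobs:
  e2e:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      # The runner's local Docker layer cache persists between jobs,
      # so unchanged layers are reused without any remote cache.
      - run: docker compose build e2e
      - run: docker compose run --rm e2e
```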
Most teams adopting containerized e2e do it after they've outgrown a single CI machine. The migration order that works: containerize the test runner first, then add sharding, then move to a self-hosted runner pool once cache pressure shows up.