I got tired of downloading Playwright artifacts from CI, so I changed the workflow
Source: Dev.to
Debugging Playwright failures in CI has always felt more manual than it should be. Not because the data isn't there; it is. It's just scattered. A typical failure for me looks like this:

- open the CI job
- download artifacts
- open the trace viewer locally
- check screenshots
- scroll logs
- try to line everything up
It works… but it's slow, especially when multiple tests fail at once. The issue isn't lack of data; it's that there's no single place to understand what happened. Everything lives in separate files:

- traces
- screenshots
- logs
- CI output
So debugging turns into stitching context together manually. It gets worse with:

- parallel runs
- flaky tests
- multiple failures triggered by the same root cause
At that point you're not debugging; you're reconstructing events. I wanted to answer one simple question faster: "What actually happened in this run?"

So I changed the workflow. Instead of downloading artifacts and inspecting things one by one, I pushed everything from a run into a single view. That view shows:

- all failed tests across jobs
- traces, screenshots, and logs in one place
- failures grouped when they look related
- a short summary of what likely happened
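The collection step behind a view like this can be sketched with a tiny reporter-style class. This is a simplified illustration, not the actual code from my repo: the names (`FailureRecord`, `RunSummaryReporter`) are mine, and the types are declared locally to keep the sketch self-contained, where a real Playwright reporter would implement the interfaces from `@playwright/test/reporter`:

```typescript
// Sketch of the collection step: gather every failure in a run,
// with its error and artifact paths, into one JSON summary that a
// single page can render. Names here are illustrative, not from
// any real library.

interface FailureRecord {
  title: string;         // full test title
  error: string;         // error message at the moment of failure
  attachments: string[]; // paths to traces / screenshots for this test
}

class RunSummaryReporter {
  private failures: FailureRecord[] = [];

  // Called after each test finishes (mirrors onTestEnd in Playwright's API).
  onTestEnd(title: string, status: string, error: string, attachments: string[]): void {
    if (status === 'failed' || status === 'timedOut') {
      this.failures.push({ title, error, attachments });
    }
  }

  // Called once at the end of the run: emit one blob instead of
  // leaving traces, screenshots, and logs as separate downloads.
  onEnd(): string {
    return JSON.stringify(
      { failedCount: this.failures.length, failures: this.failures },
      null,
      2,
    );
  }
}

// Usage: feed it results as tests finish, then publish one summary.
const reporter = new RunSummaryReporter();
reporter.onTestEnd('checkout > pays with card', 'failed',
  'TimeoutError: locator.click', ['trace.zip', 'failure.png']);
reporter.onTestEnd('login > happy path', 'passed', '', []);
console.log(reporter.onEnd());
```

The point of the sketch is the shape of the output: one JSON document per run, so "open one link" is all the consumer ever has to do.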
The goal wasn't to add more data; it was to remove the jumping between tools. Instead of this:

- open CI
- download artifacts
- open the trace
- go back to logs
- repeat
You just open one link and see:

- which tests failed
- whether they failed for the same reason
- what the UI looked like at failure
- what the logs say
No downloading, no context switching. Two things stood out immediately.

First, you can tell pretty quickly whether:

- it's one bug causing multiple failures, or
- a bunch of unrelated issues

That alone saves a lot of time.

Second, grouping similar failures makes it obvious when:

- multiple tests break for the same reason, versus
- random flakes
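Even a crude heuristic gets you most of this grouping. Here is one possible approach, my own simplification rather than the exact logic of any tool: normalize each error message by stripping run-specific noise (numbers, quoted values), then bucket failures by the resulting signature.

```typescript
// Group failures by a normalized error "signature" so that many tests
// failing for one root cause collapse into a single bucket.
// Simplified heuristic for illustration only.

function errorSignature(message: string): string {
  return message
    .replace(/['"][^'"]*['"]/g, '"…"') // quoted values vary per test
    .replace(/\d+/g, 'N')              // timeouts, ports, ids vary per run
    .trim();
}

function groupFailures(
  failures: { test: string; error: string }[],
): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const f of failures) {
    const sig = errorSignature(f.error);
    const bucket = groups.get(sig) ?? [];
    bucket.push(f.test);
    groups.set(sig, bucket);
  }
  return groups;
}

// Three failed tests, but only two distinct signatures: the two cart
// timeouts normalize to the same string, so they land in one bucket.
const groups = groupFailures([
  { test: 'cart > add item',
    error: 'Timeout 30000ms exceeded waiting for locator("#cart")' },
  { test: 'cart > remove item',
    error: 'Timeout 15000ms exceeded waiting for locator("#cart")' },
  { test: 'login > sso',
    error: 'net::ERR_CONNECTION_REFUSED' },
]);
console.log(groups.size); // 2 buckets → likely 2 root causes, not 3 bugs
```

Two buckets instead of three failures is exactly the "one bug vs. many" signal described above; anything fancier (stack-trace comparison, locator extraction) is refinement on the same idea.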
Before that, everything just looked like chaos.

This still feels like a workaround. The ecosystem gives you all the pieces, but not a clean way to reason about failures at the run level.

I'm curious how others are handling this today:

- Do you rely mostly on the trace viewer?
- Do you download artifacts every time?
- Any workflows that actually reduce debugging time?

I open-sourced what I've been using here: 👉 https://github.com/adnangradascevic/playwright-reporter

Would love feedback, especially if you're dealing with a lot of CI failures.