Regression testing workflow: the risk‑first checks that keep releases stable

Published: December 29, 2025 at 04:00 AM EST
7 min read
Source: Dev.to

Workflow shown: risk‑first regression scoping → golden‑path baseline → targeted probes → evidence‑backed results.
Example context: Sworn on PC Game Pass (Windows) used only as a real‑world backing example.
Build context: tested on the PC Game Pass build 1.01.0.1039.
Scope driver: public SteamDB patch notes used as an external change signal (no platform parity assumed).
Outputs: a regression matrix with line‑by‑line outcomes, session timestamps, and bug tickets with evidence.

Workflow diagram: change introduced → risk‑based scope → targeted regression checks → behaviour verification → outputs (defects, confirmation notes, evidence, retest results).

Regression testing flow used to verify stability after change during a time‑boxed Sworn (PC) pass.

Regression testing scope: what I verified and why

This article is grounded in a self‑directed portfolio regression pass on Sworn using the PC Game Pass (Windows) build 1.01.0.1039, run in a one‑week solo timebox.

Scope was change‑driven and risk‑based:

  • golden‑path stability (launch → play → quit → relaunch)
  • save‑and‑continue integrity
  • core menus
  • audio sanity
  • input handover
  • side‑effect probes suggested by upstream patch notes

No parity between the Steam and Game Pass builds is assumed or claimed.

What regression testing is (in practice)

For me, regression testing is simple: after a change, does existing behaviour still hold?

  • Not “re‑test everything”.
  • Not “run a checklist because that’s what we do”.

A regression pass is selective by design. Coverage is driven by risk (a small scoring sketch follows these questions):

  1. What is most likely to have been impacted?
  2. What is most expensive if broken?
  3. What must remain stable for the build to be trusted?
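
To show how those three questions can turn into an ordered scope rather than a gut call, here is a minimal Python sketch. The weights and the scoring rule are illustrative assumptions; only the check IDs come from this pass.

```python
from dataclasses import dataclass

@dataclass
class Check:
    check_id: str          # regression matrix line ID, e.g. "BL-SMOKE-01"
    likelihood: int        # 1-3: how likely the change touched this area
    cost_if_broken: int    # 1-3: how expensive a regression here would be
    trust_critical: bool   # must hold for the build to be trusted at all

def risk_score(check: Check) -> int:
    """Illustrative scoring rule: likelihood x cost, plus a bonus for trust-critical checks."""
    return check.likelihood * check.cost_if_broken + (3 if check.trust_critical else 0)

# Hypothetical weights; the pass is scoped to the highest-scoring checks first.
candidates = [
    Check("BL-SMOKE-01", likelihood=3, cost_if_broken=3, trust_critical=True),
    Check("BL-SAVE-01", likelihood=2, cost_if_broken=3, trust_critical=True),
    Check("STEA-103-MUSIC", likelihood=3, cost_if_broken=2, trust_critical=False),
    Check("BL-ALT-01", likelihood=1, cost_if_broken=2, trust_critical=False),
]
for check in sorted(candidates, key=risk_score, reverse=True):
    print(f"{check.check_id}: {risk_score(check)}")
```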

Regression testing outputs: pass/fail results with evidence

  • Clear outcomes: pass or fail.
  • Backed by evidence and repeatable verification.
  • No opinions, no “vibes”.

Golden‑path smoke baseline for regression testing

I start every regression cycle with a repeatable golden‑path smoke because it prevents wasted time. If the baseline is unstable, deeper testing is noise.

In this Sworn pass, the baseline was matrix line BL‑SMOKE‑01:

cold launch → main menu → gameplay → quit to desktop → relaunch → main menu

I also include a quick sanity listen for audio cut‑outs during this flow.
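
Writing the baseline down as an ordered step list keeps it repeatable between passes. A minimal sketch, assuming each step is marked pass/fail as it is run; the helper and its result strings are mine, not part of the actual workbook:

```python
# BL-SMOKE-01: golden-path smoke expressed as ordered steps, so every pass runs the same flow.
GOLDEN_PATH = [
    "cold launch",
    "main menu reached",
    "gameplay reached",
    "quit to desktop",
    "relaunch",
    "main menu reached again",
    "audio sanity: no cut-outs heard during the flow",
]

def baseline_outcome(step_results: dict[str, bool]) -> str:
    """Fail the baseline if any step fails; deeper testing on an unstable build is noise."""
    missing = [step for step in GOLDEN_PATH if step not in step_results]
    if missing:
        return f"incomplete (steps not run: {missing})"
    return "pass" if all(step_results[step] for step in GOLDEN_PATH) else "fail"

# Example: every step observed and passing.
print(baseline_outcome({step: True for step in GOLDEN_PATH}))  # -> pass
```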

“Some systems absolutely cannot break. Those are the ones you want to verify on every build before spending time on deeper testing.”
Conrad Bettmann, QA Manager (Rovio Entertainment)

Why baseline stability matters in regression testing

The golden path includes the most common player actions (launch, play, quit, resume). If those are unstable, you get cascading failures that masquerade as unrelated defects.

Regression testing scope: change signals and risk

For this project I used SteamDB patch notes as an external oracle:

SWORN 1.0 Patch #3 (v1.0.3.1111), 13 Nov 2025

That does not mean I assumed those changes were present on PC Game Pass. Instead, I used the patch notes as a change signal to decide where to probe for side effects on the Game Pass build. This is useful when you have no internal access, no studio data, and no changelog for your platform.

Knowing what changed and where helps you focus regression on affected areas, rather than running very wide checks that probably won’t find anything valuable. It’s usually best to mix multiple oracles instead of relying on a single source.
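
One lightweight way to keep that scoping decision visible is to record each external change signal next to the probe it justifies. A sketch of the idea; the dictionary structure is mine, only the SteamDB items and probe IDs come from this pass:

```python
# Map each external change signal (SteamDB 1.0.3 notes) to the probe it justifies
# on the build actually under test (PC Game Pass), without assuming parity.
PATCH_NOTE_PROBES = {
    "fix for music cutting out": {
        "probe_id": "STEA-103-MUSIC",
        "probe": "music continuity across combat, pause/unpause, and a level load",
    },
    "Dialogue Volume slider": {
        "probe_id": "STEA-103-AVOL",
        "probe": "is the slider present in the Audio menu? if absent, record N/A with evidence",
    },
}

for signal, entry in PATCH_NOTE_PROBES.items():
    print(f"{entry['probe_id']}: '{signal}' -> {entry['probe']}")
```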

“External oracles are a pragmatic way to drive risk‑based regression when internal documentation is unavailable.”
Conrad Bettmann, QA Manager (Rovio Entertainment)

Regression outcomes: pass vs. not applicable (with evidence)

  • SteamDB notes mention a music‑cutting‑out fix, so I ran an audio runtime probe (STEA‑103‑MUSIC) and verified music continuity across combat, pause/unpause, and a level load – pass.
  • SteamDB also mentions a Dialogue Volume slider. On the Game Pass build that control was not present, so the check was recorded as not applicable with evidence of absence (STEA‑103‑AVOL).

How my regression matrix is structured

My Regression Matrix lines are written to be auditable. Each line includes:

  1. A direct check
  2. A side‑effect check (if applicable)
  3. A clear outcome
  4. An evidence link

That keeps results reviewable and prevents “I think it’s fine” reporting.
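
As a sketch of that structure (the field names are mine, not the workbook's), each matrix line can be modelled so none of the four parts can be silently left out:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MatrixLine:
    line_id: str                      # e.g. "BL-SAVE-01"
    direct_check: str                 # what was verified directly
    side_effect_check: Optional[str]  # follow-on probe, if applicable
    outcome: str                      # "pass", "fail", or "not applicable"
    evidence_link: str                # link to clip/screenshot/session timestamp

    def is_auditable(self) -> bool:
        """A line with no recorded outcome or no evidence link is not reviewable."""
        return bool(self.outcome) and bool(self.evidence_link)
```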

Example matrix lines

ID                Description
BL‑SMOKE‑01       Baseline smoke
BL‑SET‑01         Settings persistence
BL‑SAVE‑01        Save‑and‑Continue integrity
BL‑DEATH‑01       Post‑death flow sanity
STEA‑103‑MUSIC    Audio runtime continuity probe
STEA‑103‑AVOL     Audio settings presence check
STEA‑103‑CODEX    Codex and UI navigation sanity
BL‑IO‑01          Input handover + hot‑plug
BL‑ALT‑01         Alt‑Tab sanity
BL‑ECON‑01        Enhancement spend + ownership persistence

Save‑and‑Continue regression testing: anchors, not vibes

Save‑and‑Continue flows are a classic regression risk area because failures can look intermittent. To reduce ambiguity, I verify using anchors.

In this pass (BL‑SAVE‑01) I anchored:

  • Room splash name: Wirral Forest
  • Health bucket: 60/60
  • Weapon type: sword
  • Start of objective text

I then verified those anchors after menu Continue and after a full relaunch.

Outcome: pass – anchors matched throughout (session S2).

Why anchors make regression results repeatable

“Continue worked” is not useful if someone else cannot verify what you resumed into. Anchors turn “seems fine” into a repeatable verification result.
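
A sketch of how cheap that verification is once the anchors are written down; the anchor values are from this pass, the helper itself is hypothetical:

```python
# BL-SAVE-01: anchors captured before quitting, then re-checked after Continue and after a full relaunch.
EXPECTED_ANCHORS = {
    "room_splash_name": "Wirral Forest",
    "health_bucket": "60/60",
    "weapon_type": "sword",
    "objective_text_start": "<first words of the objective, recorded verbatim in session notes>",
}

def anchor_mismatches(observed: dict[str, str]) -> list[str]:
    """Return the anchors that do not match; an empty list means the resume state held."""
    return [
        key for key, expected in EXPECTED_ANCHORS.items()
        if observed.get(key) != expected
    ]
```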

QA evidence for regression testing: what I capture and why

For regression, evidence matters. I capture the following for each check:

Evidence type                     Purpose
Screen recordings                 Visual proof of UI state, transitions, and any glitches
Log excerpts                      Show internal state, error messages, or confirmation of actions
Audio clips                       Verify continuity, absence of cut‑outs, and correct volume levels
Screenshots with timestamps       Tie visual state to a specific moment in the test session
Automated test output (if any)    Provide reproducible steps and results from scripts

All evidence is stored in a shared folder and linked from the regression matrix, ensuring anyone can audit the outcome without relying on memory or subjective description.
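
A simple naming convention is enough to keep evidence findable from the matrix. A sketch of one possible convention; the folder layout and filename pattern are assumptions, not the actual workbook structure:

```python
from datetime import datetime, timezone
from pathlib import Path

EVIDENCE_ROOT = Path("evidence")  # hypothetical shared-folder root

def evidence_path(check_id: str, session_id: str, evidence_type: str, ext: str) -> Path:
    """Build a path that ties a piece of evidence to a check, a session, and a capture time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return EVIDENCE_ROOT / session_id / f"{check_id}_{evidence_type}_{stamp}.{ext}"

# Example: a clip for the music continuity probe captured in session S3.
print(evidence_path("STEA-103-MUSIC", "S3", "clip", "mp4"))
```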

Evidence Guidelines

  • Video clips – Show input, timing, and outcome together (ideal for flow and audio checks).
  • Screenshots – Support UI state, menu presence/absence, and bug clarity.
  • Session timestamps – Keep verification reviewable without scrubbing long recordings.
  • Environment notes – Platform, build, input devices, cloud‑saves enabled.

If the evidence cannot answer what was done, what happened, and what should have happened, it is not evidence.

Regression‑Testing Examples (from the Sworn pass)

Example regression bug: Defeat overlay blocks the Stats screen (SWOR‑6)

Bug: [PC][UI][Flow] Defeat overlay blocks Stats; Continue starts a new run (SWOR‑6)

  • Expectation: After Defeat, pressing Continue reveals the full Stats screen in the foreground and waits for player confirmation.
  • Actual: Defeat stays in the foreground, Stats renders underneath with a loading icon, then a new run starts automatically. Outcome – you cannot review Stats.
  • Repro rate: 3/3 (observed during progression verification S2 and reconfirmed in a dedicated retest S6).

Patch‑note probe example: Music continuity check (STEA‑103‑MUSIC)

SteamDB notes mention a fix for music cutting out, so I ran STEA‑103‑MUSIC:

  • Test: 10 min runtime with combat transitions, plus pause/unpause and a level load.
  • Outcome: Pass – music stayed continuous across those transitions (S3).

Evidence‑backed “not applicable” example: Missing Dialogue Volume slider (STEA‑103‑AVOL)

SteamDB notes mention a Dialogue Volume slider, but on the Game Pass build the Audio menu only showed Master, Music, and SFX.

  • Outcome: Not applicable with evidence of absence (STEA‑103‑AVOL, S4).
  • This avoids inventing parity and keeps the matrix honest.

Accessibility issues logged as a known cluster (no new build to retest)

On Day 0 (S0) I captured onboarding accessibility issues as a known cluster (B‑A11Y‑01: SWOR‑1, SWOR‑2, SWOR‑3, SWOR‑4).

  • Because there was no newer build during the week, a regression retest remains not applicable until one exists.
  • This is logged explicitly rather than implied.

Results Snapshot (for transparency)

In this backing pass the matrix recorded:

  • 8 Pass
  • 1 Fail
  • 1 Not applicable
  • 1 Known accessibility cluster captured on Day 0 with no newer build available for retest

Counts are included for context, not as the focus of the article.
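
If the matrix outcomes live in a structured file, a snapshot like this can be tallied rather than counted by hand; a trivial sketch with the outcomes hard‑coded for illustration:

```python
from collections import Counter

# Outcome column for this pass (hard-coded here for illustration).
outcomes = ["pass"] * 8 + ["fail"] + ["not applicable"]

for outcome, count in Counter(outcomes).most_common():
    print(f"{count} x {outcome}")
```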

Regression‑Testing Takeaways (risk, evidence, and verification)

  • Regression testing is change‑driven verification, not “re‑test everything”.
  • A repeatable golden‑path baseline stops you wasting time on an unstable build.
  • External patch notes can be used as a risk signal without assuming platform parity.
  • Anchors make progression and resume verification credible and repeatable.
  • Not applicable is a valid outcome if it is evidenced, not hand‑waved.
  • Pass results deserve evidence too, because they are still claims.

Regression‑Testing FAQ (manual QA)

Is regression testing just re‑testing old bugs?
No. It verifies that existing behaviour still works after a change, covering previously working systems whether or not bugs were ever logged against them.

Do you need to re‑test everything in regression?
No. Effective regression testing is selective. Scope is driven by change and risk, not by feature count.

How do you scope regression without internal patch notes?
By using external change signals such as public patch notes, previous builds, and observed behaviour as oracles, without assuming platform parity.

What’s the difference between regression and exploratory testing?

  • Regression: verifies known behaviour after change.
  • Exploratory: searches for unknown risk and emergent failure modes.

They complement each other but answer different questions.

Is a pass result meaningful in regression testing?
Yes. A pass is still a claim, so regression passes should be supported with evidence, not just a checkbox.

When is “not applicable” a valid regression outcome?
When a feature is not present on the build under test and that absence is confirmed with evidence. Logging this explicitly is more honest than assuming parity or silently skipping the check.

  • (Add links to the workbook tabs: Regression Matrix, Sessions Log, Bug Log)
  • (Add links to evidence clips)

This dev.to post stays focused on the regression workflow. The case‑study links out to the workbook tabs and evidence clips.
