3 on-call rotation mistakes that burn out your best engineers first

Published: April 29, 2026 at 06:32 AM EDT
2 min read
Source: Dev.to

Mistake 1: Measuring shifts per engineer instead of load per engineer

Equal shifts are not equal load. A week with two P1 incidents resolved in 20 minutes each is not the same as a week with twelve alerts that each require 45 minutes of investigation at 2 am. If you track only who was on‑call and not what that shift actually cost, you will consistently underestimate the burden on senior engineers, who resolve things faster but get paged more often because they’re trusted to handle anything.

What to track

  • Actionable pages per shift per engineer.
  • Load data (time spent, incident severity) rather than headcount fairness.
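The metrics above can be sketched as a small aggregation over page records. This is a minimal illustration, not a prescribed schema: the record fields and the `load_per_engineer` helper are hypothetical, standing in for whatever your paging tool exports.

```python
from collections import defaultdict

# Hypothetical page records: (engineer, minutes_spent, actionable, overnight)
pages = [
    ("alice", 20, True, False),
    ("alice", 45, True, True),
    ("bob", 5, False, False),   # non-actionable noise
    ("bob", 45, True, True),
    ("bob", 45, True, True),
]

def load_per_engineer(pages):
    """Aggregate actual on-call cost per engineer, not just shift counts."""
    load = defaultdict(lambda: {"actionable_pages": 0, "minutes": 0, "overnight": 0})
    for engineer, minutes, actionable, overnight in pages:
        if not actionable:
            continue  # noise pages are an alert-hygiene problem, not engineer load
        stats = load[engineer]
        stats["actionable_pages"] += 1
        stats["minutes"] += minutes
        stats["overnight"] += int(overnight)
    return dict(load)
```

Run against the sample records, this surfaces exactly the asymmetry the section describes: equal shift counts, very different cost.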

How to fix

  1. Perform alert hygiene first: delete alerts that receive no action for 30 consecutive days.
  2. Rebalance the schedule based on the load data you’ve collected.
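Step 1 is mechanical enough to script. A minimal sketch, assuming you can extract a last-actioned timestamp per alert from your alerting tool (the alert names and data shape here are invented for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical history: alert name -> timestamp of the last page that led to action
last_action = {
    "disk-usage-critical": datetime(2026, 4, 20),
    "legacy-cron-heartbeat": datetime(2026, 2, 1),
}

def stale_alerts(last_action, now, days=30):
    """Alerts with no actioned page in `days` days are deletion candidates."""
    cutoff = now - timedelta(days=days)
    return sorted(name for name, ts in last_action.items() if ts < cutoff)

# stale_alerts(last_action, datetime(2026, 4, 29)) -> ["legacy-cron-heartbeat"]
```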

Mistake 2: Putting engineers on independent on‑call before shadow shifts

The correct progression before anyone carries the pager alone:

  1. Observer phase – the engineer receives all the same pages, takes no action, and watches how the primary responder handles them.
  2. Reverse shadow – the engineer leads the response while an experienced engineer watches.
  3. Independent – the engineer handles on‑call solo.
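The progression can be modeled as a simple state machine so the schedule tool, not memory, enforces it. A sketch under one assumption the article doesn't specify: how many shifts each phase requires (two here, purely illustrative).

```python
from enum import Enum

class OnCallPhase(Enum):
    OBSERVER = 1        # receives all pages, takes no action, watches the primary
    REVERSE_SHADOW = 2  # leads the response while an experienced engineer watches
    INDEPENDENT = 3     # carries the pager solo

def next_phase(current, shifts_completed, required_shifts=2):
    """Advance only after completing enough shifts at the current phase."""
    if current is OnCallPhase.INDEPENDENT or shifts_completed < required_shifts:
        return current
    return OnCallPhase(current.value + 1)
```

The point of encoding it is that nobody can be dropped into the independent phase by accident: a rotation entry is only valid if the gate function produced it.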

Skipping these steps raises MTTR on every incident the engineer handles alone and creates a perception that on‑call is dangerous rather than manageable.

Recommendation
Allocate four to six weeks of partial senior engineer time upfront. This investment costs significantly less than the first major incident where an unprepared engineer makes the situation worse.


Mistake 3: Treating on‑call as part of the job with no additional recognition

An engineer who is paged three times outside business hours in a single week, and is still expected to deliver full sprint capacity the following week, is being asked to absorb a cost that goes unacknowledged. Simple recognition mechanisms are enough:

  • Time‑in‑lieu for overnight pages.
  • Reduced sprint commitment after heavy on‑call weeks.
  • Explicit acknowledgment in performance reviews.

The failure mode is pretending the cost doesn’t exist.
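The first mechanism is easy to make concrete. A minimal sketch, assuming a hypothetical policy of four lieu hours per out-of-hours page and an 08:00–18:00 business-hours window (both numbers are placeholders for your own policy):

```python
from datetime import datetime

def overnight_lieu_hours(page_times, hours_per_page=4, start=8, end=18):
    """Grant time in lieu for each page outside the business-hours window."""
    out_of_hours = [ts for ts in page_times if ts.hour < start or ts.hour >= end]
    return len(out_of_hours) * hours_per_page

page_times = [
    datetime(2026, 4, 27, 2, 10),   # 02:10 -> out of hours
    datetime(2026, 4, 28, 14, 0),   # 14:00 -> business hours
    datetime(2026, 4, 29, 23, 45),  # 23:45 -> out of hours
]
# overnight_lieu_hours(page_times) -> 8
```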


Opsgenie note

If Opsgenie is still in your stack, note that end‑of‑support is April 5, 2027. Export all runbooks and escalation policies now, as the format doesn’t migrate cleanly into alternatives.
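A starting point for that export, sketched against Opsgenie's REST API. The `v2` resource paths below match the API as I understand it, but verify them against the current Opsgenie documentation before relying on this; the resource list and helper names are my own.

```python
import json
import urllib.request

OPSGENIE_BASE = "https://api.opsgenie.com/v2"
RESOURCES = ["schedules", "escalations", "teams"]  # extend with whatever config you rely on

def export_urls(base=OPSGENIE_BASE, resources=RESOURCES):
    """Build the list endpoint URL for each resource type to export."""
    return [f"{base}/{r}" for r in resources]

def fetch(url, api_key):
    """Fetch one resource listing; requires a real Opsgenie API key (GenieKey)."""
    req = urllib.request.Request(url, headers={"Authorization": f"GenieKey {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Dump each response to versioned JSON files; since the format doesn't migrate cleanly, a raw export you can grep through later beats losing the data entirely.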
