Alert Design in OT: If Everything Screams, Nothing Is Heard
The Problem with OT Alert Overload
If you work in OT and your screens are constantly lit up with red, the hard truth is that your alert system is not protecting you—it is training operators to ignore risk. Most control rooms are drowning in noise. Operators click “acknowledge” on alarms they don’t truly understand just to make the screen usable again, and eventually a serious event slips through.
Why It’s Not Just a Technology Issue
Alert design shapes what operators see, what they ignore, and when they react. Lazy design creates blind operators rather than visibility. Most OT alerting is broken because it never started from a clear philosophy; it grew as a messy patchwork of ad-hoc rules, vendor defaults, and forgotten temporary alerts. The result is a wall of noise in which a single underlying event produces an entire "alert storm".
Inconsistent Categories and Priorities
Terms such as informational, warning, critical, security, and system may look tidy on a slide deck, but in practice they are used inconsistently—sometimes a “warning” is more serious than a “critical.” When labels don’t map to how operators think about risk, they become decorative rather than functional.
If everything looks urgent, nothing feels urgent. Red banners, flashing icons, and intrusive pop‑ups for minor issues cause operators to stop believing the system and to develop their own hidden ranking of alerts.
Vague Alert Text
Many alerts read like “An anomaly was detected on the device” or “Security event triggered.” Such messages give no immediate indication of what is happening, what is at risk, or what action is required. If operators must click through multiple screens or call someone else just to understand the basics, the alert becomes a puzzle—acceptable for after‑action reviews, but ignored during a live shift.
Human Factors in the Control Room
Operators are not lazy, and they are not machines; they are humans under constant cognitive load. Their focus is limited, and when the brain is bombarded with too many stimuli it resorts to shortcuts:
- Noise filtering: If the majority of alerts never matter, operators learn to ignore them as a survival tactic.
- Habituation: Repeatedly firing the same low‑value alert causes the brain to downgrade its importance, just like weekly fire‑drill alarms that are eventually ignored.
- Postponement: “I will check it later” becomes a cultural norm when low‑value alerts dominate, giving attackers a window to hide within ignored classes.
- Pattern hunting: Operators build mental rules based on recurring behavior. When those rules blur the line between real risk and routine noise, genuine attacks are waved away as “Monday noise.”
What a Good OT Alert Should Do
A well‑designed alert must enable an operator to answer three questions in seconds:
- What is happening? – Provide a specific description (e.g., “Unauthorised login attempt to PLC 3”).
- What is at risk? – Explain why it matters now (e.g., potential loss of control of a critical pump).
- What action is required and how urgent is it? – State the next step and its priority (e.g., “Call the on‑duty security engineer immediately”).
If any of these questions are unanswered, the alert is incomplete and will be ignored, misjudged, or acted upon too late.
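One way to make those three questions non-optional is to bake them into the alert record itself, so an alert cannot be raised without answering all of them. Here is a minimal Python sketch; the field names, device labels, and priority strings are illustrative, not a prescribed schema:

```python
# Minimal sketch: an alert that cannot exist without what / risk / action.
from dataclasses import dataclass

@dataclass(frozen=True)
class OTAlert:
    what: str      # What is happening?
    risk: str      # Why it matters now
    action: str    # Next step and its urgency
    priority: str  # e.g. "CRITICAL" (illustrative label)

    def banner(self) -> str:
        # One line an operator can parse in seconds.
        return f"[{self.priority}] {self.what} | Risk: {self.risk} | Action: {self.action}"

alert = OTAlert(
    what="Unauthorised login attempt to PLC 3",
    risk="Potential loss of control of a critical pump",
    action="Call the on-duty security engineer immediately",
    priority="CRITICAL",
)
print(alert.banner())
```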
Simple, Strict Rules for Effective Alert Design
1. Limit Alert Volume
Set a hard limit on how many alerts any one operator can see in a shift. If the system exceeds that limit, remove low‑value alerts or merge related ones into a single event. Without a ceiling, alert volume will grow until the system collapses under its own spam.
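A rough sketch of what such a ceiling could look like in code; the limit of 50 and the notion of "low-value" priorities are placeholder assumptions to tune per site:

```python
# Sketch of a hard per-shift alert budget. Once the ceiling is hit,
# low-value alerts are dropped (candidates for merging or removal)
# rather than the operator's attention.
MAX_ALERTS_PER_SHIFT = 50  # illustrative ceiling, not a recommendation

class ShiftBudget:
    def __init__(self, limit: int = MAX_ALERTS_PER_SHIFT):
        self.limit = limit
        self.delivered = 0

    def admit(self, priority: str) -> bool:
        """Return True if the alert should reach the operator."""
        if self.delivered >= self.limit and priority in ("LOW", "INFO"):
            return False  # suppressed: over budget and low value
        self.delivered += 1
        return True
```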
2. Separate Alert Streams
Create distinct streams for safety, process, and security alerts. Each stream should have its own visual style, sound, and escalation path:
- Safety – People and physical damage.
- Process – Quality, performance, and uptime.
- Security – Access, misuse, and hostile behaviour.
An operator should be able to glance at a screen and instantly know which category the alert belongs to.
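A sketch of what that separation might look like as configuration; the colours, sounds, and escalation targets below are illustrative placeholders, not recommendations:

```python
# Sketch: each stream carries its own visual style, sound, and escalation path.
from enum import Enum

class Stream(Enum):
    SAFETY = "safety"      # people and physical damage
    PROCESS = "process"    # quality, performance, uptime
    SECURITY = "security"  # access, misuse, hostile behaviour

STREAM_STYLE = {
    Stream.SAFETY:   {"colour": "red",    "sound": "klaxon",   "escalate_to": "shift supervisor"},
    Stream.PROCESS:  {"colour": "amber",  "sound": "chime",    "escalate_to": "process engineer"},
    Stream.SECURITY: {"colour": "purple", "sound": "two-tone", "escalate_to": "security on-call"},
}
```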
3. Reserve Intrusive Sounds for True High‑Priority Events
Use loud sounds and pop‑ups only for genuinely critical alerts. If minor warnings scream, operators will mute the system, rendering the sound meaningless when a real emergency occurs.
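In code this can be as simple as deriving the delivery channels from the priority instead of configuring them per alert. A sketch, with the channel thresholds as assumptions:

```python
# Sketch: intrusive channels are gated on priority, so only genuinely
# urgent alerts are allowed to interrupt the operator.
def notification_channels(priority: str) -> list[str]:
    if priority == "CRITICAL":
        return ["banner", "sound", "popup"]  # the only level allowed to scream
    if priority == "HIGH":
        return ["banner", "sound"]
    return ["banner"]  # medium and low: visible, but silent and non-blocking
```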
4. Eliminate Duplicates
One underlying event should generate a single alert. Use correlation to suppress duplicate alerts that refer to the same issue; duplicates impose an "attention tax" while adding no value.
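A common way to implement this is time-window correlation keyed on the underlying event. A minimal sketch; the (device, rule) key and the five-minute window are assumptions to adapt:

```python
# Sketch: emit only the first alert per (device, rule) pair per window;
# later duplicates are folded into the already-open event.
import time

WINDOW_SECONDS = 300  # illustrative correlation window
_last_seen: dict[tuple[str, str], float] = {}

def should_emit(device: str, rule: str, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    key = (device, rule)
    last = _last_seen.get(key)
    if last is not None and now - last < WINDOW_SECONDS:
        return False  # duplicate of an open event: suppress
    _last_seen[key] = now
    return True
```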
5. Use Clear, Actionable Text
Write alerts in plain language that tells the operator what, why, and what to do. Avoid vague phrases like “Issue on the host” or “Anomaly detected.” Include relevant identifiers (device name, location, severity) so the operator can act immediately.
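One way to enforce this is to lint alert text at creation time: reject known-vague phrases and require the identifiers. A sketch, with an illustrative banned-phrase list:

```python
# Sketch: refuse to raise an alert whose text is vague or missing identifiers.
VAGUE_PHRASES = ("anomaly detected", "issue on the host", "security event triggered")

def validate_alert_text(text: str, device: str, location: str, severity: str) -> str:
    lowered = text.lower()
    if any(p in lowered for p in VAGUE_PHRASES) or not all((device, location, severity)):
        raise ValueError("Alert must state what, why, and what to do, with identifiers")
    return f"{severity} | {device} @ {location}: {text}"
```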
6. Enforce Consistent Priorities
Define a priority hierarchy (e.g., Critical > High > Medium > Low) and apply it uniformly across all alert categories. Ensure that visual cues (color, icon) match the defined priority.
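An ordered enum makes the hierarchy explicit and lets visual cues be derived from the priority rather than chosen per alert. A sketch; the colour and icon pairings are illustrative:

```python
# Sketch: one priority scale shared by every alert category, with cues
# derived from it so colour and icon can never contradict the priority.
from enum import IntEnum

class Priority(IntEnum):  # higher value = more urgent, comparable everywhere
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

VISUAL_CUE = {
    Priority.CRITICAL: ("red", "octagon"),
    Priority.HIGH:     ("orange", "triangle"),
    Priority.MEDIUM:   ("yellow", "circle"),
    Priority.LOW:      ("grey", "dot"),
}

assert Priority.CRITICAL > Priority.HIGH > Priority.MEDIUM > Priority.LOW
```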
7. Provide Context at a Glance
Where possible, embed key context (e.g., current value vs. threshold, affected process) directly in the alert banner so the operator does not need to navigate away to understand the situation.
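A sketch of a banner that carries its own context; the tag name, units, and process label are made-up examples:

```python
# Sketch: the key numbers travel with the alert, not behind three clicks.
def context_banner(tag: str, value: float, threshold: float, unit: str, process: str) -> str:
    return (f"{tag} = {value} {unit} (threshold {threshold} {unit}) "
            f"| affects: {process}")

print(context_banner("PT-204 discharge pressure", 9.2, 8.0, "bar", "pump P-101 loop"))
```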
Moving from Noise to Signal
By applying these rules, you shift the alert system from a chaotic flood of red banners to a clear, actionable signal. Operators regain trust in the system, can focus their limited attention where it truly matters, and are better equipped to respond to genuine threats before they become incidents.