Solved: Pacemaker/DRBD: Auto-failback kills active DRBD Sync Primary to Secondary. How to prevent this?

Published: December 27, 2025 at 06:09 PM EST
8 min read
Source: Dev.to

Executive Summary

TL;DR: Pacemaker’s default auto‑failback behavior can disrupt an active DRBD primary by attempting premature promotion on a recovering node, leading to service outages and potential data risks. This issue can be prevented by:

  • Configuring high resource stickiness (e.g., resource-stickiness=10000) on the DRBD promotable (master/slave) clone resource so it cannot be pulled back automatically.
  • Implementing manual failback (standby mode or location constraints).
  • Setting up graceful, delayed promotion with robust STONITH, increased cluster-delay, and generous promoted-stop-timeout values.

Why This Happens

Pacemaker/DRBD clusters provide high availability, but the default behavior often tries to "fail back" resources to their preferred node as soon as that node recovers. In a DRBD setup this can be disastrous:

| Symptom | Description |
| --- | --- |
| Service outages | Applications on the active DRBD primary stop or become unresponsive. |
| DRBD status changes | The primary flips to Secondary, Unknown, or shows a conflict state (e.g., WFConnection, StandAlone). |
| Pacemaker log entries | Logs show promotion attempts on the recovering node and demotion or fencing actions on the current primary. Look for drbd_promote, drbd_demote, or conflict messages. |

Example Pacemaker Log Snippet

```
Sep 20 10:35:01 node-a pacemakerd[12345]: info: Status: Requesting promote of drbd_res on node-a
Sep 20 10:35:01 node-a pacemakerd[12345]: crit: Result: promote_drbd_res_on_node-a: CIB_R_ERR_OP_FAILED
Sep 20 10:35:01 node-b pacemakerd[12345]: info: Status: Requesting demote of drbd_res on node-b
Sep 20 10:35:01 node-b pacemakerd[12345]: info: drbd_demote: stdout [drbd_demote: Attempting to demote resource 'r0']
Sep 20 10:35:02 node-b pacemakerd[12345]: warn: drbd_demote: stderr [drbd_demote: Cannot demote 'r0', it is still in use.]
Sep 20 10:35:02 node-b pacemakerd[12345]: crit: Result: demote_drbd_res_on_node-b: CIB_R_ERR_OP_FAILED
```
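When triaging, a quick grep over such logs confirms the pattern: near-simultaneous promote and demote requests on different nodes are the signature of a failback race. A minimal sketch, using the sample excerpt above as hard-coded input:

```bash
# Count promotion/demotion requests in a saved Pacemaker log excerpt.
# A promote on one node and a demote on the other within the same second
# suggests an auto-failback race.
log='Sep 20 10:35:01 node-a pacemakerd[12345]: info: Status: Requesting promote of drbd_res on node-a
Sep 20 10:35:01 node-b pacemakerd[12345]: info: Status: Requesting demote of drbd_res on node-b'

promotes=$(printf '%s\n' "$log" | grep -c 'Requesting promote')
demotes=$(printf '%s\n' "$log" | grep -c 'Requesting demote')
echo "promote requests: $promotes, demote requests: $demotes"
```

On a live node, feed `journalctl -u pacemaker` (or `/var/log/pacemaker/pacemaker.log`) into the same grep instead of the sample string.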

Running drbd-overview during the issue will show unexpected roles or connections.

Example drbd-overview Output During Conflict

```
0:r0   Connected Primary/Primary UpToDate/UpToDate
```

Note: `Primary/Primary` indicates split-brain in a two-node cluster, which Pacemaker should normally prevent; in practice you are more likely to see a quick role flip or connection errors.
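If you do end up in genuine split-brain, DRBD will not resynchronise until one side's changes are discarded. A common manual recovery sequence, assuming resource `r0` and that node-b is the side whose data you are willing to throw away (verify this first!):

```bash
# On node-b (the victim whose changes will be discarded):
drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On node-a (the survivor), re-establish the connection if needed:
drbdadm connect r0
```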

Core Problem

  1. Resource locality preference – Pacemaker prefers to keep resources on their "preferred" nodes. When a node recovers, Pacemaker treats it as a suitable candidate again.
  2. DRBD primary requirement – In a two‑node synchronous (Protocol C) DRBD setup, only one node may be Primary at a time.
  3. Premature promotion attempt – Pacemaker may try to promote the DRBD resource on the recovering node immediately, before the current primary can be safely demoted.
  4. Resulting conflict – This can lead to:
    • Promotion failure (if the RA detects another primary).
    • A race condition where both nodes briefly think they should be primary.
    • DRBD’s internal conflict resolution (automatic demotion or fencing), which can cause I/O disruption and application failure.

Solutions

1. High Resource Stickiness

A large positive resource-stickiness outweighs the recovered node's location preference, so Pacemaker leaves the resource where it is. (Stickiness must be positive for this to work: a negative value would push the resource off its current node and cause flapping.)

```bash
# Example: set a very high stickiness on the DRBD resource
pcs resource meta drbd_res resource-stickiness=10000
```

  • Guarantees that Pacemaker won’t automatically fail back the DRBD resource.
  • The resource stays on the current primary until an administrator moves it manually.
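Before trusting the setting, you can preview what the policy engine would decide with `crm_simulate`, which reads the live CIB without changing anything and can print the allocation scores (stickiness included):

```bash
# Read-only preview of resource placement, with allocation scores
crm_simulate --live-check --show-scores
```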

2. Manual Failback Strategies

| Method | How It Works |
| --- | --- |
| Standby mode | Put the recovering node into standby (`pcs node standby <node>`). Pacemaker will not schedule any resources on it until you clear standby. |
| Location constraints with negative scores | Create a location rule that gives the Promoted role a large negative score on the recovering node (example below). |
| Explicit move | Use `pcs resource move drbd_res <node>` when you’re ready to promote. |

```bash
# Prefer node-a; block promotion on node-b with a negative rule score
pcs constraint location drbd_res prefers node-a=INFINITY
pcs constraint location drbd_res rule role=Promoted score=-1000 '#uname' eq node-b
```

3. Graceful & Delayed Promotion

  1. Robust STONITH – Ensure fencing works reliably; otherwise Pacemaker may skip safe demotion.
  2. Increase cluster-delay – Gives the cluster more time to propagate state changes before acting.

```bash
pcs property set cluster-delay=30s
```

  3. Set a generous promoted-stop-timeout – Allows the old primary to finish demotion cleanly.

```bash
pcs resource update drbd_res meta promoted-stop-timeout=120s
```

  4. Optional: migration-threshold – Prevents rapid back‑and‑forth moves.

```bash
pcs resource defaults migration-threshold=3
```

Checklist for a Safe DRBD‑Pacemaker Cluster

  • STONITH configured and tested on all nodes.
  • High stickiness (or equivalent location constraints) applied to DRBD resources.
  • Cluster‑delay and promoted‑stop‑timeout values tuned for your workload.
  • Monitoring (e.g., pcs status, drbd-overview, Pacemaker logs) set up to alert on promotion/demotion events.
  • Documentation of manual failback procedures for operators.
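For the monitoring item, a small check that parses the local DRBD role and alerts on anything unexpected is often enough. A sketch that parses a drbd-overview-style line; the sample line and expected role are hard-coded here, while in production the line would come from `drbd-overview` or `drbdadm role r0`:

```bash
# Alert if the local DRBD role does not match what we expect.
# Sample drbd-overview-style line; replace with live command output.
status_line='0:r0   Connected Primary/Secondary UpToDate/UpToDate'
expected_role='Primary'

# Field 3 is "localRole/peerRole"; take the part before the slash.
actual_role=$(printf '%s\n' "$status_line" | awk '{split($3, r, "/"); print r[1]}')
if [ "$actual_role" = "$expected_role" ]; then
    echo "OK: local role is $actual_role"
else
    echo "ALERT: local role is $actual_role, expected $expected_role"
fi
```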

TL;DR (Re‑stated)

Set a high resource-stickiness (or use location constraints) to stop automatic failback.
When you need to move the primary, do it manually (standby, pcs resource move, or explicit location rules).
If you prefer automatic failback, make sure STONITH works, increase cluster-delay, and raise promoted-stop-timeout so the old primary can demote cleanly before the new one promotes.

By following these steps you can avoid "kill" scenarios, keep DRBD resources stable, and maintain true high‑availability for your services.

Preventing "kill" Scenarios When a DRBD Node Recovers

When a node that was previously Primary comes back online while another node is already acting as Primary, Pacemaker may try to promote the recovering node again. If the promotion cannot be performed cleanly, the active node is forced out of its role, resulting in an un‑graceful shutdown of services (the so‑called kill).

Typical Causes

  • Lack of graceful demotion – Pacemaker does not have enough time or a clear mandate to demote the current Primary before the recovering node asserts itself.
  • Insufficient or slow fencing (STONITH) – The cluster cannot reliably isolate the failing node.

Below are three proven ways to avoid this situation while keeping the cluster highly available.

1ļøāƒ£ Keep the Primary ā€œStickyā€ (Negative Resource‑Stickiness)

Idea – Tell Pacemaker never to move the DRBD resource back to a node that has just recovered, unless an administrator does it manually.

How it works

  • A large positive resource-stickiness on the DRBD clone makes the current Primary "sticky", outweighing any preference for the recovered node.
  • Optionally add a location constraint that prefers the node that already holds the Primary.

Example configuration

```bash
# 1. Define the DRBD promotable (master/slave) resource (example)
pcs resource create drbd_r0 ocf:linbit:drbd \
    drbd_resource=r0 \
    op monitor interval="60s" \
    op promote timeout="90s" op demote timeout="90s" \
    promotable promoted-max=1 promoted-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

# 2. Add strong stickiness → disables automatic fail‑back
pcs resource meta drbd_r0-clone resource-stickiness=10000

# 3. Filesystem resource that depends on drbd_r0 being Primary
pcs resource create fs_data ocf:heartbeat:Filesystem \
    device="/dev/drbd/by-res/r0" directory="/mnt/data" fstype="ext4" \
    op monitor interval="30s"

# 4. Ensure fs_data runs only where drbd_r0-clone is promoted
pcs constraint colocation add fs_data with Promoted drbd_r0-clone INFINITY

# 5. Ensure ordering: promote DRBD first, then start the filesystem
pcs constraint order promote drbd_r0-clone then start fs_data
```
| Pros | Cons |
| --- | --- |
| Highly predictable and reliable. | Requires a manual `pcs resource move` to fail back after recovery. |
| Prevents split‑brain caused by aggressive auto‑fail‑back. | Downtime may increase if the admin is slow to intervene. |
| Simplifies troubleshooting – no resource flapping. | Resources can stay on a less‑preferred node for a long time. |
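When you do decide to fail back, the move is a two-step operation: `pcs resource move` creates a temporary location constraint, and `pcs resource clear` removes it so it does not linger and distort future placement. (The `--promoted` flag is the pcs 0.11+ spelling; older pcs releases used `--master`.)

```bash
# Move the promoted role of the DRBD clone to node-a
pcs resource move drbd_r0-clone node-a --promoted

# Once the move has completed, drop the temporary constraint
pcs resource clear drbd_r0-clone
```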

2ļøāƒ£ Put the Recovering Node in Standby (Maintenance Mode)

Idea – Prevent Pacemaker from starting any resources on the node that just came back, giving you time to verify its health before allowing a promotion.

Steps

```bash
# On the admin workstation (or any cluster node)
pcs node standby <node>
```

  • The node stays in standby; Pacemaker will not schedule resources on it.
  • After verification, bring it back:

```bash
pcs node unstandby <node>
```

  • Then manually move the DRBD resource if you want a fail‑back.
| Pros | Cons |
| --- | --- |
| Gives complete administrative control over fail‑back. | Requires continuous monitoring and manual steps after each recovery. |
| Minimises risk of unintentional primary conflicts. | Potentially longer downtime because human interaction is needed. |
| Guarantees node health before any promotion. | Less "automatic" for a typical HA environment. |
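Before running `pcs node unstandby`, it is worth verifying that the recovered node's disk has actually resynchronised. A sketch that checks a drbdadm-status-style line; the sample output is hard-coded, while on a real node it would come from `drbdadm status r0`:

```bash
# Refuse to proceed unless the local disk state is UpToDate.
# Sample output; live value would come from: drbdadm status r0
status='r0 role:Secondary disk:UpToDate'

disk_state=$(printf '%s\n' "$status" | sed -n 's/.*disk:\([A-Za-z]*\).*/\1/p')
if [ "$disk_state" = "UpToDate" ]; then
    echo "disk is UpToDate - safe to unstandby"
else
    echo "disk state is '$disk_state' - do NOT unstandby yet"
fi
```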

3ļøāƒ£ Use a Location Constraint to Block Promotion on Recovery

Idea – Assign a very low (or -INFINITY) score to the recovering node for the Promoted role, so Pacemaker will never promote the DRBD resource there automatically.

Example

```bash
# Assume node-a is the preferred primary.
# When node-a recovers we want to keep it from promoting drbd_r0.

# Prefer the other node (node-b) for the Promoted role
pcs constraint location drbd_r0-clone rule role=Promoted score=100 '#uname' eq node-b

# Explicitly forbid promotion on the recovering node
pcs constraint location drbd_r0-clone rule role=Promoted score=-INFINITY '#uname' eq node-a
```

  • You can combine this with the stickiness setting from Solution 1 for extra safety.
| Pros | Cons |
| --- | --- |
| Provides fine‑grained control without putting the whole node in standby. | Still requires manual intervention to perform a fail‑back. |
| Prevents automatic promotion, thus avoiding split‑brain. | Slightly more complex to maintain the constraints. |
| Works together with stickiness for a "double‑lock". | May need adjustments if the cluster topology changes. |
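The -INFINITY rule has to be lifted manually when you eventually want the node eligible again. `pcs constraint --full` prints every constraint with its id, which you then pass to `pcs constraint remove` (the id below is illustrative; use whatever your cluster actually shows):

```bash
# List constraints with their ids
pcs constraint --full

# Remove the blocking rule by id (the exact id will vary)
pcs constraint remove location-drbd_r0-clone-rule
```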

📋 Summary

| Solution | How it stops the "kill" scenario | When to use it |
| --- | --- | --- |
| High stickiness (Solution 1) | Keeps the current Primary on its node; the recovered node stays Secondary until an admin moves it. | Preferred when you want automatic fail‑over but manual fail‑back. |
| Standby/maintenance mode (Solution 2) | Stops Pacemaker from touching the recovering node at all, giving you time to verify health. | Useful in environments where node health checks are mandatory before any resource runs. |
| Location constraint (Solution 3) | Gives the recovering node a score that forbids promotion, while still allowing it to run as Secondary. | Good when you need per‑resource control without taking the whole node offline. |

All three approaches achieve the same goal: the recovering node never automatically promotes its DRBD resource without explicit administrative approval. Choose the one that best matches your operational workflow and the level of automation you desire.

Solution Overview

The solution leverages several Pacemaker global options and resource meta‑attributes to ensure a sequential and controlled transition of the DRBD primary role. Key elements include:

  • Robust fencing (STONITH)
  • Increased cluster-delay for state propagation
  • Carefully configured timeouts for resource actions

1. Ensure Robust Fencing (STONITH)

Why?
If Pacemaker cannot reliably fence a failed node, no fail‑back strategy is truly safe.

```bash
# Enable STONITH globally
pcs property set stonith-enabled=true

# Define the quorum policy (choose stop or freeze as required)
pcs property set no-quorum-policy=stop   # or 'freeze'

# Create STONITH devices (example using fence_ipmilan)
pcs stonith create fence_ipmi_node1 fence_ipmilan \
    ip=192.168.1.10 lanplus=1 pcmk_host_list=node-a \
    username=admin password=password \
    op monitor interval=60s

pcs stonith create fence_ipmi_node2 fence_ipmilan \
    ip=192.168.1.11 lanplus=1 pcmk_host_list=node-b \
    username=admin password=password \
    op monitor interval=60s
```
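"Robust" here means tested, not merely configured. Fencing can be exercised manually against a node whose loss you can tolerate:

```bash
# Check that the fence devices are configured and running cleanly
pcs stonith status

# Deliberately fence a node to prove the path works (it WILL be rebooted)
pcs stonith fence node-b
```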

2. Increase cluster-delay

Give Pacemaker more time to propagate state changes and avoid premature decisions.

```bash
pcs property set cluster-delay=60s
```

cluster-delay is Pacemaker's estimate of the maximum network round-trip time for actions on remote nodes. Raising it to 60 seconds gives in-flight operations, such as a demote on the old primary, more headroom before the cluster treats them as lost and escalates.

3. Configure DRBD Timeouts

| Parameter | Purpose |
| --- | --- |
| promoted-stop-timeout | Maximum time Pacemaker will wait for the promoted instance of the clone to demote/stop. |
| on-fail=block (on the stop operation) | Prevents a failed demote/stop from immediately escalating to fencing and marking the resource as failed on that node. |

```bash
pcs resource update drbd_r0-clone \
    op demote timeout="120s" \
    op stop timeout="120s" on-fail="block" \
    meta promoted-stop-timeout="180s"
```

4. Optional: Resource Stickiness for Preferred Node

If you want a graceful auto‑failback to a preferred node, you can add a modest positive stickiness or location rule, but ensure the safeguards above are in place.
