Bug ID 748253: Race condition between clustered DIAMETER devices can cause the standby to disconnect its mirror connection

Last Modified: Sep 13, 2023

Affected Product(s):
BIG-IP LTM (all modules)

Known Affected Versions:
11.2.1, 11.3.0, 11.4.0, 11.4.1, 11.5.0, 11.5.1, 11.5.2, 11.5.3, 11.5.4, 11.5.5, 11.5.6, 11.5.7, 11.5.8, 11.5.9, 11.5.10, 11.6.0, 11.6.1, 11.6.2, 11.6.3, 11.6.3.1, 11.6.3.2, 11.6.3.3, 11.6.3.4, 11.6.4, 11.6.5, 11.6.5.1, 11.6.5.2, 11.6.5.3, 12.0.0, 12.0.0 HF1, 12.1.0 HF1, 12.0.0 HF2, 12.1.0 HF2, 12.0.0 HF3, 12.0.0 HF4, 12.1.1 HF1, 12.1.1 HF2, 12.1.2 HF1, 12.1.2 HF2, 12.1.0, 12.1.1, 12.1.2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1, 12.1.5, 12.1.5.1, 12.1.5.2, 12.1.5.3, 12.1.6, 13.0.0, 13.0.0 HF1, 13.0.0 HF2, 13.0.0 HF3, 13.0.1, 13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2

Fixed In:
15.0.0, 14.1.2.1, 13.1.3

Opened: Oct 30, 2018

Severity: 3-Major

Symptoms

Depending on the DIAMETER settings of the BIG-IP system, there can be a race condition in a mirrored device cluster where the standby BIG-IP system resets its mirror connection to the active device.

Impact

The standby no longer mirrors the active system and falls out of sync with it. Connections may be lost if a failover occurs.

Conditions

-- MRF DIAMETER is in use.
-- The DIAMETER session profile on the BIG-IP system is configured to use a non-zero Watchdog Timeout.
-- The DIAMETER session profile on the BIG-IP system is configured to use Reset on Timeout.
-- This is more likely to happen if, in the DIAMETER session profile, Maximum Watchdog Failures is set to 1 and the Watchdog Timeout is set to the same value as on the remote DIAMETER system.
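
The conditions above can be expressed as a simple check. The following is a minimal Python sketch, not an F5 tool: the `SessionProfile` class, its field names, and the `exposure` function are hypothetical names introduced for illustration. It assumes MRF DIAMETER is in use and that the profile values have already been read from the BIG-IP configuration by some other means.

```python
from dataclasses import dataclass


@dataclass
class SessionProfile:
    """Hypothetical view of the relevant DIAMETER session profile settings."""
    watchdog_timeout: int        # 0 means no watchdog timeout is configured
    reset_on_timeout: bool       # the Reset on Timeout setting
    max_watchdog_failures: int   # the Maximum Watchdog Failures setting


def exposure(profile: SessionProfile, peer_watchdog_timeout: int) -> str:
    """Classify a profile against the conditions listed for this bug."""
    # Both a non-zero watchdog timeout and Reset on Timeout are required.
    if profile.watchdog_timeout == 0 or not profile.reset_on_timeout:
        return "not affected"
    # A single allowed failure plus matching timers makes the race more likely.
    if (profile.max_watchdog_failures == 1
            and profile.watchdog_timeout == peer_watchdog_timeout):
        return "affected, more likely"
    return "affected"
```

For example, `exposure(SessionProfile(30, True, 1), 30)` falls into the "more likely" case, because both sides use the same timeout and a single failure is enough to trigger a reset.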

Workaround

To mitigate this issue:
1. Configure Maximum Watchdog Failures to a value greater than 1.
2. Configure the Watchdog Timeout to a value different from the corresponding timeout on the remote peer, preferably one with little overlap (i.e., the two timers should fire at the exact same time very infrequently).
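
One rough way to reason about "little overlap" is to look at how often two timers with different periods would fire together. The sketch below is an idealized model only: it assumes both timers are strictly periodic and start at the same instant, which real watchdog timers generally are not, and the function names `coincidence_interval` and `pick_timeout` are hypothetical. Both timeouts must be given in the same unit.

```python
from math import gcd


def coincidence_interval(local_timeout: int, peer_timeout: int) -> int:
    """Interval at which two strictly periodic timers fire together (their LCM)."""
    return local_timeout * peer_timeout // gcd(local_timeout, peer_timeout)


def pick_timeout(peer_timeout: int, candidates: list[int]) -> int:
    """Pick the candidate whose timer coincides with the peer timer least often."""
    return max(candidates, key=lambda t: coincidence_interval(t, peer_timeout))
```

Under this model, a local timeout of 30 against a peer timeout of 30 coincides on every firing, while 31 against 30 coincides only once every 930 time units, so `pick_timeout(30, [30, 45, 31])` selects 31.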

Fix Information

The standby no longer sends DWR (Device-Watchdog-Request) packets to the active device, so it no longer waits for DWA (Device-Watchdog-Answer) responses that never arrive.

Behavior Change
