Bug ID 888341: HA Group failover may fail to complete Active/Standby state transition

Last Modified: Sep 30, 2020

Bug Tracker

Affected Product:  See more info
BIG-IP TMOS(all modules)

Known Affected Versions:
11.6.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4, 11.6.0 HF5, 11.6.0 HF6, 11.6.0 HF7, 11.6.0 HF8, 11.6.1, 11.6.1 HF1, 11.6.1 HF2, 11.6.2, 11.6.2 HF1, 11.6.3, 11.6.3.1, 11.6.3.2, 11.6.3.3, 11.6.3.4, 11.6.4, 11.6.5, 11.6.5.1, 11.6.5.2, 12.0.0, 12.0.0 HF1, 12.0.0 HF2, 12.0.0 HF3, 12.0.0 HF4, 12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1, 12.1.5, 12.1.5.1, 12.1.5.2, 13.0.0, 13.0.0 HF1, 13.0.0 HF2, 13.0.0 HF3, 13.0.1, 13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 13.1.3.3, 13.1.3.4, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4, 14.1.2.5, 14.1.2.6, 14.1.2.7, 14.1.2.8, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.0, 15.1.0.1, 15.1.0.2, 15.1.0.3, 15.1.0.4, 15.1.0.5, 16.0.0, 16.0.0.1

Opened: Mar 09, 2020
Severity: 2-Critical

Symptoms

After a long uptime interval (i.e., the sod process has been running uninterrupted for a long time), HA Group failover may not complete despite an HA Group score change occurring. As a result, a BIG-IP unit with a lower HA Group score may remain as the Active device. Note: Uptime required to encounter this issue is dependent on the number of traffic groups: the more traffic groups, the shorter the uptime, e.g.: -- 1 floating traffic group: 2485~ days. -- 2 floating traffic groups: 1242~ days. -- 4 floating traffic groups: 621~ days. -- 8 floating traffic groups: 310~ days. -- 9 floating traffic groups: 276~ days. Note: You can confirm sod process uptime in tmsh: # tmsh show /sys service sod

Impact

HA Group Active/Standby state transition may not complete despite HA Group score change.

Conditions

HA Group failover configured. Note: No other failover configuration is affected except for HA Group failover, specifically, these are not affected: o VLAN failsafe failover. o Gateway failsafe failover. o Failover triggered by loss of network failover heartbeat packets. o Failover caused by system failsafe (i.e., the tmm process was terminated on the Active unit).

Workaround

There is no workaround. The only option is to reboot all BIG-IP units in the device group on a regular interval. The interval is directly dependent on the number of traffic groups.

Fix Information

None

Behavior Change