Bug ID 888341: HA Group failover may fail to complete Active/Standby state transition

Last Modified: Mar 22, 2020

Affected Product:
BIG-IP TMOS (all modules)

Known Affected Versions:
11.6.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4, 11.6.0 HF5, 11.6.0 HF6, 11.6.0 HF7, 11.6.0 HF8, 11.6.1, 11.6.1 HF1, 11.6.1 HF2, 11.6.2, 11.6.2 HF1, 11.6.3, 11.6.3.1, 11.6.3.2, 11.6.3.3, 11.6.3.4, 11.6.4, 11.6.5, 12.0.0, 12.0.0 HF1, 12.0.0 HF2, 12.0.0 HF3, 12.0.0 HF4, 12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1, 12.1.5, 13.0.0, 13.0.0 HF1, 13.0.0 HF2, 13.0.0 HF3, 13.0.1, 13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.4, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.1.0

Opened: Mar 09, 2020
Severity: 2-Critical

Symptoms

After a long uptime interval (i.e., the sod process has been running uninterrupted for a long time), high availability (HA) Group failover may not complete even though an HA Group score change has occurred. As a result, a BIG-IP unit with a lower HA Group score may remain the Active device.

Note: The uptime required to encounter this issue depends on the number of traffic groups: the more traffic groups, the shorter the uptime. For example:
-- For 1 floating traffic group, after approximately 2485 days.
-- For 2 floating traffic groups, after approximately 1242 days.
-- For 4 floating traffic groups, after approximately 621 days.
-- For 8 floating traffic groups, after approximately 310 days.
-- For 9 floating traffic groups, after approximately 276 days.

Note: You can confirm sod process uptime in tmsh:
# tmsh show /sys service sod
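
The example figures above are consistent with roughly 2485 days divided by the number of floating traffic groups, rounded down. This relationship is inferred from the listed values and is not stated explicitly in this report. A minimal Python sketch under that assumption (the function name and constant are illustrative, not part of any F5 tooling):

```python
# Baseline taken from the figures above: about 2485 days of sod uptime
# for a single floating traffic group.
BASELINE_DAYS_ONE_GROUP = 2485


def approx_days_until_affected(traffic_groups: int) -> int:
    """Estimate the sod uptime (in days) after which the issue may appear.

    Assumes the threshold scales inversely with the number of floating
    traffic groups, which matches the listed examples but is an inference.
    """
    if traffic_groups < 1:
        raise ValueError("traffic_groups must be at least 1")
    return BASELINE_DAYS_ONE_GROUP // traffic_groups


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 9):
        print(f"{n} floating traffic group(s): ~{approx_days_until_affected(n)} days")
```

For traffic-group counts not listed above, treat the result as a rough estimate only.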

Impact

The HA Group Active/Standby state transition may not complete despite a high availability (HA) Group score change.

Conditions

-- high availability (HA) Group failover mode is configured.

Note: Only HA Group failover is affected. The following failover mechanisms are not affected:
o VLAN failsafe failover.
o Gateway failsafe failover.
o Failover triggered by loss of network failover heartbeat packets.
o Failover caused by system failsafe (i.e., the TMM process was terminated on the Active unit).

Workaround

There is no workaround. The only option is to reboot all BIG-IP units in the device group at regular intervals. The required interval depends on the number of traffic groups: the more traffic groups, the shorter the interval.
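
One way to plan such reboots is to compute a latest reboot date from the date the sod process started. The sketch below assumes the threshold is about 2485 days (the single-traffic-group figure from the Symptoms section) divided by the number of floating traffic groups, which is an inference from the listed examples. The function name and the `margin_days` safety margin are hypothetical and not part of any F5 tooling.

```python
from datetime import date, timedelta

# About 2485 days for one floating traffic group, per the Symptoms section.
BASELINE_DAYS_ONE_GROUP = 2485


def latest_reboot_date(sod_start: date, traffic_groups: int,
                       margin_days: int = 30) -> date:
    """Return a date by which the unit should be rebooted.

    sod_start:      date the sod process last started (derive it from the
                    uptime reported by `tmsh show /sys service sod`).
    traffic_groups: number of floating traffic groups.
    margin_days:    hypothetical safety margin subtracted from the estimate.
    """
    if traffic_groups < 1:
        raise ValueError("traffic_groups must be at least 1")
    threshold_days = BASELINE_DAYS_ONE_GROUP // traffic_groups
    # Never schedule the reboot before the process start date.
    return sod_start + timedelta(days=max(threshold_days - margin_days, 0))


if __name__ == "__main__":
    print(latest_reboot_date(date(2020, 1, 1), traffic_groups=8))
```

The margin is a judgment call; choose a value that fits your maintenance windows.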

Fix Information

None

Behavior Change