Bug ID 463903: Behavior Change: HA Score calculation when minimum-threshold attribute is in use

Last Modified: Apr 10, 2019

Affected Product:
BIG-IP LTM (all modules)

Known Affected Versions:
11.2.0, 11.2.1, 11.3.0, 11.4.0, 11.4.1, 11.5.0, 11.5.1, 11.5.1 HF1, 11.5.1 HF10, 11.5.1 HF11, 11.5.1 HF2, 11.5.1 HF3, 11.5.1 HF4, 11.5.1 HF5, 11.5.1 HF6, 11.5.1 HF7, 11.5.1 HF8, 11.5.1 HF9, 11.5.2, 11.5.2 HF1, 11.5.3, 11.5.3 HF1, 11.5.3 HF2, 11.5.4, 11.5.4 HF1, 11.5.4 HF2, 11.5.4 HF3, 11.5.4 HF4, 11.5.5, 11.5.6, 11.5.7, 11.5.8, 11.5.9, 11.6.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4, 11.6.0 HF5, 11.6.0 HF6, 11.6.0 HF7, 11.6.0 HF8, 11.6.1, 11.6.1 HF1, 11.6.1 HF2, 11.6.2, 11.6.2 HF1, 11.6.3, 11.6.3.1, 11.6.3.2, 11.6.3.3, 11.6.3.4, 11.6.4, 12.0.0, 12.0.0 HF1, 12.0.0 HF2, 12.0.0 HF3, 12.0.0 HF4, 12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1

Fixed In:
13.0.0

Opened: May 22, 2014
Severity: 4-Minor
Related AskF5 Article:
K68062382

Symptoms

HA groups periodically compute an HA health score to determine which BIG-IP device is the 'best' device to host a traffic group. If another device has a better score than the current device, the traffic group fails over to that device. The HA group also provides a set of thresholds that, if not met, trigger failover regardless of HA scores.

This approach has the following problems:

-- The HA scores measured by two devices may vary slightly over time, so the device with the 'best' score may change, triggering unnecessary failovers even though the difference in the scores is negligible.
-- The method is not flexible. HA groups cannot easily incorporate other Boolean values (e.g., VLAN failsafe) or other methods of picking the next active device (e.g., a load-aware algorithm).

For these reasons, the HA group has been refactored into two separate objects, HA Monitor and HA Score, to decouple failure detection from failure remediation. An HA Monitor determines when a traffic group can no longer run on the current device. The HA Score failover method picks the highest-scoring device as the next device to host the traffic group. An HA Monitor may be combined with other failover methods, for example, to fail over to the next device in round-robin order.

If any component (trunk, pool, or cluster member) of a high availability (HA) group violates its minimum requirement (defined by the 'minimum-threshold' setting), the total HA group score is not forced to 0 (zero) when an active bonus is set.
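The split between failure detection (HA Monitor) and failure remediation (failover method) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BIG-IP code: the `Device` type, its fields, and the `next_host`, `highest_score`, and `round_robin` functions are hypothetical names, and round-robin order is assumed to be the order of the device list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)  # compare devices by identity, not by field values
class Device:
    name: str
    ha_score: float       # hypothetical per-device HA score
    ha_monitor_ok: bool   # hypothetical: True while every HA Monitor check passes


# A failover method chooses the next host from the eligible candidates.
FailoverMethod = Callable[[List[Device], Device, List[Device]], Optional[Device]]


def highest_score(devices: List[Device], current: Device,
                  candidates: List[Device]) -> Optional[Device]:
    """HA Score method: pick the highest-scoring eligible device."""
    return max(candidates, key=lambda d: d.ha_score, default=None)


def round_robin(devices: List[Device], current: Device,
                candidates: List[Device]) -> Optional[Device]:
    """Alternative method: the next eligible device after `current` in list order."""
    start = devices.index(current)
    for step in range(1, len(devices)):
        d = devices[(start + step) % len(devices)]
        if d in candidates:
            return d
    return None


def next_host(devices: List[Device], current: Device,
              method: FailoverMethod) -> Optional[Device]:
    """Detection first: stay while the HA Monitor passes, otherwise remediate."""
    if current.ha_monitor_ok:
        return current  # score differences alone never trigger failover here
    candidates = [d for d in devices if d is not current and d.ha_monitor_ok]
    return method(devices, current, candidates)  # None means no eligible device
```

With this structure, a small score difference between healthy devices never moves the traffic group; only a failed HA Monitor check does, and the chosen failover method then decides where the traffic group goes.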

Impact

Because of how the HA group score is calculated, the system might incorrectly report a traffic group as viable, or might fail over unnecessarily.
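The symptom can be illustrated with a hypothetical model of the pre-fix calculation. The sketch assumes, for illustration only, that each contributor's value is its weight scaled by the fraction of available members and that the active bonus is added to the sum; `Contributor` and `legacy_score` are illustrative names, not BIG-IP internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contributor:
    weight: float
    members: int
    available: int
    minimum_threshold: int = 0

    def violates_threshold(self) -> bool:
        # A non-zero threshold is enforced only when the component has members.
        return (self.minimum_threshold > 0 and self.members > 0
                and self.available < self.minimum_threshold)

    def value(self) -> float:
        # Assumption: weight scaled by the fraction of available members.
        return self.weight * self.available / self.members if self.members else 0.0


def legacy_score(contributors: List[Contributor], active_bonus: float) -> float:
    """Models the reported symptom: a violation zeroes the score only if no active bonus is set."""
    if active_bonus == 0 and any(c.violates_threshold() for c in contributors):
        return 0.0
    return sum(c.value() for c in contributors) + active_bonus
```

In this model, a pool of weight 10 with 1 of 4 members available and a minimum-threshold of 2, plus a fully available trunk of weight 20, still yields a non-zero score of 32.5 when a 10-point active bonus is set, so the traffic group appears viable.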

Conditions

-- The configuration contains an HA group with associated components (trunks, pools, cluster members).
-- The 'minimum-threshold' setting of any of those components is non-zero.
-- The component has a non-zero number of members.

Workaround

Use tmsh to set the 'minimum-threshold' parameter for each HA group component (trunks, pools, and cluster members) to the minimum number of monitored objects required for that contributor to be considered valid.

Fix Information

The HA group configuration did not provide enough flexibility and could cause unnecessary failovers, and the calculation of a device's HA group score was limited. In this release, you can configure, for each HA group contributor, the minimum requirement that contributor must meet to be considered valid. The HA group score is now calculated by summing the weights of the group contributors (cluster members, pool members, and trunk members). If any contributor has a value of 0 (zero), the entire score is considered 0. An HA group score of 0 is considered 'offline' or 'ineligible'.
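The revised rule can be sketched as follows, reading "a contributor with a 0 value" as one that fails a non-zero minimum-threshold while having a non-zero number of members. The weight-times-fraction-available contributor value, the handling of the active bonus, and all names (`Contributor`, `fixed_score`, `is_eligible`) are assumptions for illustration, not the BIG-IP implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contributor:
    weight: float
    members: int
    available: int
    minimum_threshold: int = 0

    def violates_threshold(self) -> bool:
        # Enforced only for a non-zero threshold on a component with members.
        return (self.minimum_threshold > 0 and self.members > 0
                and self.available < self.minimum_threshold)

    def value(self) -> float:
        # Assumption: weight scaled by the fraction of available members;
        # a component with 0 members contributes 0 and is summed normally.
        return self.weight * self.available / self.members if self.members else 0.0


def fixed_score(contributors: List[Contributor], active_bonus: float = 0.0) -> float:
    """Any threshold violation forces the whole score to 0, active bonus or not."""
    if any(c.violates_threshold() for c in contributors):
        return 0.0
    return sum(c.value() for c in contributors) + active_bonus


def is_eligible(score: float) -> bool:
    """A score of 0 means the device is 'offline' or 'ineligible'."""
    return score > 0
```

Unlike the pre-fix behavior, the active bonus can no longer mask a threshold violation, and a component with 0 members and a minimum-threshold of 0 simply contributes 0 to the sum.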

Behavior Change

The high availability (HA) group score calculation has been modified as follows.

An HA Monitor is simply a list of health checks, all of which must be true for a device to host a traffic group. For an HA group with associated components (trunks, pools, cluster members), the 'minimum-threshold' setting of each component defines checks such as:

-- The number of available members in a pool is at least N.
-- The number of connected links in a trunk is at least N.
-- The number of available blades in a chassis is at least N.

If all of the health checks pass, the traffic group continues to run on the current device; otherwise, it fails over. A threshold is enforced only when the component's 'minimum-threshold' setting is non-zero and the component has a non-zero number of members. If a component's value is 0 because it has 0 members (and its minimum-threshold is also 0), the group score is summed normally.
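The HA Monitor rule (every check must pass, otherwise fail over) reduces to a conjunction, sketched below. `HealthCheck` and `stays_on_current_device` are hypothetical names, and each check's `minimum` field stands in for the N taken from the component's minimum-threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthCheck:
    kind: str       # e.g. "pool members", "trunk links", "chassis blades"
    available: int  # members available, links connected, or blades available
    minimum: int    # N, taken from the component's minimum-threshold

    def passes(self) -> bool:
        return self.available >= self.minimum


def stays_on_current_device(checks: List[HealthCheck]) -> bool:
    """The traffic group stays only if every health check passes; otherwise it fails over."""
    return all(check.passes() for check in checks)
```

A single failing check, such as a chassis with 1 available blade against a minimum of 2, is enough to move the traffic group, regardless of how well the other checks are doing.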