Bug ID 480699: HA mirroring can overflow buffer limits on larger platforms

Last Modified: Jul 13, 2024

Affected Product(s):
BIG-IP LTM (all modules)

Known Affected Versions:
10.2.4, 11.4.0, 11.4.1, 11.5.0, 11.5.1, 11.5.1 HF1, 11.5.1 HF2, 11.5.1 HF3, 11.5.1 HF4, 11.5.1 HF5, 11.6.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4

Fixed In:
12.0.0, 11.6.0 HF5, 11.5.2, 11.5.1 HF6

Opened: Sep 23, 2014

Severity: 2-Critical

Related Article: K15728

Symptoms

When using mirroring, some connections between HA peers may overflow buffers and enter a state in which the buffer is repeatedly reset due to overflow.

Impact

In this state, failover can lose more than the expected number of L4 connections, and no L7 connections are mirrored. Note that any failure invalidates L7 mirroring. L4 mirroring recovers from occasional HA connection failures, including those related to overflow, provided the HA connection remains up for at least one minute after reconnecting.

Conditions

LTM logs show resets, usually within one minute of each other. The output of tmctl ha_stat shows the 'overflows' count incrementing by one at intervals of approximately one minute or less. After each overflow, the 'buffered' count increases until it reaches the maximum, at which point the 'overflows' count increments again. This does not apply to cases in which client and server bandwidth are far in excess of mirroring bandwidth, nor to cases in which overflows are occasional rather than frequent.
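The pattern described above can be checked against counter samples with a short script. The following is a minimal sketch only, assuming the operator has already collected (timestamp, 'overflows') pairs from tmctl ha_stat by some means; parsing of the tmctl output is not shown because its exact format is not given here. The function names, the 60-second gap, and the threshold of three consecutive increments are illustrative assumptions, not F5-defined values.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, 'overflows' counter value)


def overflow_increment_times(samples: Sequence[Sample]) -> List[float]:
    """Return the timestamps at which the counter was first seen to have increased."""
    times = []
    for (_, prev_count), (ts, count) in zip(samples, samples[1:]):
        if count > prev_count:
            times.append(ts)
    return times


def looks_like_repeated_overflow(
    samples: Sequence[Sample],
    max_gap_s: float = 60.0,   # assumption: "approximately one minute or less"
    min_increments: int = 3,   # assumption: how many in a row count as "repeated"
) -> bool:
    """True if at least min_increments increases occur, each within max_gap_s of the previous one."""
    times = overflow_increment_times(samples)
    run = 1 if times else 0
    for earlier, later in zip(times, times[1:]):
        # Extend the run if the increments are close together, otherwise restart it.
        run = run + 1 if later - earlier <= max_gap_s else 1
        if run >= min_increments:
            return True
    return run >= min_increments


# Hypothetical samples: frequent overflows versus occasional ones.
frequent = [(0, 5), (30, 6), (60, 6), (85, 7), (120, 8), (150, 8)]
occasional = [(0, 5), (30, 6), (600, 6), (900, 7)]
print(looks_like_repeated_overflow(frequent))
print(looks_like_repeated_overflow(occasional))
```

The sampling interval needs to be well under the gap threshold for this check to be meaningful, since an increase is only attributed to the sample at which it is first observed.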

Workaround

Increase the statemirror.queuelen db variable, up to a maximum of 256 MB (the current maximum), until repeated buffer overflows stop. If overflows continue with the maximum set, there is no further workaround.

Fix Information

The maximum value of the statemirror.queuelen db variable has been increased. If necessary, statemirror.queuelen can now be raised beyond 256 MB, up to 1 GB. Note that increasing statemirror.queuelen raises memory requirements to approximately twice the queuelen multiplied by the number of TMMs, and also increases the time required to detect an error in the mirroring connection. Keep statemirror.queuelen at the lowest value that prevents repeated failure.
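The memory estimate above (approximately twice the queuelen multiplied by the number of TMMs) can be worked through numerically. This is a rough sketch of that arithmetic only; the function name is hypothetical, the 8-TMM count is an example value, and treating "MB" as 2**20 bytes is an assumption.

```python
MIB = 1024 * 1024  # assumption: "MB" is treated as 2**20 bytes


def approx_mirroring_memory_bytes(queuelen_bytes: int, tmm_count: int) -> int:
    """Approximate memory requirement: 2 x queuelen x number of TMMs."""
    if queuelen_bytes < 0 or tmm_count < 0:
        raise ValueError("queuelen_bytes and tmm_count must be non-negative")
    return 2 * queuelen_bytes * tmm_count


# Compare the old maximum (256 MB) with the new maximum (1 GB) on an
# example platform with 8 TMMs.
for queuelen_mib in (256, 1024):
    total = approx_mirroring_memory_bytes(queuelen_mib * MIB, tmm_count=8)
    print(f"queuelen={queuelen_mib} MiB, 8 TMMs -> ~{total // MIB} MiB")
```

Under these assumptions, 8 TMMs at 256 MB come to roughly 4 GB, and at 1 GB to roughly 16 GB, which illustrates why the value should not be set higher than needed.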

Behavior Change
