Bug ID 866957: Load balancing IPsec tunnels

Last Modified: Jul 12, 2023

Affected Product(s):
BIG-IP LTM (all modules)

Known Affected Versions:
16.0.0, 16.0.0.1, 16.0.1, 16.0.1.1, 16.0.1.2

Fixed In:
16.1.0

Opened: Jan 07, 2020

Severity: 2-Critical

Symptoms

IPsec traffic can experience packet loss on oversubscribed TMM instances (those reaching 100% CPU, transiently or consistently) while other TMM instances do not share the load.

Impact

If random assignment of IPsec tunnels to TMM instances results in one TMM needing more than 100% CPU to handle all the traffic, packets are lost. When packets are lost, they are retransmitted, and BIG-IP network performance drops in proportion to the packet loss.

Conditions

-- A large number of IPsec tunnels.
-- The Security Associations (SAs) associated with IPsec tunnels are not balanced across TMMs.
-- Other TMMs are less busy.

Workaround

None

Fix Information

The following sys db variables offer better TMM load balancing for IPsec tunnels. By default these variables are zero.

tmsh modify sys db ipsec.sp.owner value 1
tmsh modify sys db ipsec.sp.migrate value 1
tmsh modify sys db ipsec.pfkey.load value 2

F5 strongly recommends that these values only be set under the direction of F5 Support, Consultants or Pre-Sales engineers. This issue occurs only in extreme cases. TMM instances can be CPU pinned by other traffic outside of IPsec, so it is critical to first ascertain that IPsec traffic is resulting in poor TMM performance before implementing the variables.

Notation "sp" refers to an IPsec object where tunnel SAs live.

Variable ipsec.sp.owner controls whether these objects have a TMM owner that can be assigned.

Variable ipsec.sp.migrate controls whether tunnels can migrate automatically based on CPU load.

Variable ipsec.pfkey.load controls the frequency of inter-TMM messages about CPU load, which enable load-balancing decisions. The value of ipsec.pfkey.load is the number of seconds between CPU load update messages. The highest possible frequency is once a second, for value "1". Any value larger than 4 or 5 seconds runs a risk of using CPU load information too out-of-date to accurately balance load.
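The sequence described above (first confirm that TMM CPU is the problem, then record the current values, then set the three variables) could be sketched roughly as follows. This is an illustrative sketch only, not an F5-provided procedure. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences introduced here, not part of tmsh; with the default `DRY_RUN=1` the script only prints the commands it would issue. It assumes `tmsh show sys tmm-info` is used to view per-TMM CPU usage and `tmsh list sys db` to view the current value of each variable.

```shell
#!/bin/sh
# Sketch: review TMM load, then apply the IPsec load-balancing db variables.
# DRY_RUN and run() are hypothetical helpers for this example, not tmsh features.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: look at per-TMM CPU usage before changing anything, to check
# whether one TMM is saturated while the others are less busy.
run tmsh show sys tmm-info

# Step 2: record the current values (zero by default, per the fix notes).
for var in ipsec.sp.owner ipsec.sp.migrate ipsec.pfkey.load; do
    run tmsh list sys db "$var"
done

# Step 3: apply the values given in the fix information.
run tmsh modify sys db ipsec.sp.owner value 1
run tmsh modify sys db ipsec.sp.migrate value 1
# Seconds between inter-TMM CPU load updates; 1 is the most frequent setting.
run tmsh modify sys db ipsec.pfkey.load value 2
```

Running the script as-is prints the planned commands for review. Setting `DRY_RUN=0` on a BIG-IP system would execute them, which per the recommendation above should only be done under the direction of F5 Support.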

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips