Bug ID 1128369: GTM (DNS) /Common/bigip monitor instances may show 'big3d: timed out' state

Last Modified: Jul 24, 2024

Affected Product(s):
BIG-IP TMOS (all modules)

Known Affected Versions:
13.1.4.1, 13.1.5, 13.1.5.1, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4, 14.1.2.5, 14.1.2.6, 14.1.2.7, 14.1.2.8, 14.1.3, 14.1.3.1, 14.1.4, 14.1.4.1, 14.1.4.2, 14.1.4.3, 14.1.4.4, 14.1.4.5, 14.1.4.6, 14.1.5, 14.1.5.1, 14.1.5.2, 14.1.5.3, 14.1.5.4, 14.1.5.6, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.0, 15.1.0.1, 15.1.0.2, 15.1.0.3, 15.1.0.4, 15.1.0.5, 15.1.1, 15.1.2, 15.1.2.1, 15.1.3, 15.1.3.1, 15.1.4, 15.1.4.1, 15.1.5, 15.1.5.1, 15.1.6, 15.1.6.1, 15.1.7, 15.1.8, 15.1.8.1, 15.1.8.2, 15.1.9, 15.1.9.1, 15.1.10, 15.1.10.2, 15.1.10.3, 15.1.10.4, 16.0.0, 16.0.0.1, 16.0.1, 16.0.1.1, 16.0.1.2, 16.1.0, 16.1.1, 16.1.2, 16.1.2.1, 16.1.2.2, 16.1.3, 16.1.3.1, 16.1.3.2, 16.1.3.3, 16.1.3.4, 16.1.3.5, 16.1.4, 16.1.4.1, 16.1.4.2, 16.1.4.3, 17.0.0, 17.0.0.1, 17.0.0.2, 17.1.0, 17.1.0.1, 17.1.0.2, 17.1.0.3, 17.1.1, 17.1.1.1, 17.1.1.2, 17.1.1.3

Fixed In:
16.1.5

Opened: Jul 22, 2022

Severity: 3-Major

Symptoms

On affected versions of BIG-IP DNS, targets monitored with a "bigip" type monitor may show as 'big3d: timed out', or flap between that state and green.

The 'big3d: timed out' state indicates that a GTM monitor probe reply was expected but not received within the timeout period, and it can have many causes. This particular cause is the order in which the probes are sent: all the probes for the same big3d (LTM) device are sent in rapid succession, and this bunching congests the message buffer between big3d and mcpd on the LTM.

When gtmd schedules monitor probes, all the probes with the same interval are grouped together and spread out across the interval period. Within that list, however, monitors for the same gtm server can sit next to each other, so they are sent to big3d in rapid succession. When this happens, some of the messages relating to BIG-IP monitor probes may be dropped, and no response is sent back to the members of the GTM sync group.
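The bunching effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, not gtmd's actual scheduler: the server names, the target counts, and the `schedule` and `send_window_per_server` helpers are all hypothetical. It spreads a list of probes evenly across one interval in list order, then measures how much of the interval each server's probes actually occupy.

```python
def schedule(probes, interval):
    """Spread probes evenly across the interval, in list order.

    Returns a list of (send_time_seconds, server) pairs.
    """
    step = interval / len(probes)
    return [(i * step, server) for i, server in enumerate(probes)]


def send_window_per_server(timeline):
    """Seconds between the first and last probe sent to each server."""
    first, last = {}, {}
    for t, server in timeline:
        first.setdefault(server, t)
        last[server] = t
    return {server: last[server] - first[server] for server in first}


# Hypothetical setup: 3 gtm servers with 100 monitored targets each,
# listed grouped by server (the ordering that triggers the issue).
probes = [server for server in ("ltm1", "ltm2", "ltm3") for _ in range(100)]
windows = send_window_per_server(schedule(probes, interval=30.0))

for server, window in windows.items():
    print(f"{server}: 100 probes sent within {window:.1f}s of a 30s interval")
```

In this model each LTM receives its entire probe load in about 10 seconds instead of across the full 30-second interval, which is the kind of burst that can fill the buffer between big3d and mcpd.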

Impact

DNS (GTM) monitored targets that use a /Common/bigip probe type may be incorrectly marked down with a state of 'big3d: timed out'. Note that this is not the only cause of this down state.

Conditions

- Running an affected version of BIG-IP DNS (versions that include the changes from ID863917)
- Use of a /Common/bigip monitor probe type
- Monitoring of sufficient targets per LTM to cause the message buffer between big3d and mcpd to fill (there is no indication or log message when this has happened)

Workaround

It is possible to work around this issue by creating separate monitor lists for each gtm server, so that all the probes related to the same big3d are spread out in time across the monitoring interval. To do this:

- Create a separate BIG-IP monitor for each gtm server object with monitored virtual servers.
- Set the interval value for each of those BIG-IP monitors to a different value. For example, instead of the default 30-second BIG-IP probe interval, create monitors of 30, 31, 32, 33, 34, 35, ... seconds. Values of less than 30 seconds are not recommended, as these will increase the monitoring load further.
- Apply the new monitors to each gtm server so that each one has a different monitoring interval.
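The workaround steps can be sketched as a small planning script. This is a hedged illustration, not an F5-supplied tool: the server names, the `bigip_<N>s` monitor naming scheme, and the helper functions are hypothetical, and the generated tmsh command strings are an assumption about the syntax that should be checked against the tmsh reference for the running version before use.

```python
def plan_intervals(servers, base_interval=30):
    """Assign each gtm server its own interval: 30, 31, 32, ... seconds.

    Intervals never go below base_interval, since shorter intervals
    would increase the monitoring load.
    """
    return {server: base_interval + offset for offset, server in enumerate(servers)}


def tmsh_commands(plan):
    """Build candidate tmsh commands for the plan.

    Assumption: the command syntax below is illustrative and unverified;
    the monitor names are made up for this example.
    """
    commands = []
    for server, interval in plan.items():
        monitor = f"bigip_{interval}s"
        commands.append(
            f"tmsh create gtm monitor bigip {monitor} "
            f"defaults-from bigip interval {interval}"
        )
        commands.append(f"tmsh modify gtm server {server} monitor {monitor}")
    return commands


# Hypothetical gtm server object names.
plan = plan_intervals(["ltm1", "ltm2", "ltm3"])
for line in tmsh_commands(plan):
    print(line)
```

The script only prints the commands so that they can be reviewed first; it does not run anything against a BIG-IP system.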

Fix Information

The order of gtmd monitor probes with the same interval is scrambled so that the probes related to a given target big3d (LTM) are spread evenly across the entire interval. This avoids the bunching of probes to a given target LTM, thereby preventing congestion at that LTM.
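One way to picture this reordering is a round-robin interleave across servers. The source does not describe the algorithm gtmd actually uses, so the sketch below is an assumed stand-in; the `interleave_by_server` helper and the server and target names are hypothetical.

```python
from itertools import zip_longest


def interleave_by_server(probes):
    """Reorder (server, target) probes so consecutive probes hit different servers.

    Takes one probe from each server in turn (round-robin), so a server's
    probes are separated by probes for the other servers.
    """
    groups = {}
    for server, target in probes:
        groups.setdefault(server, []).append((server, target))

    ordered = []
    for batch in zip_longest(*groups.values()):
        # zip_longest pads shorter groups with None; skip the padding.
        ordered.extend(probe for probe in batch if probe is not None)
    return ordered


# Hypothetical probe list, grouped by server as in the affected versions.
grouped = [("ltm1", "vs1"), ("ltm1", "vs2"), ("ltm2", "vs3"), ("ltm2", "vs4")]
print(interleave_by_server(grouped))
```

When the reordered list is spread across the interval, each server's probes cover the whole interval rather than one contiguous slice of it. With very uneven target counts per server, a plain round-robin leaves the largest server's remaining probes adjacent at the end, so a real scheduler would need something more careful than this sketch.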

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips