Bug ID 422460: Delayed response by very busy MCPD disrupts communication with TMM

Last Modified: Oct 16, 2023

Affected Product(s):
BIG-IP AFM, APM, ASM, LTM (all modules)

Known Affected Versions:
11.0.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4, 11.5.1 HF1, 11.5.1 HF2, 11.5.1 HF3, 11.5.1 HF4, 11.5.1 HF5, 11.5.1 HF6, 11.5.1 HF7, 11.5.1 HF8, 11.5.1 HF9, 11.5.1 HF10, 11.5.1 HF11, 11.5.2 HF1, 11.5.3 HF1, 11.5.3 HF2, 11.5.4 HF1, 11.5.4 HF2, 11.5.4 HF3, 11.5.4 HF4, 11.1.0, 11.2.0, 11.2.1, 11.3.0, 11.4.0, 11.4.1, 11.5.0, 11.5.1, 11.5.2, 11.5.3, 11.5.4, 11.5.5, 11.5.6, 11.5.7, 11.5.8, 11.5.9, 11.5.10

Fixed In:
11.6.0 HF5, 11.4.1 HF9

Opened: Jun 03, 2013

Severity: 2-Critical

Related Article: K14498

Symptoms

TMM restarts without producing a core file on startup, or while mcpd is loading the configuration, when the configuration is large (e.g., more than 1000 passive monitors). TMM also restarts without producing a core file while running "tmsh show sys connection" or "tmsh show sys connection all-properties" against a large connection table (e.g., 500 KB and 600 KM, respectively).

Impact

Traffic processed by the affected TMM instance is interrupted while TMM restarts. TMM might enter a restart loop and restart multiple times, without producing a core file.

You might see errors similar to the following in log/tmm or log/daemon:
-- LTM01 crit tmm11[28599]: 01010020:2: MCP Connection aborted, exiting.
-- LTM01 emerg logger: Re-starting tmm.

This might cause serious traffic disruption.
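To gauge whether a TMM instance is caught in the restart loop described above, the abort message can be counted per instance. The following is a minimal sketch, not an F5 tool: the function name `count_mcp_aborts` and the regular expression are assumptions modeled only on the sample message quoted in this article, and the second abort line in the sample (PID 28731) is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the message quoted in this article, for example:
#   "LTM01 crit tmm11[28599]: 01010020:2: MCP Connection aborted, exiting."
# Real log lines may carry extra fields (timestamps and so on); search()
# is used so that leading text does not prevent a match.
ABORT_RE = re.compile(
    r"\b(tmm\d*)\[(\d+)\]:\s+01010020:\d+:\s+MCP Connection aborted"
)


def count_mcp_aborts(lines):
    """Return a Counter mapping TMM instance name to abort-message count."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "LTM01 crit tmm11[28599]: 01010020:2: MCP Connection aborted, exiting.",
        "LTM01 emerg logger: Re-starting tmm.",
        # Hypothetical second abort, added only to illustrate a repeat restart.
        "LTM01 crit tmm11[28731]: 01010020:2: MCP Connection aborted, exiting.",
    ]
    for name, hits in count_mcp_aborts(sample).items():
        print(f"{name}: {hits} abort(s)")
```

The function accepts any iterable of lines, so an open log file handle can be passed in directly. Repeated aborts for the same instance name with changing process IDs would be consistent with the restart loop, although the count alone does not prove the cause.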

Conditions

This issue occurs when all of the following conditions are met:
-- The mcpd process loads a large configuration with thousands of objects.
-- The platform is running 12 or more TMM instances (BIG-IP 11000/11050 platforms, or VIPRION B4300 blades).

Or:
-- You run "tmsh show sys connection" or "tmsh show sys connection all-properties".
-- The platform is running 12 or more TMM instances (BIG-IP 11000/11050 platforms, or VIPRION B4300/B4450 blades).

Workaround

This workaround is a mitigation and may not work in all cases; the zero-window timeout may need to be adjusted to a higher value for some configurations.

To work around this issue, increase the timeout used for the MCP connection.
1. Open the tmm_base.tcl file for modification.
2. Locate the tcp _mcptcp stanza.
3. Add the following line: zero_window_timeout 300000

This lengthens the timeout, which avoids the restart.

For more information, see K14498: The mcpd connection to TMM may time out on either startup or configuration load and cause TMM to restart, available here: https://support.f5.com/csp/article/K14498.
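The manual edit in the workaround could be sketched as a small text transformation, which may help when the change has to be reviewed or repeated. This is an illustration under stated assumptions, not the documented procedure: it assumes the stanza opens with a line of the form `tcp _mcptcp {` and closes with a line containing only `}`, which this article does not confirm. The function name `add_zero_window_timeout` and the `some_option` line in the sample are hypothetical. The sketch operates on a string and does not touch any file.

```python
import re

# Assumption: the stanza opens with "tcp _mcptcp {" on its own line and
# closes with a line containing only "}". Verify against the real file.
STANZA_OPEN_RE = re.compile(r"^\s*tcp\s+_mcptcp\s*\{\s*$")


def add_zero_window_timeout(text, value=300000):
    """Return text with zero_window_timeout set in the tcp _mcptcp stanza."""
    out = []
    in_stanza = False
    done = False
    for line in text.splitlines():
        if not in_stanza and STANZA_OPEN_RE.match(line):
            in_stanza = True
        elif in_stanza and line.strip().startswith("zero_window_timeout"):
            # A setting already exists: replace it rather than duplicate it.
            line = f"    zero_window_timeout {value}"
            done = True
        elif in_stanza and line.strip() == "}":
            if not done:
                # No existing setting: add it just before the closing brace.
                out.append(f"    zero_window_timeout {value}")
                done = True
            in_stanza = False
        out.append(line)
    if not done:
        raise ValueError("tcp _mcptcp stanza not found or not closed")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical stanza content, for illustration only.
    sample = "tcp _mcptcp {\n    some_option 1\n}\n"
    print(add_zero_window_timeout(sample), end="")
```

Calling the function again with a larger `value` replaces the existing line, which matches the advice above to raise the timeout further if the restart still occurs. Keeping a backup of the original file before applying any change of this kind is prudent.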

Fix Information

For most configurations, TMM no longer restarts on startup/config-load if it has too many objects to publish back during config load. The mitigation fix increases an internal buffer. This provides sufficient time for most configurations. Some configurations might require still more time. If the issue still occurs, you can increase the zero-window timeout until the configuration loads without problems. To completely address the issue,

Behavior Change
