Bug ID 542564: bigd detection and logging of load and overload

Last Modified: Jul 13, 2024

Affected Product(s):
BIG-IP LTM (all modules)

Known Affected Versions:
11.6.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4, 11.6.0 HF5, 11.6.0 HF6, 11.6.0 HF7, 11.6.0 HF8, 12.0.0, 12.0.0 HF1, 12.0.0 HF2

Fixed In:
12.1.0, 12.0.0 HF3, 11.6.1

Opened: Aug 31, 2015

Severity: 2-Critical

Symptoms

The bigd process cannot detect overload, and does not log its load status. This makes it difficult to determine whether bigd is close to its limits.

Impact

Under extreme load, bigd might fail to service monitors in a timely fashion. This can result in 'flapping' nodes/pool members, where the node/pool member is marked down and then back up even though the server itself has not gone down.

Conditions

The bigd process might reach its limits under very high load combined with a high probe rate (monitor instances per second).

Workaround

-- Increase the probe interval for monitors so they probe less often.
-- Switch from more 'expensive' monitors (e.g., https) to simpler monitors (e.g., http, tcp, tcp half-open, icmp).
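The effect of the first workaround can be estimated with simple arithmetic. The sketch below assumes each monitor instance is probed once per interval, so the probe rate is roughly the number of instances divided by the interval. The function name and the figures (3000 instances, 5-second and 10-second intervals) are hypothetical and are not taken from this bug report.

```python
def probe_rate(monitor_instances: int, interval_seconds: float) -> float:
    """Approximate probes per second, assuming one probe per instance per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return monitor_instances / interval_seconds


# Hypothetical figures: 3000 monitor instances handled by bigd.
rate_before = probe_rate(3000, 5)    # 5-second probe interval
rate_after = probe_rate(3000, 10)    # interval doubled to 10 seconds

print(f"before: {rate_before:.0f} probes/s, after: {rate_after:.0f} probes/s")
```

Under this assumption, doubling the interval halves the probe rate. It does not account for the per-probe cost difference between monitor types, which is what the second workaround addresses.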

Fix Information

This release includes changes to bigd peak performance that significantly reduce the chance of node flapping. In addition, the ability to monitor bigd load has been added.

Because bigd is not integrated with tmstats, the system logs load stats to the debug log file, /var/log/bigdlog. When debug logging is turned on, stats are mixed with the debug output. Load stats can be emitted on their own, without the rest of the debug output, by setting the following sys db variable:

modify sys db bigd.debug.timingstats value enable

With this db variable enabled, the system emits bigd load data to the debug log periodically (every 15 seconds per bigd process). The columns correspond to these stats:

- load (0-100%), 1-minute mean.
- load (0-100%), 5-minute mean.
- number of monitor instances active for this bigd process.
- number of active file descriptors, 30-second average, for this process.
- peak number of active file descriptors over the past 30 seconds, for this process.

In addition, the system logs warning messages to /var/log/ltm when bigd reaches the 80%, 90%, and 95% load levels. The system logs an overload error to /var/log/ltm when bigd detects that it is overloaded. The load level that indicates overload is stored in the bigd.overload.latency sys db variable, which is set to 98% load by default.
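The warning and overload levels described above can be illustrated with a small classifier. This is a sketch for reasoning about the thresholds, not bigd's actual implementation: the function name, the returned labels, and the use of "greater than or equal" comparisons are assumptions. Only the 80/90/95 warning levels and the 98% default overload level come from the text.

```python
WARNING_LEVELS = (80, 90, 95)   # warning levels named in the fix description
DEFAULT_OVERLOAD_PCT = 98       # default overload level per the fix description


def classify_load(load_pct: float, overload_pct: float = DEFAULT_OVERLOAD_PCT) -> str:
    """Map a load percentage (0-100) to a hypothetical status label.

    Assumption: a level counts as reached when load is >= that level.
    """
    if not 0 <= load_pct <= 100:
        raise ValueError("load_pct must be between 0 and 100")
    if load_pct >= overload_pct:
        return "overload"
    crossed = [level for level in WARNING_LEVELS if load_pct >= level]
    if crossed:
        return f"warning-{max(crossed)}"
    return "ok"


for sample in (50, 80, 92, 96, 98):
    print(sample, classify_load(sample))
```

The overload level is passed as a parameter because the text says it is configurable through a sys db variable rather than fixed.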

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips