Bug ID 578971: When mcpd is restarted on a blade, cluster members may be temporarily marked as failed

Last Modified: Mar 12, 2019

Affected Product:
BIG-IP All (all modules)

Known Affected Versions:
11.4.1, 11.5.1, 11.5.1 HF1, 11.5.1 HF10, 11.5.1 HF11, 11.5.1 HF2, 11.5.1 HF3, 11.5.1 HF4, 11.5.1 HF5, 11.5.1 HF6, 11.5.1 HF7, 11.5.1 HF8, 11.5.1 HF9, 11.5.2, 11.5.2 HF1, 11.5.3, 11.5.3 HF1, 11.5.3 HF2, 11.5.4, 11.5.4 HF1, 11.5.4 HF2, 11.5.4 HF3, 11.5.4 HF4, 11.5.5, 11.5.6, 11.5.7, 11.5.8, 11.6.0, 11.6.0 HF1, 11.6.0 HF2, 11.6.0 HF3, 11.6.0 HF4, 11.6.0 HF5, 11.6.0 HF6, 11.6.0 HF7, 11.6.0 HF8, 11.6.1, 11.6.1 HF1, 12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6

Fixed In:
13.0.0, 12.1.3.7, 11.6.1 HF2, 11.5.9

Opened: Mar 09, 2016
Severity: 3-Major

Symptoms

When mcpd is restarted on a blade, the clusterd process on that blade may become blocked for some time. This can cause cluster member heartbeat timeouts, which are logged in /var/log/ltm with messages that include: "Slot 1 suffered heartbeat timeout ...". As a result, cluster members are marked failed. The condition resolves itself within one minute, and the cluster fully recovers on its own.
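
To confirm whether a blade hit this symptom, the log can be scanned for the message fragment quoted above. The following is a minimal sketch, not an F5-provided tool: the function name count_heartbeat_timeouts is hypothetical, and the pattern assumes only the quoted fragment ("Slot <n> suffered heartbeat timeout"), since the rest of the log line format is not given here.

```python
import re
from collections import Counter
from typing import Iterable

# Matches only the message fragment quoted in this bug report.
# Assumption: the slot number is a plain decimal integer; nothing else
# about the log line layout (timestamps, hostnames, suffix) is assumed.
HEARTBEAT_TIMEOUT = re.compile(r"Slot (\d+) suffered heartbeat timeout")


def count_heartbeat_timeouts(lines: Iterable[str]) -> Counter:
    """Return a Counter mapping slot number to the number of matching lines.

    Hypothetical helper for illustration; `lines` can be any iterable of
    strings, such as an open file handle.
    """
    counts: Counter = Counter()
    for line in lines:
        match = HEARTBEAT_TIMEOUT.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open handle on /var/log/ltm to this function would give a per-slot count of timeout messages. A non-empty result shortly after an mcpd restart is consistent with this bug, though it does not by itself rule out other causes of heartbeat timeouts.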

Impact

Although all blades recover on their own, cluster members being marked failed may result in a failover.

Conditions

mcpd is restarted on a blade.

Workaround

There is no workaround for this issue. It is recommended to avoid restarting mcpd on any blade belonging to the active unit of an HA group. The issue resolves itself within about a minute, and all cluster members will be marked as up again.

Fix Information

The clusterd daemon has been fixed to no longer become blocked when mcpd is restarted. This prevents the cluster member heartbeat timeouts from occurring, and thus no cluster members will be marked failed.

Behavior Change