Last Modified: Oct 06, 2020
Known Affected Versions:
11.4.1, 11.5.1, 11.5.1 HF1, 11.5.1 HF10, 11.5.1 HF11, 11.5.1 HF2, 11.5.1 HF3, 11.5.1 HF4, 11.5.1 HF5, 11.5.1 HF6, 11.5.1 HF7, 11.5.1 HF8, 11.5.1 HF9, 11.5.10, 11.5.2, 11.5.2 HF1, 11.5.3, 11.5.3 HF1, 11.5.3 HF2, 11.5.4, 11.5.4 HF1, 11.5.4 HF2, 11.5.4 HF3, 11.5.4 HF4, 11.5.5, 11.5.6, 11.5.7, 11.5.8, 11.5.9, 12.0.0, 12.0.0 HF1, 12.0.0 HF2, 12.0.0 HF3, 12.0.0 HF4
Opened: Sep 30, 2014
In certain blade failure scenarios, the HA score on the remaining blades does not update for at least ten seconds, so a failover does not occur during that time. This is because the remaining blades wait for a ten-second timeout period before marking the powered-off blade as down.
The expected failover does not occur for at least ten seconds.
A blade is powered off via the serial console or the 'bladectl' command, or the blade is physically removed from the chassis, and the chassis is configured in an HA pair where the loss of a blade should result in a failover.
There is no workaround for this issue.
The issue has been addressed with two separate changes. The first marks a cluster member down immediately when its blade is physically removed from the chassis. The second adds a DB variable ("Clusterd.PeerMemberTimeout") that sets how long a cluster member can remain unresponsive before its peers mark it down. Its default value is ten seconds, and it can be set as low as one second. Lowering it reduces the delay before a failover occurs in other blade power-down scenarios, such as when a blade is powered down via the serial console or the 'bladectl' command.
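The effect of the two changes on the mark-down delay can be sketched as a small model. This is an illustration only, not the clusterd implementation: the function name, the event labels, and the treatment of "immediately" as zero seconds are assumptions made for the example. Only the default (ten seconds) and the minimum (one second) come from the description above.

```python
# Illustrative model of the post-fix behavior; not actual clusterd code.
DEFAULT_PEER_MEMBER_TIMEOUT = 10  # seconds, default stated in the fix description
MIN_PEER_MEMBER_TIMEOUT = 1       # seconds, lowest allowed value stated in the fix description

# Hypothetical event labels chosen for this sketch.
POWER_OFF_EVENTS = ("serial_console_power_off", "bladectl_power_off")


def seconds_until_marked_down(event, peer_member_timeout=DEFAULT_PEER_MEMBER_TIMEOUT):
    """Return the modeled delay before peers mark the failed blade down."""
    if peer_member_timeout < MIN_PEER_MEMBER_TIMEOUT:
        raise ValueError("timeout cannot be set below one second")
    if event == "physical_removal":
        # First change: removal is detected directly, so no timeout wait is modeled.
        return 0
    if event in POWER_OFF_EVENTS:
        # Second change: peers wait for the configurable timeout to expire.
        return peer_member_timeout
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    for event in ("physical_removal",) + POWER_OFF_EVENTS:
        print(event, seconds_until_marked_down(event),
              seconds_until_marked_down(event, peer_member_timeout=1))
```

In this model, lowering the timeout only shortens the delay in the power-off cases; physical removal no longer depends on the timeout at all.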