Last Modified: Oct 04, 2024
Affected Product(s):
BIG-IP (all modules)
Known Affected Versions:
13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 13.1.3.3, 13.1.3.4, 13.1.3.5, 13.1.3.6, 13.1.4, 13.1.4.1, 13.1.5, 13.1.5.1, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.1, 15.1.2, 15.1.2.1, 15.1.3, 15.1.3.1, 15.1.4, 15.1.4.1, 15.1.5, 15.1.5.1, 15.1.6, 15.1.6.1, 15.1.7, 15.1.8, 15.1.8.1, 15.1.8.2, 15.1.9, 15.1.9.1, 15.1.10, 15.1.10.2, 15.1.10.3, 15.1.10.4, 15.1.10.5, 16.0.0, 16.0.0.1, 16.0.1, 16.0.1.1, 16.0.1.2, 16.1.0, 16.1.1, 16.1.2, 16.1.2.1, 16.1.2.2, 16.1.3, 16.1.3.1, 16.1.3.2, 16.1.3.3, 16.1.3.4, 16.1.3.5, 16.1.4, 16.1.4.1, 16.1.4.2, 16.1.4.3, 16.1.5, 16.1.5.1, 17.0.0, 17.0.0.1, 17.0.0.2, 17.1.0, 17.1.0.1, 17.1.0.2, 17.1.0.3, 17.1.1, 17.1.1.1, 17.1.1.2, 17.1.1.3, 17.1.1.4
Opened: Mar 17, 2023
Severity: 3-Major
On a multi-slot chassis, vCMP guest, or F5OS tenant, clusterd can enter a shutdown state, causing some slots to become unavailable. The triggering event is called a partition: clusterd stops receiving heartbeat packets from a slot over the mgmt_bp interface but still receives them over the tmm_bp interface.

The following error is logged when this occurs:

Mar 17 10:38:28 localhost err clusterd[4732]: 013a0004:3: Marking slot 1 SS_FAILED due to partition detected on mgmt_bp from peer 2 to local 1

After this, clusterd enters a shutdown state and in some cases never recovers.

Example output of tmsh show sys cluster where clusterd on slot 1 is shut down and waiting:

-----------------------------------------
Sys::Cluster: default
-----------------------------------------
Address                 172.0.0.160/23
Alt-Address             ::
Availability            available
State                   enabled
Reason                  Cluster Enabled
Primary Slot ID         2
Primary Selection Time  03/17/23 10:38:30

----------------------------------------------------------------------------------
| Sys::Cluster Members
| ID  Address  Alt-Address  Availability  State    Licensed  HA       Clusterd  Reason
----------------------------------------------------------------------------------
| 1   ::       ::           unknown       enabled  false     unknown  shutdown  ShutDown: default/1 waiting for blade 2
| 2   ::       ::           available     enabled  true      standby  running   Run
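To spot this condition in collected logs, the partition message can be matched with a regular expression. The following is a minimal sketch, not an F5-provided tool: the pattern is derived only from the single example log line in this report, so the real message may vary between versions, and the function name find_partition_events is hypothetical.

```python
import re

# Pattern built from the one example log line in this report (an assumption:
# other versions may word the message differently).
PARTITION_RE = re.compile(
    r"clusterd\[\d+\]: 013a0004:\d+: Marking slot (?P<slot>\d+) SS_FAILED "
    r"due to partition detected on (?P<iface>\S+) "
    r"from peer (?P<peer>\d+) to local (?P<local>\d+)"
)


def find_partition_events(lines):
    """Return one dict per log line that reports a clusterd partition."""
    events = []
    for line in lines:
        match = PARTITION_RE.search(line)
        if match:
            events.append(
                {
                    "slot": int(match.group("slot")),
                    "interface": match.group("iface"),
                    "peer": int(match.group("peer")),
                    "local": int(match.group("local")),
                }
            )
    return events
```

The function takes any iterable of log lines, for example an open file handle, and ignores lines that do not match.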
The unavailable slots/blades will not accept traffic.
Multi-slot chassis, vCMP guest, or F5OS tenant. A blade detects a partition: it receives cluster packets over the tmm_bp interface but not over the mgmt_bp interface.
Run tmsh show sys cluster to see the primary slot and the status of every slot. For each blade that reports shutdown (or, less commonly, initializing) with a reason of "waiting for blade(s)", restart clusterd on that slot with bigstart restart clusterd. Do not restart clusterd on the primary slot.
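The selection step of this workaround can be sketched as a small parser over saved tmsh show sys cluster output. This is an illustrative sketch only: it assumes the whitespace-separated column layout shown in the example output in this report (ID, Address, Alt-Address, Availability, State, Licensed, HA, Clusterd, Reason), the function name slots_to_restart is hypothetical, and it only reports candidate slots rather than restarting anything.

```python
import re

# Matches member rows such as "| 1 :: :: unknown enabled false unknown shutdown ..."
MEMBER_RE = re.compile(r"^\s*\|\s*(?P<slot>\d+)\s+(?P<rest>.*)$")
PRIMARY_RE = re.compile(r"Primary Slot ID\s+(\d+)")


def slots_to_restart(output):
    """Return non-primary slot IDs whose clusterd is stuck waiting for a blade."""
    primary_match = PRIMARY_RE.search(output)
    if primary_match is None:
        # Without a known primary we cannot honour "do not restart the primary".
        raise ValueError("could not find 'Primary Slot ID' in the output")
    primary = int(primary_match.group(1))

    candidates = []
    for line in output.splitlines():
        match = MEMBER_RE.match(line)
        if match is None:
            continue
        slot = int(match.group("slot"))
        # Assumed columns after ID: Address, Alt-Address, Availability,
        # State, Licensed, HA, Clusterd, then free-text Reason.
        fields = match.group("rest").split()
        if len(fields) < 7:
            continue
        clusterd_state = fields[6].lower()
        reason = " ".join(fields[7:]).lower()
        stuck = clusterd_state in ("shutdown", "initializing")
        if stuck and "waiting for blade" in reason and slot != primary:
            candidates.append(slot)
    return candidates
```

Each returned slot is one where the workaround would call for bigstart restart clusterd to be run on that slot. Review the list by hand before acting on it, since the column positions are an assumption.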
None