Bug ID 1081281: Multi-node BIG-IP tenants may fail to cluster after rolling upgrade

Last Modified: May 29, 2024

Affected Product(s):
F5OS Velos (all modules)

Fixed In:
F5OS-C 1.6.0

Opened: Feb 16, 2022

Severity: 2-Critical

Symptoms

BIG-IP tenant instances may fail to cluster after a rolling upgrade because CHASSIS_SERIAL_NO is set incorrectly in the config map that is used to deploy the tenant instance.

This can be seen in the tmsh "show sys cluster" output on the tenant, which shows the slots in a failed state:

root@(localhost)(cfg-sync Standalone)(/S1-green-P::Active)(/Common)(tmos)# show sys cluster

-----------------------------------------
Sys::Cluster: default
-----------------------------------------
Address                 10.238.133.200/24
Alt-Address             ::
Availability            available
State                   enabled
Reason                  Cluster Enabled
Primary Slot ID         1
Primary Selection Time  07/21/22 01:10:47

  -------------------------------------------------------------------------------------------
  | Sys::Cluster Members
  | ID  Address  Alt-Address  Availability  State    Licensed  HA       Clusterd  Reason
  -------------------------------------------------------------------------------------------
  | 1   ::       ::           available     enabled  true      active   running   Run
  | 2   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed
  | 3   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed
  | 4   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed
  | 5   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed
  | 6   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed
  | 7   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed
  | 8   ::       ::           offline       enabled  false     unknown  shutdown  Slot Failed

This condition can be verified by displaying the config map for a tenant instance and checking whether the CHASSIS_SERIAL_NO field is empty. For example, from the system controller shell:

oc get cm -n partition-1 <tenant_name>-<blade_#>-configmap -o json | egrep CHASSIS

Bad Entry:

# oc get cm -n partition-1 bigiptenant1-1-configmap -o json | egrep CHASSIS
"CHASSIS_SERIAL_NO": "",

Good Entry:

# oc get cm -n partition-1 bigiptenant1-2-configmap -o json | egrep CHASSIS
"CHASSIS_SERIAL_NO": "chs414616s",
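The config map check above has to be repeated for each blade the tenant runs on. The following is a minimal sketch of how that could be scripted, not an F5-provided tool. The function names chassis_serial_status and check_tenant_blades are hypothetical, and the sketch assumes that "oc get cm ... -o json" prints each key on its own line, as in the example output above.

```shell
#!/bin/sh
# Hypothetical helper: classify a tenant config map (JSON on stdin) by its
# CHASSIS_SERIAL_NO value. Prints EMPTY, SET, or MISSING.
chassis_serial_status() {
    # Keep only the first line that defines the key.
    line=$(grep -E '"CHASSIS_SERIAL_NO"[[:space:]]*:' | head -n 1)
    if [ -z "$line" ]; then
        echo "MISSING"
    elif printf '%s\n' "$line" | grep -Eq '"CHASSIS_SERIAL_NO"[[:space:]]*:[[:space:]]*""'; then
        echo "EMPTY"      # matches the "Bad Entry" case
    else
        echo "SET"        # matches the "Good Entry" case
    fi
}

# Hypothetical wrapper: run the check for each listed blade of one tenant.
# Usage: check_tenant_blades <namespace> <tenant_name> <blade_#>...
# The config map name follows the <tenant_name>-<blade_#>-configmap pattern.
check_tenant_blades() {
    namespace=$1
    tenant=$2
    shift 2
    for blade in "$@"; do
        status=$(oc get cm -n "$namespace" "${tenant}-${blade}-configmap" -o json \
            | chassis_serial_status)
        echo "${tenant}-${blade}-configmap: ${status}"
    done
}
```

From the system controller shell, "check_tenant_blades partition-1 bigiptenant1 1 2" would print one line per blade. Any blade reported as EMPTY corresponds to the bad entry described above. A MISSING result means no CHASSIS_SERIAL_NO line was found, which can also happen if the config map could not be retrieved.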

Impact

If this issue occurs, one or more instances of the tenant may not communicate correctly, which can cause some or all of the data plane to stop functioning and result in an outage.

Conditions

This can happen during a rolling upgrade if the CHASSIS_SERIAL_NO field is not read correctly and the tenant instance is restarted as part of the rolling upgrade. This is an intermittent issue.

Workaround

1.) Set tenant(s) state to provisioned for BIG-IP, or configured for BIG-IP Next.
2.) Once the tenant(s) have stopped, disable the partition.
3.) Re-enable the partition.
4.) Set tenant(s) state back to deployed.

Fix Information

N/A

Behavior Change
