Bug ID 1135181: Controller rolling upgrade may cause blades to reboot into partition "none", deleting tenant data

Last Modified: May 29, 2024

Affected Product(s):
F5OS Velos (all modules)

Known Affected Versions:
F5OS-C 1.5.0

Fixed In:
F5OS-C 1.6.0, F5OS-C 1.5.1

Opened: Aug 09, 2022

Severity: 2-Critical

Symptoms

System controller components can hang during a controller rolling upgrade, so that partitions fail to start correctly and other operations behave incorrectly. The partition instance state may show as "failed", "offline", or "running" rather than the normal "running-active" or "running-standby". If switchd hangs during the rolling upgrade, failure messages appear when blades reboot.
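As a rough illustration of the state check described above, the following Python sketch flags any partition instance whose state is not one of the two normal values. The state strings are taken from the symptom description; the function name, the dictionary input, and the sample partition names are hypothetical, and how the states are collected from the system is left out.

```python
# Hypothetical helper: the state strings come from the Symptoms text;
# the function name and the input format are assumptions for illustration.
HEALTHY_STATES = {"running-active", "running-standby"}


def unhealthy_partitions(states):
    """Return the names of partition instances that are not in a normal state.

    `states` maps a partition instance name to its reported state string.
    """
    return sorted(
        name for name, state in states.items() if state not in HEALTHY_STATES
    )


# Example input with made-up partition names.
sample = {"default": "running-active", "prod": "running", "lab": "offline"}
print(unhealthy_partitions(sample))
```

Note that a plain "running" state counts as abnormal here, because only "running-active" and "running-standby" are described as normal.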

Impact

Tenant instance data (the virtual disk image) may be deleted from the blades if the blades are rebooted while this issue is occurring.

Conditions

Performing a system controller rolling upgrade to version F5OS-C 1.5.0 from an earlier version.

Workaround

To avoid the problem, perform an out-of-service upgrade by using the "out-of-service true" option with the system controller "system image set-version" command. If a VELOS chassis has already undergone a rolling upgrade to F5OS-C 1.5.0, reboot both system controllers to return them to a stable state. If blades in a partition were affected, reboot those blades after rebooting the system controllers. Deleted tenant instance data cannot be recovered; it must be recreated or restored from a UCS backup.
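The ordering of the recovery actions in the workaround can be sketched as a small Python function. This is only an illustration of the sequence: the function name and its boolean parameters are hypothetical, and the returned strings are descriptions for an operator, not F5OS commands.

```python
def recovery_steps(rolling_upgraded_to_1_5_0, blades_affected, tenant_data_deleted):
    """Return the recovery actions from the workaround, in order.

    All three parameters are hypothetical flags that an operator would
    determine by inspecting the chassis.
    """
    steps = []
    if not rolling_upgraded_to_1_5_0:
        # Nothing to recover from; use an out-of-service upgrade instead.
        return steps
    # Controllers are rebooted first to return them to a stable state.
    steps.append("Reboot both system controllers")
    if blades_affected:
        # Blades are rebooted only after the controllers.
        steps.append("Reboot the affected blades")
    if tenant_data_deleted:
        # Deleted tenant instance data cannot be recovered.
        steps.append("Recreate tenant instance data or restore it from a UCS backup")
    return steps


for step in recovery_steps(True, True, True):
    print(step)
```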

Fix Information

The system controller components no longer hang during the rolling upgrade.

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips