Bug ID 950901: Wrong chassis serial number chs599996s for VELOS tenants

Last Modified: May 29, 2024

Affected Product(s):
F5OS Velos (all modules)

Known Affected Versions:
F5OS-C 1.0.0, F5OS-C 1.1.0, F5OS-C 1.1.1, F5OS-C 1.1.2, F5OS-C 1.1.3, F5OS-C 1.1.4, F5OS-C 1.3.0, F5OS-C 1.3.1, F5OS-C 1.3.2

Fixed In:
F5OS-C 1.5.0

Opened: Oct 04, 2020

Severity: 2-Critical

Symptoms

In some corner cases, system controllers may have a blank serial number in the /etc/PLATFORM file. If partition software is started while the serial number is blank, any tenants deployed on that partition will report an incorrect chassis serial number, chs599996s.

Impact

VELOS tenants may report the wrong serial number, visible in "tmsh show sys hardware" or "tmsh list cm device". If a VELOS tenant spans multiple blades and the different blades pick up different serial numbers from F5OS, the tenant may fail to cluster properly: multiple tenant blades will function as cluster primary, competing over the cluster management IP.

Affected tenants may also generate an erroneous cluster configuration file, /shared/db/cluster.conf.chs599996s. The correct cluster.conf.<serial> file will persist, but its contents may now be stale. At a future time, when the chassis is rebooted and the correct serial number is provided to the tenant again, the tenant may load the stale cluster configuration file. Depending on what is in the stale file, other unexpected actions may result, such as rebooting into an older boot location.
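
As a rough illustration of checking a tenant for the erroneous file described above, the following sketch tests whether a cluster.conf.chs599996s file exists. The helper name check_placeholder_cluster_conf is hypothetical and is not an F5 tool; the only details taken from this article are the /shared/db path and the chs599996s serial number.

```shell
#!/bin/sh
# Sketch only: check_placeholder_cluster_conf is a hypothetical helper, not an F5 tool.
# It reports whether the cluster configuration file named after the incorrect
# serial number (chs599996s) exists in the given directory.
# The directory defaults to /shared/db, the path named in this article.
check_placeholder_cluster_conf() {
    dir=${1:-/shared/db}
    if [ -e "$dir/cluster.conf.chs599996s" ]; then
        echo "found: $dir/cluster.conf.chs599996s"
        return 0
    fi
    echo "not found"
    return 1
}

# Example (run from a tenant shell):
#   check_placeholder_cluster_conf
```

Finding the file only indicates that the tenant ran with the incorrect serial number at some point; it does not show whether the correct cluster.conf.<serial> file is stale.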

Conditions

- Restarting the system controllers
- Removing and adding the blades from the tenant

Workaround

The correct chassis serial number can be seen in the license file, which can be viewed from a tenant by running "tmsh show sys license".

If a tenant is currently bifurcated (multiple blades functioning as cluster primary), the immediate mitigation is to set the tenant to "provisioned" and then back to "deployed".

If a tenant is reporting the incorrect chassis serial number (chs599996s), the following steps should restore the correct serial number:

1. Determine which controller has a blank serial number for the partition by running "grep CHASSIS_SERIAL_NO /var/log/sw-util.log | tail -1" on each controller, e.g.:

    [root@controller-2 ~]# for i in controller-{1,2}; do echo -n "$i: "; ssh $i grep CHASSIS_SERIAL_NO /var/log/sw-util.log | tail -1; done
    controller-1: ++ CHASSIS_SERIAL_NO=chs700144s
    controller-2: ++ CHASSIS_SERIAL_NO=
    [root@controller-2 ~]#

   In this example, controller-2 is affected.

2. Reboot the affected controller. After it reboots, check whether it still has a blank serial number. Repeat this step until it boots and reports a non-blank serial number.

3. From the partition, set the tenant to "provisioned" and then back to "deployed".

4. After the tenant reboots, confirm that the serial number is now correct (not chs599996s) in the output of "tmsh show sys hardware".
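
The check in step 1 can be scripted along the following lines. This is a minimal sketch, assuming the last CHASSIS_SERIAL_NO line in /var/log/sw-util.log always has the "++ CHASSIS_SERIAL_NO=<value>" form shown in the example output. The function name classify_serial_line is hypothetical and not part of F5OS.

```shell
#!/bin/sh
# Sketch only: classify_serial_line is a hypothetical helper, not part of F5OS.
# Input:  one log line, e.g. "++ CHASSIS_SERIAL_NO=chs700144s"
# Output: "ok" if a serial number is present, "blank" if the value is empty,
#         "unknown" if the line does not contain CHASSIS_SERIAL_NO= at all.
classify_serial_line() {
    line=$1
    case "$line" in
        *CHASSIS_SERIAL_NO=*) ;;
        *) echo "unknown"; return 0 ;;
    esac
    # Drop everything up to and including "CHASSIS_SERIAL_NO=", then strip whitespace.
    serial=$(printf '%s' "${line#*CHASSIS_SERIAL_NO=}" | tr -d '[:space:]')
    if [ -z "$serial" ]; then
        echo "blank"
    else
        echo "ok"
    fi
}

# Example use on a system controller, reusing the loop from step 1:
#   for i in controller-1 controller-2; do
#       line=$(ssh "$i" grep CHASSIS_SERIAL_NO /var/log/sw-util.log | tail -1)
#       echo "$i: $(classify_serial_line "$line")"
#   done
```

A controller reported as "blank" is the one to reboot in step 2. A result of "unknown" means the log contained no matching line, which this sketch does not interpret further.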

Fix Information

None

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips