Bug ID 780437: Upon rebooting a VIPRION chassis provisioned as a vCMP host, some vCMP guests can return online with no configuration.

Last Modified: Sep 11, 2019

Bug Tracker

Affected Product:  See more info
BIG-IP vCMP(all modules)

Known Affected Versions:
12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1, 12.1.5, 13.0.0, 13.0.0 HF1, 13.0.0 HF2, 13.0.0 HF3, 13.0.1, 13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.4, 14.1.0.5, 14.1.0.6, 14.1.2, 15.0.0, 15.0.1

Opened: May 07, 2019
Severity: 2-Critical

Symptoms

It is possible, although unlikely, for a vCMP host to scan the /shared/vmdisks directory for virtual disk files while the directory is unmounted. As such, virtual disk files that existed before the reboot will not be detected, and the vCMP host will proceed to create them again. The virtual disks get created again, delaying the guests from booting. Once the guests finally boot, they have no configuration. Additionally, the new virtual disk files are created on the wrong disk device, as /shared/vmdisks is still unmounted. Symptoms for this issue include: -- Running the 'mount' command on affected host blades and noticing that /shared/vmdisks is not mounted. -- Running the 'tmsh show vcmp guest' command on affected host blades (early on after the reboot) and noticing some guests have status 'installing-vdisk'. -- Running the 'lsof' command on affected and unaffected host blades shows different device numbers for the filesystem hosting the virtual disks, as shown in the following example (note 253,16 and 253,1): qemu-kvm 19386 qemu 15u REG 253,16 161061273600 8622659 /shared/vmdisks/s1g2.img qemu-kvm 38655 qemu 15u REG 253,1 161061273600 2678798 /shared/vmdisks/s2g1.img -- The /var/log/ltm file includes entries similar to the following example, indicating new virtual disks are being created for one of more vCMP guests: info vcmpd[x]: 01510007:6: VDisk (s2g1.img/2): Adding. info vcmpd[x]: 01510007:6: VDisk (s2g1.img/2): Syncing with MCP - [filename:s2g1.img slot:2 installed_os:0 state:0] notice vcmpd[x]: 01510006:5: Guest (s2g1): Creating VDisk (/shared/vmdisks/s2g1.img) info vcmpd[x]: 01510007:6: VDisk (s2g1.img/2): Syncing with MCP - [filename:s2g1.img slot:2 installed_os:0 state:1] info vcmpd[x]: 01510007:6: Guest (s2g1): VS_ACQUIRING_VDISK->VS_WAITING_INSTALL info vcmpd[x]: 01510007:6: Guest (s2g1): VS_WAITING_INSTALL->VS_INSTALLING_VDISK notice vcmpd[x]: 01510006:5: Guest (s2g1): Installing image (/shared/images/BIGIP-12.1.2.0.0.249.iso) to VDisk (/shared/vmdisks/s2g1.img). info vcmpd[x]: 01510007:6: VDisk (s2g1.img/2): Syncing with MCP - [filename:s2g1.img slot:2 installed_os:0 state:2]

Impact

-- Loss of entire configuration on previously working vCMP guests. -- The /shared/vmdisks directory, in its unmounted state, may not have sufficient disk space to accommodate all the virtual disks for the vCMP guests designated to run on that blade. As such, some guests may fail to start. -- If you continue using the affected guests by re-deploying configuration to them, further configuration loss may occur after a new chassis reboot during which this issue does not happen. This occurs because the guests would then be using the original virtual disk files; however, their configuration may have changed since then, and so some recently created objects may be missing.

Conditions

-- VIPRION chassis provisioned in vCMP mode with more than one blade in it. -- Large configuration with many guests. -- The VIPRION chassis is rebooted. -- A different issue, of type 'Configuration from primary failed validation' occurs during startup on one or more Secondary blades. By design, MCPD restarts once on affected Secondary blades, which is the trigger for this issue. An example of such a trigger issue is Bug ID 563905: Upon rebooting a multi-blade VIPRION or vCMP guest, MCPD can restart once on Secondary blades.

Workaround

There is no workaround to prevent this issue. However, you can minimize the risk of hitting this issue by ensuring you are running a software version (on the host system) where all known 'Configuration from primary failed validation' issues have been resolved. If you believe you are currently affected by this issue, please contact F5 Networks Technical Support for assistance in recovering the original virtual disk files.

Fix Information

None

Behavior Change