Bug ID 1045177: Stale interfaces are left behind upon portgroup mode change

Last Modified: Nov 07, 2022

Affected Product:
F5OS Install/Upgrade, VELOS (all modules)

Known Affected Versions:
1.0.0, 1.0.0-420.0, 1.0.1, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.2.0, 1.2.1, 1.2.2

Fixed In:
1.3.0

Opened: Sep 07, 2021
Severity: 3-Major

Symptoms

In some situations, stale interfaces are left behind in the ConfD configuration database (CDB) when the portgroup mode changes, for instance from 100GB to 40GB, 4x25GB, or 4x10GB. This causes the l2-agent on the blade to exit and enter a restart loop.

Impact

The interfaces corresponding to portgroups are not present and stale interfaces are left behind.

Conditions

-- reset-to-defaults/backup/restore
OR
-- live install
-- change the portgroup mode, e.g. from 100GB to 40GB
-- commit

Workaround

Steps for mitigation:

1) Verify the issue is caused by the lack of pgindex in the ConfD database:

   a) From config mode in a chassis partition, create a backup file.

      (config)# system database config-backup name test-for-ID1045177

   b) Look for pgindex in the /var/F5/partition{id}/configs/test-for-ID1045177 file:

      grep pgindex /var/F5/partition[1-8]/configs/test-for-ID1045177

   c) If no entries are found, this is the issue. Proceed with Step 2.

2) Take UCS backups of BIG-IP tenants in the chassis partition. Download them to a safe location.

3) Set the running-state of BIG-IP tenants in the chassis partition to 'provisioned'. E.g.

      testpart-1(config)# tenants tenant bigip-tenant-1 config running-state provisioned ; top
      testpart-1(config)# tenants tenant bigip-tenant-2 config running-state provisioned ; top
      testpart-1(config)# commit

4) Complete steps 4a and 4b in quick succession. We recommend they are completed within 30 seconds. Step 4a changes the slot partition assignment, which causes the blade to reboot. If 4b is not executed promptly, the blade will boot up and detect that its partition assignment has changed, triggering a data clean-up task that deletes the previous tenant virtual disk files. This will result in the BIG-IP tenants starting up with factory-default settings.

   a) Remove the slots corresponding to the impacted chassis partition from the system controller configuration and commit. E.g.

      syscon-2-active(config)# slots slot 1 partition none ; top
      syscon-2-active(config)# slots slot 2 partition none ; top
      syscon-2-active(config)# commit

   b) Re-add the slots corresponding to the impacted chassis partition to the system controller configuration and commit. E.g.

      syscon-2-active(config)# slots slot 1 partition NameOfPartition ; top
      syscon-2-active(config)# slots slot 2 partition NameOfPartition ; top
      syscon-2-active(config)# commit

5) From the chassis partition CLI, wait until the following command shows all blades present and with the status 'replica'.

      show system redundancy nodes | repeat

6) From the chassis partition CLI, change the portgroup mode from 100GB to 40GB and commit. You will be prompted to reboot the blades. E.g.

      (config)# portgroups portgroup 1/1 config mode MODE_40GB; top
      (config)# portgroups portgroup 1/2 config mode MODE_40GB; top
      (config)# commit
      The following warnings were generated:
        'portgroups portgroup': Blade(s) 1 will reboot
      Proceed? [yes,no] yes

7) Monitor 'show system redundancy' and wait again for the blades to show the status 'replica'.

      show system redundancy nodes | repeat

   At this point the interfaces should be republished matching the new 40GB mode.

8) Reapply the interface configuration. As appropriate for your environment, assign VLANs to interfaces, assign interfaces to LAGs, and apply per-interface LLDP and STP settings.

9) Set the running-state of BIG-IP tenants in the chassis partition to 'deployed'.

      testpart-1(config)# tenants tenant bigip-tenant-1 config running-state deployed ; top
      testpart-1(config)# tenants tenant bigip-tenant-2 config running-state deployed ; top
      testpart-1(config)# commit
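
The check in step 1b can also be scripted when several partitions need to be inspected. The following is a minimal Python sketch, not an F5-provided tool. It assumes the backup files are readable as text at the path pattern given in step 1b, and that a Python interpreter is available wherever the files are inspected (for example, after copying them off the system). The function names and the `root` parameter are hypothetical and used only for illustration.

```python
import glob
import os


def backup_has_pgindex(path: str) -> bool:
    """Return True if any line of the backup file mentions 'pgindex'."""
    # errors="replace" avoids a crash if the file contains non-UTF-8 bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return any("pgindex" in line for line in fh)


def check_partition_backups(backup_name: str, root: str = "/var/F5") -> dict:
    """Map each matching backup file path to whether it contains 'pgindex'.

    Mirrors the shell glob from step 1b:
    /var/F5/partition[1-8]/configs/<backup_name>
    """
    pattern = os.path.join(root, "partition[1-8]", "configs", backup_name)
    return {path: backup_has_pgindex(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    results = check_partition_backups("test-for-ID1045177")
    for path, found in results.items():
        if found:
            print(f"{path}: pgindex present")
        else:
            print(f"{path}: no pgindex entries, matches this issue (see step 1c)")
```

A file reported without pgindex entries corresponds to the "no entries are found" outcome in step 1c. The sketch only reads files and does not change any configuration.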

Fix Information

Proper interfaces are now published when the portgroup mode is changed. A partition that was already affected may remain affected after an upgrade to F5OS-C 1.3.0 or later, but once remediated, the issue should not recur.

Behavior Change