Bug ID 1207537: Chassis partition ConfD may fail to start completely during controller rolling upgrade

Last Modified: Sep 27, 2024

Affected Product(s):
F5OS Velos (all modules)

Known Affected Versions:
F5OS-C 1.5.0, F5OS-C 1.5.1

Fixed In:
F5OS-C 1.6.0

Opened: Dec 07, 2022

Severity: 2-Critical

Symptoms

Following a controller rolling upgrade, one or both of the chassis partition controller instances may fail to start completely. To check, run the "show partitions" command. In the normal state, one controller instance shows "running-active" and the other shows "running-standby". Any other status (running, offline, failed, or no status) indicates that the database is not operating correctly.
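
The status rule in the Symptoms text can be expressed as a small check. This is a minimal sketch, not an F5 tool: it assumes the two instance status strings have already been read from the "show partitions" output by some other means, and the function name controllers_healthy is hypothetical.

```python
from collections import Counter

# Status strings are taken from the Symptoms text; nothing here parses
# real "show partitions" output (its layout is not described in this entry).
HEALTHY = Counter({"running-active": 1, "running-standby": 1})


def controllers_healthy(statuses):
    """Return True only for exactly one running-active and one running-standby.

    `statuses` is an iterable of status strings, one per controller instance.
    A missing status may be passed as None or an empty string.
    """
    observed = Counter((s or "").strip() for s in statuses)
    return observed == HEALTHY
```

For example, controllers_healthy(["running-active", "running-standby"]) is true, while a pair such as ["running", "running-standby"] or a missing status is reported as unhealthy.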

Impact

One or both instances of the chassis partition control plane are not operating. This prevents the chassis partition rolling upgrade and may stop tenant traffic.

Conditions

At database startup, a chassis partition can hang while retrieving the database primary key. The presence of this defect is confirmed by the following message at the end of the partition devel.log file: ERR> 6-Jan-2023::17:51:49.205 partition1 confd[109]: confd encryptedStrings command timed out after 300000 ms inactivity
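
The log check described in the Conditions text could be automated along these lines. This is a sketch under stated assumptions: the pattern is derived only from the single sample message in this entry, the timeout value and process ID are matched loosely in case they vary (an assumption), and the caller is expected to supply the contents of the partition devel.log file, whose location is not given here. The names TIMEOUT_RE and last_line_shows_defect are hypothetical.

```python
import re

# Built from the sample message in the Conditions text. The PID and the
# millisecond value are matched as any digits, which is an assumption.
TIMEOUT_RE = re.compile(
    r"confd\[\d+\]: confd encryptedStrings command timed out "
    r"after \d+ ms inactivity"
)


def last_line_shows_defect(log_text):
    """Return True if the last non-blank log line matches the timeout message."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    return bool(lines) and TIMEOUT_RE.search(lines[-1]) is not None
```

Only the final non-blank line is examined, because the entry states that the message appears at the end of the file.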

Workaround

If the chassis partition is in this state, it can be recovered by disabling the partition, waiting for both instances to transition to "disabled", and then re-enabling it. The error state is unlikely to occur unless the partition startup happens during a controller failover.
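
The order of the workaround steps can be sketched as follows. This is illustrative only: the entry does not give the commands for disabling, querying, or enabling a partition, so they are passed in as caller-supplied functions. The name recover_partition, its parameters, and the default timeout and polling interval are all assumptions and not F5 recommendations.

```python
import time


def recover_partition(disable, get_statuses, enable,
                      timeout_s=600.0, poll_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Disable the partition, wait for both instances to be disabled, re-enable.

    disable, get_statuses, and enable are hypothetical callables supplied by
    the caller; get_statuses returns one status string per controller instance.
    """
    disable()
    deadline = clock() + timeout_s
    while True:
        statuses = list(get_statuses())
        # "Both instances" is read here as exactly two reported statuses.
        if len(statuses) == 2 and all(s == "disabled" for s in statuses):
            break
        if clock() >= deadline:
            raise TimeoutError("instances did not both reach 'disabled'")
        sleep(poll_s)
    enable()
```

The partition is re-enabled only after both instances report "disabled"; if that never happens within the timeout, the sketch raises an error instead of re-enabling early.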

Fix Information

None

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips