Bug ID 1388525: Partition configuration database locks up, preventing database changes

Last Modified: May 29, 2024

Affected Product(s):
F5OS, F5OS-C (all modules)

Known Affected Versions:
F5OS-C 1.6.0, F5OS-C 1.6.1

Fixed In:
F5OS-C 1.6.2

Opened: Oct 24, 2023

Severity: 2-Critical

Symptoms

At times, the partition HA cluster fails to start up correctly. This causes problems with the database replicas, and the secondary controller instance does not reach "standby". To confirm the issue, run the "show system redundancy" command at the partition CLI. Affected blades remain either "offline" or "failed", with a reason of "reconnecting" or "database disconnected", for an extended period (more than a few seconds).
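The symptom criteria above can be written as a simple check. This is a minimal Python sketch. It assumes you have already collected each blade's state, its reason, and how long that state has lasted, for example by running "show system redundancy" several times. The `BladeStatus` record, the `matches_bug_1388525` and `suspect_blades` functions, the blade names, and the 30-second threshold (standing in for "more than a few seconds") are all hypothetical. They are not part of F5OS, and they do not reproduce the real CLI output format.

```python
from dataclasses import dataclass

# Hypothetical threshold standing in for "more than a few seconds";
# the bug report does not specify an exact value.
STUCK_THRESHOLD_SECONDS = 30.0

# Blade states and reasons named in the Symptoms section.
SYMPTOM_STATES = {"offline", "failed"}
SYMPTOM_REASONS = {"reconnecting", "database disconnected"}


@dataclass
class BladeStatus:
    """Hypothetical record of one blade's redundancy status, assumed to be
    transcribed or parsed from repeated "show system redundancy" runs."""
    name: str
    state: str
    reason: str
    seconds_in_state: float


def matches_bug_1388525(blade: BladeStatus,
                        threshold: float = STUCK_THRESHOLD_SECONDS) -> bool:
    """True if the blade is offline/failed with a reconnecting/database
    disconnected reason, and that condition has lasted beyond the threshold."""
    return (
        blade.state.lower() in SYMPTOM_STATES
        and blade.reason.lower() in SYMPTOM_REASONS
        and blade.seconds_in_state > threshold
    )


def suspect_blades(blades: list[BladeStatus]) -> list[str]:
    """Names of blades whose status matches the symptom pattern."""
    return [b.name for b in blades if matches_bug_1388525(b)]


if __name__ == "__main__":
    sample = [
        BladeStatus("blade-1", "online", "", 0.0),
        BladeStatus("blade-2", "failed", "database disconnected", 120.0),
        BladeStatus("blade-3", "offline", "reconnecting", 2.0),
    ]
    print(suspect_blades(sample))  # ['blade-2']
```

A blade that is briefly "reconnecting" during a normal reboot (blade-3 above) is not flagged. Only a condition that persists past the threshold counts, which mirrors the "extended period" wording in the report.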

Impact

Blades fail to initialize, so tenants do not restart correctly.

Conditions

Write transactions that occur during HA cluster formation can sometimes interfere with database initialization and replication. This is most often observed when multiple blades reboot at the same time, such as during a rolling upgrade.

Workaround

Disable and then re-enable the partition. Alternatively, if both partition controller instances are healthy (one active, one standby), run the "go-standby" command at the partition CLI.

Fix Information

The HA framework recognizes the database replication lockup and automatically resets the cluster.

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips