Bug ID 1038981: BIG-IQ High availability (HA) configuration fails when initial database copy takes longer than 15 minutes

Last Modified: Dec 07, 2023

Affected Product(s):
BIG-IQ Platform (all modules)

Known Affected Versions:
8.0.0, 8.0.0.1

Opened: Aug 05, 2021

Severity: 3-Major

Symptoms

You might see one or more of the following symptoms:

1. When you run the following command on the standby BIG-IQ to examine the restjavad log:
   $ grep 'Re-starting restjavad' /var/log/daemon*
   You might see several restjavad service restarts logged. For example:
   .... logger[10602]: Re-starting restjavad
   .... logger[7385]: Re-starting restjavad

2. The BIG-IQ GUI might halt with a message similar to:
   'Waiting for BIG-IQ services to become available'

3. If you run the following command to view the standby BIG-IQ log:
   $ tail -f /var/log/ha_pg_basebackup.log
   When database replication starts, you will see the following message:
   ...Setup_slave] Configuring pg_basebackup waiting for checkpoint
   If the log shows that the copy has not reached 100%, you might see errors similar to the following:
   538210/545848 kB (98%), 0/1 tablespace
   pg_basebackup: could not create directory "/var/lib/pgsql/data/base/1": File exists
   pg_basebackup: removing data directory "/var/lib/pgsql/data"
   could not remove file or directory "/var/lib/pgsql/data": Directory not empty
   pg_basebackup: failed to remove data directory
   Under normal circumstances, you would typically see a success message similar to:
   ..Setup_slave] Completed pg_basebackup successfully.
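The log checks described above can be combined into a small helper. The following is a minimal sketch, not an F5-provided tool: the function names basebackup_status and count_restjavad_restarts are hypothetical, and the script only searches for the message strings quoted in this article, so log wording that differs on your system will be reported as unknown.

```shell
#!/bin/sh
# Hypothetical helpers (not part of BIG-IQ). They match only the literal
# message strings quoted in the Symptoms section.

# Print "completed", "failed", or "in-progress-or-unknown" for a
# pg_basebackup log file such as /var/log/ha_pg_basebackup.log.
# A success message anywhere in the file takes precedence, so a log that
# holds an earlier failed attempt and a later success reports "completed".
basebackup_status() {
    log_file="$1"
    if grep -qF 'Completed pg_basebackup successfully' "$log_file"; then
        echo "completed"
    elif grep -qF 'pg_basebackup: failed to remove data directory' "$log_file" ||
         grep -qF 'pg_basebackup: could not create directory' "$log_file"; then
        echo "failed"
    else
        echo "in-progress-or-unknown"
    fi
}

# Print the total number of 'Re-starting restjavad' lines across the
# given log files (for example /var/log/daemon*).
count_restjavad_restarts() {
    cat -- "$@" | grep -cF 'Re-starting restjavad' || true
}
```

On the standby BIG-IQ, this might be used as: basebackup_status /var/log/ha_pg_basebackup.log, followed by count_restjavad_restarts /var/log/daemon*.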

Impact

BIG-IQ configurations with a large database or limited network bandwidth cannot form a successful BIG-IQ high availability configuration.

Conditions

When creating a BIG-IQ high availability configuration, the standby BIG-IQ pulls the database from the active BIG-IQ. If this copy takes longer than 15 minutes to complete, the high availability (HA) configuration fails. This can happen on low-bandwidth networks or when the database is very large.
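To judge in advance whether a given setup is at risk, the copy time can be roughly estimated from the database size and the sustained transfer rate between the two devices. The following is a back-of-the-envelope sketch under stated assumptions: the function names are hypothetical, both inputs must use the same unit (kB and kB per second here, matching the kB figures in the pg_basebackup log), and the estimate ignores checkpoint wait time, protocol overhead, and disk speed, so the real copy is likely to take longer.

```shell
#!/bin/sh
# Hypothetical estimate helpers (not part of BIG-IQ).
# The 15-minute limit comes from this article; everything else is an assumption.
LIMIT_SECONDS=900

# Print the estimated copy time in whole seconds, rounded up.
# $1 = database size in kB, $2 = sustained transfer rate in kB per second.
estimate_copy_seconds() {
    size_kb="$1"
    rate_kb_per_s="$2"
    if [ "$rate_kb_per_s" -le 0 ]; then
        echo "rate must be greater than zero" >&2
        return 1
    fi
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (size_kb + rate_kb_per_s - 1) / rate_kb_per_s ))
}

# Print whether the estimated copy time stays within the limit.
copy_fits_limit() {
    seconds=$(estimate_copy_seconds "$1" "$2") || return 1
    if [ "$seconds" -le "$LIMIT_SECONDS" ]; then
        echo "within limit (${seconds}s)"
    else
        echo "likely to exceed limit (${seconds}s)"
    fi
}
```

For example, copy_fits_limit 545848 500 estimates 1092 seconds for a 545848 kB database at 500 kB per second, which is over the 900-second limit, while the same database at 1000 kB per second is estimated at 546 seconds.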

Workaround

The following process is intended only to recover the standby BIG-IQ.

1. On the primary BIG-IQ, browse to System::THIS DEVICE::BIG-IQ high availability (HA).
2. DO NOT click the 'Repair Standby Database' button.
3. Log in to the standby BIG-IQ device from the command line.
4. If you see messages that the restjavad service is restarting, you might be unable to type commands. Type the following command to stop the service:
   $ bigstart stop restjavad
5. Type the following command to reset the database on the standby BIG-IQ:
   $ pgsh -i -f
6. After the standby BIG-IQ has recovered, click 'Revert to standalone' on the active BIG-IQ.

If these steps do not work, you can reset the database on the standby BIG-IQ manually:
(i) Stop the postgres service.
(ii) Delete the /var/lib/pgsql/data directory.
(iii) Restart the postgres service.

Fix Information

None

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips