Bug ID 1091785: DBDaemon restarts unexpectedly and/or fails to restart under heavy load

Last Modified: Jan 20, 2023

Affected Product:
BIG-IP GTM, LTM (all modules)

Known Affected Versions:
13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 13.1.3.3, 13.1.3.4, 13.1.3.5, 13.1.3.6, 13.1.4, 13.1.4.1, 13.1.5, 13.1.5.1, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4, 14.1.2.5, 14.1.2.6, 14.1.2.7, 14.1.2.8, 14.1.3, 14.1.3.1, 14.1.4, 14.1.4.1, 14.1.4.2, 14.1.4.3, 14.1.4.4, 14.1.4.5, 14.1.4.6, 14.1.5, 14.1.5.1, 14.1.5.2, 14.1.5.3, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.0, 15.1.0.1, 15.1.0.2, 15.1.0.3, 15.1.0.4, 15.1.0.5, 15.1.1, 15.1.2, 15.1.2.1, 15.1.3, 15.1.3.1, 15.1.4, 15.1.4.1, 15.1.5, 15.1.5.1, 15.1.6, 15.1.6.1, 15.1.7, 15.1.8, 15.1.8.1, 16.0.0, 16.0.0.1, 16.0.1, 16.0.1.1, 16.0.1.2, 16.1.0, 16.1.1, 16.1.2, 16.1.2.1, 16.1.2.2, 16.1.3, 16.1.3.1, 16.1.3.2, 16.1.3.3, 17.0.0, 17.0.0.1, 17.0.0.2

Opened: Mar 30, 2022
Severity: 3-Major

Symptoms

While under heavy load, the Database monitor daemon (DBDaemon) may:
- Restart for no apparent reason
- Restart repeatedly in rapid succession
- Log the following error while attempting to restart: java.net.BindException: Address already in use (Bind failed)
- Fail to start (remain down) after several attempts, leaving database monitors disabled and marking monitored resources Down.
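To check whether the BindException listed above has been logged, a sketch like the following counts matching lines in a log file. The helper name count_bind_failures is hypothetical. The sketch also assumes that the error is written to /var/log/DBDaemon-0.log (the LogFilePath that appears on the DBDaemon command line); this has not been confirmed here.

```shell
#!/bin/sh
# Sketch only: count_bind_failures is a hypothetical helper, not a BIG-IP tool.
# Assumption: the BindException message is written to the DBDaemon log file.

# Print the number of lines in the given log file that contain the
# "Address already in use" BindException. Prints 0 if the file is unreadable.
count_bind_failures() {
    if [ ! -r "$1" ]; then
        echo 0
        return 0
    fi
    # -F matches the text literally; -c prints the count of matching lines.
    # grep exits non-zero when the count is 0, so the status is ignored.
    grep -c -F -e 'java.net.BindException: Address already in use' -- "$1" || true
}
```

Example use, assuming the default route domain log path: count_bind_failures /var/log/DBDaemon-0.log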

Impact

The DBDaemon restarts for no apparent reason. The DBDaemon may also fail to start (remaining down) after several attempts, leaving database monitors disabled and marking monitored resources Down.

Conditions

- One or more active GTM and/or LTM database monitors are configured with short probe-timeout, interval, and timeout values (for example, 2, 5, and 16 respectively).
- A large number (for example, 2,000) of GTM and/or LTM database monitor instances (combinations of the above monitor and pool member) are configured.
- Active GTM and/or LTM database monitors are configured with debug yes and/or count 0.

Workaround

The conditions that are suspected to cause these symptoms include effects of ID1025089. Measures to prevent or reduce occurrences of ID1025089 (by reducing database monitor workload) will likely also prevent or reduce occurrences of these symptoms.

If the DBDaemon fails to restart, the following steps may allow DBDaemon to restart successfully upon the next database monitor probe:

-- Check for a running instance of DBDaemon with the following command:
    ps ax | grep -v grep | grep DBDaemon

-- If DBDaemon is running, this command will return a set of parameters including the numerical process ID (PID) at the beginning of the line and a command line that begins with "/usr/lib/jvm/jre/bin/java" and includes the parameter "com.f5.eav.DBDaemon", such as:
    24943 ? Ssl 46:49 /usr/lib/jvm/jre/bin/java -cp /usr/lib/jvm/jre/lib/rt.jar:/usr/lib/jvm/jre/lib/charsets.jar:/usr/share/monitors/postgresql-jdbc.jar:/usr/share/monitors/DB_monitor.jar:/usr/share/monitors/log4j.jar:/usr/share/monitors/mssql-jdbc.jar:/usr/share/monitors/mysql-connector-java.jar:/usr/share/monitors/ojdbc6.jar -Xmx512m -Xms64m -XX:-UseLargePages -DLogFilePath=/var/log/DBDaemon-0.log com.f5.eav.DBDaemon 1521 24943 0

-- If a running DBDaemon process is identified, use the "kill" command to terminate the running DBDaemon process:
    kill #
   (where # is the DBDaemon PID from the above "ps" command)

-- Repeat the above "ps" command to confirm that the DBDaemon process has been terminated. If a new DBDaemon process has not been started (with a different PID), proceed to the next steps.

-- Check the /var/run directory for the presence of any files with names beginning with "DBDaemon", such as:
    /var/run/DBDaemon-0.lock
    /var/run/DBDaemon-0.pid
    /var/run/DBDaemon-0.start.lock
   Note: The numeric value in the above example filenames corresponds to the Route Domain of pool members monitored by database monitors. If the database monitors are only applied to pool members in the default route domain (RD 0), that value will be "0" as seen above. If database monitors are applied to pool members in a non-default route domain (RD 7, for example), the numeric value will correspond to that route domain, such as:
    /var/run/DBDaemon-7.lock
    /var/run/DBDaemon-7.pid
    /var/run/DBDaemon-7.start.lock

-- If no DBDaemon process is running, delete any /var/run/DBDaemon* files. It is especially important to delete:
    /var/run/DBDaemon-#.start.lock (indicates DBDaemon restart is in progress and that no further restart actions should be attempted)
    /var/run/DBDaemon-#.pid (indicates current DBDaemon PID)

-- If the above actions do not result in DBDaemon restarting upon the next database monitor ping, then a complete BIG-IP restart will likely be required to recover from unknown conditions within the Java subsystem that may prevent successful DBDaemon operation:
    bigstart restart
   or:
    reboot
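
The process check and file cleanup described in this workaround could be sketched as a few shell functions. This is an illustration under stated assumptions, not an F5-supplied script: the function names (dbdaemon_pids, stale_dbdaemon_files, cleanup_dbdaemon) are hypothetical, and the directory is passed as an argument so the logic can be tried against a scratch directory before touching /var/run.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of BIG-IP.

# Print the PID of each running DBDaemon process, one per line.
# Uses the same ps/grep pipeline as the manual check; prints nothing if none.
dbdaemon_pids() {
    ps ax | grep -v grep | grep 'com.f5.eav.DBDaemon' | awk '{print $1}'
}

# Print every file whose name begins with "DBDaemon" in the given
# directory (default /var/run), one path per line.
stale_dbdaemon_files() {
    run_dir="${1:-/var/run}"
    for f in "$run_dir"/DBDaemon*; do
        # An unmatched glob stays literal, so test that the path exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Delete the DBDaemon* files only when no DBDaemon process is running.
# Returns 1 and leaves the files alone if a process is still present.
cleanup_dbdaemon() {
    run_dir="${1:-/var/run}"
    if [ -n "$(dbdaemon_pids)" ]; then
        echo "DBDaemon is still running; not removing files" >&2
        return 1
    fi
    stale_dbdaemon_files "$run_dir" | while IFS= read -r f; do
        rm -f -- "$f"
    done
}
```

Running stale_dbdaemon_files with no argument lists the candidate files under /var/run without deleting anything. cleanup_dbdaemon removes them only after confirming that no DBDaemon process remains, matching the order of the manual steps.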

Fix Information

None

Behavior Change