Bug ID 462258: AD/LDAP server connection failures might cause apd to stop processing requests when service is restored

Last Modified: Apr 10, 2019

Affected Product:
BIG-IP APM (all modules)

Known Affected Versions:
11.6.0, 11.5.1, 11.5.0, 11.4.1, 11.4.0, 11.3.0, 11.2.1, 11.2.0, 11.1.0, 11.0.0

Fixed In:
12.0.0, 11.6.1, 11.5.1 HF6, 11.4.1 HF9

Opened: May 14, 2014
Severity: 3-Major
Related AskF5 Article:
K16501

Symptoms

AD/LDAP server connection failures might cause the APM apd process to stop processing requests after service is restored. The following symptoms accompany the problem:

- Too many file descriptors open by apd.
- 'Too many open files' error messages in the log file.
- Running qkview to gather diagnostic data reveals information similar to the following in the 'netstat -pano' output:

  tcp  270  0  127.0.0.1:10001  10.10.225.85:53212  ESTABLISHED  12191/apd  off (0.00/0/0)
  tcp  269  0  127.0.0.1:10001  10.10.225.4:56305   ESTABLISHED  12191/apd  off (0.00/0/0)
  tcp  272  0  127.0.0.1:10001  10.10.57.10:57508   CLOSE_WAIT   12191/apd  off (0.00/0/0)
  tcp    0  0  127.1.1.1:56230  127.7.0.1:389       ESTABLISHED  12191/apd  keepalive (1909.72/0/0)

The last line, with the timer 'keepalive (1909.72/0/0)', indicates that apd has been waiting for a response for too long. The other lines, with nonzero Recv-Q values (270, 269, 272), indicate that apd is not reading incoming requests as expected; specifically, the internal worker queue is overloaded because all worker threads are waiting on the one hanging thread.
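As a triage aid, the 'netstat -pano' lines from a qkview can be scanned for apd connections whose keepalive timer has grown large. The following is an illustrative sketch, not an F5-supplied tool; the 180-second threshold is an assumption chosen to match the 3-minute timeout described in the fix.

```python
# Hypothetical helper: flag apd connections in `netstat -pano` output whose
# keepalive timer suggests a hung backend request.
STUCK_AFTER_SECONDS = 180.0

def stuck_apd_connections(netstat_lines):
    stuck = []
    for line in netstat_lines:
        fields = line.split()
        # Expected layout: proto Recv-Q Send-Q local remote state pid/name timer (a/b/c)
        if len(fields) < 9 or "apd" not in fields[6]:
            continue
        timer_name, timer_value = fields[7], fields[8]
        if timer_name != "keepalive":
            continue
        # Timer field looks like "(1909.72/0/0)"; the first number is seconds.
        seconds = float(timer_value.strip("()").split("/")[0])
        if seconds > STUCK_AFTER_SECONDS:
            stuck.append((fields[3], fields[4], seconds))
    return stuck

sample = [
    "tcp 270 0 127.0.0.1:10001 10.10.225.85:53212 ESTABLISHED 12191/apd off (0.00/0/0)",
    "tcp 0 0 127.1.1.1:56230 127.7.0.1:389 ESTABLISHED 12191/apd keepalive (1909.72/0/0)",
]
print(stuck_apd_connections(sample))
# -> [('127.1.1.1:56230', '127.7.0.1:389', 1909.72)]
```

A connection flagged this way, combined with large Recv-Q values on the apd listening port, matches the symptom signature above.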

Impact

Potential connection failures to the backend server.

Conditions

This occurs between the connect and search phases of the AD/LDAP server connection operation, most likely when an AAA server is configured to use a pool as its backend. In that case, apd can always connect locally to the layered virtual server, but the pool monitor checks server availability only at its configured interval, so a request sent to a server that has become unavailable might cause apd to hang.

Workaround

None

Fix Information

Active Directory and LDAP server connection operations now time out after 3 minutes, so one thread cannot block the others, and service recovers as soon as the connection to the backend is restored.
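The idea behind the fix can be sketched generically: give each backend operation a hard deadline so a thread blocked on an unresponsive server fails fast instead of holding the worker queue. This is a minimal illustration using a plain socket timeout, not APM's actual implementation; the 180-second default is taken from the 3-minute timeout described above.

```python
import socket

def read_with_timeout(sock, timeout_seconds=180.0):
    """Return data from sock, or None if the peer stays silent past the deadline."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # Give up on this backend; the worker thread is freed to serve
        # other requests instead of blocking indefinitely.
        return None

# Demonstrate with a connected pair where one side never sends anything.
a, b = socket.socketpair()
print(read_with_timeout(a, timeout_seconds=0.2))
# -> None
```

With a bound timeout, a single unresponsive AD/LDAP backend can delay at most one request per deadline period rather than stalling every worker thread behind it.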

Behavior Change