Bug ID 1021149: Search and data collection can become unhealthy after adding a Data Collection Device (DCD)

Last Modified: Sep 13, 2023

Affected Product(s):
BIG-IQ AppIQ (all modules)

Known Affected Versions:
7.0.0, 7.0.0.1, 7.0.0.2, 7.1.0, 7.1.0.1, 7.1.0.2, 7.1.0.3, 7.1.6, 7.1.6.1, 7.1.7, 7.1.7.1, 7.1.7.2, 7.1.8, 7.1.8.1, 7.1.8.2, 7.1.8.3, 7.1.8.4, 7.1.8.5, 7.1.9, 7.1.9.7, 7.1.9.8, 7.1.9.9, 8.0.0, 8.0.0.1

Fixed In:
8.1.0.1

Opened: May 25, 2021

Severity: 3-Major

Related Article: K68489751

Symptoms

After a DCD is added, Data Collection Devices (DCDs) become unhealthy and log messages similar to the following in /var/log/elasticsearch/eslognode.log:

[2020-07-14T00:04:13,292][INFO ][o.e.d.z.ZenDiscovery ] [xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx] master_left [{xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx}{xxxxxxxxxxx}{xxxxx}{192.0.2.3}{192.0.2.3:9300}{zone=default}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2020-07-14T00:04:13,293][WARN ][o.e.d.z.ZenDiscovery ] [xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes:
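One way to check a device for this symptom is to count the ZenDiscovery "master left" messages in the Elasticsearch log. The following is a minimal sketch, not an F5-provided tool: the function name count_master_left is hypothetical, and the pattern assumes the messages look like the samples above.

```shell
#!/bin/bash
# Hypothetical helper (not part of BIG-IQ): count log lines in which
# ZenDiscovery reports that the master node left the cluster.
# Matches both the "master_left" and "master left" spellings.
count_master_left() {
    local logfile=$1
    grep -cE 'ZenDiscovery.*master[_ ]left' "$logfile"
}

# Example call on a BIG-IQ or DCD (path taken from the Symptoms text):
#   count_master_left /var/log/elasticsearch/eslognode.log
```

A count that keeps growing after a DCD was added suggests the cluster is repeatedly losing its master, which matches the symptom described here.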

Impact

Data Collection Devices are not healthy.

Conditions

BIG-IQ and the Data Collection Devices (DCDs) maintain long-lived TCP connections on port 9300. These connections send keepalive probes only after being idle for 299 seconds. If an intermediate device times out idle TCP connections in less than 299 seconds, it drops the connection before the first probe is sent, and the Elasticsearch (ES) cluster experiences stability problems.
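The condition comes down to comparing two numbers: the keepalive idle time on the BIG-IQ or DCD, and the idle timeout of whatever device sits between them. The sketch below illustrates that comparison. The function name keepalive_check and the 180-second timeout are illustrative assumptions, and a timeout exactly equal to the keepalive time is conservatively treated as at risk.

```shell
#!/bin/bash
# Hypothetical helper: report whether the first keepalive probe is sent
# before an intermediate device's idle timeout expires.
# Usage: keepalive_check <keepalive_time_s> <device_idle_timeout_s>
keepalive_check() {
    local keepalive_time=$1 idle_timeout=$2
    if [ "$keepalive_time" -lt "$idle_timeout" ]; then
        echo "ok: first probe at ${keepalive_time}s, before the ${idle_timeout}s idle timeout"
    else
        echo "at risk: idle timeout (${idle_timeout}s) expires before the first probe (${keepalive_time}s)"
    fi
}

# 299s keepalive time against an assumed 180s idle timeout.
keepalive_check 299 180
# 20s keepalive time against the same assumed idle timeout.
keepalive_check 20 180
```

On a live system, the first argument can be read with `sysctl -n net.ipv4.tcp_keepalive_time`. The idle timeout has to come from the configuration of the intermediate device.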

Workaround

On each BIG-IQ and DCD, perform the following steps:

1. Edit the file /etc/bigstart/scripts/elasticsearch and modify the following setting to meet the needs of the environment:

   sysctl -w net.ipv4.tcp_keepalive_time=299 \
   net.ipv4.tcp_keepalive_intvl=60 \

2. Set net.ipv4.tcp_keepalive_time to the number of seconds a connection is idle before keepalives are sent.

3. Set net.ipv4.tcp_keepalive_intvl to the interval between keepalives.

   Example values that work in most environments:

   sysctl -w net.ipv4.tcp_keepalive_time=20 \
   net.ipv4.tcp_keepalive_intvl=20 \

4. To set the values type the following commands:

   Note: The values set in /etc/bigstart/scripts/elasticsearch take effect on the next boot, but will not persist after an upgrade.

   sysctl -w net.ipv4.tcp_keepalive_time=20
   sysctl -w net.ipv4.tcp_keepalive_intvl=20

See also https://support.f5.com/csp/article/K68489751
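The runtime commands in the workaround can be wrapped in a small helper so that the same values are used on every device. This is a sketch only: the function set_keepalive and its dry-run/apply modes are hypothetical. By default it just prints the sysctl commands, and it runs them only when "apply" is passed, which requires root.

```shell
#!/bin/bash
# Hypothetical helper: print (default) or apply the two keepalive settings.
# Usage: set_keepalive <keepalive_time_s> <keepalive_intvl_s> [apply]
set_keepalive() {
    local time_s=$1 intvl_s=$2 mode=${3:-dry-run}
    local setting
    for setting in "net.ipv4.tcp_keepalive_time=${time_s}" \
                   "net.ipv4.tcp_keepalive_intvl=${intvl_s}"; do
        if [ "$mode" = "apply" ]; then
            sysctl -w "$setting"          # changes the running kernel value
        else
            echo "sysctl -w $setting"     # dry run: only show the command
        fi
    done
}

# Dry run with the example values from the workaround.
set_keepalive 20 20
```

After applying, `sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl` shows the values currently in effect. As noted above, the edit to /etc/bigstart/scripts/elasticsearch is still needed for the values to be set at boot, and it must be repeated after an upgrade.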

Fix Information

This issue is fixed and the DCD no longer becomes unhealthy.

Behavior Change
