Bug ID 907321: Elasticsearch instance on the BIG-IQ CM generates an OutOfMemory error

Last Modified: Oct 06, 2020

Affected Product: BIG-IQ AppIQ (all modules)

Opened: May 07, 2020
Severity: 3-Major

Symptoms

Under certain conditions, the Elasticsearch instance on the BIG-IQ CM might run out of memory, and the following error appears in the Elasticsearch log:

    org.elasticsearch.ElasticsearchException: java.lang.OutOfMemoryError: GC overhead limit exceeded
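
One way to confirm the symptom is to scan a copy of the Elasticsearch log for the error string. The following is a minimal sketch, not an F5 tool: the function name find_oom_lines is hypothetical, and the log file location is not stated in this report, so the caller must supply the lines.

```python
# Marker taken from the error text quoted in the Symptoms section.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines that mention an OutOfMemoryError.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the Elasticsearch log on your system and pass the file object to find_oom_lines; an empty result means the marker was not found in that file.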

Impact

The Elasticsearch cluster is unstable and prone to repeated restarts.

Conditions

For this issue to occur, there must be a very large volume of traffic to your BIG-IP systems.

1. Set up a large-scale BIG-IQ system with 3 data collection devices (DCDs) monitoring dozens of BIG-IP devices.
2. Generate a large amount of statistical data collection by sending a high volume of traffic to the BIG-IP devices in your system.

Workaround

If your configuration includes an active/standby pair for your BIG-IQ CM setup, you must perform this procedure on both the active and standby devices.

1. On the BIG-IQ CM node, open the provisioning file for editing:

    vi /etc/biq_daemon_provision.json

2. Edit the restjavad memory allocation, changing the SYS_32GB value from "12700m" to "10300m". The "-->>>" note marks the line to edit and is not part of the file:

    "big_iq": {
        "restjavad": {
            "active": true,
            "memory_allocation": {
                "SYS_4GB": "800m",
                "SYS_8GB": "3500m",
                "SYS_16GB": "6000m",
                "SYS_32GB": "12700m",    -->>> change this to "10300m"
                "SYS_64GB": "20000m",
                "SYS_128GB": "20000m"
            },
            "new_ratio": {
                "SYS_32GB": "1"
            }
        },

3. Edit the elasticsearch memory allocation (under "big_iq"), increasing the SYS_32GB value from "1600m" to "4000m":

        },
        "elasticsearch": {
            "active": true,
            "memory_allocation": {
                "SYS_4GB": "100m",
                "SYS_8GB": "200m",
                "SYS_16GB": "500m",
                "SYS_32GB": "1600m",    -->>> increase this to "4000m"
                "SYS_64GB": "3200m",
                "SYS_128GB": "6400m"
            }
        },

4. Restart restjavad:

    bigstart restart restjavad

5. Restart Elasticsearch:

    bigstart restart elasticsearch

6. Reduce the number of shards in the Elasticsearch cluster by running the following API calls one at a time on the BIG-IQ CM console:

    curl localhost:8898/mgmt/ap/v1/platform-config/resources/ap:es:time_based_index/es-index -H "Content-Type: application/json;charset=UTF-8" -X PATCH -d \
    "{\"indexFamily\":\"statistics\",\"deploymentTarget\":\"any\",\"id\":\"es-index-statistics-tl0\",\"ownerGroup\":\"appiq\",\"kind\":\"ap:es:time_based_index\",\"retentionTime\":\"PT10H\",\"indexLevel\":\"tl0\",\"aggregationPeriod\":\"PT30S\",\"dateTimeFormat\":\"YYYY-DDD-HH\",\"rotationPeriod\":\"PT1H\",\"settings\":{\"index.number_of_replicas\": 1 , \"index.number_of_shards\": 3 }}"

    curl localhost:8898/mgmt/ap/v1/platform-config/resources/ap:es:time_based_index/es-index -H "Content-Type: application/json;charset=UTF-8" -X PATCH -d \
    "{\"indexFamily\":\"statistics\",\"deploymentTarget\":\"any\",\"id\":\"es-index-statistics-tl1\",\"ownerGroup\":\"appiq\",\"kind\":\"ap:es:time_based_index\",\"retentionTime\":\"P7D\",\"indexLevel\":\"tl1\",\"aggregationPeriod\":\"PT1H\",\"dateTimeFormat\":\"YYYY-DDD-HH\",\"rotationPeriod\":\"P1D\",\"settings\":{\"index.number_of_replicas\": 1 , \"index.number_of_shards\": 3 }}"

    curl localhost:8898/mgmt/ap/v1/platform-config/resources/ap:es:time_based_index/es-index -H "Content-Type: application/json;charset=UTF-8" -X PATCH -d \
    "{\"indexFamily\":\"statistics\",\"deploymentTarget\":\"any\",\"id\":\"es-index-statistics-tl2\",\"ownerGroup\":\"appiq\",\"kind\":\"ap:es:time_based_index\",\"retentionTime\":\"P180D\",\"indexLevel\":\"tl2\",\"aggregationPeriod\":\"P1D\",\"dateTimeFormat\":\"YYYY-DDD-HH\",\"rotationPeriod\":\"P30D\",\"settings\":{\"index.number_of_replicas\": 1 , \"index.number_of_shards\": 3 }}"

    curl localhost:8898/mgmt/ap/v1/platform-config/resources/ap:es:time_based_index/es-index -H "Content-Type: application/json;charset=UTF-8" -X PATCH -d \
    "{\"indexFamily\":\"statistics\",\"deploymentTarget\":\"any\",\"id\":\"es-index-statistics-tl3\",\"ownerGroup\":\"appiq\",\"kind\":\"ap:es:time_based_index\",\"retentionTime\":\"P365D\",\"indexLevel\":\"tl3\",\"aggregationPeriod\":\"P30D\",\"dateTimeFormat\":\"YYYY-DDD-HH\",\"rotationPeriod\":\"P180D\",\"settings\":{\"index.number_of_replicas\": 1 , \"index.number_of_shards\": 3 }}"

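Steps 1 through 3 of the workaround amount to changing two SYS_32GB values in the provisioning file. The sketch below shows that change applied to the parsed JSON rather than by hand. It is an illustration under assumptions, not an F5-supplied script: the function name apply_memory_workaround is hypothetical, and the code assumes the file parses as plain JSON with the "big_iq" layout shown in the workaround excerpts.

```python
import copy

# Target values for the SYS_32GB tier, taken from the workaround steps.
RESTJAVAD_SYS_32GB = "10300m"
ELASTICSEARCH_SYS_32GB = "4000m"


def apply_memory_workaround(config):
    """Return a copy of the provisioning config with both SYS_32GB values changed.

    Assumed layout (from the workaround excerpts):
    config["big_iq"][<daemon>]["memory_allocation"]["SYS_32GB"]
    The input dictionary is left unmodified.
    """
    updated = copy.deepcopy(config)
    big_iq = updated["big_iq"]
    big_iq["restjavad"]["memory_allocation"]["SYS_32GB"] = RESTJAVAD_SYS_32GB
    big_iq["elasticsearch"]["memory_allocation"]["SYS_32GB"] = ELASTICSEARCH_SYS_32GB
    return updated
```

A cautious way to use this is to back up /etc/biq_daemon_provision.json, load it with json.load, pass the result to apply_memory_workaround, compare the output with the original, and only then write it back with json.dump. The restarts in steps 4 and 5 are still required afterward, and rewriting the file this way may change its whitespace and formatting.
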
Fix Information

None

Behavior Change