Bug ID 1100185: External Storage Snapshots fail when Data Collection Devices have performance issues

Last Modified: Apr 26, 2022

Affected Product:
BIG-IQ Device (all modules)

Known Affected Versions:
7.1.0, 7.1.0.1, 7.1.0.2, 7.1.0.3, 7.1.6, 7.1.6.1, 7.1.7, 7.1.7.1, 7.1.7.2, 7.1.8, 7.1.8.1, 7.1.8.2, 7.1.8.3, 7.1.8.4, 7.1.8.5, 7.1.9, 7.1.9.7, 7.1.9.8, 7.1.9.9, 8.1.0, 8.1.0.1, 8.1.0.2

Opened: Apr 18, 2022
Severity: 3-Major

Symptoms

A scheduled snapshot fails, and /var/log/restjavad.x.log contains the following message:

[WARN][xxx][/cm/shared/event/alerts AlertCollectionWorker] Data Collection device data snapshot xxxxxxxz has failed because the following error occurred, null
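To check whether a log contains this symptom, the lines can be scanned for the message pattern. The following is a minimal sketch in Python. It assumes the message keeps the shape quoted above, and the function name find_failed_snapshots and the regular expression are illustrative, not part of BIG-IQ.

```python
import re

# Pattern derived from the message quoted in Symptoms. The exact log layout
# is an assumption; adjust the expression if your log lines differ.
FAILURE_PATTERN = re.compile(
    r"\[WARN\].*AlertCollectionWorker\]\s+"
    r"Data Collection device data snapshot (\S+) has failed"
)


def find_failed_snapshots(lines):
    """Return the snapshot names found in failure messages, in order."""
    failed = []
    for line in lines:
        match = FAILURE_PATTERN.search(line)
        if match:
            failed.append(match.group(1))
    return failed


if __name__ == "__main__":
    sample = [
        "[WARN][xxx][/cm/shared/event/alerts AlertCollectionWorker] "
        "Data Collection device data snapshot xxxxxxxz has failed because "
        "the following error occurred, null",
        "[INFO][xxx] unrelated line",
    ]
    print(find_failed_snapshots(sample))
```

In practice the lines would come from reading the relevant restjavad log file on the BIG-IQ system.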

Impact

Scheduled snapshots to external storage fail.

Conditions

When a BIG-IQ Data Collection Device (DCD) has performance issues, some scheduled snapshot requests to external storage can take more than 30 seconds to initialize. This can cause the snapshot to fail.

Workaround

NOTE: The following workaround is for BIG-IQ 8.2.x and does not work on older BIG-IQ versions (7.x/8.1.x).

Edit /var/config/rest/config/restjavad.properties.json and set two timeout properties to work around the issue. Choose the value based on how long the snapshots actually take to finish. Even though BIG-IQ reports the snapshot as failed, the snapshot still completes in the background, so you can measure how long it takes by comparing the scheduled time with the time the snapshot is created.

For example, if the snapshot takes 45 minutes to complete, set a larger timeout value in the config file, such as 1 hour:

1 hr = 3600 s = 3600000 ms

Set that value in two entries: apacheAsyncClient:socketTimeoutMillis and elasticsearch:esRestOperationTimeoutMillis. Note the lowercase 's' in elasticsearch.

After the changes, the BIG-IQ config file should look like this:

{
    "platform" : {
        "apacheAsyncClient" : {
            ....
            "socketTimeoutMillis": "3600000"     <==== For REST & Elasticsearch "Sockets"
        },
        "elasticsearch" :     <==== NOTE: change in casing - lower case 's' in elasticsearch
        {
            "esRestOperationTimeoutMillis": "3600000"     <==== For REST framework's ES inter-worker call timeouts
        },
        ...
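The timeout arithmetic and the two property updates from the workaround can be sketched in Python as follows. This is a minimal illustration, not an F5-provided tool. The helper names timeout_millis and apply_timeouts, the 25% headroom factor, and rounding up to a whole hour are assumptions chosen to reproduce the 45-minute example. The property names and the string-valued settings come from the workaround text.

```python
import json
import math


def timeout_millis(observed_minutes, headroom=1.25):
    """Add headroom to the observed snapshot duration and round up to a
    whole hour, returned in milliseconds. Headroom and hour rounding are
    illustrative assumptions, not F5 guidance."""
    hours = math.ceil(observed_minutes * headroom / 60)
    return hours * 3600 * 1000


def apply_timeouts(config, millis):
    """Set both timeout entries under "platform", keeping existing keys.
    Values are stored as strings, as in the workaround's example."""
    platform = config.setdefault("platform", {})
    platform.setdefault("apacheAsyncClient", {})["socketTimeoutMillis"] = str(millis)
    # Note the lowercase 's' in "elasticsearch".
    platform.setdefault("elasticsearch", {})["esRestOperationTimeoutMillis"] = str(millis)
    return config


if __name__ == "__main__":
    # In-memory stand-in for the parsed restjavad.properties.json;
    # "someExistingKey" is a hypothetical placeholder for existing settings.
    config = {"platform": {"apacheAsyncClient": {"someExistingKey": "value"}}}
    apply_timeouts(config, timeout_millis(45))
    print(json.dumps(config, indent=4))
```

The sketch works on a parsed dictionary rather than editing the file in place, so it can be tried against a copy of the configuration before any change is made on the BIG-IQ system.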

Fix Information

None

Behavior Change