Bug ID 909205: BIG-IQ statistics reports are missing the latest event data

Last Modified: Nov 07, 2022


Affected Product:
BIG-IQ AppIQ (all modules)

Known Affected Versions:
7.0.0, 7.0.0.1, 7.0.0.2, 7.1.0, 7.1.0.1, 7.1.0.2, 7.1.0.3, 7.1.6, 7.1.6.1, 7.1.7, 7.1.7.1, 7.1.7.2, 7.1.8, 7.1.8.1, 7.1.8.2, 7.1.8.3, 7.1.8.4, 7.1.8.5, 7.1.9, 7.1.9.7, 7.1.9.8, 7.1.9.9, 8.0.0

Fixed In:
8.0.0.1

Opened: May 14, 2020
Severity: 3-Major

Symptoms

BIG-IQ statistics reports are missing the latest event data.

Impact

This issue results in a loss of statistics data.

Conditions

This occurs when administrators manually delete the default system-generated Elasticsearch indices. It can also occur when accumulating events/logs fill the disk and BIG-IQ runs out of disk space.

Workaround

The workaround involves repairing the Elasticsearch indices that store statistics data.

To identify the statistics that are impacted:

- List all index names and index aliases:
    # curl -s 'localhost:9200/_cat/indices?v'
    # curl -s 'localhost:9200/_cat/aliases?v'
- Run the command below (or create a periodic task) to detect affected/corrupted Elasticsearch cluster indices:
    # curl -s 'localhost:9200/_cat/indices?h=index' | grep _writer
- Every index listed by the command above requires the repair procedure outlined in the remainder of this article, because an index name is not expected to have the '_writer' suffix (only index alias names have the '_writer' suffix). For example, assume it reports youraffectedindex1_writer and youraffectedindex2_writer.
- For every index reported above, you can see the size of the statistics data accumulated under it (this is the amount of data not being reported in the BIG-IQ GUI):
    # curl -s localhost:9200/_cat/indices | grep youraffectedindex1_writer
    # curl -s localhost:9200/_cat/indices | grep youraffectedindex2_writer

If the statistics data in the impacted/corrupt indices is non-critical (and permanent deletion is acceptable), follow Procedure A. If you need to preserve the corrupt data, follow Procedure B. Both procedures repair the impacted Elasticsearch indices.

------------ Procedure A ------------

Use this procedure to permanently delete the corrupt data and repair the impacted indices.

1. Deactivate the impacted service.
- On the BIG-IQ CM, navigate to System -> BIG-IQ DATA COLLECTION -> BIG-IQ Data Collection Devices.
- Deactivate the impacted service (based on the indices reported earlier) for all DCDs.
  Ex: Access / DOS Protection / Fraud Protection Service / IPSec / Network Security / Web Application Security
- Wait 5 minutes to allow existing connections to close.
- On the BIG-IQ DCD, verify that all external connections to the special ports are closed:
    # netstat -an | grep -E "9997|8018|8514|8020|8008"

2. Delete the impacted Elasticsearch index.
- On the CM (or DCD), remove the incorrect index from Elasticsearch, ex:
    # curl -sX DELETE localhost:9200/youraffectedindex1_writer
  Note: the command above should return {"acknowledged":true}
- Confirm that the index is no longer reported by the command below:
    # curl -s localhost:9200/_cat/indices | grep _writer

3. Reactivate the service.
- On the BIG-IQ CM, navigate to System -> BIG-IQ DATA COLLECTION -> BIG-IQ Data Collection Devices and activate the impacted services again.

------------ Procedure B ------------

Use this procedure to recover the corrupt data and repair the impacted indices.

1. Deactivate the impacted service.
- See step #1 in Procedure A.

2. Create a temporary Elasticsearch index.
- Identify the naming convention (including the date suffix) by observing the output of the command below:
    # curl -s 'localhost:9200/_cat/indices?h=index'
  ex: youraffectedindex1_2020-xx-xxtxx-xx-xx-xxxx
- Create a temporary index using the name identified in the step above:
    # curl localhost:9200/youraffectedindex1_2020-xx-xxtxx-xx-xx-xxxx -X PUT -d '{}'
  ex:
    # curl localhost:9200/afmlogindex_2020-02-01t10-10-10-0100 -X PUT -d '{}'

3. Reindex data from the corrupt index to the temporary Elasticsearch index.
- This may take time, depending on the size of the data in the affected index.
- Begin the reindexing process:
    # curl -s 'localhost:9200/_reindex?wait_for_completion=false' -d '{"source":{"index":"youraffectedindex1_writer"},"dest":{"index":"youraffectedindex1_2020-xx-xxtxx-xx-xx-xxxx"}}' | jq .
  ex:
    # curl -s 'localhost:9200/_reindex?wait_for_completion=false' -d '{"source":{"index":"afmlogindex_writer"},"dest":{"index":"afmlogindex_2020-02-01t10-10-10-0100"}}' | jq .
- The command above prints a task identifier, ex:
    "task": "rur5BcBNTGqdtydEDjHMAA:22687961"
- Use the reported task identifier to query the progress of the task repeatedly until it completes. The task is complete when the command prints true:
    # curl -s localhost:9200/_tasks/rur5BcBNTGqdtydEDjHMAA:22687961 | jq .completed
    true
- Verify the index status:
    # curl -s 'localhost:9200/_cat/indices?v'
    # curl -s 'localhost:9200/_cat/aliases?v'

4. Remove the corrupted Elasticsearch index.
- See step #2 in Procedure A.

5. Create a new *_writer alias for the temporary Elasticsearch index.
- Run the command below to create an alias for your newly created index, named after the index you just deleted:
    # curl localhost:9200/_aliases -X POST -d '{"actions" : [ {"add" : { "index" : "youraffectedindex1_2020-xx-xxtxx-xx-xx-xxxx" , "alias" : "youraffectedindex1_writer" }} ] }'
  ex:
    # curl localhost:9200/_aliases -X POST -d '{"actions" : [ {"add" : { "index" : "afmlogindex_2020-02-01t10-10-10-0100" , "alias" : "afmlogindex_writer" }} ] }'
- Verify the index status:
    # curl -s 'localhost:9200/_cat/indices?v'
    # curl -s 'localhost:9200/_cat/aliases?v'

6. Reactivate the impacted service.
- See step #3 in Procedure A.

7. Confirm the newly initialized Elasticsearch indices.
- Verify the index status:
    # curl -s 'localhost:9200/_cat/indices?v'
    # curl -s 'localhost:9200/_cat/aliases?v'
- Confirm that you see a new index named youraffectedindex1_YYYY_MM_DD and that youraffectedindex1_writer points to this latest index instance.
- This happens because indices are eventually rotated and removed (the timing also depends on the retention policy).
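
The workaround mentions that the detection command can be run as a periodic task. A minimal sketch of such a check follows. It assumes Elasticsearch is reachable at localhost:9200, as in the commands in this article. The function names and the ES_URL variable are hypothetical and are not part of BIG-IQ.

```shell
#!/bin/bash
# Sketch only: report concrete indices whose names wrongly end in "_writer".
# ES_URL and all function names are illustrative assumptions, not BIG-IQ tooling.
ES_URL="${ES_URL:-localhost:9200}"

# Read index names on stdin (one per line) and print only those whose
# name ends in "_writer". awk also drops any padding around the name.
filter_writer_indices() {
    awk '$1 ~ /_writer$/ { print $1 }'
}

# Ask Elasticsearch for index names only (aliases are not listed by
# _cat/indices), then keep the names that carry the "_writer" suffix.
list_affected_indices() {
    curl -s "${ES_URL}/_cat/indices?h=index" | filter_writer_indices
}

# Print a short report. Returns 1 when at least one index needs repair,
# so a scheduler or wrapper script can alert on a non-zero exit status.
check_writer_indices() {
    local affected
    affected="$(list_affected_indices)"
    if [ -n "$affected" ]; then
        echo "Indices that need repair:"
        echo "$affected"
        return 1
    fi
    echo "No affected indices found."
    return 0
}
```

To use it, source the file and call check_writer_indices from a scheduled job. A non-zero exit status indicates that at least one index needs Procedure A or Procedure B. The sketch only detects the condition and does not modify any index.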

Fix Information

Statistics and event data display as expected.

Behavior Change