Bug ID 744267: Database size keeps growing and /var disk space fills up quickly

Last Modified: Nov 07, 2022


Affected Product:
BIG-IQ Platform(all modules)

Known Affected Versions:

Opened: Sep 17, 2018
Severity: 3-Major
Related Article:


The records in the AlertsIgnoredCollectionWorker do not expire, so the database keeps growing. This causes the space in /var to fill up.


As /var keeps filling and the database keeps growing, the system might eventually run out of disk space, which can disrupt BIG-IQ system activities.


This happens on BIG-IQ systems with one or more data collection devices (DCDs).


**** Step 1

To slow the rate of alerts, complete this procedure:

1) Save a copy of the existing /var/config/rest/config/restjavad.properties.json file.

2) Edit the file restjavad.properties.json, locate the 'policymgmt' property, and change the values as shown in the following list:

    "policymgmt": {
        "groomingIntervalSec": 259200,
        "alertsQueryFrequencyMS": 3600000,
        "alertsQueryTimeFilterMS": 172800000,
        "alertsRequestLimit": 1
    },

Note: Systems that have been upgraded from v5.x might not have the 'policymgmt' properties in the file. In that case, insert them right after the 'fileObjectGroomer' property, and make sure to add the relevant commas to separate objects and preserve the JSON structure.

3) To verify that the file loads properly, save the file and issue the following command:

    jq .platform /var/config/rest/config/restjavad.properties.json

Note: If this command returns an error, repeat steps 2 and 3 to correct the errors before proceeding.

4) Issue the following command:

    tmsh restart sys service restjavad

Completing these steps slows the alerts so that the size of /var does not continue to increase rapidly. You can now proceed with removing any unnecessary objects and possibly shrinking the database by re-indexing it.

**** Step 2

To remove the records, complete this procedure:

1) Create a file on the BIG-IQ console, e.g., /shared/tmp/remove_alerts, and add the following script to it:

    for (var i = db.bigiqLiveObjects.find({'_value.kind':'cm:shared:policymgmt:alerts-ignored:alertsignoredstate'}).count(); i >= 0; ) {
        print(i);
        var removeIdsArray = db.bigiqLiveObjects.find({'_value.kind':'cm:shared:policymgmt:alerts-ignored:alertsignoredstate'}, {_id : 1}).limit(100000).toArray().map(function(doc) { return doc._id; });
        db.bigiqLiveObjects.remove({_id: {$in: removeIdsArray}});
        i -= 100000;
    }

2) Save the file.

3) Initiate the script from bash using the following command:

    mongo bigiqDb /shared/tmp/remove_alerts

Note: Before proceeding to Step 3, rerun this script as many times as needed to ensure that the count is at, or near, zero. If there is still a large number of records (more than 10,000), it might indicate that Step 1 did not complete properly, so the number of records is still growing rapidly. In this case, repeat Step 1 and Step 2 before continuing.

**** Step 3

After Steps 1 and 2 complete and the number of records stays at or near zero, it is recommended that you re-index the database to reclaim excess space in /var.

IMPORTANT: This process requires network downtime, so you should perform the following steps during a maintenance window.

Using BIG-IQ Configuration Management, complete the following procedure:

1. Create a backup of the current database:
   - mkdir /var/tmp/toku_backup
   - dump-rest-storage /var/tmp/toku_backup

2. Stop services:
   - tmsh stop sys service tokumond restjavad

3. Create a copy of the bigiqLiveObjects collection:

    # mongo
    TokuMX mongo shell v2.0.3-mongodb-2.4.10
    ...
    bigiq0:PRIMARY> use bigiqDb
    switched to db bigiqDb
    bigiq0:PRIMARY> db.bigiqLiveObjects.copyTo('bigiqLiveObjectsBackup')
    [number of documents copied]

4. Re-index the collection and, if successful, delete the backup:

    bigiq0:PRIMARY> db.bigiqLiveObjects.reIndex()
    [response in json format - look for <"ok" : 1> to confirm success]
    bigiq0:PRIMARY> db.bigiqLiveObjectsBackup.drop()
    true

Note: While step 4 is executing, use a secondary bash shell to start mongo and check the progress using db.currentOp(). The 'message' line shows a progress percentage for each of the 12 indices, one index at a time. Once all of the indices are complete, the system posts the message: "ok" : 1.

5. Exit the mongo CLI:

    bigiq0:PRIMARY> quit()

6. After waiting for 5 minutes, restart services:
   - tmsh start sys service tokumond restjavad
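
The hand edit in Step 1, item 2 is easy to get wrong, particularly on upgraded systems where 'policymgmt' has to be inserted after 'fileObjectGroomer' without breaking the JSON. The following is a minimal sketch of that edit as a JavaScript function. It is not an F5-provided tool. The helper name applyPolicymgmt is hypothetical, and the sketch assumes that 'policymgmt' and 'fileObjectGroomer' are top-level keys of the parsed file, which the article does not confirm. It also assumes a JavaScript runtime such as Node.js on a workstation; availability on the BIG-IQ console is not assumed.

```javascript
// Values copied from Step 1, item 2 of the workaround.
const POLICYMGMT = {
  groomingIntervalSec: 259200,
  alertsQueryFrequencyMS: 3600000,
  alertsQueryTimeFilterMS: 172800000,
  alertsRequestLimit: 1,
};

// Hypothetical helper: returns a new config object with 'policymgmt' set to
// the workaround values. ASSUMPTION: 'policymgmt' and 'fileObjectGroomer'
// are top-level keys of restjavad.properties.json.
function applyPolicymgmt(config) {
  if ('policymgmt' in config) {
    // Property exists: overwrite the four values, keep any other settings.
    return { ...config, policymgmt: { ...config.policymgmt, ...POLICYMGMT } };
  }
  // Property missing (e.g. upgraded from v5.x): insert it right after
  // 'fileObjectGroomer', or at the end if that key is not present either.
  const result = {};
  let inserted = false;
  for (const [key, value] of Object.entries(config)) {
    result[key] = value;
    if (key === 'fileObjectGroomer') {
      result.policymgmt = { ...POLICYMGMT };
      inserted = true;
    }
  }
  if (!inserted) {
    result.policymgmt = { ...POLICYMGMT };
  }
  return result;
}
```

One way to use it is to parse a saved copy of the file with JSON.parse, pass the result to applyPolicymgmt, and write the output back with JSON.stringify(result, null, 4). Because the output comes from a serializer, the commas and braces are well formed, but the jq check in Step 1, item 3 is still worth running before restarting restjavad.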

Fix Information


Behavior Change