Bug ID 895845: Implement automatic conflict resolution for gossip-conflicts in REST

Last Modified: Jul 29, 2020

Affected Product:
BIG-IP TMOS(all modules)

Known Affected Versions:
12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1, 12.1.5, 12.1.5.1, 12.1.5.2, 13.0.0, 13.0.0 HF1, 13.0.0 HF2, 13.0.0 HF3, 13.0.1, 13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 13.1.3.3, 13.1.3.4, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4, 14.1.2.5, 14.1.2.6, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.0, 15.1.0.1, 15.1.0.2, 15.1.0.3, 15.1.0.4, 16.0.0

Opened: Apr 03, 2020
Severity: 3-Major

Symptoms

The devices in a high availability (HA) environment are out of sync in unexpected ways: config sync status indicates 'In Sync', but iApps such as SSL Orchestrator are out of sync.

Impact

If there are gossip conflicts, the devices require manual intervention to get back in sync.

Conditions

-- High availability (HA) environment with two or more devices.
-- Gossip used for config sync. (Note: Gossip sync is used by BIG-IQ for BIG-IP config sync by iAppLX.)
-- A gossip conflict occurs for some reason.

You can detect gossip conflicts at the following iControl REST endpoint:
/mgmt/shared/gossip-conflicts

You can check gossip sync status at the following iControl REST endpoint:
/mgmt/shared/gossip
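
The two endpoints above can also be queried from a script. The following is a minimal Python sketch, not an F5-provided tool. The host name, credentials, and the use of HTTP basic authentication are assumptions for illustration, and the shape of the returned JSON is not described in this article, so the sketch only returns the parsed body without interpreting it.

```python
import base64
import json
import ssl
import urllib.request

# Endpoint paths taken from the Conditions section of this article.
GOSSIP_CONFLICTS_PATH = "/mgmt/shared/gossip-conflicts"
GOSSIP_STATUS_PATH = "/mgmt/shared/gossip"


def endpoint_url(host: str, path: str) -> str:
    """Build the full HTTPS URL for an iControl REST path on a device."""
    return f"https://{host}{path}"


def fetch_json(host: str, path: str, user: str, password: str,
               verify_tls: bool = True):
    """GET a REST path and return the parsed JSON body.

    Assumption: the device accepts HTTP basic authentication for this
    request. Set verify_tls=False only for lab devices that use
    self-signed certificates.
    """
    request = urllib.request.Request(endpoint_url(host, path))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

    context = ssl.create_default_context()
    if not verify_tls:
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

For example, `fetch_json("bigip1.example.com", GOSSIP_CONFLICTS_PATH, "admin", "<password>")` would retrieve the conflict list from a hypothetical device named `bigip1.example.com`. Running the same call against each HA peer lets you compare what each device reports.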

Workaround

When two devices are out of sync with different generation numbers due to a gossip conflict, you can use the following guidance to resolve the conflict:

1. Update the devices' info to use the same generation number.
2. This info is found on the REST Storage worker. The Storage worker uses the selflink plus a generation number as the key to a given set of data.
3. Add the data from the unit with the highest generation number to the other unit.
4. Take care to increase the generation number on the new data to match that of the highest generation.

Commands used:

1. Look for GENERATION_MISSING and gossip-conflict objects:
   tmsh list mgmt shared gossip-conflicts

2. Get the 'selflink in remoteState' attribute. This self link is the same across all devices. Check it in a browser against each device to discover which device is on the highest generation number:
   tmsh list mgmt shared gossip-conflicts <OBJECT_ID>

3. Now that you know which device contains the most recent version of your data, run this command to get the up-to-date data:
   restcurl /shared/storage?key=<everything after 'https://localhost/mgmt/' on selflink>

4. Make a POST to the out-of-date device that includes the info from the up-to-date device as the POST body:
   restcurl -X POST /shared/storage -d '{<data from above command>}'
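
The key derivation and device selection in the workaround can be sketched in Python as follows. This is an illustrative helper, not part of the product. The device names, the example self link, and the `generation` field in the sample data are hypothetical. The sketch only builds the `restcurl` command strings shown above; it does not run them.

```python
import json
import shlex
from typing import Dict

# Prefix that the workaround says to strip from the self link.
MGMT_PREFIX = "https://localhost/mgmt/"


def storage_key(self_link: str) -> str:
    """Return everything after 'https://localhost/mgmt/' in the self link."""
    if not self_link.startswith(MGMT_PREFIX):
        raise ValueError(f"self link does not start with {MGMT_PREFIX!r}")
    return self_link[len(MGMT_PREFIX):]


def newest_device(generations: Dict[str, int]) -> str:
    """Pick the device that reports the highest generation number."""
    return max(generations, key=generations.get)


def get_command(self_link: str) -> str:
    """Build the restcurl command that reads the up-to-date data."""
    return "restcurl " + shlex.quote(
        f"/shared/storage?key={storage_key(self_link)}"
    )


def post_command(data: dict) -> str:
    """Build the restcurl command that posts data to the stale device."""
    return "restcurl -X POST /shared/storage -d " + shlex.quote(
        json.dumps(data)
    )
```

With generation numbers collected by hand, for example `{"bigip-a": 5, "bigip-b": 7}`, `newest_device` returns `"bigip-b"`. You would run the output of `get_command` on that device, then run the output of `post_command` on the out-of-date device with the retrieved data.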

Fix Information

None

Behavior Change