Bug ID 1017029: SASP monitor does not identify specific cause of failed SASP Registration attempt

Last Modified: Jul 23, 2021


Affected Product:
BIG-IP LTM (all modules)

Known Affected Versions:
13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 13.1.3.3, 13.1.3.4, 13.1.3.5, 13.1.3.6, 13.1.4, 13.1.4.1, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4, 14.1.2.5, 14.1.2.6, 14.1.2.7, 14.1.2.8, 14.1.3, 14.1.3.1, 14.1.4, 14.1.4.1, 14.1.4.2, 14.1.4.3, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.0, 15.1.0.1, 15.1.0.2, 15.1.0.3, 15.1.0.4, 15.1.0.5, 15.1.1, 15.1.2, 15.1.2.1, 15.1.3, 15.1.3.1, 16.0.0, 16.0.0.1, 16.0.1, 16.0.1.1, 16.0.1.2, 16.1.0

Opened: May 07, 2021
Severity: 3-Major

Symptoms

On affected BIG-IP versions, the SASP monitor sends a single Registration Request to the SASP GWM (Group Workload Manager) at startup to initiate monitoring of the configured LTM pool members. This Registration Request contains all configured LTM pools (SASP Groups) and their members (SASP Group Members). If the SASP GWM encounters an error with any one of the SASP Groups in the request, registration of all groups fails. However, the GWM does not indicate *which* Group or member does not match its configuration, which hinders troubleshooting. As a result, the BIG-IP system cannot identify the specific pool, member, or monitor whose misconfiguration caused the failed SASP Registration attempt.
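The all-or-nothing registration described above can be modeled in a few lines. The following Python is a simplified, hypothetical sketch, not F5 or SASP GWM code: `Group`, `RegistrationReply`, `gwm_register`, and `member_states` are illustrative names, the GWM's matching rule is an assumption, and the 'InvalidGroup' string is borrowed from the log example in the Workaround. It shows why a single error for the whole request leaves the monitor unable to name the misconfigured pool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model of the all-or-nothing SASP registration described above.
# None of these names are F5 or SASP GWM APIs; they are illustrative only.


@dataclass(frozen=True)
class Group:
    name: str                 # an LTM pool (SASP Group)
    members: Tuple[str, ...]  # its pool members (SASP Group Members)


@dataclass
class RegistrationReply:
    ok: bool
    error: Optional[str] = None  # one error for the whole request; no group named


def gwm_register(request: List[Group], gwm_groups: Dict[str, Set[str]]) -> RegistrationReply:
    """Assumed GWM rule: accept only if every Group and member matches its config."""
    for group in request:
        known = gwm_groups.get(group.name)
        if known is None or not set(group.members) <= known:
            # The reply covers the entire request, so the offending group is not reported.
            return RegistrationReply(ok=False, error="InvalidGroup")
    return RegistrationReply(ok=True)


def member_states(request: List[Group], reply: RegistrationReply) -> Dict[str, str]:
    """A failed registration marks every SASP-monitored member DOWN.

    Simplification: after a successful registration, real member status depends
    on what the GWM subsequently reports; here such members are shown as UP.
    """
    state = "UP" if reply.ok else "DOWN"
    return {member: state for group in request for member in group.members}


# pool_b is missing from the (simulated) GWM configuration.
request = [Group("pool_a", ("10.0.0.1:80",)), Group("pool_b", ("10.0.0.2:80",))]
gwm = {"pool_a": {"10.0.0.1:80"}}
reply = gwm_register(request, gwm)
print(reply)                          # RegistrationReply(ok=False, error='InvalidGroup')
print(member_states(request, reply))  # {'10.0.0.1:80': 'DOWN', '10.0.0.2:80': 'DOWN'}
```

Note that `reply` names no pool: the correctly configured `pool_a` goes DOWN along with `pool_b`, and nothing in the reply distinguishes the two.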

Impact

If the Registration Request fails, the GWM terminates its connection with the Load Balancer (the BIG-IP SASP monitor). This behavior is defined by the SASP protocol and the SASP GWM implementation. As a result, the SASP monitor marks DOWN all pool members that it monitors, and traffic stops flowing to every pool monitored by the SASP monitor. Because the SASP GWM returns a single error message for the entire Registration Request (covering all SASP Groups), the SASP monitor cannot indicate which Group (pool/member) or monitor caused the error. This hinders troubleshooting while all traffic to the SASP-monitored pools is stopped.

Conditions

This behavior occurs on affected BIG-IP versions when the LTM SASP monitor is configured to monitor members of multiple LTM pools, and the BIG-IP system starts, restarts, or reboots, or the configuration is loaded.

Workaround

To diagnose this issue, first enable saspd debug logging:

   tmsh mod sys db saspd.loglevel value debug_msg

(Optional alternative values include deep_debug and debug, but these provide less detail.)

With saspd debug logging enabled, a message like the following in /var/log/monitors/saspd.log confirms that an error occurred during the Registration step:

   SASPProcessor::processRegistrationReply: received error registering workloads with GWM ##.##.##.###:3860: 69 'InvalidGroup'

If this message confirms the issue, the primary path to resolution is for the BIG-IP administrator to carefully compare the BIG-IP pool/member and sasp monitor configuration with the SASP GWM configuration and identify any mismatches or inconsistencies between them.

On the BIG-IP system, to help isolate the misconfigured LTM pool(s)/member(s) causing the SASP Registration failure:

1. Remove the sasp monitor from the configured LTM pools/members one at a time, and after each removal check whether the pool members still monitored by the sasp monitor are marked UP; stop once they are.
2. Add the sasp monitor back to the configured LTM pools/members one at a time, in the same order in which it was removed, except for the last LTM pool/member from which it was removed.
3. Save and reload the configuration, and check whether the LTM pools/members monitored by the sasp monitor are still marked UP.
4. Repeat as necessary if multiple LTM pools/members appear to be causing SASP Registration failures.

Alternatively, it may be possible to use a different monitor (using a more fault-tolerant protocol) to monitor the status of the affected pool members.
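The two diagnostic steps of the workaround (confirming the registration error in saspd.log, then isolating the misconfigured pool by elimination) can be sketched as follows. This is a hypothetical helper, not an F5 tool: the regular expression assumes the log line has exactly the wording quoted in the workaround, and `registration_ok` is an illustrative stand-in for the manual cycle of attaching the sasp monitor, saving and reloading the configuration, and checking whether the monitored members are marked UP. The pool names and the simulated GWM are invented for the demo.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Matches the saspd.log line quoted in the workaround. Assumption: the wording
# is exactly as quoted; other BIG-IP versions may log it differently.
REG_ERROR = re.compile(
    r"SASPProcessor::processRegistrationReply: received error registering "
    r"workloads with GWM (?P<gwm>\S+): (?P<code>\d+) '(?P<name>[^']*)'"
)


def find_registration_errors(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (GWM address, error code, error name) for each registration error line."""
    hits = []
    for line in lines:
        match = REG_ERROR.search(line)
        if match:
            hits.append((match["gwm"], int(match["code"]), match["name"]))
    return hits


def isolate_bad_pools(pools: List[str], registration_ok: Callable[[List[str]], bool]) -> List[str]:
    """One-at-a-time elimination following workaround steps 1-4.

    `registration_ok(monitored)` is a hypothetical stand-in for the manual check:
    attach the sasp monitor to exactly `monitored`, save and reload the
    configuration, and report whether the SASP-monitored members are marked UP.
    """
    bad: List[str] = []
    monitored = list(pools)
    while monitored and not registration_ok(monitored):
        # Step 1: remove the monitor one pool at a time until the rest register.
        removed: List[str] = []
        for pool in list(monitored):
            monitored.remove(pool)
            removed.append(pool)
            if registration_ok(monitored):
                break
        bad.append(removed[-1])  # the last pool removed before registration succeeded
        # Steps 2-3: restore the monitor on every pool except those identified so far.
        monitored = [p for p in pools if p not in bad]
        # Step 4: the while condition re-checks and repeats if failures remain.
    return bad


# Demo against a simulated GWM that rejects two (hypothetical) pools.
sample = ("SASPProcessor::processRegistrationReply: received error registering "
          "workloads with GWM ##.##.##.###:3860: 69 'InvalidGroup'")
print(find_registration_errors([sample]))  # [('##.##.##.###:3860', 69, 'InvalidGroup')]

REJECTED = {"pool_c", "pool_e"}


def simulated_registration_ok(monitored: List[str]) -> bool:
    # All-or-nothing: registration succeeds only if no rejected pool is monitored.
    return not (set(monitored) & REJECTED)


print(isolate_bad_pools(["pool_a", "pool_b", "pool_c", "pool_d", "pool_e"],
                        simulated_registration_ok))  # ['pool_e', 'pool_c']
```

On a BIG-IP system, the log check could be pointed at the real file, for example `with open("/var/log/monitors/saspd.log") as f: find_registration_errors(f)`. The isolation function only organizes the manual steps: each `registration_ok` call corresponds to a configuration save and reload, so with many pools the procedure can take a number of reload cycles.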

Fix Information

None

Behavior Change