Bug ID 636176: Kernel.ntp: livelock in leapsecond insertion :: watchdog reboots

Last Modified: Jul 22, 2020

Affected Product:
BIG-IQ Platform(all modules)

Known Affected Versions:
4.0.0, 4.1.0, 4.2.0, 4.2.0 HF1, 4.3.0, 4.3.0 HF1, 4.3.0 HF2

Opened: Dec 29, 2016
Severity: 2-Critical
Related AskF5 Article:
K16839

Symptoms

On rare occasions systems hang due to leap-second livelock. As a result of this issue, you may encounter one or more of the following symptoms:
-- The BIG-IP system fails to process traffic for a brief period of time.
-- The BIG-IP system fails over to another host in the device group.
-- Error messages similar to the following example may appear in the /var/log/daemon.log file:
notice ntpd[6789]: kernel time sync enabled
-- Error messages similar to the following example appear in the /var/log/ltm file:
notice boot_marker : ---===[ MD1.2 - BIG-IP 11.3.0 Build 3158.21 ]===---
chmand[6586]: 012a0005:5: CPLD indicates prior Host CPU subsystem reset
chmand[6587]: 012a0005:5: Host CPU subsystem reset - PCI reset asserted
chmand[6588]: 012a0005:5: Host CPU subsystem reset caused by a Southbridge system reset
chmand[6589]: 012a0004:4: Host CPU subsystem reset caused by *** Super I/O watchdog timeout ***
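
To check whether a system has already logged a watchdog-triggered reset of the kind listed above, the log can be searched for the watchdog message. The following is a minimal sketch, not an F5-provided tool: the function name count_watchdog_resets is hypothetical, and it assumes the message text matches the example above and that the log is at /var/log/ltm unless another path is passed.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that report a watchdog-triggered reset.
# Usage: count_watchdog_resets [logfile]   (defaults to /var/log/ltm)
count_watchdog_resets() {
    log="${1:-/var/log/ltm}"
    if [ -r "$log" ]; then
        # -c prints the number of matching lines; -F matches the text literally.
        # grep exits non-zero when nothing matches, so mask that exit status.
        grep -cF 'Super I/O watchdog timeout' "$log" || true
    else
        # An unreadable or missing log is reported as zero matches.
        echo 0
    fi
}
```

Running count_watchdog_resets /var/log/ltm prints the number of matching lines. A non-zero count only shows that a watchdog reset was logged; it does not by itself confirm that the leap-second livelock was the cause.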

Impact

The BIG-IQ system restarts.

Conditions

During the 24-hour window leading up to a leap-second event, a Red Hat kernel livelock condition may occur. As a result, the BIG-IP hardware watchdog triggers a reboot to allow the system to recover. This occurs due to the Red Hat kernel-based livelock condition referenced at the following link: https://rhn.redhat.com/errata/RHBA-2012-1198.html

Workaround

Once a system is affected, running the following command resets the clock and eliminates the issue: date -s "$( date )". For more information, see SOL16839: The BIG-IP system may reboot when configured to synchronize its clock with an NTP server, available at https://support.f5.com/kb/en-us/solutions/public/16000/800/sol16839.html, and the related article on the Red Hat site at https://access.redhat.com/solutions/154713.
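
The workaround command can be wrapped in a small script that previews the change before applying it. This is a sketch only: the reset_clock function and the DRY_RUN variable are hypothetical names introduced here for illustration, and the script assumes it is run by a user with permission to set the system time.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround: re-set the clock to its current value.
# DRY_RUN is an illustrative variable; it defaults to 1 so nothing changes by accident.
reset_clock() {
    now="$(date)"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Preview mode: show the command without running it.
        echo "would run: date -s \"$now\""
    else
        # Setting the system time requires root privileges.
        date -s "$now"
    fi
}

reset_clock
```

With DRY_RUN left unset or set to 1, the script only prints the command it would run. Setting DRY_RUN=0 applies the workaround.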

Fix Information

None

Behavior Change