Bug ID 1041865: Correctable machine check errors [mce] should be suppressed

Last Modified: Jul 16, 2024

Affected Product(s):
BIG-IP TMOS (all modules)

Known Affected Versions:
13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 13.1.3, 13.1.3.1, 13.1.3.2, 13.1.3.3, 13.1.3.4, 13.1.3.5, 13.1.3.6, 13.1.4, 13.1.4.1, 13.1.5, 13.1.5.1, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4, 14.1.2.5, 14.1.2.6, 14.1.2.7, 14.1.2.8, 14.1.3, 14.1.3.1, 14.1.4, 14.1.4.1, 14.1.4.2, 14.1.4.3, 14.1.4.4, 14.1.4.5, 14.1.4.6, 14.1.5, 14.1.5.1, 14.1.5.2, 14.1.5.3, 14.1.5.4, 14.1.5.6, 15.0.0, 15.0.1, 15.0.1.1, 15.0.1.2, 15.0.1.3, 15.0.1.4, 15.1.0, 15.1.0.1, 15.1.0.2, 15.1.0.3, 15.1.0.4, 15.1.0.5, 15.1.1, 15.1.2, 15.1.2.1, 15.1.3, 15.1.3.1, 15.1.4, 15.1.4.1, 15.1.5, 15.1.5.1, 15.1.6, 15.1.6.1, 16.0.0, 16.0.0.1, 16.0.1, 16.0.1.1, 16.0.1.2, 16.1.0, 16.1.1, 16.1.2, 16.1.2.1, 16.1.2.2, 16.1.3

Fixed In:
17.0.0, 16.1.3.1, 15.1.7

Opened: Aug 20, 2021

Severity: 2-Critical

Related Article: K16392416

Symptoms

An emerg-level message similar to the following is logged in kern.log:
emerg kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 10: cc003009000800c1

Impact

Correctable errors are logged in kern.log and to the console. There is no functional impact.

Conditions

Correctable errors can be identified by analyzing the 16-bit value in bits [31:16] of the 64-bit error value in the /var/log/kern.log message. Many types of correctable errors are not service impacting; correctable errors are part of ECC DIMM technology. The following is an example of a correctable error, where the log entry matches this pattern:
Machine Check: 0 [bank number]: [cc003009][0008][00c1]
bits [31:16] = 0008
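The bit extraction described above can be sketched in Python. This is a minimal illustration, not an F5 tool: the helper names `mce_bits_31_16` and `matches_example_signature` are hypothetical, the regular expression assumes the message layout shown in the Symptoms example, and the check compares only bits [31:16] against 0008, which is one reading of the signature given in this article.

```python
import re

# Assumed layout, taken from the kern.log example in this article:
#   "... Machine Check: 0 Bank 10: cc003009000800c1"
MCE_RE = re.compile(r"Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]{16})")


def mce_bits_31_16(line):
    """Return bits [31:16] of the 64-bit value in an mce log line, or None."""
    match = MCE_RE.search(line)
    if match is None:
        return None
    value = int(match.group(2), 16)
    # Shift away the low 16 bits, then keep the next 16 bits.
    return (value >> 16) & 0xFFFF


def matches_example_signature(line):
    """True if bits [31:16] equal 0x0008, as in the article's example.

    Assumption: only this one field is compared; other correctable
    patterns are not covered by this sketch.
    """
    return mce_bits_31_16(line) == 0x0008


sample = ("emerg kernel: mce: [Hardware Error]: CPU 0: "
          "Machine Check: 0 Bank 10: cc003009000800c1")
print(f"{mce_bits_31_16(sample):04x}")    # 0008
print(matches_example_signature(sample))  # True
```

A line that does not contain a machine check message returns None from `mce_bits_31_16`, so the script can be run over kern.log without special-casing unrelated entries.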

Workaround

If the error message matches the signature in the example above, an RMA is not needed. If the error message does not match that signature, check the system's traffic conditions and confirm that there is no negative performance impact. If no performance impact is observed, the error is correctable and an RMA is not required. F5 recommends upgrading to a fixed TMOS version and then confirming that the error message no longer appears. For more information, see K16392416: Memory errors and MCE errors.

Fix Information

Correctable MCE errors are now suppressed.

Behavior Change
