Bug ID 748205: SSD bay identification incorrect for RAID drive replacement

Last Modified: Nov 14, 2022

Affected Product:
BIG-IP Install/Upgrade, TMOS(all modules)

Known Affected Versions:
12.1.0, 12.1.0 HF1, 12.1.0 HF2, 12.1.1, 12.1.1 HF1, 12.1.1 HF2, 12.1.2, 12.1.2 HF1, 12.1.2 HF2, 12.1.3, 12.1.3.1, 12.1.3.2, 12.1.3.3, 12.1.3.4, 12.1.3.5, 12.1.3.6, 12.1.3.7, 12.1.4, 12.1.4.1, 13.0.0, 13.0.0 HF1, 13.0.0 HF2, 13.0.0 HF3, 13.0.1, 13.1.0, 13.1.0.1, 13.1.0.2, 13.1.0.3, 13.1.0.4, 13.1.0.5, 13.1.0.6, 13.1.0.7, 13.1.0.8, 13.1.1, 13.1.1.2, 13.1.1.3, 13.1.1.4, 13.1.1.5, 14.0.0, 14.0.0.1, 14.0.0.2, 14.0.0.3, 14.0.0.4, 14.0.0.5, 14.0.1, 14.0.1.1, 14.1.0, 14.1.0.1, 14.1.0.2, 14.1.0.3, 14.1.0.5, 14.1.0.6, 14.1.2, 14.1.2.1, 14.1.2.2, 14.1.2.3, 14.1.2.4

Fixed In:
15.0.0, 14.1.2.5, 13.1.3, 12.1.5
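The "Fixed In" list can be turned into a quick check when auditing a fleet. The following is a minimal sketch, not an F5 tool: the function names `parse_version` and `has_listed_fix` are hypothetical, and it assumes a release carries the fix if it is at or after the fixed release listed above for its branch. A result of False means only that no fix is listed for that release in this report, not that the release is confirmed affected.

```python
# Fixed-in releases taken from this bug report, keyed by (major, minor) branch.
FIXED_IN = {
    (12, 1): (12, 1, 5),
    (13, 1): (13, 1, 3),
    (14, 1): (14, 1, 2, 5),
}
FIRST_FIXED_MAJOR = 15  # 15.0.0 is listed as fixed; later majors are assumed to keep the fix


def parse_version(text):
    """Turn '14.1.2.4' or '12.1.0 HF2' into a tuple of ints, ignoring any HF suffix."""
    numeric = text.split()[0]
    return tuple(int(part) for part in numeric.split("."))


def has_listed_fix(version_text):
    """Return True if the release is at or after the fixed release for its branch."""
    version = parse_version(version_text)
    if version[0] >= FIRST_FIXED_MAJOR:
        return True
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        # No fixed release is listed for this branch (for example 13.0.x or 14.0.x).
        return False
    # Pad with zeros so that (13, 1, 3) and (13, 1, 3, 0) compare as equal.
    width = max(len(version), len(fixed))
    padded_version = version + (0,) * (width - len(version))
    padded_fixed = fixed + (0,) * (width - len(fixed))
    return padded_version >= padded_fixed
```

For example, `has_listed_fix("14.1.2.4")` is False and `has_listed_fix("14.1.2.5")` is True, matching the lists above.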

Opened: Oct 30, 2018
Severity: 2-Critical

Symptoms

On iSeries platforms with dual SSDs, the 'bay' reported for a given SSD by the 'tmsh show sys raid' command may be incorrect. If you replace a drive, because it failed or for any other reason, and rely on the bay number shown by the tmsh command, you could remove the wrong drive from the system, leaving it unable to operate or boot.

Impact

Removing the one working drive could result in system failure and a subsequent failure to boot.

Conditions

iSeries platform with dual SSDs.

Workaround

If you discover that you removed the wrong drive, you can attempt to recover by re-inserting the drive into the bay it came from and powering on the device.

To avoid inadvertently removing the wrong drive, follow these rules on systems with this issue:
-- Power should be off when you remove a drive. This makes it possible to safely check the serial number of the removed drive.
-- Power should be on, and the system should be completely 'up', before you add a new drive.

Follow these steps to replace a drive:
1. Identify the failed drive, taking careful note of its serial number (SN). You can use any of the following commands to get the serial number:
   • tmsh show sys raid
   • tmsh show sys raid array
   • array
2. Logically remove the failed drive using the following command: tmsh modify sys raid array MD1 remove HD<>
3. Power down the unit.
4. Remove the fan tray and physically remove the failed drive.
5. Manually inspect the SN on the removed drive to ensure that the correct drive was removed.
6. Replace the fan tray.
7. Power on the unit with the remaining, single drive.
8. Once booted, wait for the system to identify the remaining (good) drive. You can confirm that this has happened when the drive appears in the 'array' command output.
9. Remove the fan tray again (with the system running).
10. Install the new drive.
11. Use the 'array' command to confirm that the new drive is recognized. (Note: the tmsh commands do not show the new drive at this stage.)
12. Logically add the new drive using the following command: tmsh modify sys raid array MD1 add HD<>
13. Monitor the rebuild using any of the commands shown in step 1.

Note: You must follow these steps exactly. If you insert the new drive while the system is off, and then boot the system with both the existing working drive and the new blank drive present, the system recognizes the blank drive as the working array member, and you cannot add it to the array. In that case, the system responds and behaves as if 'HD already exists'.
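Step 5 of the workaround relies on comparing the serial number recorded in step 1 with the one printed on the drive that was physically removed. The following is a minimal sketch of that comparison, assuming both values are typed in by hand; the function names are hypothetical, and treating serial numbers as case-insensitive is an assumption, not documented F5 behavior.

```python
def normalize_serial(serial):
    """Strip surrounding whitespace and upper-case the serial (case-insensitivity is an assumption)."""
    return serial.strip().upper()


def is_expected_drive(recorded_serial, removed_serial):
    """Return True only if the removed drive's label matches the serial recorded in step 1."""
    recorded = normalize_serial(recorded_serial)
    removed = normalize_serial(removed_serial)
    # An empty serial can never confirm a match.
    return bool(recorded) and recorded == removed
```

If the check fails, the recovery advice at the start of this section applies: re-insert the drive into the bay it came from before powering on the device.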

Fix Information

None

Behavior Change