HPE SSD Firmware Bug (Critical)

I’m just gonna leave this right here…

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00092491en_us

I wonder who they outsourced the firmware code to back in 2015…

IMPORTANT: This HPD8 firmware is considered a critical fix and is required to address the issue detailed below. HPE strongly recommends immediate application of this critical fix. Neglecting to update to SSD Firmware Version HPD8 will result in drive failure and data loss at 32,768 hours of operation and require restoration of data from backup in non-fault tolerance, such as RAID 0 and in fault tolerance RAID mode if more drives fail than what is supported by the fault tolerance RAID mode logical drive. By disregarding this notification and not performing the recommended resolution, the customer accepts the risk of incurring future related errors.

HPE was notified by a Solid State Drive (SSD) manufacturer of a firmware defect affecting certain SAS SSD models (reference the table below) used in a number of HPE server and storage products (i.e., HPE ProLiant, Synergy, Apollo, JBOD D3xxx, D6xxx, D8xxx, MSA, StoreVirtual 4335 and StoreVirtual 3200 are affected).

The issue affects SSDs with an HPE firmware version prior to HPD8 that results in SSD failure at 32,768 hours of operation (i.e., 3 years, 270 days 8 hours). After the SSD failure occurs, neither the SSD nor the data can be recovered. In addition, SSDs which were put into service at the same time will likely fail nearly simultaneously.

To determine total Power-on Hours via Smart Storage Administrator, refer to the link below:

Smart Storage Administrator (SSA) – Quick Guide to Determine SSD Uptime

Yeah, you read that right: drive failure after a specific number of run hours. Total drive failure. If anyone is running a storage unit with these disks, it can all implode at once with full data loss. Everyone has backups on alternative disks, right?
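For what it’s worth, 32,768 is exactly 2^15, and the advisory’s “3 years, 270 days 8 hours” works out to precisely 32,768 hours (1,365 days and 8 hours). HPE hasn’t published the root cause, so this is pure speculation on my part, but that number smells like a power-on-hours counter kept in a signed 16-bit field rolling over. A quick Python sketch of what that wraparound looks like:

```python
# Speculative illustration only: HPE has not published the root cause.
# If the firmware kept power-on hours in a signed 16-bit counter, hour
# 32,768 is exactly where it would wrap negative.

def as_int16(value: int) -> int:
    """Reinterpret an unsigned counter value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value

for hours in (32766, 32767, 32768, 32769):
    print(f"real power-on hours {hours:>6} -> 16-bit signed counter {as_int16(hours):>6}")

# real power-on hours  32766 -> 16-bit signed counter  32766
# real power-on hours  32767 -> 16-bit signed counter  32767
# real power-on hours  32768 -> 16-bit signed counter -32768
# real power-on hours  32769 -> 16-bit signed counter -32767
```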

Today’s lesson and review: double-check your disks and any storage units you are using for age, and accept the risks accordingly. Also ensure you have backups, and TEST them.
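If you’d rather script that age check than click through SSA on every box, here’s a rough sketch. It assumes smartctl is installed and that the drive reports power-on hours in its SMART output; the device paths and the exact field names (which differ between SATA and SAS drives) are assumptions you’d adjust for your own environment:

```python
import re
import subprocess

FAILURE_HOURS = 32768  # per HPE, drives on pre-HPD8 firmware fail at this point

def power_on_hours(device: str) -> int | None:
    """Best-effort read of power-on hours from smartctl output.

    SATA drives usually report a "Power_On_Hours" attribute; SAS drives tend
    to show "Accumulated power on time, hours:minutes". Field names vary by
    vendor, so treat this parsing as a starting point, not gospel.
    """
    out = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    ).stdout
    m = re.search(r"Power_On_Hours.*?(\d+)\s*$", out, re.MULTILINE)
    if m:
        return int(m.group(1))
    m = re.search(r"power on time.*?(\d+):\d+", out, re.IGNORECASE)
    if m:
        return int(m.group(1))
    return None

if __name__ == "__main__":
    for dev in ["/dev/sda", "/dev/sdb"]:  # adjust for your own drives
        hours = power_on_hours(dev)
        if hours is None:
            print(f"{dev}: could not read power-on hours")
        elif hours >= FAILURE_HOURS:
            print(f"{dev}: {hours} h -- already past the 32,768 hour mark!")
        else:
            print(f"{dev}: {hours} h ({FAILURE_HOURS - hours} h until the bug bites)")
```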

Another lesson I discovered: the virtual hardware version a VM is created with determines which ESXi hosts it can run on. While this is a “DUH” thing to say, it’s not so obvious when you restore a VM using Veeam, and Veeam isn’t coded to tell you the most duh thing ever. Instead the recovery wizard will walk right through to the end and then give you a generic error message, “Processing configuration error: The operation is not allowed in the current state.”, which didn’t help much until I stumbled across this Veeam forum post

and the great Gostev himself finishes the post with…

by Gostev » Aug 23, 2019 5:52 pm

“According to the last post, the solution seems to be to ensure that the target ESXi host version supports virtual hardware version of the source VM.”

That’s kool… but seriously… why doesn’t Veeam check this for you?!?!
Once I realized what the problem was, I simply restored the VM under a new name to the same host it was backed up from (an ESXi 6.5 host); I had been attempting to restore it to an ESXi 5.5 host. When I originally created the VM I had picked a higher virtual hardware version, which means it can only run on newer versions of ESXi. Again, once I realized that, it was like… “DUHHH”, but it made me think: why isn’t the software coded to check for such an obvious prerequisite?
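For my own sanity I sketched out the pre-flight check I wish the wizard did. The version map comes from VMware’s published hardware-version compatibility table (ESXi 5.5 tops out at vmx-10, 6.5 at vmx-13, and so on); the .vmx parsing and the file path are just assumptions for illustration, not anything Veeam actually does:

```python
# Maximum virtual hardware ("vmx") version each ESXi release can run.
# Figures from VMware's published compatibility table; later update
# releases (6.7 U2, 7.0 U1/U2, ...) add higher versions on top of these.
MAX_HW_VERSION = {
    "5.5": 10,
    "6.0": 11,
    "6.5": 13,
    "6.7": 14,
    "7.0": 17,
}

def vm_hw_version(vmx_path: str) -> int:
    """Pull virtualHW.version out of a .vmx file (e.g. 'virtualHW.version = "13"')."""
    with open(vmx_path) as fh:
        for line in fh:
            key, _, value = line.partition("=")
            if key.strip() == "virtualHW.version":
                return int(value.strip().strip('"'))
    raise ValueError(f"virtualHW.version not found in {vmx_path}")

def can_restore(vmx_path: str, target_esxi: str) -> bool:
    """True if the target ESXi release supports the VM's hardware version."""
    return vm_hw_version(vmx_path) <= MAX_HW_VERSION[target_esxi]

if __name__ == "__main__":
    # e.g. a VM created at hardware version 13 on a 6.5 host can't run on a
    # 5.5 host (max version 10) -- exactly the restore that failed on me.
    print(can_restore("myvm.vmx", "5.5"))  # hypothetical path
```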
Whatever, nothing’s perfect.