HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Sep 12, 2017 10:51 pm

Hi All,

Just posting this to make everyone aware that there is a bug in the current Inform OS 3.3.1 in which processes on the 3PAR can get stuck/frozen during a snapshot removal and cause the system to panic.

This isn't specifically a Veeam issue; the bug can be triggered whenever any system tries to remove a snapshot.

Simply posting this here to make people aware of the issue in case they are looking at updating to 3.3.1. The issue does not affect versions prior to 3.3.1.

Currently there is a fix in the works, but no ETA on when it will be released. The current workaround is to switch back to hotadd backups.

Thanks,
David.

Post by **foggy** » Sep 14, 2017 4:33 pm

Hi David, thanks for sharing. I would like to add, though, that this is not a 100% reproducible issue: we have 3.3.1.215 (GA) deployed in our lab and do not see such behavior. So there must be some specific circumstances under which the issue shows up.

Sep 14, 2017 10:43 pm

Hi Foggy,

Yes, you are correct. We were running storage snapshots for 3 weeks without issue before this popped up.

I'm waiting on additional details from HPE, but as I understand it, there are 3 bugs they are aware of.

Two are resolved by P7 and P11; the third, which is the one that hit us, is resolved by P12, which has not yet cleared QA as of this post.

Below is what HPE outlined to us in our ticket with them:

The System Manager (sysmgr) process is one of the main kernel processes running on our InServ, and it runs on the master node. Only one system manager process is allowed to run, and only on the master node.
There are a couple of child processes attached to the sysmgr process, for example the pdscrubber process, responsible for chunklet relocation (servicemag process), or the tpdtcl process, responsible for running CLI commands on the InServ.

When sysmgr becomes unresponsive, and no specific tasks or processes are running on the InServ (i.e. no tuning or servicemag process) and no IOCTL blocks are pending between the controller nodes, it is usually safe to restart the system manager process.

Cause: Automation (Veeam) incorrectly issued snapshot delete requests out of order and attempted to delete RO snaps (normally hidden). This causes the two snaps to be merged from an exceptions-table point of view, but this can't be done because one of the snaps is stuck in "pending delete", which results in multiple node panics.
The snapshot removal process has been enhanced so that it no longer allows out-of-order snapshot removal while a snapshot is pending delete. The fix is available in the upcoming 3.3.1 MU1 with Patch 12. Patch 12 is CURRENTLY NOT available; it is undergoing Software QA testing and is expected to be available soon (subject to change). (Update from LAB)
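To make the rule in HPE's explanation a bit more concrete, here is a minimal, hypothetical Python toy model. It is not HPE's code and does not use any 3PAR CLI or API: `Snapshot`, `SnapshotChain`, and the assumption that removing a snapshot merges its exception data with an adjacent snapshot in the chain are all illustrative. It simply refuses a removal when an adjacent snapshot is stuck in "pending delete", instead of attempting the merge that the ticket says led to the node panics.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    name: str
    pending_delete: bool = False  # True = stuck in "pending delete"


class SnapshotChain:
    """Hypothetical toy model: snapshots of one base volume, oldest first.

    Not HPE's implementation. It only illustrates the guard described in
    the ticket: refuse a removal that would have to merge exception data
    with a snapshot that is stuck in "pending delete".
    """

    def __init__(self) -> None:
        self.snaps: list[Snapshot] = []

    def create(self, name: str) -> Snapshot:
        snap = Snapshot(name)
        self.snaps.append(snap)
        return snap

    def _index(self, name: str) -> int:
        for i, snap in enumerate(self.snaps):
            if snap.name == name:
                return i
        raise KeyError(name)

    def remove(self, name: str) -> None:
        idx = self._index(name)
        # Assumption: removing a snapshot merges its exception table into an
        # adjacent snapshot, so both neighbors in the chain are checked.
        neighbors = self.snaps[max(idx - 1, 0):idx] + self.snaps[idx + 1:idx + 2]
        blocked = [n.name for n in neighbors if n.pending_delete]
        if blocked:
            # Refuse instead of attempting the merge that led to the panics.
            raise RuntimeError(
                f"refusing to remove {name!r}: adjacent snapshot(s) "
                f"{blocked} pending delete"
            )
        self.snaps.pop(idx)


if __name__ == "__main__":
    chain = SnapshotChain()
    for n in ("a", "b", "c", "d"):
        chain.create(n)
    chain.snaps[1].pending_delete = True  # "b" is stuck
    chain.remove("d")  # allowed: its only neighbor "c" is healthy
    try:
        chain.remove("a")  # refused: neighbor "b" is pending delete
    except RuntimeError as err:
        print(err)
```

With snapshots a, b, c, d and b stuck in pending delete, removing d goes through, while removing a (or c) is rejected up front rather than left to fail mid-merge.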

This is still ongoing with HPE support so I'll update the post with additional details as I get them.

Thanks,
David.

Post by **Massamb** » Sep 19, 2017 7:37 am

Hi David, we also have 3.3.1.215 (GA)+P01,P02,P04.
HPE upgraded our production 3PAR on August 12th and so far we do not see such behavior.
In the HPE ticket description you posted, it is not clear to me which two snaps are being merged (in a Veeam backup there is only one snap involved, right?).
Have you got more details?

Thanks.
Massimo

Post by **znabela** » Sep 25, 2017 11:26 am

We are currently having a similar issue with Inform OS 3.3.1.215 (GA)+P01,P02: a lot of hung Veeam storage snapshots, and a CPG that has run out of space and that we have been unable to allocate more space to since Sunday morning.

Awaiting call-back from 3PAR 2nd-tier support.

Massamb · Oct 15, 2017 5:32 pm

HPE has released a patch (3.3.1 MU1 P14) that may be related to this issue.
Here is the link to the Release Notes:

https://support.hpe.com/hpsc/doc/public ... 27034en_us

Post by **jdixon** » Feb 24, 2018 5:19 pm

Has anyone tested the latest patch to see if this is resolved? I've recently been experiencing stuck Veeam backup jobs, but only after I fully patched Windows and the HPE drivers. We have a 3PAR 8200 on 3.3.1 and use CLI 3.3.1 (which for some reason isn't even on their downloads page; you have to request it) and the 2.7.1 VSS provider.
