david.buchanan
Service Provider
Posts: 42
Liked: 8 times
Joined: Jun 02, 2015 12:44 am
Full Name: David
Contact:

HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by david.buchanan »

Hi All,

Just posting this to make everyone aware that there is a bug in the current Inform OS 3.3.1 in which processes on the 3PAR can get stuck/frozen during a snapshot removal and cause the system to panic.

This isn't specifically a Veeam issue; the bug can happen when any system tries to remove a snapshot.

I'm simply posting this here to make people aware of the issue in case they are looking at updating to 3.3.1. The issue does not affect versions prior to 3.3.1.

Currently there is a fix in the works, but no ETA on when it will be released. The current workaround is to switch back to hotadd backups.

Thanks,
David.
foggy
Veeam Software
Posts: 21138
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by foggy »

Hi David, thanks for sharing. I would like to add, though, that this issue is not 100% reproducible: we have 3.3.1.215 (GA) deployed in our lab and do not see such behavior. So there must be some specific circumstances in which the issue shows up.

Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by david.buchanan » 1 person likes this post

Hi Foggy,

Yes, you are correct. We were running storage snapshots for 3 weeks without issue before this popped up.

I'm waiting on additional details from HPE, but as I understand it there are 3 bugs they are aware of.

Two are resolved by P7 and P11, and the third, which is the one that hit us, is resolved by P12, which is not out of QA as of this post.

Below is what HPE outlined to us in our ticket with them:

The System manager (sysmgr) process is one of the main kernel processes running on our Inserv and runs on the master node. We allow only one system manager process to run on the master node.
There are a couple of child processes attached to the sysmgr process, for example the pdscrubber process, responsible for chunklet relocation (servicemag process), and the tpdtcl process, responsible for running CLI commands on the Inserv.

When sysmgr becomes unresponsive and no specific tasks or processes are running on the Inserv (i.e. no tuning or servicemag process), and no IOCTL blocks are pending between the controller nodes, it is usually safe to restart the system manager process.

Cause: Automation (Veeam) incorrectly issues snapshot delete requests out of order and attempts to delete RO snaps (normally hidden). This causes the two snaps to be merged from an exceptions table point of view, but the merge can't be done because one of the snaps is stuck in "pending delete", which results in multiple node panics.
The snapshot removal process has been enhanced so that it no longer allows out-of-order snapshot removal while a delete is pending. The fix is included in the upcoming 3.3.1.MU1 with Patch 12. Patch 12 is currently NOT available; it is undergoing Software QA testing and is expected to be available soon (subject to change). (Update from LAB)


This is still ongoing with HPE support so I'll update the post with additional details as I get them.

Thanks,
David.
Massamb
Novice
Posts: 4
Liked: 2 times
Joined: Mar 08, 2017 5:25 pm
Full Name: Massimo
Contact:

Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by Massamb »

Hi David, we also have 3.3.1.215 (GA)+P01,P02,P04.
HPE upgraded our production 3PAR on August 12th and so far we do not see such behavior.
In the HPE ticket description you posted, it is not clear to me which two snaps are to be merged (in a Veeam backup there is only one snap involved, right?)
Do you have more details?

Thanks.
Massimo
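
Since several posts in this thread compare patch levels by hand, here is a minimal sketch for checking which patches a version string lists. It assumes the string has the form quoted in this thread (for example `3.3.1.215 (GA)+P01,P02,P04`); the function names are hypothetical, and it only reports what the string lists. Whether a later patch supersedes an earlier one is something to confirm with HPE.

```python
import re


def parse_3par_version(text):
    """Split a string like '3.3.1.215 (GA)+P01,P02,P04' into (base, patches)."""
    base, _, patch_part = text.partition("+")
    # Patch tokens look like P01, P7 or P12; store them as integers so that
    # 'P01' and 'P1' compare equal.
    patches = {int(n) for n in re.findall(r"P(\d+)", patch_part)}
    return base.strip(), patches


def missing_patches(text, required):
    """Return the required patch numbers that the version string does not list."""
    _, installed = parse_3par_version(text)
    return sorted(set(required) - installed)


if __name__ == "__main__":
    reported = "3.3.1.215 (GA)+P01,P02,P04"
    # P7, P11 and P12 are the patch numbers mentioned earlier in this thread.
    print(missing_patches(reported, [7, 11, 12]))
```

For the string above, none of P7, P11 or P12 is listed, so all three are reported as missing.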
znabela
Service Provider
Posts: 13
Liked: 5 times
Joined: Dec 04, 2014 7:09 am
Full Name: Robert Christiansen
Contact:

Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by znabela »

We are currently having a similar issue with Inform OS 3.3.1.215 (GA)+P01,P02: a lot of hung Veeam storage snapshots, a CPG that has run out of space, and we have been unable to allocate more space since Sunday morning.

Awaiting call-back from 3PAR 2nd-tier support.
Robert Christiansen
Infrastructure Specialist @ Danoffice IT

Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by Massamb » 1 person likes this post

HPE has released a patch (3.3.1 MU1 P14) that may be related to this issue.
Here is the link to the release notes:

https://support.hpe.com/hpsc/doc/public ... 27034en_us
jdixon
Novice
Posts: 8
Liked: 1 time
Joined: Nov 23, 2016 4:30 pm
Full Name: Jacob Dixon
Contact:

Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug

Post by jdixon »

Has anyone tested the latest to see if this is resolved? I've recently been experiencing stuck backup jobs with Veeam, but only after I fully patched Windows and the HPE drivers. We have a 3PAR 8200 on 3.3.1 and are using CLI 3.3.1 (which for some reason isn't even on their downloads page; you have to request it) and the 2.7.1 VSS provider.