-
- Service Provider
- Posts: 42
- Liked: 8 times
- Joined: Jun 02, 2015 12:44 am
- Full Name: David
- Contact:
HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
Hi All,
Just posting this to make everyone aware that there is a bug in the current Inform OS 3.3.1 in which processes on the 3PAR can get stuck/frozen during a snapshot removal and cause the system to panic.
This isn't specifically a Veeam issue; the bug can happen when any system tries to remove a snapshot.
I'm simply posting this here to make people aware of the issue in case they are looking at updating to 3.3.1. The issue does not affect versions prior to 3.3.1.
There is currently a fix in the works, but no ETA on when it will be released. The current workaround is to switch back to hotadd backups.
Thanks,
David.
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
Hi David, thanks for sharing. I would like to add though that this is not a 100% reproducible issue, since we have 3.3.1.215 (GA) deployed in our lab and do not see such behavior. So there should be some specific circumstances in which the issue shows up.
-
- Service Provider
- Posts: 42
- Liked: 8 times
- Joined: Jun 02, 2015 12:44 am
- Full Name: David
- Contact:
Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
Hi Foggy,
Yes, you are correct. We were running storage snapshots for 3 weeks without issue before this popped up.
I'm waiting on additional details from HPE, but as I understand it there are 3 bugs they are aware of.
Two are resolved by P7 and P11, and the third, which is the one that hit us, is resolved by P12, which is not out of QA as of this post.
Below is what HPE outlined to us in our ticket with them:
The system manager (sysmgr) process is one of the main kernel processes running on our Inserv and runs on the master node. Only one instance of the system manager process is allowed, and it runs on the master node.
There are a couple of child processes attached to the sysmgr process, for example the pdscrubber process, responsible for chunklet relocation (servicemag process), or the tpdtcl process, responsible for running CLI commands on the Inserv.
When sysmgr becomes unresponsive and no specific tasks or processes are running on the Inserv (i.e. no tuning or servicemag process running) or no IOCTL blocks are pending between the controller nodes, it is usually safe to restart the system manager process.
Cause: Automation (Veeam) incorrectly issues snapshot delete requests out of order and attempts to delete RO snaps (normally hidden). This causes the two snaps to be merged from an exceptions table point of view, but this can't be done because one of the snaps is stuck in "pending delete", which results in multiple node panics.
The snapshot removal process has been enhanced so as not to allow out-of-order snapshot removal with pending delete; this is available in the upcoming 3.3.1.MU1 with Patch 12. CURRENTLY Patch 12 is NOT available, but it is undergoing Software QA testing and is expected to be available soon (subject to change). (Update from LAB)
This is still ongoing with HPE support so I'll update the post with additional details as I get them.
Thanks,
David.
-
- Novice
- Posts: 4
- Liked: 2 times
- Joined: Mar 08, 2017 5:25 pm
- Full Name: Massimo
- Contact:
Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
Hi David, we also have 3.3.1.215 (GA)+P01,P02,P04.
HPE upgraded our production 3PAR on August 12th and so far we do not see such behavior.
In the HPE ticket description you posted, it is not clear to me which two snaps are to be merged (in a Veeam backup there is only one snap involved, right?).
Have you got more details?
Thanks.
Massimo
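For anyone comparing patch levels across arrays, here is a minimal Python sketch that splits a version string in the form quoted in this thread (for example `3.3.1.215 (GA)+P01,P02,P04`) and checks whether a given patch number is listed. The function names `parse_inform_os_version` and `has_patch` are hypothetical, and the string format is only inferred from the examples posted here, so treat it as an illustration rather than an HPE tool. Which patch actually resolves the issue is something to confirm with HPE support.

```python
import re


def parse_inform_os_version(text):
    """Split a string such as '3.3.1.215 (GA)+P01,P02,P04' into
    (base version, release tag or None, set of patch numbers).

    The format is an assumption based on the strings quoted in this thread.
    """
    match = re.match(
        r"\s*(\d+(?:\.\d+)*)\s*(?:\(([^)]*)\))?\s*(?:\+\s*(.*))?$", text
    )
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    base, release, patch_list = match.groups()

    patches = set()
    if patch_list:
        for token in patch_list.split(","):
            # Tokens that are not shaped like P<number> are ignored.
            patch = re.fullmatch(r"[Pp](\d+)", token.strip())
            if patch:
                patches.add(int(patch.group(1)))
    return base, release, patches


def has_patch(text, number):
    """Return True if patch P<number> appears in the version string."""
    return number in parse_inform_os_version(text)[2]


if __name__ == "__main__":
    version = "3.3.1.215 (GA)+P01,P02,P04"
    print(parse_inform_os_version(version))
    print("P12 listed:", has_patch(version, 12))
```

The patch numbers are kept as integers so that `P02` and `P2` compare equal.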
-
- Service Provider
- Posts: 13
- Liked: 5 times
- Joined: Dec 04, 2014 7:09 am
- Full Name: Robert Christiansen
- Contact:
Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
We are currently having a similar issue with Inform OS 3.3.1.215 (GA)+P01,P02 ... a lot of hung Veeam storage snapshots, a CPG that has run out of space, and we have been unable to allocate more space since Sunday morning.
Awaiting call-back from 3PAR 2nd-tier support.
Robert Christiansen
Infrastructure Specialist @ Danoffice IT
-
- Novice
- Posts: 4
- Liked: 2 times
- Joined: Mar 08, 2017 5:25 pm
- Full Name: Massimo
- Contact:
Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
HPE has released a patch (3.3.1 MU1 P14) that may be related to this issue.
Here is the link to the Release Notes:
https://support.hpe.com/hpsc/doc/public ... 27034en_us
-
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Nov 23, 2016 4:30 pm
- Full Name: Jacob Dixon
- Contact:
Re: HPE 3PAR Inform OS 3.3.1 Storage Snapshot Bug
Has anyone tested the latest to see if this is resolved? I've recently been experiencing stuck backup jobs with Veeam, but only since I fully patched Windows and HPE drivers. We have a 3PAR 8200 on 3.3.1 and are using CLI 3.3.1 (which they don't even have on their downloads for some reason; you have to request it) and the 2.7.1 VSS provider.