-
- Influencer
- Posts: 21
- Liked: 6 times
- Joined: Oct 16, 2019 7:43 pm
- Full Name: David Torreggiani
- Contact:
Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Hello,
We're facing a strange issue.
Every time I terminate a restore from a storage snapshot, I get the following error on vCenter/ESXi:
Lost connectivity to storage device ...
I noticed that the error pops up after the very last task in the restore, which is the deletion of the cloned snapshot (which should be detached at that point).
We're using vCenter 7.0 and Nimble (latest OS version) as well as Veeam latest (10a).
I've already tried the workaround in which I set access to "volume only" on the Nimble datastore, with the same result.
Has anyone seen this issue? Veeam support has not been able to find a solution yet.
On the ESXi host I do see the dead paths, which I can clear with a storage rescan.
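For anyone who wants to check for dead paths without clicking through the client, here is a minimal Python sketch. It assumes you have captured the text output of `esxcli storage core path list` on the host and that each path is printed as an unindented identifier line followed by indented `Key: Value` lines, one of which is `State`. The function name `find_dead_paths` and the sample identifiers are made up for illustration.

```python
from __future__ import annotations


def find_dead_paths(path_list_output: str) -> list[dict[str, str]]:
    """Return one dict per path record whose State field is 'dead'.

    Assumed layout: an unindented line starts a record (the path UID),
    and the indented lines below it are 'Key: Value' pairs.
    """
    paths: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for line in path_list_output.splitlines():
        if not line.strip():
            continue  # blank lines only separate records
        if not line[0].isspace():
            # Unindented line: start of a new path record.
            current = {"UID": line.strip()}
            paths.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so values such as
            # 'vmhba64:C0:T3:L0' are kept intact.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return [p for p in paths if p.get("State") == "dead"]


# Illustrative sample only; identifiers are invented.
SAMPLE = """\
iqn.1998-01.com.vmware:host-a,iqn.2007-11.com.example:clone-1,t,1-naa.0001
   Runtime Name: vmhba64:C0:T3:L0
   Device: naa.0001
   State: dead

iqn.1998-01.com.vmware:host-a,iqn.2007-11.com.example:vol-2,t,1-naa.0002
   Runtime Name: vmhba64:C0:T1:L0
   Device: naa.0002
   State: active
"""

for path in find_dead_paths(SAMPLE):
    print(path["Runtime Name"], path["Device"])
```

If the output on your ESXi build is laid out differently, the parsing needs adjusting; the point is only to filter on the `State` field.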
Regards,
David
-
- Veeam Software
- Posts: 3624
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Hello,
I believe that the issue happens after the cleanup step: the last step in the workflow which is described here. Probably, it would make sense to ask our support engineers to help you to perform the same sequence of steps manually without leveraging Veeam in order to check that the same error occurs after you unmount a storage snapshot from ESXi.
Please don't forget to provide us with the support case ID.
Thanks!
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Hi David, as this case was closed long ago due to no response from your side, I'd recommend opening a new one as the behavior you're describing requires deeper investigation. Thanks!
-
- VP, Product Management
- Posts: 7077
- Liked: 1510 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Please check all Initiator groups on the Nimble storage for the "volume only" setting.
I suspect that one of the groups or individual host entries is not configured correctly.
https://infosight.hpe.com/InfoSight/med ... 12807.html
-
- Expert
- Posts: 149
- Liked: 34 times
- Joined: May 01, 2012 11:56 am
- Full Name: Stuart Green
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Yes, I see similar behaviour on Nimble. As you state, the workaround is a storage rescan on the ESXi side once the Veeam part has completed, because the paths are left in a dead state. Our monitoring picks up these dead paths, and the rescan clears them.
Particularly when doing a restore from Storage Snapshot.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Hi Stuart, do you have a case open with Veeam support on this?
-
- Expert
- Posts: 149
- Liked: 34 times
- Joined: May 01, 2012 11:56 am
- Full Name: Stuart Green
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
No, as I just see this as expected behaviour. It's perhaps not Veeam's task to clean up ESXi dead paths.
That would be the same no matter what storage device was removed from ESXi. An immediate storage rescan would be required, until those Dead Paths are gone.
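A rough Python sketch of that "rescan until the dead paths are gone" loop, for anyone scripting it. The function `rescan_until_clean` and its parameters are hypothetical; the two callables are placeholders you would wire up yourself, for example to something that counts `State: dead` entries in `esxcli storage core path list` and to `esxcli storage core adapter rescan --all`.

```python
import time
from typing import Callable


def rescan_until_clean(
    count_dead_paths: Callable[[], int],
    rescan: Callable[[], None],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> bool:
    """Rescan up to max_attempts times; return True once no dead paths remain."""
    for _ in range(max_attempts):
        if count_dead_paths() == 0:
            return True
        rescan()
        # Give the host a moment to settle before checking again.
        time.sleep(delay_seconds)
    # Final check after the last rescan.
    return count_dead_paths() == 0


# On a real host, 'rescan' could be something like (assumption, untested here):
#   subprocess.run(["esxcli", "storage", "core", "adapter", "rescan", "--all"],
#                  check=True)
```

Passing the two operations in as callables keeps the loop testable without a host and leaves the choice of esxcli, PowerCLI, or an API client open.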
-
- Influencer
- Posts: 21
- Liked: 6 times
- Joined: Oct 16, 2019 7:43 pm
- Full Name: David Torreggiani
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
I would be ok if Veeam told me this is expected. We can live with it, it's just a rescan.
The strange thing is that apparently no customers, besides Stuart, have even raised an eyebrow at this.
-
- Enthusiast
- Posts: 36
- Liked: 7 times
- Joined: Sep 07, 2021 5:37 pm
- Full Name: TW
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
Btw - for anyone having a similar issue - our company had a major issue with this.
We restored a VM on a Nimble snapshot via Veeam. It went great, but within an hour we were getting SCSI abort errors from Veeam ONE, and only for the snapshot datastore.
These errors would cycle through several times an hour, during which multiple hosts would become unresponsive. DRS likely made the issue worse by trying to migrate VMs between hosts.
Eventually we opened a ticket, looked through logs, and found that Veeam added the snapshot to the entire cluster during recovery, but only removed it from one host when finished. ESXi then treated this as an "all paths down" condition, causing the hosts to seize up.
All hosts had to be rebooted individually to remove the attached snapshot, which was very slow since migrating VMs off a host would often fail due to the hosts locking up.
The entire cluster was affected because Nimble permissions were set up per cluster and not per individual host.
Lessons learned:
- Set Nimble access permissions to "Volume Only" rather than "Volumes & Snapshots".
- If you configure SAN access for a cluster rather than for individual hosts, you will likely still have issues. You can work around this by pulling a single host out of the group before doing the restore.
- Setting up every host individually in Nimble should also avoid the issue, if you don't mind the extra work of maintaining the list.
Sounds like it's on their radar and might be patched this year, but I am not a Veeam employee so can't verify.
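To make the lessons above checkable, here is a small Python sketch that audits an exported list of access records. Everything about the data shape is an assumption: `AccessRecord`, its field names, and the values "volume", "snapshot" and "both" are illustrative and not taken from the Nimble API, so map them to whatever your array actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    volume: str
    initiator_group: str
    applies_to: str   # hypothetical values: "volume", "snapshot", "both"
    host_count: int   # number of ESXi hosts in the initiator group


def audit(records: list[AccessRecord]) -> list[tuple[str, str, str]]:
    """Flag records that expose snapshots or use a multi-host group."""
    findings = []
    for r in records:
        if r.applies_to != "volume":
            # Lesson 1: access should be limited to the volume only.
            findings.append((r.volume, r.initiator_group, "exposes snapshots"))
        if r.host_count > 1:
            # Lessons 2 and 3: cluster-wide groups present a clone to every host.
            findings.append((r.volume, r.initiator_group, "group spans multiple hosts"))
    return findings


# Invented example data.
records = [
    AccessRecord("ds-prod-01", "cluster-a", "both", 8),
    AccessRecord("ds-prod-02", "esx-host-01", "volume", 1),
]

for volume, group, issue in audit(records):
    print(f"{volume} / {group}: {issue}")
```

A record that is both cluster-wide and set to expose snapshots is reported twice, once per issue, so each finding can be fixed separately.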
-
- Novice
- Posts: 8
- Liked: never
- Joined: Aug 20, 2019 5:06 am
- Full Name: Mark Mathieson
- Contact:
Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)
TWuser, we're seeing exactly the same issue: ESXi locked up, all hosts in cluster need to be rebooted.
When you say, "Sounds like it's on their radar", are you referring to "Veeam"?
As of this writing, late 2023, we're patched to the latest levels for V11, but we had the issue two weeks ago, so it doesn't look like they're in any hurry.