Host-based backup of VMware vSphere VMs.
superdevo
Influencer
Posts: 21
Liked: 6 times
Joined: Oct 16, 2019 7:43 pm
Full Name: David Torreggiani
Contact:

Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by superdevo »

Hello,
we're facing a strange issue.
Every time I terminate a restore from a storage snapshot, I get the following error in vCenter/ESXi:

Lost connectivity to storage device ...

I noticed that the error pops up after the very last task in the restore, which is the deletion of the cloned snapshot (which should be detached at that point).
We're using vCenter 7.0 and Nimble (latest OS version), as well as the latest Veeam (10a).

I've already tried the workaround of setting access to "volume only" on the Nimble datastore, with the same result.
Has anyone seen this issue? Veeam support has not been able to find a solution yet.

On the ESXi host I do see the dead paths, which I can clear with a storage rescan.

Regards,
David
PetrM
Veeam Software
Posts: 3624
Liked: 608 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by PetrM »

Hello,

I believe that the issue happens after the cleanup step: the last step in the workflow which is described here. It would probably make sense to ask our support engineers to help you perform the same sequence of steps manually, without Veeam, to check whether the same error occurs after you unmount a storage snapshot from ESXi.

Please don't forget to provide us with the support case ID.

Thanks!
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by foggy »

Veeam Case: 03809366
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by foggy »

Hi David, as this case was closed long ago due to no response from your side, I'd recommend opening a new one as the behavior you're describing requires deeper investigation. Thanks!
Andreas Neufert
VP, Product Management
Posts: 7077
Liked: 1510 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by Andreas Neufert »

Please check all Initiator groups on the Nimble storage for the "volume only" setting.
I guess that one of the groups or individual host entries is not configured correctly.

https://infosight.hpe.com/InfoSight/med ... 12807.html
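
A minimal sketch of that check, assuming the access control records have already been exported from the array into plain dictionaries. The field name `apply_to` and its values ("volume", "snapshot", "both") are assumptions about how the array reports the setting, and the group and volume names are hypothetical sample data:

```python
from typing import Dict, List


def records_exposing_snapshots(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the access control records that are NOT restricted to volumes only."""
    return [r for r in records if r.get("apply_to", "").lower() != "volume"]


# Hypothetical sample data; in practice this would come from the array's
# own export or API, which is not shown here.
sample_records = [
    {"initiator_group": "esx-cluster-01", "vol": "ds-prod-01", "apply_to": "volume"},
    {"initiator_group": "esx-host-07", "vol": "ds-prod-01", "apply_to": "both"},
    {"initiator_group": "esx-cluster-01", "vol": "ds-prod-02", "apply_to": "volume"},
]

flagged = records_exposing_snapshots(sample_records)
for rec in flagged:
    print(f"{rec['initiator_group']} -> {rec['vol']}: apply_to={rec['apply_to']}")
```

Any record printed here would be a candidate for the misconfigured group or host entry.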
stuartmacgreen
Expert
Posts: 149
Liked: 34 times
Joined: May 01, 2012 11:56 am
Full Name: Stuart Green
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by stuartmacgreen »

Yes, I see a similar experience on Nimble. As you state, the workaround is a storage rescan on the ESXi side once the Veeam part has completed, because the paths are left in a dead state. Our monitoring picks these dead paths up, and the rescan clears them.

This happens particularly when doing a restore from Storage Snapshot.
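
For anyone who wants to check for these dead paths from a script, here is a rough sketch that parses the text output of `esxcli storage core path list` and picks out the paths whose state is "dead". The sample listing is shortened and made up for illustration (the IQNs and device IDs are not from a real host), so treat the parsing as an assumption about the output layout rather than a tested tool:

```python
from typing import Dict, List


def parse_paths(listing: str) -> List[Dict[str, str]]:
    """Split the listing into one dict per path.

    Assumes each path starts with an unindented identifier line,
    followed by indented "Key: value" lines.
    """
    paths: List[Dict[str, str]] = []
    current = None
    for line in listing.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new path entry.
            current = {"Path": line.strip()}
            paths.append(current)
        elif current is not None and ":" in line:
            # Indented attribute line; split on the first colon only.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return paths


def dead_paths(listing: str) -> List[Dict[str, str]]:
    """Return only the paths reported in the 'dead' state."""
    return [p for p in parse_paths(listing) if p.get("State") == "dead"]


# Illustrative, shortened sample output (hypothetical identifiers).
SAMPLE = """\
iqn.1998-01.com.vmware:esx01-00023d000001,iqn.2007-11.com.nimblestorage:clone-example,t,1-eui.0000000000000001
   Runtime Name: vmhba64:C0:T1:L0
   Device: eui.0000000000000001
   Adapter: vmhba64
   State: dead
iqn.1998-01.com.vmware:esx01-00023d000001,iqn.2007-11.com.nimblestorage:ds-prod-01,t,1-eui.0000000000000002
   Runtime Name: vmhba64:C0:T2:L0
   Device: eui.0000000000000002
   Adapter: vmhba64
   State: active
"""

for path in dead_paths(SAMPLE):
    print(path["Runtime Name"], path["Device"])
```

If anything is reported, the rescan mentioned above can be run on the host with `esxcli storage core adapter rescan --all`.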
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by foggy »

Hi Stuart, do you have a case open with Veeam support on this?
stuartmacgreen
Expert
Posts: 149
Liked: 34 times
Joined: May 01, 2012 11:56 am
Full Name: Stuart Green
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by stuartmacgreen » 2 people like this post

No, as I just see this as expected behaviour. It's arguably not Veeam's task to clean up ESXi dead paths.
That would be the same no matter what storage device was removed from ESXi. An immediate storage rescan would be required to clear those dead paths.
superdevo
Influencer
Posts: 21
Liked: 6 times
Joined: Oct 16, 2019 7:43 pm
Full Name: David Torreggiani
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by superdevo »

I would be ok if Veeam told me this is expected. We can live with it, it's just a rescan.
The strange thing is that apparently no customer besides Stuart has even raised an eyebrow at this.
TWuser
Enthusiast
Posts: 36
Liked: 7 times
Joined: Sep 07, 2021 5:37 pm
Full Name: TW
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by TWuser » 1 person likes this post

Btw - for anyone having a similar issue - our company had a major issue with this.

We restored a VM on a Nimble snapshot via Veeam. It went great, but within an hour we were getting SCSI abort errors from Veeam ONE, but only for the Snapshot datastore.
These errors would cycle through several times an hour, during which multiple hosts would become unresponsive. DRS likely made the issue worse by trying to migrate VMs between hosts.

Eventually we opened a ticket, looked through logs, and found that Veeam added the Snapshot to the entire cluster during recovery, but only removed it from 1 host when finished. This made ESXi think it had "all paths down" issues, causing the hosts to seize.
All hosts had to be rebooted individually to remove the attached snapshot, which was very slow since migrating VMs off a host would often fail due to the hosts locking up.
The entire cluster was affected since Nimble permissions were set up per cluster and not per individual host.

Lessons learned:
- Set Nimble access permissions to "VolumeOnly" rather than "Volumes & Snapshots".
- If you configure SAN access per cluster rather than per individual host, you will likely still have issues. You can work around this by pulling a single host out of the group before doing the restore.
- Setting every host individually in Nimble should also avoid the issue, if you don't mind the extra work of maintaining the list.

Sounds like it's on their radar and might be patched this year, but I am not a Veeam employee so can't verify.
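
To spot the failure mode described above (the snapshot clone presented to the whole cluster but removed from only one host), a small check along these lines could be run after a restore. It is only a sketch: the host names and device IDs are hypothetical, and it assumes you have already collected the list of devices each host sees by some other means:

```python
from typing import Dict, List, Set


def hosts_still_attached(devices_by_host: Dict[str, Set[str]], clone_id: str) -> List[str]:
    """Return the hosts that still list the snapshot clone as a device."""
    return sorted(host for host, devices in devices_by_host.items() if clone_id in devices)


# Hypothetical inventory gathered after the restore session was closed.
clone = "eui.0000000000000001"
inventory = {
    "esx01": {"eui.0000000000000002"},           # clone removed here
    "esx02": {"eui.0000000000000002", clone},    # clone left behind
    "esx03": {"eui.0000000000000002", clone},    # clone left behind
}

leftover = hosts_still_attached(inventory, clone)
print("Hosts still seeing the clone:", ", ".join(leftover) or "none")
```

Any host listed would be one to rescan before the stale device can cause trouble.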
Marvellous
Novice
Posts: 8
Liked: never
Joined: Aug 20, 2019 5:06 am
Full Name: Mark Mathieson
Contact:

Re: Lost connectivity to storage device when terminating restore from Storage Snapshot (Nimble)

Post by Marvellous »

TWuser, we're seeing exactly the same issue: ESXi locked up, all hosts in cluster need to be rebooted.
When you say, "Sounds like it's on their radar", are you referring to "Veeam"?
As of this writing, late 2023, we're patched to the latest levels for V11, but we had the issue two weeks ago, so it doesn't look like they're in any hurry.