Discussions specific to the VMware vSphere hypervisor
JosueM
Expert
Posts: 175
Liked: 11 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado
Contact:

VM instant recovery bottleneck

Post by JosueM »

Good day everyone,

We have the following deployment. When the disk array failed, we tried to instantly restore the SQL VM from backup. It was up and running fast, but performance was so poor that users were not able to work. This VM is about 3.7 TB in size. While it was running in Instant Recovery, we found that in vSphere the max queue depth went up to 4,294,967,297.5 for the NFS-mounted drive. So we had to do a full restore to a spare non-SSD array, which took about 10 hours.

Could anyone please help me find where the bottleneck for Instant Recovery would be in this case? Thanks in advance.
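
As a rough sanity check on the full restore mentioned above, the average throughput can be worked out from the two figures in the post (about 3.7 TB in about 10 hours). This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes) and that the whole 3.7 TB was actually transferred, and the function name is made up for illustration.

```python
# Back-of-the-envelope: average throughput of the full restore described above.
# ASSUMPTIONS: decimal units (1 TB = 10**12 bytes); the full 3.7 TB was copied.

def avg_throughput_mb_s(size_tb: float, hours: float) -> float:
    """Average throughput in MB/s for copying size_tb terabytes in the given hours."""
    size_bytes = size_tb * 10**12
    seconds = hours * 3600
    return size_bytes / seconds / 10**6

restore_mb_s = avg_throughput_mb_s(3.7, 10)
print(f"{restore_mb_s:.0f} MB/s")
```

That is on the order of 100 MB/s sustained. Note this is a sequential-copy figure, so on its own it says little about the random IOPS an instantly recovered SQL VM would need.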

[Two attached images]

Gostev
SVP, Product Management
Posts: 27411
Liked: 4541 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VM instant recovery bottleneck

Post by Gostev »

Hello! The bottleneck is your backup storage performance (IOPS capacity).

Re: VM instant recovery bottleneck

Post by JosueM »

Thanks Gostev. What would be suitable backup storage hardware specs, in cost/value terms, for such a scenario?

Re: VM instant recovery bottleneck

Post by Gostev »

Well, if you want instantly recovered VM performance to be comparable to that of the production VM, then basically your backup storage should be similar to your production storage. As it stands right now, they differ by a few orders of magnitude in terms of IOPS capacity.

Unfortunately, the only way to get any decent IOPS from 7.2K spinning drives is to use an array with A LOT of them.

I also recommend you upgrade to v10 in case you're on an earlier version still, and ensure your backup repository RAM meets system requirements (this is more important for v10 IR than for previous versions).

Re: VM instant recovery bottleneck

Post by JosueM »

Hello Gostev,

Well, in our case, and I think most shops in real-life scenarios are similar to ours, we certainly cannot afford to buy another set of production-grade hardware for backup storage. We are OK if the recovery hardware can handle 50%-60% of the production workload, but this Instant Recovery attempt was practically unable to handle a single user because performance was really poor.

We also found in the vSphere logs that, prior to the failure, this SQL VM was averaging 2,000 IOPS in production, with some spikes up to 6,000 during working hours; max queue depth was about 128. We are currently at version 9.5 Update 4, but with our limited knowledge we don't believe this case is version related.

Thanks,
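
To put the 50%-60% target into numbers, here is a minimal sketch that estimates how many 7.2K spindles would be needed for a given random IOPS target. The per-drive figure of 80 IOPS is an assumed rule of thumb, not a measurement from this environment, and the estimate ignores RAID overhead, controller cache, and any read caching in the backup software. `drives_needed` is a hypothetical helper.

```python
import math

# ASSUMPTION: rule-of-thumb random IOPS for one 7.2K RPM drive (not measured here).
IOPS_PER_7K2_DRIVE = 80

def drives_needed(target_iops: float, iops_per_drive: float = IOPS_PER_7K2_DRIVE) -> int:
    """Minimum spindle count to reach target_iops, ignoring RAID penalty and caching."""
    return math.ceil(target_iops / iops_per_drive)

# Figures from the post: 2,000 average and 6,000 peak IOPS, with a 50% recovery target.
for label, prod_iops in (("average", 2000), ("peak", 6000)):
    target = prod_iops * 0.5
    print(f"{label}: {target:.0f} IOPS -> about {drives_needed(target)} drives")
```

Under those assumptions, even half of the average load already needs more than a dozen spindles, and half of the peak load needs several dozen.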

soncscy
Expert
Posts: 258
Liked: 109 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey Carel
Contact:

Re: VM instant recovery bottleneck

Post by soncscy »

Hey Josue,

Instant Recovery is cool. I mean really cool. But there's no magic to calculating storage performance, and it does exactly what the feature name says: restore a VM instantly for use.

Frankly speaking, I think you just need to adjust your SLAs for the restore time. I get you're a small shop (I deal with quite a few of them), but you cannot code your way past the physical limits of hardware 99.9% of the time.

I think your test here shows you have some vulnerabilities, namely that you just don't have a landing platform for your SQL database.

Did you lose the production environment? If you do a test, does redirecting the snapshots to the production datastore improve your performance to at least a "limp along" level?

Re: VM instant recovery bottleneck

Post by JosueM »

Hey there soncscy, thanks for your input.

We used to have a replica for that VM when it was about 1.4 TB in size, but then it grew pretty fast, and today we have no available backup/replication storage for it.

This is the first time we have tried to use "instant recovery" in a real-life case; when needed, we have occasionally done restores of small VMs, 400 GB or less.

Yes, the production disk array was lost, so we decided to perform a full restore instead. I agree with you that we have a clear vulnerability, and that's the spirit of my initial question: to get some guidance on where the bottleneck is for something like instant recovery, and to buy/upgrade the necessary components.

Re: VM instant recovery bottleneck

Post by soncscy »

Ah, now I understand.

Yeah, basically, you might consider whether there's a way to put your most critical backups onto dedicated storage. I'm not sure what prices in your country are like, but where I'm from, getting up to 4 TB on SSDs isn't out of the question. It's not cheap for sure, but at the same time, I just position the question as "would you prefer to pay $N now? Or lose $N*Days in the future because your backups need to restore slowly?"

In almost every situation I've found, the first is far cheaper than the second.

I hope that you can find a stable solution!
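
The "pay now or lose later" question above can be framed as a simple break-even calculation. This is just an illustrative sketch; the function name and the dollar figures are hypothetical, not numbers from this thread.

```python
def break_even_days(hardware_cost: float, downtime_cost_per_day: float) -> float:
    """Days of slow-restore downtime at which the losses equal the hardware price."""
    if downtime_cost_per_day <= 0:
        raise ValueError("downtime_cost_per_day must be positive")
    return hardware_cost / downtime_cost_per_day

# HYPOTHETICAL figures, purely for illustration; plug in your own.
print(break_even_days(hardware_cost=5000, downtime_cost_per_day=10000))
```

If the result is smaller than the downtime you would realistically face during a slow restore, the faster storage pays for itself in a single incident.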

Andreas Neufert
VP, Product Management
Posts: 4801
Liked: 944 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: VM instant recovery bottleneck

Post by Andreas Neufert »

JosueM wrote: Aug 07, 2020 1:21 pm currently we are at version 9.5 update 4 but in our limited knowledge dont believe this case is version related
Actually, v10 brought a next-generation instant recovery engine and other significant improvements for Instant Restore (see the What's New in v10 document for more information). I highly recommend updating to this version.

I was part of the testing group that ran tests of Instant Restore from a 100+ disk nearline storage vs. a small primary all-flash storage. In that specific case, the backup storage won in random read testing. So it all depends on what you use.

v10 has more intelligent read caching, which a lot of applications benefit from.

Re: VM instant recovery bottleneck

Post by Andreas Neufert »

Oh, and please allow me to share that there are additional options. The product also includes "Replication", and with it you can proactively restore a VM to the production storage (Replica from Backup). In case it is just a software issue and your storage system is still OK (maybe use another set of disks for this), you can just start the VM from there. The same applies to any storage system with compatible storage snapshot processing (see our storage integrations).

Re: VM instant recovery bottleneck

Post by JosueM »

soncscy wrote: Aug 07, 2020 9:42 pm ah, now I understand.

Yeah, basically, you might consider whether there's a way to put your most critical backups onto dedicated storage. I'm not sure what prices in your country are like, but where I'm from, getting up to 4 TB on SSDs isn't out of the question. It's not cheap for sure, but at the same time, I just position the question as "would you prefer to pay $N now? Or lose $N*Days in the future because your backups need to restore slowly?"

In almost every situation I've found, the first is far cheaper than the second.

I hope that you can find a stable solution!
Hello soncscy, I certainly cannot agree more with you on this. We will push management to invest in better restore hardware. Thanks.

Re: VM instant recovery bottleneck

Post by JosueM »

Andreas Neufert wrote: Aug 08, 2020 8:01 pm Oh, and please allow me to share that there are additional options. The product also includes "Replication", and with it you can proactively restore a VM to the production storage (Replica from Backup). In case it is just a software issue and your storage system is still OK (maybe use another set of disks for this), you can just start the VM from there. The same applies to any storage system with compatible storage snapshot processing (see our storage integrations).
Hello Andreas, thanks for the explanation. We will also look into the upgrade to v10.
