-
- Expert
- Posts: 187
- Liked: 12 times
- Joined: Sep 01, 2012 2:53 pm
- Full Name: Josue Maldonado
- Contact:
VM instant recovery bottleneck
Good day everyone,
We have the following deployment, and when the disk array failed we tried a rapid restore of the SQL VM from backup. It was up and running fast, but performance was so poor that users were not able to work. This VM is about 3.7 TB in size. While it was running in instant recovery, we saw in vSphere that the max queue depth went up to 4,294,967,297.5 for the NFS-mounted drive. So we had to do a full restore to a spare non-SSD array, which took about 10 hours.
Could anyone please help me find where the bottleneck for instant recovery would be in this case? Thanks in advance.
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VM instant recovery bottleneck
Hello! The bottleneck is your backup storage performance (IOPS capacity).
-
- Expert
- Posts: 187
- Liked: 12 times
- Joined: Sep 01, 2012 2:53 pm
- Full Name: Josue Maldonado
- Contact:
Re: VM instant recovery bottleneck
Thanks Gostev. What backup storage hardware specs would you suggest, in terms of cost/value, for such a scenario?
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VM instant recovery bottleneck
Well, if you want instantly recovered VM performance to be comparable to that of the production VM, then basically your backup storage should be similar to your production storage. As it stands right now, they differ by a few orders of magnitude in terms of IOPS capacity.
Unfortunately, the only way to get any decent IOPS from 7.2K spinning drives is to use an array with A LOT of them.
I also recommend you upgrade to v10 in case you're on an earlier version still, and ensure your backup repository RAM meets system requirements (this is more important for v10 IR than for previous versions).
-
- Expert
- Posts: 187
- Liked: 12 times
- Joined: Sep 01, 2012 2:53 pm
- Full Name: Josue Maldonado
- Contact:
Re: VM instant recovery bottleneck
Hello Gostev,
Well, in our case, and I think most shops in real-life scenarios are similar to ours, we certainly cannot afford to buy another set of production-grade hardware for backup storage. We are OK if the recovery hardware can handle 50%-60% of the production workload, but this instant recovery attempt was practically unable to handle a single user because performance was really poor.
We also found in the vSphere logs that, prior to the failure, this SQL VM was at 2,000 IOPS on average in production, with some spikes up to 6,000 during working hours; max queue depth was about 128. Currently we are on version 9.5 Update 4, but to our limited knowledge we don't believe this case is version related.
Thanks,
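To put rough numbers on the point about 7.2K spindles, here is a back-of-the-envelope sketch in Python. The 2,000 and 6,000 IOPS figures and the 50%-60% target are the ones quoted in this post. The per-drive figure (about 80 random IOPS for one 7.2K RPM drive) is a common rule of thumb and purely an assumption, not a number from this thread, and the sketch ignores RAID penalties, controller caching, and any overhead of reading from compressed or deduplicated backup files.

```python
import math

# Assumption (rule of thumb, not from this thread):
# random IOPS a single 7.2K RPM drive can sustain.
IOPS_PER_7K2_DRIVE = 80


def drives_needed(target_iops: float, iops_per_drive: float = IOPS_PER_7K2_DRIVE) -> int:
    """Minimum spindle count to reach target_iops (no RAID penalty, no cache)."""
    return math.ceil(target_iops / iops_per_drive)


# Figures reported for the production SQL VM.
production_iops = {"average": 2000, "peak": 6000}

for label, load in production_iops.items():
    for percent in (50, 60):  # acceptable share of production workload
        target = load * percent / 100
        print(f"{percent}% of {label} load: {target:.0f} IOPS "
              f"-> about {drives_needed(target)} drives")
```

Under these assumptions, even half of the average load calls for more than a dozen spindles, and covering half of the peaks calls for several dozen. Real arrays will differ, so treat this only as an order-of-magnitude check.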
-
- Veteran
- Posts: 643
- Liked: 312 times
- Joined: Aug 04, 2019 2:57 pm
- Full Name: Harvey
- Contact:
Re: VM instant recovery bottleneck
Hey Josue,
Instant Recovery is cool. I mean really cool. But there's no magic to calculating storage performance, and it does exactly as written on the feature: Restore a VM Instantly for use.
Frankly speaking, I think you just need to adjust your SLAs for the restore time. I get you're a small shop (I deal with quite a few of them), but you cannot code your way past the physical limits of hardware 99.9% of the time.
I think your test here shows you have some vulnerabilities, namely that you just don't have a landing platform for your SQL database.
Did you lose the production environment? If you do a test, does redirecting the snapshots to the production datastore improve your performance to at least a "limp along" level?
-
- Expert
- Posts: 187
- Liked: 12 times
- Joined: Sep 01, 2012 2:53 pm
- Full Name: Josue Maldonado
- Contact:
Re: VM instant recovery bottleneck
Hey there soncscy, thanks for your input.
We used to have a replica for that VM when it was about 1.4 TB in size, but then it grew pretty fast, and today we have no available backup/replication storage for it.
This is the first time we have tried to use "instant recovery" in a real-life case; when needed, we have sometimes done restores of small VMs of 400 GB or less.
Yes, the production disk array was lost, so we decided to perform a full restore instead. I agree with you that we have a clear vulnerability, and that's the spirit of my initial question: to get some guidance on where the bottleneck is for something like instant recovery, and to buy/upgrade the necessary components.
-
- Veteran
- Posts: 643
- Liked: 312 times
- Joined: Aug 04, 2019 2:57 pm
- Full Name: Harvey
- Contact:
Re: VM instant recovery bottleneck
ah, now I understand.
Yeah, basically, you might consider whether there's a way to put your most critical backups onto dedicated storage. I'm not sure what prices in your country are like, but where I'm from, getting up to 4 TB on SSDs isn't out of the question if you've got a 4 TB SSD. It's not cheap for sure, but at the same time, I just frame the question as "would you prefer to pay $N now? Or lose $N*Days in the future because your backups restore slowly?"
In almost every situation I've found, the first is far cheaper than the second.
I hope that you can find a stable solution!
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: VM instant recovery bottleneck
Actually, v10 brought a next-generation instant recovery engine and other significant improvements for Instant Restore (see the What's New in v10 document for more information). I highly recommend updating to this version.
I was part of the testing group that ran Instant Restore tests from a 100+ disk nearline storage vs. a small primary all-flash storage. In that specific case, the backup storage won in random read testing. So it all depends on what you use.
v10 has more intelligent read caching, which a lot of applications benefit from.
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: VM instant recovery bottleneck
Oh, and please allow me to share that there are additional options. The product name also includes "Replication", and with it you can proactively restore a VM to the production storage (Replica from Backup). In case it is just a software issue and your storage system is still OK (maybe use another set of disks for this), you can just start the VM from there. The same applies to any storage system with compatible storage snapshot processing (see our storage integrations).
-
- Expert
- Posts: 187
- Liked: 12 times
- Joined: Sep 01, 2012 2:53 pm
- Full Name: Josue Maldonado
- Contact:
Re: VM instant recovery bottleneck
Hello soncscy, I certainly cannot agree more with you on this. We will push management to invest in better restore hardware. Thanks.
(In reply to soncscy's post of Aug 07, 2020 9:42 pm.)
-
- Expert
- Posts: 187
- Liked: 12 times
- Joined: Sep 01, 2012 2:53 pm
- Full Name: Josue Maldonado
- Contact:
Re: VM instant recovery bottleneck
Hello Andreas, thanks for the explanation. We will also look into the upgrade to v10.
(In reply to Andreas Neufert's post of Aug 08, 2020 8:01 pm.)