Discussions related to using object storage as a backup target.
jpek
Novice
Posts: 8
Liked: never
Joined: Sep 09, 2021 2:44 pm
Full Name: Vincent Pek
Contact:

Time taken to restore data from capacity tier

Post by jpek »

The data stored in the capacity tier is "incremental forever". I am curious whether this increases the time needed to restore data from the capacity tier when a long backup retention policy is in place?
HannesK
Product Manager
Posts: 14316
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Time taken to restore data from capacity tier

Post by HannesK »

Hello,
What are you trying to compare (which hardware)? How would you compare it with classic block storage?

I believe it's hard to set up object storage and classic block storage that are similar enough to make them comparable... A comparison with hyperscalers makes no sense, as the IO performance would be high but the network connection slow compared to classic block storage. I don't think anybody has tried comparing, for example, 999 restore points on 60 object storage disks vs. 60 block storage disks. But I would assume that block storage is faster in most small-scale cases.

The format on object storage is very different from classic block storage: it's always tons of small objects instead of the large backup files found on classic repositories.

Best regards,
Hannes
jpek
Novice
Posts: 8
Liked: never
Joined: Sep 09, 2021 2:44 pm
Full Name: Vincent Pek
Contact:

Re: Time taken to restore data from capacity tier

Post by jpek »

Thanks for the reply.
What confuses me most is the native format in the capacity tier, which is always tons of small objects. If we have 999 restore points and want to restore to the 999th restore point, does Veeam need to go through all of the previous 998 incremental backups before it can restore the 999th restore point? Or does the native format in the capacity tier enable Veeam to restore the 999th backup without visiting the previous 998 incremental backups?
HannesK
Product Manager
Posts: 14316
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Time taken to restore data from capacity tier

Post by HannesK »

We read the metadata that is needed to create "the full picture".

Example 1: forward incremental forever... read the metadata of 999 files (a few MB per file)
Example 2: reverse incremental... read the metadata of one file
Example 3: capacity tier... read only the metadata that is needed
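
To illustrate example 1, here is a rough sketch of why restoring point N from a forward-incremental chain touches the metadata of all N files. This is purely illustrative Python; the structures and names are made up and are not Veeam's actual format:

```python
# Illustrative sketch only -- not Veeam's actual on-disk format.
# Each block is served by the most recent file in the chain that rewrote it,
# so building "the full picture" means reading the metadata of every file.

def build_block_map(chain_metadata):
    """chain_metadata: list of dicts, one per restore point (oldest first),
    mapping block_index -> location of that block inside the file."""
    block_map = {}
    for point_index, metadata in enumerate(chain_metadata):
        for block_index, location in metadata.items():
            # Later points overwrite earlier ones, so the map ends up
            # pointing at the newest copy of every block.
            block_map[block_index] = (point_index, location)
    return block_map

# Example: full backup wrote blocks 0-3, increment 1 changed block 2,
# increment 2 changed block 0.
chain = [
    {0: "full@0", 1: "full@1", 2: "full@2", 3: "full@3"},
    {2: "inc1@0"},
    {0: "inc2@0"},
]
print(build_block_map(chain))
# {0: (2, 'inc2@0'), 1: (0, 'full@1'), 2: (1, 'inc1@0'), 3: (0, 'full@3')}
```

Only the metadata (a few MB per file) is read to build this map; the data blocks themselves are fetched afterwards from wherever the map points.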

In the end, it's relatively irrelevant. The amount of time is not worth thinking about, from my point of view.
jpek
Novice
Posts: 8
Liked: never
Joined: Sep 09, 2021 2:44 pm
Full Name: Vincent Pek
Contact:

Re: Time taken to restore data from capacity tier

Post by jpek »

==>"the amount of time is not worth to think about from my point of view."
Is this due to the low network connection compared to classic block storage as you mentioned before?
HannesK
Product Manager
Posts: 14316
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Time taken to restore data from capacity tier

Post by HannesK »

More in general: the comparison can only be, for example, 10 Gbit/s vs. 10 Gbit/s with the same number and type of disks. Anything else would be unfair to one of the solutions.
jpek
Novice
Posts: 8
Liked: never
Joined: Sep 09, 2021 2:44 pm
Full Name: Vincent Pek
Contact:

Re: Time taken to restore data from capacity tier

Post by jpek »

Let me ask the question in another way: if we disregard the network/disk conditions (i.e. the disk/network conditions are similar between the performance and capacity tiers), and the backup is kept in Veeam native format with 999 incremental restore points, will it take longer to restore the 999th restore point?
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey
Contact:

Re: Time taken to restore data from capacity tier

Post by soncscy » 1 person likes this post

Vincent,

> if we disregard the network/disks conditions (i.e. the disks/networks conditions are similar between performance and capacity tier), keeping the backup in Veeam native format with 999 incremental restore points, will it take longer time to restore the 999th restore point?

I think you're approaching the comparison the wrong way here. The network/disk conditions are the most important part and are where things will differ heavily, and you're also missing the most important difference between on-premises disks and S3: the S3 API itself.

Capacity Tier, as I see it from the description and in practice, basically uses a NoSQL-DB-like structure: instead of one large container and a server to control it, all the blocks are self-describing. Theoretically, this should be a lot faster than traditional seeking, in my opinion, since half of the trouble with random read IO is quickly locating the needed blocks on both SSDs and spinning disks; this is why bloom filters and B-trees are used so heavily in deduplication appliances/software, because it's far less expensive to search metadata in memory than it is to physically hit the disk.
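
For what it's worth, a toy Bloom filter shows the kind of in-memory metadata check I mean. This is a generic sketch of the data structure, not anything from Veeam:

```python
# Toy Bloom filter: a tiny bit array answers "might this block be here?"
# without touching disk, so the expensive physical read only happens
# for likely hits.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: bytes):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = BloomFilter()
bf.add(b"block-42")
print(bf.might_contain(b"block-42"))   # True
print(bf.might_contain(b"block-999"))  # almost certainly False, no disk IO spent
```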

On paper, I'd expect S3 systems to be "faster" overall, with perhaps a slower start-up while the application builds the list of needed blocks; but since S3 can locate the blocks far faster, a performant S3 backend should probably serve up the data faster for a restore.

__But, this comes with the caveats of the backing S3 system__. Write a simple script and spam thousands of concurrent HTTP requests at even AWS or Azure; within a few seconds you'll probably get slapped by the API and have all your connections closed. The S3 API sends you warnings about this, and I recall from another thread that Veeam has a "back-off" algorithm which respects the "too many requests" warning and slows down its activity. So it's not just the network performance between you and your S3 provider; it's the API itself and how many requests the provider is willing to handle for your activity.
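
The general pattern looks something like this (a generic back-off sketch, not Veeam's actual algorithm):

```python
# Generic exponential back-off sketch -- NOT Veeam's implementation.
# The idea: when the provider answers with a throttling status
# (HTTP 429, or 503 "SlowDown" on S3), wait and retry instead of hammering it.
import random
import time

THROTTLE_STATUSES = {429, 503}

def get_with_backoff(do_request, max_retries=8, base_delay=0.5):
    """do_request() performs one HTTP request and returns (status, body)."""
    for attempt in range(max_retries):
        status, body = do_request()
        if status not in THROTTLE_STATUSES:
            return status, body
        # Exponential back-off with jitter so many workers don't retry in lockstep.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        time.sleep(delay)
    raise RuntimeError("still throttled after retries")
```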

One thing to be clear about: the API issue is about __concurrent__ requests, not total. From your example, restoring from point 1 vs. point 999 is really no different if you're just making a single stream of requests. Similarly, point 1 vs. point 999 is really not that different if you're making 1000 concurrent requests. It's about the concurrency within a time frame, not about the total number of requests you'll eventually make.
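
To make "concurrent, not total" concrete, here is a minimal sketch of capping in-flight requests; fetch_object is a hypothetical stand-in for a real S3 GET, not Veeam code:

```python
# 100 objects or 100,000: the provider only ever sees MAX_IN_FLIGHT
# requests at a time, so restoring point 1 vs. point 999 behaves the same
# from the API's point of view.
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 64  # illustrative; the provider's throttling defines the real ceiling

def fetch_object(key):
    # Hypothetical downloader standing in for a real object GET.
    return b"data-for-" + key.encode()

keys = [f"blocks/{i:06d}" for i in range(100_000)]

with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
    results = list(pool.map(fetch_object, keys))
```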

So to summarize:

- Those little details are absolutely everything; you cannot remove network/disk/API performance from your comparison
- In terms of raw data fetching, I'd expect that the Capacity Tier fetching from point 999 is __likely__ faster, but this is only on paper
- In reality, there are just too many factors outside of the process (as I understand it) to give you a consistent and accurate "S3 vs on-premises" comparison
- Intuitively, I would expect that slowdowns from the API's handling of the egress requests might give on-premises the edge, but this is just a guess; I've not tested this heavily

In my experience with clients, the API requests are usually the choke point, especially with smaller S3 providers, but the times are often comparable to on-premises. Instant Recovery luckily stays true to its name, and the performance was "pretty good": good enough that they could survive until a Friday evening, when we migrated.

I do somewhat suspect that on-premises would edge out on entire VM restore/migrating to production, but this is purely a guess on my part.