Discussions related to using object storage as a backup target.
lowlander
Service Provider

Backup job - automatic vs manual offload

Post by lowlander »

Hi,

When configuring a backup job with GFS, you select GFS for a backup set to define the retention policy for the restore points stored in the performance tier. Pointing this backup job to a SOBR with a performance tier and an S3 capacity tier, you can configure the SOBR to copy restore points out to S3 as soon as they are created in the performance tier. I assume that the retention of the restore points in the S3 capacity tier is then based on the GFS policy configured in the backup job. Is this a correct assumption?

The other way around: when we delete a backup job as described above, the restore points become available as an imported backup. When we offload this imported backup, is there any intelligence in the S3 layer so that only changed objects are offloaded and the backup data already available in S3 is reused? Or should this be considered a new chain that is being offloaded?

The context of this question is what to do with a GFS job that is accidentally deleted and then recreated for the same backup set. Can we continue using the S3 backup data, or will a new first full backup be offloaded to S3, resulting in additional capacity usage (x2)?
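
To show how I read the GFS part, here is a rough sketch in Python of my simplified mental model of how a GFS scheme selects which restore points to keep. The counts and dates are just example values, not Veeam settings:

```python
from datetime import date, timedelta

# Rough sketch of how I picture a GFS scheme selecting restore points to keep.
# Simplified mental model only - the counts are example values, not Veeam settings.
POLICY = {"daily": 30, "weekly": 4, "monthly": 12, "yearly": 7}

def gfs_keep(points):
    """Return the restore point dates kept under the policy above."""
    keep = set(sorted(points, reverse=True)[:POLICY["daily"]])      # most recent dailies
    weekly, monthly, yearly = {}, {}, {}
    for p in sorted(points):                                        # last point of each period wins
        weekly[p.isocalendar()[:2]] = p
        monthly[(p.year, p.month)] = p
        yearly[p.year] = p
    keep.update(sorted(weekly.values(), reverse=True)[:POLICY["weekly"]])
    keep.update(sorted(monthly.values(), reverse=True)[:POLICY["monthly"]])
    keep.update(sorted(yearly.values(), reverse=True)[:POLICY["yearly"]])
    return keep

points = [date(2020, 1, 1) + timedelta(days=i) for i in range(730)]  # two years of dailies
print(len(points), "restore points created,", len(gfs_keep(points)), "kept under GFS")
```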
Gostev
Chief Product Officer

Re: Backup job - automatic vs manual offload

Post by Gostev »

Hi,

1. Correct.

2. Imported backups are not offloaded in principle, since they are not a part of any job. Backups are typically imported in order to perform a restore, so normally there's no reason to offload them.

3. If you create a new job and use Map Backup functionality to point it to the existing chain, then I believe everything will just continue as before with no impact whatsoever. Also, remember you can always simply restore your accidentally deleted jobs from a configuration backup.

Thanks!
lowlander
Service Provider

Re: Backup job - automatic vs manual offload

Post by lowlander »

Thanks !

So automating the offload process in a backup job with GFS will preferably require a performance tier with the same amount of capacity as the S3 capacity tier (assuming we have a hardware-based S3 solution)?
Gostev
Chief Product Officer

Re: Backup job - automatic vs manual offload

Post by Gostev »

If you're asking about the Performance Tier extent selection algorithm for backup and backup copy jobs pointed to a SOBR, or for downloading files from the Capacity Tier into the Performance Tier: it's a single shared algorithm, and it's a bit more complex than just capacity.
lowlander
Service Provider

Re: Backup job - automatic vs manual offload

Post by lowlander »

It is more about the ability to offload the backups to the S3 device, from the perspective that you can define a GFS scheme in a backup job: let's say 30 days of daily retention, 4 weeks of weekly retention, 12 months of monthly retention and 7 years of yearly retention.

Is it possible to keep only the 30 days in the performance tier and offload the other restore points to S3 based on the retention scheme? How can we best implement this?
- pros: reduced amount of local storage needed;
- cons: S3 retrieval speed during restores / risk of backup data loss if the S3 storage is impacted;

or

Is it necessary to keep the 30 days, 4 weeks, 12 months and 7 years of backups in the performance tier(s)?
- pros: the object storage acts as an extra copy in case of a failure or human error;
- cons: additional local storage needed;

From a backup data availability standpoint I would say the last option is the safest. What is the best approach?
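
To put some rough numbers behind the pros and cons above, this is only back-of-the-envelope arithmetic with invented sizes, assuming every GFS point is kept as a full and ignoring any block reuse:

```python
# Invented example sizes - not measurements.
full_tb, inc_tb = 10.0, 0.5                      # full backup / daily incremental
dailies, weeklies, monthlies, yearlies = 30, 4, 12, 7

short_term = full_tb + (dailies - 1) * inc_tb    # the ~30-day chain
gfs_fulls = (weeklies + monthlies + yearlies) * full_tb

# Option 1: only the 30-day chain on-prem, the full history in S3.
print(f"Option 1: {short_term:5.1f} TB on-prem, {short_term + gfs_fulls:5.1f} TB in S3")
# Option 2: the full history on-prem as well as in S3.
print(f"Option 2: {short_term + gfs_fulls:5.1f} TB on-prem, {short_term + gfs_fulls:5.1f} TB in S3")
```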
Gostev
Chief Product Officer

Re: Backup job - automatic vs manual offload

Post by Gostev »

We expect, and see, most customers choosing to have both Copy and Move policies enabled. This way you have a full copy in object storage, your most recent backups (which you are most likely to restore from) sit on-prem available for fast restores, and all older backups are in object storage only (not taking any on-prem storage space).
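
A simplified way to picture the resulting placement (just an illustration of where restore points end up, not the actual offload logic, and the operational window value below is an example):

```python
from datetime import date, timedelta

# Simplified illustration of restore point placement with Copy + Move enabled.
# 'operational_window_days' (the Move policy window) is just an example value,
# and real offload only moves points belonging to inactive backup chains.
operational_window_days = 14
today = date(2021, 6, 30)

restore_points = [today - timedelta(days=i) for i in range(90)]      # one point per day, 90 days back

capacity_tier = set(restore_points)                                  # Copy: everything gets a copy in object storage
performance_tier = {p for p in restore_points
                    if (today - p).days <= operational_window_days}  # Move: only recent points stay on-prem

print(f"Performance tier (on-prem): {len(performance_tier)} recent restore points")
print(f"Capacity tier (object storage): {len(capacity_tier)} restore points (full copy)")
```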

Please note that the cons you're describing for the first approach are not really valid in most scenarios.

Speed: S3 retrieval is quite fast. At the v10 launch we did a demo of on-prem Instant Recovery from an Amazon S3 backup. Besides, most businesses have very different SLAs for restoring from archived backup files in any case. This attitude comes from the early days, when every restore from tape automatically came with at least a 24-hour SLA, so businesses expect this and are already prepared for it when it comes to archived backups.

Reliability: public cloud object storage like Amazon S3 provides extremely redundant storage with multiple geographically dispersed copies. And I understand some on-prem S3 storage devices provide the same capabilities, with separate copies on different physical devices behind the object storage grid. In other words, backup data loss with proper object storage is considered extremely unlikely.

The second approach is best suited for customers who are very likely to have to restore from any restore point (including ones a few years old) and do this practically all the time. This is quite a rare scenario I would say, as for most customers 98% of restores are done from the last 7-14 days of backups. With that in mind, what is the point of paying to have all 7 years of backups readily available on-prem?

And in case you're talking about on-prem object storage, the second approach seems especially excessive, because now you have two copies of ALL of your backups, with both sitting on-prem and thus susceptible to fire, flood, burglary, etc. You still need to achieve the "1" in the "3-2-1" rule. So when you also add that "1", and it's a full copy as it is supposed to be, why have two more copies on-prem?
lowlander
Service Provider

Re: Backup job - automatic vs manual offload

Post by lowlander »

Hi Gostev,

Thanks for this clear explanation, much appreciated.

Regarding scenario 2, I was thinking of creating a SOBR distributed over two datacenters (DC1 and DC2), with one extent per datacenter, and using a bucket in a third datacenter (DC3). Placing the object storage solution in a third datacenter would give you the ability to survive a failure of DC3, while still having a full copy of your backup data distributed over the two datacenters DC1 and DC2.

When we instead distribute the object storage over multiple datacenters (e.g. DC2 and DC3) with a failure tolerance of one DC, I agree that keeping a full copy in the performance tiers of the SOBR is quite excessive ;) However, when using the object storage solution for Object Lock purposes it is cost-wise an expensive solution, given the space efficiency of the available S3 storage (50% usable capacity of the raw capacity).
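
For the space efficiency point, as a quick calculation (generic replication/erasure-coding arithmetic, not any specific vendor's protection scheme):

```python
# Rough usable-capacity arithmetic for object storage spread over multiple
# datacenters with a tolerance of one full DC failure. Generic schemes only,
# not any specific vendor's data protection layout.
def usable_fraction(sites, site_tolerance, scheme):
    if scheme == "mirror":                      # a full replica per site
        return 1 / sites
    if scheme == "erasure":                     # any (sites - tolerance) sites hold enough data
        return (sites - site_tolerance) / sites
    raise ValueError(scheme)

raw_tb = 200
print("2 DCs, mirrored:     ", raw_tb * usable_fraction(2, 1, "mirror"), "TB usable")    # the ~50% case
print("3 DCs, erasure coded:", round(raw_tb * usable_fraction(3, 1, "erasure"), 1), "TB usable")
```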

I think it is a cost and risk consideration...

any other thoughts ?
Gostev
Chief Product Officer

Re: Backup job - automatic vs manual offload

Post by Gostev »

Actually, Object Lock has little to no impact on storage capacity usage...
lowlander
Service Provider

Re: Backup job - automatic vs manual offload

Post by lowlander »

Just trying to interpret the whole tiering concept of the cloud tier and its corresponding immutability setting; super interesting technology by the way.

The following references are quite interesting to read/view:
https://www.youtube.com/watch?v=9YbZzBtPM7o
https://helpcenter.veeam.com/docs/backu ... ml?ver=100 is a good reference for the concept of block generation.

When you create a job pointing to a SOBR, backups will first be stored in the performance tier. As I understand it, you should always reserve some working space there for applying the actual retention policy of a job. Having said that, when you move or copy backups to the capacity tier, this cloud tier is basically built on objects (blocks) that are managed by metadata. I had missed this concept of metadata referring to blocks and was thinking additional space would be needed for transformation (which was wrong).
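
The way I now picture it, as a toy sketch (my own simplified mental model, not Veeam's actual object format):

```python
import hashlib

# Toy sketch of my mental model of the cloud tier: backup files are split into
# blocks, blocks are stored as objects keyed by their content hash, and a small
# metadata object lists which blocks make up each restore point.
# This is NOT the actual object format, just the concept as I understand it.
bucket = {}                                    # block hash -> block bytes

def offload(restore_point, blocks):
    """Upload only blocks the bucket does not already contain; return metadata."""
    uploaded, metadata = 0, []
    for block in blocks:
        key = hashlib.sha256(block).hexdigest()
        if key not in bucket:                  # an unchanged block is simply referenced again
            bucket[key] = block
            uploaded += 1
        metadata.append(key)
    print(f"{restore_point}: {uploaded} of {len(blocks)} blocks uploaded")
    return metadata

full = [b"A", b"B", b"C", b"D"]                # pretend these are backup blocks
incremental = [b"A", b"B", b"X", b"D"]         # only one block changed
offload("full", full)
offload("incremental", incremental)
```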

However, regarding the length of an immutability period, I was wondering what the impact of copy mode is on the needed cloud tier capacity. Immutability is defined on the bucket within Veeam. My thought was that setting this period to something larger than the maximum daily retention in a job means you have to reserve additional storage in the cloud tier. For example:
- assume a daily backup job with a retention of 30 days: 1 full and 29 incrementals;
- a bucket with an immutability period of 90 days;
- assume that every day new unique data is generated, i.e. a certain change rate;
> after 30 days we can no longer delete blocks in the cloud tier, as they are still protected and flagged as immutable. I assume new data written after these 30 days goes to new blocks that cost storage in your object storage environment: let's say 60 x the size of the unique (incremental) data.

I understand now that the orchestration of the block index, metadata and blocks is a space-efficient mechanism for storing data in the cloud tier, and I agree that the impact on storage is minimal when using the same immutability period as the maximum daily retention period of a job. However, I don't quite understand why there would be no impact on the cloud tier capacity when using a longer immutability period (say 90 or even 180 days) against a daily retention of at most 30 days in a job (all from the perspective of copy mode).
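
To make my worry concrete, this is the back-of-the-envelope arithmetic I had in mind (purely my own model with invented sizes, not how the cloud tier actually tracks blocks):

```python
# My back-of-the-envelope worry: retention 30 days, bucket immutability 90 days,
# a fixed amount of unique (changed) data per day. Purely my own arithmetic,
# not how the capacity tier actually manages blocks.
retention_days = 30
immutability_days = 90
full_tb = 10.0
change_tb = 0.5        # unique new blocks per daily increment

def capacity_held(day):
    """Blocks still present in object storage on a given day."""
    # Blocks referenced by restore points inside the retention window.
    live = full_tb + min(day, retention_days) * change_tb
    # Blocks whose restore points already aged out of retention, but which
    # cannot be deleted yet because they are still flagged immutable.
    expired_but_locked = max(0, min(day, immutability_days) - retention_days) * change_tb
    return live, expired_but_locked

for d in (30, 90, 180):
    live, locked = capacity_held(d)
    print(f"day {d:3}: {live:.1f} TB in retention + {locked:.1f} TB kept only by immutability")
```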

Just want to make sure I don't miss the logic ;)
Gostev
Chief Product Officer

Re: Backup job - automatic vs manual offload

Post by Gostev »

This explains our misunderstanding of each other: immutability should never be set higher than the retention policy! Such a setup makes no sense, because it would make immutability implicitly extend the retention policy by making its processing physically impossible. So what you're discussing is simply an invalid configuration. If you want your backups to be retained for 90 days, you should set your retention policy to 90 days (and not to 30 days).

My answers to you were assuming a "normal" setup, where your immutability period is equal to or less than the retention policy.

I'm curious what you are actually trying to achieve by setting retention to 30 days when you're apparently well aware immutability will extend it to 90 days anyway? Knowing this should help show you the correct path to achieving your goal.
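
In other words, the rule of thumb is simply this (a trivial sanity check just to illustrate, nothing Veeam-specific):

```python
# Rule of thumb from above, as a trivial sanity check (nothing Veeam-specific):
# immutability must never exceed the retention policy.
def validate(retention_days, immutability_days):
    if immutability_days > retention_days:
        raise ValueError(
            f"immutability ({immutability_days}d) exceeds retention ({retention_days}d); "
            "it would silently extend retention, so raise the retention policy instead")

for retention, immutability in ((90, 90), (30, 90)):
    try:
        validate(retention, immutability)
        print(f"retention {retention}d / immutability {immutability}d: valid")
    except ValueError as err:
        print(f"retention {retention}d / immutability {immutability}d: {err}")
```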
lowlander
Service Provider

Re: Backup job - automatic vs manual offload

Post by lowlander »

My goal is to plan a valid immutability period, not an invalid configuration ;) As you explain, this can have an impact on the short-term retention. What we want to know is what a safe immutability period should be in the end. We read about ransomware that only becomes active after 2 months, for example: having an immutability period of just a few days would not help prevent your data from being encrypted, while having your data immutable for 180 days would impact your storage requirements.

I can imagine it is difficult to give good advice on this, but I would really like to know what the best practices to start with are ;)
