Hello All,
I've been using Veeam for almost 2 years now, backing up to an Exagrid appliance, and overall it has worked well for us.
Our retention period is relatively short (less than 1 month), and we recently ran low on space on our Exagrid appliance.
I was exploring backing up to AWS directly and came up with what I think is a cost-effective solution for our company (it's a hybrid approach - we're not getting rid of Exagrid anytime soon).
We are an AWS customer and I am utilizing the AWS Storage Gateway with cached volumes. This approach presents iSCSI disks to the Veeam server, and within Veeam I'm able to add a repository and point it directly at the "local" (AWS) disk. I've backed up about 10TB of data without issues. I've also done test restores without issue. The bottleneck here is definitely our bandwidth, but so far it has worked very well for the past month.
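In case it helps anyone picture the setup, here is roughly how a cached volume gets carved out on the gateway with boto3. This is just an illustrative sketch, not our exact configuration; the ARN, target name, interface address, and size are all placeholders.

```python
# Rough sketch: create a cached iSCSI volume on an already-activated AWS
# Storage Gateway, which then shows up as an iSCSI target for the Veeam server.
# All identifiers below are placeholders, not real values.
import uuid
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

response = sgw.create_cached_iscsi_volume(
    GatewayARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE",
    VolumeSizeInBytes=20 * 1024**4,     # 20 TiB presented over iSCSI
    TargetName="veeam-repo-01",         # becomes part of the iSCSI target IQN
    NetworkInterfaceId="10.0.0.10",     # gateway VM interface serving iSCSI
    ClientToken=str(uuid.uuid4()),      # idempotency token
)

print(response["VolumeARN"])
print(response["TargetARN"])
```

From there the volume is connected with the normal iSCSI initiator on the Veeam server and added as a repository like any other local disk.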
We also utilize Nimble snapshots for short term recovery in the event we need to restore something very quickly (we keep about 5 days or so).
I understand there are negatives here, specifically:
Backup/restore process is dependent upon bandwidth speeds
Restores are slower for very large VMs
As data grows, the AWS cost will increase (currently, it's around $7-$10 per day to back up incremental data; rough math in the sketch just below this list)
Data isn't deduped or compressed like on the Exagrid (at least to my knowledge, unless Veeam does some of this on its own?)
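For reference, here's the back-of-envelope math behind that daily figure. It assumes the ~10TB stored so far is billed at roughly $0.023 per GB-month (an assumption based on the us-east-1 list price for gateway volume storage; your region and any request/snapshot charges will shift this), so treat it as a rough sanity check only.

```python
# Back-of-envelope check on the daily AWS cost for ~10TB behind the gateway.
# Assumes ~$0.023/GB-month volume storage (us-east-1 list price, an assumption);
# request, snapshot, and egress charges are ignored here.
stored_gb = 10 * 1024            # ~10 TB currently stored
price_per_gb_month = 0.023       # assumed volume storage rate

monthly_storage = stored_gb * price_per_gb_month
daily_storage = monthly_storage / 30

print(f"~${monthly_storage:.0f}/month, ~${daily_storage:.2f}/day")
# -> roughly $236/month, ~$7.85/day, which lines up with the $7-$10/day we see
```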
Positives:
I don't have to physically buy/rack/stack another appliance.
Adding space is relatively simple if we run out.
Up-front cost is very minimal compared to shelling out tens of thousands for a new appliance (which is what it would cost in our case).
Drives in AWS can be snapshotted and saved for further protection (a sketch follows this list).
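On that last point, kicking off a snapshot of the gateway volume can be done through the same API. A minimal sketch, with a placeholder volume ARN:

```python
# Minimal sketch: snapshot a Storage Gateway cached volume (it lands as an
# EBS snapshot in the account). The VolumeARN below is a placeholder.
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

snap = sgw.create_snapshot(
    VolumeARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE/volume/vol-EXAMPLE",
    SnapshotDescription="Protection snapshot of Veeam repository volume",
)
print(snap["SnapshotId"])
```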
As of now, we will continue with this approach to see how it works over the next year. One month is too short to draw a conclusion, but so far it's working surprisingly well.
What are your opinions about this approach from a Veeam functionality standpoint? (Not in terms of whether you like cloud or not.) I want this to be successful, but I also want to expose any other negatives or possible disasters with this approach.
Thanks for your opinions and time.
CL (Lurker)
Tom Sightler (VP, Product Management)
Re: Backing up directly to AWS - my approach - opinions please
The #1 thing to be concerned about is the job mode you are using and how that will impact the cache requirements as well as egress costs for S3. For example, if you attempt to run synthetic full backups, or you run any health checks, the gateway will likely be required to retrieve a significant amount of data from S3 and this can quickly run up the costs. I've seen customers that were expecting a $500 storage bill instead get a $5000 bill for storage + bandwidth, so monitor that very closely.
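If you want to put numbers on that, the gateway publishes per-gateway CloudWatch metrics, and CloudBytesDownloaded is the one that maps to those S3 retrievals. A sketch along these lines will show what a synthetic full or health check actually pulled back down; the gateway ID and name are placeholders, and you should confirm which dimensions your gateway publishes under.

```python
# Sketch: sum CloudBytesDownloaded for a gateway over the last 7 days to see
# how much data the gateway pulled back from S3 (the part that drives the
# surprise bandwidth charges). GatewayId/GatewayName values are placeholders;
# verify the dimensions your gateway actually reports under.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

stats = cw.get_metric_statistics(
    Namespace="AWS/StorageGateway",
    MetricName="CloudBytesDownloaded",
    Dimensions=[
        {"Name": "GatewayId", "Value": "sgw-EXAMPLE"},
        {"Name": "GatewayName", "Value": "veeam-gateway"},
    ],
    StartTime=now - timedelta(days=7),
    EndTime=now,
    Period=86400,                  # one datapoint per day
    Statistics=["Sum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), f"{point['Sum'] / 1024**3:.1f} GiB downloaded")
```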
Also, really long backup chains have the potential to be a problem if the cache is not big. That's because, during job startup, Veeam reads metadata from all of the backup files, and that metadata is likely not to be in the cache, meaning more GET requests and more bandwidth charges. Performance can also be impacted, since the latency for retrieving that metadata can be significant.
You can somewhat architect around these problems by using backup modes that don't perform synthetics (active full backups, or backup copies with active fulls), assuming you have the bandwidth for this. Or, if your change rate is small and the cache is fairly large, you may be able to live with modes like forever forward incremental, or format those iSCSI volumes with ReFS and leverage block clone in Veeam, which saves the read/write cycles.
The worst performance usually happens once you've largely filled the volume. For example, if you have a 20TB volume that you've just created, writing fresh data is easy. That's because the AWS gateway appliance knows that the blocks are empty, so it doesn't need to retrieve a block to perform a partial write. However, as more data is written to the volume and older files are deleted, the AWS gateway is unaware of this (it only sees the block device). When Veeam then writes new files to areas of the disk where old data exists, and those blocks are not in the cache, the gateway has to retrieve the block from S3 into the cache before it can update it. This leads to performance getting much worse over time, as well as growth in the amount of data that is retrieved (i.e. back to the bandwidth costs issue).
Admittedly, at small scale (i.e. 10TB) these issues might not be so bad, especially if you have a big cache, but just keep an eye out for them. Like I said above, I've had more than a few customers that thought they were OK after a few months only to hit a wall a few months further down and suddenly start getting bills 3-10x higher. This is not unique to the AWS gateway; I've seen this with other gateways as well.
CL (Lurker)
Re: Backing up directly to AWS - my approach - opinions please
Excellent information, thank you!
I don't anticipate the backups growing that much over the next year.
I do have the AWS backups set up to do 1 full backup every 3 months, with the rest being incremental.
Our cache volumes are fairly large (2TB) and so far I have not seen them negatively impacted by the gateway.
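For anyone wanting to keep an eye on the same thing, the cache fill level and hit rate can be pulled straight from the Storage Gateway API. A minimal sketch, with a placeholder gateway ARN:

```python
# Spot-check the gateway cache: allocated size, fill level, hit rate.
# A falling CacheHitPercentage is an early warning that jobs are starting to
# read blocks back from S3. The GatewayARN below is a placeholder.
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

cache = sgw.describe_cache(
    GatewayARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE"
)

print(f"Allocated: {cache['CacheAllocatedInBytes'] / 1024**4:.1f} TiB")
print(f"Used:      {cache['CacheUsedPercentage']:.0f}%")
print(f"Hit rate:  {cache['CacheHitPercentage']:.0f}%")
print(f"Dirty:     {cache['CacheDirtyPercentage']:.0f}%")
```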
Vishal Gupta (Enthusiast)
Re: Backing up directly to AWS - my approach - opinions please
Hi Tom,
I have a similar situation with a customer where they want their monthly fulls to go directly to AWS (Glacier, preferably), and the size of the source data is 80TB (without data reduction). They also want to keep 7 years of retention for these monthly fulls. What do you suggest? Would it even be feasible/practical to send monthly fulls to AWS directly with such a long retention?
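Just to put the raw scale of that retention in numbers (before any dedupe or compression, as stated):

```python
# Rough scale of 7 years of monthly fulls at 80 TB each, before any data
# reduction. Pricing is deliberately left out; this is just the raw footprint
# the retention policy implies.
full_size_tb = 80
fulls_per_year = 12
retention_years = 7

restore_points = fulls_per_year * retention_years
total_tb = restore_points * full_size_tb

print(f"{restore_points} monthly fulls retained")
print(f"~{total_tb:,} TB (~{total_tb / 1024:.1f} PB) of full backups at steady state")
# -> 84 fulls, roughly 6,720 TB (~6.6 PB)
```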