General Backup and Replication scenario questions/ideas

Post by **1-0-1** » Feb 23, 2011 9:26 am

I'm writing this down on the forum to spark some creative/experienced input before I make my final decision. I have 4 ESXi 4.1 servers. Two are in production and two are at a remote site for DR purposes. The remote and production sites are linked by a 5 Mbps link. There are 16 guests spread across the ESXi hosts, ranging from 40 GB to 900 GB.

We have successfully run replication across to the remote site. We started off with two jobs, grouped by server operating system. The problem with this is that when one replication fails in a job of 4-6 virtual servers, I have to redo a full replication to removable media for all the servers in the job. This can happen quite frequently, as even a small volume expansion on one of the virtual servers requires a full re-replication.
My solution to this problem is to create a replication job for each virtual server and enable its post-job option to kick-start the next replication job. The only problem here is that if one job fails, all the subsequent jobs will fail as well, but at least I can bypass one job if I need to initiate a full replication.

Veeam backups are going to be a challenge, as the corresponding replication jobs run long due to the slow link and the workload on the respective servers. I would ideally run backup jobs every day, but they will not work because the servers will still be locked by the replication job. Therefore I was thinking of doing replication once a week or once a month, which poses the danger of replicating large changes over the slow link to the DR site.
It would be great to do backups over the slow link, as the incrementals are smaller than the replication ones, but backups are missing an option to run the initial backup to removable media and then do incrementals to the remote storage at the DR site.
I have a large amount of SAN storage available but do not think I am really utilising it properly due to the backup destination restriction imposed by ESXi. I was thinking of attaching the storage to the virtualized Veeam server and then sharing the folder so that the physical backup server (Backup Exec 12.5) can pick up the Veeam backups and put them to tape. The other option would be to connect the current backup server to the SAN and assign it a LUN, which it would use to store backups as a staging area for tape.
The problem remains that I will have to reduce the replication frequency to a point where I have to ask myself whether running replication is still worthwhile. For instance, if I run replication once a month or once a week, I have a seriously old replica which I need to update from backups anyway before it is viable at the disaster recovery site. So my thinking is to just run backups for everything and, in case of full disaster recovery testing, rebuild the virtual servers from backup (we do not have a specific time window dictated, but it should still fall within a reasonable window). In other words: no defined RTO, except that it should not take days to restore the largest VM (approx. 900 GB), either from tape to SAN or from SAN/disk first.
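For a sense of scale, the time a full copy would need over the 5 Mbps link can be estimated with a few lines of Python. This is only a rough sketch: it assumes decimal units, an ideal link with no protocol overhead, and no compression or deduplication, so real figures will differ. The function name `transfer_hours` and the `efficiency` parameter are made up for this illustration.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb over a link of link_mbps.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s).
    efficiency is the assumed usable fraction of the link (1.0 = ideal).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Smallest and largest guest sizes mentioned in the post.
    for size_gb in (40, 900):
        hours = transfer_hours(size_gb, 5)
        print(f"{size_gb} GB over 5 Mbps: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions the 40 GB guest needs roughly 18 hours and the 900 GB guest roughly 400 hours (about 17 days), which is why a full re-replication over the wire is not practical here and seeding via removable media matters.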

Post by **Vitaliy S.** » Feb 24, 2011 10:14 am

1-0-1 wrote: The problem with this is that when one replication fails in a job of 4-6 virtual servers, I have to redo a full replication to removable media for all the servers in the job. This can happen quite frequently, as even a small volume expansion on one of the virtual servers requires a full re-replication.

Have you considered using the retry option? Besides, if one of the replications fails, the subsequent job run should fix it for you automatically while doing incremental runs for the other VMs.

1-0-1 wrote: My solution to this problem is to create a replication job for each virtual server and enable its post-job option to kick-start the next replication job. The only problem here is that if one job fails, all the subsequent jobs will fail as well, but at least I can bypass one job if I need to initiate a full replication.

Not sure I am following you here. Why should the other replication jobs fail? You can use PowerShell scripts to trigger replication jobs in sequence; moreover, we've got a PS script that returns the current job status, so you could decide whether to trigger the next job or not.
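The idea of running jobs one after another and checking each result before deciding what to do next can be sketched as follows. The sketch is in Python rather than PowerShell and does not call any Veeam API: `run_job` is a hypothetical callable that starts one job, waits for it, and returns a status string, and the `"Success"` / `"Failed"` values are assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def run_jobs_in_sequence(
    job_names: Iterable[str],
    run_job: Callable[[str], str],
    stop_on_failure: bool = False,
) -> List[Tuple[str, str]]:
    """Run each job in order and record its status.

    run_job is a hypothetical callable (not a Veeam API) that runs one
    job to completion and returns a status string such as "Success".
    By default a failed job does not prevent the remaining jobs from running.
    """
    results: List[Tuple[str, str]] = []
    for name in job_names:
        try:
            status = run_job(name)
        except Exception as exc:  # treat a crash of the runner as a failed job
            status = f"Failed: {exc}"
        results.append((name, status))
        if stop_on_failure and status != "Success":
            break
    return results


if __name__ == "__main__":
    # Stand-in runner for demonstration: pretends the job for "vm2" fails.
    def fake_run_job(name: str) -> str:
        return "Failed" if name == "vm2" else "Success"

    for name, status in run_jobs_in_sequence(["vm1", "vm2", "vm3"], fake_run_job):
        print(f"{name}: {status}")
```

With `stop_on_failure` left at its default, one failed job is only recorded and the chain carries on, which avoids the situation where a single failure blocks every subsequent per-VM job.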

1-0-1 wrote: Veeam backups are going to be a challenge, as the corresponding replication jobs run long due to the slow link and the workload on the respective servers. I would ideally run backup jobs every day, but they will not work because the servers will still be locked by the replication job. Therefore I was thinking of doing replication once a week or once a month, which poses the danger of replicating large changes over the slow link to the DR site.

If you have a slow WAN link, I would suggest doing backups locally and then syncing those files (with rsync) to offsite storage. In case of a disaster, you could use Instant VM Recovery to bring your VMs back up in production without wasting time on restore operations.

On top of that, have you tried any WAN acceleration tools? You may want to take a look at HyperIP, which should give much better performance for the replication jobs.
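As an illustration of the rsync suggestion above, a small Python helper can assemble the rsync command line for pushing a local backup folder offsite. This is a sketch under assumptions: the source path, the remote target, and the bandwidth cap are hypothetical, and where rsync actually runs (a Windows guest, a Linux box, or elsewhere) depends on your setup. The options used (`--archive`, `--partial`, `--verbose`, `--bwlimit`) are standard rsync options.

```python
import subprocess
from typing import List, Optional


def build_rsync_command(
    source_dir: str,
    destination: str,
    bwlimit_kbps: Optional[int] = None,
) -> List[str]:
    """Build an rsync command that copies the contents of source_dir.

    --partial keeps partially transferred files so an interrupted copy
    of a large backup file can resume; --bwlimit caps the transfer rate
    (rsync's default unit is KiB per second).
    """
    cmd = ["rsync", "--archive", "--partial", "--verbose"]
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    # The trailing slash makes rsync copy the directory's contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", destination]
    return cmd


def sync_offsite(source_dir: str, destination: str, bwlimit_kbps: Optional[int] = None) -> None:
    """Run rsync and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_rsync_command(source_dir, destination, bwlimit_kbps), check=True)


if __name__ == "__main__":
    # Hypothetical paths and host; printed only, not executed.
    cmd = build_rsync_command("/backups/veeam", "backup@dr-site.example:/srv/veeam-offsite/", 400)
    print(" ".join(cmd))
```

Capping the rate below what the 5 Mbps link can carry leaves some headroom for other traffic; the value of 400 is only an example.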

1-0-1 wrote: It would be great to do backups over the slow link, as the incrementals are smaller than the replication ones, but backups are missing an option to run the initial backup to removable media and then do incrementals to the remote storage at the DR site.

Well... it does, theoretically.

You can choose local destination storage for your backup job, then move those files to offsite storage, then change the destination of the backup job, and that's it!

1-0-1 wrote: I was thinking of attaching the storage to the virtualized Veeam server and then sharing the folder so that the physical backup server (Backup Exec 12.5) can pick up the Veeam backups and put them to tape. The other option would be to connect the current backup server to the SAN and assign it a LUN, which it would use to store backups as a staging area for tape.

Both scenarios look good.

1-0-1 wrote: For instance, if I run replication once a month or once a week, I have a seriously old replica which I need to update from backups anyway before it is viable at the disaster recovery site. So my thinking is to just run backups for everything and, in case of full disaster recovery testing, rebuild the virtual servers from backup (we do not have a specific time window dictated, but it should still fall within a reasonable window). In other words: no defined RTO, except that it should not take days to restore the largest VM (approx. 900 GB), either from tape to SAN or from SAN/disk first.

You may want to replicate large file servers that do not have frequent changes. For all other VMs, I would do backups locally and use rsync to keep offline copies of those backup files.

Post by **1-0-1** » Mar 17, 2011 5:59 am

Thanks a lot for the input by the way!

Post by **marshall28** » May 24, 2011 3:17 pm

Generally, when we refer to doing backups locally and then rsyncing them offsite, are you running rsync natively on the ESX hypervisor or within a guest operating system like Windows? Also, is the local backup going to a SAN or to an external hard drive attached to the primary server and mounted within a guest virtual machine? How does Veeam process this?
