- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Replication Recommendations?
Hi All.
I wanted to ask the community about this before logging a support call, as I'm sure some of you must have experienced this before as well.
We have just moved our DR kit to a new DC. We currently only have a 2Mb link to the DR site and are trying to replicate 980Mb of VMs. We did a first-pass replication job locally over a Gbit connection and then moved the kit, thinking (and having been told) that 2Mb would be enough bandwidth for incrementals.
I have read that we should now be able to do near-CDP; however, we are now seeing a replication window of 36 hours and counting for 1 job. This one job contains all VMs together.
Our environment consists of:
vSphere, both sites
approx 20 VMs
Local: NetApp FAS2020
remote: Dell PE R610 with MD1000 SCSI disk shelf
2Mb Leased line - dedicated to DR
So my questions are this:
1: Is it better to have multiple simultaneous replication jobs, i.e. 1 for each VM, or one big group consisting of all VMs?
2: Are there any best practice documents available?
3: Will moving the pagefile to a non-replicated disk make a massive difference? (I know that sounds like a dumb question, but with vSphere changed block tracking I'm not sure how much will change in relatively small pagefiles.)
Cheers
Will
- 
				Vitaliy S.
- VP, Product Management
- Posts: 27700
- Liked: 2909 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
- Contact:
Re: Replication Recommendations?
Hello Will,
To be able to use the changed block tracking feature, please make sure you have met all the requirements for it (see Change Block Tracking). With this in place, your incremental runs should not take as long as they currently do.
Actually, it is better to combine VMs built from one template into one job (for example, Windows machines in one job and Linux machines in another) so you can leverage all the benefits of de-duplication. And yes, moving your pagefile to a disk which you have excluded from the job will make your jobs run quicker.
- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Re: Replication Recommendations?
Hi Vitaliy,
Thanks for the rapid response. I was unaware that I would need to edit the individual VM disks and enable them for CBT; I presumed this would have been done automatically once I had upgraded to hardware version 7 (and the correct VMware Tools), both of which were completed a while ago.
As you can see below, VEEAM is using CBT but has a really slow replication speed, i.e. 3 MB/s:
Total VM size: 40.57 GB
Processed size: 40.57 GB
Processing rate: 3 MB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 12/12/2009 12:15:15 PM
End time: 12/12/2009 4:01:05 PM
Duration: 3:45:50
I will have another look at the advanced settings and let you know; however, I have to power off each VM to do this, so it might not be a quick response.
Cheers
Will
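As a sanity check on the figures above, the reported processing rate can be reproduced by dividing the processed size by the job duration. The following is a minimal Python sketch, assuming 1 GB = 1024 MB and that the rate is simply size over wall-clock time; the function names are illustrative and not part of Veeam.

```python
def duration_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def processing_rate_mb_s(processed_gb: float, hms: str) -> float:
    """Average rate in MB/s, assuming 1 GB = 1024 MB."""
    return processed_gb * 1024 / duration_seconds(hms)


# 2 Mb/s leased line expressed in megabytes per second
LINK_MB_S = 2 / 8

# Figures taken from the job log above: 40.57 GB processed in 3:45:50
rate = processing_rate_mb_s(40.57, "3:45:50")
print(f"average rate:  {rate:.2f} MB/s")
print(f"link capacity: {LINK_MB_S:.2f} MB/s")
```

The computed rate comes out at roughly 3 MB/s, in line with the log, yet a 2Mb link can carry only 0.25 MB/s. That suggests the reported rate is measured against the whole VM size processed rather than the bytes actually sent over the WAN, although that interpretation is an assumption.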
- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Re: Replication Recommendations?
Hi Vitaliy,
A little bit quicker than I thought I would: I have just checked the config parameters for the machine below and all the settings for CBT are correct, yet if you look at the time that the job took to run, it is still taking far too long:
****
7 of 7 files processed
Total VM size: 63.00 GB
Processed size: 63.00 GB
Processing rate: 1 MB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 12/13/2009 9:58:14 AM
End time: 12/13/2009 10:43:05 PM
Duration: 12:44:51
******
ctkEnabled flag is set to TRUE
scsi0:0.ctkEnabled = TRUE
scsi0:1.ctkEnabled = TRUE
How can we hope to get near CDP when it is taking 12:44 to perform 1 replication?
Cheers
Will
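For anyone wanting to repeat the CBT check across many VMs without opening each configuration by hand, the flags can be read from a copy of the VM's .vmx text. This is a rough Python sketch, assuming the entries look like the `ctkEnabled` lines quoted above (with or without quotes around the value); the sample text and the helper name `ctk_flags` are hypothetical.

```python
import re

# Matches lines such as: scsi0:0.ctkEnabled = "TRUE"  or  ctkEnabled = TRUE
CTK_LINE = re.compile(
    r'^\s*((?:\w+:\d+\.)?ctkEnabled)\s*=\s*"?(\w+)"?\s*$',
    re.IGNORECASE,
)


def ctk_flags(vmx_text: str) -> dict[str, bool]:
    """Return each ctkEnabled key found and whether it is set to TRUE."""
    flags = {}
    for line in vmx_text.splitlines():
        match = CTK_LINE.match(line)
        if match:
            key, value = match.groups()
            flags[key] = value.upper() == "TRUE"
    return flags


# Hypothetical sample, not taken from a real VM
sample = """\
ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
scsi0:1.ctkEnabled = "FALSE"
"""

for key, enabled in ctk_flags(sample).items():
    print(f"{key}: {'ok' if enabled else 'NOT enabled'}")
```

Any disk reported as not enabled would be worth a second look before blaming the link.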
- 
				Vitaliy S.
- VP, Product Management
- Posts: 27700
- Liked: 2909 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
- Contact:
Re: Replication Recommendations?
Will,
Yes, that is done automatically; I just wanted you to re-check it. As for the slow rates you see, have you tried transferring any other file over that WAN link, and is the speed just as slow? The speed depends not only on the link used for the transfer but also on the destination storage.
- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Re: Replication Recommendations?
Vitaliy,
I have just re-run a different, separate replication job to the same DR site from our primary site and I get the following results:
(BTW, the 2 different results I showed above are individual VMs from the job log, whereas the one below is for the whole job, 3 VMs.)
****
3 of 3 VMs processed (0 failed, 0 warnings)
Total size of VMs to backup: 58.80 GB
Processed size: 58.80 GB
Processing rate: 37 MB/s
Start time: 12/10/2009 1:00:23 AM
End time: 12/10/2009 1:27:16 AM
Duration: 0:26:53
****
The only difference between the 2 replication jobs is the VMs themselves, i.e. different VMs but exactly the same O/S. The source and destination storage are also the same, and the VMware environment is the same one as used for the other job. Strange that the replication rate is so much greater than for the other job?
Am I right in thinking that, as with a backup job, the larger the group, the longer it will take to read all the files within that group, as VEEAM treats the job as a whole, i.e. one large file? And if any part of the job corrupts, is the whole job left in a corrupt state?
Cheers
Will.
- 
				Vitaliy S.
- VP, Product Management
- Posts: 27700
- Liked: 2909 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
- Contact:
Re: Replication Recommendations?
Will,
Actually, if you have placed many VMs in one job, they will be processed one by one. If one of your VMs fails to back up, you can specify a retry option for all failed VMs; however, if the VM still fails after the retries, the whole job will also fail.
As for the better speeds you get with different VMs, that could be due to many reasons. For example, if your VM is highly transactional, it will take much more time to delete the snapshot than for an ordinary user's VM, so the job duration will be different for those VMs. By the way, what VMs are you trying to replicate (Exchange, SQL, DC... anything like that)?
- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Re: Replication Recommendations?
Hi Vitaliy,
The last job that I showed the stats for contains 1 x Exchange and 2 x Domain Controllers.
To answer your question, we have several SQL boxes, a couple of Exchange servers, and then file/print and other random stuff like web/mail proxies etc., all of which suffer from really slow performance, except for the 3 VMs in the last job.
I will post below some comparative stats for 2 different Exchange servers we have, just to show you the difference in speeds. Again, these are both hosted on the same storage here; in fact, the Exchange server that gets greater speeds is hosted on SATA disk, whereas the slower Exchange server is sitting on FC disk. I know things like space on the LUN etc. have an impact on speed/performance, but we have no space issues at the moment; both LUNs have at least 30% free space. I would have thought the FC disks would give better performance than the SATA disks, rather than the other way round, even though they are both going through the iSCSI connection.
Anyway, enough rambling, here's the job info from the last job for each.
You will notice that the 2nd Exchange box is listed as failed. I binned this job as it was overrunning into production time, and releasing the snapshot then would have caused issues due to the amount of time it would take to remove.
******
Exchange Server 1.
******
10 of 10 files processed
Total VM size: 26.03 GB
Processed size: 26.03 GB
Processing rate: 40 MB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 12/10/2009 1:09:28 AM
End time: 12/10/2009 1:20:40 AM
Duration: 0:11:12
*******
Exchange Server 2.
*******
5 of 13 files processed
Total VM size: 144.36 GB
Processed size: 8.34 GB
Processing rate: 539 KB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 12/14/2009 5:06:46 AM
End time: 12/14/2009 9:37:04 AM
Duration: 4:30:17
Replicating file "[ExchLUN2] PNSSRV08_Exchange/PNSSRV08_Exchange-flat.vmdk"
Operation has been terminated by user
*******
- 
				Gostev
- Chief Product Officer
- Posts: 32761
- Liked: 7971 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication Recommendations?
Will, near-CDP replication is not possible during the initial (full) replication; it only works for subsequent replication passes. I think some of the passes above are full (initial) ones.
The result for the incremental Exchange Server 1 replication above (11 min) is normal and expected for replicating an Exchange server over a 2Mb link. Exchange produces a lot of disk changes due to transaction log activity, plus snapshotting a live Exchange VM takes much longer than for a VM running other applications. You would see replication a few times faster (under 5 mins) for non-Exchange VMs.
Also, if you look at the size of the VRB file produced after the replication cycle, it will give you an idea of how much data Veeam Backup has replicated over the WAN to make the replication happen. Basically, replication cannot go faster than the time required to push this amount of data over your 2Mb/s link (assuming the link is always at full speed and not occupied by other activity).
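The bound described above can be turned into two quick calculations: the most data the link could have carried during a pass, and the least time a VRB of a given size needs. This is a minimal Python sketch that ignores protocol overhead and compression; the 500 MB VRB size is a made-up example, not a figure from this thread.

```python
LINK_MBIT_S = 2  # leased line speed in megabits per second


def max_transfer_mb(duration_s: float, link_mbit_s: float = LINK_MBIT_S) -> float:
    """Most data (MB) the link can carry in the given time."""
    return duration_s * link_mbit_s / 8


def min_transfer_seconds(size_mb: float, link_mbit_s: float = LINK_MBIT_S) -> float:
    """Least time (s) needed to push size_mb over the link."""
    return size_mb * 8 / link_mbit_s


# The Exchange Server 1 pass above took 0:11:12
pass_seconds = 11 * 60 + 12
print(f"at most {max_transfer_mb(pass_seconds):.0f} MB sent in that pass")

# Hypothetical 500 MB VRB file
minutes = min_transfer_seconds(500) / 60
print(f"a 500 MB VRB needs at least {minutes:.1f} minutes")
```

So an 11-minute pass can have moved no more than about 168 MB across a 2Mb/s link, whatever processing rate the job log reports.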
- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Re: Replication Recommendations?
Hi Anton,
Thanks for the advice about checking the size of the VRB file. I'll go and have a look and let you know.
The job I posted was a subsequent pass and not a first-time run. I can confirm this as I spent all day Friday in our new DR site installing the servers and testing connectivity to these VMs from the office, so I know they were already there.
Is there anything that could have changed to make VEEAM think it was a first-time pass instead of a subsequent pass, such as changing the IP addresses of the target vCenter/ESX servers in DR? (We did change these after running the first pass in the office. We didn't change any host names though, and DNS was updated to reflect the IP change before the replication run.)
Cheers
Will
- 
				Gostev
- Chief Product Officer
- Posts: 32761
- Liked: 7971 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication Recommendations?
Will, only creating a new job can initiate a full pass. I cannot think of any other causes at the moment.
						- 
				tsightler
- VP, Product Management
- Posts: 6040
- Liked: 2867 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Replication Recommendations?
Just to throw some ideas around here, you really need to see how big your changes are (size of the VRB). Given that you're talking about 20 systems, it's hard to come up with a scenario that wouldn't involve at least several GB of changes per pass. In your first post you list "980Mb of VMs", but I'm assuming you mean 980GB of VMs, as 980Mb would not hold a typical OS. Also, you then appear to show single-VM jobs that were 40GB and 63GB, so this also backs up that we're talking about 980GB.
So, can you clarify that we're talking 980GB? In my experience a typical VRB file would generally be at least 1-2% the size of your entire backup. I understand this can vary greatly based on the load of the servers: things like Exchange will be a lot higher, and application servers which write very little data may be much less, but 1-2% seems to be the low end in my experience. So, in other words, assuming 980GB is the correct figure, even a 1% change is 9GB of data. Doing the math in my head, 9GB of data would take about 10 hours (2Mb/sec should be right at 1GB/hr).
Basically, you need to see how big your VRB files are and do the math to see if you can transfer that much data. 2Mb for 20 VMs seems pretty low to me, just as a "top of my head" guess.
- 
				willrodbard
- Influencer
- Posts: 16
- Liked: never
- Joined: Sep 21, 2009 11:55 am
- Full Name: Will Rodbard
- Contact:
Re: Replication Recommendations?
Yeah, well spotted there mate, I did mean 980GB.
I will check out the VRB file and have a look.
Thanks for your info.
Will
p.s. congrats on the MVP status