Digest re-calc after moving VMs between replication jobs

Post by **kesenta** » Sep 18, 2012 12:46 am

Current scenario (assume valid restore points already exist for each VM at the target site).
Using Veeam 6.1.0.181 for the Backup Server and both Source and Target Proxies.

Replication Job #1 members
- VM01
- VM02
- VM03

Replication Job #2 members
- VM04
- VM05
- VM06

The requirement is to move VM03 from Job #1 to Job #2 due to runtime scheduling and use of a slow WAN link.
After VM03 is removed from Job #1 and added to Job #2 the job successfully executes.

However, the job task shows a Calculating Digests step for each disk of this VM when it runs as part of Job #2.
This is a concern because for large VMs (vDisks over 1 TB) this step causes unnecessary delays.
This problem also occurs if a new Replication Job is created and an existing VM that already has a replica is moved to it.

Does anyone know of any workaround(s) to avoid this digest re-calculation step? Thanks.

Post by **foggy** » Sep 18, 2012 8:30 am

How long does the digest calculation take? When you perform replica mapping, VM digests have to be calculated and transferred to the source proxy. If you have a target proxy deployed at the remote site and it is actually used for processing, then this proxy should calculate the VM digests locally and send them to the source proxy, minimizing traffic over the WAN.

Post by **kesenta** » Sep 18, 2012 12:35 pm

The data flow and setup is as follows:

------------------------------------------------------------------------
VM03 (1.9TB)
||
Source ESXi Host
||
Backup Server + Local/Source Proxy VM (8 vCPU, 16GB)
(Virtual Appliance using Hot-Add)
||
WAN (20Mbps)
||
Remote/Target Proxy VM (8 vCPU, 8GB)
(Virtual Appliance using Hot-Add)
||
Target ESXi Host
||
VM03_replica (1.9TB)
------------------------------------------------------------------------

Job configuration is set to Optimal for Compression and WAN target for Deduplication.

I don't see any traffic on the WAN link that indicates it is being saturated during the digest calculation step.
So link speed isn't an issue.

Replica mapping is enabled after VM03 is moved from Job #1 to Job #2 (the Detect option was used).

The digest recalculation can potentially take hours but that's not the issue here.
The issue is that simply moving the VM between replication jobs causes the digests to be recalculated, even if a replication job completed successfully just before the move.
This shouldn't be the case, since the VM was just replicated and therefore has minimal to no data change (even tested with the source VM powered off).
If I take the VM out and put it back into the same job, it doesn't do a digest recalculation, as expected.

Why is this the case?
Should I be manipulating something in the database or VM-ID folder for this VM as part of moving the VM between jobs?
Is this by design or a limitation of managing multiple/juggling VMs within replication jobs?

Post by **foggy** » Sep 18, 2012 4:15 pm

kesenta wrote: Why is this the case?

Otherwise we just cannot be sure this is the same VM. Digests have to be calculated and compared to the original VM in case of VM mapping and only after that differences between the two VMs can be transferred.
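
To illustrate why mapping means reading every block of the replica, here is a minimal Python sketch of block-level digest comparison. It is not Veeam's implementation: the block size, the choice of SHA-256, and the function names are all assumptions made for illustration only.

```python
import hashlib

# Hypothetical block size; the real product's block size is not stated in this thread.
BLOCK_SIZE = 1024 * 1024


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one digest per fixed-size block of the disk image."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(source: list[str], replica: list[str]) -> list[int]:
    """Return indexes of source blocks whose digest differs from the replica's.

    Blocks that exist on the source but not on the replica count as changed.
    """
    changed = []
    for index, digest in enumerate(source):
        if index >= len(replica) or replica[index] != digest:
            changed.append(index)
    return changed


# Tiny demonstration with a 4-byte block size so the result is easy to follow.
source_disk = b"AAAABBBBCCCC"
replica_disk = b"AAAAXXXXCCCC"
to_transfer = changed_blocks(
    block_digests(source_disk, 4),
    block_digests(replica_disk, 4),
)
print(to_transfer)
```

In this model only the digests travel between the proxies, and only the blocks listed in `to_transfer` would need to cross the WAN, but every block on the replica still has to be read once to produce its digest.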

Post by **kesenta** » Sep 18, 2012 10:14 pm

So it is by design then, which leads me to the next question (from my previous post): how can we work around it, since we already know that the VM in question is definitely the same?

Post by **Vitaliy S.** » Sep 19, 2012 9:34 am

There are no workarounds. After each VM mapping we have to compare all the blocks before we can start transferring changed blocks to the VM replica.

Post by **kesenta** » Sep 21, 2012 6:15 am

Hmm, not good enough then...
Because if one had offsite replication going for weeks (let's say for 10 TB worth of VMs) and then all of a sudden something prompted the replication group(s) to be reconfigured, it would need to sit through the whole process of recalculating digests for each and every vDisk of every VM.
This will be painful, especially on slow WAN links.

Post by **Vitaliy S.** » Sep 21, 2012 9:27 am

Digest recalculation doesn't consume much bandwidth, so WAN link shouldn't be the issue (as you've correctly stated in your second post). Digest calculation speed solely depends on the size of the virtual disks and proxy server location.
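
As a rough back-of-envelope check of this point, the Python sketch below uses the 1.9 TB disk and 20 Mbps WAN link from the setup earlier in the thread. The proxy read throughput (100 MB/s), the block size (1 MiB), and the digest size (32 bytes) are assumptions picked purely for illustration, not figures from Veeam.

```python
import math

DISK_BYTES = 1_900_000_000_000   # 1.9 TB disk, from the thread
WAN_BITS_PER_SEC = 20_000_000    # 20 Mbps WAN link, from the thread

# Everything below is a hypothetical assumption for illustration.
READ_BYTES_PER_SEC = 100_000_000  # assumed target proxy read throughput
BLOCK_SIZE = 1024 * 1024          # assumed block size
DIGEST_BYTES = 32                 # assumed digest size per block

wan_bytes_per_sec = WAN_BITS_PER_SEC / 8


def digest_read_hours() -> float:
    """Time for the target proxy to read the whole replica disk locally."""
    return DISK_BYTES / READ_BYTES_PER_SEC / 3600


def digest_transfer_seconds() -> float:
    """Time to send all block digests across the WAN."""
    blocks = math.ceil(DISK_BYTES / BLOCK_SIZE)
    return blocks * DIGEST_BYTES / wan_bytes_per_sec


def full_copy_days() -> float:
    """Time to resend the entire disk across the WAN, for comparison."""
    return DISK_BYTES / wan_bytes_per_sec / 86400


print(f"local digest read: {digest_read_hours():.1f} h")
print(f"digest transfer:   {digest_transfer_seconds():.0f} s")
print(f"full WAN re-copy:  {full_copy_days():.1f} days")
```

Under these assumptions the local read takes hours, the digest transfer takes well under a minute, and a full re-copy over the WAN would take days, which fits the observation that the link is not saturated during the Calculating Digests step.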

Post by **tsightler** » Sep 21, 2012 12:25 pm

Nothing is stopping you from creating one replication job per VM; then this should never really be an issue (or at least very rarely). But as others have correctly stated, digest calculation is performed locally by the target proxy, so the bulk of the traffic does not cross the WAN.

Post by **parneye** » Dec 25, 2015 4:55 am

Even though this is a post from over 3 years ago, I echo kesenta's thoughts...
tsightler: your point, in my opinion, is somewhat valid and I have used it in the past, but the issue here is the extra complexity it adds to management. To name two examples: keeping job settings (proxy settings, compression type, etc.) consistent becomes more of a burden; and it complicates service provider systems that parse both successful and failed job emails (creating support tickets when a failure email comes through, and also when a success email is missing in case email transmission is broken).
It would be much better, in my opinion, to have flexibility built in within a "job" and also between "jobs".
We manage Veeam for 10+ clients, and there are two things I've always missed and wished were there:
* the ability to run a job (on demand or by schedule) and select the VMs within the job for that particular run (for example, if you just want to do a single VM, or skip a large VM for a particular run or several runs; for different reasons I've come across a need for this quite a few times).
* (the same thing kesenta wanted) the ability to move a VM from one replica job to another. I don't mean moving any of the vSphere-side data; in other words, the VM is still on Host A and still being replicated to Host B. I simply want to do it in a different job on the same Veeam server for a particular reason.
The statement above, "Otherwise we just cannot be sure this is the same VM," either ignores the root of the question or is the standard company line for "that feature is not worth our time." You could be sure that it is the same VM, because you could build the feature yourselves: it would move the relevant files from the current job folder to the other one and make any necessary Veeam database modifications.
Are any of the above features on the development cards?
Thanks

Post by **foggy** » Dec 25, 2015 10:18 am

One of them is already implemented: the ability to run on-demand incremental backup for one or more VMs (though there's no scheduling in it so far).
