Failback process has some nonsensical parts?

Post by **wisecra** » Dec 05, 2014 12:32 pm

I discussed part of this in my previous post (http://forums.veeam.com/veeam-backup-re ... ml#p127165).

(This is all on VMware ESXi 5/vCenter with Veeam B&R 7.)

I had to fail over all the VMs at one site to a remote site (which worked awesomely, btw!).

In failing back to the production site, it seems the failback process goes through each disk serially, one at a time. This is different from the way backups and replications work: they can process more than one disk at a time (given enough proxy resources).

Now that I've tweaked things enough to actually get failback to work on a two-drive server, it:
+1 Made a snapshot of the production VM
+2 Calculated changes between Drive 1 on the replica and Drive 1 on production
+3 Replicated RP hard drive 1
+4 Calculated changes between Drive 2 on the replica and Drive 2 on production
+5 Replicated RP hard drive 2
+6 Made a new snapshot on the replica VM
+7 Calculated changes between Drive 1 on the production and Drive 1 on replica
+8 Is replicating HD1 from production BACK to the replica???

It's bad enough that it can't replicate in parallel, but *WHY ON EARTH* would it need to replicate BACK to the replica what it just replicated TO production?

AND now, 12 hours later, on a server with two 60 GB drives, it failed with channelError: ConnectionReset.

That image tag didn't do what I thought it would. Here it is:
https://notes.pmpllp.com/veeamfb.jpg
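The order listed above can be written down as a short Python sketch. It only encodes the sequence reported in this post (one disk at a time per pass, with a new replica snapshot between the two passes). The function name and step labels are hypothetical, and the direction of the second pass follows foggy's reply below (changes made on the replica are sent to production), not Veeam's documentation.

```python
from typing import List


def failback_steps(disks: List[str]) -> List[str]:
    """Hypothetical model of the failback step order reported in this thread.

    Each pass handles disks one at a time; this mirrors the observed job
    log, not a documented Veeam algorithm.
    """
    steps = ["snapshot production VM"]
    # Pass 1: bring production up to date with the replica, disk by disk.
    for disk in disks:
        steps.append(f"calculate changes {disk} (replica vs production)")
        steps.append(f"replicate {disk} to production")
    # A fresh snapshot on the replica marks the start of the final sync.
    steps.append("snapshot replica VM")
    # Pass 2 (final sync): send what changed on the replica during pass 1.
    for disk in disks:
        steps.append(f"calculate changes {disk} (since replica snapshot)")
        steps.append(f"replicate {disk} changes to production")
    return steps


for number, step in enumerate(failback_steps(["HD1", "HD2"]), start=1):
    print(number, step)
```

For two disks this yields ten steps with the replica snapshot in the middle, which is why each drive shows up twice in the job log.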

Post by **foggy** » Dec 05, 2014 1:49 pm

wisecra wrote:
+7 Calculated changes between Drive 1 on the production and Drive 1 on replica
+8 Is replicating HD1 from production BACK to the replica???

It's bad enough that it can't replicate in parallel, but *WHY ON EARTH* would it need to replicate BACK to the replica what it just replicated TO production?

It is not replicating back to the replica; it transfers the changes that occurred inside the replica VM since the failback process started. Please review the step-by-step description of the failback process for a better understanding.

Post by **wisecra** » Dec 05, 2014 2:06 pm

Thanks! I get the process, but why is it doing it TWICE for each drive?

I sat there and watched it calculate for Drive 1, then replicate Drive 1, then calculate Drive 2 and replicate Drive 2.

THEN it created a new snapshot on the replica and started over, calculating Drive 1, then replicating Drive 1.... then it failed. 8-(

Post by **foggy** » Dec 05, 2014 2:11 pm

Look at steps 3 and 7 of the failback procedure. First it needs to transfer the changes that occurred while the replica was running in the Failover state, then (the final sync) the changes that occurred during the failback itself.
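The two passes described above can be illustrated with a toy model in Python. Disks are in-memory dicts of blocks, and the sets of changed block numbers stand in for whatever change tracking the product really uses (an assumption; this is not Veeam's code, and `sync_blocks` is a hypothetical helper). Pass 1 copies what changed during failover; pass 2 copies only what the still-running replica wrote in the meantime. If the replica is powered off, the second set is empty and pass 2 has nothing to copy.

```python
from typing import Dict, Set


def sync_blocks(src: Dict[int, bytes], dst: Dict[int, bytes], blocks: Set[int]) -> int:
    """Copy the listed block numbers from src to dst; return how many were copied."""
    for block in blocks:
        dst[block] = src[block]
    return len(blocks)


# Hypothetical 8-block disks: the replica diverged from production
# while it ran in the Failover state.
production = {i: b"old" for i in range(8)}
replica = dict(production)
changed_since_failover = {1, 4, 6}
for block in changed_since_failover:
    replica[block] = b"new"

# Pass 1: transfer the changes made while the replica ran in Failover state.
first = sync_blocks(replica, production, changed_since_failover)

# The replica keeps running during pass 1 and writes another block.
written_during_pass1 = {2}  # would be set() if the replica were powered off
for block in written_during_pass1:
    replica[block] = b"newer"

# Pass 2 (final sync): only the blocks written after pass 1 began.
final = sync_blocks(replica, production, written_during_pass1)
assert production == replica
print(first, final)  # 3 1
```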

Post by **wisecra** » Dec 05, 2014 2:21 pm

I guess I should have been clearer about this. I shut down the replica VM before starting the failback... is it really safe to keep it running during this process?

Post by **wisecra** » Dec 05, 2014 2:22 pm

And my other question: since I shut the replica down... HOW can it find anything to replicate in the final sync? Since the replica was off, there should be no changes between the first and second sync.

Post by **Vitaliy S.** » Dec 06, 2014 5:39 pm

Let me chime in to your discussion: I don't believe the replica's power state is tracked during the failback process, which means the final sync would still take place.

The usual case is that the VM keeps running the whole time and failback is not initiated on a powered-off replica, but I see what you're pointing to. How much time does the final sync take? I assume it should not be long, since no changes were made to the replica VM, right?
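One reason a "no changes" pass can still take a while: a pass that cannot rely on change tracking has to compare something. The Python sketch below shows a generic fixed-block signature comparison. Whether Veeam's final sync works this way is an assumption, not something confirmed in this thread (the log further down does show "Calculating original signature" steps, but only for the first pass), and `signature_diff` is a hypothetical name. The point is that every block is read and hashed even when none differ, so the cost scales with disk size rather than with the amount of changed data.

```python
import hashlib
from typing import List, Tuple


def signature_diff(source: bytes, target: bytes, block_size: int = 4) -> Tuple[int, List[int]]:
    """Compare two equal-length disk images block by block via SHA-256 digests.

    Returns (blocks_read, differing_block_indexes). Every block is read and
    hashed even when nothing differs.
    """
    if len(source) != len(target):
        raise ValueError("images must be the same size")
    blocks_read = 0
    differing: List[int] = []
    for index, offset in enumerate(range(0, len(source), block_size)):
        a = hashlib.sha256(source[offset:offset + block_size]).digest()
        b = hashlib.sha256(target[offset:offset + block_size]).digest()
        blocks_read += 1
        if a != b:
            differing.append(index)
    return blocks_read, differing


disk = bytes(range(32))  # hypothetical 32-byte "disk": 8 blocks of 4 bytes
print(signature_diff(disk, disk))  # (8, [])
```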

Post by **wisecra** » Dec 06, 2014 7:11 pm

Here's another example, which I'm running right now.
1 VM:
Disk 1: 20 GB
Disk 2: 40 MB
Disk 3: 300 GB
Disk 4: 100 GB

First calc and sync: 6.5 hours
Second sync: Well, it's 22% through Disk 3 and it's been 8.5 hours.
Remember, it doesn't sync in parallel, either.
Also, that's 6.5 hours for the first calc and sync and 8.5 hours for the second sync ALONE; this is not the total time.
Total time is 6.5 + 8.5 = 15 hours SO FAR.

My users are quite unhappy with me, both because their server is slow (running at the remote site) and because it is down for a VERY long time as I try to fail it back.

Post by **foggy** » Dec 06, 2014 8:34 pm

What operations take most of the time? Calculation or transfer itself? Do you have proxies deployed on both ends? What transport mode is being used on them? What kind of connection do you have between locations?

Post by **wisecra** » Dec 06, 2014 10:23 pm

General info:
- Both hosts: ESXi 5, SATA-attached local storage (RAID 5)
- VMFS 5 datastores, one for each ESXi server
- Proxy servers at both ends

Here are the current details:

Code:

12/5/2014 6:34:13 PM          Failback started at 12/5/2014 6:34:13 PM
12/5/2014 6:35:05 PM          Queued for processing at 12/5/2014 6:35:05 PM
12/5/2014 6:35:38 PM          Preparing next VM for processing                         0:00:30
12/5/2014 6:35:33 PM          Required backup infrastructure resources have been assigned
12/5/2014 6:35:38 PM          Using source proxy 'VMware Backup Proxy' [nbd]
12/5/2014 6:35:39 PM          Using target proxy 'server7' [nbd]
12/5/2014 6:35:48 PM          Preparing original VM
12/5/2014 6:36:13 PM          Creating working snapshot on original VM                 0:00:21
12/5/2014 6:41:32 PM          Calculating original signature Hard disk 1 (20.0 GB)     0:04:56
12/5/2014 6:50:48 PM          Replicating RP Hard disk 1 (20.0 GB) 1.0 GB processed    0:09:14
12/5/2014 6:51:00 PM          Calculating original signature Hard disk 2 (40.2 MB)     0:00:06
12/5/2014 6:51:30 PM          Replicating RP Hard disk 2 (40.2 MB) 40.2 MB processed   0:00:29
12/5/2014 7:57:41 PM          Calculating original signature Hard disk 3 (300.0 GB)    1:05:53
12/6/2014 12:48:48 AM          Replicating RP Hard disk 3 (300.0 GB) 20.0 GB processed 4:51:02
12/6/2014 12:49:19 AM          Calculating original signature Hard disk 4 (100.0 GB)   0:00:19
12/6/2014 4:22:00 AM          Replicating RP Hard disk 4 (100.0 GB) 95.3 GB processed  3:32:40
12/6/2014 4:22:34 AM          Creating replica restore point                           0:00:29
12/6/2014 6:55:59 AM          Replicating changes Hard disk 1 (20.0 GB) 15.5 GB processed 2:33:22
12/6/2014 6:56:26 AM          Replicating changes Hard disk 2 (40.2 MB) 2.0 MB processed 0:00:08
12/6/2014 4:17:43 PM          Replicating changes Hard disk 3 (300.0 GB) 31% at 1 MB/s 9:26:17+

... and counting
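To see which operations take the most time, the step durations in the log can be totalled per phase. A minimal Python sketch, assuming the last column is each step's elapsed time; the step text is copied from the log with the timestamps dropped, and the phase names are hypothetical labels.

```python
import re
from datetime import timedelta
from typing import Dict

# Step lines from the session log above (leading timestamps dropped).
# A trailing "+" marks a step that was still running when the log was copied.
STEPS = """\
Creating working snapshot on original VM                 0:00:21
Calculating original signature Hard disk 1 (20.0 GB)     0:04:56
Replicating RP Hard disk 1 (20.0 GB) 1.0 GB processed    0:09:14
Calculating original signature Hard disk 2 (40.2 MB)     0:00:06
Replicating RP Hard disk 2 (40.2 MB) 40.2 MB processed   0:00:29
Calculating original signature Hard disk 3 (300.0 GB)    1:05:53
Replicating RP Hard disk 3 (300.0 GB) 20.0 GB processed 4:51:02
Calculating original signature Hard disk 4 (100.0 GB)   0:00:19
Replicating RP Hard disk 4 (100.0 GB) 95.3 GB processed  3:32:40
Creating replica restore point                           0:00:29
Replicating changes Hard disk 1 (20.0 GB) 15.5 GB processed 2:33:22
Replicating changes Hard disk 2 (40.2 MB) 2.0 MB processed 0:00:08
Replicating changes Hard disk 3 (300.0 GB) 31% at 1 MB/s 9:26:17+
"""

# h:mm:ss at the very end of a line, optionally followed by "+".
DURATION = re.compile(r"(\d+):(\d{2}):(\d{2})\+?\s*$")
PHASES = {
    "Calculating original signature": "signature",
    "Replicating RP": "first sync",
    "Replicating changes": "final sync",
}


def phase_totals(log: str) -> Dict[str, timedelta]:
    """Sum step durations per phase; unmatched step types count as 'other'."""
    totals: Dict[str, timedelta] = {}
    for line in log.splitlines():
        match = DURATION.search(line)
        if not match:
            continue
        hours, minutes, seconds = (int(g) for g in match.groups())
        phase = next((name for prefix, name in PHASES.items()
                      if line.startswith(prefix)), "other")
        totals[phase] = totals.get(phase, timedelta()) + timedelta(
            hours=hours, minutes=minutes, seconds=seconds)
    return totals


for phase, total in phase_totals(STEPS).items():
    print(f"{phase:12} {total}")
```

For these lines it reports 1:11:14 of signature calculation, 8:33:25 for the first sync, and at least 11:59:47 for the final sync, since Hard disk 3 was still running.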

Post by **wisecra** » Dec 06, 2014 11:11 pm

Both proxies are VMs; set to Auto mode, they've picked NBD (as you can see).

The connection is a VPN over a 10MB internet connection (actually 20MB at the primary site end and 10MB at the backup site end).
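A quick sanity check on what that link can carry, as a Python sketch. It assumes "10MB" means 10 megabits per second (the usual way internet links are rated) and uses decimal units; the `efficiency` factor for VPN overhead is a hypothetical parameter. At 10 Mbps the link moves about 1.25 MB/s, in line with the "1 MB/s" shown for the final sync in the log. The log's "processed" figures cannot all have crossed the wire (95.3 GB in 3:32:40 works out to over 7 MB/s), so they presumably count data read rather than data sent.

```python
def link_gb_per_hour(mbps: float) -> float:
    """Decimal GB a link can move per hour at full line rate (input in megabits/s)."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * 3600 / 1_000_000_000


def hours_to_transfer(gb: float, mbps: float, efficiency: float = 1.0) -> float:
    """Hours to push `gb` decimal gigabytes.

    `efficiency` < 1 models VPN/protocol overhead (a hypothetical knob).
    """
    return gb / (link_gb_per_hour(mbps) * efficiency)


# The slower end of the VPN bounds the whole path: min(20, 10) Mbps.
path_mbps = min(20, 10)
print(link_gb_per_hour(path_mbps))        # 4.5
print(hours_to_transfer(100, path_mbps))  # about 22.2 hours for 100 GB
```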

Post by **wisecra** » Dec 07, 2014 9:20 am

*To be clear: I don't think this is a technical issue with Veeam.*

Fact 1-
I've been trying to fail back for over a week, and it's just not working. It looks like our VPN/internet connection is solid enough for backup replication and failover, but the serial failback process (one disk at a time) with TWO replication passes is just too much for it. I've been tweaking for a week, and while I'm getting closer to success, I'm just out of time (see Fact 2):

Fact 2-
I will be scalped if I don't get these servers back up at the primary site.

Question -
For each live (failed-over) VM at the backup site, I should be able to:
- Remove all snapshots,
- Copy the resulting VMDKs to a USB drive,
- Transport the USB drive to the primary site,
- Attach the transported VMDKs to the primary VMs, and
- Crank up my servers.

Am I missing anything?
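Before the copy step, it may help to confirm that snapshot removal really finished and that each disk's data file travels with its descriptor. A Python sketch over a VM folder listing; the file names are hypothetical, and it relies on the usual ESXi naming (descriptor `name.vmdk`, data `name-flat.vmdk`, snapshot files `name-000001.vmdk` and `name-000001-delta.vmdk`). It only checks names; it does not copy anything.

```python
import re
from typing import Iterable, List

# Snapshot files on ESXi carry a six-digit sequence, e.g. app01-000001.vmdk
# and app01-000001-delta.vmdk; -sesparse and -ctk variants exist as well.
SNAPSHOT_VMDK = re.compile(r"-\d{6}(-delta|-sesparse|-ctk)?\.vmdk$")


def vmdk_files_to_copy(filenames: Iterable[str]) -> List[str]:
    """Return the base-disk .vmdk files to copy, refusing if snapshots remain.

    Each base disk is a small descriptor (app01.vmdk) plus its data extent
    (app01-flat.vmdk); -ctk.vmdk files (changed block tracking) are passed
    through as-is. A base disk whose own name ends in six digits would be
    misread as a snapshot; this sketch does not handle that case.
    """
    names = [n for n in filenames if n.endswith(".vmdk")]
    leftovers = [n for n in names if SNAPSHOT_VMDK.search(n)]
    if leftovers:
        raise ValueError(f"snapshot files still present: {leftovers}")
    present = set(names)
    for name in names:
        if name.endswith(("-flat.vmdk", "-ctk.vmdk")):
            continue  # not descriptors, nothing to pair
        extent = name[: -len(".vmdk")] + "-flat.vmdk"
        if extent not in present:
            raise ValueError(f"{name} has no data extent {extent}")
    return sorted(names)


listing = ["app01.vmx", "app01.vmdk", "app01-flat.vmdk", "app01-ctk.vmdk"]
print(vmdk_files_to_copy(listing))
```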

Post by **wisecra** » Dec 07, 2014 12:52 pm

I understand merging these threads, but can someone confirm that this will work before I try it?
Thx.

Post by **Gostev** » Dec 07, 2014 3:15 pm

Sure, I don't see any reason why this would not work.

Post by **wisecra** » Dec 07, 2014 7:05 pm

Thanks... Seems like it should (we'll see!).

Grumpy I may be, but I do trust your perspective and advice.
cpw...

Post by **wisecra** » Dec 07, 2014 8:25 pm

It does look like removing those snapshots is not a short process. I should have started it earlier instead of waiting for your (helpful) reply.
*sigh* At least you can smell paint while watching it dry.

Dec 07, 2014 8:59 pm

wisecra wrote:
Connection is VPN over 10MB internet connection. (actually 20MB at primary site end, 10MB at backup site end).

The processing speed reported in the provided session log is consistent with what your VPN connection allows. Even if failback could process multiple disks in parallel, your connection would still be the bottleneck and would not allow for faster failback operation.
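The bottleneck argument can be made concrete with a toy model in Python. The functions are hypothetical; sizes are amounts of data to send (decimal GB) and rates are in GB per hour. When the shared link is slower than a single stream, running disks in parallel only splits the same bandwidth, so the total stays sum(sizes) / link rate. Parallelism only helps if something caps each individual stream below the link rate, which per the post above is not the case here.

```python
from typing import List


def serial_hours(sizes_gb: List[float], link_gb_h: float, stream_gb_h: float) -> float:
    """Disks one after another; one stream runs at min(link, per-stream cap)."""
    rate = min(link_gb_h, stream_gb_h)
    return sum(sizes_gb) / rate


def parallel_hours(sizes_gb: List[float], link_gb_h: float, stream_gb_h: float) -> float:
    """All disks at once, sharing the link equally, each stream also capped."""
    remaining = [s for s in sizes_gb if s > 0]
    elapsed = 0.0
    while remaining:
        rate = min(stream_gb_h, link_gb_h / len(remaining))
        step = min(remaining) / rate  # time until the smallest disk finishes
        elapsed += step
        remaining = [r - rate * step for r in remaining]
        remaining = [r for r in remaining if r > 1e-9]  # drop finished disks
    return elapsed


sizes = [20, 0.04, 300, 100]  # hypothetical amounts, shaped like the VM above
print(serial_hours(sizes, 4.5, 100), parallel_hours(sizes, 4.5, 100))
```

With a link rate of 4.5 GB/h (10 Mbps, decimal units) and a per-stream cap above it, both functions return the same total; only a per-stream cap below the link rate, such as `stream_gb_h=0.5` with `link_gb_h=3`, makes the parallel version faster.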

Post by **Gostev** » Dec 07, 2014 10:41 pm

wisecra wrote:
It does look like removing those snapshots is not a short process. I should have started it earlier instead of waiting for your (helpful) reply.
*sigh* At least you can smell paint while watching it dry.

If the storage your replicas are running on is not very fast in terms of IOPS capacity, then it might be faster to shut down the VMs, back them up with VeeamZIP, transport the backup file, and perform a restore. Consider the sequential I/O of a full backup vs. the random I/O of a snapshot commit... and there are multiple snapshots on those replicas to commit (one for each restore point; the base VMDK is actually the oldest one).
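The snapshot chain described above can be pictured with a toy model in Python: the base VMDK holds the oldest restore point and each newer restore point is a delta layered on top. Reads go newest-to-oldest through the chain, and committing merges every delta into the base. The dict-per-disk layout, the function names, and the naive oldest-first merge are assumptions for illustration, not VMware's on-disk format or ESXi's actual consolidation algorithm.

```python
from typing import Dict, List

Disk = Dict[int, str]  # block number -> contents (toy model)


def read_block(base: Disk, deltas: List[Disk], block: int) -> str:
    """Read through the chain: newest delta first, falling back to the base VMDK."""
    for delta in reversed(deltas):
        if block in delta:
            return delta[block]
    return base[block]


def consolidate(base: Disk, deltas: List[Disk]) -> int:
    """Merge deltas into the base, oldest first; return the number of block writes."""
    writes = 0
    for delta in deltas:  # oldest restore point's delta first
        for block, data in delta.items():
            base[block] = data  # on real storage these land at scattered offsets
            writes += 1
    deltas.clear()
    return writes


base = {0: "a", 1: "b", 2: "c"}          # oldest restore point
deltas = [{1: "B"}, {2: "C", 1: "BB"}]   # one delta per newer restore point
print(read_block(base, deltas, 1))       # BB
print(consolidate(base, deltas), base)   # 3 {0: 'a', 1: 'BB', 2: 'C'}
```

Block 1 is written twice during the merge, once per delta that touched it; on real storage such writes are random I/O, which is what the sequential full-backup route avoids.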

Post by **wisecra** » Dec 07, 2014 11:59 pm

That sounds like a good idea, except I've already told VMware to remove all the snapshots... so I think I'm committed.

One of them finished, so I'm waiting on the other two.
