FEATURE REQUEST - Speed Up the Planned Failback Process

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by DrWhy » Mon Jan 23, 2017 4:36 pm

Got it, thanks for taking the time to provide an update. What is the ETA at this time?
DrWhy
Enthusiast
 
Posts: 38
Liked: 2 times
Joined: Tue May 12, 2015 7:05 pm
Full Name: Caleb

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by foggy » Mon Jan 23, 2017 5:35 pm (2 people like this post)

We typically do not share an ETA until we are 100% confident the feature is getting into a particular release.
foggy
Veeam Software
 
Posts: 16225
Liked: 1296 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by Gostev » Tue Feb 14, 2017 1:44 am

Implementation has been completed, and if all is well QC-wise, the feature will be included in 9.5 U2. Just don't ask when U2 is going to be released, because we don't have the timeline defined yet (there is no pressure to release one in terms of bugs). Sometime in the spring!
Gostev
Veeam Software
 
Posts: 22172
Liked: 2610 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by DrWhy » Tue Feb 14, 2017 4:47 pm

Best news I've heard all month! Can't wait.
DrWhy
Enthusiast
 
Posts: 38
Liked: 2 times
Joined: Tue May 12, 2015 7:05 pm
Full Name: Caleb

[MERGED] Failback too slow for temporary failover

by Erhard » Tue Mar 21, 2017 2:00 pm

Hi,

While failing over is quite straightforward and does not take very long (the original VM is synced while running, then powered down, a last-minute sync is done and the replica is powered on), failing back that very same machine takes at least three times as long as the failover, if not hours or days.

Now imagine I want to do maintenance on the primary Hyper-V host where some or all VMs are running. Well yes, let's perform a failover, install some gigabytes of MS updates and then fail back the VMs.

No way! I tried this with a test VM that has a dynamic hard drive with 117G of occupied space, which shows as 130G in Windows. The VHD file is 125G.

While failover took 8 minutes, failback took three quarters of an hour. Okay, I opened a support call and asked what was going wrong. The answer was that nothing was going wrong.

It seems like failback works/worked like this:

- take a snapshot of the running replica
- read the original machine to check for changes
- calculate the differences and throw the result in trash
- copy back the entire replica (130G) anyway
- then write another 130G, because the dynamic VHD contains unpartitioned space of another 130G (no idea where the non-existent data is written to, maybe to some single dummy sector?)
- and so on and so on

Now what I'd like to have is a fast and straightforward failback like this:

- the amount of time when neither the replica nor the original VM is available must be as short as possible, so please sync it online and, if possible, based on CBT
- there must be an option to choose whether failback powers down the replica and powers up the original automatically or supervised (I have no desire to stare at a progress bar for eight hours or so)
- if supervised failback is selected, the replica must be kept running until the "do it now" button is pushed
- when that button is pushed, the productive replica(s) are powered down, the remaining data is synced and the original VM(s) are powered on
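
The workflow requested in the list above can be sketched as a toy simulation. This is only an illustration of the idea, not how Veeam B&R implements failback: `Disk`, `sync_changed`, `supervised_failback` and the `dirty` set standing in for CBT data are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy disk: block index -> contents, plus a set of changed block indexes."""
    blocks: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # hypothetical stand-in for CBT data

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)


def sync_changed(source, target):
    """Copy only the blocks marked dirty on source; return how many were copied."""
    changed = sorted(source.dirty)
    for index in changed:
        target.blocks[index] = source.blocks[index]
    source.dirty.clear()
    return len(changed)


def supervised_failback(replica, original, workload, go_after_pass):
    """Sync online until the operator pushes the button, then do one offline sync.

    workload: one list of (index, data) writes per online pass, i.e. what the
    still-running replica writes while that pass is in progress.
    go_after_pass: the pass after which the "do it now" button is pushed.
    """
    log = []
    for pass_no, writes in enumerate(workload, start=1):
        log.append(("online", sync_changed(replica, original)))
        for index, data in writes:  # the replica keeps serving users
            replica.write(index, data)
        if pass_no >= go_after_pass:
            break
    # Replica is powered down here, so only the last small delta is copied
    # while neither VM is available.
    log.append(("offline", sync_changed(replica, original)))
    return log
```

In this model the downtime depends only on the size of the final offline delta, not on how long the replica was running.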

I do hope that somebody understands that there seems to be room for improvement in failback. And please, planned failover and planned failback are enterprise options. While I am not talking about the enterprise edition, I would bet several pizzas that the enterprise edition suffers from slow failback as well.

Best regards

Erhard
Erhard
Novice
 
Posts: 6
Liked: never
Joined: Mon Jun 06, 2016 5:57 am
Full Name: Erhard Pütz

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by foggy » Tue Mar 21, 2017 4:02 pm (1 person likes this post)

Hi Erhard, thanks for your feedback. Please take the chance to test it again once Veeam B&R v9.5 Update 2 is released in April.
foggy
Veeam Software
 
Posts: 16225
Liked: 1296 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by kenny782 » Mon Apr 24, 2017 4:07 pm

Just curious if you had any updates? :)
No pun intended lol

Thanks,

Kenny
kenny782
Novice
 
Posts: 5
Liked: never
Joined: Fri Feb 10, 2017 4:39 pm
Full Name: Kenneth M.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by DrWhy » Mon Apr 24, 2017 4:08 pm (1 person likes this post)

April isn't over yet :)
DrWhy
Enthusiast
 
Posts: 38
Liked: 2 times
Joined: Tue May 12, 2015 7:05 pm
Full Name: Caleb

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by jim3cantos » Wed Jun 07, 2017 8:05 am

Ok. After reading the whole thread while testing failback, maybe it's time to apply Update 2:

https://www.veeam.com/kb2283
Failback performance improvements. Failback can now optionally use changed block tracking data to determine the changes between the original VM and replica VM state. This dramatically accelerates the failback performance due to removing the need to read the entire original VM disks (“Calculating original signature” operation). For VMware hypervisor, we recommend that this option is not used if the failover event was triggered by a disaster that involved host or storage crash or dirty shutdown, as CBT data may be inconsistent in this case.

...be aware of this issue with update 2 before (or after) updating.
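
The difference described in the release-note excerpt above can be illustrated with a toy comparison. This is a simplified sketch under assumptions, not Veeam's code: the 4-byte block size and the names `diff_by_digest` and `diff_by_tracking` are made up for the example, and both disks are assumed to be the same size.

```python
import hashlib

BLOCK = 4  # toy block size in bytes (assumption for the example)


def split(disk: bytes):
    """Cut a disk image into fixed-size blocks."""
    return [disk[i:i + BLOCK] for i in range(0, len(disk), BLOCK)]


def diff_by_digest(original: bytes, replica: bytes):
    """Find changed blocks by reading and hashing every block of both disks."""
    orig_blocks, rep_blocks = split(original), split(replica)
    blocks_read = len(orig_blocks) + len(rep_blocks)
    changed = [
        i for i, (a, b) in enumerate(zip(orig_blocks, rep_blocks))
        if hashlib.sha256(a).digest() != hashlib.sha256(b).digest()
    ]
    return changed, blocks_read


def diff_by_tracking(tracked_changes):
    """Trust a recorded set of changed block indexes; only those get read."""
    return sorted(tracked_changes), len(tracked_changes)
```

The digest approach has to touch the whole disk no matter how little changed, while the tracking approach is only as good as the recorded set, which is why the note warns against it after a crash or dirty shutdown.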
jim3cantos
Enthusiast
 
Posts: 44
Liked: 6 times
Joined: Tue Jan 08, 2013 6:14 pm
Location: Madrid, Spain
Full Name: José Ignacio Martín Jiménez

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by DrWhy » Wed Jun 07, 2017 3:21 pm

Good info, thanks Jim! There is a hotfix that fixes the issue, which is good, but it must be obtained by contacting support.
DrWhy
Enthusiast
 
Posts: 38
Liked: 2 times
Joined: Tue May 12, 2015 7:05 pm
Full Name: Caleb

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by YouGotServered » Fri Feb 16, 2018 5:19 am

Hey guys, I'd like to chime in here, albeit a bit late. Hopefully someone sees this! Please excuse the long post, but I just want to be clear and detailed.

Environment and situation:
Tenant replicating to a cloud connect replication environment (hosted by me)
Test VM: 350GB provisioned, only about 50GB actually used
100 Mbit/s pipe between tenant and service provider

I replicated the test VM in a few hours with no issue. Did a partial failover, which worked spectacularly. It only took about 5 minutes from me starting the failover, to me being able to ping the VM in the DR site. To simulate some data change, I downloaded a 2.8GB ISO (and for fun, it was a Veeam ISO) and left it in my downloads folder. I let the VM run for a few hours, just sitting there, not doing anything.

I went to fail back, and I was pretty surprised by the result, and not in a good way. I did a quick rollback, utilizing CBT and restoring the VM to the original location. However, for 2.8GB worth of changes, my VM was still down for nearly an hour! The job log says "Replicating restore point for Hard disk 1 (350.0 GB) 2.6 GB processed", and that part only took 3 minutes, 57 seconds. That 2.6GB lines up closely with the size of the ISO I downloaded.

What I don't understand is that the next phase, "Replicating changes Hard disk 1 (350.0 GB) 37.6 GB processed", took 43 minutes, all while the VM was powered off. I noticed that it was running at an average speed of 15 - 20 MB/s.

Link to a picture of the failback log: https://ibb.co/hyAsy7

My questions are:
1. What in the world is it doing while "Replicating changes Hard disk 1"? I thought a large portion of the changed data was copied during the "Replicating restore point for Hard disk 1" phase. If that isn't the case, what does that phase do then?
2. Why did it take so long for a VM that presumably only had 2.8GB of changed data? I understand logs and whatnot make changes and take space, but even if we double it, that should have only taken roughly 5 minutes at an average speed of 18 MB/s.
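
The arithmetic behind question 2 can be checked with a few lines of Python. The figures are the ones quoted in this post; treating 1 GB as 1024 MB is an assumption, and the helper names are made up for the example.

```python
def transfer_minutes(gigabytes, mb_per_second):
    """Minutes needed to move the given amount of data at a steady rate."""
    return gigabytes * 1024 / mb_per_second / 60


def average_mb_per_second(gigabytes, minutes):
    """Average throughput implied by an amount of data and a duration."""
    return gigabytes * 1024 / (minutes * 60)


# Expected: the 2.8 GB ISO, doubled to allow for logs and other changes.
expected_minutes = transfer_minutes(2 * 2.8, 18)

# Observed: 37.6 GB processed in 43 minutes during "Replicating changes".
observed_rate = average_mb_per_second(37.6, 43)

print(f"expected: {expected_minutes:.1f} min, observed rate: {observed_rate:.1f} MB/s")
```

The expected time comes out at roughly 5 minutes, and the observed 37.6 GB in 43 minutes works out to about 15 MB/s, which matches the speed seen in the log. So the throughput is as expected; what is unexplained is the amount of data processed.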

I guess I'm just confused as to what is happening, and why we can't leverage the awesome replication features Veeam has built in to do what someone here has suggested: run a few reverse replications, power off, do one more quick replication (5-10 minutes tops), then power on.

Failing over is nice and simple, but quite frankly, I'm terrified to use it because of the implications of failing back. I don't want to have to tell my client "We can fail you over, but honestly I have no idea how long you'll be down while we fail you back, and I have no idea when Veeam will actually decide to take you down to finish the failback". If a 50GB VM with roughly 3GB of changed data took that long, what if we have to fail over a client's Exchange server for an extended period of time? I have no idea how long it would take for a 3TB Exchange server that potentially has a week's worth of changed data on it, and I don't want to find out.

The only solution I can see around this is to build a VPN tunnel between our DR site and our customer's network, then use another Veeam server to replicate the changes from our DR site to the customer site, turn off the replicas during a maintenance window, run another replication to get the changed data, then turn the customer's servers on in the original production environment. It seems like this would give me the flexibility I need to determine exactly when they go down, while also giving me the least amount of downtime. However, I absolutely know my networking guys are going to say "Why didn't we do that in the first place, and why don't we just do the normal replications from the tenant that way as well?" And honestly, I don't have an answer for that, because it seems to make more sense to do it that way than to deal with the uncertainty and mystery around Veeam's built-in failback process that it seems we have to utilize for Cloud Connect.

Now, if I've missed something, and there's maybe some slick feature I'm not aware of, or that what I'm experiencing is out of the ordinary after update 2 which gives us the option to use CBT to skip calculating disk digests (we're on U3 by the way), please, let me know. I would love to know about it. Actually, I'm really begging to know about it at this point :) Make me look dumb, I don't care, I just want to know.

If this is normal behavior, then please take this as my feature request to continue development here and use the same mechanism you already have, which works wonderfully to replicate hot data over, for the replication back as well. If anyone has any better suggestions than my VPN tunnel idea in a cloud connect environment, please chime in.
YouGotServered
Service Provider
 
Posts: 30
Liked: 6 times
Joined: Fri Mar 11, 2016 7:41 pm
Full Name: Cory Wallace

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by YouGotServered » Tue Feb 20, 2018 3:10 am

Does anyone else have any thoughts here or am I alone on this? Hoping someone has an idea or observation better than mine :)
YouGotServered
Service Provider
 
Posts: 30
Liked: 6 times
Joined: Fri Mar 11, 2016 7:41 pm
Full Name: Cory Wallace

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by foggy » Tue Feb 20, 2018 5:17 pm

Hi Cory, quick rollback (failback using CBT) is performed in two steps:

1. First, Veeam B&R needs to align the state of the original source VM with the state stored in the restore point it was failed over to. This phase is reported as "Replicating restore point for Hard disk 1" and took ~3 minutes in your case. Basically, all the changes that occurred in the original VM after the restore point was created are rolled back during this phase.
2. Then it needs to sync the changes that occurred inside the replica VM while it was running after the failover event back to the original VM. This step is reported as "Replicating changes Hard disk 1" and took the most time during the entire failback operation.

So the amount of changes that occurred while the replica was running is 37.6GB, including those 2.8GB from the downloaded ISO.
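
As a rough illustration of the two steps described above, here is a toy model in which each disk is a dictionary of blocks. It is only a sketch of the described logic, not Veeam's implementation, and `quick_rollback` and its parameters are hypothetical names.

```python
def quick_rollback(original, restore_point, replica,
                   changed_on_original, changed_on_replica):
    """Bring the original disk to the replica's current state in two phases.

    Disks are dicts mapping block index -> contents. The two changed_* sets
    play the role of changed block tracking data for each side.
    Returns the number of blocks processed in each phase.
    """
    # Phase 1 ("Replicating restore point"): undo whatever the original VM
    # wrote after the restore point that the replica was started from.
    for index in changed_on_original:
        original[index] = restore_point[index]

    # Phase 2 ("Replicating changes"): apply everything the replica wrote
    # while it was running after the failover.
    for index in changed_on_replica:
        original[index] = replica[index]

    return len(changed_on_original), len(changed_on_replica)
```

In this model the size of phase 2 is driven entirely by the set of blocks flagged as changed on the replica, so a block counts in full even if only a small part of it was rewritten.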
foggy
Veeam Software
 
Posts: 16225
Liked: 1296 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by YouGotServered » Tue Feb 20, 2018 8:48 pm

Foggy, thanks for the reply.

I'm afraid to say, I don't think that's the case. It just doesn't seem realistic to me that we can have 37.6GB of changes on a test VM that doesn't really do anything. I'm testing with support right now, and we failed over, then almost immediately failed back using a quick rollback. The virtual machine was up for no longer than 3 minutes, and yet it had to process about 21GB of data, which meant about 15 minutes of downtime. This server isn't a file server, SQL server, Exchange, or anything user facing. I just don't think it is realistic that it really has 20+ GB of changed data in the span of a few short minutes. My case number is 02617103 if you are interested in taking a look.
YouGotServered
Service Provider
 
Posts: 30
Liked: 6 times
Joined: Fri Mar 11, 2016 7:41 pm
Full Name: Cory Wallace

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

by foggy » Wed Feb 21, 2018 2:02 pm

Ok, let's see what they can come up with after reviewing the log files.
foggy
Veeam Software
 
Posts: 16225
Liked: 1296 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
