DrWhy
Enthusiast
Posts: 38
Liked: 2 times
Joined: May 12, 2015 7:05 pm
Full Name: Caleb
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by DrWhy »

Got it, thanks for taking the time to provide an update. What is the ETA at this time?
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy » 2 people like this post

We typically do not share an ETA until we are 100% confident the feature is getting into a particular release.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by Gostev »

Implementation has been completed, and if all is well QC-wise, the feature will be included in 9.5 U2. Just don't ask when U2 is going to be released, because we don't have the timeline defined yet (no pressure to release one in terms of bugs). Sometime in the spring!

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by DrWhy »

Best news I've heard all month! Can't wait.
Erhard
Novice
Posts: 7
Liked: never
Joined: Jun 06, 2016 5:57 am
Full Name: Erhard Pütz
Contact:

[MERGED] Failback too slow for temporary failover

Post by Erhard »

Hi,

while failing over is quite straightforward and does not take very long (the original VM is synced while running, then powered down, a last-minute sync runs and the replica is powered on), failing back that very same machine takes at least three times as long as the failover, if not hours or days.

Now imagine I want to maintain the primary Hyper-V host where some or all VMs are running. Well yes, let's perform a failover, install some gigabytes of MS updates and then fail back the VMs.

No way! I tried this with a test VM that has a dynamic hard drive with 117G of occupied space, shown as 130G in Windows. The VHD file is 125G.

While failover took 8 minutes, failback took three quarters of an hour. Okay, I opened a support call and asked what was going wrong. The answer was that nothing was going wrong.

It seems like failback works/worked like this:

- take a snapshot of the running replica
- read the original machine to check for changes
- calculate the differences and throw the result in trash
- copy back the entire replica (130G) anyway
- then write another 130G, because the dynamic VHD contains unpartitioned space of another 130G (no idea where the non-existent data is written to, maybe to some single dummy sector?)
- and so on and so on

Now what I'd like to have is a fast and straightforward failback like this:

- the amount of time when neither replica nor original VM are available must be as short as possible, so please sync it online and if possible based on CBT
- there must be an option whether failback powers down the replica and powers up the original automatically or supervised (I have no desire to stare at a progress bar for eight hours or so)
- if supervised failback is selected, the replica must be kept running until the "do it now" button is pushed
- when that button is pushed, the production replica(s) are powered down, the remaining data is synced and the original VM(s) are powered on
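
The requested flow can be sketched as a toy simulation. This is only an illustration of why repeated online sync passes shrink the final offline window; it is not how Veeam B&R implements failback. The function name, the change rate and the link throughput are all hypothetical, and only the 130G starting delta is taken from the test above.

```python
def simulate_supervised_failback(initial_delta_gb, change_rate_gb_per_min,
                                 link_gb_per_min, online_passes):
    """Toy model: return (online_minutes, downtime_minutes).

    Hypothetical helper, not a Veeam API. Each online pass copies the
    pending delta while the replica keeps running and producing changes.
    """
    if link_gb_per_min <= change_rate_gb_per_min:
        raise ValueError("link must be faster than the replica's change rate")

    pending_gb = initial_delta_gb
    online_minutes = 0.0
    for _ in range(online_passes):
        pass_minutes = pending_gb / link_gb_per_min
        online_minutes += pass_minutes
        # Changes written by the still-running replica during this pass
        # become the delta for the next pass.
        pending_gb = pass_minutes * change_rate_gb_per_min

    # "Do it now" button: replica is powered off, last delta is copied,
    # original VM is powered on. Only this part is downtime.
    downtime_minutes = pending_gb / link_gb_per_min
    return online_minutes, downtime_minutes


# Assumed numbers: 130 GB to copy, 0.01 GB/min of new changes, 1 GB/min link.
no_presync = simulate_supervised_failback(130, 0.01, 1.0, online_passes=0)
two_passes = simulate_supervised_failback(130, 0.01, 1.0, online_passes=2)
print("downtime without online sync:", no_presync[1], "min")
print("downtime after two online passes:", two_passes[1], "min")
```

With these assumed numbers, all of the copy time is downtime when there is no online pass, while two online passes leave well under a minute of offline copying.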

I do hope that somebody understands that there seems to be room for improvement in failback. And please, planned failover and planned failback are enterprise options. While I am not talking about the enterprise edition, I would bet several pizzas that the enterprise edition suffers from slow failback as well.

Best regards

Erhard

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy » 1 person likes this post

Hi Erhard, thanks for your feedback. Please take the chance to test it again once Veeam B&R v9.5 Update 2 is released in April.
kenny782
Novice
Posts: 5
Liked: never
Joined: Feb 10, 2017 4:39 pm
Full Name: Kenneth M.
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by kenny782 »

Just curious if you had any updates? :)
No pun intended lol

Thanks,

Kenny

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by DrWhy » 1 person likes this post

April isn't over yet :)
jim3cantos
Enthusiast
Posts: 59
Liked: 12 times
Joined: Jan 08, 2013 6:14 pm
Full Name: José Ignacio Martín Jiménez
Location: Madrid, Spain
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by jim3cantos »

Ok. After reading the whole thread while testing failback, maybe it's time to apply Update 2:

https://www.veeam.com/kb2283
Failback performance improvements. Failback can now optionally use changed block tracking data to determine the changes between the original VM and replica VM state. This dramatically accelerates the failback performance due to removing the need to read the entire original VM disks (“Calculating original signature” operation). For VMware hypervisor, we recommend that this option is not used if the failover event was triggered by a disaster that involved host or storage crash or dirty shutdown, as CBT data may be inconsistent in this case.
...be aware of this issue with update 2 before (or after) updating.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by DrWhy »

Good info! Thanks, Jim. There is a hotfix that fixes the issue, which is good, but it must be obtained by contacting support.
YouGotServered
Service Provider
Posts: 170
Liked: 51 times
Joined: Mar 11, 2016 7:41 pm
Full Name: Cory Wallace
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Hey guys, I'd like to chime in here, albeit a bit late. Hopefully someone sees this! Please excuse the long post, but I just want to be clear and detailed.

Environment and situation:
Tenant replicating to a cloud connect replication environment (hosted by me)
Test VM: 350GB provisioned, only about 50GB actually used
100 Mbit/s pipe between tenant and service provider

I replicated the test VM in a few hours with no issue. Did a partial failover, which worked spectacularly. It only took about 5 minutes from me starting the failover, to me being able to ping the VM in the DR site. To simulate some data change, I downloaded a 2.8GB ISO (and for fun, it was a Veeam ISO) and left it in my downloads folder. I let the VM run for a few hours, just sitting there, not doing anything.

I go to failback, and I was pretty surprised by the result, not in a good way. I did a quick rollback, utilizing CBT and restoring the VM to the original location, however, for 2.8GB worth of changes, my VM was still down for nearly an hour! What I don't understand is that the job log says "Replicating restore point for Hard disk 1 (350.0 GB) 2.6 GB processed", and that part only took 3 minutes, 57 seconds. That 2.6GB lines up closely with the size of the ISO I downloaded.

What I don't understand, is that the next phase, "Replicating changes Hard disk 1 (350.0 GB) 37.6 GB processed" took 43 minutes, all while the VM was powered off. I noticed that it was going with an average speed of 15 - 20 MB/s

Link to a picture of the failback log: https://ibb.co/hyAsy7

My questions are:
1. What in the world is it doing while "Replicating changes Hard disk 1"? I thought a large portion of the changed data was copied during the "Replicating restore point for Hard disk 1" phase. If that isn't the case, what does that phase do then?
2. Why did it take so long for a VM that presumably only had 2.8GB of changed data? I understand logs and what not make changes and take space, but even if we double it, that should have only taken roughly 5 minutes at an average speed of 18 MB/s.
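
For anyone checking the numbers, the arithmetic can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and it assumes 1 GB = 1024 MB and a constant transfer rate.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def transfer_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb at a constant rate_mb_per_s."""
    return size_gb * MB_PER_GB / rate_mb_per_s / 60


def observed_rate_mb_per_s(size_gb, minutes):
    """Average rate implied by moving size_gb in the given minutes."""
    return size_gb * MB_PER_GB / (minutes * 60)


# Expectation: double the 2.8 GB ISO at an average of 18 MB/s.
expected_minutes = transfer_minutes(2 * 2.8, 18)

# Job log: 37.6 GB processed in 43 minutes.
logged_rate = observed_rate_mb_per_s(37.6, 43)

print(f"expected downtime: {expected_minutes:.1f} min")
print(f"rate implied by the job log: {logged_rate:.1f} MB/s")
```

The expectation works out to a little over 5 minutes, and the log implies an average of about 15 MB/s, which matches the speed seen in the job. So the throughput is as expected; the open question is why 37.6 GB was processed instead of roughly 3 GB.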

I guess I'm just confused here as to what is happening and why we can't leverage the awesome replication features Veeam has built in to essentially do a few reverse replications as someone here has stated, power off, then do one more quick replication (5-10 minutes tops), then power on.

Failing over is nice and simple, but quite frankly, I'm terrified to use it because of the implications of failing back. I don't want to have to tell my client "We can fail you over, but honestly I have no idea how long you'll be down while we fail you back, and I have no idea when Veeam will actually decide to take you down to finish the failover". If a 50GB VM with roughly 3GB of changed data took that long, what if we have to fail over a client's Exchange server for an extended period of time? I have no idea how long it would take for a 3 TB Exchange server that potentially has a week's worth of changed data on it, and I don't want to find out.

The only solution I can see around this is to build a VPN tunnel between our DR site and our customer's network, then use another Veeam server to replicate the changes from our DR site to the customer site, turn off the replicas during a maintenance window, run another replication to get the changed data, then turn the customer's servers on in the original production environment. It seems like this would give me the flexibility I need to determine exactly when they go down, while also giving me the least amount of downtime. However, I absolutely know my networking guys are going to say "Why didn't we do that in the first place, and why don't we just do the normal replications from the tenant that way as well?" And honestly, I don't have an answer for that, because it seems to make more sense to do it that way than to deal with the uncertainty and mystery around Veeam's built-in failback process that it seems like we have to utilize for Cloud Connect.

Now, if I've missed something, and there's maybe some slick feature I'm not aware of, or that what I'm experiencing is out of the ordinary after update 2 which gives us the option to use CBT to skip calculating disk digests (we're on U3 by the way), please, let me know. I would love to know about it. Actually, I'm really begging to know about it at this point :) Make me look dumb, I don't care, I just want to know.

If this is normal behavior, then please take this as my feature request to continue development here and use the same mechanism you already have that works wonderfully to replicate hot data over, but for the replication back. If anyone has any better suggestions than my VPN tunnel idea in a cloud connect environment, please chime in.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Does anyone else have any thoughts here or am I alone on this? Hoping someone has an idea or observation better than mine :)

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy »

Hi Cory, quick rollback (failback using CBT) is performed in two steps:

1. First, Veeam B&R needs to align the state of the original source VM with the state stored in the restore point it was failed over to. This phase is reported as "Replicating restore point for Hard disk 1" and took ~3 minutes in your case. Basically, all the changes that occurred in the original VM after the restore point was created are rolled back during this phase.
2. Then it needs to sync the changes that occurred inside the replica VM while it was running after the failover event back to the original VM. This step is reported as "Replicating changes Hard disk 1" and took the most time during the entire failback operation.

So the amount of changes that occurred while the replica was running is 37.6GB, including those 2.8GB from the downloaded ISO.
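
The two steps can be illustrated with a toy model. This is a sketch under stated assumptions, not Veeam's implementation: disks are modelled as dictionaries of block contents, the two CBT sets are given as inputs, and every name here is hypothetical.

```python
def quick_rollback(original, restore_point, replica, cbt_original, cbt_replica):
    """Toy two-phase failback; returns (resulting disk, phase 1 blocks, phase 2 blocks).

    cbt_original: blocks changed on the original VM after the restore point.
    cbt_replica:  blocks changed on the replica after the failover.
    """
    disk = dict(original)

    # Phase 1 ("Replicating restore point"): roll the original VM back to
    # the restore point the replica was failed over to.
    for block in cbt_original:
        disk[block] = restore_point[block]

    # Phase 2 ("Replicating changes"): apply what changed on the replica
    # while it was running after the failover.
    for block in cbt_replica:
        disk[block] = replica[block]

    return disk, len(cbt_original), len(cbt_replica)


restore_point = {0: "a", 1: "b", 2: "c", 3: "d"}
original = {**restore_point, 1: "B"}          # changed after the restore point
replica = {**restore_point, 2: "X", 3: "Y"}   # changed while failed over

disk, phase1_blocks, phase2_blocks = quick_rollback(
    original, restore_point, replica, cbt_original={1}, cbt_replica={2, 3})
print(disk == replica, phase1_blocks, phase2_blocks)
```

In this model the amount of data moved in phase 2 depends only on how many blocks are reported as changed on the replica, not on how much useful data was written inside the guest.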

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Foggy, thanks for the reply.

I'm afraid to say, I don't think that's the case. It just doesn't seem realistic to me that we can have 37.6GB of changes on a test VM that doesn't really do anything. I'm testing with support right now, and we failed over, then almost immediately failed back using a quick rollback. The virtual machine was up no longer than 3 minutes, and yet it had to process about 21GB of data and took about 15 minutes of downtime. This server isn't a file server, SQL server, Exchange, or anything user facing. I just don't think it is realistic that it really has 20+ GB of changed data in the span of a few short minutes. My case number is 02617103 if you are interested in taking a look.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy »

Ok, let's see what they can come up with after reviewing the log files.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

foggy wrote: Ok, let's see what they can come up with after reviewing the log files.
Just an update here. Support seems to be relatively stumped at the performance. They thought that maybe doing a planned failover would result in a quicker failback, however it was just as slow. I believe the case is being escalated.
ChrisGundry
Veteran
Posts: 258
Liked: 40 times
Joined: Aug 26, 2015 2:56 pm
Full Name: Chris Gundry
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by ChrisGundry »

Cory, do you have any progress on your issue? I am seeing a similar issue in a test we are running. We don't have a support case open at the moment, but wondered what progress if any had been made between 02/03/2019 and now?
Thanks!
Layla-shmayla
Lurker
Posts: 1
Liked: never
Joined: Nov 22, 2016 10:04 pm
Full Name: Layla D
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by Layla-shmayla »

Same thing here as well. In testing our DR setup, Failback took 4 hours for a 250 GB disk. Failed over to DR for 10 minutes, saved a text file to desktop, initiated Failback. Dubs tee eff???

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy »

I've checked Cory's case and unfortunately there was no resolution to the issue, so I recommend that both of you open your own cases for a closer look by our engineers.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Hey guys,
Correct, unfortunately there was never a resolution.

I love Veeam and all of the fantastic support and capabilities that they have. While Veeam continues to be our go-to for backups, we ultimately had to go with another provider for replication, largely due to this issue.

I spent countless hours (on and off business hours) testing and providing logs and information to Veeam (on the case I have already noted, some other similar cases, and personal research and testing time), but I could not come to anything resembling an acceptable failback process and timetable.

I had the opportunity at VeeamOn 2017 to discuss this briefly with Gostev, and it seems like my concerns were noted, but I'm not sure if there's anything happening with this on Veeam's end.

Again, I really hope that there's something I'm missing to make this process a success, but support couldn't really give me an answer.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by ChrisGundry »

Thanks for the update Cory, even though it is not a positive one :(

I also love Veeam and have been a long standing customer and advocate of Veeam. But recently I have been feeling a lot of my technical/usability issues are not being addressed and Veeam don't seem to care that things like this are not working the way customers want/need them to. Unfortunately at the moment I don't have time to log a case for this replication issue (busy fighting a couple of other Veeam issues!) but I will try and get it logged ASAP.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Please keep us updated :) If there's a good solution, I am happy to re-evaluate my VM replication utility!

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

I'd like to know if anyone is still experiencing this or if this has been resolved. Hoping that something has been done since this issue was brought to light a while ago.

Thanks!

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

@Foggy, did anything ever come of this?
veremin
Product Manager
Posts: 20270
Liked: 2252 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by veremin »

There are certain plans on how to make failback work faster and more predictably, but it's too early to share any details. Thanks!

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Was it ever determined that there was an actual issue of CBT appearing to not be used, or Veeam attempting to read MUCH more of the VM data than it should? I'm wanting to know if Veeam ever found a root cause.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy »

Do you mean your particular support case? Are you still experiencing the same behavior? I can see that your case was closed by you without being tracked down to the resolution. While I totally understand your frustration from not getting the resolution for a long time, the fact that there are only a couple of people encountering this behavior makes us think the issue is environment-specific and hence we cannot investigate it without your assistance.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by YouGotServered »

Foggy,
Support was clueless and the case was stalled. After a good amount of time (I can't remember how long exactly), I had to move to a different replication solution due to this issue. I could recreate the issue in a couple of environments as well.

My frustration has subsided long ago, I'm just wondering if there was ever anything found by Veeam support and it seems like there wasn't :(

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by ChrisGundry »

Last time I checked we were still seeing this behaviour as well. Unfortunately my experience with support on issues like this is also that support are 'clueless', as YouGotServered says. This means I don't usually get to logging them, because they go nowhere and just lead to me getting frustrated with support when I don't have time to go round 'the support loop'.

For most things we have moved to an application level solution like Exchange DAGs, DFSR, SQL Always On etc. Whilst more costly, they work more reliably, give more functionality and don't require Veeam.

Re: FEATURE REQUEST - Speed Up the Planned Failback Process

Post by foggy »

Cory's case ended up on Tier 1 which means it never got escalated for deeper investigation. Our recommendation is to always ask for an escalation if you feel that the investigation drags out or leads nowhere.