DR orchestration - fail over and fail back possibilities

Post by **Baoth** » Mar 06, 2019 11:04 am

Hello all

New to the Forum, and to Veeam AO, so please excuse the newbie questions.

So I am working for a client that is interested in using Veeam for Backup and Replication, which will be used in order to recover in a DR situation. I understand that they would like to replace VMware SRM with the Veeam solution, which looks to be using Availability Orchestrator. In regards to Veeam Backup, this is required as the current solution is not suited to the Enterprise.

Being a complete newcomer to this, I can only compare it to SRM, so apologies in advance.

Probably best to give a slight overview of the issue. SRM works a treat for my client, generally speaking. I failed over from PROD to DR without any problem. However, as part of its reprotection steps, the reverse replication requires that a delta checksum be performed on the underlying VMDK, as the "new" protected VM now exists in DR with its failover counterpart in PROD. The problem arises when the VM in question is 4 TB: this checksum activity takes around 6 or so hours. No data is being copied across the WAN link; it's purely ensuring the VMDK is valid.

What I would like to find out is how Veeam fares in that situation. If fail over is managed by VAO, and we were to perform a planned fail over for DR testing purposes, and then a fail back, what happens from a replication point of view please?

Is it supported to perform a planned migration from PROD to DR, and subsequently back again?

Can you create a recovery plan within VAO that lets you specify the order in which VMs are powered on?

Any help or advice, especially real world experience, would be greatly appreciated.

Thanks all!

Post by **Alec King** » Mar 06, 2019 1:34 pm

Hello Baoth and welcome to the Veeam forums!

To answer your main question, the end-to-end process for replica protection in Veeam is -

Failover - will start, power on, and check the already-existing replicas on the DR site
Failback - will transfer the replica states on DR site back to Production site, and again start, power on and check the source VMs

(there are other options, such as Undo failover, Permanent failover, Commit failback etc which add more flexibility. However for your questions let's consider just the main 2 above)

During failback we will compare the disk contents of the replica with the disk contents of original source VM. This is so we can transfer the minimum data across the network to bring the source VM up-to-date with current state.
Therefore, to your first question, there is still processing required to reconcile disk states between replica and source. However, we do take advantage of various Veeam features, such as changed block tracking, replica seeding and so on, during this failover-failback process.
I can't guarantee that the Veeam failback process will be faster, as there are many variables - I really recommend you try this in a POC!

Planned Migration is a feature we are planning for a future VAO update. However it can be performed manually at the moment -

Run the required Veeam replication job (in Veeam Backup console)
Power down the source VMs
Run the replication job again
Run the failover plan in VAO

This ensures that app-aware VSS data is copied (first replication), then consistent state of powered off VM (second replication). Then the VAO plan will automate failover and the checks to recover the VMs on the DR site.
We do plan to automate this completely in a future update, however the somewhat-manual process above works very well.
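
For anyone who wants to script the manual sequence above, here is a minimal sketch of the ordering only. The three callables are hypothetical placeholders (they are not Veeam or VAO API calls); you would wire them to whatever tooling you actually use to start the replication job, power off the source VMs and launch the failover plan.

```python
from typing import Callable, List


def planned_migration(
    run_replication_job: Callable[[], None],   # hypothetical: starts the replication job and waits for it
    power_off_source_vms: Callable[[], None],  # hypothetical: shuts down the source VMs
    run_failover_plan: Callable[[], None],     # hypothetical: launches the failover plan
) -> List[str]:
    """Run the four manual steps in order and return a log of the steps that ran."""
    steps = [
        ("replicate while VMs are running (app-aware data)", run_replication_job),
        ("power off source VMs", power_off_source_vms),
        ("replicate again (consistent powered-off state)", run_replication_job),
        ("run failover plan", run_failover_plan),
    ]
    log: List[str] = []
    for name, action in steps:
        action()  # an exception here stops the sequence before any later step runs
        log.append(name)
    return log
```

The point of the sketch is simply that the second replication must happen after the power-off and before the failover plan, so a failure at any step should stop the whole sequence.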

Regarding the capability to choose the order of VM recovery - VAO is very flexible in this!
You can group VMs, and within the group, execute in sequence or simultaneously.
You can choose the maximum number of simultaneously-processed VMs.
Your plan can have multiple VM groups.
Also every VM can get specific checks as it is recovered to the replica -

Heartbeat check
Network check

..and multiple application checks included out-of-box in VAO, for

SQL
Exchange
IIS
SharePoint
and more!

Plus the ability to run your own custom scripts.

All the above activity is visible in the VAO UI, and captured in the Reports.
There are comprehensive reports, automatically updated, including Readiness Check (sanity check of plan config, RPOs etc), Plan Testing (in an isolated Veeam DataLab), and of course a full report of any Plan Execution (failover, failback, etc)
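
To illustrate the grouping and ordering described above, here is a rough sketch of the scheduling idea, not of how VAO implements it. `VMGroup`, `run_plan`, `recover` and `max_parallel` are all names made up for this example: groups run one after another, and within a group the VMs run either strictly in sequence or simultaneously up to a limit.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VMGroup:
    name: str
    vms: List[str]
    parallel: bool = False  # False: recover one VM at a time, in list order


def run_plan(groups: List[VMGroup], recover: Callable[[str], None], max_parallel: int) -> None:
    """Process groups in order; a group must finish before the next one starts."""
    for group in groups:
        workers = max_parallel if group.parallel else 1
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every VM in the group and re-raises any failure
            list(pool.map(recover, group.vms))
```

For example, a database group with `parallel=False` followed by a web-server group with `parallel=True` would bring the databases up one by one before starting the web servers together.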

Hopefully this has confirmed your interest in VAO! If you've further questions, or would like a demo, or want to start that POC, please let us know.
Thanks!

Post by **Baoth** » Mar 06, 2019 2:07 pm

Hi Alec

Thanks for the quick reply!

Would you be able to tell me any more regarding the disk comparison please, as this is what took the longest in my current environment. For example, I've calculated that the checksum activity runs at around 1.2 TB per hour. The actual activity of transferring data from site A to B is pretty quick as we have a good pipe in place.

How are the VMDKs replicated with Veeam? Currently, vSphere Replication uses the host network for replication.

I will take your advice and try and get this tested too.

Let's say, for example, that I had a group of VMs and configured the sequence in which I wanted them powered on at the DR site. DR was then initiated, and the VMs came up in DR. For whatever reason, we were able to move back to the PROD site in a short space of time, and that process was started. Would you have to wait for the disk comparisons to finish before you could move back to PROD, or could the move back start before they complete?

Hope that makes sense, and thanks again!

Baoth

Post by **Alec King** » Mar 06, 2019 3:37 pm

Hi Baoth,

Veeam replicas have a lot of flexibility for data transfer - they can use network-only transfer, and/or hot-add of disks to Proxy servers.
Veeam can use any network connected to the Veeam Backup/Proxy/Orchestrator servers, there is no requirement to use the host network.

The process for failback, including the disk digest calculation and data transfer is described in detail in our Veeam Backup & Replication documentation (replica failback section) - you can also see how this looks when Veeam Orchestrator is managing this process.

Here is a quick summary of the failback processing.

Note that for the first 3 processes, the replica VM is still powered on and available, so although time is required there is no downtime -

Veeam Backup & Replication creates a working failback snapshot on the original VM.
Veeam Backup & Replication calculates the difference between disks of the original VM and disks of the VM replica in the Failover state.
Veeam Backup & Replication transports changed data to the original VM.

The remaining processes involve some downtime, but the digest/transfer operations will be much faster, because most of this work was already done above.

Veeam Backup & Replication powers off the VM replica.
Veeam Backup & Replication creates a failback protective snapshot for the VM replica.
Veeam Backup & Replication calculates the difference between the VM replica and the original VM once again and transports changed data to the original VM.
Veeam Backup & Replication removes the working failback snapshot on the original VM.
Veeam Backup & Replication powers on the original VM on the target host.

Again, this whole process is optimised, leveraging Changed Block Tracking (CBT), automatically skipping blocks for deleted files, minimising the downtime for VMs, and so on.
And with the VAO flexibility of VM groups and simultaneous processing, this work can be done for multiple VMs in parallel.
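
If it helps to see the two phases side by side, the steps summarised above can be written down as a small table in code. This is only a restatement of the list in this post, with a flag I have added for whether the replica is still running; the names `Step`, `FAILBACK_STEPS` and `downtime_steps` are made up for the example.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    description: str
    replica_online: bool  # True: the replica VM is still powered on and serving


# Paraphrased from the failback summary above, in order.
FAILBACK_STEPS: List[Step] = [
    Step("create working failback snapshot on original VM", True),
    Step("calculate disk differences between original VM and replica", True),
    Step("transport changed data to original VM", True),
    Step("power off VM replica", False),
    Step("create failback protective snapshot for replica", False),
    Step("recalculate differences and transport remaining changes", False),
    Step("remove working failback snapshot on original VM", False),
    Step("power on original VM", False),
]


def downtime_steps(steps: List[Step]) -> List[str]:
    """Return the steps that run while the replica is no longer available."""
    return [s.description for s in steps if not s.replica_online]
```

Calling `downtime_steps(FAILBACK_STEPS)` lists the last five steps, which is the window where the workload is unavailable; the first, long digest pass is not part of it.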

Hope that helps!
Alec

Post by **Baoth** » Mar 07, 2019 9:45 am

Hi Alec

Thanks again. This has helped a lot! I have a lot to get through, and might sign up to one of the demos too.

But I am struggling to see why Availability Orchestrator would be a benefit. It looks like you can create a Failover Plan with Replication alone, and although there is a limit on the number of VMs that can be powered up at the same time (10 according to the documentation), you can add more than that to the Plan. It seems that the Plan will continue to power VMs on as and when one earlier in the plan has finished powering up.

To be honest, I can't envisage our cluster powering up, say, 200 VMs at the same time anyway. The existing solution runs through a plan too, and holds off the powering up of VMs dependent on available resources within the cluster.

Can you help me understand what the point of Orchestrator is please?

Cheers

Paul
(Baoth)

Post by **Alec King** » Mar 07, 2019 10:59 am

Hi Paul,

So far we have focused on the failover + failback mechanism. Veeam Orchestrator drives Veeam Backup & Replication to achieve this at scale. VAO is very flexible in the scheduling and parallel-processing options. However, for me that's the engine, and although an engine is very important, it's not the only part of a car.

VAO will check and confirm the applications on the virtual machines, as they fail over. As I mentioned above, we have out-of-box checks for major apps (SQL, Exchange, SharePoint) but with custom scripting you can add your own checks for any app or service.
VAO makes it very easy to test your failover plan. A Veeam DataLab will create custom lab environments, test your plan on-demand or on a schedule, and give a detailed report, proving that your DR solution is ready. There are so many use-cases for DataLabs!
VAO will automatically update the plan configuration. The VM Groups in VAO can be dynamically controlled (e.g. using vSphere tags) - so that the failover plan is automatically updated as you provision/decommission VMs in your environment.
VAO will track all changes to your failover plan, such as added/removed VMs, and plan edits. This will be captured in the Change Log of the Plan Definition report, automatically updated daily.
VAO reports can be extensively customised, to add relevant information such as emergency contacts, DR site managers etc. These reports are very useful for compliance & audit, for example, as well as troubleshooting, and consumption by business stakeholders.
(and I know partners who add their own logo & branding to the VAO report template, which gives a very nicely polished result!)
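
As a rough illustration of the tag-driven groups and the change tracking described in the points above, here is a sketch of the idea only. It does not call vSphere or VAO; the inventory is a plain dictionary and `resolve_group` and `plan_changes` are hypothetical names.

```python
from typing import Dict, Set, Tuple


def resolve_group(inventory: Dict[str, Set[str]], tag: str) -> Set[str]:
    """Return the VMs whose tag set contains the given tag."""
    return {vm for vm, tags in inventory.items() if tag in tags}


def plan_changes(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) VMs between two evaluations of the same group."""
    return current - previous, previous - current
```

Re-evaluating `resolve_group` on a schedule and recording the output of `plan_changes` each time gives you, in miniature, a plan that follows provisioning and decommissioning and a change log of what moved in or out.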

As you see - once your failover plan is created - there is so much automation in VAO to make that plan self-updating, self-checking, self-testing, and self-documenting. Saving countless hours of error-prone manual labour that can be spent on more useful projects!
And failover is so much more than just powering on the VMs. The sophisticated application checking in VAO ensures a successful failover - where all those distributed apps, databases, and other critical systems, are checked and confirmed during the failover process.

So, although the 'failover engine' is very important - I think the above features are where VAO really adds value.

Hope that helps!
