High availability of SOBR #Case 05112528

Post by **makacmar** » Nov 05, 2021 9:48 am

Hello,
Backing up transaction logs for MS SQL and Oracle is not supported when a VM can be backed up by two jobs.

jobA backs up the VM in DC A, and its physical repository is in location A (sobrA).
jobB backs up the VM in DC B, and its physical repository is in location B (sobrB).
For a normal VM this gives great HA, but not for Oracle and SQL.
After a switch, the transaction logs in the child job are not backed up, because the parent jobB shows the warning "Transaction logs were not backed up, because JobA is still active".
If this is not supported, then the solution has to come at the repository level, with SOBR.

For high availability of SOBRs, I suggest these solutions:
1) Create a failover button inside the SOBR (like the Seal and Maintenance buttons).
2) Create a group of SOBRs with a policy such as:
a) failover
b) round robin (first task to the first SOBR, second task to the second SOBR, third to the first SOBR, …)
c) performance (check CPU and memory and, based on that input, send the task to the SOBR with less load)
d) priority (send to sobr1 and, if it is fully occupied, send to sobr2)
3) Add a "failover" option under job –> storage to choose the SOBR that will be used when all extents in the original SOBR are offline.

Advanced option:
4) When a copy job is used, continue the backup chain from the copy job on the second site (using the copy job's SOBR as the failover SOBR from option 3).

Or please fix the transaction log issue that occurs when switching between the two jobs.

When the VM moves from site A to B, jobB should start backing it up, like option 4 with a copy job as a prerequisite.
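
To make the four policies from suggestion 2 concrete, here is a minimal Python sketch of how a group of SOBRs could pick a target for each task. This is only an illustration of the request, not an existing Veeam feature or API: the `Sobr` and `SobrGroup` classes and the `load` and `free_slots` fields are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Sequence


@dataclass
class Sobr:
    """Hypothetical model of one scale-out repository in a group."""
    name: str
    online: bool = True    # False when all extents are offline
    load: float = 0.0      # hypothetical load metric, 0.0 (idle) to 1.0 (busy)
    free_slots: int = 4    # hypothetical number of free task slots


class SobrGroup:
    """Hypothetical group of SOBRs that picks a target per task."""

    def __init__(self, sobrs: Sequence[Sobr], policy: str) -> None:
        self.sobrs = list(sobrs)   # configured order matters for failover/priority
        self.policy = policy
        self._task_counter = count()

    def pick(self) -> Optional[Sobr]:
        """Return the SOBR for the next task, or None if none is usable."""
        candidates = [s for s in self.sobrs if s.online]
        if not candidates:
            return None
        if self.policy == "failover":
            # a) first SOBR in configured order that is still online
            return candidates[0]
        if self.policy == "round_robin":
            # b) alternate tasks across the online SOBRs
            return candidates[next(self._task_counter) % len(candidates)]
        if self.policy == "performance":
            # c) SOBR with the lowest reported load
            return min(candidates, key=lambda s: s.load)
        if self.policy == "priority":
            # d) first SOBR in configured order that still has a free slot
            return next((s for s in candidates if s.free_slots > 0), None)
        raise ValueError(f"unknown policy: {self.policy}")


if __name__ == "__main__":
    group = SobrGroup([Sobr("sobrA"), Sobr("sobrB")], policy="failover")
    print(group.pick().name)       # sobrA while it is online
    group.sobrs[0].online = False  # all extents in sobrA go offline
    print(group.pick().name)       # sobrB takes over
```

With the failover policy the job would keep writing to sobrA as long as it is online and switch to sobrB only when all extents in sobrA are offline, which is the behaviour asked for in options 1 and 3.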

Post by **HannesK** » Nov 05, 2021 11:31 am

Hello,
Just to clarify: you have a stretched cluster / metro cluster from a storage perspective. You back up the SQL / Oracle VM to both sides with two backup jobs so that backup can continue if one site goes down (because today it is impossible to map backup copy job backups to a backup job). Is my understanding correct?

The second goal you have is having the transaction logs in both locations, correct?

Your suggestion 4 is probably the most likely thing that could be done. That would mean one backup job to SOBR A and one backup copy job from SOBR A to SOBR B. I don't have a time estimation on that, but with the background improvements we plan, that could be doable in future releases.

Transaction logs can only be backed up by one backup job, correct. But a second backup job can still back up the same VM with "perform copy only" mode. In a DR situation, the second backup job could be enabled for processing transaction logs. But yes, the transaction logs from the lost site are missing in that case.

Chances that we add redundancy of SOBRs into the software are low. That would add a massive amount of complexity and performance implication to the software while redundancy could also be solved at storage level depending on the operating system / hardware that is used. For example with object storage, it's possible to have geo-redundant storage. With V12, we plan to write directly to object storage. That means, one backup job to an object storage would result in full geo-redundancy and failover has no impact on the backups.

Best regards,
Hannes

PS: Please keep in mind that support is there to fix broken things (not for design questions). Your SE / SA or the forums are available for feature requests or design questions.

Post by **makacmar** » Nov 05, 2021 1:37 pm

Hello Hannes,
We have a stretched cluster.
There are two jobs, jobA and jobB, but the VM is backed up by only one job at a time, depending on where the VM is located.
The VM is backed up by jobA; if location A is down, the VM is moved to location B and backed up by jobB.
The repositories are physical servers without high availability.
How can such a situation be solved?
I am only suggesting that a policy for SOBR would be appreciated, as I don't see a solution in Veeam 11.

Post by **Gostev** » Nov 05, 2021 4:07 pm

Conceptually, SOBR was designed with the following paradigm in mind:
Performance Tier carries a role of fast landing zone and "local cache" of most recent backups. As any cache, it can be re-built if lost (from Capacity Tier).
Capacity Tier provides redundancy (including off-site) via Copy mode, eliminating the need to make Performance Tier extents highly available.

Post by **makacmar** » Nov 08, 2021 9:31 am

Hello Gostev,
That gives you availability of the backup images, but it does not provide backups of the logs from the child jobs.
So in this case, the backup will fail.

Post by **Gostev** » Nov 08, 2021 12:21 pm

Why would backup fail? Unless there's some bug I'm not aware of, log backup should continue to another performance tier extent.

Post by **HannesK** » Nov 09, 2021 8:14 am

Hello,
Is there a capacity tier involved in that setup (I cannot find that in the initial post)? Anton, correct me if I'm wrong... are you suggesting the following design?

- only one backup job that continues to work after a failover
- only one SOBR spanning both datacenters (currently there is one SOBR per datacenter. I assume to keep backup traffic local)
- geo-redundant capacity tier in copy mode
- if SOBR extents in one DC fail, then there are two options:
1) download everything from capacity tier and continue
2) automatically perform active-full (if configured)

Marcel: if a datacenter is lost, then JobA will also be deactivated (or at least the VM excluded), because there is nothing to back up anymore on that site. Correct? So JobB would work fine on the surviving site.

Are you using backup copy jobs today to ensure the backups are stored on both sites? And you don't have enough free space to just run another active full (JobB) for the whole data center? If yes, then it brings me back to my answer that option 4 might be a solution in future versions (no time estimation possible today).

"if repository are physical servers without high availability"

I heard from customers that solve that with Storage Replica on Windows and I remember DRBD can do something similar on Linux (probably there are alternatives on Linux)

Best regards,
Hannes

Post by **wku** » Nov 09, 2021 11:46 am

Hi @makacmar, can you please clarify which "failover" scenario you are concerned about here?

a) Site A database server goes down, but you DO have backups in Site A repository, including transaction logs.
You restore from Site A backup repo to Site B database server, start database on Site B.
You disable backup job A, enable backup job B (from Site B server to Site B repo) and you get errors because Job A was holding transaction logs until now.
In that case just tell Veeam to immediately allow "takeover" of transaction log backup by new job by setting the timer registry keys to zero: https://www.veeam.com/kb2029
(I recommend changing the timers back to at least 1-2 days if not standard 7 after such amount of days has passed, to reenable the safety against accidental start of both jobs in parallel.)

b) Site A goes fully down and you have no backups to restore from.
Site B database server is some kind of standby/replica.
You want to have transaction logs from Site A available also in Site B to replay during failover.

@Gostev and @HannesK focus hard on ways to provide you with backup repository redundancy, because your original post is focused on dream-wishes about redesign of SOBR system.
But it appears that you otherwise have the database failover step solved to your satisfaction, and your actual concern is that Veeam does not let you start the transaction log backup in site B.
And for that, the KB article should help.
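
As a rough mental model of the takeover timer described in scenario a), the check can be pictured like the Python sketch below. This is not Veeam's implementation, and the function and parameter names are hypothetical; it only illustrates why a timeout of zero allows an immediate takeover while the standard 7 days blocks it.

```python
from datetime import datetime, timedelta


def may_take_over_logs(owner_last_log_backup: datetime,
                       now: datetime,
                       takeover_after: timedelta) -> bool:
    """Hypothetical check: may a second job take over log backup?

    The new job is allowed once the job that currently owns log
    processing has not backed up logs for at least `takeover_after`.
    """
    return now - owner_last_log_backup >= takeover_after


if __name__ == "__main__":
    now = datetime(2021, 11, 9, 12, 0)
    last_by_job_a = now - timedelta(hours=3)  # Job A ran 3 hours ago

    # Standard 7-day timer: Job B is refused, Job A still counts as active.
    print(may_take_over_logs(last_by_job_a, now, timedelta(days=7)))  # False

    # Timer set to zero: Job B may take over immediately.
    print(may_take_over_logs(last_by_job_a, now, timedelta(0)))       # True
```

The same model shows why raising the timer again afterwards matters: with a zero timeout, nothing stops both jobs from processing logs if they are started in parallel by accident.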

Post by **makacmar** » Nov 09, 2021 3:02 pm

Hello,
we have a stretched cluster, where a VM can migrate at any time from DC A to DC B.
I don't see a way to achieve continuous backup of SQL and Oracle without manual intervention if DC A is down.
If application-aware backup can be handled by one job only and cannot move between two jobs depending on where the VM is located, then the only option is a SOBR policy such as a failover mode between two SOBRs, used when all extents in the first SOBR are down.
If SQL produces 100 GB of logs per hour and DC A goes down during the night, then all such DBs are stopped by the morning, because the switch cannot be handled manually in time.
We have a copy job for both jobs, to copy the backups to the other site. So I see restore as safe.
But our customer wants to have 450 such Oracle and SQL VMs, and if DC A is down, the DBs will stop because the log destination inside the VM is full.
I don't see the KB as a solution in this case, or am I wrong?
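
To put a number on the time pressure, here is a small Python sketch of the arithmetic. The 100 GB per hour rate is the figure from the post; the 500 GB of free space on the log volume is purely an assumed value for illustration, and the function name is hypothetical.

```python
def hours_until_log_volume_full(free_gb: float,
                                log_rate_gb_per_hour: float) -> float:
    """Hours until the log destination fills up if logs are not truncated."""
    if log_rate_gb_per_hour <= 0:
        raise ValueError("log rate must be positive")
    return free_gb / log_rate_gb_per_hour


if __name__ == "__main__":
    # 100 GB/h is from the post; 500 GB free is an assumed example value.
    hours = hours_until_log_volume_full(free_gb=500, log_rate_gb_per_hour=100)
    print(f"Log volume full after {hours:.1f} hours without log backup")
```

Under that assumption the database stops after 5 hours, so an outage that starts late in the evening is not noticed by anyone before the log volume is already full.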

Post by **HannesK** » Nov 10, 2021 7:41 am

"where VM is able to migrate anytime from DC A to DC B."

The "anytime" is something that was not clear to me. I was answering about disaster situations. As disasters happen only very rarely, I think that manual (or scripted) intervention is a valid option.

I suggest simplifying then: just have one backup job that always does the same backup. It sounds irrelevant that there is some cross traffic between the datacenters, because the VMs can move "anytime".

If you want to cover disaster situations automatically, there are several options
- re-design SOBR in V11 with the suggestions from above (waiting for the SOBR feature requests to get implemented makes no sense)
- re-design the repository with V12 with geo-redundant object-storage - I assume that disaster will not strike before V12. Your SE / SA can help with details.
- if you want to stay with the current design and two jobs, then the KB article can help. Two backup jobs (my first assumption), with only one of them doing log shipping. No backup copy jobs, because redundancy is already given by the two backup jobs. But you still need to enable log shipping on the second job in a disaster situation. It also has the downside that logs are stored on only one site.

Even point 4 (manual mapping) would involve manual interaction. As I don't have a timeline for that, I would also not wait for it if you need a solution "now"

Post by **makacmar** » Nov 11, 2021 1:52 pm

Hello Hannes,
The customer has a stretched vSAN cluster across 2 datacenters.
He cannot see the service provider part, but otherwise has full access
and can deploy a VM himself and place it in DC A or B.
He can set a policy so that a VM whose default location is A can migrate to B.
He can also run vMotion at any time, because he has a dedicated vCenter.

- Running 2 jobs is a manual activity: if DR happens, thousands of Oracle and SQL VMs will be affected.
- Geo-redundant object storage looks promising, but I don't know how hard it will be to redesign our automation.
- The best way is a group of SOBRs with policies like failover, or a job option for a second SOBR in case all extents in the original SOBR are down.
