Feature Request: smarter Cluster backups

Post by **Marvellous** » Aug 21, 2019 12:48 am

Greetings all,

Are there any plans to make Veeam Agents smarter with respect to clusters and replicated data?

At this stage (Veeam 9.5 U4a), a failover cluster with physical RDMs needs to be backed up with an agent, as vSphere cannot snapshot physical RDMs. The agent is installed on each node, and its changed block tracking (CBT) treats all data as local to that node. If a fail-over event occurs and the RDMs are removed from the primary node and mounted on the secondary node, the agent on the secondary node sees the relocated RDMs as new data and tries to back it all up from scratch, blowing out the backup time and the repository size and duplicating all of the data within Veeam. Instead of an incremental backup in a couple of hours, our backup blows out to 4 days and fills the repository with new data, requiring manual intervention to prevent catastrophic failure.

The agent should be able to recognise the relocated data as already existing in Veeam, then do incremental backups only, but it doesn't.

Ironically, Update 4b is Storage Replica aware and backs up replicated data only once, yet Failover Cluster data that exists only once gets duplicated by Veeam.

I've seen various similar topics regarding Microsoft Failover Clusters, AAGs and DAGs, but nothing that quite covers this.

Thoughts?

Post by **HannesK** » Aug 21, 2019 9:33 am

Hello,
and welcome to the forums.

> and the repository size and duplicating all of the data within Veeam

That's not normal. Do you have a case number for that behavior?

> Ironically, Update 4b is Storage Replica aware

Can you describe what you mean? The agent does not see what happens on the storage; it only sees the shared volume mounted to the cluster.

> If a fail-over event occurs

I guess you are talking about a Windows cluster failover event. Nothing changes in the VMware environment?

Best regards,
Hannes

Post by **Marvellous** » Aug 22, 2019 3:34 am

Hi Hannes,

Thanks for your response!

When you say "that's not normal", are you implying that the Agent should only create one copy of data, regardless of which Failover Cluster node the data is hosted on? We were advised by our vendor that having to do a full backup as a baseline each time the failover cluster changed nodes was entirely normal, so we never logged a job. We had been assured that this was normal behaviour.

The reference to Storage Replica is admittedly a little off-topic here: Storage Replica doesn't use RDMs, so it can be snapshotted and therefore backed up without an agent; since it won't use the Veeam agent, it's not really a discussion for this thread. However, the release notes state "support for Windows Server Storage Replica, including automatic exclusion of duplicate copies of data at backup time", so it seems that Veeam can identify duplicated data in Storage Replica clusters, but not when using the agent in Failover Clusters.

Sorry, yes: the failover reference was talking about a Microsoft Failover Cluster fail-over, not the ESXi hosts. So if the MS Failover cluster moves the disk resources from the Windows Primary Node to the Windows Secondary Node, the Agent sees that as new data and tries to back it up from scratch.

This is problematic for two reasons:
1. It takes 4 days to do a full backup of 64TB across the wire via the agent.
2. The backup repository has a maximum capacity of 120TB, so we don't have the space for two separate copies in the repository.
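The arithmetic behind these two points can be checked quickly. The sketch below is a back-of-envelope calculation only, assuming decimal units (1 TB = 10^12 bytes, since the post does not say TB or TiB) and ignoring Veeam's compression and deduplication; the function names are hypothetical helpers, not part of any Veeam tool.

```python
# Back-of-envelope check of the figures in the post: a 64 TB full backup taking
# about 4 days, and a repository capped at 120 TB.
# Assumption: decimal units (1 TB = 10**12 bytes); compression/dedup ignored.

TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def effective_throughput_mb_s(data_tb: float, days: float) -> float:
    """Average throughput in MB/s needed to move data_tb terabytes in `days` days."""
    return data_tb * TB / (days * SECONDS_PER_DAY) / 10**6


def fulls_fit(full_tb: float, copies: int, repo_tb: float) -> bool:
    """True if `copies` independent full backups of full_tb TB fit into repo_tb TB."""
    return full_tb * copies <= repo_tb


if __name__ == "__main__":
    rate = effective_throughput_mb_s(64, 4)
    print(f"~{rate:.0f} MB/s (~{rate * 8 / 1000:.2f} Gbit/s) sustained")
    print("two fulls fit:", fulls_fit(64, 2, 120))
```

Run as-is, it prints `~185 MB/s (~1.48 Gbit/s) sustained` and `two fulls fit: False`: under these assumptions, a second uncompressed 64 TB full cannot sit beside the first in a 120 TB repository.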

So, is this normal behaviour? If not, how do we fix it?!

Post by **HannesK** » Aug 26, 2019 9:22 am

Hello,

> When you say "that's not normal", are you implying that the Agent should only create one copy of data, regardless of which Failover Cluster node the data is hosted on?

Yes. That's what I would expect from any backup solution, and it's a good point to add to my upcoming blog post. I did the following test: after a failover from node 1 to node 2, the incremental backup was 350 MB (for comparison, the full backup was 13 GB). After a full shutdown of the cluster, the incremental was 3 GB. There is no new full backup if everything works as expected.

> We were advised by our vendor that having to do a full backup as a baseline each time the failover cluster changed nodes was entirely normal, so we never logged a job.

No idea; I'm not the vendor.

Best regards,
Hannes

Post by **Marvellous** » Sep 02, 2019 5:02 am

Hi Hannes,

So you would expect Veeam to keep only a single copy of all cluster data, and for the Failover cluster to keep doing incremental backups, regardless of which Microsoft server node the data is hosted on?

If this is the expected behaviour, why is our Failover Cluster not behaving as expected, and instead attempting a full backup every time the Microsoft Cluster fails over between nodes? Should I open a support call and log this as a fault?

Incidentally, the Vendor in question is a Veeam partner, as recommended to us by Veeam for the Veeam implementation.

Cheers

Post by **HannesK** » Sep 10, 2019 6:19 am

Hello,
Yes and yes (as mentioned in my first post).

Please post the case number here for reference.

Best regards,
Hannes

Post by **Marvellous** » Sep 10, 2019 7:12 am

Hi Hannes,

Case # 03744935.

There should be full details in that case, but we're seeing some really odd behaviour in these backups. For example, the backup repository reports that it's backing up 158TB of the file share cluster, even though the cluster has only 54TB used out of a total capacity of 80TB. It then stops and does a full in the middle of the week for no apparent reason, but marks it as an incremental rather than a full. We have a SAN limitation of 120TB per volume, so we regularly run out of space when it tries to do a full backup on top of an existing full backup, because we just can't fit multiple full backups in the same repository. And every such attempt blows our backup out for 3 to 4 days.

Like I said: more details in the case, but let me know if you want any clarification.

Cheers,
Mark

Post by **Zach123** » Dec 27, 2019 12:26 am

Hi

I bet it is because "Per-VM backup" is enabled on the cloud repository your backup goes to. We are facing a similar issue: the data on the shared disk is backed up and saved twice, once for each node, with each node getting an independent copy of the shared-disk data.

Post by **backupquestions** » Dec 27, 2019 6:24 am

https://www.veeam.com/blog/windows-2019 ... agent.html

According to the blog post, if you use agents and create an actual failover cluster job, the repository's per-VM chain setting is ignored and a single chain is created for all cluster members, which makes sense.
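To illustrate why that grouping matters, here is a toy model (not Veeam's actual implementation) in which the repository looks up earlier backups by a chain key. If the key is the node that currently owns a shared disk, a failover to the other node finds no earlier chain and forces a full; if the key is the cluster, the same disk continues as an incremental. The node names, cluster name, and disk ID are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy repository: maps a chain key to the disk IDs already stored in that chain."""
    chains: dict[str, set[str]] = field(default_factory=dict)

    def backup(self, chain_key: str, disks: list[str]) -> dict[str, str]:
        """Classify each disk as 'full' or 'incremental' for this run, then record it in the chain."""
        seen = self.chains.setdefault(chain_key, set())
        result = {disk: ("incremental" if disk in seen else "full") for disk in disks}
        seen.update(disks)
        return result


def simulate_failover(key_by_cluster: bool) -> list[dict[str, str]]:
    """Back up one shared disk on two nights; the owning node changes in between (a failover)."""
    repo = Repository()
    shared_disks = ["disk-F"]            # hypothetical shared cluster disk
    nightly_owner = ["node1", "node2"]   # hypothetical: failover from node1 to node2 after night 1
    results = []
    for owner in nightly_owner:
        # Per-node chains key on the current owner; a cluster job keys on the cluster itself.
        chain_key = "cluster-A" if key_by_cluster else owner
        results.append(repo.backup(chain_key, shared_disks))
    return results


if __name__ == "__main__":
    print("per-node chains:  ", simulate_failover(key_by_cluster=False))
    print("per-cluster chain:", simulate_failover(key_by_cluster=True))
```

In this model, per-node keying yields `full` on both nights, while cluster keying yields `full` followed by `incremental`, which mirrors the difference between the two setups described above.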

So I would question if you properly set up an actual failover cluster backup job as documented in the blog post.

Post by **tyler.pittman** » Apr 15, 2020 10:59 pm

I know this is an old thread, but I came across it in a search and want to second it. I'm seeing this exact issue, and I've seen it with multiple file server clusters, too. It seems like if the backup so much as sneezes, or if there are random, non-existent "cluster membership changes", it gives up on the backup chain entirely and starts a new full backup. Some of us are backing up 30-40TB data sources with this; an unexpected full backup in the middle of the week is a death sentence both for the backups that need to run nightly while it runs and for our backup repositories, which need that space. How are we supposed to calculate our required repository storage when it can be nuked randomly? I put in a ticket for this issue, and the support person said it would be fixed by upgrading to V10. I upgraded, and it just happened to us again with a different cluster.

This isn't even unexpected behavior, per this: https://helpcenter.veeam.com/docs/backu ... ml?ver=100

"In case a backup task within a Veeam Agent backup job that processes a cluster completes unsuccessfully, Veeam Agent for Microsoft Windows will create full backup of all shared disks of the cluster."

Why would this product be designed in such a way? Sorry to be so pointed, but I've been a Veeam evangelist since 2014, shouting it from the rooftops when I worked in an environment that was all virtual (I even sold my current org on Veeam), but now that I'm in an environment that has big clustered physical data sources, I am severely disappointed by how far behind the Veeam Agent for Windows is.

Post by **HannesK** » Apr 16, 2020 8:37 am

Hello,
and welcome to the forums.

The user guide is not precise enough at this point; I will see whether we can improve that. Yes, there are some situations where a new full backup is required, but for most "unsuccessful" backups there is no new full backup. With V10 we improved this further.

As you mentioned that you have regular issues... could you please post the case number to see what we can improve (assuming that you are using V10)?

Thanks,
Hannes
