Failover cluster full backups

Post by **ctg49** » Apr 02, 2019 12:55 pm

Quick question.

We have a failover cluster housing some file servers, and periodically (after patching restarts, usually) we get an error in VEEAM about the cluster membership having changed, and a full backup being required. I'd expect this if the membership was different than the previous incremental, but each time I do patching restarts I put the resources back on their source VM (numbered resources, odds on host01, evens on host02). Am I misunderstanding what this error means? Is there something I can do to avoid getting 15TB worth of full backups every month?

Thanks!

Post by **Dima P.** » Apr 03, 2019 5:01 pm

Hello Chris,

May I ask whether the node becomes inaccessible during the backup process, or do you patch and reboot it in advance? Thank you!

Post by **ctg49** » Apr 03, 2019 8:04 pm

Hi Dima,

Never do failovers/restarts during backups... I checked the backup on it, and it seems to go something like this:

1. Resource normally on host02
2. Failovers done for patching
3. Failbacks done once patching is complete, so resource is back on host02
4. Backup performed; full backup taken due to being on the wrong node
5. Confirm in VEEAM that the backup is showing up under the 'wrong' node
6. Next backup, VEEAM sees it on the 'correct' node and takes *another* full backup.

It's unlikely that the resource is actually on the wrong node after restarts and failbacks are complete, as I don't touch them very often and veeam seems to 'see' it on the correct node the following day. I'm open to suggestions, though.
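The placement convention described earlier in the thread (odd-numbered resources on host01, even-numbered on host02) can be checked mechanically after failbacks. The following is a minimal sketch, assuming resource names end in a number; the names `expected_owner` and `misplaced`, and the resource names in the usage note, are hypothetical and not part of any Veeam or Windows tooling.

```python
import re
from typing import Dict, Tuple

# Node names follow the convention described in the thread.
ODD_NODE = "host01"
EVEN_NODE = "host02"


def expected_owner(resource_name: str) -> str:
    """Return the node a numbered resource should live on (odds on host01, evens on host02)."""
    match = re.search(r"(\d+)$", resource_name)
    if match is None:
        raise ValueError(f"resource name has no trailing number: {resource_name!r}")
    return ODD_NODE if int(match.group(1)) % 2 else EVEN_NODE


def misplaced(current_owners: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each misplaced resource to (current owner, expected owner)."""
    return {
        name: (owner, expected_owner(name))
        for name, owner in current_owners.items()
        if owner.lower() != expected_owner(name)
    }
```

Calling `misplaced({"Resource01": "host01", "Resource02": "host01"})` reports only `Resource02`, with its current and expected owner. On a real cluster the input mapping would have to come from the cluster itself, for example from the output of `Get-ClusterGroup`.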

Post by **Dima P.** » Apr 05, 2019 6:52 pm

Chris,

Thank you for the details! I've confirmed with the QA folks that a full backup should not be performed in a case like yours, so please open a case and share the case ID with me (we need to check your application debug logs to understand the root cause). Thank you in advance!

Post by **ctg49** » Apr 22, 2019 12:14 pm

Updating as this has continued to occur (once on Friday, once on Saturday... Saturday likely just a recurrence as I deleted the 'full incremental' from Friday).

Case #03521834

Post by **Dima P.** » Apr 30, 2019 6:50 pm

Hello folks,

Need some clarification on your setups:

1. Which Veeam B&R and agent version are you using?
2. Any chance the cluster's storage is configured with Storage Spaces?
3. If the answer to the last one is yes, please clarify whether that is Storage Spaces or Storage Spaces Direct.

Thank you in advance!

Post by **ctg49** » May 01, 2019 12:32 pm

1) The most recent that comes with VEEAM B&R. We're using 9.5U4a; About shows v9.5.4.2753.
2) Nope, just a basic MSCS file server cluster. Two of these are shared VVOLs, four are shared pRDMs (we're mid-transition to shared VVOLs). There's no correlation between pRDM/VVOL and these 'full backup incidents'.

Post by **Dima P.** » May 03, 2019 10:11 am

Hello Chris,

Thank you for the update! Please make sure that the agents are up to date as well (you can check that via the Inventory node) and keep working with our support team. Cheers!

Post by **ctg49** » May 22, 2019 7:04 pm

Might have an update on this, tech is running down some other information/going to lab it out.

Consider the following scenario:
Cluster01
Clusterhost01
Clusterhost02
Resource01 hosted on host01

Backup is performed, which successfully backs up resource01 from host02 (basically instant, nothing to back up) and fails for some reason backing up resource01 from host01 (in my case, some network hiccup). The job fails that node, but the job as a whole completes with a warning or whatever. Now the tricky part: a synthetic full backup is created as scheduled for this job, but the synth full has zero or minimal data within. As a result, the next incremental backup sees multiple TB of 'new' data which needs to be backed up to fulfill the chain.

This appears to be a bug, and should probably cause the synth full to fail if the backup of the resource fails. I don't know how this would work if one job was backing up multiple resources, however... we do one job per resource.
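The guard suggested above can be sketched roughly as follows. This is only an illustration of the proposed rule (skip the synthetic full when any node in the session failed), not Veeam's actual logic; `NodeResult` and `safe_to_build_synthetic_full` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeResult:
    node: str          # cluster node the job tried to process
    succeeded: bool    # whether processing of this node finished without error
    bytes_read: int    # data read from the clustered resource via this node


def safe_to_build_synthetic_full(results: List[NodeResult]) -> bool:
    """Allow the synthetic full only if every node in the session was processed successfully."""
    return bool(results) and all(r.succeeded for r in results)


# The scenario from the post: host02 "succeeds" with nothing to back up,
# while host01 (the node that owns resource01) fails on a network hiccup.
session = [
    NodeResult(node="host02", succeeded=True, bytes_read=0),
    NodeResult(node="host01", succeeded=False, bytes_read=0),
]
build_full = safe_to_build_synthetic_full(session)  # False: keep the previous chain instead
```

With one job per resource this all-or-nothing rule is simple; a job covering several resources would presumably need a per-resource decision instead, which is the open question raised in the post.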

Post by **ejenner** » May 10, 2022 9:19 am

Hello everybody.

I have been having an issue with backing up our clustered file server for a couple of years now. I've logged a few tickets regarding it and eventually with the most recent ticket I've been told it is expected behavior.

The reason for posting here is to find out, through the collective hive mind on the forum, whether there are other opinions on how to do file server backup properly. We are replacing our file server infrastructure imminently; the hardware is in place and already installed, so now is a chance for us to change things if we want to.

The current problem:

At the moment, with our existing configuration, we have an OS-level cluster with a virtual file server and file server roles. This is configured as per this blog: https://www.veeam.com/blog/how-to-creat ... -2019.html

What seems to happen is that changed block tracking fails when the file server role moves from one cluster node to the other. The move can be triggered when a node is put into maintenance for security patching for instance. The other time CBT seems to fail is when the Veeam product is upgraded. It seems to reset the CBT data for our clustered fileserver backups during the software upgrade.

Of course the issue with CBT failure is you have to back up all the data from scratch; it won't do an incremental backup. We don't have space on any of our repositories for multiple copies of our data.

This may just be a quirk with our setup, in which case I'm open to suggestions on where to look... but I've been researching other backup products and found other providers have a similar view on what would happen in this scenario.

Whether or not we could use the 'File Share' backup function to target the SMB shares on the cluster is a bit of a moot point as the licenses for this feature are pegged to data size rather than per instance or socket. So with the size of our data being fairly large it would double our already considerable licensing costs. I'm not sure if the 'File Share' backup works for clusters but if it did we wouldn't be able to afford to use it.

I'm interested to hear people's views on how others work around this problem. I have some of my own ideas but I'm keeping those to myself as I don't want to push the discussion in any particular direction.

Post by **HannesK** » May 10, 2022 2:32 pm

Hello,

"Of course the issue with CBT failure is you have to backup all the data from scratch"

CBT = changed block tracking. A CBT failure would lead to a full scan, but not to a full backup. I remember that new full backups were an issue in earlier versions, but that should have been fixed (please see above).
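The difference between a full scan and a full backup can be illustrated with a small conceptual sketch. It is not Veeam's implementation: the block size, the use of SHA-256 digests, and the function names are assumptions chosen for the example. Every block is read (the full scan), but only blocks that differ from the previous backup end up in the increment.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_digests(data: bytes) -> List[str]:
    """Digest of every fixed-size block, as it might be recorded with a previous backup."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def full_scan_incremental(current: bytes, previous_digests: List[str]) -> Dict[int, bytes]:
    """Read every block (the 'full scan') but keep only blocks whose digest changed."""
    changed: Dict[int, bytes] = {}
    for index, offset in enumerate(range(0, len(current), BLOCK_SIZE)):
        block = current[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(previous_digests) or digest != previous_digests[index]:
            changed[index] = block
    return changed


previous = block_digests(b"AAAABBBBCCCC")
increment = full_scan_incremental(b"AAAAXXXXCCCC", previous)
```

Here all three blocks are read, yet `increment` contains only the middle block. The cost of losing CBT data in this model is read time, not repository space.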

Support should investigate why this really happens and escalate it to higher tiers (I see that the case was finished at tier 1). It looks like the case is more than a month old, so I'm not sure how much sense it makes to continue with it.

Best regards,
Hannes

Post by **ejenner** » May 11, 2022 2:09 pm

I've since read in Veeam documentation that a node change for the fileserver shouldn't trigger a new backup so that's not the cause. I agree, support should spend more time looking at this issue but as we're moving to a new platform pretty soon it wouldn't really be worth dwelling on the past problem with the old system. A couple of years ago when the problem first started it would've been nice to have found the cause but by now it doesn't matter.

Post by **AOK-BV** » Jun 14, 2022 10:32 am

We have the same issue with our file server clusters. Case 05246809

Post by **HannesK** » Jun 14, 2022 2:05 pm

Hello,
I just read through the case and it looks like support only investigated network issues. In the end, the case was closed because of "no customer answer".

To really fix the issue, I can only suggest the following:
1) Ensure that all other (network) errors are gone.
2) If a full backup happens instead of an incremental backup, open a case with current logs and ask why it happened and whether there is a bug number for it. If the case ID is posted here "in time" (old cases do not help, because logs are deleted after some time for legal reasons), I can also check with support what's going on.

General hint on support: the most efficient way is doing "one issue per case".

Best regards,
Hannes

[MERGED] Clustered Fileservers - Case #05314862