Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Inconsistent Filesystems on restores when backing up with Hotadd

Post by Gostev »

mkretzer wrote: Aug 06, 2019 4:24 pm
https://helpcenter.veeam.com/docs/backu ... l?ver=95u4
"Veeam Backup & Replication automatically sets the SAN Policy within each proxy to Offline Shared."
Right, after reading the link I recalled that this was implemented for Direct SAN Access transport mode (just as per the User Guide's section), so this is the correct setting for that case.

But I also stand corrected on offlineShared being a cause of the issue for hot add. After digging through Microsoft documentation on SAN policies, this setting does appear to cover SCSI bus disks too (and not just iSCSI, as I thought). So, my idea was completely incorrect... back to the drawing board.

Right now I am completely lost as to why "half of our volumes get mounted on the proxy (including getting a drive letter) while backup is running"... all possible ideas are now proven 100% wrong. Due to the offlineShared setting, they should have been kept offline and read-only.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Post by mkretzer »

What's even more confusing:
Your support provided a nice script that logs all disks attached to a proxy.
It always shows one additional attached disk, even while an active full is running. But that one disk is offline in some backups and online in others.

I wonder: could that be another crazy Windows 2019 bug?

Post by Gostev »

No, definitely not. We're not seeing anything like that with any OS versions, including Server 2019... so this must be something environment-specific in your case.

Can you try to do this:
DISKPART > automount scrub
for the OS to "forget" all previously mounted volumes, and then see if these volumes start piling up again despite SANpolicy being set to offlineShared?

If they don't, then it would be an indication that this setting was possibly disabled at some point, which resulted in these volumes being automounted and memorized by the OS (after which having SANpolicy offlineShared will not make any difference).

If they still do, then clearly the next step should be to involve Microsoft, as we're not seeing such behavior in any of our labs.

Post by mkretzer »

@Gostev: Your support has sent me a long explanation. Seems like offline shared and offline all both don't always work!

Automount must be disabled as well. Also, from what I understand, this can very well hit other customers too.

Currently they are debating how to fix this permanently from your side... I think this is important information and at least a warning should be posted in the backup log!

Post by Gostev »

mkretzer wrote: Aug 12, 2019 4:46 pm
Seems like offline shared and offline all both don't always work!
There must be some misunderstanding between you and support then, because R&D has confirmed with very comprehensive testing that offlineShared always works. With this setting, Windows will only mount the disks already "known" to the OS (meaning, those which were mounted to the OS at least once before the Veeam backup proxy was even deployed). But I already checked on this possibility with you earlier, and you confirmed this absolutely cannot be the case.

So, I would still like to get to the bottom of this, and the test I suggested in my previous post would help us do that. It is also very safe, because you will be able to observe the issue easily through the presence of additional "known" drives in the registry following a hot add backup. Would you be open to doing this test?

As noted earlier, disabling automount has its own implications, and the last thing we want is to fix a headache with a guillotine. But in order to find the best solution, we need to fully understand what is causing the issue in your particular environment.

Post by mkretzer »

Gostev,

No. Did you read the document they wrote/provided?
"As it turned out, indeed, setting San Policy to either OfflineShared (what Veeam already does by default for any newly added server) or OfflineAll (what we, Support, did in some cases with ReFS volumes not feeling too well after being backed up in HotAdd) is not guaranteeing that the volumes will not be mounted."

In the analysis document they sent us they sound quite sure about the whole thing. They seem to know exactly what is wrong.

So can you please talk to support if there is anything unclear on your side? If so, I am happy to do further tests!

They also told us to:

1) Make sure the SAN Policy is OfflineShared (!)
2) Disable automount via Diskpart by running “automount disable”. Follow it up with “automount scrub”
3) Run mountvol /r (make sure to exit Diskpart first)

So again, not "OfflineAll"

Markus
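
For anyone who wants to apply the three steps from support repeatably, here is a minimal Python sketch. It assumes diskpart is run in scripted mode with `diskpart /s <file>` and that the SAN policy is set with `san policy=OfflineShared`; the helper names `build_diskpart_script` and `apply_proxy_settings` are made up for illustration and are not Veeam tooling. It needs an elevated prompt on the proxy, and the default is a dry run that only reports what would be executed.

```python
import subprocess
import tempfile
from pathlib import Path

# Diskpart commands for steps 1 and 2, in the order support listed them.
DISKPART_STEPS = [
    "san policy=OfflineShared",  # step 1: keep the SAN policy at OfflineShared
    "automount disable",         # step 2: stop automatic mounting of new volumes
    "automount scrub",           # step 2: forget previously mounted volumes
]


def build_diskpart_script(steps=DISKPART_STEPS):
    """Return the text of a diskpart script, one command per line."""
    return "\n".join(steps) + "\n"


def apply_proxy_settings(dry_run=True):
    """Run the diskpart steps, then 'mountvol /r' (step 3).

    Hypothetical helper: with dry_run=True nothing is executed and only
    the names of the programs that would run are returned.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "proxy_settings.txt"
        script.write_text(build_diskpart_script())
        commands = [
            ["diskpart", "/s", str(script)],
            ["mountvol", "/r"],  # runs only after diskpart has exited
        ]
        if not dry_run:
            for cmd in commands:
                subprocess.run(cmd, check=True)
        return [cmd[0] for cmd in commands]
```

Calling `apply_proxy_settings(dry_run=False)` would actually change the proxy, so it is worth reviewing the output of `build_diskpart_script()` first.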

Post by Gostev »

Yes, I've been in touch with support regarding this and cleared up their confusion. Also, the support case has been transferred to the team leader. Everyone is now in agreement that the next step should be the following test:

1. Make sure the SAN Policy is OfflineShared, and automount is enabled on the backup proxy (default Veeam backup proxy settings)
2. DISKPART > automount scrub OR CMD > mountvol /r (afaik, these do the same thing)
3. Perform some hot add backups of various VMs

Expected result: not a single backed up drive should be mounted to the backup proxy, even if automount is still enabled.

Post by mkretzer »

It gets confusing now: automount scrub leads to the system volume (including a drive letter!) and the two dynamic volumes (shown as foreign) of the VM being mounted, while the GPT volume which got corrupted the most is offline.

Testing with mountvol /r next

Post by mkretzer »

LOL nope. After another automount scrub and then mountvol /r the big GPT disk gets mounted again, now one of the dynamic disks is offline and the other dynamic one is still online.

Post by mkretzer »

It gets worse... We disabled automount:
DISKPART> SAN
SAN Policy : Offline Shared
DISKPART> automount
Automatic mounting of new volumes disabled.

Volumes still show up but get no letter!

Post by mkretzer »

Final test: automount off, offlineall, scrub. Volumes still show up but get no drive letter (should they not be "offline"?)...

Post by Gostev »

mkretzer wrote: Aug 13, 2019 6:34 pm
Final test: automount off, offlineall
As per my request above, can you please test with automount on and offlineShared?
This would represent default settings following Veeam backup proxy installation.

Expected result with these settings:
All backed up disks appear as offline in the Disk Management snap-in.

If you see any other result, please post your respective screenshot.

Post by mkretzer »

Gostev,

That was the setting we tested yesterday. Result was as before: disks are not offline!! But they are not offline with offline all, automount off and scrub either! I will test again and provide a screenshot (how can I put a screenshot here??).

Markus

Post by mkretzer »

https://imgur.com/a/53xc5l3

Here you can see the chaos... Which volume is offline, which is not, and which gets a letter changes with every backup... That's why some backups are consistent and some are not...

Edit: To explain what is what:
Disk 0 is the proxy system drive
Disk 1 is a dynamic disk of the backed up VM which sometimes got corrupted (dedup/dynamic disk)
Disk 2 is the OS disk of the backed up VM which was even assigned a drive letter and which sometimes got corrupted as well
Disk 3 is a GPT disk with dedup which got corrupted the most (sometimes completely destroyed and unreadable)
Disk 4 is another dynamic disk which is the only one that is offline

@Gostev: can you submit this result to support or do I need to update the case as well?

Post by Gostev »

Yeah, clearly some weird stuff is going on there with this backup proxy. No worries about submitting this information to support, we have an internal thread going with all stakeholders.

For further troubleshooting, I have two options:

Option 1. To exclude the possibility that Windows OS itself is messed up on this particular proxy (or that some 3rd party software is acting up), it would help to deploy a brand new backup proxy by doing a clean OS install manually from the Windows distribution ISO. After that, without installing any 3rd party software, see if the issue repeats on that new backup proxy.

Option 2. Open a support case with Microsoft and have them troubleshoot this (we can open it on your behalf, but obviously it will require your time in case they need logs). Naturally, it would be better to do this after Option 1 and with the clean OS install. Since we're talking about a VM, maybe you can even create a snapshot once the OS is fully updated, right before installing the Veeam backup proxy role and doing the experiments - so that rollback to the clean state is easy.

In general I think we're pretty close to understanding the issue now, with all signs currently pointing at some issue with this specific backup proxy. Naturally, we never see backed up disks being Online in any of our labs.

Post by mkretzer »

Gostev,

OK, let's go for option 1. But: these proxies were freshly set up with Windows 2019 just 2-3 months ago. Nothing was done with them but installing the proxy component.
I am currently on vacation, I will ask my colleagues to do a fresh install on a new proxy. So we will:
- Install W2019, newest updates + VMware tools
- Create a snapshot
- Add the proxy to Veeam
- Test backups with our test job

OK?

Post by Gostev »

That is correct + remember not to install any 3rd party software either.

Post by mkretzer »

Fresh install does not show the issue! All drives stay "offline" even with default settings.
But this does not make it "solved" for us.
1. We should find out what can cause this - perhaps a remote session with Veeam support or some kind of analysis script to compare settings would be nice
2. Veeam should prevent this from ever happening again with any customer. This should be possible: after the volumes are mounted and before reading data starts, Veeam should check volume status. If one of the volumes to be backed up is "online", it should fail the backup right away
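
To make point 2 concrete, here is a minimal Python sketch of such a pre-flight check. It is not how Veeam implements anything; `DiskState`, `HotAddPreflightError` and both functions are hypothetical names, and the disk states are assumed to have been collected beforehand (for example from PowerShell `Get-Disk`, which reports a number and an offline flag per disk).

```python
from dataclasses import dataclass


@dataclass
class DiskState:
    number: int         # disk number as shown in Disk Management
    is_offline: bool    # True if the OS reports the disk as Offline
    is_proxy_own: bool  # True for the proxy's own disks (e.g. Disk 0)


class HotAddPreflightError(RuntimeError):
    """Raised when a hot-added disk is Online before reading starts."""


def online_hot_added_disks(disks):
    """Return the numbers of hot-added disks that are not Offline."""
    return [d.number for d in disks if not d.is_proxy_own and not d.is_offline]


def preflight_check(disks):
    """Fail right away if any hot-added disk is Online on the proxy."""
    offenders = online_hot_added_disks(disks)
    if offenders:
        raise HotAddPreflightError(
            f"hot-added disks online on proxy: {offenders}"
        )
```

With the state from the screenshot earlier in this thread (Disk 0 is the proxy, Disks 1 to 3 online, Disk 4 offline), `online_hot_added_disks` would report Disks 1, 2 and 3 and `preflight_check` would abort the job before any data is read.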

Until 2) is implemented, hot-add is too risky to use from my point of view! Which is bad for us, as backups with NBD are quite slow. I hope the performance issues of direct SAN at the start of backup are solved when we upgrade all our ESXi hosts to 6.7U2...

Post by Gostev »

OK, thank you very much for confirming the issue is specific to that particular backup proxy server (and especially for spending your vacation time). What you're observing with the new backup proxy is consistent with what we're seeing in our test labs.

Since we're dealing with what is either a Windows OS or a 3rd party issue on the particular server, it would be best to engage Microsoft for further troubleshooting of the impacted server. Let us know if you're able to open a support case with Microsoft directly. We can do this as well, however we don't have a single system where this issue is reproducible - so we would have to immediately refer them to you anyway.

We will decide on the best course of action based on the conclusion from Microsoft. We need to know what exactly is happening on that server, in order to be able to design a reliable and bulletproof solution.

Unfortunately, it is a bit too late to include any sort of advanced new logic in v10, especially without understanding the issue we're fixing. For example, if those volume status checks turn out to sometimes return invalid results on certain system configurations, we risk breaking hot add for thousands of customers. So unless absolutely necessary, we really don't like to touch the code that we spent 10 years stabilizing, making sure it works reliably in 500K different environments... and at this time, we don't even know what's going on with the offending server. This makes it really hard to justify making any risky last minute changes.

However, for example bringing back the code that automatically disables automount will always remain an option for v10.

Post by mkretzer »

Gostev,

understandable. I will do this after my vacation.

I wonder: Would disabling automount prevent a volume getting corrupted even if the disk goes "online"?

Markus

Post by Gostev »

That's the current theory... we believe that Windows OS does not do things* to volumes until they are actually mounted to a drive letter (or a folder). And while it is best to just have all disks remain "Offline" as the lower-level protection, we believe that merely having them "Online" should not cause issues until they are actually mounted. As such, scrubbing and disabling automount at the time when backup proxy is deployed may provide another layer of protection.

*Things seem to be limited to special workloads like ReFS or Windows dedupe. Perhaps there's some transaction log that OS starts to automatically replay once the volume is mounted, or something along these lines. Because regular NTFS volumes don't seem to receive any modifications in any case.

Post by mkretzer »

Ok.

I think we will stay with NBD for now until we know what the real issue is. Perhaps there is something which we can implement in our monitoring to make sure it does not happen again.

Markus

Post by mkretzer »

I just asked in the old case if this is fixed and was told that even if it is not fixed in Windows, V10 does something to fix the issue. What is that change?

Post by Gostev »

Yes, the short answer is that v10 does work around this crazy Windows logic and the features which cause the issue. I will share the details in a few days, as this is not a quick one - there are three different things that had to be implemented. Thanks!

Post by mkretzer »

Nice! Do we have to wait for the next forum digest? :-/

Post by Gostev »

Probably a little more, as the next few days will be the usual major release chaos :D

Post by Gostev »

I forgot to update this topic, but better late than never.

I will give the complete story to make it more convenient for new readers.

As many of you know, Veeam automatically disables the automatic mounting of disks on backup proxies. Originally, this was done to prevent the OS from re-signaturing VMFS volumes on LUNs attached to Direct SAN Access transport mode backup proxies. But since we can't know which transport mode the backup proxy will use, we apply this setting to all proxies. In the early days of Veeam, we did this by using the automount disable command - but after this setting interfered with one of the R2 updates for Windows Server, we switched to the "recommended" SANPolicy OfflineShared setting instead, and lived with this happily for many years until mid-2019.

This was when we first faced this really curious case from OP with corrupted hot add backups. He could reproduce the issue quite reliably even after deploying everything fresh, but we just did not see the same behavior in our own labs. Eventually, we were able to pinpoint the difference – we could see that the backed up volumes from hot added VMDKs go Online and get automatically mounted in his environment. While the Online status for the disk by itself is not a problem, having the volumes from the disk automatically mounted allowed the backup proxy server OS to make changes to them – and this in turn did not play well with VMware CBT, sometimes resulting in an inconsistent backup taken.

The biggest part of the puzzle we had to solve was WHY newly attached volumes that the OS most definitely never saw before would be automatically mounted, when they were supposed to be protected against that by the SANPolicy OfflineShared setting, which we verified time and again was set correctly. This was borderline magic, so at this point we engaged Microsoft.

It was a very long and complex investigation, but we were blessed to get a really strong support engineer. And after a thorough investigation involving Microsoft development, the issue turned out to be at the intersection of two different "by design" features.

The first feature causes the OS to always mount disks it considers to be "critical", more specifically those containing a dump file, page file or hibernation file. This type of automatic mount ignores the SANPolicy setting, and this is exactly what we were facing here. OK, so far so good, this decision totally makes sense when we're talking about the system disk of the backup proxy itself. But how can it possibly apply to some other disk? How does the OS even know what's on a LUN it never "saw" before, considering that it is specifically told to mount ALL new LUNs as Offline? Fair questions!

Indeed, the OS cannot and does not know - it just "assumes" so, due to another unexpected feature. Apparently, whenever you have a situation where the mounted disk has the same MBR Signature or GPT Disk ID as the backup proxy system disk, which is more likely to happen in virtual environments, something special happens. While the mounted disk will of course remain offline due to the disk ID conflict, this collision causes the OS to follow the special "conflict resolution" logic path. This code path, in particular, marks the corresponding SCSI slot with the special flag, sort of "Online Always" policy. And from this point on, volumes from ANY disk attached to this slot will be automatically mounted by the OS regardless of the SANPolicy setting. Basically, based on the previous conflict, Windows makes an incorrect assumption that any disk that gets attached to this slot can only be a "critical" disk.
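
As a rough illustration of the collision condition described above, the following Python sketch flags hot-added disks whose MBR signature or GPT disk ID matches the proxy's system disk. It only models the comparison; how the IDs are read from the system is left out, and `DiskId`, `normalize` and `colliding_disks` are hypothetical names, not part of Windows or Veeam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskId:
    number: int   # disk number on the proxy
    disk_id: str  # MBR signature (hex) or GPT disk GUID, as text


def normalize(disk_id):
    """Make IDs comparable: drop braces and a 0x prefix, ignore case."""
    text = disk_id.strip().strip("{}").lower()
    return text[2:] if text.startswith("0x") else text


def colliding_disks(system_disk, attached):
    """Return numbers of attached disks whose ID equals the system disk's ID."""
    wanted = normalize(system_disk.disk_id)
    return [d.number for d in attached if normalize(d.disk_id) == wanted]
```

For example, with a system disk `DiskId(0, "0x1A2B3C4D")` and an attached disk `DiskId(1, "1a2b3c4d")` (made-up values), `colliding_disks` returns `[1]`. This is the kind of pair that would be expected when the backed up VM and the proxy come from the same template or clone.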

So, after understanding the issue and discussing possible workarounds with Microsoft, we've made the following changes to v10.

First, we went back to using the automount disable setting on our backup proxies, instead of the less respected SANPolicy (although we did keep this one as well, just in case). This automount setting ensures that volumes from the disk will never be mounted to the OS, even if the newly attached disk goes Online due to the peculiarity described above. The only time volumes will still be mounted is when mount points for these volumes already exist from earlier.

Which is why in addition, starting from v10 we automatically remove all stored mount points for disks that are not actually present in the system. This is done at the time when our data mover component is installed (or updated) on any managed server. This is an important change to keep in mind, because it may impact backup to rotated drives by cleaning up the corresponding mount points, if the disk is not present in the system during our components installation or upgrade.

Across these two changes, we are fully covered against the issue: we clean up all existing "dead" mount points when the backup proxy is added (or updated to v10), and we disable the automated creation of new mount points from that point on.
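
The mount point cleanup can be pictured with a short Python sketch. This is only a model of the rule "remove stored mount points for disks that are not present", not the data mover's actual code; the function name and the volume identifiers are hypothetical placeholders.

```python
def stale_mount_points(stored, present_volumes):
    """Return stored mount points whose volume is not currently present.

    stored: dict mapping a mount point (e.g. "E:\\") to a volume identifier.
    present_volumes: set of volume identifiers visible on the system right now.
    """
    return {
        mount: volume
        for mount, volume in stored.items()
        if volume not in present_volumes
    }


# Illustrative data: a rotated drive that is unplugged during the upgrade
# looks exactly like a leftover mount point from an old hot add.
stored = {
    "C:\\": "vol-system",
    "E:\\": "vol-rotated-usb",
    "F:\\": "vol-old-hotadd",
}
to_remove = stale_mount_points(stored, present_volumes={"vol-system"})
```

Here `to_remove` contains both `E:\` and `F:\`, which is why the rotated drives caveat above matters: the rule cannot tell an unplugged backup drive from a dead hot add mount point.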

Post by mkretzer »

Gostev,
great analysis - we are so glad you found the issue. I am just hoping the Linux proxy which we will start to use will not have similar issues.

As always, it was great working with you :-)

Post by Gostev »

Right, this logic is certainly specific to the Windows OS, which in this case tries to "help" an inexperienced user by ensuring critical disk availability. As you know, Linux has a different paradigm, and never does anything on behalf of a user automatically "for the greater good" :D

Post by mkretzer »

The issue is back - in the latest V10 version, with automount disabled and SAN policy set to offline shared.
Also with freshly set up W2016 proxies which were not cloned.

Case 04765826!