Nick-SAC
Enthusiast
Posts: 74
Liked: 15 times
Joined: Oct 27, 2017 5:42 pm
Full Name: Nick

ReFS: The File Has Been Removed From The File System Namespace

Post by Nick-SAC »

Subtitle: My ReFS MESS
or ReFS: It’s really neat... until it ISN’T!

I haven’t opened a support case on this because I’m not concerned about losing the data in question, and I think this is more of a ReFS issue/problem than a VBR one. In this case, though, that seems to be a distinction without a difference, and this is an issue that should concern everyone using a ReFS Backup Repository!


The VBR Backup Server/System:
Windows Server 2016 [Version 1607] [OS Build 14393.3866]
Primary Backup Repository: Internal SATA HDD ReFS
Secondary Backup Repositories: External USB HDDs ReFS


The Initial Problem:
As shown by these two consecutive Windows Event Log entries:

------------------------------------------------------------------------------------------------------------------------------------
Log Name: System
Source: Microsoft-Windows-ReFS
Date: 8/1/2020 2:14:27 AM
Event ID: 133
Level: Error
Description:
The file system detected a checksum error and was not able to correct it.
The name of the file or folder is <Filepath\MadeUpFileName>.VBK
------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------
Log Name: System
Source: Microsoft-Windows-ReFS
Date: 8/1/2020 2:14:29 AM
Event ID: 513
Level: Warning
Description:
The file system detected a corruption on a file. The file has been removed from the file system namespace.
The name of the file is <Filepath\MadeUpFileName>.VBK
------------------------------------------------------------------------------------------------------------------------------------

The Subsequent Problem:
As seen in the VBR Backup Copy Job logs:

8/1/2020 2:46:30 AM :: Failed to merge full backup file Error: The system cannot find the file specified.
Failed to get attributes for file [<Filepath\MadeUpFileName>.VBK]
Agent failed to process method {ReFs.IsIntegrityStreamSame}.

8/1/2020 2:46:42 AM :: Failed to generate points Error: The system cannot find the file specified.
Failed to get attributes for file [<Filepath\MadeUpFileName>.VBK]
Agent failed to process method {ReFs.IsIntegrityStreamSame}.


Now, I’m pretty confident that the initial problem (checksum error/corruption) was hardware-related and specific to the external USB HDD, since this same sequence occurred 3 times over several weeks, with different Backup Copy Jobs and files, all on that single external USB HDD and only on it. I can replace that HDD, and I have other good Backup Copy Jobs, so I’m not concerned about losing the data or my GFS copies, etc.

What I am concerned about is how to fix or mitigate this issue and how to prevent it from recurring, especially on a system that doesn’t have multiple/duplicate/redundant GFS backup copies.

So,

1) Is it normal for ReFS to simply Remove a File that it deems to be corrupted?

2) How does one protect against this issue when the potentially Removed File is a VBR Backup File!?

3) Is there some way to ‘reset’ that Backup Copy Job or its catalog of files?

FWIW, I tried running an Active Full on the Backup Copy Job, and while that job completed successfully, subsequent runs of the Backup Copy Job keep failing with the same missing-file errors:
Failed to merge full backup file Error: The system cannot find the file specified. ...
Failed to generate points Error: The system cannot find the file specified. ...


Thanks,
Nick
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by soncscy » 2 people like this post

Hey Nick,

133 is kind of a ReFS boogeyman in my experience, and Microsoft's answer is to use Mirror-Accelerated Parity. It's expensive storage-wise, but as I understand it, you basically get RAID-like redundancy along with the benefits of block cloning. With parity behind it, ReFS should try to self-correct; without it, ReFS can only alert you to corruption on the volume with these events, essentially warning you "Hey, you've got corruption on your volume".
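To make the self-correction point concrete, here is a deliberately simplified Python model of a checksummed read. It is an illustration under stated assumptions, not how ReFS is implemented: it uses CRC32 and a plain two-way mirror (parity rebuilds data differently but plays the same role here), and the `Volume` class and its methods are hypothetical. With a second good copy, a block that fails its checksum is repaired; with a single copy, the model records an event 133 followed by an event 513 and drops the file, matching the two log entries quoted at the top of the thread.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy checksummed volume (hypothetical); not ReFS's real on-disk design.

    copies=1 models a single disk; copies=2 models a two-way mirror.
    """
    copies: int = 1
    files: dict = field(default_factory=dict)   # name -> per-block lists of [data, crc]
    events: list = field(default_factory=list)  # (event_id, name) pairs, named after ReFS IDs

    def write(self, name, blocks):
        # Store every block `copies` times, each copy with its own checksum.
        self.files[name] = [
            [[data, zlib.crc32(data)] for _ in range(self.copies)] for data in blocks
        ]

    def corrupt(self, name, block, copy=0):
        # Simulate bit rot in one copy of one block; the stored checksum stays stale.
        self.files[name][block][copy][0] = b"\x00bitrot"

    def read(self, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        out = []
        for copies in self.files[name]:
            good = [c for c in copies if zlib.crc32(c[0]) == c[1]]
            if not good:
                # No valid copy left: log the checksum error, then drop the file from the namespace.
                self.events.append((133, name))
                del self.files[name]
                self.events.append((513, name))
                raise FileNotFoundError(name)
            data, crc = good[0]
            for c in copies:
                c[0], c[1] = data, crc  # self-heal: overwrite any bad copy with the good one
            out.append(data)
        return b"".join(out)
```

With `copies=2` the same corruption is repaired during the read and `events` stays empty; with `copies=1` the whole file disappears even though only one block was damaged.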

For my clients that use ReFS, we monitor for the namespace-removal events so we know ASAP if there is corruption. There is refsutil.exe, which might be able to recover data, but otherwise the only real answer is to use some form of disk parity to protect against corruption on an individual volume. I guess this is the cost of block cloning: lose the source block, and you lose the subsequent files that reference it.
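Harvey doesn't describe his monitoring setup. As one hypothetical approach, this Python sketch scans event records exported as text in the layout quoted at the top of this thread and reports ReFS events 133 and 513 together with the affected file name. On Windows, a command along the lines of `wevtutil qe System /q:"*[System[Provider[@Name='Microsoft-Windows-ReFS'] and (EventID=133 or EventID=513)]]" /f:text` should produce similar text, but the exact field layout (including the `Event[N]:` record headers skipped here) is an assumption, and `parse_records` and `refs_alerts` are illustrative names.

```python
import re

# The two ReFS event IDs discussed in this thread.
REFS_EVENTS = {133: "uncorrectable checksum error", 513: "file removed from namespace"}
RECORD_HEADER = re.compile(r"Event\[\d+\]:")  # assumed wevtutil record header; skipped


def parse_records(text):
    """Split exported event text into dicts of fields; lines after 'Description:' form the description."""
    records, current, in_desc = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Log Name:"):
            current, in_desc = {"Description": []}, False
            records.append(current)
        # Skip text before the first record, blank lines, dashed separators and record headers.
        if current is None or not line or set(line) == {"-"} or RECORD_HEADER.fullmatch(line):
            continue
        if in_desc:
            current["Description"].append(line)
            continue
        key, sep, value = line.partition(":")
        if key == "Description":
            in_desc = True
            if value.strip():
                current["Description"].append(value.strip())
        elif sep:
            current[key.strip()] = value.strip()
    return records


def refs_alerts(text):
    """Return (event_id, meaning, file_name) for each ReFS 133/513 record in the text."""
    alerts = []
    for rec in parse_records(text):
        if rec.get("Source") != "Microsoft-Windows-ReFS":
            continue
        event_id = int(rec.get("Event ID", "0"))
        if event_id not in REFS_EVENTS:
            continue
        desc = " ".join(rec["Description"])
        match = re.search(r"The name of the file(?: or folder)? is (.+)$", desc)
        alerts.append((event_id, REFS_EVENTS[event_id], match.group(1) if match else None))
    return alerts
```

Any non-empty result from `refs_alerts` is worth an immediate alert, since a 513 means the named file is already gone from the volume.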

I can't comment on the last one :/ Maybe it's still trying to find the old backup file that ReFS removed, and you just need Veeam to "forget" it?
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by PetrM »

Hi Nick,

1) I would say that it's expected, at least based on the error message in the Windows event log. It's probably worth asking the same question on the Microsoft Community forums as well.

2) You can keep a copy of the backup files on other media: for example, set up a SOBR and copy backups to object storage as soon as they appear on the primary storage. You can also archive backup files to tape using a file-to-tape job.

3) Maybe the Remove from Configuration option is what you're looking for? It removes the records about the backup from the Veeam configuration database.
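Petr's point 2 is about keeping an independent copy on other media. Veeam's SOBR copy mode and file-to-tape jobs do this inside the product; purely to illustrate the underlying idea, here is a tool-agnostic Python sketch that copies one backup file to a second location and confirms with SHA-256 that the copy is bit-identical. The paths and function names are hypothetical, and this is not how Veeam copies or verifies backups.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large backup files are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dest_dir):
    """Copy one backup file into dest_dir and confirm the copy matches the source."""
    src = Path(src)
    dest = Path(dest_dir) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copies data plus timestamps
    if sha256_of(src) != sha256_of(dest):
        dest.unlink()  # never keep a copy that doesn't match
        raise OSError(f"verification failed for {dest}")
    return dest
```

A matching hash only proves the copy equals the source at copy time: if the source is already corrupted, so is the copy, which is why the copy should be made as soon as the backup is written.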

I'd also recommend running periodic health checks on the backup files.

Thanks!
Nick-SAC
Enthusiast
Posts: 74
Liked: 15 times
Joined: Oct 27, 2017 5:42 pm
Full Name: Nick

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by Nick-SAC »

Hey Harvey (aka SONCSCY)

Thanks for the input!

If I read it right, it appears that Mirror-Accelerated Parity is only available with Storage Spaces Direct, and that’s only in the Datacenter Server editions... and, as you said, that would be pretty expensive, prohibitively so for my current clientele.

It doesn’t look like ReFSutil would help here.

Besides the fact that I’m not a fan of software-based RAID (which Storage Spaces Direct with Mirror-Accelerated Parity essentially appears to be), I’m not even sure that any disk-parity solution would protect against this type of ReFS issue, as seen in this post (which IMO is well worth reading in its entirety):

================================->snip<-==============================
Recovery after ReFS events 133 + 513 (apparent data loss on dual parity)

I have a single-node Windows server 2016 with a dual parity storage space, on which a bitlockered ReFS volume resides with enabled file integrity. This ReFS volume hosted/contained a ~17TB vhdx file with archive data since its setup half a year ago. This file has now suddenly been removed by ReFS! ...

... this is (or was) our backup server... I purposely chose a dual parity storage space and ReFS for data reliability. I still can't believe that a checksum corruption in probably only a few or a single block of a ~17TB large vhdx file can effectively delete it entirely, without any supported way for admins to read-access it again! That would be totally counter-intuitive to the design purpose of ReFS! ...

from https://social.technet.microsoft.com/Fo ... erverfiles
================================->snip<-==============================

Thanks again,
Nick
Nick-SAC
Enthusiast
Posts: 74
Liked: 15 times
Joined: Oct 27, 2017 5:42 pm
Full Name: Nick
Contact:

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by Nick-SAC »

Hey Petr,

Other than confirmations of this apparently inherent issue/problem with ReFS, I haven’t been able to find anything of value on the Microsoft sites & forums.

Pardon my ignorance, but what is “SOBR”? (The most notable reference I can find with Google is the Russian Spetsnaz “Special Rapid Response Unit”, LOL.)

These are already Backup Copy Jobs to alternate media... so having to make copies of the Backup Copy files to yet other media seems like it would be a bit much...

I routinely do VBR Periodic Health Checks and they all passed before AND AFTER the missing VBK files were ‘removed’ by ReFS... so it would seem that the Periodic Health Checks are only concerned with those VBK files that actually do exist and not any others that should – but don’t – exist?!
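One way to cover the gap Nick describes is a separate existence check: compare the backup files you expect to find against what is actually in the repository folder. A minimal Python sketch, assuming you keep your own list of expected file names (for example, one recorded after each successful job run); `missing_backup_files` and the file names are hypothetical and not part of VBR.

```python
from pathlib import Path


def missing_backup_files(expected_names, repo_dir):
    """Return the expected backup files that are no longer present in repo_dir."""
    # Windows file names are case-insensitive, so compare in lower case.
    present = {p.name.lower() for p in Path(repo_dir).iterdir() if p.is_file()}
    return sorted(n for n in expected_names if n.lower() not in present)
```

Unlike a content check of the latest restore point, this catches a file that ReFS removed from the namespace even if no current restore point reads it.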

If it comes to it, I will blow away the Backups on this particular HDD (as they are expendable) and start it over, but my real concern is how to ‘fix it’ if it ever becomes necessary to do so. Specifically, I don’t know what the Removed/Missing VBK file contained and, more to the point: if I do a ‘Remove from Configuration’ and then Import those files which still do exist, will the ‘Backup Chain’ be otherwise functional? That is, presumably with only the ‘missing file’ Restore Points missing, but with the other Restore Points (before and after the missing file’s date) still restorable?

Thanks again,
Nick
Mildur
Product Manager
Posts: 8549
Liked: 2223 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by Mildur » 1 person likes this post

SOBR = Scale-Out Backup Repository
Product Management Analyst @ Veeam Software
YouGotServered
Service Provider
Posts: 170
Liked: 51 times
Joined: Mar 11, 2016 7:41 pm
Full Name: Cory Wallace

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by YouGotServered » 1 person likes this post

Personally, I like the Spetsnaz meaning of SOBR more, but I suppose that I wouldn't want the Spetsnaz to have my data :)
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by soncscy » 1 person likes this post

@YouGotServered, Would be a very secure offsite copy though ;)

>I routinely do VBR Periodic Health Checks and they all passed before AND AFTER the missing VBK files were ‘removed’ by ReFS... so it would seem that the Periodic Health Checks are only concerned with those VBK files that actually do exist and not any others that should – but don’t – exist?!

I understand the concern here, Nick, but as I understand it, the health check checks existing backups. That's no mystery to me: if some backup that isn't needed for a specific restore point is missing, there's no harm, right? It's hard to say with just the information here (maybe support can say more?), but frankly, for me this is just a day at the races with ReFS. Without parity to back it, it's not really possible to avoid such things, and it's just a matter of time until there's an issue.

Having multiple redundant copies is __really__ the only way around such an issue, either at the storage level with RAID/mirror parity or with native application tooling.
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: ReFS: The File Has Been Removed From The File System Namespace

Post by PetrM »

Hello,

Basically, I agree with Harvey, and this is what I also recommend: store multiple copies on different media.

Nick-SAC wrote: the Periodic Health Checks are only concerned with those VBK files that actually do exist and not any others that should – but don’t – exist?!
The health check verifies only the latest restore point in the backup chain: either the full backup file alone, or the full backup file plus the related incremental backup files. The workflow is described on this page.
My only explanation is that the latest point in the chain was fine when the health check ran, which is why it passed.
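Petr's description can be modelled for a simple forward-incremental chain: a check of the latest restore point reads the newest full backup file plus every incremental created after it, so an older file outside that set is never opened. A hypothetical Python sketch; the `(file_name, kind)` layout and the file names are assumptions, and real chains (reverse incremental, synthetic or GFS fulls) are more involved.

```python
def files_for_latest_point(chain):
    """Files that a check of the latest restore point reads in a forward-incremental chain.

    chain: chronological list of (file_name, kind), where kind is "full" or "incremental".
    """
    full_positions = [i for i, (_, kind) in enumerate(chain) if kind == "full"]
    if not full_positions:
        raise ValueError("chain has no full backup file")
    # The newest full plus every incremental after it; older files are never touched.
    return [name for name, _ in chain[full_positions[-1]:]]
```

For a chain of an older monthly full, the current full and two incrementals, only the last three files are returned, so a missing older full would not make this check fail.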

Nick-SAC wrote: my real concern is how to ‘fix it’ if it ever becomes necessary to do so
I don't think such a fix exists; we can only take preventive actions. However, I'd try testing the jobs with another HDD; maybe the issue won't recur?

Nick-SAC wrote: if I do a ‘Remove from Configuration’ and then Import those files which still do exist, will the ‘Backup Chain’ be otherwise functional
You cannot import the remaining VIB files without the initial VBK; the import itself will fail in this case, because incremental data without its initial seed is useless.
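The same kind of model shows why VIB files are useless without their VBK: in a forward-incremental chain, a restore point is restorable only if its full backup file and every incremental between that full and the point are present. A hypothetical Python sketch under that assumption; it does not describe how Veeam's import actually validates files.

```python
def restorable_points(chain, missing):
    """Return the restore points (file names) that can still be restored.

    chain: chronological list of (file_name, kind), kind "full" or "incremental".
    missing: file names that no longer exist.
    """
    missing = set(missing)
    restorable, base_intact = [], False
    for name, kind in chain:
        if kind == "full":
            # A full restores on its own and starts a new dependency chain.
            base_intact = name not in missing
        else:
            # An incremental needs everything since its full to be intact.
            base_intact = base_intact and name not in missing
        if base_intact:
            restorable.append(name)
    return restorable
```

Losing the first full makes every incremental after it unrestorable until the next full, while losing one incremental only cuts off the points that come after it in the same chain.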

Thanks!