kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

@nmdange, cool ok so you're running 14393.1198 then ..

@thomas.raabo You're running the experimental fix from MSFT, "14393.1100"?
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

I've got 14393.1198 on my main backup VM and 14393.1100 (experimental) on my backup copy vm.

I haven't re-enabled synthetic fulls on my main backup VM yet, as I haven't heard whether it includes the fixes from the experimental driver. One note: the registry keys added when I set up the experimental fix are not present in the current 1198 driver.
andersgustaf

Re: REFS 4k horror story

Post by andersgustaf »

Hi,

I have the same problem, but I'm not using Veeam. I'm using DPM :)

I also got the refs.sys file from an MS technician, but my issue is that when I replace the existing refs.sys, the disk is reported as RAW and cannot be read.

Is there any special order/routine for changing the file that my technician has not mentioned?


Best regards,
Anders
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Welcome to Veeam land. What happens when you put back the refs.sys file you had originally? Don't quote me on this, but when Windows 2016 was released, a ReFS volume formatted then wasn't the same version we have now. I could be totally wrong here, but I've read that somewhere. Maybe that's the issue? Version 3.0 vs 3.1.
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: REFS 4k horror story

Post by lepphce1 »

An update from me. I installed the beta driver provided by Veeam 14393.1100 with the associated registry entries. I continue to have problems with BSOD/reboots. In my case it was necessary to delete a corrupt backup chain (caused by this very issue), which caused the server to go into a series of reboots. This is the same behavior as the official production driver. Hopefully there will be an updated driver soon.
thomas.raabo
Service Provider
Posts: 28
Liked: 11 times
Joined: Oct 31, 2016 6:27 pm
Full Name: Thomas Raabo
Location: infrastructure guy
Contact:

Re: REFS 4k horror story

Post by thomas.raabo »

kubimike wrote:@nmdange, cool ok so you're running 14393.1198 then ..

@thomas.raabo You're running the experimental fix from MSFT, "14393.1100"?
correct.
thomas.raabo
Service Provider
Posts: 28
Liked: 11 times
Joined: Oct 31, 2016 6:27 pm
Full Name: Thomas Raabo
Location: infrastructure guy
Contact:

Re: REFS 4k horror story

Post by thomas.raabo »

andersgustaf wrote:Hi,

I have the same problem, but I'm not using Veeam. I'm using DPM :)

I also got the refs.sys file from an MS technician, but my issue is that when I replace the existing refs.sys, the disk is reported as RAW and cannot be read.

Is there any special order/routine for changing the file that my technician has not mentioned?


Best regards,
Anders

Did you put your system into test signing mode?
andersgustaf

Re: REFS 4k horror story

Post by andersgustaf »

thomas.raabo wrote:
andersgustaf wrote:Hi,

I have the same problem, but I'm not using Veeam. I'm using DPM :)

I also got the refs.sys file from an MS technician, but my issue is that when I replace the existing refs.sys, the disk is reported as RAW and cannot be read.

Is there any special order/routine for changing the file that my technician has not mentioned?

Best regards,
Anders

Did you put your system into test signing mode?
No, I got no instructions to do so.
But I tried that on a test VM that I created, and after a reboot it felt like the server became 100 times faster and I was able to access the disk.
I have asked the technician for a new file, but I guess that won't help since the file isn't signed. Also, if you check the properties of refs.sys you will see a "certificate error", and that is because it isn't signed (duh).

Thanks for the help!

By the way, I opened my case with MS in late April and it took a while to get here... :)

//Anders
thomas.raabo
Service Provider
Posts: 28
Liked: 11 times
Joined: Oct 31, 2016 6:27 pm
Full Name: Thomas Raabo
Location: infrastructure guy
Contact:

Re: REFS 4k horror story

Post by thomas.raabo »

To apply the hotfix, please use the password “Microsoft” to unzip the folder.

Afterwards, please rename the original "refs.sys" in C:\Windows\System32\drivers (for example, to refs.sys_original), then copy the contents of the archive into the same folder.

After copying the new refs.sys, please execute this command:
bcdedit /set testsigning on

Then please create the following registry keys on the server in question:

- RefsDisableCachedPins (DWORD) = 1
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

- RefsProcessedDeleteQueueEntryCountThreshold (DWORD) = 2048 in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
First of all let's give it a try with 2048, then we will change it to 1024 and 512 after if needed.

Also, let's increase the following timeout:

TimeOutValue (DWORD)
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk
Please set it to 120 (decimal value).

Please note that these changes will require a server restart.
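
For anyone who would rather script the registry part than click through regedit, here is a rough Python sketch that renders the three values listed above as .reg file text. The key paths, value names and numbers are taken from the instructions above. The function name `render_reg` and the idea of using a .reg file at all are assumptions for illustration and are not part of the hotfix package. A .reg file writes DWORDs as 8-digit hex, so the decimal numbers are converted on output.

```python
# Sketch only: renders the values from the instructions above as .reg text.
# SETTINGS and render_reg are hypothetical names used for illustration.
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem": {
        "RefsDisableCachedPins": 1,
        "RefsProcessedDeleteQueueEntryCountThreshold": 2048,  # decimal
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk": {
        "TimeOutValue": 120,  # decimal
    },
}


def render_reg(settings):
    """Return .reg file text; DWORD data is written as 8 hex digits."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reg(SETTINGS))
```

Review the generated text before importing it, and keep in mind that the threshold may need to be lowered to 1024 or 512 later, as mentioned above.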
andy51585
Novice
Posts: 8
Liked: never
Joined: Jul 28, 2014 9:13 pm
Full Name: Andrew
Contact:

Re: REFS 4k horror story

Post by andy51585 »

For those using the new ReFS driver released in the latest cumulative update: are you still using the registry tweaks originally proposed?

The version of my installed refs.sys is 10.0.14393.1198 and I'm still having issues with near-daily server lockups with 4K. Unfortunately I don't have enough space at this time to move things around, so I have to make this work somehow.

I'm currently using
RefsEnableInlineTrim = 1
RefsEnableLargeWorkingSetTrim = 1
RefsNumberOfChunksToTrim = 128 (decimal)

On the Veeam server, I have RefsVirtualSyntheticDisabled set to 1. I don't even care about the block clone stuff at this time. I just need backups that run reliably for a couple more months, until one of our old NAS boxes has enough space on it to move stuff around and reformat with 64K.

I've even gone to running weekly active fulls instead of synthetics as a test, and still have the issues.

The server hosting the iSCSI repository has 40GB of RAM and 4 vCPUs.

Any advice would be great.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS 4k horror story

Post by tsightler »

andy51585 wrote:The version of my installed refs.sys is 10.0.14393.1198 and I'm still having issues with near-daily server lockups with 4K. Unfortunately I don't have enough space at this time to move things around, so I have to make this work somehow.
I don't think I've seen anyone in this thread have success with 4K, regardless of hotfixes or anything else. The thread title is probably a misnomer at this point, because most of the discussion for the last month or two has been about people who were still having various issues even after moving to 64K. I'm sure at least some of the issues are the same, but the problem with 4K is that it needs a lot more memory to survive.
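
One way to picture why 4K might be hungrier than 64K is to count the clusters the file system has to track. The Python sketch below does only that arithmetic. The 20 TiB volume size is a made-up example, and the assumption that metadata (and so memory pressure) grows with cluster count is a simplification that nothing in this thread confirms.

```python
# Sketch only: compares how many clusters a volume has at 4K vs 64K.
# The volume size is a hypothetical example, not a figure from the thread.
KIB = 1024
TIB = 1024 ** 4


def cluster_count(volume_bytes, cluster_bytes):
    """Number of clusters needed to cover the volume (rounded up)."""
    return -(-volume_bytes // cluster_bytes)


if __name__ == "__main__":
    volume = 20 * TIB  # hypothetical repository size
    small = cluster_count(volume, 4 * KIB)
    large = cluster_count(volume, 64 * KIB)
    print(f"4K clusters:  {small:,}")
    print(f"64K clusters: {large:,}")
    print(f"ratio: {small // large}x")
```

For the same volume, 4K formatting always yields 16 times as many clusters as 64K, whatever the volume size.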
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: REFS 4k horror story

Post by lepphce1 »

I'm still on 4K and memory has never been an issue. Unfortunately I don't have the storage to reformat to 64K. But my server does crash for days, with low resource utilization, when I do a mass delete. Seems there are a variety of symptoms and I wouldn't be surprised if there are multiple ReFS issues that are being addressed here.

It's really cool technology and I hope they work it out at some point, but right now I'm actively shopping for disk shelves so I can end this nightmare.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

@lepphce1 Hello, I've been doing backup copies to another source in preparation for using the new experimental refs.sys driver. I've had great success with a USB 3.0 Western Digital DUO 16 TB. See screengrab below, those are real results!
Image

Also, I ordered a QNAP 4-bay NAS. It's the TVS-471 with four 10TB drives. Going to benchmark that guy too.

Depending on how much data you have those are great cheap solutions to tide you over. Hope this info helps you out! :D
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: REFS 4k horror story

Post by lepphce1 »

@kubimike, thanks for the suggestions. I'm struggling with my backup copies and GFS retention, so those solutions are still a bit small for my situation... I'm looking at a Dell MD1200 with 12 10TB disks-- it should keep me going for a few years, and after discounts I'm hoping to pay less than $10,000.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

I see. Yeah, I have a lot more data, but I'm only doing backup copies of critical data. If my Veeam box blows up again, at least I have my SQL data and Exchange. It just so happens I can fit those two VMs on the 16TB box. And I just ordered a new badass HP LTO7 MSL2024! :twisted:
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

tsightler wrote: I don't think I've seen anyone in this thread have success with 4K, regardless of hotfixes or anything else. The thread title is probably a misnomer at this point, because most of the discussion for the last month or two has been about people who were still having various issues even after moving to 64K. I'm sure at least some of the issues are the same, but the problem with 4K is that it needs a lot more memory to survive.
Well good/bad news then. I'm actually still running 4K. I've been put in a situation where I don't have enough disks to migrate my data off, and at the same time, have too much data to just say eff it and start over! So I'm stuck!

The good news is I'm actually completely free of crashes or lockups and have been running for a couple months now without issue. But there are two caveats:

1. I cannot run synthetic full backups in my backup jobs. This causes the backup VM to lock up via RAM usage.
2. I cannot run backup copy jobs that are over 2TB. They have a synthetic full component that I can't disable. This causes the backup copy VM to lock up via CPU usage.

I'm running the experimental refs.sys on my backup copy vm. However it doesn't appear to have any effect. I'm happy to pull any logs for Veeam or Microsoft :)
alesovodvojce
Enthusiast
Posts: 61
Liked: 9 times
Joined: Nov 29, 2016 10:09 pm
Contact:

Re: REFS 4k horror story

Post by alesovodvojce »

The system hung today after 7 days of running with the experimental refs.sys driver.
- ReFS 4K used
- experimental driver with testsigning on (and everything else from the checklist Veeam provided for that driver)

We also put the system under a bit of memory stress by giving the VM 12GB of RAM. That is more than sufficient under normal circumstances, but ReFS 4K seems to crash more easily in low-memory configurations, and we wanted to know if the problem is solved. It's not.

Will reboot and try again...
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS 4k horror story

Post by dellock6 »

I really feel that, for now, the issue will not be truly solved until you all move to a 64K block size. Continuing to run the 4K configuration is the easiest way to have issues. If you don't have spare space to evacuate and reformat, you can rent storage servers for the time needed; I've seen several customers solve this problem this way.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
kb1ibt
Influencer
Posts: 14
Liked: never
Joined: Apr 24, 2015 1:40 pm
Contact:

Re: REFS 4k horror story

Post by kb1ibt »

dellock6 wrote:If you don't have spare space to evacuate and reformat, you can rent storage servers for the time needed; I've seen several customers solve this problem this way.
You seem to be forgetting that when you move the files from one volume to another you lose the block clone savings, so in my case I would be losing 50TB due to the move (20TB used on disk vs 70TB file size).
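
The arithmetic behind that 50TB figure, as a small Python sketch. The function name is made up for illustration, and it assumes a plain file copy rehydrates every cloned block, which is the scenario described above.

```python
# Sketch only: space that block cloning is currently saving on a volume.
# block_clone_savings is a hypothetical helper, not a Veeam or Windows API.
def block_clone_savings(logical_tb, on_disk_tb):
    """Return (TB saved by block cloning, logical-to-physical ratio)."""
    saved = logical_tb - on_disk_tb
    ratio = logical_tb / on_disk_tb
    return saved, ratio


if __name__ == "__main__":
    saved, ratio = block_clone_savings(logical_tb=70, on_disk_tb=20)
    print(f"Extra space needed after a plain copy: {saved} TB ({ratio}x on disk)")
```

In other words, the target volume would need room for the full 70TB of file size, not the 20TB currently used.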
lohelle
Service Provider
Posts: 77
Liked: 15 times
Joined: Jun 03, 2009 7:45 am
Full Name: Lars O Helle
Contact:

Re: REFS 4k horror story

Post by lohelle » 2 people like this post

I think a great option would be a "backup-copy-job repository-copy-tool". Maybe with a nice GUI for selecting restore points to move/copy. Then it would be like a backup copy job that could use a BCJ-repository as the source. But it should copy ALL (selected) restore points, not only the latest. And of course it needs to be able to use the REFS features. :)
alesovodvojce
Enthusiast
Posts: 61
Liked: 9 times
Joined: Nov 29, 2016 10:09 pm
Contact:

Re: REFS 4k horror story

Post by alesovodvojce » 1 person likes this post

Moved to ReFS 64K and it hung 3 days ago. The VM intentionally had lower memory (12 GB) to see if the ReFS driver issue was fixed for 64K. Nope, just less common. Same hang symptoms (and the same metafile RAM greediness).

We have now doubled the RAM and are waiting. Hopefully it will be OK.
kb1ibt
Influencer
Posts: 14
Liked: never
Joined: Apr 24, 2015 1:40 pm
Contact:

Re: REFS 4k horror story

Post by kb1ibt »

Which type of hang? 100% CPU or something else?
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

thomas.raabo wrote:To apply the hotfix, please use the password “Microsoft” to unzip the folder.

Afterwards, please rename the original "refs.sys" in C:\Windows\System32\drivers (for example, to refs.sys_original), then copy the contents of the archive into the same folder.

After copying the new refs.sys, please execute this command:
bcdedit /set testsigning on

Then please create the following registry keys on the server in question:

- RefsDisableCachedPins (DWORD) = 1
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

- RefsProcessedDeleteQueueEntryCountThreshold (DWORD) = 2048 in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
First of all let's give it a try with 2048, then we will change it to 1024 and 512 after if needed.

Also, let's increase the following timeout:

TimeOutValue (DWORD)
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk
Please set it to 120 (decimal value).

Please note that these changes will require a server restart.
I'd like to point out that these instructions are a bit misleading.
1st, run the bcdedit command in Windows before rebooting; you can't do it in troubleshooting mode.
2nd, for those who say the test ReFS driver from Microsoft isn't working: did you pay attention to the fact that some of these keys are set in decimal? As you can see above, Thomas forgot to mention that 'RefsProcessedDeleteQueueEntryCountThreshold' is decimal. When you create the new key, regedit does not have the Decimal radio button selected by default.
3rd, was anyone advised to turn off ODX in Windows?

I have the driver loaded now. Fingers crossed I can delete files now! 8)
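
To show what the decimal point above means in practice, here is a small Python sketch of what ends up stored if you type the digits while regedit's DWORD dialog is still on its default Hexadecimal setting. The helper name `stored_value` is made up for illustration.

```python
# Sketch only: what a DWORD ends up holding depending on the radio button
# selected in regedit when the digits are typed. stored_value is a
# hypothetical helper, not a Windows API.
def stored_value(typed, base):
    """Interpret the typed digits as regedit would for the chosen base."""
    radix = {"hex": 16, "dec": 10}[base]
    return int(typed, radix)


if __name__ == "__main__":
    for typed in ("1", "120", "2048"):
        as_dec = stored_value(typed, "dec")
        as_hex = stored_value(typed, "hex")
        note = "same" if as_dec == as_hex else "DIFFERENT"
        print(f"typed {typed!r}: decimal={as_dec} hex={as_hex} ({note})")
```

Typing 2048 with Hexadecimal selected stores 8264, and typing 120 stores 288, so the driver would see values quite different from the intended ones. Single digits such as 1 come out the same either way.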
kb1ibt
Influencer
Posts: 14
Liked: never
Joined: Apr 24, 2015 1:40 pm
Contact:

Re: REFS 4k horror story

Post by kb1ibt »

kubimike wrote: I'd like to point out that these instructions are a bit misleading.
1st, run the bcdedit command in Windows before rebooting; you can't do it in troubleshooting mode.
2nd, for those who say the test ReFS driver from Microsoft isn't working: did you pay attention to the fact that some of these keys are set in decimal? As you can see above, Thomas forgot to mention that 'RefsProcessedDeleteQueueEntryCountThreshold' is decimal. When you create the new key, regedit does not have the Decimal radio button selected by default.
3rd, was anyone advised to turn off ODX in Windows?

I have the driver loaded now. Fingers crossed I can delete files now! 8)
I still have the problem on 2 repos:
1) Without this set, when you reboot the system won't load refs.sys and the volume will show as RAW.
2) My instructions included telling me to use decimal.
3) Unless they changed their requirements: “Files must be on a volume formatted using NTFS. ReFS and FAT are not supported.“ So ODX isn't even something to worry about in this case.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Yeah, that's why I didn't disable ODX; I left it on. What about 'RefsDisableCachedPins'? I set that to '1'. Also, do you have any of the original keys set from Microsoft's public fix? I don't, just 'RefsProcessedDeleteQueueEntryCountThreshold' + 'RefsDisableCachedPins' + 'TimeOutValue'.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

GUYS!
I have some happy news to report! The test refs.sys from Microsoft WORKS. After my job ran, it hit its retention period. This was a job I'd always dreaded was going to run me out of space. Anyhow, it trimmed a 5TB VBK and so far (fingers crossed) the OS is still stable and running. Normally it would run for about 2 minutes and then completely freeze. This is so exciting, I can't tell you how awesome it is to have it JUST WORK. :mrgreen: :mrgreen: :mrgreen: :mrgreen: :mrgreen:

Now, I realize from chatting with Tom that it's not necessarily deleting a 5TB file but updating references. However, when reaching this stage the machine would choke. Thanks again Veeam, thanks again Microsoft, what a win for us. Let's just HOPE it keeps working! :twisted:
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

kubimike wrote:GUYS!
I have some happy news to report! The test refs.sys from Microsoft WORKS. After my job ran, it hit its retention period. This was a job I'd always dreaded was going to run me out of space. Anyhow, it trimmed a 5TB VBK and so far (fingers crossed) the OS is still stable and running. Normally it would run for about 2 minutes and then completely freeze. This is so exciting, I can't tell you how awesome it is to have it JUST WORK. :mrgreen: :mrgreen: :mrgreen: :mrgreen: :mrgreen:

Now, I realize from chatting with Tom that it's not necessarily deleting a 5TB file but updating references. However, when reaching this stage the machine would choke. Thanks again Veeam, thanks again Microsoft, what a win for us. Let's just HOPE it keeps working! :twisted:
Is this the same 14393.1100 refs.sys that they've been handing out? Or is there a new experimental refs driver?
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

'14393.1100' is the one I was given. I assume you're still having issues? Did you double-check that the registry values are in decimal? IIRC that only matters when the value is > 9. I posted the registry keys I'm using above.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

I wonder two things:

- Will the slow merge issue after something was deleted be solved?
- When will this update be released?
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

kubimike wrote:'14393.1100' is the one I was given. I assume you're still having issues? Did you double-check that the registry values are in decimal? IIRC that only matters when the value is > 9. I posted the registry keys I'm using above.
Yeah, I can confirm that I'm using decimal. My CPU is locking me out now instead of RAM. I haven't heard back in a while (they passed it to Microsoft, from what I understand).

I wonder whether lowering RefsProcessedDeleteQueueEntryCountThreshold from 2048 to 1024 would help? They mentioned lowering it, but never told me to do that when I reported that it wasn't working. I wonder if going up or down would help? Maybe it doesn't matter for this particular issue.