REFS 4k horror story

Re: REFS 4k horror story

by Cicadymn » Fri Jun 23, 2017 2:03 pm

tsightler wrote:I don't think I've seen anyone in this thread have success with 4K, regardless of hotfixes or anything else. The thread title is probably a misnomer at this point, because most of the discussions for the last month or two have been about people who were still having various issues even after moving to 64K. I'm sure at least some of the issues are the same, but the problem with 4K is that it needs a lot more memory to survive.


Well good/bad news then. I'm actually still running 4K. I've been put in a situation where I don't have enough disks to migrate my data off, and at the same time, have too much data to just say eff it and start over! So I'm stuck!

The good news is I'm actually completely free of crashes or lockups and have been running for a couple months now without issue. But there are two caveats:

1. I cannot run synthetic full backups in my backup jobs. Doing so locks up the backup VM through RAM usage.
2. I cannot run backup copy jobs larger than 2TB. They have a synthetic full component that I can't disable, and it locks up the backup copy VM through CPU usage.

I'm running the experimental refs.sys on my backup copy VM. However, it doesn't appear to have any effect. I'm happy to pull any logs for Veeam or Microsoft :)
Cicadymn
Influencer
 
Posts: 19
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS 4k horror story

by alesovodvojce » Fri Jun 23, 2017 2:49 pm

The system hung today after 7 days of running with the experimental refs.sys driver.
- ReFS 4K used
- experimental driver with testsigning on (and everything else from the checklist Veeam provided for that driver)

We also put the system under a bit of memory stress by giving the VM only 12GB of RAM. That is more than sufficient under normal circumstances,
but ReFS 4K seems to crash more easily in low-memory configurations, and we wanted to know whether the problem is solved. It's not.

Will reboot and try again...
alesovodvojce
Enthusiast
 
Posts: 27
Liked: 2 times
Joined: Tue Nov 29, 2016 10:09 pm

Re: REFS 4k horror story

by dellock6 » Sat Jun 24, 2017 8:47 am

I really feel that the issue, for now, will not be truly solved until you all move to a 64K block size. Continuing to run the 4K configuration is the easiest way to have issues. If you don't have spare space to evacuate and format, you can rent storage servers for the time needed; I've seen several customers solve this problem this way.
Luca Dell'Oca
EMEA Cloud Architect @ Veeam Software

@dellock6
http://www.virtualtothecore.com
vExpert 2011-2012-2013-2014-2015-2016
Veeam VMCE #1
dellock6
Veeam Software
 
Posts: 5055
Liked: 1335 times
Joined: Sun Jul 26, 2009 3:39 pm
Location: Varese, Italy
Full Name: Luca Dell'Oca

Re: REFS 4k horror story

by kb1ibt » Fri Jun 30, 2017 5:16 pm

dellock6 wrote:If you don't have spare space to evacuate and format, you can rent storage servers for the time needed; I've seen several customers solve this problem this way.

You seem to be forgetting that when you move the files from one volume to another you lose the block clone savings, so in my case I would be losing 50TB due to the move (20TB used on disk vs. 70TB total file size).
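
To spell out the arithmetic behind those numbers, here is a minimal sketch. It assumes that a plain file copy to another volume writes every file out in full, so nothing is shared through block cloning on the target. The function name `rehydration_cost` is made up for illustration.

```python
def rehydration_cost(logical_tb: float, on_disk_tb: float) -> tuple[float, float]:
    """Return (extra TB needed on the target, growth factor) for a plain copy.

    Assumption: the copy does not preserve block clone sharing, so the target
    has to hold the full logical size of every file.
    """
    extra_tb = logical_tb - on_disk_tb
    growth_factor = logical_tb / on_disk_tb
    return extra_tb, growth_factor


if __name__ == "__main__":
    # Figures from the post: 70TB of files occupying 20TB on disk.
    extra, factor = rehydration_cost(logical_tb=70, on_disk_tb=20)
    print(f"Extra space needed: {extra}TB, footprint grows {factor}x")
```

With the figures from the post, the target volume would need 50TB more than the source currently uses, which is 3.5 times the current footprint.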
kb1ibt
Influencer
 
Posts: 12
Liked: never
Joined: Fri Apr 24, 2015 1:40 pm

Re: REFS 4k horror story

by lohelle » Sat Jul 01, 2017 11:11 pm

I think a great option would be a "backup-copy-job repository-copy-tool". Maybe with a nice GUI for selecting restore points to move/copy. Then it would be like a backup copy job that could use a BCJ-repository as the source. But it should copy ALL (selected) restore points, not only the latest. And of course it needs to be able to use the REFS features. :)
lohelle
Service Provider
 
Posts: 77
Liked: 14 times
Joined: Wed Jun 03, 2009 7:45 am
Full Name: Lars O Helle

Re: REFS 4k horror story

by alesovodvojce » Tue Jul 04, 2017 1:55 pm

We moved to ReFS 64K and it hung 3 days ago. The VM intentionally had less memory (12GB) to see whether the ReFS driver issue was fixed for 64K. Nope, it's just less common. Same symptoms as our earlier hang (and the same metafile RAM greediness).

Now we've doubled the RAM and are waiting. Hopefully it will be OK.
alesovodvojce
Enthusiast
 
Posts: 27
Liked: 2 times
Joined: Tue Nov 29, 2016 10:09 pm

Re: REFS 4k horror story

by kb1ibt » Tue Jul 04, 2017 1:57 pm

Which type of hang? 100% CPU or something else?
kb1ibt
Influencer
 
Posts: 12
Liked: never
Joined: Fri Apr 24, 2015 1:40 pm

Re: REFS 4k horror story

by kubimike » Tue Jul 04, 2017 3:56 pm

thomas.raabo wrote:In order to apply the hotfix please use “Microsoft” password to unzip the folder.

Afterwards please rename the original "refs.sys" in C:\Windows\System32\drivers, for example to refs.sys_original, and then copy the contents of the archive into the same folder.

After copying the new refs.sys please execute this command:
bcdedit /set testsigning on

Then please create the following registry keys on the server in question:

- RefsDisableCachedPins (DWORD) = 1
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

- RefsProcessedDeleteQueueEntryCountThreshold (DWORD) = 2048 in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
First of all let's give it a try with 2048, then we will change it to 1024 and 512 after if needed.

Also, let's increase the following timeout:

TimeOutValue (DWORD)
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk
Please set it to 120 (decimal value).

Please note that these changes will require the server restart.
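
For anyone who would rather import these values than type them into regedit, here is a minimal sketch that builds the text of a .reg file from the three values listed above. The key paths, value names and numbers are the ones from the instructions; the `SETTINGS` layout and the `build_reg_file` helper are only illustrative assumptions, and saving and importing the file is left out. Note that DWORD data in a .reg file is written in hexadecimal, so the script converts from the decimal numbers given above.

```python
# Values from the instructions above, written as plain decimal numbers.
FILESYSTEM_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"
DISK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"

SETTINGS = {
    FILESYSTEM_KEY: {
        "RefsDisableCachedPins": 1,
        "RefsProcessedDeleteQueueEntryCountThreshold": 2048,
    },
    DISK_KEY: {
        "TimeOutValue": 120,
    },
}


def build_reg_file(settings: dict[str, dict[str, int]]) -> str:
    """Return .reg file text; each DWORD is emitted as 8 hexadecimal digits."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, decimal_value in values.items():
            # 2048 decimal becomes 00000800, 120 decimal becomes 00000078.
            lines.append(f'"{name}"=dword:{decimal_value:08x}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(SETTINGS))
```

Generating the file from decimal numbers keeps the intended values (1, 2048, 120) visible in one place and avoids converting to hex by hand.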


I'd like to point out that these instructions are a bit misleading.
1st, run the bcdedit command in Windows before rebooting; you can't do it in troubleshooting mode.
2nd, for those who say the test refs driver from Microsoft isn't working: did you pay attention to the fact that some of these keys are set in decimal? As you can see above, Thomas forgot to mention that 'RefsProcessedDeleteQueueEntryCountThreshold' is decimal. When you create the new value, regedit does not have the Decimal radio button selected by default.
3rd, was anyone advised to turn off ODX in Windows?
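
To show why the decimal/hexadecimal choice matters, here is a small sketch of what ends up stored if the digits from the instructions are typed while the value is still being read as hexadecimal. The helper name `value_if_typed_as_hex` is made up for illustration; the three values are the ones from the quoted instructions.

```python
def value_if_typed_as_hex(typed_digits: str) -> int:
    """Return the number that results when the typed digits are read as hex."""
    return int(typed_digits, 16)


INTENDED = {
    "RefsDisableCachedPins": 1,
    "RefsProcessedDeleteQueueEntryCountThreshold": 2048,
    "TimeOutValue": 120,
}

if __name__ == "__main__":
    for name, intended in INTENDED.items():
        stored = value_if_typed_as_hex(str(intended))
        status = "same" if stored == intended else "DIFFERENT"
        print(f"{name}: intended {intended}, stored {stored} ({status})")
```

Typed as hex, 2048 turns into 8264 and 120 turns into 288, while 1 stays 1. Single-digit values 0 through 9 are identical in both bases, so only the larger values are affected.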

I have the driver now loaded. Fingers crossed I can delete files now! 8)
kubimike
Expert
 
Posts: 236
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by kb1ibt » Tue Jul 04, 2017 4:14 pm

kubimike wrote:I'd like to point out that these instructions are a bit misleading.
1st, run the bcdedit command in Windows before rebooting; you can't do it in troubleshooting mode.
2nd, for those who say the test refs driver from Microsoft isn't working: did you pay attention to the fact that some of these keys are set in decimal? As you can see above, Thomas forgot to mention that 'RefsProcessedDeleteQueueEntryCountThreshold' is decimal. When you create the new value, regedit does not have the Decimal radio button selected by default.
3rd, was anyone advised to turn off ODX in Windows?

I have the driver now loaded. Fingers crossed I can delete files now! 8)

I still have the problem on 2 repos:
1) Without this set, the system won't load refs.sys when you reboot, and the volume will show as RAW
2) My instructions included telling me to use decimal
3) Unless they changed their requirements: “Files must be on a volume formatted using NTFS. ReFS and FAT are not supported.” So ODX isn't even something to worry about in this case.
kb1ibt
Influencer
 
Posts: 12
Liked: never
Joined: Fri Apr 24, 2015 1:40 pm

Re: REFS 4k horror story

by kubimike » Tue Jul 04, 2017 4:34 pm

Yeah, that's why I didn't disable ODX. I left it on. What about 'RefsDisableCachedPins'? I set that to '1'. Also, do you have any of the original keys set from Microsoft's public fix? I don't, just 'RefsProcessedDeleteQueueEntryCountThreshold' + 'RefsDisableCachedPins' + 'TimeOutValue'.
kubimike
Expert
 
Posts: 236
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by kubimike » Thu Jul 06, 2017 12:43 am

GUYS!
I have some happy news to report! The test refs.sys from Microsoft WORKS. After my job ran it hit a retention period. This was a job I've always dreaded was going to run me out of space. Anyhow, it trimmed a 5TB VBK and so far (fingers crossed) the OS is still stable and running. Normally it would take about 2mins and it would completely freeze. This is so exciting I can't tell you how awesome this is to have it JUST WORK. :mrgreen: :mrgreen: :mrgreen: :mrgreen: :mrgreen:

Now I realize from chatting with Tom that it's not necessarily deleting a 5TB file but updating references. However, the machine would choke when it reached this stage. Thanks again Veeam, thanks again Microsoft, what a win for us. Let's just HOPE it keeps working! :twisted:
kubimike
Expert
 
Posts: 236
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by Cicadymn » Thu Jul 06, 2017 4:15 pm

kubimike wrote:GUYS!
I have some happy news to report! The test refs.sys from Microsoft WORKS. After my job ran it hit a retention period. This was a job I've always dreaded was going to run me out of space. Anyhow, it trimmed a 5TB VBK and so far (fingers crossed) the OS is still stable and running. Normally it would take about 2mins and it would completely freeze. This is so exciting I can't tell you how awesome this is to have it JUST WORK. :mrgreen: :mrgreen: :mrgreen: :mrgreen: :mrgreen:

Now I realize from chatting with Tom that it's not necessarily deleting a 5TB file but updating references. However, the machine would choke when it reached this stage. Thanks again Veeam, thanks again Microsoft, what a win for us. Let's just HOPE it keeps working! :twisted:


Is this the same 14393.1100 refs.sys that they've been handing out? Or is there a new experimental refs driver?
Cicadymn
Influencer
 
Posts: 19
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS 4k horror story

by kubimike » Thu Jul 06, 2017 4:59 pm

'14393.1100' is the one I was given. I assume you're still having issues? Did you double-check that the registry values are in decimal? IIRC that only matters when the value is > 9. I posted the registry keys I'm using above.
kubimike
Expert
 
Posts: 236
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by mkretzer » Thu Jul 06, 2017 6:45 pm

I wonder two things:

- Will the slow merge issue that appears after something is deleted be solved?
- When will this update be released?
mkretzer
Expert
 
Posts: 310
Liked: 70 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by Cicadymn » Thu Jul 06, 2017 7:29 pm

kubimike wrote:'14393.1100' is the one I was given. I assume you're still having issues? Did you double-check that the registry values are in decimal? IIRC that only matters when the value is > 9. I posted the registry keys I'm using above.

Yeah, I can confirm that I'm using decimal. I'm having my CPU lock me out now instead of RAM. Haven't heard back in a while (as they passed it to Microsoft from what I understand).

I wonder whether lowering RefsProcessedDeleteQueueEntryCountThreshold from 2048 to 1024 would help. They mentioned lowering it, but never told me to do that when I reported that it wasn't working. Would going up or down help? Maybe it doesn't matter for this particular issue.
Cicadymn
Influencer
 
Posts: 19
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam
