mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

Hello,

is there a new REFS driver in 2017-08? From what I can see, the last REFS patches were in 2017-07.

@JimmyO: We are in EXACTLY the same situation now. We tried for 7 months and had hope with the 2017-07 patches, but this morning the backup and the whole filesystem hung again for many minutes, and we had not even deleted many backup files, which would have led to a lot of GC.

Markus
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

It appears KB4034661 is still sitting at Refs.sys version 10.0.14393.1532. I wasn't stable on that version. I am STILL running the beta and holding strong. :roll:
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

@delllock I didn't think anyone would do any less than double parity. It may be a ding in disk space, but it surely helps protect data. That's why I opted for a fast controller and RAID 6+0.
mikegodwin
Enthusiast
Posts: 54
Liked: 1 time
Joined: Oct 12, 2012 12:28 am
Full Name: Mike Godwin
Contact:

Re: REFS 4k horror story

Post by mikegodwin »

EricJ wrote:We still had frequent lockups after applying RefsEnableLargeWorkingSetTrim. I bumped the RAM from 16GB to 20GB, and also set the key RefsNumberOfChunksToTrim to "32" (decimal). Since just those two changes, we have been stable for over two months now.

Here is an animation of RAMMap after the RAM bump and second registry key during two big synthetic full fast clones. You can see the Metafile active usage levels off after 5GB. It would be interesting to see what RamMap shows for you during a fast clone operation that causes a lockup.
Does this mean we only care about Active Metafile size? I see Standby Metafile size continues to grow and stays high even after the job completes.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike » 1 person likes this post

Did anyone see that the registry keys for the beta ReFS driver were carried over to the public release? I didn't try the latest public driver with these keys.

https://support.microsoft.com/en-us/hel ... erver-2016

can someone confirm they've tried these keys with the public release ?

I'd also like to point out that the URL listed above doesn't say to set 'RefsProcessedDeleteQueueEntryCountThreshold' as a decimal value. I already let Microsoft know.
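To illustrate why the radix matters, here is a minimal Python sketch of what happens when digits meant as decimal are typed into a DWORD field that is set to hexadecimal. The helper name is hypothetical, and the value 32 is the one quoted earlier in the thread for RefsNumberOfChunksToTrim.

```python
def value_if_read_as_hex(digits: str) -> int:
    """Return the number that is stored when `digits` is interpreted as hex."""
    return int(digits, 16)


intended = 32                          # what the poster meant (decimal)
stored = value_if_read_as_hex("32")    # what a hex-mode field would store

print(f"intended={intended}, stored={stored}")  # intended=32, stored=50
```

So the same keystrokes give a different setting depending on the radio button selected in the editor.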
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: REFS 4k horror story

Post by DaveWatkins »

I wonder if Veeam will add something equivalent for Option 3, or if it's something that will be rolled into an update.
jronnblom
Influencer
Posts: 17
Liked: 2 times
Joined: Oct 23, 2013 6:15 am
Full Name: Janåke Rönnblom
Contact:

Re: REFS 4k horror story

Post by jronnblom »

JimmyO wrote: I managed to get merge times to almost the same as when using NTFS, but disk load during backup is way higher with ReFS compared to NTFS. Disk queue length during backup went from 0,05 to 1+.
Is that with the beta driver? And in that case how do we get hold of it?

-J
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

If I remember it correctly, it was with the 2017-07 Cumulative update. With the latest 2017-08 update it got worse.
mark.planger
Influencer
Posts: 14
Liked: never
Joined: Jan 02, 2013 4:50 pm
Full Name: Mark Planger
Contact:

Re: REFS 4k horror story

Post by mark.planger »

Going to take another run at using ReFS with Storage Spaces. Anyone have comments on registry setting tweaks that worked?
Also, I'd like to try the refs beta driver but can't find out how to get a hold of it. Could someone point me to a location or even send it to me?

Thanks.
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: REFS 4k horror story

Post by DaveWatkins »

The beta driver was included in the August (maybe July) patchset as far as I know.

Registry info here

https://support.microsoft.com/en-us/hel ... erver-2016

I'm using option 1 and 2 with success. Deletions no longer lock up the volume. The only remaining issue I've found is merges slow down over time but even then mine are still faster than they were on NTFS
Iain_Green
Service Provider
Posts: 158
Liked: 9 times
Joined: Dec 05, 2014 2:13 pm
Full Name: Iain Green
Contact:

Re: REFS 4k horror story

Post by Iain_Green »

Apologies if this has already been answered, but at 39 pages I may have missed it!

From what I can tell, to clear the issues I am seeing (delayed merges and poor overall performance of my ReFS repo) I need to install the following

https://support.microsoft.com/en-us/hel ... windows-10

And enable all 3 REG options, then install

https://support.microsoft.com/en-us/hel ... erver-2016

and use option 1 and 2, is that correct?
Many thanks

Iain Green
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: REFS 4k horror story

Post by DaveWatkins »

The idea with both sets of registry changes was to work through them from top to bottom until the issues are resolved. In my case that was the first option only in the first patch, and options 1 & 2 in the second.
Iain_Green
Service Provider
Posts: 158
Liked: 9 times
Joined: Dec 05, 2014 2:13 pm
Full Name: Iain Green
Contact:

Re: REFS 4k horror story

Post by Iain_Green »

Hmm, this is where the confusion lies. We have the first update installed and option one enabled. This was based on reading up on ReFS.
Now we have run into some performance issues with ReFS merges, and when I read the info on the second update it states:
"Before you follow these steps, make sure that you have read and implemented the three registry parameters as described in KB article 4016173."
Which would imply I need all 3 registry edits from the first KB enabled?
Many thanks

Iain Green
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: REFS 4k horror story

Post by DaveWatkins »

Oh indeed, I hadn't noticed that when I applied it. Of course, knowing the way MS updates its KBs, that may not have been there when I applied it.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS 4k horror story

Post by tsightler » 3 people like this post

IMPORTANT NOTE

The information in the post below is from 2017 and is now obsolete. Microsoft has made many improvements to ReFS over the years and the settings below are no longer relevant for the vast majority of environments. If you have performance issues with ReFS you should open a support case for deeper investigation.
______________

I've been doing a significant amount of testing with all six ReFS keys in the two documents and so far my recommendation is to set the following:

HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512


These settings seem to significantly reduce memory consumption in some cases, and help to address latency issues that occur, for example, during deletes of large numbers of files, while overall having no significant negative impact on performance.
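For anyone who prefers to review the four values as a file before applying them, here is a minimal Python sketch that renders them as .reg text. The function name is hypothetical, and the sketch assumes all four values are DWORDs given in decimal (consistent with the earlier remarks in this thread about decimal values). It only prints text and does not touch the registry.

```python
from typing import Mapping

# Key path and values taken from the recommendation above (decimal DWORDs assumed).
FILESYSTEM_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"

RECOMMENDED = {
    "RefsEnableLargeWorkingSetTrim": 1,
    "RefsNumberOfChunksToTrim": 32,
    "RefsDisableCachedPins": 1,
    "RefsProcessedDeleteQueueEntryCountThreshold": 512,
}


def render_reg_file(key: str, values: Mapping[str, int]) -> str:
    """Render DWORD values as .reg text (8 hex digits per value)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg_file(FILESYSTEM_KEY, RECOMMENDED))
```

Note that the .reg format writes DWORDs in hex, so 32 appears as 00000020 and 512 as 00000200.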

The other ReFS filesystem setting is as follows:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableInlineTrim = 1

The problem I've seen with this setting is that it seems to have a pretty negative impact on performance, especially with very fast disks. I guess this makes sense since the KB specifically warns that this is likely to be the case with SSD/NVMe, but I noticed the decrease even on just very fast, large RAID sets. I suppose it's still worth trying if, even with the settings above, you still have memory pressure, but in my testing it's not been required as long as the memory on the repository is reasonably sized.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Configuration\DiskStorage\DuplicateExtentBatchSizeinMB = 100

It seems obvious that this key is specific to DPM and has no impact on Veeam. Veeam uses a different pattern of submissions for duplicate extent requests so should not require any tweaks here.

The only other key mentioned in these KB articles is the following:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue = 0x78

This changes the Windows timeout for non-responsive disks from 65 seconds to 120 seconds. I guess if you're getting disk timeout messages in the event log causing ReFS writes to fail then perhaps this could be useful somehow, but honestly if you're having 65-second disk timeouts, that feels like a bigger problem. Still, this will have no impact on memory or performance so I guess it doesn't hurt to set it if you want.
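As a quick check of the arithmetic, the hex value 0x78 from the key above does correspond to 120 seconds. A trivial Python sketch (the variable name is only for illustration):

```python
timeout_seconds = 0x78       # value as written in the KB, in hexadecimal

print(timeout_seconds)       # 120
print(hex(120))              # 0x78
```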

Note that these recommendations are based on my own lab testing and on some customers I have worked with, and are not a final recommendation from Veeam QA. However, I would appreciate it if people try these settings and let us know the results. Note you should be on at least the July 18th, 2017 cumulative update to use these settings.
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: REFS 4k horror story

Post by DaveWatkins »

tsightler wrote:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Configuration\DiskStorage\DuplicateExtentBatchSizeinMB = 100

I haven't really found any negatives to setting this, it just doesn't seem to impact Veeam at all. Perhaps my test cases just don't trigger it, but it seems Veeam submits duplicate extent requests differently than DPM. However, if you are seeing very high I/O latency while jobs are performing merge/synthetic, then perhaps this is worth trying, I just don't have any testing results to back this up.
I'd always assumed this key was specific to DPM and not a ReFS key at all. What I mean is that the ReFS driver doesn't use or read that key at all; it's something DPM reads, and it only changes the way DPM writes data to ReFS volumes.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS 4k horror story

Post by tsightler »

DaveWatkins wrote: I'd always assumed this key was specific to DPM and not a ReFS key at all. What I mean is that the ReFS driver doesn't use or read that key at all; it's something DPM reads, and it only changes the way DPM writes data to ReFS volumes.
Agree completely, I somehow didn't pay enough attention to the path of that key! Edited post accordingly. :)
JVA@Alsic
Novice
Posts: 5
Liked: never
Joined: Dec 29, 2014 10:00 am
Full Name: Jeroen Van Acker
Contact:

Re: REFS 4k horror story

Post by JVA@Alsic »

Hi everyone,

We have seen a lot of problems with the ReFS integration and Veeam.
Almost every aspect of it has already been brought up here by some other users.
Our implementation consists of 3 backup repositories with ReFS, all with 64K blocksize.
The repositories are on a Dell MD3460 (40 disks)
Performance at first was extremely fast but degraded with time.

We had the memory issue in which the server consumed ALL of the memory and just simply crashed.
We fixed it by setting HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim to 1
After that we adjusted the concurrent tasks from 8 to 5 on a 32 GB server.

The "Merging backup files" action in Backup Copy Jobs has degraded to an extremely slow rate and takes 144 hours for a VBK file of 36 TB in size.

After some experimenting we have resolved this issue by enabling the "Turn off Windows write-cache buffer flushing" option on the disk.

Performance has increased from 144 hours to about 8 hours.
Maybe this setting can help you out, it can be set on-the-fly if your storage supports it, without a need to reboot.
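To put the two merge times in perspective, here is a rough Python sketch that converts them into an effective average rate. It assumes the 8-hour figure refers to the same 36 TB VBK and uses decimal units (1 TB = 10**12 bytes). Since a merge on ReFS may not physically move all of that data, treat the result as an effective rate, not measured disk throughput. The function name is hypothetical.

```python
def effective_rate_mb_s(size_tb: float, hours: float) -> float:
    """Average rate in MB/s for processing size_tb terabytes in the given hours."""
    return size_tb * 1e12 / (hours * 3600) / 1e6


before = effective_rate_mb_s(36, 144)   # merge time before the change
after = effective_rate_mb_s(36, 8)      # merge time after the change

print(f"{before:.1f} MB/s -> {after:.1f} MB/s ({after / before:.0f}x)")
# 69.4 MB/s -> 1250.0 MB/s (18x)
```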

Any feedback is welcome !
Iain_Green
Service Provider
Posts: 158
Liked: 9 times
Joined: Dec 05, 2014 2:13 pm
Full Name: Iain Green
Contact:

Re: REFS 4k horror story

Post by Iain_Green »

tsightler wrote: I've been doing a significant amount of testing with all six ReFS keys in the two documents and so far my recommendation is to set the following:

HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512


These settings seem to significantly reduce memory consumption in some cases, and help to address latency issues that occur, for example, during deletes of large numbers of files, while overall having no significant negative impact on performance.

Hi,

seeing no improvement having implemented both patches and the following edits:

HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512

Running a DL380 Gen 8 (Server 2016) with a D6020 direct attached in RAID 60 with ReFS.

Still seeing a backlog of jobs, with merge times of 100+ hours.
Many thanks

Iain Green
ferrus
Veeam ProPartner
Posts: 300
Liked: 44 times
Joined: Dec 03, 2015 3:41 pm
Location: UK
Contact:

Re: REFS 4k horror story

Post by ferrus »

I'm about 10 days away from a new Veeam build, to migrate our existing backups over.

The server specs are 4x Cisco C240 M4, 20 CPU cores, 96GB RAM and 43TB RAID6 DAS.
Windows 2016 Std will be installed - and I'd really like to use ReFS. Probably with SOBR, possibly using Storage Spaces.

Quick sweep of opinions - is ReFS still a 'horror story', or is it manageable for everyday use?

I've read experiences for and against, on this thread and others - but not quite sure what the status quo is.
Our current 2012/NTFS install doesn't come anywhere near to the 96GB memory limit, that's mentioned as an issue with ReFS.
mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

@ferrus: For us REFS is dead.
With 96 GB RAM it is quite likely that you will run into issues without registry tuning. And even if you get that solved, REFS can get very slow after a few weeks...
ferrus
Veeam ProPartner
Posts: 300
Liked: 44 times
Joined: Dec 03, 2015 3:41 pm
Location: UK
Contact:

Re: REFS 4k horror story

Post by ferrus »

@mkretzer

Thanks for the reply
Does it slow down to below NTFS performance levels?
That's one of the issues for us. We have one VM >18TB, which takes several hours to merge - reducing the window for other jobs.
mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

@ferrus
After 5 weeks of weekly synthetics (when it starts deleting), definitely. Our main problem is not the merge speed per se but the fact that when it slows down, other processes (normal incrementals, for example) basically stop or hang altogether for a long time.
We had a situation where we resized a 2.3 TB SQL server (which is not that big for us), which led to the entire disk being re-read (I know that can be disabled). At the time a bigger merge was running, but our backend storage could handle the I/O quite well. Still, instead of 1 hour the SQL backup took 7 hours, which then led to very big snapshots... The whole time the backend showed low to medium load. REFS just did not allow the available resources to be used. The backend is normally capable of pushing 500 - 700 MB/s, but in that situation backup speed was down to 2-3 MB/s for a long time.

We NEVER had/have this with NTFS on the same storage system.
suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: REFS 4k horror story

Post by suprnova »

Exactly the same issue here. And I also agree to avoid ReFS repositories.
antipolis
Enthusiast
Posts: 73
Liked: 9 times
Joined: Oct 26, 2016 9:17 am
Contact:

Re: REFS 4k horror story

Post by antipolis »

while some synthetics are sometimes taking longer than usual (like ~8-10 hours instead of 2-3 hours) on a couple jobs (5 TB & 9 TB) everything is working fine here; got 64 GB ram on the veeam server... 64k cluster size and no reg hacks...
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

In the exact same situation as suprnova and mkretzer. Have given up on ReFS and migrated back to NTFS...
Gve
Service Provider
Posts: 33
Liked: 2 times
Joined: Apr 28, 2015 3:28 pm
Full Name: Guillaume
Location: France
Contact:

Re: REFS 4k horror story

Post by Gve » 1 person likes this post

mkretzer wrote: REFS just did not allow the available resources to be used. The backend is normally capable of pushing 500 - 700 MB/s, but in that situation backup speed was down to 2-3 MB/s for a long time.

We NEVER had/have this with NTFS on the same storage system.
Hi
Same issue on HPE Apollo 4200 servers: one formatted with a 4k block size and another formatted with a 64k block size.
I have proof that the problem is not related to block size. It is related only to the ReFS filesystem and the way Veeam uses ReFS (block cloning).

The symptoms are:
No disk queue
No latency
Very, very low IOPS on disk
The disk becomes unresponsive (unable to browse, unable to collect disk counters, etc.)

I do not understand why Veeam does not warn its customers, especially cloud providers. That is not professional.
JVA@Alsic
Novice
Posts: 5
Liked: never
Joined: Dec 29, 2014 10:00 am
Full Name: Jeroen Van Acker
Contact:

Re: REFS 4k horror story

Post by JVA@Alsic »

Did you try the "Turn off Windows write-cache buffer flushing" option?
Gve
Service Provider
Posts: 33
Liked: 2 times
Joined: Apr 28, 2015 3:28 pm
Full Name: Guillaume
Location: France
Contact:

Re: REFS 4k horror story

Post by Gve »

This option is not available on HP logical Disk.
:(