kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

So when Veeam is running a job, at what point does it cause the CPU to max out? Retention? Merge? Synthetic full?
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS 4k horror story

Post by Gostev »

I will guess it is Retention.
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

kubimike wrote:So when Veeam is running a job, at what point does it cause the CPU to max out? Retention? Merge? Synthetic full?
The CPU maxes out when a 3TB+ backup copy job tries to run a synthetic full. I don't have any issues on backup jobs because I've disabled synthetic fulls, and backup copy jobs under 3TB in size don't appear to cause a lockup. But I've got a 5TB+ file server whose synthetic full crashes the server any time I try to run it.

There's not a way to disable the synthetic full side of a backup copy job so I'm stuck with just a few file servers that I can't run backup copy jobs on.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Interesting. How much RAM? What does your disk subsystem look like? I'm on 10k disks and a very fast controller, with 192 GB of RAM. Tell me all the registry keys you've set.
kb1ibt
Influencer
Posts: 14
Liked: never
Joined: Apr 24, 2015 1:40 pm
Contact:

Re: REFS 4k horror story

Post by kb1ibt »

Well, today I had 2 lockups. The first came at its normal time, during a GFS merge (after deletes). The second was more interesting: I had forgotten to disable a different job on the repo, which started up after the reboot (and before the ReFS delete cleanup had finished), so when that job began to Initialize Storage for copying the VMs as part of the backup copy, it caused another lockup.
lowlander
Service Provider
Posts: 450
Liked: 30 times
Joined: Dec 28, 2014 11:48 am
Location: The Netherlands
Contact:

Re: REFS 4k horror story

Post by lowlander »

Hi,

Is refs.sys 10.0.14393.1198 the latest driver, or should we use a driver that is available from MS support?

Are the following figures expected processing times, or is there room for improvement by changing registry keys?
Exchange - 12TB full - 120GB inc - fastclone full backup file merge time 1h10m
Exchange - 12TB full - 140GB inc - fastclone full backup file merge time 1h44m
File - 16TB full - 270GB inc - fastclone full backup file merge time 1h30m
SQL - 5TB full - 25GB inc - fastclone full backup file merge time 0h08m
Oracle - 2TB full - 15GB inc - fastclone full backup file merge time 0h20m
Generic Workload - 13TB full - 113GB inc - fastclone full backup file merge time 2h30m
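
One rough way to compare these figures across workloads is incremental GB merged per minute of merge time. A minimal sketch, assuming that metric is a fair yardstick here (it ignores full size and fragmentation, which probably matter for fast clone merges); the function names are made up for illustration:

```python
# Figures copied from the list above: (workload, incremental size in GB, merge time as "XhYYm").
figures = [
    ("Exchange", 120, "1h10m"),
    ("Exchange", 140, "1h44m"),
    ("File", 270, "1h30m"),
    ("SQL", 25, "0h08m"),
    ("Oracle", 15, "0h20m"),
    ("Generic Workload", 113, "2h30m"),
]


def to_minutes(text):
    """Convert a duration written like '1h10m' into whole minutes."""
    hours, minutes = text.rstrip("m").split("h")
    return int(hours) * 60 + int(minutes)


def gb_per_minute(inc_gb, merge_time):
    """Incremental GB merged per minute of merge time (hypothetical metric)."""
    return inc_gb / to_minutes(merge_time)


for name, inc_gb, merge_time in figures:
    rate = gb_per_minute(inc_gb, merge_time)
    print(f"{name:17s} {inc_gb:4d} GB in {to_minutes(merge_time):3d} min = {rate:.2f} GB/min")
```

A workload with a much lower rate than the others would be the first place to look.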

When a copy job is started that uses the restore points above as its source, the merge time of the fast clone process increases (sometimes doubles). We try to avoid running backup jobs and copy jobs that use the same ReFS disk in parallel.

We have also seen event 7023 messages (Data Sharing service) in the system log. We have applied the recommendation on this website: http://www.neighborgeek.net/2017/05/ser ... rvice.html, however we are not sure if it is related to ReFS :)

thanks !
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn » 3 people like this post

I spoke with the ReFS team from Microsoft via Veeam Support. Explained to them my issues. (Still running 4K ReFS). My CPU maxes out on the experimental driver. Here's what they told me:
When ReFS team investigated memory issue we find that root cause for it is bug in cache manager which is part of kernel. We asked code owner for fix and in meantime implemented workaround fix in ReFS. This workaround fix solved problem for particular customer and probability for regression is smaller compared to kernel fix. This ReFS fix was backported by support team to Windows Server 2016 and released as official fix (I have check, but if you will apply latest updates ReFS code will contain it).

Kernel code owner fixed issue in cache manager. But as ReFS fix solved particular customer issue kernel fix wasn’t backported to Windows Server 2016. The backport happens when support team ask for it and proves business case. Even if I am probably able to start process it is beyond my knowledge and at end it will end with person responsible for your account (or Veeam) anyway. It will be kernel fix (new kernel binaries, not ReFS driver).

In other words – please ask your Microsoft support contact person for bug 9939237 fix to be backported to Windows Server 2016 (if you don’t have this contact ask Veeam to do it).

And then later I heard some more:
I’ve just talked with a couple folks who backport fixes to previous releases, and we’ll go ahead backporting this to RS1. If needed, I’ll follow up with you to solidify the business case of this backport, but at this time, we should have the information we need to backport this fix without any further action from you.

Backporting, unfortunately, takes time since each fix that is backported needs to be thoroughly tested for regressions. We’ll push for it to be backported quickly. Best case, this would be placed in the August patch, though I would plan for September.
So for those of us still on 4K ReFS, it looks like there may be a light at the end of the tunnel. They sound like they've got a good grasp of the problem, and even a fix in the works. We may need to play the waiting game a little bit longer, but I'm hoping we'll be able to be fully stable soon!
zuldan
Enthusiast
Posts: 45
Liked: 5 times
Joined: Feb 15, 2017 9:51 am
Contact:

Re: REFS 4k horror story

Post by zuldan » 2 people like this post

It looks like Microsoft have released the official patch now.

Addressed performance issues in ReFS when backing up many terabytes of data. 
Addressed issue where a stuck thread in ReFS might cause memory corruption. 
https://support.microsoft.com/en-au/help/4025334
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

:D :) :P :mrgreen:

For those of us that are running the beta driver, how do we install this update ? Might sound like a silly question :?: :shock:
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

zuldan wrote:It looks like Microsoft have released the official patch now.

Addressed performance issues in ReFS when backing up many terabytes of data. 
Addressed issue where a stuck thread in ReFS might cause memory corruption. 
https://support.microsoft.com/en-au/help/4025334
After installing the update, my Refs.sys driver updated from 10.0.14393.1198 to 10.0.14393.1532. I'll run it for a week or two without enabling the settings that cause my crash; if everything stays stable, then I'll see about enabling synthetic fulls again.
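
A minimal sketch for checking whether a Refs.sys version string is at least the build reported above. It only compares dotted version strings; how you read the version off the file is left out, and the function names are hypothetical:

```python
def parse_version(text):
    """Turn a dotted version such as '10.0.14393.1532' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Build reported in this post after installing the update.
PATCHED = parse_version("10.0.14393.1532")


def has_patched_driver(installed):
    """True if the installed Refs.sys version is at or above the patched build."""
    return parse_version(installed) >= PATCHED


for version in ("10.0.14393.1198", "10.0.14393.1532"):
    print(version, "patched" if has_patched_driver(version) else "not patched")
```

Comparing tuples of ints avoids the trap of comparing the strings directly, where "10.0.14393.999" would sort after "10.0.14393.1532".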
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

kubimike wrote::D :) :P :mrgreen:

For those of us that are running the beta driver, how do we install this update ? Might sound like a silly question :?: :shock:
Just run the update. It automatically updates the ReFS driver. I've also disabled test mode since it's back in the normal driver lineup.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Is reg key removal necessary, or will those no longer apply and thus get ignored?
dmayer
Influencer
Posts: 18
Liked: 9 times
Joined: Apr 21, 2017 6:16 pm
Full Name: Daniel Mayer
Contact:

Re: REFS 4k horror story

Post by dmayer »

I hope this is coming to the Windows 10 Creators Update, as we have that deployed at two locations (ours being one of them) as a cost saving over Server 2016. Funnily enough, my home lab has Server 2016, haha.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer » 4 people like this post

I can't believe it but in the first two tests this hotfix fixed our slow merge speed!

We did not do any deletes yet on the volume (this will be our next test), but backup merge speed got slower with every week nevertheless:

1 week after new REFS+active full: 0:08:56
2 weeks after new REFS+active full: 0:18:24
3 weeks after new REFS+active full: 0:31:59
4 weeks after new REFS+active full: 4:17:41

After Patch / 5 weeks after new REFS+active full: 0:09:03

And this after we just migrated 90 % of our backups back to NTFS :-(
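
To put the merge times above in proportion, here is a small sketch that expresses each one as a multiple of the week 1 time. The labels and helper names are mine, not anything Veeam reports:

```python
# Merge durations copied from the list above, as "H:MM:SS".
merge_times = {
    "week 1": "0:08:56",
    "week 2": "0:18:24",
    "week 3": "0:31:59",
    "week 4": "4:17:41",
    "week 5 (after patch)": "0:09:03",
}


def to_seconds(text):
    """Convert 'H:MM:SS' into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


BASELINE = to_seconds(merge_times["week 1"])


def slowdown(text):
    """How many times longer than the week 1 merge this merge took."""
    return to_seconds(text) / BASELINE


for label, duration in merge_times.items():
    print(f"{label:22s} {duration}  x{slowdown(duration):.1f} of week 1")
```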
ferrus
Veeam ProPartner
Posts: 299
Liked: 43 times
Joined: Dec 03, 2015 3:41 pm
Location: UK
Contact:

Re: REFS 4k horror story

Post by ferrus »

I'm just about to migrate one of our Veeam repositories over to ReFS out of necessity - so this really is good news.
There's quite a few ReFS threads on here - I need to find out how many other existing issues there are to look out for.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer » 2 people like this post

@ferrus We have seen four issues:

- Crash by high memory usage: Especially with 4k, Per-VM. Fixed in our system by increasing RAM to 384 GB. Should be fixed by the update as well.
- Slow merges, every week it gets slower: Seems to be fixed by the update.
- Active full at the same time as merges are running is EXTREMELY slow (~1-2 MB/s instead of 400+ MB/s): We will test this tomorrow.
- Deletes cause REFS to get slower in general: The first test seems to suggest that this is also fixed by the update.
ferrus
Veeam ProPartner
Posts: 299
Liked: 43 times
Joined: Dec 03, 2015 3:41 pm
Location: UK
Contact:

Re: REFS 4k horror story

Post by ferrus »

The server we're migrating is a single job/single VM repository, albeit 18TB in size. So it looks as though the only untested issue won't be a problem for us.
This gives me a lot more confidence.

I wonder if this (after testing) will affect Veeam's recommendation of 64K vs 4K?
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS 4k horror story

Post by dellock6 »

Any comment on the 3 registry keys that came with the test hotfix? Are they still needed with this latest update?
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

mkretzer wrote:I can't believe it but in the first two tests this hotfix fixed our slow merge speed!

We did not do any deletes yet on the volume (this will be our next test), but backup merge speed got slower with every week nevertheless:

1 week after new REFS+active full: 0:08:56
2 weeks after new REFS+active full: 0:18:24
3 weeks after new REFS+active full: 0:31:59
4 weeks after new REFS+active full: 4:17:41

After Patch / 5 weeks after new REFS+active full: 0:09:03

And this after we just migrated 90 % of our backups back to NTFS :-(
Sounds good - I'm testing right now. Do you use any ReFS registry settings?
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

Only the most basic RefsEnableLargeWorkingSetTrim.
suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: REFS 4k horror story

Post by suprnova »

Initial results with the MS update have been positive for us as well, overall incremental backup speeds have improved. I removed the hotfix registry keys and I am also just using RefsEnableLargeWorkingSetTrim.
Cicadymn
Enthusiast
Posts: 26
Liked: 12 times
Joined: Jan 30, 2017 7:42 pm
Full Name: Sam
Contact:

Re: REFS 4k horror story

Post by Cicadymn »

ferrus wrote:I'm just about to migrate one of our Veeam repositories over to ReFS out of necessity - so this really is good news.
There's quite a few ReFS threads on here - I need to find out how many other existing issues there are to look out for.
Just be sure to use 64K block size on ReFS. I believe 4K is the default, so you'll need to change it to 64K. People on 64K have had far fewer issues than those of us on 4K.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Anyone else end up with ReFS version '10.0.14393.1532'?
SBarrett847
Service Provider
Posts: 315
Liked: 41 times
Joined: Feb 02, 2016 5:02 pm
Full Name: Stephen Barrett
Contact:

Re: REFS 4k horror story

Post by SBarrett847 »

Yep that's what i have too.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Well I guess msft typed the wrong number in the CSV file, it lists '10.0.14393.1408'
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

New patch is a flop, as soon as retention hits, say goodnight dick. Going back to the beta driver. UGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

The computer has rebooted from a bugcheck. The bugcheck was: 0x00000133 (0x0000000000000001, 0x0000000000001e00, 0xfffff801e1019540, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 0819dd2a-6290-44e4-9bf4-25363b2282d6.
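
For anyone collecting these events across several repositories, a minimal sketch that pulls the bugcheck code and its four parameters out of a message like the one above. The regular expression assumes the message wording shown here; the only code it names is 0x133, which Microsoft's bug check reference lists as DPC_WATCHDOG_VIOLATION:

```python
import re

# Event text copied from the post above (trimmed to the part that is parsed).
MESSAGE = (
    "The computer has rebooted from a bugcheck. The bugcheck was: 0x00000133 "
    "(0x0000000000000001, 0x0000000000001e00, 0xfffff801e1019540, 0x0000000000000000)."
)

# Deliberately tiny lookup table; extend it from Microsoft's bug check reference as needed.
KNOWN_CODES = {0x133: "DPC_WATCHDOG_VIOLATION"}


def parse_bugcheck(message):
    """Return (code, [parameters]) as ints, or None if the text does not match."""
    match = re.search(r"bugcheck was: (0x[0-9a-fA-F]+) \(([^)]*)\)", message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, params


code, params = parse_bugcheck(MESSAGE)
print(hex(code), KNOWN_CODES.get(code, "unknown"), [hex(p) for p in params])
```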
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

Same for me - the patch makes little or no difference when doing daily merge of my increments. :x
JaySt
Service Provider
Posts: 415
Liked: 75 times
Joined: Jun 09, 2015 7:08 pm
Full Name: JaySt
Contact:

Re: REFS 4k horror story

Post by JaySt »

If this patch fixed the problems seen in this thread, for which Veeam is so actively trying to come up with a fix with MS, I think we would have seen some sort of announcement through Gostev maybe, something like "we're getting close to a fix.. hold on...". It would be quite special to have a patch (or THE patch) drop down from the sky like this.
Or did i miss something (possible... with this large thread).
Veeam Certified Engineer
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

@JimmyO What kind of issue did you have - Slow merges or crashes?

For us the patch seems to really fix the slowness, even when an active full is running at the same time! For the first time, the target storage shows 100 % utilisation on all drives - in the past the backend did not write much and REFS just hung there at very low write speed.
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

No crashes - only slow merges and unresponsive disks (for about 30 sec at a time).
I can see some difference in merge time. I estimate about a 25% improvement: what used to take 20 hours now takes 15 (at least after the first run).
Still - when I started to use ReFS (first week) it took 1 hour...
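
The numbers in this post can be read two ways, so here is a short sketch of the arithmetic. It uses only the 20 hour, 15 hour and 1 hour figures given above; the function names are just for illustration:

```python
def time_reduction(before_hours, after_hours):
    """Fraction of the original merge time that was saved."""
    return (before_hours - after_hours) / before_hours


def speedup(before_hours, after_hours):
    """How many times faster the same merge now completes."""
    return before_hours / after_hours


before, after, first_week = 20, 15, 1

print(f"time saved:            {time_reduction(before, after):.0%}")
print(f"throughput gain:       {speedup(before, after) - 1:.0%}")
print(f"still slower than week 1 by a factor of {after / first_week:.0f}")
```

Going from 20 hours to 15 is a quarter less time, which is the same as roughly a third more throughput, and it is still far from the 1 hour seen in the first week.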