REFS 4k horror story

Re: REFS 4k horror story

by kubimike » Thu Jul 06, 2017 7:48 pm

So when Veeam is running a job, at what point does it cause the CPU to max out? Retention? Merge? Synthetic full?
kubimike
Expert
 
Posts: 236
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by Gostev » Thu Jul 06, 2017 9:16 pm

I will guess it is Retention.
Gostev
Veeam Software
 
Posts: 21396
Liked: 2350 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

by Cicadymn » Thu Jul 06, 2017 10:25 pm

kubimike wrote: So when Veeam is running a job, at what point does it cause the CPU to max out? Retention? Merge? Synthetic full?

The CPU maxes out when a 3TB+ backup copy job tries to run a synthetic full. I don't have any issues on backup jobs because I've disabled synthetic fulls there, and backup copy jobs under 3TB don't appear to cause a lockup. But I've got a 5TB+ file server whose synthetic full crashes the server any time I try to run it.

There's no way to disable the synthetic full part of a backup copy job, so I'm stuck with a few file servers that I can't run backup copy jobs on.
Cicadymn
Influencer
 
Posts: 19
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS 4k horror story

by kubimike » Thu Jul 06, 2017 11:44 pm

Interesting. How much RAM? What does your disk subsystem look like? I'm on 10k disks and a very fast controller, with 192 GB of RAM. Tell me all the registry keys you've set.

Re: REFS 4k horror story

by kb1ibt » Thu Jul 13, 2017 8:43 pm

Well, today I had two lockups. The first came at its normal time, during a GFS merge (after deletes). The second was more interesting: I had forgotten to disable a different job on the repo, and it started up after the reboot, before the ReFS delete cleanup had finished. When that job began to initialize storage for copying the VMs as part of the backup copy, it caused another lockup.
kb1ibt
Influencer
 
Posts: 12
Liked: never
Joined: Fri Apr 24, 2015 1:40 pm

Re: REFS 4k horror story

by lowlander » Sun Jul 16, 2017 5:53 am

Hi,

Is refs.sys 10.0.14393.1198 the latest driver, or should we use a driver that is available from MS support?

Are the following processing times expected, or is there room for improvement by changing registry keys?

Workload          Full    Incremental   Fast Clone full backup file merge time
Exchange          12 TB   120 GB        1h10m
Exchange          12 TB   140 GB        1h44m
File              16 TB   270 GB        1h30m
SQL               5 TB    25 GB         0h08m
Oracle            2 TB    15 GB         0h20m
Generic Workload  13 TB   113 GB        2h30m

When a copy job starts that uses the restore points above as its source, the merge time of the Fast Clone process increases (sometimes doubles). We try to avoid running backup jobs and copy jobs that use the same ReFS disk in parallel.

We have also seen event 7023 (Data Sharing Service) messages in the System log. We applied the recommendation on this website: http://www.neighborgeek.net/2017/05/ser ... rvice.html, but we are not sure whether it is related to ReFS :)

Thanks!
lowlander
Service Provider
 
Posts: 267
Liked: 20 times
Joined: Sun Dec 28, 2014 11:48 am

Re: REFS 4k horror story

by Cicadymn » Mon Jul 17, 2017 6:56 pm (3 people like this post)

I spoke with the ReFS team at Microsoft via Veeam Support and explained my issues to them (I'm still running 4K ReFS). My CPU maxes out on the experimental driver. Here's what they told me:

When ReFS team investigated memory issue we find that root cause for it is bug in cache manager which is part of kernel. We asked code owner for fix and in meantime implemented workaround fix in ReFS. This workaround fix solved problem for particular customer and probability for regression is smaller compared to kernel fix. This ReFS fix was backported by support team to Windows Server 2016 and released as official fix (I have check, but if you will apply latest updates ReFS code will contain it).

Kernel code owner fixed issue in cache manager. But as ReFS fix solved particular customer issue kernel fix wasn’t backported to Windows Server 2016. The backport happens when support team ask for it and proves business case. Even if I am probably able to start process it is beyond my knowledge and at end it will end with person responsible for your account (or Veeam) anyway. It will be kernel fix (new kernel binaries, not ReFS driver).

In other words – please ask your Microsoft support contact person for bug 9939237 fix to be backported to Windows Server 2016 (if you don’t have this contact ask Veeam to do it).

And then later I heard some more:

I’ve just talked with a couple folks who backport fixes to previous releases, and we’ll go ahead backporting this to RS1. If needed, I’ll follow up with you to solidify the business case of this backport, but at this time, we should have the information we need to backport this fix without any further action from you.

Backporting, unfortunately, takes time since each fix that is backported needs to be thoroughly tested for regressions. We’ll push for it to be backported quickly. Best case, this would be placed in the August patch, though I would plan for September.

So for those of us still on 4K ReFS, it looks like there may be a light at the end of the tunnel. They sound like they've got a good grasp of the problem, and even a fix in the works. We may need to play the waiting game a little bit longer, but I'm hoping we'll be able to be fully stable soon!

Re: REFS 4k horror story

by zuldan » Wed Jul 19, 2017 7:47 pm (2 people like this post)

It looks like Microsoft have released the official patch now.

Addressed performance issues in ReFS when backing up many terabytes of data.
Addressed issue where a stuck thread in ReFS might cause memory corruption.

https://support.microsoft.com/en-au/help/4025334
zuldan
Enthusiast
 
Posts: 44
Liked: 5 times
Joined: Wed Feb 15, 2017 9:51 am

Re: REFS 4k horror story

by kubimike » Wed Jul 19, 2017 8:55 pm

:D :) :P :mrgreen:

For those of us that are running the beta driver, how do we install this update ? Might sound like a silly question :?: :shock:

Re: REFS 4k horror story

by Cicadymn » Wed Jul 19, 2017 9:59 pm

zuldan wrote:It looks like Microsoft have released the official patch now.

Addressed performance issues in ReFS when backing up many terabytes of data.
Addressed issue where a stuck thread in ReFS might cause memory corruption.

https://support.microsoft.com/en-au/help/4025334


After installing the update, my refs.sys driver went from 10.0.14393.1198 to 10.0.14393.1532. I'll run it for a week or two without enabling the settings that cause my crashes. If everything stays stable, I'll see about enabling synthetic fulls again.
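
A minimal sketch for anyone who wants to script the driver check, assuming the version string has the dotted four-part form of the two builds quoted in this thread. The helper name `parse_version` is hypothetical, and how you read the version from refs.sys on a given host is deliberately left out.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '10.0.14393.1198' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Build numbers reported in this thread (before and after the update).
before_patch = parse_version("10.0.14393.1198")
after_patch = parse_version("10.0.14393.1532")

# Tuples compare element by element, so this is a numeric comparison
# rather than a string comparison.
driver_is_patched = after_patch >= parse_version("10.0.14393.1532")
print(before_patch < after_patch, driver_is_patched)
```

Comparing integer tuples avoids the trap of comparing the strings directly, where a shorter build number could sort after a longer one.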

Re: REFS 4k horror story

by Cicadymn » Wed Jul 19, 2017 10:04 pm

kubimike wrote::D :) :P :mrgreen:

For those of us that are running the beta driver, how do we install this update ? Might sound like a silly question :?: :shock:


Just run the update. It automatically updates the ReFS driver. I've also disabled test mode since it's back in the normal driver lineup.

Re: REFS 4k horror story

by kubimike » Wed Jul 19, 2017 10:41 pm

Is removing the registry keys necessary, or will they no longer apply and simply be ignored?

Re: REFS 4k horror story

by dmayer » Thu Jul 20, 2017 1:59 am

I hope this is coming to the Windows 10 Creators Update, as we have that deployed at two locations (ours being one of them) as a cost saving over Server 2016. Funnily enough, my home lab has Server 2016, haha.
dmayer
Service Provider
 
Posts: 12
Liked: 3 times
Joined: Fri Apr 21, 2017 6:16 pm
Full Name: Daniel Mayer

Re: REFS 4k horror story

by mkretzer » Thu Jul 20, 2017 8:51 am (4 people like this post)

I can't believe it, but in the first two tests this hotfix fixed our slow merge speed!

We have not done any deletes on the volume yet (that will be our next test), but backup merge speed still got slower with every week:

1 week after new ReFS + active full: 0:08:56
2 weeks after new ReFS + active full: 0:18:24
3 weeks after new ReFS + active full: 0:31:59
4 weeks after new ReFS + active full: 4:17:41

After patch / 5 weeks after new ReFS + active full: 0:09:03

And this after we just migrated 90 % of our backups back to NTFS :-(
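
The merge durations above can be put on a common scale by dividing each one by the week 1 figure. This is only a rough sketch: the labels and the `parse_duration` helper are hypothetical, and the durations are copied from this post as h:mm:ss values.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Merge durations as posted; the keys are illustrative labels only.
merge_times = {
    "week 1": "0:08:56",
    "week 2": "0:18:24",
    "week 3": "0:31:59",
    "week 4": "4:17:41",
    "week 5 (after patch)": "0:09:03",
}

baseline = parse_duration(merge_times["week 1"])

# Dividing one timedelta by another yields a plain float ratio.
slowdown = {
    label: parse_duration(value) / baseline
    for label, value in merge_times.items()
}

for label, factor in slowdown.items():
    print(f"{label}: {factor:.1f}x the week 1 merge time")
```

Tracking the ratio rather than the raw time makes it easier to spot the point where merges stop degrading gradually and fall off a cliff.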
mkretzer
Expert
 
Posts: 310
Liked: 69 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by ferrus » Thu Jul 20, 2017 9:08 am

I'm just about to migrate one of our Veeam repositories over to ReFS out of necessity, so this really is good news.
There are quite a few ReFS threads on here. I need to find out how many other existing issues there are to look out for.
ferrus
Veeam ProPartner
 
Posts: 127
Liked: 20 times
Joined: Thu Dec 03, 2015 3:41 pm
Location: UK
