-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
So when veeam is doing a job at what point does it cause the CPU to max out ? Retention ? Merge ? Synthetic Full ?
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS 4k horror story
I will guess it is Retention.
-
- Enthusiast
- Posts: 26
- Liked: 12 times
- Joined: Jan 30, 2017 7:42 pm
- Full Name: Sam
- Contact:
Re: REFS 4k horror story
kubimike wrote: So when veeam is doing a job, at what point does it cause the CPU to max out? Retention? Merge? Synthetic Full?
The CPU maxes out when a 3TB+ backup copy job tries to run a synthetic full. I don't have any issues on backup jobs because I've disabled synthetic fulls, and it appears backup copy jobs under 3TB in size don't cause it to lock up. But I've got a 5TB+ file server whose synthetic full causes the server to crash any time I try to run it.
There's no way to disable the synthetic full side of a backup copy job, so I'm stuck with a few file servers that I can't run backup copy jobs on.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
Interesting. How much RAM? What does your disk subsystem look like? I'm on 10k disks with a very fast controller and 192 GB of RAM. Tell me all the registry keys you've set.
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Apr 24, 2015 1:40 pm
- Contact:
Re: REFS 4k horror story
Well, today I had two lockups. The first was at its normal time, during a GFS merge (after deletes). The second was more interesting: I had forgotten to disable a different job on the repo, which started up after the reboot (and before the ReFS delete cleanup had finished), so when that job began initializing storage to copy the VMs as part of the backup copy, it caused another lockup.
-
- Service Provider
- Posts: 453
- Liked: 30 times
- Joined: Dec 28, 2014 11:48 am
- Location: The Netherlands
- Contact:
Re: REFS 4k horror story
Hi,
Is refs.sys 10.0.14393.1198 the latest driver, or should we use a driver that is available from MS support?
Are the following figures expected processing times, or is there room for improvement by changing registry keys?
Exchange - 12TB full - 120GB inc - fast clone full backup file merge time 1h10m
Exchange - 12TB full - 140GB inc - fast clone full backup file merge time 1h44m
File - 16TB full - 270GB inc - fast clone full backup file merge time 1h30m
SQL - 5TB full - 25GB inc - fast clone full backup file merge time 0h08m
Oracle - 2TB full - 15GB inc - fast clone full backup file merge time 0h20m
Generic Workload - 13TB full - 113GB inc - fast clone full backup file merge time 2h30m
When a copy job that uses the restore points above as its source is running, the merge time of the fast clone process increases (sometimes doubles). We try to avoid running backup jobs and copy jobs that use the same ReFS disk in parallel.
We have also seen event 7023 messages (Data Sharing Service) in the system log. We have applied the recommendation on this website: http://www.neighborgeek.net/2017/05/ser ... rvice.html, but we are not sure whether it is related to ReFS.
Thanks!
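For reference, a quick way to check which refs.sys build a repository server is actually running (a minimal sketch, assuming the default Windows driver path) is:
Code: Select all
# Report the file version of the loaded ReFS driver (assumes the default install path)
(Get-Item "$env:SystemRoot\System32\drivers\refs.sys").VersionInfo.FileVersion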
-
- Enthusiast
- Posts: 26
- Liked: 12 times
- Joined: Jan 30, 2017 7:42 pm
- Full Name: Sam
- Contact:
Re: REFS 4k horror story
I spoke with the ReFS team from Microsoft via Veeam Support and explained my issues to them (still running 4K ReFS; my CPU maxes out on the experimental driver). Here's what they told me:
When the ReFS team investigated the memory issue, we found that the root cause is a bug in the cache manager, which is part of the kernel. We asked the code owner for a fix and in the meantime implemented a workaround fix in ReFS. This workaround solved the problem for that particular customer, and the probability of regression is smaller compared to the kernel fix. The ReFS fix was backported by the support team to Windows Server 2016 and released as an official fix (I have checked; if you apply the latest updates, the ReFS code will contain it).
The kernel code owner has fixed the issue in the cache manager. But because the ReFS fix solved that particular customer's issue, the kernel fix wasn't backported to Windows Server 2016. A backport happens when the support team asks for it and proves a business case. Even if I could probably start the process, it is beyond my remit, and in the end it will land with the person responsible for your account (or Veeam) anyway. It will be a kernel fix (new kernel binaries, not a ReFS driver).
In other words: please ask your Microsoft support contact for the bug 9939237 fix to be backported to Windows Server 2016 (if you don't have such a contact, ask Veeam to do it).
And then later I heard some more:
I've just talked with a couple folks who backport fixes to previous releases, and we'll go ahead with backporting this to RS1. If needed, I'll follow up with you to solidify the business case for this backport, but at this time we should have the information we need to backport this fix without any further action from you.
Backporting, unfortunately, takes time, since each fix that is backported needs to be thoroughly tested for regressions. We'll push for it to be backported quickly. Best case, this would land in the August patch, though I would plan for September.
So for those of us still on 4K ReFS, it looks like there may be a light at the end of the tunnel. They sound like they've got a good grasp of the problem, and even a fix in the works. We may need to play the waiting game a little bit longer, but I'm hoping we'll be fully stable soon!
-
- Enthusiast
- Posts: 45
- Liked: 5 times
- Joined: Feb 15, 2017 9:51 am
- Contact:
Re: REFS 4k horror story
It looks like Microsoft have released the official patch now.
https://support.microsoft.com/en-au/help/4025334
Code: Select all
Addressed performance issues in ReFS when backing up many terabytes of data.
Addressed issue where a stuck thread in ReFS might cause memory corruption.
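As a quick sanity check (assuming the KB number from the link above is the one to look for), you can confirm the update is actually installed with:
Code: Select all
# Check whether the July 2017 update rollup referenced above is installed
Get-HotFix -Id KB4025334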
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
For those of us running the beta driver, how do we install this update? Might sound like a silly question.
-
- Enthusiast
- Posts: 26
- Liked: 12 times
- Joined: Jan 30, 2017 7:42 pm
- Full Name: Sam
- Contact:
Re: REFS 4k horror story
zuldan wrote: It looks like Microsoft have released the official patch now. https://support.microsoft.com/en-au/help/4025334
After installing the update, my refs.sys driver went from 10.0.14393.1198 to 10.0.14393.1532. I'll run it for a week or two without enabling the settings that cause my crash; if everything stays stable, I'll see about enabling synthetic fulls again.
-
- Enthusiast
- Posts: 26
- Liked: 12 times
- Joined: Jan 30, 2017 7:42 pm
- Full Name: Sam
- Contact:
Re: REFS 4k horror story
kubimike wrote: For those of us running the beta driver, how do we install this update?
Just run the update. It automatically updates the ReFS driver. I've also disabled test mode, since the driver is back in the normal lineup.
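For anyone who enabled test signing to load the unsigned beta driver, a rough sketch of turning it back off (assuming it was enabled via bcdedit in the first place; a reboot is required for the change to take effect):
Code: Select all
# Disable the test-signing mode that was needed for the unsigned beta refs.sys, then reboot
bcdedit /set testsigning off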
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
Is reg key removal necessary, or will those keys no longer apply and simply get ignored?
-
- Influencer
- Posts: 18
- Liked: 9 times
- Joined: Apr 21, 2017 6:16 pm
- Full Name: Daniel Mayer
- Contact:
Re: REFS 4k horror story
I hope this is coming to the Windows 10 Creators Update as well, since we have that deployed at two locations (ours being one of them) as a cost saving over Server 2016. Funnily enough, my home lab runs Server 2016, haha.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: REFS 4k horror story
I can't believe it but in the first two tests this hotfix fixed our slow merge speed!
We have not done any deletes on the volume yet (that will be our next test), but backup merge speed had nevertheless been getting slower every week:
1 week after new REFS+active full: 0:08:56
2 week after new REFS+active full: 0:18:24
3 week after new REFS+active full: 0:31:59
4 week after new REFS+active full: 4:17:41
After Patch / 5 weeks after new REFS+active full: 0:09:03
And this after we just migrated 90 % of our backups back to NTFS
-
- Veeam ProPartner
- Posts: 300
- Liked: 44 times
- Joined: Dec 03, 2015 3:41 pm
- Location: UK
- Contact:
Re: REFS 4k horror story
I'm just about to migrate one of our Veeam repositories over to ReFS out of necessity - so this really is good news.
There are quite a few ReFS threads on here - I need to find out how many other existing issues there are to look out for.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: REFS 4k horror story
@ferrus We have seen four issues:
- Crash by high memory usage: Especially with 4k, Per-VM. Fixed in our system by increasing RAM to 384 GB. Should be fixed by the update as well.
- Slow merges, every week it gets slower: Seems to be fixed by the update.
- Active fulls running at the same time as merges are EXTREMELY slow (~1-2 MB/s instead of 400+ MB/s): We will test this tomorrow.
- Deletes cause REFS to get slower in general: The first test seems to suggest that this is also fixed by the update.
-
- Veeam ProPartner
- Posts: 300
- Liked: 44 times
- Joined: Dec 03, 2015 3:41 pm
- Location: UK
- Contact:
Re: REFS 4k horror story
The server we're migrating is a single job/single VM repository, albeit 18TB in size. So it looks as though the only untested issue won't be a problem for us.
This gives me a lot more confidence.
I wonder if this (after testing) will affect Veeam's recommendation of 64K vs 4K?
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: REFS 4k horror story
Any comment on the 3 registry keys that came with the test hotfix? Are they still needed with this latest update?
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: REFS 4k horror story
mkretzer wrote: I can't believe it but in the first two tests this hotfix fixed our slow merge speed!
Sounds good - I'm testing right now. Do you use any ReFS registry settings?
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: REFS 4k horror story
Only the most basic RefsEnableLargeWorkingSetTrim.
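For anyone wondering where that value lives, a minimal sketch of checking and setting it (assuming the location Microsoft documents for the ReFS tuning keys under the FileSystem control key; verify against MS guidance before applying in production):
Code: Select all
# Inspect the current setting (throws if the value has never been created)
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name RefsEnableLargeWorkingSetTrim
# Enable trimming of the ReFS metadata working set (DWORD 1 = enabled)
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name RefsEnableLargeWorkingSetTrim -Value 1 -Type DWord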
-
- Enthusiast
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: REFS 4k horror story
Initial results with the MS update have been positive for us as well; overall incremental backup speeds have improved. I removed the hotfix registry keys and am also just using RefsEnableLargeWorkingSetTrim.
-
- Enthusiast
- Posts: 26
- Liked: 12 times
- Joined: Jan 30, 2017 7:42 pm
- Full Name: Sam
- Contact:
Re: REFS 4k horror story
ferrus wrote: I'm just about to migrate one of our Veeam repositories over to ReFS out of necessity - so this really is good news.
Just be sure to use a 64K block size on ReFS. I believe 4K is the default, so you'll need to change it to 64K. People on 64K have had a lot fewer issues than those of us on 4K.
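To illustrate the 64K point (a sketch only - the drive letter and label are placeholders, and formatting destroys any existing data on the volume), specifying the cluster size when formatting the repository volume looks like this:
Code: Select all
# Format the repository volume as ReFS with a 64K cluster size (wipes existing data on R:)
Format-Volume -DriveLetter R -FileSystem ReFS -AllocationUnitSize 65536 -NewFileSystemLabel "VeeamRepo"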
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
Anyone else end up with ReFS version '10.0.14393.1532'?
-
- Service Provider
- Posts: 315
- Liked: 41 times
- Joined: Feb 02, 2016 5:02 pm
- Full Name: Stephen Barrett
- Contact:
Re: REFS 4k horror story
Yep, that's what I have too.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
Well, I guess MSFT typed the wrong number in the CSV file; it lists '10.0.14393.1408'.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
The new patch is a flop - as soon as retention hits, say goodnight, Dick. Going back to the beta driver. UGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
The computer has rebooted from a bugcheck. The bugcheck was: 0x00000133 (0x0000000000000001, 0x0000000000001e00, 0xfffff801e1019540, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 0819dd2a-6290-44e4-9bf4-25363b2282d6.
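For what it's worth, bugcheck 0x133 is DPC_WATCHDOG_VIOLATION. If anyone wants to see which driver the watchdog caught, a rough sketch of opening that dump with the Debugging Tools for Windows (the kd.exe path will vary with your SDK/WDK install) is:
Code: Select all
# Open the kernel dump and let the debugger summarize the DPC watchdog violation, then quit
& 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\kd.exe' -z 'C:\Windows\MEMORY.DMP' -c '!analyze -v; q'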
-
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: REFS 4k horror story
Same for me - the patch makes little or no difference when doing the daily merges of my increments.
-
- Service Provider
- Posts: 454
- Liked: 86 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: REFS 4k horror story
If this patch really fixed the problems seen in this thread - the ones Veeam has been so actively trying to resolve with MS - I think we would have seen some sort of announcement through Gostev, something like "we're getting close to a fix... hold on". It would be quite special to have a patch (or THE patch) drop from the sky like this.
Or did I miss something (possible, with a thread this large)?
Veeam Certified Engineer
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: REFS 4k horror story
@JimmyO What kind of issue did you have - Slow merges or crashes?
For us the patch really seems to fix the slowness, even when an active full is running at the same time! For the first time, the target storage shows 100% utilisation on all drives - in the past the backend did not write much and ReFS just hung there at a very low write speed.
-
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: REFS 4k horror story
No crashes - only slow merges and unresponsive disks (for about 30 seconds at a time).
I can see some difference in merge time - I estimate a 25% increase in performance. What used to take 20 hours now takes 15 (at least after the first run).
Still - when I started to use ReFS (first week) it took 1 hour...