-
- Expert
- Posts: 141
- Liked: 5 times
- Joined: Jan 27, 2010 9:43 am
- Full Name: René Frej Nielsen
- Contact:
Re: REFS 4k horror story
I enabled the RefsEnableLargeWorkingSetTrim registry key but not one of the others...
I have no experience with RAMMap... how do I set it up to record a sequence like you have created? The problem is that once the server is unresponsive, then I can't save anything before resetting the server.
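On the "can't save anything before resetting" problem, here is a minimal sketch of one generic workaround. It does not drive RAMMap at all. It simply appends one sample per line and forces each line to disk immediately, so whatever was recorded before the hang should still be there after the reset. `log_samples` and `sample_fn` are hypothetical names, not part of any tool mentioned in this thread; `sample_fn` stands for whatever memory reading you are able to collect. It is assumed that the log file sits on a volume other than the ReFS repository.

```python
import os
import time
from datetime import datetime, timezone


def log_samples(sample_fn, path, count, interval_s=60.0):
    """Append `count` timestamped samples to `path`, one per line.

    sample_fn is a caller-supplied (hypothetical) function returning the
    value to record, e.g. available memory in MB.
    """
    with open(path, "a", encoding="utf-8") as f:
        for _ in range(count):
            stamp = datetime.now(timezone.utc).isoformat()
            f.write(f"{stamp}\t{sample_fn()}\n")
            # Push the line out of Python's buffer and ask the OS to
            # write it to disk, so it survives a hard reset.
            f.flush()
            os.fsync(f.fileno())
            time.sleep(interval_s)
```

Called as, for example, `log_samples(my_sampler, r"C:\logs\mem.log", count=1440)`, this would record one sample a minute for a day; the path and the sampler are placeholders.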
-
- Influencer
- Posts: 23
- Liked: 4 times
- Joined: Apr 16, 2015 11:25 am
- Full Name: Hauke Ihnen
- Contact:
Re: REFS 4k horror story
More details on my issues with the freezing storage:
Registry keys from MS are not helping. They lower the RAM used by ReFS, but Windows still freezes. With all 3 options set, the used RAM stays at ~3GB max, not more.
Freezing always occurs under high usage of the array, for example multiple running jobs, compacting jobs, or heavy reading and writing at the same time (backups running + offsite backup to tape job).
Also, ReFS lost all its performance benefits after a few weeks; it has gotten very slow. It was only fast just after creation. It feels like a very heavily fragmented drive. It's not even using the full 1Gbit LAN now.
For me it is very clear - I will return to NTFS. No time to play beta tester for MS; backup must be reliable without spending hours upon hours on storage issues.
After yet another freeze today, let's hope that the drive will come up again so I can move the data away to another storage... at the moment it's freezing 5 minutes after each reboot.
And let's hope that moving the huge files away will not cause the storage to freeze again...
Edit: I was connected to the storage by RDP during the freeze, so I could see the task manager just before the box died: memory usage 10GB of 24GB. So it's not a memory issue for me... CPU load 100%, uptime 19 minutes. Yay!
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
rfn wrote: Yes, I'm using NIC teaming... I have a HP 10G NIC where I have teamed the two connectors and connected them to two HPE 5900 series switches that are stacked for redundancy.
I literally only have Windows Server, Veeam Backup & Replication and the HPE drivers and tools on the server. Nothing else... I also got the search result that you're linking to.
Are you agentless? Maybe that's HP's stuff doing that.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
Hauke wrote: Edit: I was connected to the storage during freeze by RDP, so I can see the task manager just before the box died: memory usage 10GB from 24GB. So it's not a memory issue for me... CPU Load 100%, Uptime 19 minutes. Jay!
You won't see ReFS memory usage in Task Manager. Also get with Gostev, IIRC he has some fix from Microsoft to try out.
-
- Expert
- Posts: 141
- Liked: 5 times
- Joined: Jan 27, 2010 9:43 am
- Full Name: René Frej Nielsen
- Contact:
Re: REFS 4k horror story
I'm not 100% sure but "HPE ProLiant Agentless Management Service" is installed... I have a second VBR server as a repository, on another site, and it has the same software installed, but doesn't get this error. That server is a DL380 Gen8.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
@RFN, damn dude you got me stumped. I've given you my bag of tricks to get a DL380 to work with veeam lol. The only thing left is we are running different windows patches.
For s****-n-giggles you could remove all Win updates except for the one that is there by default and install only the following:
kb3211320 + kb3213986
Oh and so you don't get crypt0rwared turn off SMB v1 in features
This is how I run my Veeam box.
-
- Expert
- Posts: 141
- Liked: 5 times
- Joined: Jan 27, 2010 9:43 am
- Full Name: René Frej Nielsen
- Contact:
Re: REFS 4k horror story
kubimike wrote:@RFN, damn dude you got me stumped. I've given you my bag of tricks to get a DL380 to work with veeam lol. The only thing left is we are running different windows patches.
My servers are fully patched... I have also updated to VBR update 2 today. I will see if it hangs again, and then I will implement the RefsNumberOfChunksToTrim registry key like you did, and see if that helps...
-
- Expert
- Posts: 141
- Liked: 5 times
- Joined: Jan 27, 2010 9:43 am
- Full Name: René Frej Nielsen
- Contact:
Re: REFS 4k horror story
kubimike wrote: @RFN, damn dude you got me stumped. I've given you my bag of tricks to get a DL380 to work with veeam lol. The only thing left is we are running different windows patches.
For s****-n-giggles you could remove all Win updates except for the one that is there by default and install only the following:
kb3211320 + kb3213986
Oh and so you don't get crypt0rwared turn off SMB v1 in features
This is how I run my Veeam box.
Interesting suggestion... but I really like to have my servers patched, and I'm pretty sure that our auditors would spank me if I did what you suggest.
I have really locked down the Windows Firewall on these boxes, so any ransomware would have to be very good to get into them! Unfortunately the firewall rules are "reset" by the VBR update 2 installation.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
@RFN
Can't try it to see if it works though ? Just for a few days ? All my prior recommendations + the Microsoft KBs have made my box stable for a few months now .
-
- Expert
- Posts: 141
- Liked: 5 times
- Joined: Jan 27, 2010 9:43 am
- Full Name: René Frej Nielsen
- Contact:
Re: REFS 4k horror story
It's an interesting suggestion... I will first see if it magically works now, and if not, then try the registry fix. If that doesn't work either, then I can try it, but I almost hope that it doesn't fix it, because we really should be able to patch our servers without it breaking stuff.
I'm also considering rebuilding the server to see if that error in the Application log goes away!
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
@RFN sounds like a plan. I'm excited to hear the result.
-
- Influencer
- Posts: 23
- Liked: 4 times
- Joined: Apr 16, 2015 11:25 am
- Full Name: Hauke Ihnen
- Contact:
Re: REFS 4k horror story
Hauke wrote: Also ReFS lost all its performance benefits after a few weeks, it's gotten very slow. It was only fast just after the creation.
Just to add numbers... I have two identical NAS devices. One with Windows Server 2016 and ReFS, the second one with 2012 R2 and NTFS. Same RAID config. Same hard disks.
Now copying the files from the ReFS box to the NTFS Box.
Speed: 50MB/s, not more. Reading only 1 file, no other load on the box!
Load:
Source (ReFS): constant 100%
Target (NTFS): 2-4% (and it's not a new drive, it's old and fragmented too, used before for Veeam for over 1 year)
...
Again, ReFS worked fine without issues for 2 months, but every day it got slower and slower. Maybe ReFS isn't a good choice for hard disks because of its heavy fragmentation, and it will work better on SSDs.
I don't think a simple patch from MS will solve that; it's by design.
-
- Enthusiast
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: REFS 4k horror story
I did test out the test Microsoft fix, but it did not help last night. I do not have CPU or memory problems, but my WMI monitoring has large gaps in my repository data. It's tough to say what causes it; when the instability started, there was only one merge running. Overall, I think at this point I need to turn off block cloning, move back to NTFS, or start using the block clone synthetic fulls.
suprnova wrote: I was hoping to avoid the issue by not using synthetic fulls, but this issue is also happening for incremental merges with block cloning. My CPU and RAM are fine, but during the merge I am unable to browse the Veeam repo drive in Windows.
I am fully patched and I have RefsEnableLargeWorkingSetTrim set to 1.
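For anyone wanting to reproduce the "RefsEnableLargeWorkingSetTrim set to 1" setting, the following is a rough sketch that only generates the text of a `.reg` file; it does not touch the registry itself. The key path `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem` is an assumption that does not come from this thread, so check it against Microsoft's ReFS guidance before importing anything. `reg_file_text` is a hypothetical helper name.

```python
# ASSUMPTION: this key path is not stated in the thread; verify it first.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"


def reg_file_text(values, key_path=KEY_PATH):
    """Build .reg file text that sets each name in `values` to a DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, dword in values.items():
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{dword:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(reg_file_text({"RefsEnableLargeWorkingSetTrim": 1}))
```

Generating the file rather than writing the value directly leaves a reviewable artifact, which may matter on audited servers. The other keys mentioned in this thread take values that are not given here, so they are left out.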
-
- Service Provider
- Posts: 28
- Liked: 11 times
- Joined: Oct 31, 2016 6:27 pm
- Full Name: Thomas Raabo
- Location: infrastructure guy
- Contact:
Re: REFS 4k horror story
News update.
Working with a new ReFS.sys driver from MS and everything seems much more stable.
Still too early to say anything ...... but! does seem to have a big effect on our setup.
-
- Service Provider
- Posts: 56
- Liked: 14 times
- Joined: Jan 10, 2012 8:53 pm
- Contact:
Re: REFS 4k horror story
Hauke wrote: Again, ReFS worked fine without issues for 2 months, but every day it got slower and slower. Maybe ReFS isn't a good choice for hard disks because of its heavy fragmentation, and it will work better on SSDs.
I don't think a simple patch from MS will solve that; it's by design.
But this wouldn't necessarily matter for customers using block storage, correct? I'm on a Compellent SC8000 & SCv2080.
-
- Service Provider
- Posts: 56
- Liked: 14 times
- Joined: Jan 10, 2012 8:53 pm
- Contact:
Re: REFS 4k horror story
One BIG reason I'm trying ReFS- corruption detection. But wouldn't a regular backup files health check on NTFS accomplish the same thing?
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS 4k horror story
Hauke wrote: Maybe ReFS isn't a good choice for hard disks because of its heavy fragmentation, and it will work better on SSDs.
There's no "heavy fragmentation" with ReFS, as Veeam blocks (which is what is being cloned) are quite large: 512KB on average with the default settings. With such a block size, even a single 7200rpm spindle should be able to do 30-50MB/s throughput on a 100% "fragmented" volume - while any backup storage will usually have multiple spindles and so much more I/O capacity. So the reason here is not fragmentation, but something else. Likely it is the impact of the core issue discussed here, because it looks like that issue simply keeps the entire volume overloaded and constantly busy.
Skyview wrote: One BIG reason I'm trying ReFS- corruption detection. But wouldn't a regular backup files health check on NTFS accomplish the same thing?
No, not the same - health check only checks and fixes (if needed) the latest restore point, while ReFS monitors the entire volume (including GFS backups etc.). Plus, with Storage Spaces, it is able to recover from corruption such as bit rot too - making it an awesome choice for long-term backup repositories.
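As a rough sanity check of the 30-50MB/s figure above, the sketch below works backwards from the quoted throughput to the number of random 512KB reads per second it implies. It assumes every block needs its own seek (the "100% fragmented" case) and that 1MB = 1024KB; `implied_iops` is a hypothetical helper name.

```python
def implied_iops(throughput_mb_s, block_kb):
    """Random reads per second needed to sustain a throughput at a block size."""
    return throughput_mb_s * 1024 / block_kb


for mb_s in (30, 50):
    print(f"{mb_s} MB/s at 512KB blocks = {implied_iops(mb_s, 512):.0f} random reads/s")
```

That comes to 60-100 random reads per second, which sits around the rule-of-thumb range usually quoted for a single 7200rpm drive (that range is an assumption here, not a figure from this thread).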
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS 4k horror story
thomas.raabo wrote: News update.
Working with a new ReFS.sys driver from MS and everything seems much more stable.
Still too early to say anything ...... but! does seem to have a big effect on our setup.
Awesome news! Let's observe it for a week or two now.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
new refs.sys, music to my monitor!
-
- Service Provider
- Posts: 56
- Liked: 14 times
- Joined: Jan 10, 2012 8:53 pm
- Contact:
Re: REFS 4k horror story
This thread is a bit deep, any more information on this?
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS 4k horror story
@Skyview, head to the restroom and read up .. all good stuff
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS 4k horror story
We've been testing a private fix from Microsoft with 6 affected customers, Thomas is one of those who volunteered when I asked for help a few pages ago. I think it makes sense to wait for feedback from all of them before expanding this effort.
-
- Service Provider
- Posts: 56
- Liked: 14 times
- Joined: Jan 10, 2012 8:53 pm
- Contact:
Re: REFS 4k horror story
Thanks for the update Gostev.
-
- Enthusiast
- Posts: 63
- Liked: 9 times
- Joined: Nov 29, 2016 10:09 pm
- Contact:
Re: REFS 4k horror story
@Gostev, interested in testing. Case #02173809. Our backup server froze just a few hours ago.
-
- Service Provider
- Posts: 28
- Liked: 11 times
- Joined: Oct 31, 2016 6:27 pm
- Full Name: Thomas Raabo
- Location: infrastructure guy
- Contact:
Re: REFS 4k horror story
Gostev wrote: We've been testing a private fix from Microsoft with 6 affected customers, Thomas is one of those who volunteered when I asked for help a few pages ago. I think it makes sense to wait feedback from all of them before expanding this effort.
Hi All.
This is the third day of testing the new ReFS.sys file, and our backup window has gone down by about 10 hours.
Right now we are not able to make the disk go "offline" in Explorer, and the disk counters do not stop working. It seems that this has had a big effect on our 4 repos with the patch, and Veeam jobs now process as expected.
We have a total of 600TB running this patch.
RAM is steady at 60GB, and performance does not seem to be affected by this patch.
And no lockups... before, we needed to reboot our repo almost every day to keep ReFS online.
Will keep you updated.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: REFS 4k horror story
Is it known when this hotfix will be distributed by MS?
-
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: REFS 4k horror story
I'm also testing the fix from MS, but see only minor improvements. Testing goes on...
@thomas.raabo: are you running the latest MS update for 2016? What about "RefsEnableLargeWorkingSetTrim", are you using it?
-
- Service Provider
- Posts: 28
- Liked: 11 times
- Joined: Oct 31, 2016 6:27 pm
- Full Name: Thomas Raabo
- Location: infrastructure guy
- Contact:
Re: REFS 4k horror story
No, this is a special hotfix that is not public.
Day 4 still no problems!
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS 4k horror story
JimmyO you have the same special fix.
Also, I don't know if Thomas has RefsEnableLargeWorkingSetTrim enabled, but it would not matter much for him because I know he has infinite RAM on his backup repository server... well, 256GB, that is.
-
- Service Provider
- Posts: 28
- Liked: 11 times
- Joined: Oct 31, 2016 6:27 pm
- Full Name: Thomas Raabo
- Location: infrastructure guy
- Contact:
Re: REFS 4k horror story
correct ..
My main issue is metadata changes resulting in the disk going offline. This does seem to happen on all metadata changes.
RefsEnableLargeWorkingSetTrim did help keep my system from crashing when deleting synthetic fulls, with max RAM usage around 80GB.