Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS 4k horror story

Post by Gostev »

JaySt wrote:If this patch would fix the problems seen in this thread, for which Veeam is so actively trying to come up with a fix with MS, I think we would have seen some sort of announcement through Gostev, maybe something like "we're getting close to a fix.. hold on...". It would be quite special to have a patch (or THE patch) drop down from the sky like this.
Correct :D As per the quoted response from Microsoft support on the previous page, they are currently porting the hotfix to the current branch.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

So... if I understand correctly, they solved the performance issues for some (us included) but not the crash issue that others without enough RAM have?
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS 4k horror story

Post by Gostev »

Excessive RAM usage was the very first issue they solved.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

An observation worth mentioning: even though I'm running the latest patch with the beta ReFS driver, I am seeing improved speed with fast clones. It went from 5 hours back to roughly 2-1/2 hours. I'm not sure what else they fixed, but it's an improvement for sure!
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS 4k horror story

Post by mkretzer »

We ported all our backups back to REFS last weekend. One thing worth mentioning is that the active fulls were really fast. I checked the statistics and most active fulls were even faster than with NTFS on the same storage! It was completely different with the "old" REFS driver where it was always much slower than with NTFS!
pesos
Expert
Posts: 205
Liked: 17 times
Joined: Nov 12, 2014 9:40 am
Full Name: John Johnson
Contact:

Re: REFS 4k horror story

Post by pesos »

Trying to get up to speed here, as we have been seeing crashes on our on-host backups recently. At this point, is there anything we need to do other than fully patch our hosts and Veeam servers via Windows Update?
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Crucial question: WHEN does it crash? Tell us and we can guide you. Match the job timing against what is happening, take a look at the logs, and report back.
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: REFS 4k horror story

Post by lepphce1 »

I received the following reply from Microsoft support regarding my 0x133 BSOD events.
Due to complexity of the code changes required to handle the hard WS limit.

At this point, you should check with the backup software vendor if they are using the API to set the hard limit on WS and if affirmative, then recommend them not to do so.

NtSetSystemInformation API called with flag MM_WORKING_SET_MAX_HARD_ENABLE.
This API requires that the caller has the SeIncreaseQuotaPrivilege privilege.

Kindly check with your backup vendor if they are using the above-mentioned API.
I opened a ticket with Veeam support, and was told to pose the question to the forums so Tier 3 or a developer could address it.
mkaec
Veteran
Posts: 462
Liked: 133 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: REFS 4k horror story

Post by mkaec »

lepphce1 wrote:I opened a ticket with Veeam support, and was told to pose the question to the forums so Tier 3 or a developer could address it.
That doesn't seem right. I love that tier 3 and development participate in the forums, but if support cannot communicate internally with them, that's pretty dysfunctional.
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: REFS 4k horror story

Post by lepphce1 »

mkaec wrote: That doesn't seem right. I love that tier 3 and development participate in the forums, but if support cannot communicate internally with them, that's pretty dysfunctional.
I thought it was odd as well.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS 4k horror story

Post by Gostev » 1 person likes this post

mkaec wrote:I love that tier 3 and development participate in the forums
Except they don't... but in any case, the response would be totally unacceptable even if they were here. Sorry about that - I've been fighting this behavior as well, but it does keep happening. I am not sure what's up with that. I have forwarded this to the support management for review.
j.forsythe
Influencer
Posts: 15
Liked: 4 times
Joined: Jan 06, 2016 10:26 am
Full Name: John P. Forsythe
Contact:

Re: REFS 4k horror story

Post by j.forsythe »

Hi there.
I updated the server and I am running on ReFS 10.0.14393.1532.
At the moment it seems like it is really slow (compared to NTFS), but the merge to synthetic full seems to be working without locking the server.
I have a ReFS iSCSI target on a Synology NAS and it has been running like a charm for months; it ain't super fast, but it is doing its job.
I have the strange feeling that Storage Spaces is causing some issues, so I will change the RAID setup.
At the moment I am running Storage Spaces and I want to change it to a HP RAID.

There was a post where someone had experienced problems with a certain cluster size when creating the new RAID, but I cannot find it anymore.
Could that someone please point me to the right settings? BIG thank you!

I will keep you guys updated.
John
antipolis
Enthusiast
Posts: 73
Liked: 9 times
Joined: Oct 26, 2016 9:17 am
Contact:

Re: REFS 4k horror story

Post by antipolis »

Referring to this? https://www.virtualtothecore.com/en/vee ... ripe-size/

storage spaces is a disaster imho
dellock6
VeeaMVP
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS 4k horror story

Post by dellock6 »

I would not call Storage Spaces a disaster, but so far it has indeed proved to be slower than any hardware RAID solution. For now, we are suggesting that customers who want high performance use hardware RAID and avoid Storage Spaces, especially its parity modes. You lose self-healing of blocks without mirror or parity modes, but you still have scrubbing, and you get higher performance. A good hardware RAID card is one of the first design choices to make when building a server repository.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: REFS 4k horror story

Post by lepphce1 »

Gostev wrote:Except they don't... but in any case, the response would be totally unacceptable even if they were here. Sorry about that - I've been fighting this behavior as well, but it does keep happening. I am not sure what's up with that. I have forwarded this to the support management for review.
@Gostev I sent you my case number via PM. Thanks!
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Hello, another Patch Tuesday has come and gone. When can we expect to see the beta ReFS driver rolled into a production patch?
vadm1018
Lurker
Posts: 2
Liked: 1 time
Joined: Aug 08, 2014 2:20 pm
Contact:

Re: REFS 4k horror story

Post by vadm1018 » 1 person likes this post

I'm not using Veeam, but I am using REFS for another backup system. I noticed one of the backup systems crashing weekly, and then the crashes became more frequent. Long story short, after applying the July 2017 update, which had REFS fixes, the system continued crashing (it may have gotten even worse). Looking at the minidump, it was still reporting the culprit as REFS. I logged a case with their premier support and they confirmed it to still be an issue with REFS. They said it was fixed in Windows 10 but needs to be backported to Server 2016. They didn't give me an ETA when I asked for one, but I would think it's still at least 1 patch cycle away.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS 4k horror story

Post by kubimike »

Same here: I tried the latest patch and all hell broke loose. I went back to the beta ReFS driver and it's more stable, but not 100%. No word from Veeam so far.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: REFS 4k horror story

Post by Delo123 »

antipolis wrote:Referring to this? https://www.virtualtothecore.com/en/vee ... ripe-size/

storage spaces is a disaster imho
Can't say it is. We have been using Storage Spaces since day 1 and never had an SS-related issue. However, we only use "simple" and mirrored layouts, backed by Adaptec RAIDs. The biggest benefit is being able to create thin volumes from them; we use these to have multiple 64TB volumes to use dedupe on.
j.forsythe
Influencer
Posts: 15
Liked: 4 times
Joined: Jan 06, 2016 10:26 am
Full Name: John P. Forsythe
Contact:

Re: REFS 4k horror story

Post by j.forsythe »

Hello again.
I have finally been able to reconfigure our backup server.
For a week I used a Synology NAS as a ReFS iSCSI target; after that week I finally killed the StorageSpaces "RAID" and went back to the good old HP Smart Array RAID5.
I did the first full backup over the weekend and will see if Windows 2016 (with all current patches) runs smoothly.
Before, it froze after about two weeks; that was on local SAS drives with StorageSpaces.
And I have a couple of numbers that I wanted to share:

iSCSI NTFS
94MB/s
5.2TB transferred 18h:42m

SAS ReFS (StorageSpaces)
20MB/s
5.2TB transferred 75h:52m

iSCSI ReFS
80MB/s
5.2TB transferred 10h:45m

SAS ReFS (HP RAID)
140MB/s :D
5.2TB transferred 10h:45m
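
As a rough cross-check for figures like these, here is a minimal Python sketch that converts a sustained rate into an expected transfer time. It assumes decimal units (1 TB = 1,000,000 MB) and a constant rate for the whole job; the function names are made up for illustration. The rate a backup job reports may be averaged differently from simple size divided by elapsed time, so the results will not necessarily match the durations listed above.

```python
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at a constant rate_mb_s (decimal units)."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (size_tb * 1_000_000) / rate_mb_s  # assumption: 1 TB = 1,000,000 MB
    return seconds / 3600


def format_hm(hours: float) -> str:
    """Format fractional hours in the same 'NNh:MMm' style used in the post."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}h:{total_minutes % 60:02d}m"


if __name__ == "__main__":
    # Rates as reported in the post, in MB/s; the size is the 5.2TB that was transferred.
    reported_rates = {
        "iSCSI NTFS": 94,
        "SAS ReFS (StorageSpaces)": 20,
        "iSCSI ReFS": 80,
        "SAS ReFS (HP RAID)": 140,
    }
    for label, rate in reported_rates.items():
        print(f"{label}: {format_hm(transfer_hours(5.2, rate))} at {rate} MB/s")
```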

Regards,
John
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: REFS 4k horror story

Post by Delo123 »

Since you are now running RAID 5 on your HP, I assume you went for Parity with Storage Spaces. Bad choice... SS Parity is the worst implementation of RAID ever. However, SS Mirror works beautifully. Since your backups are over 5TB, I assume your repository is quite big. Do you really trust RAID 5 when a disk fails?
j.forsythe
Influencer
Posts: 15
Liked: 4 times
Joined: Jan 06, 2016 10:26 am
Full Name: John P. Forsythe
Contact:

Re: REFS 4k horror story

Post by j.forsythe » 1 person likes this post

Hey.
Yes I am running RAID 5.
One full backup uses around 7+TB and I should keep at least 2 weeks on disk and that DL380 is completely packed with disks. :D
But I configured one additional disk as HotSpare and if the gates of Hell open and that server dies, I still have LTO tapes of the last 3 weeks. :twisted:
What would you have recommended, RAID6?

Regards,
John
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: REFS 4k horror story

Post by Delo123 » 1 person likes this post

Some years ago we were running 2 storage arrays with local RAID 5 groups. These 2 arrays were mirrored to each other by network RAID 1. Everyone said we were crazy to do that, using so much space for redundancy. Guess what? We still had downtime and data loss... Personally, I made the mistake of using RAID 5 and RAID 6 again and again. In the end, the only RAID level that never had issues was mirroring; everything else has failed for some reason, whether a disk dying or even parity errors during restores. I would always do RAID 10 now on local arrays, no matter what... You know, everything works until you really really need it (and then the gates of Hell open, as you already said).
dellock6
VeeaMVP
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS 4k horror story

Post by dellock6 » 1 person likes this post

Honestly, with single parity and disks larger than 2TB, you are asking for trouble. UREs (unrecoverable read errors) are a thing, and large disks are often SATA, which have even fewer error control functions. So you have a system that is more prone to errors, and on top of that you have ZERO protection for the entire time it takes to rebuild a RAID group. And with large disks, rebuild times are loooooooong. A hot spare does allow the rebuild to start immediately, but that's not enough in my opinion. I want at least double parity.
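
To put a rough number on the URE concern, here is a minimal Python sketch of the usual back-of-the-envelope estimate. Everything in it is an assumption for illustration: the function name, the 8 x 4TB example array, the datasheet-style rate of one error per 10^14 bits, and the simplification that errors are independent. Real drives often do better than their spec sheet, so treat the result as a pessimistic bound, not a prediction.

```python
import math


def p_ure_during_rebuild(surviving_disks: int, disk_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    every surviving disk in full, as a single-parity rebuild has to do.

    Assumptions: independent errors, decimal terabytes, and a datasheet-style
    URE rate (1e-14 per bit is a figure often quoted for SATA drives).
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - rate) ** bits, computed via log1p/expm1 for numerical stability.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical 8 x 4TB single-parity group with one failed disk: 7 disks must be read.
    for rate in (1e-14, 1e-15):
        p = p_ure_during_rebuild(7, 4.0, rate)
        print(f"URE rate {rate:g}/bit -> P(at least one URE) = {p:.1%}")
```

With single parity, one such error during the rebuild is enough to lose data in the affected stripe, which is why the second parity disk matters.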
Just so you know, we are suggesting that all customers designing server-based repositories go for a good hardware RAID card (I agree with Guido, SS parity is too slow...) and choose a double-parity configuration, like RAID 6 (mostly for small groups) or RAID 60 (for large disk sets). RAID 10 does eliminate parity issues, but it's not exactly like having double protection: you cannot lose "any" two disks, only some combinations. And again, during the rebuild you are totally unprotected in that specific RAID 1 subgroup (hence our suggestion of RAID 60 in many designs).
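
The point about "any" two disks can be checked by brute force. This is a small Python sketch under simplified assumptions: an 8-disk array, RAID 10 modeled as mirror pairs (0,1), (2,3), and so on, and RAID 6 modeled as a single group that tolerates any two failures. The function names are made up for illustration and nothing here reflects a specific controller.

```python
from itertools import combinations


def raid10_survives(failed: set, disks: int) -> bool:
    """RAID 10 survives as long as no mirror pair (0,1), (2,3), ... is lost completely."""
    return all(not ({a, a + 1} <= failed) for a in range(0, disks, 2))


def raid6_survives(failed: set, disks: int) -> bool:
    """A single RAID 6 group survives any failure of at most two disks."""
    return len(failed) <= 2


def count_survivable(disks: int, failures: int, survives) -> tuple:
    """Return (survivable combinations, total combinations) for simultaneous failures."""
    combos = list(combinations(range(disks), failures))
    ok = sum(1 for combo in combos if survives(set(combo), disks))
    return ok, len(combos)


if __name__ == "__main__":
    for name, rule in (("RAID 10", raid10_survives), ("RAID 6", raid6_survives)):
        ok, total = count_survivable(8, 2, rule)
        print(f"{name}: survives {ok} of {total} two-disk failure combinations")
```

For 8 disks there are 28 possible two-disk failures. RAID 6 survives all of them, while RAID 10 loses data in the 4 cases where both halves of one mirror fail.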
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

OK - I am now officially giving up on ReFS. I initially worked with Veeam, and later with the ReFS development team at MS, for the last 6 months, and the results are not very good.
I tested all possible registry settings, different versions of refs.sys, and different versions of the NTFS core files, but performance is still a lot worse than with NTFS.
While ReFS seems to be a good idea for smaller backups, it's a disaster for larger ones. My VBKs are about 60TB and my daily backups are 6TB.
I managed to get merge times to almost the same as when using NTFS, but disk load during backup is way higher with ReFS compared to NTFS. Disk queue length during backup went from 0.05 to 1+.
Now I have a 3+ week migration back to NTFS ahead of me :/
JaySt
Service Provider
Posts: 415
Liked: 75 times
Joined: Jun 09, 2015 7:08 pm
Full Name: JaySt
Contact:

Re: REFS 4k horror story

Post by JaySt »

@JimmyO, how about the latest patch, currently being backported to Windows Server 2016, mentioned a few posts back? No hope for that one?
Veeam Certified Engineer
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

JaySt wrote:@JimmyO, how about the latest patch, currently being backported to Windows Server 2016, mentioned a few posts back? No hope for that one?
That's the one that made merge time almost the same as NTFS. Better, but not good enough...
JaySt
Service Provider
Posts: 415
Liked: 75 times
Joined: Jun 09, 2015 7:08 pm
Full Name: JaySt
Contact:

Re: REFS 4k horror story

Post by JaySt »

Thanks for trying for the last 6 months and reporting back, seriously appreciated. It's good to have this kind of feedback from the real world and to have people try this new tech. It's really disappointing to see this technology and solution fail for such a long time.
Veeam Certified Engineer
thomas.raabo
Service Provider
Posts: 28
Liked: 11 times
Joined: Oct 31, 2016 6:27 pm
Full Name: Thomas Raabo
Location: infrastructure guy
Contact:

Re: REFS 4k horror story

Post by thomas.raabo » 1 person likes this post

JimmyO wrote:OK - I am now officially giving up on ReFS. I initially worked with Veeam, and later with the ReFS development team at MS, for the last 6 months, and the results are not very good.
I tested all possible registry settings, different versions of refs.sys, and different versions of the NTFS core files, but performance is still a lot worse than with NTFS.
While ReFS seems to be a good idea for smaller backups, it's a disaster for larger ones. My VBKs are about 60TB and my daily backups are 6TB.
I managed to get merge times to almost the same as when using NTFS, but disk load during backup is way higher with ReFS compared to NTFS. Disk queue length during backup went from 0.05 to 1+.
Now I have a 3+ week migration back to NTFS ahead of me :/
Hi JimmyO

Maybe I could dump my settings from the registry so we can compare.
With the ReFS beta driver my problems have gone away.

My setup is as follows.
4xUCS C3260M4
256GB Ram
64x10TB running RAID 6 on a Cisco HW controller.
2620v4 CPU

Doing around 40 TB backup.
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS 4k horror story

Post by JimmyO »

Thanks Thomas,

I use a HP server, DL380Gen8 High Performance. 2x2650 CPU, 192 GB ram , 4x raid setups of 35x4TB using raid 6 on HP P431 Controller.

Pls. let me know your current registry keys/settings (at the moment I have removed them all..).
Also - are you running the latest cumulative Windows patch (2017-08, KB4034658)?