REFS 4k horror story

Re: REFS 4k horror story

by rendest » Mon Feb 06, 2017 2:12 pm

mkretzer wrote: No. In his mail there was not a single mention of the 4K cluster size... That is also a reason I am kind of cautious about this recommendation.

We more or less found a way to keep the filesystem from getting stuck as shown in the screenshot below.

Image

Edit: Yes, it looks like I just drew a white box, but it's actually whitespace where the Windows kernel forgets the disk is ReFS.

We noticed that the backup repos (now freshly formatted with a 64 KB cluster size) still cause the filesystem to become unresponsive, so we tried throttling the repositories to a lower throughput. Surprisingly, this significantly improved our performance. Since the volume no longer becomes unresponsive, Veeam can now back up consistently without being interrupted.

The throughput is still significantly lower than it would have been on NTFS (we are going to log a case for this as well), but at least it's stable.

We suspect that the smaller cluster size on the previously formatted volume caused the volume to get stuck even faster (in fact, 16x faster). We are backing up from all-flash storage arrays, so our bottleneck is almost always our destination storage target.
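For reference, the 16x figure matches the ratio between the two cluster sizes mentioned in this thread (4 KB and 64 KB). The snippet below is only a sketch of that arithmetic; the helper name `clusters_needed` and the 1 TiB example size are made up for illustration, and it says nothing about how ReFS actually tracks clusters internally.

```python
# Cluster sizes in bytes, taken from the thread (4 KB old, 64 KB new).
OLD_CLUSTER = 4 * 1024
NEW_CLUSTER = 64 * 1024


def clusters_needed(data_bytes: int, cluster_bytes: int) -> int:
    """Number of clusters needed to hold data_bytes, rounded up."""
    return -(-data_bytes // cluster_bytes)


# Ratio between the two cluster sizes.
ratio = NEW_CLUSTER // OLD_CLUSTER

# Hypothetical example size: the same 1 TiB of backup data needs
# `ratio` times as many clusters at 4 KB as at 64 KB.
one_tib = 1024 ** 4
small = clusters_needed(one_tib, OLD_CLUSTER)
large = clusters_needed(one_tib, NEW_CLUSTER)

print(ratio)
print(small // large)
```

Both printed values are the same ratio, so for the same amount of data the 4 KB volume has that many more clusters to keep track of.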

We are currently monitoring the incoming I/O, and as soon as it reaches its limit and causes storage latency on the backup target, the filesystem becomes unresponsive. So throttling temporarily circumvents this issue. This, however, isn't a permanent solution, since the filesystem should keep working even with storage latency. A 20-30 ms hiccup on the storage LUN causes 20 seconds of unresponsiveness on the ReFS volume, which in turn brings Veeam to a halt...

@Mkretzer, have you tried throttling as well?
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by ivordillen » Mon Feb 06, 2017 3:36 pm

Maybe a confirmation.

I have 2 repos with a 64 KB cluster size and was testing some backup jobs and some backup copy jobs. Everything was acting normal until I ran 2 backup copy jobs at the same time (to the same repo). Then I saw drops (in the Veeam job timeline) in both jobs at the same time. In the Windows Resource Monitor I saw a lot of writes at disk level but no file (and the memory consumption went straight up). Stopping one of the jobs allowed the other job to proceed as normal.

Ivor
ivordillen
Enthusiast
 
Posts: 59
Liked: never
Joined: Thu Nov 03, 2011 2:55 pm
Full Name: Ivor Dillen

Re: REFS 4k horror story

by mkretzer » Mon Feb 06, 2017 4:10 pm

@rendest No, we did not try throttling, as our main problem was that with 4K and without the (bad) patch the whole system crashes. But you might be right, as the unresponsiveness happened as soon as there was some kind of load.

Do you already have KB3216755 installed? I have the feeling this update makes the volume much more stable under load, but it crashes Veeam services after 12 hours or so...
mkretzer
Expert
 
Posts: 251
Liked: 61 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by rendest » Mon Feb 06, 2017 4:12 pm

Good to hear, so now it's up to Veeam to patch whatever Microsoft broke. Since you mentioned it causes Veeam to crash, and our setups are identical, we are waiting for feedback from Veeam before attempting to install the patch.
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by rendest » Wed Feb 08, 2017 9:17 am

KB4010672 doesn't seem to fix the time-out issues when experiencing latency... so throttling it is for now :(
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by mkretzer » Wed Feb 08, 2017 10:31 am

But can you throttle a fast-clone? In our system the fast-clone still led to high load and the problem described here.
mkretzer
Expert
 
Posts: 251
Liked: 61 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by rendest » Wed Feb 08, 2017 10:36 am

mkretzer wrote: But can you throttle a fast-clone? In our system the fast-clone still led to high load and the problem described here.

As I understand it, fast-clone operations are just commands to move block pointers, so they shouldn't be that intensive.

But there are other maintenance tasks that do not honor the throttling (for example cleanup/rollback tasks after a failed backup). Those were quite intensive and took our repository hostage overnight.
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by mkretzer » Wed Feb 08, 2017 10:39 am

So sadly this is not a good solution for us...
@Gostev: do you see any efforts from Microsoft to get these strange latency issues under control? Was this reproduced by Veeam with 64K blocks?
mkretzer
Expert
 
Posts: 251
Liked: 61 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by ivordillen » Wed Feb 08, 2017 12:34 pm

What numbers do you throttle at?
ivordillen
Enthusiast
 
Posts: 59
Liked: never
Joined: Thu Nov 03, 2011 2:55 pm
Full Name: Ivor Dillen

Re: REFS 4k horror story

by rendest » Wed Feb 08, 2017 12:37 pm

10 MB/s below the point where ReFS craps its pants. Depends on the array.
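That rule of thumb can be written down as a tiny helper. This is only a sketch: the function name `throttle_limit_mbps` and the 400 MB/s stall point are hypothetical, and the stall point has to be measured per array since nobody in the thread gives a concrete number.

```python
def throttle_limit_mbps(stall_threshold_mbps: float,
                        margin_mbps: float = 10.0) -> float:
    """Return a repository throughput cap that sits a fixed margin
    below the throughput at which the volume was seen to stall."""
    if stall_threshold_mbps <= margin_mbps:
        raise ValueError("stall threshold must exceed the safety margin")
    return stall_threshold_mbps - margin_mbps


# Hypothetical array that starts stalling at around 400 MB/s.
print(throttle_limit_mbps(400.0))  # 390.0
```

The resulting value is what you would enter as the repository's data rate limit.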
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by ivordillen » Wed Feb 08, 2017 8:29 pm

We need a latency-based throttling feature on the repository side instead of the source side :-)
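To illustrate what such a feature might do, here is a rough sketch of a latency-driven rate limiter using an additive-increase / multiplicative-decrease rule. It is not an existing Veeam or Windows feature: the function `next_rate_limit`, all parameter defaults, and the 20 ms target (picked because the thread reports trouble at 20-30 ms hiccups) are assumptions for illustration only.

```python
def next_rate_limit(current_mbps: float,
                    latency_ms: float,
                    *,
                    latency_target_ms: float = 20.0,
                    step_up_mbps: float = 5.0,
                    backoff: float = 0.5,
                    floor_mbps: float = 10.0,
                    ceiling_mbps: float = 1000.0) -> float:
    """Compute the next write-rate cap from the latest latency sample
    measured on the repository (target) side.

    Above the latency target the cap is cut multiplicatively;
    otherwise it is raised by a small fixed step. The result is
    clamped to [floor_mbps, ceiling_mbps].
    """
    if latency_ms > latency_target_ms:
        proposed = current_mbps * backoff
    else:
        proposed = current_mbps + step_up_mbps
    return max(floor_mbps, min(ceiling_mbps, proposed))


# Hypothetical latency samples (ms) from the backup target.
rate = 200.0
for sample in [5.0, 8.0, 25.0, 6.0]:
    rate = next_rate_limit(rate, sample)
    print(sample, rate)
```

Backing off hard and recovering slowly is a deliberate choice here: per the reports above, a short latency spike costs far more (a 20-second stall) than a few minutes of reduced throughput.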
ivordillen
Enthusiast
 
Posts: 59
Liked: never
Joined: Thu Nov 03, 2011 2:55 pm
Full Name: Ivor Dillen

Re: REFS 4k horror story

by Delo123 » Thu Feb 09, 2017 2:25 pm

Especially the 64 TB thing is kind of a deal breaker for us...

Just catching up here: why is the 64 TB a deal breaker? You can have a lot of thin-provisioned volumes with Storage Spaces, for example.
PS: the 64 TB "limit" is a VSS limit...
Delo123
Expert
 
Posts: 330
Liked: 92 times
Joined: Fri Dec 28, 2012 5:20 pm
Full Name: Guido Meijers

Re: REFS 4k horror story

by rendest » Thu Feb 09, 2017 2:30 pm

Delo123 wrote:
Just catching up here: why is the 64 TB a deal breaker? You can have a lot of thin-provisioned volumes with Storage Spaces, for example.
PS: the 64 TB "limit" is a VSS limit...

The 64 TB volumes aren't relevant anymore, since we're experiencing these issues at any LUN size.

What mkretzer means is that we'd rather have larger jobs on larger volumes for more space savings (so no per-VM backup files or scale-out repo).
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by mkretzer » Thu Feb 09, 2017 3:18 pm

Our problem is that we have BIG backup files (up to 8 TB), and they just would not fit very well on such small volumes with all the incrementals.

Per-VM backup files are not a solution for us right now...

And BTW, Storage Spaces is not supported on disks behind RAID/FC/SAN controllers... Or did that change with 2016?
mkretzer
Expert
 
Posts: 251
Liked: 61 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by orb » Fri Feb 10, 2017 6:48 am

Out of curiosity, are you using a hardware RAID controller, attached storage, or JBOD with a storage pool?

Olivier
orb
Influencer
 
Posts: 17
Liked: 3 times
Joined: Fri Apr 01, 2016 5:36 pm
Full Name: Olivier Bonemme
