REFS 4k horror story

by mkretzer » Wed Feb 01, 2017 7:01 am

Hello,

I have already posted several threads in the last few days, but I have to post again about what happened to us last night.

First of all, we are currently in the middle of migrating to REFS repos. We made the mistake of using 4k blocks on our temporary 120 TB repo. We thought it was no big deal, as at first it seemed to impact only the performance of file operations. We monitored memory and CPU usage and did not see the memory pressure others saw, because the system is fortunately oversized. So we continued to migrate to the new repos, successfully.
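For a sense of scale, here is a minimal sketch of how many clusters a 120 TB volume contains at each cluster size. It assumes binary units (1 TB = 1024^4 bytes), and the function name `cluster_count` is purely illustrative. The cluster count is only a rough proxy for how much allocation metadata the file system has to track, not a measurement of REFS internals.

```python
TIB = 1024 ** 4  # assumption: "TB" in the post means binary terabytes


def cluster_count(volume_bytes: int, cluster_bytes: int) -> int:
    """Return the number of allocation units (clusters) on a volume."""
    return volume_bytes // cluster_bytes


volume = 120 * TIB
clusters_4k = cluster_count(volume, 4 * 1024)
clusters_64k = cluster_count(volume, 64 * 1024)

print(f"4k clusters:  {clusters_4k:,}")
print(f"64k clusters: {clusters_64k:,}")
print(f"ratio:        {clusters_4k // clusters_64k}x")
```

At the same volume size, 4k formatting yields 16 times as many clusters as 64k formatting.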

All went well for a few days. We have to wait 28 days before we can format our "production" backup storage, and we were optimistic that we would "survive" that time because of the REFS space savings.

Then I got a message from our monitoring system last night: our Veeam server was completely unreachable. I went on-site and found that I could move the mouse but not much more, so I had to do a hard reset. After the system came up I saw that it was trying to create 3 synthetic fulls at the same time, plus a tape backup and some copy jobs. All in all nothing unusual - this had worked well the nights before. So I disabled the tape job, set a limit of 12 concurrent tasks on the repos (before there was no limit) to regulate the load a little, and drove back home.

10 minutes later the next alert came in - we had another crash. So I drove back to the company, did a hard reboot, and then limited the REFS repos to 1 concurrent task so that at least our BCJs can finish at some point in the future. I also started to roll back to our old NTFS repository, using active fulls, which I have to do for 1600 machines/140 TB.

Opening an Explorer window on the REFS volume now takes half a minute even without any load, so it is definitely the REFS volume that has issues...

BTW, I opened a sev1 case with MS - no response yet....

Markus
mkretzer
Expert
 
Posts: 214
Liked: 49 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by oliverL » Wed Feb 01, 2017 7:19 am

thx for sharing this!

Would appreciate it if you could update us on the status!

regards
oliver
oliverL
Influencer
 
Posts: 19
Liked: 3 times
Joined: Fri Nov 11, 2016 8:56 am
Full Name: Oliver

Re: REFS 4k horror story

by mkretzer » Wed Feb 01, 2017 7:41 am

MS called. Interestingly, MS seems to know about the 4k issues - at least the engineer told me he had heard something about issues....

Re: REFS 4k horror story

by mkretzer » Wed Feb 01, 2017 9:59 am

Ok this hotfix was recommended: https://support.microsoft.com/en-us/hel ... -kb3216755
Has anyone already tried this? I asked for more information about this hotfix...

Re: REFS 4k horror story

by rendest » Wed Feb 01, 2017 8:39 pm

Well we're glad we're not the only ones having these issues.

https://forums.veeam.com/veeam-backup-replication-f2/9-5-refs-server-2016-memory-consumption-t39625-15.html

We can also confirm that, though memory usage seemed better at first, the patch does not solve the problem. Even our 64 TB LUNs formatted with 64 KB clusters are seeing these symptoms. Performance is very poor as well.
rendest
Influencer
 
Posts: 17
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by mkretzer » Wed Feb 01, 2017 10:12 pm

@rendest, are your 64k volumes on the same server as the 4k volumes?

Re: REFS 4k horror story

by rendest » Thu Feb 02, 2017 9:29 am

Not anymore, since they were taking the kernel hostage.

We have now isolated the 4K volumes on a separate host and are migrating the data to the 64K ones.

Re: REFS 4k horror story

by Robvil » Thu Feb 02, 2017 11:44 am

I am also migrating to ReFS. So reading the forum, I assume it is absolutely best to use 64k volumes and stay away from 4k?
Robvil
Enthusiast
 
Posts: 33
Liked: 1 time
Joined: Mon Oct 03, 2016 12:41 pm
Full Name: Robert

Re: REFS 4k horror story

by v.Eremin » Thu Feb 02, 2017 12:32 pm

Correct.
v.Eremin
Product Manager
 
Posts: 12370
Liked: 892 times
Joined: Fri Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: REFS 4k horror story

by mkretzer » Thu Feb 02, 2017 1:23 pm

Robvil wrote: I am also migrating to ReFS. So reading the forum, I assume it is absolutely best to use 64k volumes and stay away from 4k?

And from all I have read in the past 36 hours, you should test it really well before you throw all your backups on it... In our case everything looked great until there was a larger number of files on the disk...

Re: REFS 4k horror story

by mkretzer » Thu Feb 02, 2017 2:11 pm

OK, I just got a very long email from Microsoft with a lot of links, where the general recommendation is "use NTFS because REFS has many limitations". Only one thing was directly targeted at our situation:

"You should avoid volumes bigger than 64 TB". I find this pretty bad, because SOBR is not an option for us at the moment, as we also had some issues with per-VM. And right now we have quite big backup files... For us, a bigger volume is a must-have right now. If we split our 200 TB backup repo into 4 REFS repos we might lose a lot of the REFS space-saving benefits...

Markus

Re: REFS 4k horror story

by Gostev » Thu Feb 02, 2017 8:38 pm

Markus, can you share the Microsoft support case ID where this was stated? I wonder if the development team behind ReFS agrees with this statement, or perhaps this is the opinion of the specific support engineer who is simply trying to close the case, as often happens ;) The best way to find out is to ask the dev team behind ReFS directly, which I can easily do. Thanks!
Gostev
VP, Product Management
 
Posts: 20961
Liked: 2239 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: REFS 4k horror story

by mkretzer » Thu Feb 02, 2017 10:26 pm

Gostev,

that would be so great - number is 117020115253831

Especially the 64 TB thing is kind of a deal breaker for us...
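
As a rough sketch of the arithmetic behind this concern, the snippet below counts how many volumes a repository needs if each volume must stay at or below a size cap. The function name `volumes_needed` and the default cap are illustrative assumptions based on the 64 TB figure quoted from the support email; 200 TB is the repo size mentioned earlier in the thread. Since ReFS block cloning only works between files on the same volume, every additional volume is a boundary across which backup files cannot share blocks.

```python
import math


def volumes_needed(repo_tb: float, max_volume_tb: float = 64) -> int:
    """Smallest number of volumes so that none exceeds max_volume_tb."""
    return math.ceil(repo_tb / max_volume_tb)


# 200 TB is the repository size from the thread; 64 TB is the quoted cap.
print(volumes_needed(200))
```

This matches the split into 4 repos mentioned above.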

Markus

Re: REFS 4k horror story

by Skyview » Fri Feb 03, 2017 3:21 am

Perhaps he meant avoid >64 TB partitions *while using 4k cluster size*? Because that, while not directly, sort of lines up with the inertia that Veeam and Microsoft have about using 64k for really large volumes.
Skyview
Service Provider
 
Posts: 22
Liked: 1 time
Joined: Tue Jan 10, 2012 8:53 pm

Re: REFS 4k horror story

by mkretzer » Fri Feb 03, 2017 6:51 am

No. In his mail there was not a single mention of the 4k cluster size... That is also one reason I am kind of cautious about this recommendation.
