REFS 4k horror story

Re: REFS 4k horror story

by mkretzer » Tue Apr 18, 2017 9:34 am

@adruet
When you say "7 TB of backup files", how many files are you talking about? Did you use per-VM backup files?
As I said before, my "feeling" is that the number of files also plays a role; that's why we disabled per-VM.
mkretzer
Expert
 
Posts: 281
Liked: 61 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by adruet » Tue Apr 18, 2017 1:24 pm

We have one job per VM.
That was about 20 jobs with a maximum of 30 restore points each, so I would say around 620 files (including the .vbm files).
That is not an extraordinary demand for a production file system, is it?
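As a back-of-the-envelope check of the ~620 figure, here is a minimal Python sketch. It assumes each restore point is stored as a single backup file (.vbk/.vib) and that each job keeps exactly one .vbm metadata file; the function name is hypothetical.

```python
def estimate_backup_files(jobs: int, restore_points_per_job: int, vbm_per_job: int = 1) -> int:
    """Rough repository file count, assuming one backup file per restore point
    plus one .vbm metadata file per job (a simplification, not Veeam's exact layout)."""
    return jobs * (restore_points_per_job + vbm_per_job)


# Alex's numbers: 20 jobs, up to 30 restore points each
print(estimate_backup_files(20, 30))  # 620
```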
adruet
Influencer
 
Posts: 22
Liked: 6 times
Joined: Wed Oct 31, 2012 2:28 pm
Full Name: Alex

Re: REFS 4k horror story

by Rmachado » Tue Apr 18, 2017 3:35 pm

There have been a lot of changes since the beginning of ReFS, and some patches (I have read all of them).

Is there any official recommendation from Veeam about the use of ReFS and 64k? We're beginning a project with 60-80 VMs with about 90 TB of storage and 32 GB of RAM.

Should I change the repository to NTFS to be safe, or can I use ReFS?

Thank you.
Rmachado
Service Provider
 
Posts: 1
Liked: never
Joined: Thu Dec 15, 2016 11:39 pm

Re: REFS 4k horror story

by kubimike » Tue Apr 18, 2017 8:02 pm

To everyone who is having this issue: what is your block size at the controller? My drive latency problems went away when I made it smaller. I'm unsure if it's related, but I figured I would throw that out there.
kubimike
Expert
 
Posts: 197
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by tsightler » Tue Apr 18, 2017 8:33 pm

Rmachado wrote: Is there any official recommendation from Veeam about the use of ReFS and 64k? We're beginning a project with 60-80 VMs with about 90 TB of storage and 32 GB of RAM

64K is the official recommendation from Veeam at this time and I would recommend no less than 4GB of RAM per task, ideally more if you can.
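As a rough sizing sketch of the "no less than 4 GB of RAM per task" rule of thumb, dividing RAM by the per-task budget gives an upper bound on concurrent tasks. This is a minimal Python illustration; the helper name is hypothetical, and it reserves nothing for the OS or other roles on the same server (an assumption that makes it optimistic). With the 32 GB mentioned above it works out to at most 8 tasks.

```python
def max_concurrent_tasks(ram_gb: float, gb_per_task: float = 4.0) -> int:
    """Upper bound on concurrent tasks for a given RAM budget.
    Assumption: no memory is reserved for the OS or other services."""
    if gb_per_task <= 0:
        raise ValueError("gb_per_task must be positive")
    return int(ram_gb // gb_per_task)


print(max_concurrent_tasks(32))     # 8 tasks at the 4 GB minimum
print(max_concurrent_tasks(32, 6))  # 5 tasks if you budget "ideally more", e.g. 6 GB
```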
tsightler
Veeam Software
 
Posts: 4714
Liked: 1717 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: REFS 4k horror story

by lepphce1 » Fri Apr 21, 2017 3:44 pm

This is a pretty long thread so I apologize if this has already been asked...

May I *gently* ask whether the Veeam folks following this thread have been able to replicate this in your lab? The reason I ask is that the top Google search result for "Server 2016 ReFS crash" is a Veeam thread, and not much else. For example, is there something in how Veeam uses block alignment that exacerbates some kind of bug in ReFS? In other words, it seems like Veeam users are getting the brunt of whatever is happening here, and I am wondering whether Veeam is taking part in this investigation with Microsoft in any way. I understand that there are a lot of little differences in what we are all seeing, but the common thread here is server instability with this combination of products.

Thank you for your consideration...
lepphce1
Enthusiast
 
Posts: 27
Liked: 2 times
Joined: Tue Jun 28, 2016 4:40 pm

Re: REFS 4k horror story

by Gostev » Sat Apr 22, 2017 11:47 am

Yes, we've been in touch with the ReFS development team on this issue for a while now. Right now, they are working with one of our customers who has the issue reproducing most consistently. Internally, we do not have a lab that replicates the issue (and based on our support statistics, it does not seem to be very common in general).
Gostev
Veeam Software
 
Posts: 21250
Liked: 2317 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: REFS 4k horror story

by rendest » Sun Apr 23, 2017 9:16 pm

Gostev wrote: Yes, we've been in touch with the ReFS development team on this issue for a while now. Right now, they are working with one of our customers who has the issue reproducing most consistently. Internally, we do not have a lab that replicates the issue (and based on our support statistics, it does not seem to be very common in general).

Are there any case IDs we can refer to?

Last week's update made matters worse, resulting in our VBR servers crashing overnight again. So besides the terrible performance, we now have to reboot our VBR servers every 24 hours.
rendest
Influencer
 
Posts: 18
Liked: 5 times
Joined: Wed Feb 01, 2017 8:36 pm
Full Name: Stef

Re: REFS 4k horror story

by j.forsythe » Mon Apr 24, 2017 8:07 am

adruet wrote:
Based on my HP hardware, 4 servers like this:
- HP DL380 Gen9 with dual Intel E5-2660 v4 2 GHz CPUs, 64 GB of RAM, RAID 1 SSD for the OS, and 2 NVMe 800 GB disks
- Dual 10 Gbit network cards (HP 560FLR) supporting SMB v3 offloading (RDMA-capable)
- 2 x DAS HP D3700 with 25 x 1.8 TB 12G 10K SAS disks configured as RAID 6 with an HP P441 controller

I have done some Storage Spaces (and Storage Spaces Direct) testing, and the results were not very promising in terms of performance.
When it was announced that Storage Spaces Direct would be licensed with Windows Server Datacenter only, we dropped the idea of ever using it.
So we tried to use Storage Spaces locally, using the NVMe disks as journal disks to improve performance.
But after comparing, using a Veeam backup profile with diskspd, our P441 controller in HBA mode with Storage Spaces (the NVMe disks as journal disks, i.e. write cache for the volume, and parity for the rest of the D3700 disks) against standard RAID 6 on the P441 controller, we decided to stick with the P441 and RAID 6, as it was faster and less CPU-consuming.
Regarding RAM usage, this is probably due to ReFS, and you can check that with Sysinternals RAMMap.

Hi, and thank you for the information.

Yeah, I think I will disable Storage Spaces and go back to recreating the RAID with the HP controller and NTFS.
Even if Veeam officials keep praising the ReFS solution and telling us that only a few of us have this problem, people are losing backup data, and that should not be something they just accept.
Seeing the weekly email from Gostev praising Windows 10 with ReFS 3.1 as a cheap, good solution for ROBO sites left me very disappointed.
And I sure hope that future ReFS users won't have to face this problem...

John
j.forsythe
Influencer
 
Posts: 11
Liked: 3 times
Joined: Wed Jan 06, 2016 10:26 am
Full Name: John P. Forsythe

Re: REFS 4k horror story

by MatBac » Mon Apr 24, 2017 1:22 pm

Gostev wrote:All, here is the official KB article from Microsoft > FIX: Heavy memory usage in ReFS on Windows Server 2016 and Windows 10
Please don't forget to install KB4013429 before applying the registry values, and remember to reboot the server after doing so.

Finally, please do remember to share what option has worked for you!


Hi!

KB4013429 can’t be installed on my system since it is intended for Server 2016 (OS Build 14393.953) and I have OS Build 14393.1066.
In the package details of KB4013429 in the Microsoft Update Catalog, it shows that it has been replaced by KB4015438 (OS Build 14393.969), KB4016635 (OS Build 14393.970) and KB4015217 (OS Builds 14393.1066 and 14393.1083).

I can see that update KB4015217 was installed via Windows Update a couple of days ago. Since these are cumulative updates, does KB4015217 include all the ReFS fixes from KB4013429 that I need?
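If you script this kind of "is my build new enough" check, compare the build numbers as integers, not strings. A minimal Python sketch (helper names are hypothetical; it only compares build strings and does not query Windows, and it assumes, as the post notes, that a newer cumulative build carries the earlier fixes):

```python
def parse_build(build: str) -> tuple[int, int]:
    """Split an OS build string such as '14393.1066' into (major, revision) integers."""
    major, revision = build.split(".")
    return int(major), int(revision)


def build_at_least(installed: str, required: str) -> bool:
    """True if the installed build is the same as or newer than the required one.
    Integer tuples avoid the string-comparison trap where '1066' sorts before '953'."""
    return parse_build(installed) >= parse_build(required)


print(build_at_least("14393.1066", "14393.953"))  # True
print("14393.1066" >= "14393.953")                # False: naive lexicographic compare
```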

Regarding the ReFS registry keys in question: are these supposed to be added automatically by this KB fix, or should I create them manually? None of them are present in my registry right now, even though I have KB4015217 installed…
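For reference, here is a sketch that only builds (does not run) `reg add` command lines for the tuning options. The key path and the three DWORD value names below reflect our understanding of the Microsoft KB "FIX: Heavy memory usage in ReFS on Windows Server 2016 and Windows 10"; treat them as assumptions and verify them against the KB before applying anything. The chunk count for option 2 is an example value only, and the helper name is hypothetical. Per Gostev's post, reboot after setting the values.

```python
# Assumed registry location for the ReFS tuning values (verify against the Microsoft KB).
REFS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"

# Assumed value names for the three options described in the KB, all REG_DWORD.
# The chunk count in option 2 is an illustrative example, not a recommendation.
REFS_OPTIONS = {
    1: {"RefsEnableLargeWorkingSetTrim": 1},
    2: {"RefsNumberOfChunksToTrim": 8},
    3: {"RefsEnableInlineTrim": 1},
}


def reg_add_commands(option: int) -> list[str]:
    """Build (but do not execute) 'reg add' command lines for one option."""
    return [
        f'reg add "{REFS_KEY}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in REFS_OPTIONS[option].items()
    ]


for cmd in reg_add_commands(1):
    print(cmd)
```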
MatBac
Lurker
 
Posts: 1
Liked: never
Joined: Mon Apr 24, 2017 12:56 pm
Full Name: Mattias Backrud

Re: REFS 4k horror story

by dmartenstyn » Mon Apr 24, 2017 1:56 pm

So I've been watching this issue closely with a vested interest since I have just deployed a solution utilising Veeam, a local repository on ReFS and Windows 2016.

My physical Veeam B&R server is fairly hefty (2 x E5-2667 v4, 256 GB RAM and an LSI MegaRAID 9361-4i attached to 16 x HGST Deskstar 4 TB HDDs in RAID 10). ReFS was initially formatted at 4k, but having stumbled on this thread (luckily at the beginning of deployment) I blew the config away and went with 64k instead. I've not experienced any issues thus far, fingers crossed; however, my backup jobs are fairly small (1 job containing 14 VMs, about 400 GB for a full, to the local repository, and another that goes to a weekly rotated external hard disk). I've had RAMMap running in the background since the start, and Metafile usage has crept up (currently at 6.1 GB). Free memory seems fine at 223 GB. I've not made any updates or applied any patches to Windows (Veeam is 9.5.0.823).

This environment goes live in approximately 3-4 weeks, so I'm stuck in the middle a tad. Whilst I am fully aware that we have personally experienced no issues as yet, I am a bit hesitant, since I am a contractor here and once I leave, my client will effectively be on their own with the architecture. I am at the stage where I can effectively blow the config away again and go with NTFS if required. The lack of information from Veeam / Microsoft is a little concerning, it must be said.
dmartenstyn
Lurker
 
Posts: 1
Liked: never
Joined: Mon Apr 24, 2017 1:35 pm

Re: REFS 4k horror story

by lepphce1 » Mon Apr 24, 2017 2:00 pm

@Gostev,
Thanks for the reply. I've not opened a ticket with Veeam up to this point because I didn't think anything worthwhile could be immediately remedied by support. Would you like those of us who are having ReFS troubles to open a Veeam ticket on this issue, if we have not already done so?
lepphce1
Enthusiast
 
Posts: 27
Liked: 2 times
Joined: Tue Jun 28, 2016 4:40 pm

Re: REFS 4k horror story

by evander » Mon Apr 24, 2017 3:26 pm

Just a thought for those who are at the point where they are building a new repository and have to decide which way to go, ReFS or NTFS. If you have plenty of disk space and/or your available disk space will take a while to fill up, why not create two volumes on the same server, one formatted ReFS and one NTFS? If you can, run your backup window twice per night, once to each (NTFS first, I suggest), and then if Microsoft finally fixes ReFS you can simply blow away the NTFS volume and extend the ReFS one, or just split your backup jobs between two ReFS volumes.
The benefits of ReFS are really great (if it works), so build your ReFS volume and cover your bet with NTFS.
I understand your concern may be that if the server locks up nothing will back up, but again this means less stress: if you are forced to blow away (or simply dismount/pull out) your ReFS partition, the server will still be up and running on the NTFS partition, ready to resume backups. This only matters if you are one of the unlucky ones who have this problem with ReFS, as it is very sporadic at best and not everyone seems to be affected, myself included.

This is especially easy if your repository is running as a VM, but it's not that much more work if it's running on a physical server, and worth the extra admin if you ask me.

2 cents.
evander
Enthusiast
 
Posts: 59
Liked: 3 times
Joined: Thu Nov 17, 2011 7:55 am

Re: REFS 4k horror story

by Gostev » Mon Apr 24, 2017 7:24 pm

lepphce1 wrote:@Gostev,
Thanks for the reply. I've not opened a ticket with Veeam up to this point because I didn't think anything worthwhile could be immediately remedied by support. Would you like those of us who are having ReFS troubles to open a Veeam ticket on this issue, if we have not already done so?

No, we really want everyone experiencing the issue to open a ticket with Microsoft instead, to help raise the priority of this issue on their side.
Gostev
Veeam Software
 
Posts: 21250
Liked: 2317 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: REFS 4k horror story

by alesovodvojce » Mon Apr 24, 2017 8:51 pm

After a week of tests, and after countless ReFS horror days personally lived through here, we have copied our ReFS 4k repo to different file systems to see the size differences. Here they are.

In actual numbers:
ReFS 4k: 21 TB (source repo)
ReFS 64k: 31 TB at least - we had to stop the file copy as the underlying disks ran out of free space
NTFS: 31 TB at least - stopped for the same reason. Finally we shrank the source repo to 13 TB by deleting files from it. After that, copying it to the target NTFS partition produced 24 TB (so 13 TB of ReFS 4k became 24 TB on NTFS).
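To put these numbers on a common scale, the expansion factor is simply copied size divided by source size. A minimal Python sketch (the helper name is hypothetical). One plausible explanation for the growth, which is our assumption and not something measured in this thread, is that a plain file copy does not preserve blocks shared through ReFS block cloning, so the copy writes every file out in full. Note that the 21 TB case gives only a lower bound, because the copy was stopped early.

```python
def expansion_factor(source_tb: float, copied_tb: float) -> float:
    """How many times larger the data became after copying it off the
    ReFS 4k volume (copied size / source size)."""
    return copied_tb / source_tb


# 13 TB on ReFS 4k became 24 TB on NTFS
print(round(expansion_factor(13, 24), 2))  # 1.85
# 21 TB on ReFS 4k became at least 31 TB (copy stopped early), so this is a lower bound
print(round(expansion_factor(21, 31), 2))  # 1.48
```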

Generalized:
ReFS 4k - best space saver, but lots of trouble (as in this thread)
ReFS 64k - not a win in space saving, while still keeping lots of ReFS benefits. But the troubles will theoretically start as well; they are just postponed until later (when the repo size grows past some unstated limit)
NTFS - no win in space saving, no special benefits. Main benefit is a stable file system = backups secured

We have now migrated the first repo to NTFS and are enjoying stable backups. The second repo remains on ReFS 4k for experiments, for now.
alesovodvojce
Enthusiast
 
Posts: 26
Liked: 1 time
Joined: Tue Nov 29, 2016 10:09 pm
