Host-based backup of VMware vSphere VMs.
maja
Novice
Posts: 3
Liked: never
Joined: Oct 03, 2012 2:38 pm
Full Name: Marco Janse
Location: The Netherlands
Contact:

Win2016 ReFs repository server freeze during Back-up copy

Post by maja »

We have Veeam 9.5 Update 1 running and are using Windows 2016 with ReFS as backup repository servers. Sometimes during backup or copy, the repository server completely freezes. The Win2016 repository server is a vSphere virtual machine, and when this happens the VMware Tools service stops running and I can no longer connect to the VM using any possible method (direct console, RDP, PowerShell Remoting, SMB, etc.).

Backups and copies show messages like this:

'Unable to allocate processing resources. Error: No scale-out repository extents are available. '

When I do a hard reset of the Win2016 repository VM, the machine comes back online and jobs usually continue, but the OS does not show any related cause in the event log.

Any ideas?
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by Gostev »

Large repository and 4KB cluster size used for ReFS?

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by maja »

Yes, large repository. I logged a case with Veeam for this: 02060234.

Last week we changed the cluster size to 64K, and the freezes have not occurred since.
However, backup file sizes are getting a lot larger now... :(

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by Gostev »

Should be no more than 10% larger - good price for stability until Microsoft figures out those 4KB cluster size problems!
davecla
Enthusiast
Posts: 26
Liked: 4 times
Joined: Feb 03, 2016 9:40 pm
Full Name: Dave Clarke
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by davecla »

I'm having exactly the problem described here

Windows 2016 VM with ReFS as repository server. During backup or copy, the repository server freezes (it seems to slow down, then freeze, over a short period of time). The VMware Tools service stops running and I can no longer connect to the VM.
The repository is on an RDM. Lots of resources available to the server.

Have to power cycle to restart. Nothing in the logs....

I'm running Update 2 and the ReFS volume is using 64 KB clusters (I triple checked).

Haven't logged a call yet. Hoping someone here has the answer.....
mwvme
Expert
Posts: 163
Liked: 33 times
Joined: Dec 05, 2015 10:19 pm
Full Name: Michael White
Location: Calgary, Alberta Canada
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by mwvme »

Hello there,
The fact that VMware Tools freezes makes this quite interesting. I would not expect that with a ReFS issue. I suggest you talk to support. You can start with us and I bet we can help.

Michael
Michael White
Field Product Manager
https://notesfrommwhite.net
@mwVme

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by davecla »

I restarted the server and sat and watched the copy job progress and resource monitor for about 45mins today.

As the backup copy job ran, the server got noticeably slower from a UI perspective and then stopped. vSphere posted a virtual machine CPU usage error. While I watched Resource Monitor on the guest, there was no sign of high CPU or memory usage.
I disabled the backup-copy jobs and restarted the server. The small backup job running in the site completes ok. The backup-copy job from that site to another site also completes ok.

I've got some new storage being installed at that site next week, so I will try moving the repository back to NTFS. There's about 50 TB, so the move will take some time :-(

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by davecla »

To add to above, server is still running happily today with the backup copy jobs disabled.
mfirewalker
Influencer
Posts: 23
Liked: 4 times
Joined: Jul 07, 2017 9:58 am
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by mfirewalker »

I have a similar issue with a physical Windows Server 2016 Standard machine as ReFS fast clone repository and 4 KB allocation unit (4.4 of 17.1 TB used). It previously happened only every week or so, but today I had to power cycle 4 times in a row because the server immediately locked up completely as mentioned above. For me it is related to backup jobs rather than backup copy jobs, I had to stop the running jobs and disable them to actually do anything again on the repository server. With the jobs stopped, the server did not freeze. I am now unable to do backups. I am currently running update 2 for 9.5 but was previously running update 1 with the same issue.
kwinsor
Service Provider
Posts: 54
Liked: 4 times
Joined: Oct 17, 2014 1:26 pm
Full Name: Kent Winsor
Location: Toronto
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by kwinsor »

Hi, we have the same issue. We have a physical server running 2016, connected to Dell MD direct-attached storage, with ReFS for the repo drive. We use it as a target for copy jobs from our other datacenter. The server dies nearly every day, and it seems to happen with copy jobs that contain VMs with VMDKs of 2 TB or larger. If we disable the jobs with large VMDKs, it's OK. Our allocation unit size is 64 KB. The server gets slower and slower until you can no longer RDP to it. We have to connect via iDRAC and power cycle it.
Thank you,
Kent
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by tsightler »

So my theory is that Veeam is attempting to write data to the filesystem faster than it can keep up with, which causes Windows to buffer these writes and, as memory is exhausted, you get a hang. It is just a theory, but it fits the pattern and is similar to issues seen with NTFS in some cases, although in those cases the system doesn't hang completely, it just slows to a crawl. If the theory is correct, I have a few options that might be worth testing:

1) Use performance monitor to see the actual write rate to the ReFS volume during the backup copy, then set the write throttle in Veeam on the repository to about 80% of this value. This will throttle writes to the repo and slow down the copy job, but should keep it from needing memory for buffering.

2) If the above doesn't work, try the UseUnbufferedAccess registry key. I actually haven't tested this behavior yet, but I believe it will open all backup files with direct access, bypassing the OS buffer cache. I'll try to test the behavior in my lab in the next couple of weeks, but it might be worth a shot just to see if there is any impact.
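
For option 1, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the sample values are made up, the function name is hypothetical, and averaging the Performance Monitor samples is an assumption about how to arrive at "the actual write rate". The 80% figure is the one suggested above.

```python
def throttle_limit_mb_s(samples_mb_s, fraction=0.8):
    """Return a write-throttle ceiling as a fraction of the average observed write rate.

    samples_mb_s: write rates (MB/s) observed on the ReFS volume during a copy job.
    fraction:     share of the observed rate to allow (0.8 = the 80% suggested above).
    """
    if not samples_mb_s:
        raise ValueError("need at least one write-rate sample")
    average = sum(samples_mb_s) / len(samples_mb_s)
    return average * fraction


# Hypothetical samples read off Performance Monitor, in MB/s (not real measurements).
samples = [410.0, 395.0, 402.0, 388.0, 405.0]
limit = throttle_limit_mb_s(samples)
print(f"Set the repository write throttle to roughly {limit:.0f} MB/s")
```

The resulting number is only a starting point for the repository throttling setting; if the hang persists, it may need to be lowered further.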

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by mfirewalker »

I have now copied almost 9 TB of backup files to change the cluster size of the ReFS repository from 4K to 64K. I will now resume the backup jobs and see how it goes. However, reading other posts here I expect the issues to continue (or arise again as backup sizes increase). Unfortunately the fast clone advantage is now gone for the existing files because I had to copy them. I will consider the steps described by tsightler, however, deeper investigation into the technical details by Veeam would be much appreciated. For the record: largest files are around 2 TB on the affected Repository.

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by kwinsor »

We are not seeing the server run out of memory. Our monitoring software doesn't see it go above 50%.
Thank you,
Kent

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by tsightler »

Not sure what monitoring tool you are using, but I've found that many do not include Standby memory in the calculation for memory utilization. Regardless, the symptom does not require complete exhaustion of memory to occur and still be related to memory buffering. I didn't see where you mention how much memory vs storage you have, but a huge percentage of the performance and hang issues that I've seen in the field have been resolved by increasing the amount of memory available, regardless of whether the system seemed to be under memory pressure or not.
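
To illustrate how a tool can report well under 50% while almost nothing is truly free, here is a rough Python sketch. The category names follow Resource Monitor (In Use, Modified, Standby, Free); the numbers and the function name are hypothetical, and treating In Use plus Modified as what a monitoring tool reports is an assumption.

```python
def memory_utilization(in_use_mb, modified_mb, standby_mb, free_mb):
    """Return (utilization excluding standby, utilization including standby) as fractions."""
    total_mb = in_use_mb + modified_mb + standby_mb + free_mb
    without_standby = (in_use_mb + modified_mb) / total_mb
    with_standby = (in_use_mb + modified_mb + standby_mb) / total_mb
    return without_standby, with_standby


# Hypothetical 32 GB server during a copy job: most "available" memory is standby cache.
without_standby, with_standby = memory_utilization(
    in_use_mb=12288, modified_mb=2048, standby_mb=17408, free_mb=1024
)
print(f"Utilization excluding standby: {without_standby:.1%}")
print(f"Utilization including standby: {with_standby:.1%}")
```

With numbers like these, a tool that ignores standby stays below 50% even though only a small slice of memory is actually free.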

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by kwinsor »

My server is in the process of dying as we speak. SNMP has tripped. Here's a snippet of Resource Monitor, and the server appears idle. It's not accepting any data. I cannot get Windows Explorer to show any contents of the repository drive. You can see in my image that the Veeam copy job was stuck for 4 hours on one HDD of a single server. If we kill the Veeam Data Mover Service, the server functions normally. We happened to log in at the right time today to catch this server before it became unresponsive. There's something wrong with Veeam on Windows 2016 with ReFS, or maybe with 2016 in general. All my other local repos are Windows 2012 R2 with NTFS. I'm building another with ReFS on 2016 as a VM today and will test it with a large transfer.

http://ibb.co/d8cBua
http://ibb.co/eYG4Ea
Thank you,
Kent

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by tsightler »

There's an entire 30-page thread on issues with Veeam + ReFS in the main forum. We have lots of data on cases that work and cases that cause issues, and we are still collecting more. I have probably talked to 100+ customers that are having no issues, vs a dozen or so that have. The ones that have experienced issues have had many different problems, from anti-virus, to firmware drivers, to things addressed by the Microsoft hotfix. We haven't found resolutions for a few of these cases.

In some cases there are hotfixes from Microsoft that address these, but the backup copy hang case is less well understood. In my experience it always occurs when the system has "free memory" that is all in standby, i.e. for whatever reason Windows is unable to flush buffered memory to disk to truly free it. Unfortunately, unless I'm missing it, your Resource Monitor screenshots cut off the portion that breaks down the "free" memory into Standby vs Free.

Of course, there could also be an issue that has not yet been discovered. I was just trying to offer some suggestions based on what I have seen in the field with these clients. Do you have a support case?

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by kwinsor »

Thanks. We have a case as of yesterday. Veeam Support - Case # 02221686
Thank you,
Kent
jamerson
Veteran
Posts: 366
Liked: 24 times
Joined: May 01, 2013 9:54 pm
Full Name: Julien
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by jamerson »

We have the same issue with quite a lot of customers running Windows Server 2016 with ReFS disks.
I actually hadn't thought about the ReFS allocation unit size.
Is there a way to check which allocation unit size a disk is formatted with, to narrow down this issue?
mikegodwin
Enthusiast
Posts: 54
Liked: 1 time
Joined: Oct 12, 2012 12:28 am
Full Name: Mike Godwin
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by mikegodwin »

Can you run this (assuming E: drive is your ReFS drive), and look for Bytes Per Cluster:

fsutil fsinfo refsinfo e:
ferrus
Veeam ProPartner
Posts: 299
Liked: 43 times
Joined: Dec 03, 2015 3:41 pm
Location: UK
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by ferrus »

From the ReFS thread in the main forum, as I think it's probably relevant here:
It looks like Microsoft have released the official patch now.

Addressed performance issues in ReFS when backing up many terabytes of data. 
Addressed issue where a stuck thread in ReFS might cause memory corruption.
https://support.microsoft.com/en-gb/help/4025334

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by jamerson »

mikegodwin wrote: Can you run this (assuming E: drive is your ReFS drive), and look for Bytes Per Cluster:

fsutil fsinfo refsinfo e:

I just checked one of the storage volumes and it shows the following:

C:\Windows\system32>fsutil fsinfo refsinfo f:
REFS Volume Serial Number : 0xd0ba8eb4ba8e971e
REFS Version : 3.1
Number Sectors : 0x00000003a37e0000
Total Clusters : 0x000000000746fc00
Free Clusters : 0x0000000004c05b8d
Total Reserved : 0x0000000000077c09
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 65536
Checksum Type: CHECKSUM_TYPE_NONE


I believe this storage is 4096 bytes.

And the off-site storage:

C:\Windows\system32>fsutil fsinfo refsinfo d:
REFS Volume Serial Number : 0x84ae3af2ae3adc7c
REFS Version : 3.1
Number Sectors : 0x00000001d10a0000
Total Clusters : 0x000000003a214000
Free Clusters : 0x000000000e86ad32
Total Reserved : 0x00000000003c0ba0
Bytes Per Sector : 512
Bytes Per Physical Sector : 512
Bytes Per Cluster : 4096
Checksum Type: CHECKSUM_TYPE_NONE

Which one needs to be reformatted to 64K?
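
As a rough aid for reading output like the above, here is a small Python sketch that pulls "Bytes Per Cluster" out of `fsutil fsinfo refsinfo` text. The function names are hypothetical and the sample strings are just the relevant lines copied from this post. Note that "Bytes Per Physical Sector" describes the underlying disk, not the ReFS cluster size.

```python
import re


def bytes_per_cluster(refsinfo_output):
    """Extract the 'Bytes Per Cluster' value from `fsutil fsinfo refsinfo` output."""
    match = re.search(r"^\s*Bytes Per Cluster\s*:\s*(\d+)", refsinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Bytes Per Cluster' line found")
    return int(match.group(1))


def uses_64k_clusters(refsinfo_output):
    """True if the volume is formatted with 64 KB (65536-byte) clusters."""
    return bytes_per_cluster(refsinfo_output) == 64 * 1024


# Relevant lines copied from the two outputs above.
F_OUTPUT = """\
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 65536
"""
D_OUTPUT = """\
Bytes Per Sector : 512
Bytes Per Physical Sector : 512
Bytes Per Cluster : 4096
"""

for drive, output in (("f:", F_OUTPUT), ("d:", D_OUTPUT)):
    size_kb = bytes_per_cluster(output) // 1024
    print(f"{drive} cluster size {size_kb} KB, 64K clusters: {uses_64k_clusters(output)}")
```

Read this way, f: already reports 64 KB clusters, and d: (the off-site storage) is the volume with 4 KB clusters.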

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by davecla » 1 person likes this post

To close off my experience here: after working with Veeam support for a few weeks and trying numerous options, we have given up on ReFS and moved our repo back to NTFS. Everything is working fine now.
ing:DT79
Novice
Posts: 5
Liked: never
Joined: Aug 09, 2016 7:24 am
Full Name: Ing. DAVIDE TONINI
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by ing:DT79 »

I have the same issue. No news from the community about this topic?
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by mkretzer »

We are also giving up on REFS. I think there is no hope that it will be fixed soon...

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by ing:DT79 »

What's your experience mkretzer?

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by mkretzer »

How it all began you can read in the first post of the 4 k horror story.

We have made many attempts and have basically been switching between our production backup repo and a temporary repo for 7 months now.

We had:

- Crashes with 4 k Block size (no longer happening with 64 k and 384 GB of RAM)
- Merges are ultra-fast the first week and then take longer and longer every week - a backup which took slightly over an hour now takes 10 hours after 5 weeks (still happening)
- While merging, concurrently running backups take forever (somewhat better since the latest patches)
- After backups were deleted the filesystem gets even slower (somewhat better since the latest patches)
- Active fulls start extremely fast and then creep down to 1/6 of the performance the backend is capable of, stalling every few minutes (still happening)

Right now my theory is that integrity streams are to blame for the slow write speed. The problem is that Veeam support told me there is no option to turn that off, which I find terrible, as our backend storage already does silent data corruption protection and now it is done on two levels...

Markus

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by ing:DT79 »

Very interesting experience!

In my case I have a DELL PowerEdge R730xd as Windows Server 2016 repository, with 10G nics: during the first Active Full I see the transfer rate growing up quickly, but after a few minutes slowing down completely to 0! The job is still running, so after a long time the transfer rate goes up again...the overall speed and time are very poor!!

There aren't any kind of logs from the operating system, so I opened a support ticket to Veeam.

I think that the problem could be at a low level inside ReFS: without SSD caching (tiering) enabled, it is too slow to write data to the physical disks.

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by jamerson »

We are having the same issue with a couple of our customers.
The solution was to go back to NTFS until this is fixed in the next release.
tomkod
Lurker
Posts: 1
Liked: never
Joined: Feb 01, 2018 7:22 am
Full Name: Tomasz Dorocicz
Contact:

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by tomkod »

Hello @ll,

we are experiencing the same issue in our infrastructure.
In our case, the Veeam server runs on Windows 2016 and has a 4 TB iSCSI disk attached directly to the system, formatted as ReFS with 64K clusters. This drive is set up in Veeam as the repository for a copy job, to take advantage of the Fast Clone feature etc.
On the first run everything went smoothly, but after transferring about 1 TB of data it started freezing and performance dropped dramatically. The job keeps running, but the transfer runs for about a minute and then freezes for about 5-10 minutes. While it is frozen, the iSCSI drive mapped as Z:\ in the system is unavailable.
I was changing the network configuration and iSCSI target settings without knowing that it was a ReFS issue.
BTW, if the copy job is disabled, the issue does not occur. I can copy large files directly to that drive without any problems. It only starts happening when the copy job is running.
For now the solution, as I read, is to go back to NTFS but it's really

Re: Win2016 ReFs repository server freeze during Back-up cop

Post by Gostev »

Microsoft is releasing the fix for this ReFS issue in the next cumulative update (there is a main 50-page thread about this issue; this is a duplicate discussion).