Crap. Can you find out exactly what they've done in the beta 2 version and compare it to what's been done today? I s*** you not, it's perfect, well, at least for me. The symptoms you describe above were occurring on my machine ("This was causing the system to completely freeze (clock not updating in the task bar), and often BSOD."). I have 50 TB of storage and 192 GB of RAM and the freezing still occurred. Granted, I haven't tried the latest driver, but from what I'm seeing it's still a no-go for me.
Gostev wrote: Just to correct the expectations: the issues that we kept working with Microsoft for the past two years were not related to performance of Active Fulls or fast clone operations, but rather OS stability due to the retention processing and specifically, deleting large amounts of backup files with ReFS block cloning in use. This was causing the system to completely freeze (clock not updating in the task bar), and often BSOD. These are the issues that were being discussed in this topic.
I don't expect the patch to be fixing any other issues, and actually I've not been aware of the two specific ones mentioned above to exist. For example, we have definitely not seen full backup performance issues in our labs, and the only times we saw fast clone performance issues was when the corresponding regression was temporarily introduced in the May 2018 Windows Updates.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
-
- Enthusiast
- Posts: 57
- Liked: 8 times
- Joined: May 09, 2011 12:43 pm
- Full Name: Sebastian
- Location: Germany
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
To clarify my post, I don't expect higher backup performance for full backups. Today, with the last two or three ReFS-relevant MS patches, we fill the 10 Gbit/s network link on full backup operations.
Gostev wrote: Just to correct the expectations: the issues that we kept working with Microsoft for the past two years were not related to performance of Active Fulls or fast clone operations, but rather OS stability due to the retention processing and specifically, deleting large amounts of backup files with ReFS block cloning in use. This was causing the system to completely freeze (clock not updating in the task bar), and often BSOD. These are the issues that were being discussed in this topic.
I don't expect the patch to be fixing any other issues, and actually I've not been aware of the two specific ones mentioned above to exist. For example, we have definitely not seen full backup performance issues in our labs, and the only times we saw fast clone performance issues was when the corresponding regression was temporarily introduced in the May 2018 Windows Updates.
So far so good.
The only struggle is that the OS becomes unresponsive (SNMP, WMI) during Fast Clone and/or delete operations. So our monitoring stays in alert state for the Veeam repositories during the main backup window.
It seems that a high count of Fast Clone operations is responsible for this behaviour. An option for simultaneous Fast Clone operations per repository would be nice.
Thanks.
-
- Influencer
- Posts: 14
- Liked: 4 times
- Joined: Jul 19, 2018 2:10 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
@Gostev, thanks for the clarification, I can't hide my huge disappointment though
We are experiencing extremely slow Fast Clone/Synthetic Full operations.
We have two identical servers (backup and replication), each with the following config:
128GB RAM
152TB repository
Cisco S3260 M4
24 SATA drives in RAID6
This system has been benchmarked to write a sustained 500-600 MBps to the repository's ReFS partition, with respectable peaks of 1.1 GBps. However, backup job write speeds average 50-100 MBps, and Fast Clone/Synthetic Full times can range between 10 and 50 hrs. Half of the job runs show source as the bottleneck (~90%), but the other half show target (~70%). What in your opinion could be causing the slow performance we're getting on target?
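To put the gap between the benchmarked and observed write speeds into wall-clock terms, here is a rough Python sketch. It assumes a hypothetical 10 TB full backup (the post does not state a backup size) and decimal units (1 TB = 1,000,000 MB); the function name is made up for illustration.

```python
def hours_to_write(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to write size_tb terabytes at a sustained rate in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_s / 3600


# Hypothetical backup size, NOT taken from the post.
FULL_BACKUP_TB = 10

# Rates quoted in the post: benchmark 500-600 MBps, observed 50-100 MBps.
for label, rate in [
    ("benchmark low", 500),
    ("benchmark high", 600),
    ("observed low", 50),
    ("observed high", 100),
]:
    hours = hours_to_write(FULL_BACKUP_TB, rate)
    print(f"{label:>14}: {rate:4d} MB/s -> {hours:5.1f} h")
```

This only covers plain sequential writes; Fast Clone/Synthetic Full operations are metadata operations on ReFS, so their duration does not follow from raw throughput.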
-
- VP, Product Management
- Posts: 6025
- Liked: 2853 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Do you have your repository tasks limited or do you have the task limit unchecked (the default)? What concurrency do you have set and what is your memory/CPU configuration?
soehl wrote: The only struggle is that the OS becomes unresponsive (SNMP, WMI) during Fast Clone and/or delete operations. So our monitoring stays in alert state for the Veeam repositories during the main backup window.
It seems that a high count of Fast Clone operations is responsible for this behaviour. An option for simultaneous Fast Clone operations per repository would be nice.
-
- Enthusiast
- Posts: 57
- Liked: 8 times
- Joined: May 09, 2011 12:43 pm
- Full Name: Sebastian
- Location: Germany
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
We have several HP(E) boxes, mainly Apollo 4510 Gen9/10.
One example:
HPE Apollo 4510 Gen9
2x E5-2690V4 = 28 Cores
196GB RAM
60x 4TB NL SAS disks on one HPE Smart Array P840 (4GB cache) with SSD SmartCache enabled (800GB)
The RAID configuration is 2x RAID 60 with 30 disks each = 2x approximately 100TB net filesystem, formatted with ReFS 64KB block size
Each filesystem is one Veeam repository, with the "Use per-VM backup files" option activated and a task limit of 20.
I played around with the concurrent task limit option, from 5 to unchecked, but didn't find a value that brings a real improvement. Besides that, backup duration is higher with a lower concurrent task limit.
Thanks!
-
- Veeam Legend
- Posts: 1192
- Liked: 412 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
We have a similar configuration with an external FTS storage with 24 disks, and write speed is also between 60 and 140 MB/s, which is quite slow... I know the focus is on ReFS synthetic performance, but active/incremental performance is bad as well...
l0stb@ackup wrote: This system has been benchmarked to write a sustained 500-600 MBps to the repository's ReFS partition, with respectable peaks of 1.1 GBps. However, backup job write speeds average 50-100 MBps, and Fast Clone/Synthetic Full times can range between 10 and 50 hrs. Half of the job runs show source as the bottleneck (~90%), but the other half show target (~70%). What in your opinion could be causing the slow performance we're getting on target?
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I've had 4 lockups today - 2 using the old driver (where the clock got stuck) and 2 using the new driver (where the clock keeps on running)
I'm evacuating 3 ReFS FC-Storages to an internal ReFS Storage and this is sort of a nightmare... The FC-Storages aren't Blockcopy since they hold the data evacuated from the internal RAID (yesterday), so there is no Blockcopy on these volumes...
2 x E5-2620v4
128GB Ram
3 x FC each ~12TB
1 x internal ~50TB
-
- Chief Product Officer
- Posts: 31643
- Liked: 7133 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Try reducing concurrent tasks on the repo. If the clock keeps on running, but the server is slow to respond, this is usually due to a heavy I/O or CPU load.
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Even if I'm evacuating only 1 of the FCs onto the local repo (that's 4 tasks), the drive becomes unresponsive (showing no information about used/free space in Explorer and so on) and backups targeting the repo (15-minute SQL transaction log backups) are not running... The problem occurred out of nowhere after evacuating the drive yesterday...
Edit:
And you can't browse the drive if it hangs - only way is to reboot (which sometimes fails and a reset is required)
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
What happens if you try a robocopy of a similar sort of data going through the same sort of route? Does the system still freeze up? i.e. eliminate Veeam from the equation to see whether or not it is directly connected with Veeam activities or is there something more fundamental wrong with your environment.
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Didn't happen before. The lockup/hang/slowdown happens every time an evacuation of a .vbk (or a large .vib, as it seems) reaches 100%. I think B&R then tries to verify, and that read seems to be the problem in my case. Since one of the backups is ~3TB, that is one looooong lockup.
When the verify is over (and the clock kept on running) the server is all good again. I'm talking about 1 task at a time at the repo (not like 3 x 4 tasks (4 tasks for each evacuated FC storage) or something).
I've copied about 4TB worth of data with Explorer, which worked without a problem. I didn't experience problems with normal B&R actions; only the evacuation gave me headaches.
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Using explorer to copy files isn't going to put as much of a strain on things as it does not do multi-threading. Robocopy would.
When you look at the job actions in the Home / Jobs screen what does it show you for the Load reading and Primary bottleneck?
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
As the evacuation is a system task it doesn't show this - it's not a normal job.
Agreed that an Explorer copy isn't multi-threaded. I should have clarified that there were about 8-9 simultaneous copies running (so this is sort of multi-threaded). Robocopy was a lot more overhead for these copies, so I stuck to normal Explorer copies.
-
- Chief Product Officer
- Posts: 31643
- Liked: 7133 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Guys, it's not about multi-threading (although just for the record, Robocopy does have /MT switch that makes it multi-threaded). Neither Explorer nor Robocopy is a good tool for comparison for a different reason: unlike Veeam, they don't enable ReFS data integrity streams on the target file. And having that enabled completely changes ReFS I/O pattern.
-
- Novice
- Posts: 3
- Liked: 1 time
- Joined: Oct 31, 2014 1:23 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Just a report from the field. Was suffering 10x performance degradation during backup file merge operations on ReFS after July updates. No issues before this, no Memory issues or server lockups. (24TB repo with 64GB memory)
Rolling the driver back resolved this for me temporarily, installing KB4343884 also seems to have worked as a permanent fix.
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Okay, but is there any explanation why the lockup for me is only happening if the verify of an evacuation is in progress? Or is this expected behaviour? I'm having no problem so far using the repos with normal synthetic full and so on.
Gostev wrote: Guys, it's not about multi-threading (although just for the record, Robocopy does have /MT switch that makes it multi-threaded). Neither Explorer nor Robocopy is a good tool for comparison for a different reason: unlike Veeam, they don't enable ReFS data integrity streams on the target file. And having that enabled completely changes ReFS I/O pattern.
-
- Chief Product Officer
- Posts: 31643
- Liked: 7133 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I suppose this is when the deletion of the evacuated backup file occurs. ReFS has some known issues when deleting large files that use block cloning. These issues were supposedly fixed in the most recent update (KB4343884), we're testing it right now to confirm.
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
But not the volume with the source is locking up (no information about free/used showing up in explorer) - it's the target that's locking up. So I'm not quite sure the deletion is the problem? Maybe evacuation is doing a verify read or something?
-
- Service Provider
- Posts: 178
- Liked: 12 times
- Joined: Jan 30, 2018 3:24 pm
- Full Name: Kevin Boddy
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Hi,
We're deploying a new Windows 2016 backup repository. 2x 8c cpu, 256GB memory, 600TB raw storage, hardware raid. Should we look at ReFS again or stick to NTFS?
Thanks
-
- Service Provider
- Posts: 158
- Liked: 9 times
- Joined: Dec 05, 2014 2:13 pm
- Full Name: Iain Green
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I'd wait for the results from Gostev's testing; however, you appear to have sufficient memory for ReFS.
Many thanks
Iain Green
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
It was a test on his setup without using Veeam. He said he was trying to use an Explorer file copy to stress-test his environment; I was only saying that a file copy using Explorer isn't going to max out the server.
Gostev wrote: Guys, it's not about multi-threading (although just for the record, Robocopy does have /MT switch that makes it multi-threaded). Neither Explorer nor Robocopy is a good tool for comparison for a different reason: unlike Veeam, they don't enable ReFS data integrity streams on the target file. And having that enabled completely changes ReFS I/O pattern.
-
- Expert
- Posts: 159
- Liked: 28 times
- Joined: Sep 29, 2017 8:07 pm
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I'm using the August update and I haven't had any issues since I installed it. 160GB RAM / ~55TB repo (25 used)
-
- Service Provider
- Posts: 158
- Liked: 9 times
- Joined: Dec 05, 2014 2:13 pm
- Full Name: Iain Green
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I would be interested to hear from a user who is not following the 1 GB / 1 TB rule and is using the new update.
Many thanks
Iain Green
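The "1 GB / 1 TB rule" above is read here as 1 GB of RAM per 1 TB of ReFS repository capacity. A minimal Python sketch checking a few configurations posted in this thread against that reading; the function name is hypothetical, and the rule is taken as stated in the thread rather than as an official sizing formula.

```python
def meets_ram_rule(repo_tb: float, ram_gb: float, gb_per_tb: float = 1.0) -> bool:
    """True if installed RAM covers gb_per_tb gigabytes per terabyte of repository."""
    return ram_gb >= repo_tb * gb_per_tb


# (repository size in TB, RAM in GB) as posted by participants in this thread.
posted_configs = [
    (50, 192),
    (152, 128),
    (24, 64),
    (55, 160),
]

for repo_tb, ram_gb in posted_configs:
    verdict = "meets" if meets_ram_rule(repo_tb, ram_gb) else "falls short of"
    print(f"{repo_tb} TB repo with {ram_gb} GB RAM {verdict} the 1 GB per 1 TB rule")
```

Several posters in this thread report lockups on systems that satisfy the rule, so passing this check does not by itself indicate a stable repository.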
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Just found out while focusing all my attention on the repository we're currently loading up that one of our other repositories (with only 6 small jobs on) was quietly having STOP errors. Two at the beginning of this month. 1st and 8th between 6 and 8pm.
That repository has the recommended amount of RAM and is only being used for 6 very small jobs, only 800 GB of a 54 TB volume.
The ReFS.sys version is 10.0.14393.2395, which is different from the other repository. It seems to be later, but I can't find any guide that explains which version was released when and in which update. I know they juggled them around a bit, so a higher number does not necessarily mean it is the latest distribution.
I'm updating all our Veeam servers to the latest version in KB4343884.
-
- Chief Product Officer
- Posts: 31643
- Liked: 7133 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
KB4343884 has ReFS driver version 10.0.14393.2457
-
- Enthusiast
- Posts: 76
- Liked: 16 times
- Joined: Oct 27, 2017 5:42 pm
- Full Name: Nick
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
We’re not experiencing the problems in this thread’s topic (we’re doing very small backups right now) but just as an observation; tracking (and getting) these Update KB Releases and ReFS Versions is a bit of a head scratcher...
2 Systems, Both:
Win Server 2016 Version 1607
Hyper-V Hosts / VB&R All-In-One Backup Servers
Getting updates directly from Windows Update (not WSUS controlled)
System-1
KB 4343887 (Installed Sept 1)
KB 4343884 (Not Installed and not showing as Available)
O/S Build 14393.2430
REFS.SYS v10.0.14393.2395
System-2
KB 4343887 (Installed Aug 25)
KB 4343884 (Installed Sept 10)
O/S Build 14393.2457
REFS.SYS v10.0.14393.2457
So apparently the numerically Lower KB installed a numerically Higher ReFS version (which happens to match the OS Build Suffix).
And the other system isn’t even being offered the KB with the Higher ReFS version?!
Nick
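Driver versions like the ones listed above are easier to compare when the dotted string is parsed into integers rather than compared as text. A minimal Python sketch, using the two REFS.SYS versions from this post and the 10.0.14393.2457 version reported in this thread for KB4343884; the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '10.0.14393.2457' (optionally prefixed with 'v') into (10, 0, 14393, 2457)."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


# ReFS driver version reported in this thread for KB4343884.
KB4343884_REFS = parse_version("10.0.14393.2457")


def at_least_kb4343884(installed: str) -> bool:
    """True if the installed REFS.SYS version is numerically >= the KB4343884 one."""
    return parse_version(installed) >= KB4343884_REFS


systems = {
    "System-1": "v10.0.14393.2395",
    "System-2": "v10.0.14393.2457",
}

for name, version in systems.items():
    status = "at or above" if at_least_kb4343884(version) else "below"
    print(f"{name}: REFS.SYS {version} is {status} the KB4343884 driver version")
```

As noted earlier in the thread, a numerically higher version may not always correspond to a later release, so this is a numeric comparison only.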
-
- Enthusiast
- Posts: 75
- Liked: 5 times
- Joined: Aug 08, 2018 10:19 am
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
@Gostev
Are the lockups we have encountered while evacuating part of the evaluations? Maybe you could try to do an evacuation from an REFS Volume to another REFS Volume inside the same SOBR and see if the lockups occurs as soon as any evacuation hits 100%?
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Nick-SAC wrote: We’re not experiencing the problems in this thread’s topic (we’re doing very small backups right now) but just as an observation; tracking (and getting) these Update KB Releases and ReFS Versions is a bit of a head scratcher...
2 Systems, Both:
Win Server 2016 Version 1607
Hyper-V Hosts / VB&R All-In-One Backup Servers
Getting updates directly from Windows Update (not WSUS controlled)
System-1
KB 4343887 (Installed Sept 1)
KB 4343884 (Not Installed and not showing as Available)
O/S Build 14393.2430
REFS.SYS v10.0.14393.2395
System-2
KB 4343887 (Installed Aug 25)
KB 4343884 (Installed Sept 10)
O/S Build 14393.2457
REFS.SYS v10.0.14393.2457
So apparently the numerically Lower KB installed a numerically Higher ReFS version (which happens to match the OS Build Suffix).
And the other system isn’t even being offered the KB with the Higher ReFS version?!
Nick
I'm afraid this is unlikely to help demystify the situation... I saw similar confusion when updating three of mine yesterday. I think the 'Date modified' refers to the time the patch was applied rather than the time Microsoft made changes to it, it took me a few moments to see that properly.
-
- Veteran
- Posts: 636
- Liked: 100 times
- Joined: Mar 23, 2018 4:43 pm
- Full Name: EJ
- Location: London
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
The version at the top-right of my screenshot was having STOP errors.
-
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I've been postponing updates on my ReFS repos for quite some time now. I've been running KB4093119 (refs.sys 2097) for 5 very stable months, no issues whatsoever.
Reading the latest in this thread, I decided to go for the latest update (KB4457131, refs.sys 2457), and after 2 days I can definitely say that it's slower. Difficult to say by how much, but I estimate 50%. This is not good, but it may be good enough, since I had been struggling with ReFS for more than a year, with performance degradation of thousands of percent or more, until 2097 was released. Memory or CPU has never been an issue (24 cores, 384GB RAM).
Still, it makes me wonder: if you have a perfectly working version (2097), what went wrong in later releases...