- 
				sconley
- Enthusiast
- Posts: 48
- Liked: 3 times
- Joined: Mar 18, 2011 7:36 pm
- Full Name: Sean Conley
Slow backup file merge with REFS
I have been seeing a significant slowdown in merge operations.  One of our production jobs was taking less than 15 minutes to complete the merge process when we initially upgraded to 9.5 with REFS.  This operation is now regularly taking over 2 hours.  The action log still shows fast clone, so I'm unsure what could be causing this.  Our environment is fairly simple.  It is entirely virtual, running vSphere 6/vCenter 6, with our B&R server running Server 2012 R2 and a single proxy/repository VM running Server 2016 with the repository space directly attached to the OS via iSCSI.  We use backup copy jobs to copy offsite to a proxy/repository VM that has an identical configuration, and I am not seeing the same issues on the remote end when it goes through the merge process.
I am also going to open a case with support and get them to look into this. I just wanted to post here and see if anyone has experienced anything similar.
			
			
									
						
										
- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
If you don't see the same with Backup Copy job, then this could potentially be an impact from fragmentation due to the primary backup job mode you have chosen. Which one is it?
			
			
									
						
										
						- 
				sconley
- Enthusiast
- Posts: 48
- Liked: 3 times
- Joined: Mar 18, 2011 7:36 pm
- Full Name: Sean Conley
Re: Slow backup file merge with REFS
All backup jobs are forever incremental with monthly compact and defragment maintenance scheduled.  Looking at the job history the compact operations seem to have been running as expected.
			
			
									
						
										
						- 
				tsightler
- VP, Product Management
- Posts: 6040
- Liked: 2867 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Slow backup file merge with REFS
I'm curious if your ReFS volume is formatted with 4K clusters or the Veeam recommended 64K.  I've seen this issue with 4K clusters in a number of cases, although on a smaller scale, but similar ratio (for example a merge that was taking 5 minutes slowly degrades to taking 40+ minutes).
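For a sense of scale, the cluster-size question can be put into numbers. The sketch below only counts how many clusters a volume is divided into at 4K versus 64K. The 1 TiB volume size is an arbitrary illustrative value, and the idea that more clusters means more allocation metadata for the file system to track is an assumption, not something confirmed in this thread.

```python
# Count clusters for an illustrative volume size at two ReFS cluster sizes.
# VOLUME_BYTES is a hypothetical example value, not taken from any poster's system.
VOLUME_BYTES = 1 * 1024**4  # 1 TiB


def cluster_count(volume_bytes: int, cluster_bytes: int) -> int:
    """Number of whole clusters that fit in the volume."""
    return volume_bytes // cluster_bytes


clusters_4k = cluster_count(VOLUME_BYTES, 4 * 1024)
clusters_64k = cluster_count(VOLUME_BYTES, 64 * 1024)

print(f"4K clusters:  {clusters_4k:,}")
print(f"64K clusters: {clusters_64k:,}")
print(f"ratio:        {clusters_4k // clusters_64k}x")
```

Whatever the volume size, the 4K layout has 16 times as many clusters to account for as the 64K layout.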
			
			
									
						
										
						- 
				sconley
- Enthusiast
- Posts: 48
- Liked: 3 times
- Joined: Mar 18, 2011 7:36 pm
- Full Name: Sean Conley
Re: Slow backup file merge with REFS
The volume is formatted with 64k clusters.  I actually rebuilt the repository at one point as I started with the default 4k cluster size before reading about some of the nightmare stories with 4k allocation size.  Here is the output from fsutil:
C:\Windows\system32>fsutil fsinfo refsinfo e:
REFS Volume Serial Number : 0x5c2eae7f2eae51b6
REFS Version : 3.1
Number Sectors : 0x000000117ffa0000
Total Clusters : 0x0000000022fff400
Free Clusters : 0x00000000083d8585
Total Reserved : 0x00000000001ec09b
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 65536
Checksum Type: CHECKSUM_TYPE_NONE
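The cluster counts in that output are hexadecimal, so the volume size is not obvious at a glance. A minimal sketch for converting them follows. It assumes the output is pasted in as text exactly as shown above, and the function name `parse_refsinfo` is made up for illustration.

```python
# Parse the "Key : Value" lines printed by `fsutil fsinfo refsinfo` and
# convert the hexadecimal cluster counts into sizes.
FSUTIL_OUTPUT = """\
REFS Volume Serial Number : 0x5c2eae7f2eae51b6
REFS Version : 3.1
Number Sectors : 0x000000117ffa0000
Total Clusters : 0x0000000022fff400
Free Clusters : 0x00000000083d8585
Total Reserved : 0x00000000001ec09b
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 65536
Checksum Type: CHECKSUM_TYPE_NONE
"""


def parse_refsinfo(text: str) -> dict[str, str]:
    """Split each line at the first colon into a key/value pair."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


info = parse_refsinfo(FSUTIL_OUTPUT)
bytes_per_cluster = int(info["Bytes Per Cluster"])
total_clusters = int(info["Total Clusters"], 16)  # hex string with 0x prefix
free_clusters = int(info["Free Clusters"], 16)

TIB = 1024**4
total_tib = total_clusters * bytes_per_cluster / TIB
free_tib = free_clusters * bytes_per_cluster / TIB

print(f"cluster size: {bytes_per_cluster // 1024} KiB")
print(f"volume size:  {total_tib:.2f} TiB")
print(f"free space:   {free_tib:.2f} TiB ({free_clusters / total_clusters:.1%})")
```

This confirms the 64K cluster size and shows a volume of just under 35 TiB with a bit less than a quarter of it free.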
			
			
									
						
										
- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
OK, our system is no longer crashing (as we discussed in the "REFS 4K horror story"), but merging is now extremely slow. Other backups running incrementals at the same time are also very slow.
We opened a Veeam ticket (02163118), but when Veeam support found out that the issue also happens with normal file copies to the REFS volume, they basically told us that it is not a Veeam issue. Resource Monitor shows that the disks are nowhere near maximum load.
We are thinking of going back to NTFS...
			
			
									
						
										
- 
				haslund
- Veeam Software
- Posts: 902
- Liked: 163 times
- Joined: Feb 16, 2012 7:35 am
- Full Name: Rasmus Haslund
- Location: Denmark
- Contact:
Re: Slow backup file merge with REFS
Can you share more details of your storage used for this backup repository? Is it local DAS or some remote storage such as iSCSI/FC?
			
			
									
						
							Rasmus Haslund | Twitter: @haslund | Blog: https://rasmushaslund.com
			
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Storage is a dedicated Hitachi HUS110 FC SAN system with 96 disks in a RAID 60 configuration (8 disks per RAID set). The system is quite fast. Latency in perfmon shows as 2-8 ms, so the storage itself is not the issue. Disk queue is 0 all the time.
Initial backups ran at 600-800 MB/s. Yesterday a periodic active full only reached 52 MB/s with 93% target bottleneck. Right now a synthetic full is running with nothing else, and it is at 78% after nearly 7 hours.
Two weeks ago, before the issues started, a synthetic full took little more than an hour, with other jobs running in parallel.
The storage itself received only ~100 IO/s, which a single disk could handle, but there are 96 disks available just for this one job and the system is doing very little.
REFS is 64k, we added RAM so we now have 384 GB, and the latest MS hotfixes are applied with registry setting 1 for REFS. 65% of RAM is free.
			
			
									
						
										
- 
				haslund
- Veeam Software
- Posts: 902
- Liked: 163 times
- Joined: Feb 16, 2012 7:35 am
- Full Name: Rasmus Haslund
- Location: Denmark
- Contact:
Re: Slow backup file merge with REFS
I assume you are using multipathing to access the Hitachi disk system. Are you using only Windows MPIO, or did you install any special software from Hitachi? Would it be possible to test with only a single path active?
			
			
									
						
							Rasmus Haslund | Twitter: @haslund | Blog: https://rasmushaslund.com
			
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
We use Windows MPIO. But I have known this system for 4 years now. It is capable of a lot of IO, single path or not.
The problem is definitely REFS. We never had this with NTFS. This is our third try to implement REFS. With our second try we used a new Fujitsu Eternus DX60 S3 and the exact same thing happened. We also thought the storage was the problem and even had the vendor check it.
As I said, latencies show very good values in perfmon.
Merges have finished now and I am going to test normal incremental backups to see if everything behaves normally again.
			
			
									
						
										
- 
				haslund
- Veeam Software
- Posts: 902
- Liked: 163 times
- Joined: Feb 16, 2012 7:35 am
- Full Name: Rasmus Haslund
- Location: Denmark
- Contact:
Re: Slow backup file merge with REFS
I completely understand and respect your thoughts here. I am trying to look across customer posts and see quite a few are using FC or iSCSI and just wonder if anything is connected to the multipathing. Is there any chance it could be tested - just for the purpose of confirming it does not have any impact?
			
			
									
						
							Rasmus Haslund | Twitter: @haslund | Blog: https://rasmushaslund.com
			
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
We could do this, but I think there is another reason: those using FC or iSCSI have external arrays, which often means much bigger repositories. And since the problem was not there right away, it is very likely that the number of used/fast cloned blocks has something to do with whether the problem occurs...
			
			
									
						
										
						- 
				haslund
- Veeam Software
- Posts: 902
- Liked: 163 times
- Joined: Feb 16, 2012 7:35 am
- Full Name: Rasmus Haslund
- Location: Denmark
- Contact:
Re: Slow backup file merge with REFS
mkretzer wrote: Since the problem was not there right away it is very likely that the number of used/fast cloned blocks has something to do with whether the problem occurs...
This seems to align with what the good @tsightler commented almost a month ago here: post239915.html#p239915
Rasmus Haslund | Twitter: @haslund | Blog: https://rasmushaslund.com
			
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Yes! The problem is that at the same time we started doing periodic active fulls again, the first backups also went out of retention. That is why we have now disabled active fulls, to see if it works better.
Furthermore, we did not reboot after the weekend, but the system seems to have returned to normal speed. One thing I found after checking our monitoring system was that memory usage went up by 120 GB in the space of half an hour while the active full was running. Shortly after, WMI cut out for 15 minutes as the memory value recovered.
Is there perhaps a fixed limit on how much memory the REFS filesystem driver (?) can use, so that the system becomes unstable once that limit is reached?
			
			
									
						
										
- 
				JimmyO
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: Slow backup file merge with REFS
Having the exact same issue - ReFS seems to cause a lot of fragmentation. My first daily merge took 1 hour, now a month or so later, it takes up to 30 hours, meaning that I will no longer be able to do daily backups.....
			
			
									
						
										
						- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
JimmyO wrote: My first daily merge took 1 hour
With fast clone???
- 
				JimmyO
- Enthusiast
- Posts: 55
- Liked: 9 times
- Joined: Apr 27, 2014 8:19 pm
- Contact:
Re: Slow backup file merge with REFS
Yes! (I have approx. 1TB of increments /job)
			
			
									
						
										
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Is fragmentation really relevant while fast cloning?
And why is active full slow when there are still 70 TB available which have never been written to (we see that in the storage)?
Edit:
No matter how high the fragmentation is, we have 96 disks and the system is still "fast-cloning" with only 150 IOPS... That is the random IO rate two disks in a RAID 0 could deliver.
			
			
									
						
										
- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
Fast cloning does not move the actual data around, it's about updating metadata so IOPS requirements are very low.
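To illustrate that distinction, here is a deliberately simplified toy model. It is not how ReFS implements block cloning: the class, its methods, and the file names are all invented for the example, and it only contrasts a clone (which adds references to existing blocks) with a conventional copy (which rewrites the data).

```python
# Toy model of block cloning: files are lists of references to shared,
# reference-counted blocks. This is NOT the ReFS implementation; every name
# here is hypothetical and exists only to contrast "clone" with "copy".
class ToyVolume:
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}      # block id -> data
        self.refcount: dict[int, int] = {}      # block id -> number of references
        self.files: dict[str, list[int]] = {}   # file name -> ordered block ids
        self.data_bytes_written = 0             # counts payload writes only
        self._next_id = 0

    def write_file(self, name: str, chunks: list[bytes]) -> None:
        """Write new data: every chunk lands in a freshly allocated block."""
        ids = []
        for chunk in chunks:
            block_id = self._next_id
            self._next_id += 1
            self.blocks[block_id] = chunk
            self.refcount[block_id] = 1
            self.data_bytes_written += len(chunk)
            ids.append(block_id)
        self.files[name] = ids

    def copy_file(self, src: str, dst: str) -> None:
        """Conventional copy: read every block and write it out again."""
        self.write_file(dst, [self.blocks[i] for i in self.files[src]])

    def clone_file(self, src: str, dst: str) -> None:
        """Clone: reuse the same blocks and only bump reference counts."""
        ids = list(self.files[src])
        for block_id in ids:
            self.refcount[block_id] += 1
        self.files[dst] = ids


vol = ToyVolume()
vol.write_file("full.vbk", [b"a" * 64, b"b" * 64, b"c" * 64])
written_after_full = vol.data_bytes_written

vol.clone_file("full.vbk", "synthetic.vbk")
written_by_clone = vol.data_bytes_written - written_after_full

vol.copy_file("full.vbk", "copy.vbk")
written_by_copy = vol.data_bytes_written - written_after_full - written_by_clone

print(f"clone wrote {written_by_clone} data bytes, copy wrote {written_by_copy}")
```

In this model the clone touches only the bookkeeping dictionaries, which is the sense in which the data I/O requirement stays low. The model says nothing about how expensive that bookkeeping becomes on a real volume as the number of shared blocks grows.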
			
			
									
						
										
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
OK, yesterday we had to increase the size of a disk on one of our VMs, so this morning the entire disk had to be read. The backup normally takes 30 minutes, but now this and the other backups running at the same time are writing at less than 1 MB/s and snapshots are getting bigger and bigger. Repo drive disk queue is basically 0 all the time...
@Gostev: If Veeam wants to look at the situation, now is the chance. But I guess only MS can really solve this, and I know first-level support cannot help us. So we will move back to NTFS very soon....
			
			
									
						
										
- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
Did you try rebooting the repository server? I've only heard about such an issue (slow writes) once in the half year since the 9.5 release, and that one time it was fixed by a server reboot, so this is not necessarily related to ReFS usage.
			
			
									
						
										
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
@gostev Yes, last week when I opened a Veeam case... Your support told me to copy a file with Windows Explorer and the copy speed was also close to zero.
It is definitely REFS: when we had 128 GB RAM the server crashed; now that we have 384 GB, our free RAM drops to 34% or 35% (reproducible!) and then IO becomes VERY slow. I guess without the extra RAM we would see the crash as before.
Now copy operations with Windows Explorer are extremely slow again...
Edit: One more thing: we have now set all the registry settings for the REFS problem to check if they change the behaviour after the next reboot. If not, we will be forced to go back to NTFS.
			
			
									
						
										
- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
Ah, now I remember - so you're the customer I was referring to anyway, so we haven't seen this issue reproduced elsewhere yet.
			
			
									
						
										
- 
				suprnova
- Service Provider
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: Slow backup file merge with REFS
I seem to have a very similar issue, although mine only took a few days to start, and it seems to be only when the larger backups started hitting their retention.  I'm still trying to find a correlation but while the fast clone merging is happening, the repository drives are up and down from a monitoring perspective.  They show up in Windows explorer, but you can't browse them and no storage data is reported.  While there are no merges in progress, jobs are fine.
These are two extreme examples of the larger backups.
Example1:
5/25: 4h:46m
5/26: 10h:4m
5/27: 10h:31m
5/28: 5h:52m
5/29: 34h46m
Example2:
5/25: 3h11m
5/26: 26h58m
5/28: 29h10m
Previous to the above two starting:
5/20: 42s
5/21: 1m
5/23: 21m
5/24: 5m48s
after:
5/25: 13m37s
5/26: 1h13m
5/27: 4h15m
5/28: 3h32m
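To compare these runs on one scale, the durations can be converted to seconds. The sketch below is only an illustration: it copies the figures as written (including the mixed "4h:46m" and "34h46m" styles), treats the "previous" and "after" figures as one continuous series for a single job (an assumption about what the post means), and uses made-up names such as `to_seconds`.

```python
import re

# Durations copied from the post; the notation varies between entries.
EXAMPLE_1 = {"5/25": "4h:46m", "5/26": "10h:4m", "5/27": "10h:31m",
             "5/28": "5h:52m", "5/29": "34h46m"}
EXAMPLE_2 = {"5/25": "3h11m", "5/26": "26h58m", "5/28": "29h10m"}
# Assumption: the "previous" and "after" figures belong to one job.
OTHER_SERIES = {"5/20": "42s", "5/21": "1m", "5/23": "21m", "5/24": "5m48s",
                "5/25": "13m37s", "5/26": "1h13m", "5/27": "4h15m", "5/28": "3h32m"}

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Sum every '<number><unit>' pair, ignoring separators such as ':'."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


def report(label: str, runs: dict[str, str]) -> None:
    """Print each run in hours and relative to the first listed run."""
    baseline = to_seconds(next(iter(runs.values())))
    print(label)
    for day, text in runs.items():
        seconds = to_seconds(text)
        print(f"  {day}: {seconds / 3600:6.2f} h  ({seconds / baseline:.1f}x first run)")


report("Example1", EXAMPLE_1)
report("Example2", EXAMPLE_2)
report("Other series", OTHER_SERIES)
```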
			
			
									
						
										
- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
suprnova wrote: I'm still trying to find a correlation but while the fast clone merging is happening, the repository drives are up and down from a monitoring perspective. They show up in Windows explorer, but you can't browse them and no storage data is reported.
That sounds like an issue discussed in this mega thread. This impacts a portion of our customers and is being investigated by Microsoft.
- 
				suprnova
- Service Provider
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: Slow backup file merge with REFS
I am trying to figure out if it is related.  I did put in the test Microsoft fix from support but it made no difference.  There was only one merge occurring when the gaps started. Either way, the merging is then taking forever, I am assuming because the repository is unstable.
			
			
									
						
										
						- 
				sconley
- Enthusiast
- Posts: 48
- Liked: 3 times
- Joined: Mar 18, 2011 7:36 pm
- Full Name: Sean Conley
Re: Slow backup file merge with REFS
So is the thinking that the slow merge issue may be related to the issue in the REFS 4k thread even given that I have formatted the repo volume using 64 kb blocks?  I am curious because if that is the case I will start following that thread more closely.  I also see that at least one person has been provided with a test fix that they have, at least in VERY early testing, had positive results with in addressing that issue.
			
			
									
						
										
						- 
				Gostev
- Chief Product Officer
- Posts: 32737
- Liked: 7958 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Slow backup file merge with REFS
Right. Since the core issue puts an extremely heavy load on ReFS (to the point where the volume sometimes becomes unreachable), it is perfectly expected that all other I/O activity involving this volume slows to a crawl as a result.
			
			
									
						
										
						- 
				sconley
- Enthusiast
- Posts: 48
- Liked: 3 times
- Joined: Mar 18, 2011 7:36 pm
- Full Name: Sean Conley
Re: Slow backup file merge with REFS
That makes sense.  In that case I will follow the progress of the other thread to see how it plays out over the next few days/weeks.  In my case I am lucky that it is not crippling to our daily operations, possibly since we have a relatively small deployment.  So at least I am not at the point of contemplating rebuilding the repository yet again.  Thanks for the input.
			
			
									
						
										
						- 
				mkretzer
- Veeam Legend
- Posts: 1289
- Liked: 464 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
So merges are taking longer and longer. This time we disabled all scheduled synthetic fulls, and no copies or tape backups were running. Here are the merge times of the first backup that merges on weekends:
Without fast-Clone (same Storage): 5:04
4 weeks before: 0:33
3 weeks before: 0:12
2 weeks before: 2:44
1 week before: 3:12
This week: 3:59
So if this continues, REFS will no longer be faster than NTFS. Furthermore, NTFS does not have the same issues with active fulls running at the same time on fast storage.
So what changed? The only thing that happened 2 weeks ago was that, for the first time, retention points were deleted the day before the merge.
@Gostev: Do you have an idea what we can do next? Perhaps increase retention so that no points are deleted and see if backup merges get faster again next week?
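The trend in those merge times is easier to see as a share of the non-fast-clone time. A small sketch follows, assuming the figures are hours:minutes (the post does not state the unit; the percentages come out the same if they are minutes:seconds). The helper name `to_minutes` is made up for the example.

```python
# Merge times from the post, read as hours:minutes (an assumption; the post
# does not state the unit).
NO_FAST_CLONE = "5:04"
MERGES = [
    ("4 weeks before", "0:33"),
    ("3 weeks before", "0:12"),
    ("2 weeks before", "2:44"),
    ("1 week before", "3:12"),
    ("This week", "3:59"),
]


def to_minutes(text: str) -> int:
    """Convert 'h:mm' into total minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


reference = to_minutes(NO_FAST_CLONE)
for label, text in MERGES:
    minutes = to_minutes(text)
    share = minutes / reference
    print(f"{label:15} {minutes:4d} min  {share:5.0%} of the non-fast-clone time")
```

By this measure the most recent merge already takes close to four fifths of the time the same merge took without fast clone.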
			
			
									
						
										