sconley
Enthusiast
Posts: 48
Liked: 3 times
Joined: Mar 18, 2011 7:36 pm
Full Name: Sean Conley

Slow backup file merge with REFS

Post by sconley »

I have been seeing a significant slowdown in merge operations. One of our production jobs was completing its merge in less than 15 minutes when we first upgraded to 9.5 with ReFS; the merge now regularly takes over 2 hours. The action log still shows fast clone, so I'm unsure what could be causing this. Our environment is fairly simple: it is entirely virtual, running vSphere 6/vCenter 6, with our B&R server on Windows Server 2012 R2 and a single proxy/repository VM on Windows Server 2016, with the repository space attached directly to the OS via iSCSI. We use backup copy jobs to copy offsite to a proxy/repository VM with an identical configuration, and I am not seeing the same issue on the remote end when it goes through the merge process.

I am also going to open a case with support and get them to look into this. I just wanted to post here and see if anyone has experienced anything similar.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow backup file merge with REFS

Post by Gostev »

If you don't see the same with the Backup Copy job, then this could be the impact of fragmentation caused by the backup mode you have chosen for the primary job. Which one is it?
sconley
Enthusiast
Posts: 48
Liked: 3 times
Joined: Mar 18, 2011 7:36 pm
Full Name: Sean Conley

Re: Slow backup file merge with REFS

Post by sconley »

All backup jobs are forever incremental, with monthly compact and defragment maintenance scheduled. Looking at the job history, the compact operations seem to have been running as expected.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Slow backup file merge with REFS

Post by tsightler »

I'm curious whether your ReFS volume is formatted with 4K clusters or the Veeam-recommended 64K. I've seen this issue with 4K clusters in a number of cases, usually on a smaller scale but with a similar ratio (for example, a merge that took 5 minutes slowly degrading to 40+ minutes).
sconley
Enthusiast
Posts: 48
Liked: 3 times
Joined: Mar 18, 2011 7:36 pm
Full Name: Sean Conley

Re: Slow backup file merge with REFS

Post by sconley »

The volume is formatted with 64K clusters. I actually rebuilt the repository at one point: I started with the default 4K cluster size before reading some of the nightmare stories about the 4K allocation size. Here is the output from fsutil:

C:\Windows\system32>fsutil fsinfo refsinfo e:
REFS Volume Serial Number : 0x5c2eae7f2eae51b6
REFS Version : 3.1
Number Sectors : 0x000000117ffa0000
Total Clusters : 0x0000000022fff400
Free Clusters : 0x00000000083d8585
Total Reserved : 0x00000000001ec09b
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 65536
Checksum Type: CHECKSUM_TYPE_NONE
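When several repositories have to be checked, a minimal sketch like the following can pull the cluster size and free space out of `fsutil fsinfo refsinfo` text instead of reading the hex counters by eye. It only parses text that you have already captured; the sample is the output above, and the helper names `parse_refsinfo` and `to_int` are hypothetical.

```python
# Sketch: parse captured `fsutil fsinfo refsinfo <drive>:` output and check
# whether the volume uses 64 KB clusters. Helper names are hypothetical.

SAMPLE = """\
REFS Volume Serial Number : 0x5c2eae7f2eae51b6
REFS Version : 3.1
Number Sectors : 0x000000117ffa0000
Total Clusters : 0x0000000022fff400
Free Clusters : 0x00000000083d8585
Total Reserved : 0x00000000001ec09b
Bytes Per Sector : 512
Bytes Per Physical Sector : 4096
Bytes Per Cluster : 65536
Checksum Type: CHECKSUM_TYPE_NONE
"""


def parse_refsinfo(text: str) -> dict[str, str]:
    """Split each 'Key : Value' line into a dictionary entry."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return info


def to_int(value: str) -> int:
    """fsutil prints some counters in hex (0x...) and others in decimal."""
    return int(value, 0)


info = parse_refsinfo(SAMPLE)
cluster_bytes = to_int(info["Bytes Per Cluster"])
total_clusters = to_int(info["Total Clusters"])
free_clusters = to_int(info["Free Clusters"])

print("64K clusters:", cluster_bytes == 64 * 1024)
print(f"Capacity: {total_clusters * cluster_bytes / 2**40:.1f} TiB")
print(f"Free: {free_clusters / total_clusters:.1%}")
```

For this volume the sketch confirms 64 KB clusters and roughly a quarter of the capacity still free.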
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: Slow backup file merge with REFS

Post by mkretzer »

OK, our system is no longer crashing (as we discussed in the "REFS 4K horror story" thread), but merging is now extremely slow. Other backups running incrementals at the same time are very slow as well.

We opened a Veeam ticket (02163118), but when Veeam support found out that the issue also happens with normal file copies to the ReFS volume, they basically told us that it is not a Veeam issue. Resource Monitor shows that the disks are nowhere near maximum load.

We are thinking of going back to NTFS...
haslund
VeeaMVP
Posts: 839
Liked: 149 times
Joined: Feb 16, 2012 7:35 am
Full Name: Rasmus Haslund
Location: Denmark

Re: Slow backup file merge with REFS

Post by haslund »

Can you share more details of your storage used for this backup repository? Is it local DAS or some remote storage such as iSCSI/FC?
Rasmus Haslund | Twitter: @haslund | Blog: https://rasmushaslund.com
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: Slow backup file merge with REFS

Post by mkretzer »

Storage is a dedicated Hitachi HUS110 FC SAN system with 96 disks in a RAID 60 configuration (8 disks per RAID set). The system is quite fast. Latency in perfmon shows 2-8 ms, so the storage itself is not the issue. Disk queue is 0 all the time.
Initial backups ran at 600-800 MB/s. Yesterday a periodic active full only reached 52 MB/s with a 93% target bottleneck. Right now a synthetic full is running with nothing else, and it is at 78% after nearly 7 hours.
Two weeks ago, before the issues started, a synthetic full took little more than an hour, with other jobs running in parallel.

The storage itself received only ~100 IO/s, which a single disk could handle, yet there are 96 disks available just for this one job and the system is barely doing anything.
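A back-of-the-envelope check of the numbers in this post, as a sketch. It assumes the 96 disks form RAID 6 sets of 8 (each set giving up two disks' worth of capacity to parity) and that the reported ~100 IO/s is spread evenly over all spindles; the variable names are illustrative only.

```python
# Sketch of the RAID 60 arithmetic from the post (assumed even load distribution).
TOTAL_DISKS = 96
DISKS_PER_SET = 8
PARITY_PER_SET = 2   # RAID 6: two disks' worth of parity per set
OBSERVED_IOPS = 100  # "~100 IO/s" reported during the slow synthetic full

raid6_sets = TOTAL_DISKS // DISKS_PER_SET
data_disks = raid6_sets * (DISKS_PER_SET - PARITY_PER_SET)
iops_per_disk = OBSERVED_IOPS / TOTAL_DISKS

print(f"RAID 6 sets:      {raid6_sets}")
print(f"Data disks:       {data_disks}")
print(f"Load per spindle: {iops_per_disk:.2f} IO/s")
```

About one I/O per second per disk is far below what any spindle can sustain, which is consistent with the observation that the array itself is idle while the job crawls.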

ReFS uses 64K clusters. We added RAM, so we now have 384 GB, and the latest Microsoft hotfixes are applied with registry setting 1 for ReFS. Free RAM is 65%.
haslund
VeeaMVP
Posts: 839
Liked: 149 times
Joined: Feb 16, 2012 7:35 am
Full Name: Rasmus Haslund
Location: Denmark

Re: Slow backup file merge with REFS

Post by haslund »

I assume you are utilizing multipathing to access the Hitachi disk system, are you using only Windows MPIO or did you install any special software from Hitachi? Would it be possible to test with only a single path active?
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: Slow backup file merge with REFS

Post by mkretzer »

We use Windows MPIO. But I have known this system for 4 years now, and it can handle a lot of I/O, single path or not.

The problem is definitely ReFS. We never had this with NTFS. This is our third attempt to implement ReFS. On our second try we used a new Fujitsu Eternus DX60 S3, and exactly the same thing happened. We also thought the storage was the problem and even had the vendor check it.

As I said, latencies in perfmon look very good.

The merges have finished now, and I am going to test normal incremental backups to see if everything behaves normally again.
haslund
VeeaMVP
Posts: 839
Liked: 149 times
Joined: Feb 16, 2012 7:35 am
Full Name: Rasmus Haslund
Location: Denmark

Re: Slow backup file merge with REFS

Post by haslund »

I completely understand and respect your thoughts here. Looking across customer posts, I see quite a few are using FC or iSCSI, and I wonder whether the multipathing is a factor. Is there any chance it could be tested, just to confirm it has no impact?
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: Slow backup file merge with REFS

Post by mkretzer »

We could do this, but I think there is another reason: those using FC or iSCSI have external arrays, which often means much bigger repositories. And since the problem was not there right away, it is very likely that the number of used/fast-cloned blocks has something to do with whether the problem occurs...
haslund
VeeaMVP
Posts: 839
Liked: 149 times
Joined: Feb 16, 2012 7:35 am
Full Name: Rasmus Haslund
Location: Denmark

Re: Slow backup file merge with REFS

Post by haslund »

mkretzer wrote:Since the problem was not there right away it is very likely that the number of used/fast cloned blocks has something to do with if the problem occours...
This seems to align with what the good @tsightler commented almost a month ago here post239915.html#p239915
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: Slow backup file merge with REFS

Post by mkretzer »

Yes! The problem is that at the same time we started doing periodic active fulls again, the first backups also went out of retention. That is why we have now disabled active fulls, to see if things improve.

Furthermore, we did not reboot after the weekend, but the system seems to have returned to normal speed. One thing I found after checking our monitoring system was that memory usage went up by 120 GB in a matter of half an hour while the active full was running. Shortly after, WMI cut out for 15 minutes while the memory value recovered.
Is there perhaps a fixed limit on how much memory the ReFS driver (?) can use, so that once it is reached the system becomes somewhat unstable?
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm

Re: Slow backup file merge with REFS

Post by JimmyO »

I'm having the exact same issue: ReFS seems to cause a lot of fragmentation. My first daily merge took 1 hour; now, a month or so later, it takes up to 30 hours, which means I will no longer be able to run daily backups...
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow backup file merge with REFS

Post by Gostev »

JimmyO wrote:My first daily merge took 1 hour
With fast clone???
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: Slow backup file merge with REFS

Post by JimmyO »

Yes! (I have approx. 1 TB of increments per job.)
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: Slow backup file merge with REFS

Post by mkretzer »

Is fragmentation really relevant when fast cloning?
And why is an active full slow when there are still 70 TB available that have never been written to (we can see that on the storage)?

Edit:
No matter how high the fragmentation is, we have 96 disks and the system is still "fast-cloning" at only 150 IOPS... That is the random I/O two disks in a RAID 0 could deliver.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow backup file merge with REFS

Post by Gostev »

Fast cloning does not move the actual data around, it's about updating metadata so IOPS requirements are very low.
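The point about metadata can be illustrated with a toy model. This is only a sketch of reference-counted block sharing in general, not how ReFS block cloning or Veeam's merge is actually implemented; `ToyVolume` and `merge` are hypothetical names. A file is a list of block references, and folding the oldest increment into the full only moves references and adjusts counts, so no block contents are rewritten.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    """Toy block store with per-block reference counts (not ReFS internals)."""
    blocks: dict[int, bytes] = field(default_factory=dict)  # block id -> data
    refcount: dict[int, int] = field(default_factory=dict)  # block id -> references
    next_id: int = 0
    data_writes: int = 0       # blocks whose contents were physically written
    metadata_updates: int = 0  # reference-count changes only

    def write(self, data: bytes) -> int:
        """Store new data: the only operation that writes block contents."""
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        self.data_writes += 1
        return block_id

    def clone(self, block_id: int) -> int:
        """Share an existing block with another file: metadata only."""
        self.refcount[block_id] += 1
        self.metadata_updates += 1
        return block_id

    def release(self, block_id: int) -> None:
        """Drop one reference; free the block once nothing points at it."""
        self.refcount[block_id] -= 1
        self.metadata_updates += 1
        if self.refcount[block_id] == 0:
            del self.blocks[block_id]
            del self.refcount[block_id]


def merge(vol: ToyVolume, full: list[int], increment: dict[int, int]) -> None:
    """Fold an increment into the full backup without copying any data."""
    for position, block_id in increment.items():
        vol.release(full[position])           # old version leaves the full
        full[position] = vol.clone(block_id)  # full now shares the newer block
    for block_id in increment.values():       # deleting the increment file
        vol.release(block_id)


vol = ToyVolume()
full = [vol.write(b"base-%d" % i) for i in range(1000)]
increment = {i: vol.write(b"changed-%d" % i) for i in range(0, 1000, 10)}

writes_before = vol.data_writes
merge(vol, full, increment)
print("blocks written during merge:", vol.data_writes - writes_before)
print("metadata updates:", vol.metadata_updates)
```

In this model the merge cost scales with the number of changed blocks, not with the size of the backup file. On a real file system those metadata updates still have to be persisted, so they are cheap but not free.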
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Slow backup file merge with REFS

Post by mkretzer »

OK, yesterday we had to increase the size of a disk on one of our VMs, so this morning the entire disk had to be read. That backup normally takes 30 minutes, but now it and the other backups running at the same time are writing at less than 1 MB/s, and the snapshots keep growing. The repository drive's disk queue is basically 0 the whole time...

@Gostev: If Veeam wants to look at the situation, now is the chance. But I guess only Microsoft can really solve this, and I know first-level support cannot help us. So we will move back to NTFS very soon...
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow backup file merge with REFS

Post by Gostev »

Did you try rebooting the repository server? I've only heard about such an issue (slow writes) once in the half year since the 9.5 release, and that one time it was fixed by a server reboot, so this is not necessarily related to ReFS usage.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Slow backup file merge with REFS

Post by mkretzer »

@Gostev Yes, last week when I opened a Veeam case... Your support told me to copy a file with Windows Explorer, and the copy speed was also close to zero.
It is definitely ReFS: when we had 128 GB of RAM, the server crashed; now that we have 384 GB, free RAM drops to 34% or 35% (reproducibly!) and then I/O becomes VERY slow. I guess without the extra RAM we would see the crash as before.

Now copy operations with Windows Explorer are extremely slow again...

Edit: One more thing: we have now set all the registry settings for the ReFS problem to see whether they change the behaviour after the next reboot. If not, we will be forced to go back to NTFS.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow backup file merge with REFS

Post by Gostev »

Ah, now I remember: you're the customer I was referring to anyway ;) So we haven't seen this issue reproduced elsewhere yet.
suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: Slow backup file merge with REFS

Post by suprnova »

I seem to have a very similar issue, although mine started after only a few days, and it seems to happen only once the larger backups started hitting their retention. I'm still trying to find a correlation but while the fast clone merging is happening, the repository drives are up and down from a monitoring perspective. They show up in Windows explorer, but you can't browse them and no storage data is reported. While there are no merges in progress, jobs are fine.

These are two extreme examples from the larger backups.
Example 1:
5/25: 4h46m
5/26: 10h4m
5/27: 10h31m
5/28: 5h52m
5/29: 34h46m

Example 2:
5/25: 3h11m
5/26: 26h58m
5/28: 29h10m

Before the above two started:
5/20: 42s
5/21: 1m
5/23: 21m
5/24: 5m48s

After:
5/25: 13m37s
5/26: 1h13m
5/27: 4h15m
5/28: 3h32m
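To put the growth in these merge times into numbers, a small sketch can convert the durations to minutes and compare the first and last merge of each example. The regular expression is an assumption about the formats used in this post ("42s", "5m48s", "34h46m", and the "4h:46m" variant); `to_minutes` is a hypothetical helper.

```python
import re

# Accepts "42s", "1m", "5m48s", "3h11m" and the "4h:46m" variant.
DURATION = re.compile(r"^(?:(\d+)h:?)?(?:(\d+)m)?(?:(\d+)s)?$")


def to_minutes(text: str) -> float:
    """Convert a duration such as '34h46m' or '5m48s' to minutes."""
    match = DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


merges = {
    "Example 1": ["4h46m", "10h4m", "10h31m", "5h52m", "34h46m"],
    "Example 2": ["3h11m", "26h58m", "29h10m"],
}

for job, durations in merges.items():
    first, last = to_minutes(durations[0]), to_minutes(durations[-1])
    print(f"{job}: {first:.0f} -> {last:.0f} min ({last / first:.1f}x)")
```

Both examples grow by roughly an order of magnitude within a few days.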
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Slow backup file merge with REFS

Post by Gostev »

suprnova wrote:I'm still trying to find a correlation but while the fast clone merging is happening, the repository drives are up and down from a monitoring perspective. They show up in Windows explorer, but you can't browse them and no storage data is reported.
That sounds like an issue discussed in this mega thread. This impacts a portion of our customers and is being investigated by Microsoft.
suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm

Re: Slow backup file merge with REFS

Post by suprnova »

I am trying to figure out whether it is related. I did install the test Microsoft fix from support, but it made no difference. There was only one merge running when the gaps started. Either way, the merge then takes forever, presumably because the repository is unstable.
sconley
Enthusiast
Posts: 48
Liked: 3 times
Joined: Mar 18, 2011 7:36 pm
Full Name: Sean Conley

Re: Slow backup file merge with REFS

Post by sconley »

So is the thinking that the slow merge issue may be related to the issue in the REFS 4k thread, even though I formatted the repository volume with 64 KB clusters? I am curious, because if that is the case I will start following that thread more closely. I also see that at least one person has been given a test fix that, in VERY early testing, has shown positive results against that issue.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Slow backup file merge with REFS

Post by Gostev »

Right. Since the core issue puts an extremely heavy load on ReFS (to the point where the volume sometimes becomes unreachable), it is perfectly expected that all other I/O activity involving this volume slows to a crawl as a result.
sconley
Enthusiast
Posts: 48
Liked: 3 times
Joined: Mar 18, 2011 7:36 pm
Full Name: Sean Conley

Re: Slow backup file merge with REFS

Post by sconley »

That makes sense. In that case I will follow the progress of the other thread to see how it plays out over the next few days/weeks. In my case I am lucky that it is not crippling to our daily operations, possibly since we have a relatively small deployment. So at least I am not at the point of contemplating rebuilding the repository yet again. Thanks for the input.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Slow backup file merge with REFS

Post by mkretzer »

So merges are taking longer and longer. This time we disabled all scheduled synthetic fulls, and no copy or tape jobs were running. Here are the merge times of the first backup job that merges on weekends:

Without fast clone (same storage): 5:04
4 weeks before: 0:33
3 weeks before: 0:12
2 weeks before: 2:44
1 week before: 3:12
This week: 3:59
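A quick sketch can put the weekly merge times above next to the non-fast-clone baseline. It assumes the values are in H:MM format (hours and minutes), which the post does not state explicitly; `hm_to_minutes` is a hypothetical helper.

```python
def hm_to_minutes(text: str) -> int:
    """Convert an 'H:MM' duration (assumed format) to minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


ntfs_baseline = hm_to_minutes("5:04")  # merge without fast clone, same storage
weekly = [
    ("4 weeks before", "0:33"),
    ("3 weeks before", "0:12"),
    ("2 weeks before", "2:44"),
    ("1 week before", "3:12"),
    ("This week", "3:59"),
]

for label, value in weekly:
    minutes = hm_to_minutes(value)
    share = minutes / ntfs_baseline
    print(f"{label:>14}: {minutes:3d} min = {share:.0%} of the non-fast-clone merge time")
```

Under that assumption, this week's merge already takes close to 80% of the time the same merge took without fast clone, up from about 11% four weeks earlier.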

So if this continues, ReFS will no longer be faster than NTFS. Furthermore, NTFS does not have the same issues with concurrent active fulls on fast storage.
So what changed? The only thing that happened 2 weeks ago was that, for the first time, retention points were deleted the day before the merge.

@Gostev: Do you have an idea what we can do next? Perhaps increase retention so there are no points deleted and see if backup merges get faster again next week?