-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
One more thing: our write speed is extremely slow again just because two synthetics (fast clone) are running. Disk latency is still between 2 and 4 ms, so the disk is not loaded. Before the synthetics started, speed was normal.
-
- Enthusiast
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: Slow backup file merge with REFS
That's really strange. We've had basically all of the other issues, but our write speeds are normal while fast clone synthetics are running. We are running the experimental refs.sys driver, though, so maybe that is a potential fix. However, Veeam support is unable to say what exactly the new driver fixes.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
I mean it must be something in the REFS driver. With NTFS there is REAL LOAD on the volume while merging but we never ever had any issues getting the data to disk while all the merges ran. We basically never had to care if the active fulls were going at the same time as synthetics - it just worked (same backend storage).
Where do I get that REFS driver, and is it usable for production?
-
- Enthusiast
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: Slow backup file merge with REFS
Our issue has always been merges with NTFS, just way too slow, but we are also a service provider where there may be 100+ merges overnight. Veeam support can provide the driver. It's tough to say if it is production worthy or not, but I can say it has not made things worse.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
OK, setting all the REFS registry settings Microsoft recommends now leads to even slower merges and also slower write speed for the backups running in parallel. Memory usage is lower, but now our backup window is too long... We will remove the settings again and increase retention as a test.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Without deleting retention points our merges doubled in speed! We are not at the original "fast clone" speed from the beginning, but it's much better. Even an active full does not immediately stall when a merge is running in parallel.
Right now we have enough space to host even another week so we will see if the merges will even get faster next weekend.
I wonder: is there anything I can optimize with better hardware for the repo server? I simply cannot see which resource is limiting REFS from merging faster or deleting restore points without issue. RAM is no issue anymore since we increased to 384 GB, CPU usage is low, and the disk queue of the repo is 0. Buying a new server would still be cheaper than buying more storage and going back to NTFS.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Ok even without deletion of restore points everything got even slower with 6 weeks of backups on disk. We will go back to NTFS as we see no way to get this working right now...
-
- Enthusiast
- Posts: 54
- Liked: 1 time
- Joined: Oct 12, 2012 12:28 am
- Full Name: Mike Godwin
- Contact:
Re: Slow backup file merge with REFS
Has anyone tried disabling FileIntegrity in ReFS to see if it improves performance?
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
Several people have stated that they were switching to NTFS. For anyone who has: have you noticed improvements, and has the issue re-appeared?
We are facing a similar issue. Veeam B&R 9.5.0.1038, repository is 64K REFS on a physical Dell R730XD in a RAID 6. Backups are running between 150-350MB/s but a merge of a 330GB incremental to a 1.45TB full took 18 hours and brought the system to an almost standstill. Previously this was tremendously faster. I've had to disable health checks and defrags because of the time it was taking. Strangely backup copies are running normally to a secondary repository on identical hardware. OS is Windows 2016 build 1607 (14393.1884) with all latest patches. I'm working with support (02386355) and just read through this and the mega 4k thread hoping for answers.
-
- Service Provider
- Posts: 454
- Liked: 86 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: Slow backup file merge with REFS
Interesting. I also have a customer running 730xd hardware and RAID6 with Windows Server 2016 ReFS. No issues so far. They are currently running a few TBs of Hyper-V VMs with weekly synthetic fulls, I believe.
What are the exact drive types used in your case? 12Gbps NLSAS or 6Gbps Sata?
Veeam Certified Engineer
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
NTFS is not fast, but it is predictable. The issue cannot "re-appear" there because it never existed with NTFS: in NTFS the blocks are simply copied. If that were a problem, NTFS would have had an issue in any application.
BTW, for REFS the type of storage does not matter for the merge issue. The issue is not the backend speed (look at the disk queue length, it should be very low)!
Do you use synthetic fulls for your primary backup? If so you have more block-cloned blocks and that should mean the issue is more likely to appear.
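To illustrate the difference described above between copying blocks (NTFS) and cloning them (ReFS), here is a toy model. It is only a sketch: the `Volume` class and both merge functions are hypothetical and say nothing about how either file system is actually implemented. It only shows that a copy-based merge writes payload data, while a clone-based merge changes reference counts and therefore grows the amount of shared-block bookkeeping over time.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: blocks are stored once and referenced by files (hypothetical model)."""
    blocks: dict = field(default_factory=dict)     # block id -> data
    refcount: dict = field(default_factory=dict)   # block id -> number of references
    bytes_written: int = 0                         # payload bytes physically written
    next_id: int = 0

    def write_block(self, data: bytes) -> int:
        """Store a new block and account for the payload I/O."""
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        self.bytes_written += len(data)
        return block_id

    def clone_block(self, block_id: int) -> int:
        """Share an existing block: metadata changes only, no payload I/O."""
        self.refcount[block_id] += 1
        return block_id


def merge_by_copy(vol: Volume, source_file: list) -> list:
    """NTFS-style merge in this model: every block is read and written again."""
    return [vol.write_block(vol.blocks[b]) for b in source_file]


def merge_by_clone(vol: Volume, source_file: list) -> list:
    """Block-clone merge in this model: the new file just references the old blocks."""
    return [vol.clone_block(b) for b in source_file]


vol = Volume()
incremental = [vol.write_block(bytes(64 * 1024)) for _ in range(4)]  # four 64K blocks

before = vol.bytes_written
merge_by_copy(vol, incremental)
copy_cost = vol.bytes_written - before

before = vol.bytes_written
merge_by_clone(vol, incremental)
clone_cost = vol.bytes_written - before

print(f"copy merge wrote {copy_cost} bytes, clone merge wrote {clone_cost} bytes")
print(f"references to first block after cloning: {vol.refcount[incremental[0]]}")
```

In this model the clone merge costs no payload writes at all, which matches the idle disks reported in this thread; whatever slows the real merges down would have to sit in the metadata handling, which the model does not attempt to simulate.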
-
- Service Provider
- Posts: 454
- Liked: 86 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: Slow backup file merge with REFS
I'm not so sure about stating that the "type of storage does not matter" regarding the backend storage. Performance of the backend isn't an issue during ReFS merge, that's quite clear. But other features like cache or the handling of certain types of I/O could differ between technologies. I see different ReFS user experiences in this forum where in almost every case backend performance shouldn't be an issue, but something else is causing problems. For example, the use of Storage Spaces seems to have some sort of (negative?) effect on this.
Veeam Certified Engineer
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
JaySt wrote: "Interesting. I also have a customer running 730xd hardware and raid6 with Windows server 2016 refs . No issues so far. Running a few TBs of hyperv vms with weekly synth fulls currently i believe
What are the exact drive types used in your case? 12Gbps NLSAS or 6Gbps Sata?"
They're 12Gbps drives. Our backups are forever-forward incremental with 30 restore points. No active or synthetic fulls.
-
- Service Provider
- Posts: 454
- Liked: 86 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: Slow backup file merge with REFS
So same drive types it seems.
Strange. No issues here so far. Have you tried running a synth full? same merge times?
Veeam Certified Engineer
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Again: Why should the drive type matter for an issue which happens after a certain amount of block cloned data is on the disk?
-
- Veeam Software
- Posts: 17
- Liked: 5 times
- Joined: Oct 04, 2017 8:14 am
- Full Name: Boris Urban
- Contact:
Re: Slow backup file merge with REFS
Quote: "They're 12Gps drives. Our backups are forever-forward incremental with 30 restore points. No active or synthetic fulls."
What kind of RAID controller (PERCxxx) and stripe size are you using?
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
UrbanB wrote: "what kind of RAID Controller (PERCxxx) /Stripe Size/ are you using ?"
As mentioned in first post:
Veeam B&R 9.5.0.1038
Repository is 64K REFS on a physical Dell R730XD in a RAID 6.
Other details:
12Gbps on a perc H730 mini default stripe size, all latest firmware
OS is Windows 2016 build 1607 (14393.1884) with all latest patches
Using diskspd.exe based on this article https://www.veeam.com/kb2014
Code:
READS
CPU | Usage | User | Kernel | Idle
-------------------------------------------
avg.| 1.94%| 1.11%| 0.83%| 98.06%
Total IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 146989187072 | 2242877 | 233.63 | 3738.08 | #X (111GB)
------------------------------------------------------------------------------
total: 146989187072 | 2242877 | 233.63 | 3738.08
WRITES
CPU | Usage | User | Kernel | Idle
-------------------------------------------
avg.| 1.07%| 0.40%| 0.66%| 98.93%
Total IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 350175100928 | 667906 | 556.58 | 1113.16 | D:\testfile.dat (1024MB)
------------------------------------------------------------------------------
total: 350175100928 | 667906 | 556.58 | 1113.16
READ/WRITE
CPU | Usage | User | Kernel | Idle
-------------------------------------------
avg.| 0.99%| 0.67%| 0.32%| 99.01%
Total IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 60864069632 | 116089 | 96.74 | 193.48 | D:\testfile.dat (1024MB)
------------------------------------------------------------------------------
total: 60864069632 | 116089 | 96.74 | 193.48
TL;DR
I don't believe this is a physical server issue, based on these tests and on the fact that a merge of a 330GB incremental into a 1.45TB full took 18 hours and brought the system to an almost standstill.
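For a sense of scale, the numbers in this post can be turned into an average merge throughput and compared with the diskspd results. This is only back-of-the-envelope arithmetic: it assumes decimal units (1 GB = 1000 MB), treats the diskspd "MB/s" figures as directly comparable, and pretends the merge moved the whole incremental at a constant rate. The function names are made up for the example.

```python
def avg_mb_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s for processing size_gb in the given number of hours."""
    return size_gb * 1000 / (hours * 3600)


def hours_at_rate(size_gb: float, mb_per_s: float) -> float:
    """Hours needed to process size_gb at a constant rate of mb_per_s."""
    return size_gb * 1000 / mb_per_s / 3600


# Figures quoted in the post
incremental_gb = 330
merge_hours = 18
diskspd_mixed_mb_s = 96.74      # READ/WRITE test, the slowest of the three results
backup_low_mb_s = 150           # lower end of the reported backup speed

merge_rate = avg_mb_per_s(incremental_gb, merge_hours)
slowdown_vs_diskspd = diskspd_mixed_mb_s / merge_rate
expected_hours = hours_at_rate(incremental_gb, diskspd_mixed_mb_s)

print(f"effective merge rate: {merge_rate:.2f} MB/s")
print(f"diskspd mixed test is about {slowdown_vs_diskspd:.0f}x faster")
print(f"330 GB at the diskspd mixed rate: about {expected_hours:.1f} hours")
```

Even against the slowest synthetic test, the merge ran more than an order of magnitude below what the array can deliver, which supports the conclusion that raw disk performance is not the limiting factor here.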
-
- Service Provider
- Posts: 454
- Liked: 86 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: Slow backup file merge with REFS
again: have you tried running a synthetic full ? To compare the synth full process to the process of merging the oldest two backup files? any difference?
Veeam Certified Engineer
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
I answered that question on page 3: no, I haven't done synthetic or active fulls. Support suggested doing that, but it doesn't really give an indication of the problem (other than saying your full backup may be 'bad'), and there is no guarantee the issue won't come back. Furthermore, carving off 30ish TB to start a new backup chain isn't easily doable for us (or for many organizations, for that matter). I'd be much more willing to entertain that as a solution if it were less of a 'roll the dice and see what happens' type of fix.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
Why do you need 30ish TB for a synthetic? With REFS these are spaceless. Still, that is also the reason I think this will be of no use...
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
Apologies, support's suggestion is to do an active full not synthetic, I didn't clarify that.
-
- Novice
- Posts: 6
- Liked: 3 times
- Joined: Dec 13, 2017 6:13 pm
- Contact:
Re: Slow backup file merge with REFS
We are experiencing the exact same issue. Right now, I have 4 jobs compacting using fast-clone. One is 7% done at 8 hours of compacting. Another is at 86% done at 12 hours. The 3rd is at 62% at 19 hours, and the last is at 44% at 11 hours. The server was last rebooted about 25 hours ago.
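To show what those progress figures imply, here is a quick projection of the total compact time per job. It assumes progress is linear over time, which may well not hold for these jobs, so treat the results as rough estimates only.

```python
def projected_total_hours(percent_done: float, hours_elapsed: float) -> float:
    """Linear extrapolation: if percent_done took hours_elapsed, how long does 100% take?"""
    return hours_elapsed * 100 / percent_done


# (percent done, hours elapsed) for the four compact jobs quoted above
jobs = [(7, 8), (86, 12), (62, 19), (44, 11)]

projections = [projected_total_hours(pct, hrs) for pct, hrs in jobs]

for (pct, hrs), total in zip(jobs, projections):
    remaining = total - hrs
    print(f"{pct:>3}% after {hrs:>2} h -> about {total:.0f} h total, {remaining:.0f} h remaining")
```

Under that assumption the slowest job would need well over four days in total, far outside any nightly backup window.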
We did try the suggested MS registry changes, but that reduced performance drastically. At least our jobs eventually complete now. With the registry changes, we'd be looking at weeks for some of these jobs to complete.
These were the registry settings edited in our tests:
Code:
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512
We also tried these while additionally changing the following:
Code:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableInlineTrim = 1
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue = 0x78
It's my understanding that anything with fast-clone is only supposed to be metadata updates and should be extremely fast. We have not noticed this to be the case, as file merges are typically the longest task for many of our backup jobs. When we had these registry settings enabled, RAM usage never went over 10%. With default registry settings, we seem to max out at 35-40% or so RAM. I'm curious as to why REFS doesn't cache more in RAM.
We opened a support case (02401017) with Veeam and all suggestions were deemed unacceptable for our environment. We were told to try to reduce the number of I/O operations that can go to the repo. Looking at the I/O of the repo, seeing the disks basically be idle, we rejected this "solution". Disk I/O is almost always under 10MB/s and often under 100k a second. The disks are just sitting idle.
Our repos are patched to latest - 2017-11 Cumulative Update for Windows Server 2016 for x64-based Systems (KB4051033)
Repo hardware is a Cisco S3260 with 48 disks (plus spares) in a Raid60 array (12 x 4), giving us 218TB of capacity with 128GB of RAM. The S3260 giving us the most problems is using about 100TB of that capacity. RAM usage right now is 33% used and CPU is at 1%. Disk I/O is currently a whopping 1 KB/sec.
Veeam has really pushed REFS hard, having webinars talking about it and touting its benefits, but it seems that almost everyone who is using it in a production environment of any scale is having major issues. I do think it is a Microsoft issue with REFS, but that doesn't shift blame away from Veeam. Veeam is pushing this as a viable solution and it clearly is not in its current state.
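For anyone who wants to reproduce the test with the registry values listed in this post (or remove them again afterwards), the sketch below only prints the corresponding `reg add` and `reg delete` command lines; it does not touch the registry. The value type `REG_DWORD` is an assumption, since the post does not state it, and the helper names are made up for the example. Note that the posters in this thread saw worse performance with these values, so this is for testing, not a recommendation.

```python
FILESYSTEM_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"
DISK_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Disk"

# (key, value name, data) exactly as listed in the post
SETTINGS = [
    (FILESYSTEM_KEY, "RefsDisableCachedPins", 1),
    (FILESYSTEM_KEY, "RefsEnableLargeWorkingSetTrim", 1),
    (FILESYSTEM_KEY, "RefsNumberOfChunksToTrim", 32),
    (FILESYSTEM_KEY, "RefsProcessedDeleteQueueEntryCountThreshold", 512),
    (FILESYSTEM_KEY, "RefsEnableInlineTrim", 1),
    (DISK_KEY, "TimeOutValue", 0x78),  # 0x78 = 120
]


def reg_add_command(key: str, name: str, data: int, value_type: str = "REG_DWORD") -> str:
    """Build a 'reg add' command line; REG_DWORD is an assumed type."""
    return f'reg add "{key}" /v {name} /t {value_type} /d {data} /f'


def reg_delete_command(key: str, name: str) -> str:
    """Build a 'reg delete' command line that removes a single value."""
    return f'reg delete "{key}" /v {name} /f'


commands = [reg_add_command(key, name, data) for key, name, data in SETTINGS]

# Only the Refs* values are removed here; TimeOutValue may have existed before the test.
revert_commands = [
    reg_delete_command(key, name) for key, name, _ in SETTINGS if name.startswith("Refs")
]

for line in commands + revert_commands:
    print(line)
```

Printing the commands instead of running them keeps the example safe to execute anywhere; review the output and run it from an elevated prompt on the repository server only if you really want to apply the change.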
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Slow backup file merge with REFS
I 100% agree with Jackal830 and can only recommend one thing now: go back to NTFS! It cost us a lot of money and time, but now with NTFS everything works so well that the time with REFS seems like a nightmare. Every morning there was another fire to put out, if the monitoring system had not already woken us up at night because the server was unresponsive again. Next time we will not be the first ones to try!
-
- Novice
- Posts: 6
- Liked: never
- Joined: Dec 14, 2017 1:47 pm
- Full Name: Alexander Eriksson
- Contact:
Re: Slow backup file merge with REFS
Hi,
I was just wondering if the RAID level can be a factor in this issue? I have read this thread and noticed that RAID6 or RAID60 is used most of the time.
Has anyone here experienced these REFS issues with RAID10? It's a long shot, but can the parity calculations be a factor?
We are looking at a new backup-infrastructure with Veeam, backing up around 1500 VMs (VM backups only) 30 days restore points. The solution that was suggested to us is
1 Dell server connected to a Dell JBOD with 60x6TB NLSAS disks in a RAID10 using REFS and Win2016.
When reading this post, I really don't feel like going with REFS.
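Since the RAID level also decides how much of those 60 disks ends up as usable space, here is a small sketch of the capacity side of that trade-off. The RAID60 layout (4 spans of 15 disks) is a hypothetical alternative for comparison, not something proposed in this thread, and the figures ignore hot spares, formatting overhead, and the TB/TiB difference.

```python
def raid10_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID10 mirrors every disk, so half of the raw capacity is usable."""
    return disks // 2 * disk_tb


def raid60_usable_tb(disks: int, disk_tb: float, spans: int) -> float:
    """RAID60 stripes over RAID6 spans; each span gives up two disks to parity."""
    disks_per_span = disks // spans
    return spans * (disks_per_span - 2) * disk_tb


# Proposed design from the post: 60 x 6TB NL-SAS in RAID10
raid10_tb = raid10_usable_tb(60, 6)

# Hypothetical alternative for comparison: the same disks as RAID60 with 4 spans
raid60_tb = raid60_usable_tb(60, 6, spans=4)

print(f"RAID10: {raid10_tb} TB usable")
print(f"RAID60 (4 x 15): {raid60_tb} TB usable")
```

This says nothing about whether parity calculations play a role in the merge slowdown; it only shows what the RAID10 choice costs in capacity.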
-
- Novice
- Posts: 6
- Liked: 3 times
- Joined: Dec 13, 2017 6:13 pm
- Contact:
Re: Slow backup file merge with REFS
It would be extremely difficult for us to go back to NTFS. I believe we may have grown too large for Veeam to be a viable product for us without the promised I/O savings of REFS.
We are a cloud service provider with over 100 customer jobs for backups. We have grown too large for our previous backup methodology to be cost effective. We used multiple older EMC VNX arrays in raid 10 configuration to store backup data. Raid 5 and 6 was too slow. As we added more customers, we had to keep adding VNX arrays. This was not cost feasible in the least. Renewing support on these older arrays was still cheaper than building a new solution, even with consumer-level parts, but adding new VNX hardware is very expensive. Long ago, we also rolled our own backup storage, saving a lot of money. That had a different set of issues that I'm not prepared to go into detail about here. Let's just say that we now want storage hardware that is highly reliable and had a trusted vendor backing it going forward.
When Veeam started pushing REFS, we figured we could condense 3 full cabinets of VNX arrays into 8U. We bought hardware and at first it seemed to work wonderfully. As we migrated more jobs over though, performance got worse and worse. We are now sitting here wondering if we can limp by and hope a fix is found, or to abandon our project and go back to NTFS. The main problem with NTFS is you can't have giant drive sizes (multiple hundreds of TB). Veeam addresses this by having scale-out repositories, but they have their own set of limitations. For example, you have to have per-vm backup data, not per-job. You lose a lot of dedup space when doing that. Some of our customers have full-backup sizes of 20+ TB, and that is growing. NTFS just doesn't scale well for us.
There are many suggestions for improvements we have for Veeam, but they seem to have no interest. For example, why do merges and compacts have to happen at the end of a job? Why can't all jobs run and then merges and compacts happen separately outside the backup window? The hardware is just sitting idle during this time. Right now, we have 20 jobs still running from overnight. 5 of them are in the merging or compacting stage. 11 of them are simply waiting for repository resources. We are now outside our backup window for these 11 jobs and they are just sitting around, waiting for merging and compacting of other jobs. This seems silly. The main priority here should be getting data off production datastores in the window, and then performing file manipulation later. Another request is the ability to manually fire off a compact. Sometimes we are so backed up on jobs in the morning that we cancel all running merges and compacts so the other jobs can run. This leaves a lot of temporary files out on the repo that don't get cleaned up until the next job run (and likely canceled, again). At noon, the backup repo is idle, doing nothing. Why can't I just manually fire off a compact to take care of that issue? I can't, the feature does not exist. I'd have to do another backup, mid-day.
We are also a Veeam cloud connect partner, and that product is riddled with problems that seem ridiculous to us. For example, if we need to perform any sort of repo migration, we have to call every single one of our customers and have them perform tasks on their clients so their backups begin to work on other repos. We retired one repo and no matter what one customer did, he couldn't get his backup to work on the new. Veeam requested that we have the customer export his entire DB, send it to us, and then send it to Veeam. Could you imagine having hundreds of customers on that product? I know this is off topic for this discussion, but it's just another example how Veeam does not scale well.
Veeam, recently, has been playing the Microsoft blame-game with us. When we upgraded to Veeam 9.5, we ran into a DB error and Veeam's solution was to have us upgrade to a newer version of SQL. Granted, we were on an older SQL version (2008 R2), but right on this page: posting.php?mode=reply&f=2&t=42759 it states 2008 is supported. When we ran into the issue, the Veeam support agent told us it's supported unless you run into this specific error. What? So it's supported unless you have a problem? Then the Veeam support rep wanted us to upgrade our SQL DB, which had many other products using it, in the middle of the day, and had difficulties understanding that we couldn't just take something down mid-day. This took us months to resolve and pushed back our entire VMware ESXi 6.5 upgrade along with it. All because they wouldn't help us fix a Veeam DB issue on a supported DB version.
We also had an issue where there was some sort of TLS issue that prevented us from powering on restored VMs. This only affected Server 2008, but again, this is a supported version. One would think that Veeam would create a patch to address this issue. Nope, this is a Microsoft issue and you need to make registry changes. OK, I get it that it's something with Microsoft, but this is a product that you support on Microsoft OSes. If there is an issue that affects all your customers running on a specific version of an OS you support, you need to patch that. Even if the patch simply changes the registry, there should be a patch.
Then there is this REFS issue, which we agree is a Microsoft issue. That doesn't make it any easier to swallow. As far as I have read, Veeam hasn't even been able to reproduce this issue internally. Why not? They are pushing this as a new killer feature, but they aren't giving it the time and resources it deserves to be fixed. Veeam should be working directly with Microsoft, with a large joint lab. They are quick to point the finger at MS and they seem to think the problem stops there for them. It doesn't work like this in the business world. If someone has a problem with your product, it's your problem.
Basically, we are at a point where we have to decide if we can wait and hope for this issue to be fixed, or spend a lot of money and go back to NTFS, or spend more money and go to another product that is able to scale much better for our environment.
-
- Novice
- Posts: 6
- Liked: 3 times
- Joined: Dec 13, 2017 6:13 pm
- Contact:
Re: Slow backup file merge with REFS
aleeri wrote: "Hi,
I was just wondering if the RAID level can be a factor in this issue? I have read this thread and noticed that RAID6 or RAID60 is used most of the time.
Anyone here experience these REFS issues with RAID10? It´s a long-shot but can tha parity calculations be a factor?
We are looking at a new backup-infrastructure with Veeam, backing up around 1500 VMs (VM backups only) 30 days restore points. The solution that was suggested to us is
1 Dell server connected to a Dell JBOD with 60x6TB NLSAS disks in a RAID10 using REFS and Win2016.
When reading this post, I really don´t feel like going with REFS."
I would not suggest it in its current state. When issues are happening, disks are basically idle. I haven't tested RAID10 with REFS, but if the Windows disk I/O reported by Resource Monitor is to be believed, it's not an IOPS issue. Right now, my backups are basically stalled with a disk rate of 100K a second and a disk queue length of 0.
What's really annoying is that when the issue is super bad, you can't even browse the Veeam repo directory, even though there is no I/O. There are serious issues with REFS and it is certainly not ready for use.
-
- Novice
- Posts: 6
- Liked: 3 times
- Joined: Dec 13, 2017 6:13 pm
- Contact:
Re: Slow backup file merge with REFS
I was talking with a colleague this morning and he told me that there were some changes for REFS in Update 3 of Veeam 9.5. I was excited to see what the "fixes" were:
"Note: that Veeam has decided to temporarily disable fast cloning functionality (ReFS) when using repository on Windows Server 2016 1607 in Update 3."
This is the latest stable version of Windows 2016 (version 1607) and we are running it. So, Veeam is going to disable the feature that is supposed to be the number one reason to use REFS with Veeam? Wow.
-
- Novice
- Posts: 6
- Liked: 3 times
- Joined: Dec 13, 2017 6:13 pm
- Contact:
Re: Slow backup file merge with REFS
I am very concerned with Fast-Clone being disabled in Update 3.
Does this mean that once it's re-enabled, every single job is going to need either a full backup or compact run on it for I/O savings again?
If you copy over backup data from an NTFS drive to a REFS drive, Veeam has to do either a full backup or a compact before it starts using the I/O saving features of REFS. The file system by itself isn't enough; the application has to be REFS aware. If Fast-Clone is disabled, is that essentially getting rid of all the I/O savings and metadata updates for the data on disk?
Those folks that use REFS and take a lot of synthetic fulls are about to see their disk usage explode as well. Fast-Clone is what makes those synthetic fulls take very little space.
If all of this is true, even the few people that are having REFS work correctly for them are in for a wild ride.
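To put rough numbers on the disk usage point, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple model that does not come from Veeam: with Fast-Clone, each synthetic full after the first only references blocks already on disk and costs roughly nothing extra, while without it every synthetic full is written out as a complete copy. The function name, the 10 TB full size and the four fulls are made-up illustration values, and compression, metadata overhead and incrementals are ignored.

```python
def synthetic_fulls_footprint_tb(full_size_tb: float, num_fulls: int, fast_clone: bool) -> float:
    """Rough space taken by `num_fulls` synthetic fulls (hypothetical model)."""
    if num_fulls <= 0:
        return 0.0
    if fast_clone:
        # Block cloning: later fulls share blocks already on disk,
        # so only the first full occupies real space in this model.
        return full_size_tb
    # No block cloning: every synthetic full is a complete physical copy.
    return full_size_tb * num_fulls


if __name__ == "__main__":
    # Hypothetical example: four synthetic fulls of a 10 TB backup.
    with_clone = synthetic_fulls_footprint_tb(10.0, 4, fast_clone=True)
    without_clone = synthetic_fulls_footprint_tb(10.0, 4, fast_clone=False)
    print(f"with Fast-Clone:    {with_clone:.1f} TB")
    print(f"without Fast-Clone: {without_clone:.1f} TB")
```

Under this model the footprint grows linearly with the number of synthetic fulls once Fast-Clone is off, which is the "explode" scenario described above.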
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
Jackal830 wrote:
Then there is this REFS issue, which we agree is a Microsoft issue. That doesn't make it any easier to swallow. As far as I have read, Veeam hasn't even been able to reproduce this issue internally. Why not? They are pushing this as a new killer feature, but they aren't giving it the time and resources it deserves to be fixed. Veeam should be working directly with Microsoft, with a large joint lab. They are quick to point the finger at MS and they seem to think the problem stops there for them. It doesn't work like this in the business world. If someone has a problem with your product, it's your problem.
Basically, we are at a point where we have to decide if we can wait and hope for this issue to be fixed, or spend a lot of money and go back to NTFS, or spend more money and go to another product that is able to scale much better for our environment.

Have you seen this: veeam-backup-replication-f2/refs-4k-hor ... ml#p264150 Can't confirm the issues will be solved but it's progress nonetheless.
-
- Influencer
- Posts: 17
- Liked: 3 times
- Joined: Oct 18, 2017 6:40 pm
- Contact:
Re: Slow backup file merge with REFS
Since I was the one who bumped this thread and there's nothing worse than support forum posts without follow-ups . . .
Recap:
Issue: Slow file merges, compacts and defrags, sometimes so slow that they bogged the system down to an almost crashed state. Backups ran very fast (300-600 MB/s), but merges were getting slower and slower.
Environment: 2 Dell 730xds with 12x12Gbps drives in RAID 6. Formatted with 64k block size. Standard stripe size. 54TB repository. Windows 2016 Version 1607 Build 14393.1914. Latest MS patches and firmware. Forward forever, no active or synthetic fulls, no GFS, 30 restore points.
Troubleshooting:
Diskspd.exe results = normal
Disk queue length = 0
Processor and memory usage = nominal.
Free space > 15%
NOTE: Server 2, with the same hardware, OS and ReFS setup, holds all of the backup copies. Merges and speeds there are all normal.
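For anyone who wants to script the free space check from the list above, here is a minimal sketch in Python using the standard library's `shutil.disk_usage`. The 15% threshold is the one from the list; the function names and the path are placeholders for illustration and have nothing to do with Veeam's own tooling.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_enough_free_space(path: str, minimum: float = 0.15) -> bool:
    """True if more than `minimum` of the volume is free (15% by default)."""
    return free_fraction(path) > minimum


if __name__ == "__main__":
    # Hypothetical path: point this at the repository volume, e.g. "E:\\".
    repo_path = "."
    print(f"free: {free_fraction(repo_path):.1%}, ok: {has_enough_free_space(repo_path)}")
```

This only covers the free space item; queue length and throughput would still need Resource Monitor, performance counters or Diskspd.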
Support wanted to do an active full. As expected, backup time was normal. Essentially this resets the backup chain, and we won't know the results for another 20 days. However, I don't think this is a 'fix'; it just implies that the full backup could be 'bad'.
Current State
Out of curiosity I moved a smaller set of backups to an NTFS volume, and merge speeds, while not as fast as they originally were with ReFS, are much faster. With the volume of posts about ReFS and Gostev's post about a call with the ReFS team (veeam-backup-replication-f2/refs-4k-hor ... ml#p264150), it seems more and more that the issue is with the file system. I personally think Veeam should add this as a disclaimer in their documentation, especially given that in Update 3 Fast Clone is being disabled on certain builds of Server 2016. Also, like a previous poster stated, this needed far more rigorous testing imho.
I'm going to wait for the MS patch mentioned earlier and see if that fixes the issues. If not, we will be going back to NTFS; even though it's slower, it's stable. I hope anyone reading this gets some use out of it.