dasfliege
Service Provider
Posts: 79
Liked: 17 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland

Re: Windows 2019, large REFS and deletes

Post by dasfliege »

I was one of the people heavily involved in solving the problems with 2019 LTSC ReFS in January/February. Things have gone well for us since then; our repos all performed perfectly. But for the past 1-2 weeks we have also been seeing heavy drops in performance during ReFS operations like file merges. The "funny" thing is that we didn't perform any updates. Veeam is still on v10.0.0.4461 and we also didn't install any Windows updates, as our server is still running in "testmode" with the private refs.sys we received from MS back in February.

I'm going to do some troubleshooting myself now and will first update Windows and Veeam to their latest versions. Just wanted to let you know that I have also observed some crazy behavior regarding ReFS performance.

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

Make sure your tape export jobs don't overlap with synthetic fulls. This was the issue for one other customer, where it also started "suddenly" just because tape outs were taking longer and longer each day, and eventually started to overlap with synthetic fulls.
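The "suddenly started to overlap" scenario can be illustrated with a small interval check. This is only a sketch: the start times and durations below are hypothetical and are not taken from any real job schedule or from the Veeam API.

```python
from datetime import datetime, timedelta


def windows_overlap(start_a, end_a, start_b, end_b):
    """Return True when two half-open time windows [start, end) intersect."""
    return start_a < end_b and start_b < end_a


# Hypothetical schedule: the tape export starts at 20:00, the synthetic
# full starts at 06:00 the next morning and runs for two hours.
tape_start = datetime(2020, 8, 31, 20, 0)
synthetic_start = datetime(2020, 9, 1, 6, 0)
synthetic_end = synthetic_start + timedelta(hours=2)


def first_overlapping_duration(durations_hours):
    """Return the first tape-export duration (hours) that hits the synthetic full."""
    for hours in durations_hours:
        tape_end = tape_start + timedelta(hours=hours)
        if windows_overlap(tape_start, tape_end, synthetic_start, synthetic_end):
            return hours
    return None


# The tape export grows by an hour each day (hypothetical numbers).
first_overlap = first_overlapping_duration([8, 9, 10, 11, 12])
print(first_overlap)
```

With these made-up numbers the first three runs end before the synthetic full starts, and the overlap only appears once the export needs 11 hours, which matches the pattern of a problem that shows up without any configuration change.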

If you don't find anything, then thanks to this previous case our support now has an excellent performance debug version of the data mover, which lays out how long all I/O operations take by type. The log it produces makes it super easy to pin-point the issue. For example, in the previous case this log immediately made it clear that the actual block cloning performance was not an issue. But I believe this module requires 10a.

dasfliege
Service Provider
Posts: 79
Liked: 17 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland

Post by dasfliege »

Thanks Anton. I already created a case with Veeam in order to get the block-clone-spd utility, and Andrey from support told me that there may be a problem with tape jobs running at the same time. But...

I have now stopped all jobs and rebooted both our repository servers before I triggered block-clone-spd, and the values are really, really bad. I guess that, as no Veeam components are running at all, I have to check that behavior with MS again? Maybe Andrew Hanson or Chris Puckett is still reading this thread? :-)

These are our values from block-clone-spd:
All block cloning took 72.773s.
Average speed: 703.561 MiB/s
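
As a rough sanity check on those two figures: if the average speed is simply total data divided by elapsed time (an assumption, the utility's exact method is not stated here), the run covered about 50 GiB of block cloning.

```python
# Figures quoted from the block-clone-spd output above.
elapsed_s = 72.773
speed_mib_s = 703.561

# Assumption: average speed = total cloned data / elapsed time.
cloned_mib = elapsed_s * speed_mib_s
cloned_gib = cloned_mib / 1024

print(f"{cloned_gib:.1f} GiB")  # 50.0 GiB
```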

I also wonder how I could completely prevent tape jobs from overlapping with other synthetic operations. We export 50TB to tape weekly, which takes quite a while. If I can't run any normal backup (with its merging process) during that timeframe, we wouldn't be able to take backups for 2-3 days per week, which isn't an option.
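
For scale, 50TB in 2-3 days works out to an average of roughly 190-290 MB/s sustained. A minimal sketch of that arithmetic, assuming decimal terabytes and a continuously running export:

```python
TB = 10**12               # decimal terabyte (an assumption about the unit)
SECONDS_PER_DAY = 86_400


def average_mb_per_s(total_bytes, days):
    """Average throughput in MB/s needed to move total_bytes in the given days."""
    return total_bytes / (days * SECONDS_PER_DAY) / 10**6


rate_2d = average_mb_per_s(50 * TB, 2)
rate_3d = average_mb_per_s(50 * TB, 3)
print(f"{rate_3d:.0f}-{rate_2d:.0f} MB/s")  # 193-289 MB/s
```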

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

Yes, if block cloning performance is bad even if there's no activity on the server, then it is better to open a case with Microsoft. There were some changes on the ReFS team, so Andrew is responsible for something else now.

One major architecture change in v11 implicitly addresses the tape out overlap issue too, so going forward this will not be an issue for synthetic full performance. And for now, just schedule your synthetic fulls outside of those 2-3 days when the tape out happens.

dasfliege
Service Provider
Posts: 79
Liked: 17 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland

Post by dasfliege »

Anton,
We don't do synthetic fulls. But as merges of normal backup files and creation of GFS restore points are also block clone activities, I wonder if they also interfere with tape backups, or if this problem really only applies to synthetic full operations.

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

How do you create your GFS restore points then, if you "don't do synthetic fulls"? I mean, the only other way to create a GFS full backup is to do an active full, but this one obviously does not include block cloning.

dasfliege
Service Provider
Posts: 79
Liked: 17 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland

Post by dasfliege »

Sorry for the confusion. We of course have synthetic full operations running, but only via copy job GFS restore point generation. We don't have synthetic fulls enabled in our primary backup jobs.
So my question is whether copy job GFS restore point generation also interferes with tape jobs (as it obviously uses the same mechanism), or whether this only applies to synthetic full operations enabled in a primary backup job.

I would also need to know whether tape jobs have an impact on backup file merge operations. As these happen after each and every backup job run, we would have a problem running normal backups during the period when the exports to tape are running.

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

Yes, tape jobs which export a synthesized full backup will interfere with all functionality that uses block cloning, in cases when a tape job requires access to the same file that is also used as a source for block cloning.

vmtech123
Enthusiast
Posts: 42
Liked: 16 times
Joined: Mar 28, 2019 2:01 pm
Full Name: SP

Post by vmtech123 » 1 person likes this post

Thanks to the Veeam forums once again for this post. I replaced my physical proxy/repo servers by unmounting the SAN drives and connecting them to the new physical devices. I removed the old proxies and repos, added them on the new devices, and imported and remapped all my jobs. I was quite happy not to lose my 500TB+ of data. Backups were running great, but the copy jobs were not catching up and were taking FOREVER.

Running the following commands, and stopping some of the Windows Defender services on specific folders, made a HUGE difference. CPU and memory are at a much more reasonable level now too. I had run these on the old servers, but it was some time ago.

fsutil behavior set DisableDeleteNotify ReFS 1
REG ADD HKLM\System\CurrentControlSet\Control\FileSystem /v RefsEnableLargeWorkingSetTrim /t REG_DWORD /d 1
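
To confirm afterwards that both settings are in place, the matching query commands can be run. The sketch below only wraps `fsutil behavior query` and `reg query` from Python; the helper names are made up for illustration, it runs the queries only on Windows, and it may need an elevated prompt.

```python
import subprocess
import sys

REG_KEY = r"HKLM\System\CurrentControlSet\Control\FileSystem"


def query_commands():
    """Return the read-only commands that show the current values of both tweaks."""
    return [
        ["fsutil", "behavior", "query", "DisableDeleteNotify"],
        ["reg", "query", REG_KEY, "/v", "RefsEnableLargeWorkingSetTrim"],
    ]


def run_queries():
    """Run each query and collect (command line, exit code, output)."""
    results = []
    for cmd in query_commands():
        done = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), done.returncode, done.stdout.strip()))
    return results


# Only attempt to run the Windows tools on Windows itself.
if sys.platform == "win32":
    for command_line, exit_code, output in run_queries():
        print(f"> {command_line} (exit code {exit_code})")
        print(output)
```

A non-zero exit code from the `reg query` call usually just means the value does not exist yet.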

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » 1 person likes this post

Please note that the second reg key is created by Veeam automatically starting from 10a.

dasfliege
Service Provider
Posts: 79
Liked: 17 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland

Post by dasfliege »

Is anyone else still having heavy issues with 2019 1809, even with all tweaks in place and no synthetic jobs running in parallel?
We still have enormous performance drops as soon as it comes to synthetic operations like merges or GFS point creation. Veeam support is unable to assist any more, so i may have to open a MS premier case again.

mkretzer
Expert
Posts: 682
Liked: 159 times
Joined: Dec 17, 2015 7:17 am

Post by mkretzer »

@dasfliege did you not talk to ReFS devs directly in the past?

dasfliege
Service Provider
Posts: 79
Liked: 17 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland

Post by dasfliege »

Yes i did. I will try to contact them directly, prior to opening a regular case.

18436572
Novice
Posts: 9
Liked: 1 time
Joined: Jul 25, 2017 6:52 pm
Full Name: Devin Meade

Post by 18436572 »

We have had a repository on Windows 2016 v1607 and REFS due to this thread for most of this year. It works absolutely great with reverse incrementals. I now have the opportunity to reload this server with the latest Windows server version. All Veeam backups have been copied elsewhere or removed, so I am "going for it" now (hopefully tomorrow).

I see that Windows Server 2019 LTSB version 2009 is now available. I plan on installing this version because this thread now advises that 2019 is stable with all the latest patches, and I have nothing really to lose. Also, per this thread it seems that the tweaks are not really necessary, but I can deploy them if needed.

Also we are on Veeam B&R 10a (v10.0.1.4854).

Any qualms from this group about Windows Server 2019 Standard LTSB "v2009" ?

mkretzer
Expert
Posts: 682
Liked: 159 times
Joined: Dec 17, 2015 7:17 am

Post by mkretzer »

Hello,

what do you mean?
https://docs.microsoft.com/en-us/window ... lease-info does not show any new LTSB version!

Markus

NightBird
Service Provider
Posts: 201
Liked: 42 times
Joined: Apr 28, 2009 8:33 am
Location: Strasbourg, FRANCE

Post by NightBird »

Surely that's SAC 2009? But not LTSB.

18436572
Novice
Posts: 9
Liked: 1 time
Joined: Jul 25, 2017 6:52 pm
Full Name: Devin Meade

Post by 18436572 »

Apologies - in the MS Business Center download it shows:
"Windows Server 2019 (Standard Core/Datacenter Core) (updated Sept 2020) 64 Bit English"

I took that as version 2009 because it was updated Sept 2020. I downloaded it and it's v1809.

I assume :lol: no issues with Windows Server 2019 Standard v1809 and REFS, correct?

mkretzer
Expert
Posts: 682
Liked: 159 times
Joined: Dec 17, 2015 7:17 am

Post by mkretzer »

For us 2019 v1809 with latest updates still works "good enough" but not quite as fast as 2004 SAC.

janisk
Lurker
Posts: 1
Liked: never
Joined: Nov 09, 2020 3:17 pm
Full Name: Janis

Post by janisk »

PeterC wrote: Aug 13, 2020 6:57 am It looks like we are back to square 1, at this moment our 2019 LTSC repo is crawling to the finishline during backups.

We are using an HPE Apollo 4200 with 1 200 TB volume as repo for our Veeam VBR 10a.
We have had a lot of trouble in the past which seemed to have been solved after ReFS patch KB 4531331.
After setting the task limit on that repo to 30 (32 core cpu) and installing the patch backups were running normal again.

Our backups used to be done at around 15:00, but after this they were finished between 06:00 - 07:00. We almost cracked open a bottle of champagne.
Lately backups started to run a bit longer, nothing alarming.
But two days ago backups suddenly slowed down a lot more; they never finished before the afternoon.
No jobs were added or anything else (except Windows updates), and again we were in trouble.

What we see is that when jobs start the I/O to the repo is between 1 - 6 GB/s for the combined jobs. But when one of the jobs is ready and starts the merge, the I/O for the other jobs just drops to a few MB/s and sometimes KB/s. When the merges are finished the I/O returns to normal values.
But because we have multiple jobs running from 20:30 to 03:00, the later a job starts, the longer it takes to finish.
We see some jobs waiting for infrastructure availability for more than 10 hours.
We have actually no idea why this is suddenly happening. As i said before, we haven't changed anything on this repo.

Just to be clear, all other repos (Windows 2016) are performing like clockwork.

@PeterC - was the upgrade to v10a really the cause of the ReFS slow merging issue returning?

PeterC
Enthusiast
Posts: 26
Liked: 7 times
Joined: Apr 10, 2018 2:24 pm
Full Name: Peter Camps

Post by PeterC »

@janisk, sorry for the late reply. After update 10a we had several problems, but I cannot really be sure that the problems we are facing are caused by update 10a.
We have had 2 cases with Veeam and Microsoft which led to nothing. So at this point we are back at HPE; they are testing with a similar setup to see if the problem is caused by the hardware used.
If we get some answers I will post these here.

karsten123
Service Provider
Posts: 36
Liked: 3 times
Joined: Apr 03, 2019 6:53 am
Full Name: Karsten Meja

Post by karsten123 »

Until final clarification i will go for server 2016 for sure.

18436572
Novice
Posts: 9
Liked: 1 time
Joined: Jul 25, 2017 6:52 pm
Full Name: Devin Meade

Post by 18436572 » 1 person likes this post

Yes I am sticking with our 2016 server as well, it works flawlessly.

spiritie
Expert
Posts: 109
Liked: 12 times
Joined: Mar 01, 2016 10:16 am
Full Name: Gert van Niekerk
Location: Denmark

Post by spiritie »

R.I.P my 700 TB LUN on ReFS Server 2019 (1809, 17763.1554) fully patched, running super slow.

Recently we reduced retention on our jobs from 90 days to 30 days and Veeam is now using [slow clone] to merge everything into 30 days.

Is there any way to disable fast clone on our repository? I'm almost sure that native merge will be faster.

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » 1 person likes this post

No, I'm afraid this is not possible. 700TB is a lot of data to work through regardless, I would not be so sure native merge will be faster... you can't cheat physics :D

spiritie
Expert
Posts: 109
Liked: 12 times
Joined: Mar 01, 2016 10:16 am
Full Name: Gert van Niekerk
Location: Denmark

Post by spiritie »

Gostev for this particular job (only one enabled on this repository right now) it is "only" 7.54 TB data and 83 VM's.

Currently: 18-11-2020 13:30:25 :: Merging oldest incremental backup into full backup file (32% done) [fast clone] | Running time is now almost 3 hours.

To me this seems way too slow, even when using fast clone. Resource manager reports barely 100 MB/s and a disk queue length of 0.0.1.
This system has 128 GB of RAM, 90% of it is in standby.

To compare this to fresh data being copied into the machine during nightly backup copy jobs, we are seeing much, much higher performance
(60 disks in RAID 60), where Veeam usually reports around 1 GB/s.
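
Extrapolating from those numbers gives a feel for the gap. This is a back-of-the-envelope sketch that assumes merge progress is linear and rounds "almost 3 hours" up to 3.0:

```python
def merge_estimate(elapsed_hours, fraction_done):
    """Linear extrapolation: return (total hours, remaining hours)."""
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours


# 32% done after roughly 3 hours (figures from the post above).
total_h, remaining_h = merge_estimate(3.0, 0.32)
print(f"total ~{total_h:.1f} h, remaining ~{remaining_h:.1f} h")

# Observed merge throughput versus the nightly copy-job throughput.
merge_mb_s = 100
copy_mb_s = 1000  # "around 1 GB/s", taken as 1000 MB/s
print(f"copy jobs are ~{copy_mb_s / merge_mb_s:.0f}x faster")
```

Under those assumptions the merge would need a bit over 9 hours in total, at about a tenth of the throughput the same array delivers for incoming copy jobs.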

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

In that case, this is most likely not a ReFS or block cloning issue at all. You should have support take a performance debug log to see what specific step in synthetic full processing takes a long time. This should give a good hint as to where the issue really is.

By the way, why are you using this non-default backup mode without periodic synthetic fulls? This is not typical for ReFS users.

popjls
Influencer
Posts: 17
Liked: 1 time
Joined: Jun 25, 2018 3:41 am

Post by popjls »

2019 has been pretty good until recently. I think the latest ReFS patches have taken it back a step again... I'm starting to see backup file checks taking forever again (and RAM usage going through the roof), and the reg entries are still there, unless these updates removed them, which I doubt. These have normally completed without issue, but now... 64 hours later, still going and sitting at 86-90%. Sigh...

Gostev
SVP, Product Management
Posts: 27126
Liked: 4439 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » 1 person likes this post

Thing is, I don't believe ReFS on 2019 has received any updates recently, or in the past many months. While it's easy to just blame everything on ReFS, you should always look at 3rd party software first, since it gets updates much more often than file system drivers. For example, high RAM usage can come from antivirus, hardware drivers or other software too.

Also, keep in mind all issues ever reported on ReFS were either in file deletion or block cloning logic. Backup file checks do neither (just regular reads) and I highly doubt ReFS can still have bugs like "RAM usage going through the roof when reading files" at its current level of maturity. Honestly, from your description I'd rather suspect a Veeam bug even, before saying it's a possible ReFS issue.

JRRW
Influencer
Posts: 22
Liked: 11 times
Joined: Dec 10, 2019 3:59 pm
Full Name: Ryan Walker

Post by JRRW » 1 person likes this post

A good point, Gostev; considering Defender is built into 2019, I have to imagine some people may have forgotten (as an example) to have appropriately applied Veeam's recommended AV Exclusions.

JRRW
Influencer
Posts: 22
Liked: 11 times
Joined: Dec 10, 2019 3:59 pm
Full Name: Ryan Walker

Post by JRRW »

mkretzer wrote: Nov 03, 2020 8:47 pm For us 2019 v1809 with latest updates still works "good enough" but not quite as fast as 2004 SAC.
I've wanted to roll SAC, but the other two main employees that leverage Veeam as admins aren't comfortable with Core :?

Even though I have WAC deployed, and frankly you don't ever need to log into the repository anyways... But yeah, SAC would be nice. Did you / do you upgrade your SAC in place? That was the other main concern I had as well, ensuring an upgrade wouldn't mess things up.
