WinstonWolf
Veteran
Posts: 284
Liked: 11 times
Joined: Jan 06, 2011 8:33 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by WinstonWolf »

gm2783 wrote:Hi all

see below the commands to export the REFS driver:

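rem WORKDIR is a placeholder folder (cmd would expand %Path% to the PATH variable); copy the downloaded .msu into it first
set WORKDIR=C:\refs-extract
mkdir %WORKDIR%\Expand\2
expand -f:* %WORKDIR%\windows10.0-kb4093120-x64_72c7d6ce20eb42c0df760cd13a917bbc1e57c0b7.msu %WORKDIR%\Expand
expand -f:*.cab %WORKDIR%\Expand\*.cab %WORKDIR%\Expand\2
expand -f:refs.sys %WORKDIR%\Expand\2\*.cab %WORKDIR%\Expand\2\
expand -f:refsv1.sys %WORKDIR%\Expand\2\*.cab %WORKDIR%\Expand\2\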

Microsoft®Update Catalog:
https://www.catalog.update.microsoft.co ... =KB4093120
How can I change the refs.sys after this on a running system? Thanks
mkaec
Veteran
Posts: 462
Liked: 133 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mkaec »

WinstonWolf wrote:How can I change the refs.sys after this on a running system? Thanks
If it is a VM, you could power off the VM and mount the VHDX in the host. If it is a physical system, you could boot into a WinPE environment. Another option would be to populate the PendingFileRenameOperations registry value, hope you got it right, and reboot.
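For the PendingFileRenameOperations route, a rough sketch of how the entries are usually laid out may help. It assumes the common format of that REG_MULTI_SZ value under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager: strings in source/destination pairs, each written as an NT-style path, with a leading "!" on the destination so an existing file is replaced. The helper name and the example paths are made up for illustration, and the code only builds the strings; it does not touch the registry.

```python
NT_PREFIX = "\\??\\"  # NT-style path prefix assumed for this registry value


def pending_rename_entries(moves):
    """Build the source/destination string pairs for a list of (source, destination) paths."""
    entries = []
    for source, destination in moves:
        entries.append(NT_PREFIX + source)
        # "!" asks for an existing destination to be replaced at boot (assumed format)
        entries.append("!" + NT_PREFIX + destination)
    return entries


# Hypothetical paths: keep the current driver as a backup, then move the extracted one in.
drivers = r"C:\Windows\System32\drivers"
moves = [
    (drivers + r"\refs.sys", drivers + r"\refs.sys.old"),
    (r"C:\Temp\refs.sys", drivers + r"\refs.sys"),
]
for line in pending_rename_entries(moves):
    print(line)
```

Entries are expected to be processed in order at the next boot, which is why the backup rename comes first. Double-check the resulting value before rebooting, since a wrong entry here can leave the system without a driver.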
Jo_Seph_B
Lurker
Posts: 2
Liked: 1 time
Joined: Jun 26, 2018 2:37 pm
Full Name: Joseph Baldwin
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Jo_Seph_B » 1 person likes this post

We're in the process of migrating back to NTFS. It might be slow but at least it'll be reliable.

I appreciate this is mostly an MS issue but my biggest gripe with Veeam is the fact they still recommend ReFS as the best file system to use. The software even recommends it if I try to create an NTFS repository! Veeam need to react quicker to tell customers to stop using a technology if it's not reliable, and this is a prime example.

Even if this gets fixed my confidence is shot using ReFS. 4 months of looking daft in front of a customer because backups are failing just isn't an acceptable place to be, once migrated we'll have to manually check every backup for consistency as I need to be sure we've not lost data. I'm disappointed in both MS pushing a technology that was obviously not ready, and Veeam for not stopping customers deploying it with ReFS as the file system of choice and even pushing it as the best file system to be on. It's a shame as Veeam is the best backup product I've ever used and we've been a customer for 6+ years. I'm not going to lie, at renewal time I'll be taking a look at the market again, which I've even felt the need to do before with Veeam.
Gostev
Chief Product Officer
Posts: 31428
Liked: 6633 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Jo_Seph_B wrote:Veeam need to react quicker to tell customers to stop using a technology if it's not reliable, and this is a prime example.
Wait, how is this particular issue a prime example when it is NOT a reliability issue? All this latest regression does is make ReFS as slow as NTFS (which you're migrating back to) :D actually, still significantly faster - especially on low-end backup repositories. Or, are you talking about some other issue?
LBegnaud
Service Provider
Posts: 19
Liked: 7 times
Joined: Jan 24, 2018 12:08 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by LBegnaud » 1 person likes this post

I agree with Gostev. For smaller environments, the ReFS issues are negligible / non-existent, and in larger systems ReFS is the only sane way to do things (ReFS + reverse incremental is approximately one million times better than the old way of doing it). It has been a long time since ReFS has had show-stopping issues for us.

For context, we have about 12 remote B&R servers running WS2016 ReFS repos and one primary datacenter B&R server with ~500 VMs.
antipolis
Enthusiast
Posts: 73
Liked: 9 times
Joined: Oct 26, 2016 9:17 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by antipolis »

actually I think jo_seph_b has a point here...

while I'm not considering moving back to ntfs myself, waiting 6+ months for some fixes from MS was already terrible by itself; but having a regression like this only 2 months after getting said fixes, and then having to wait two more months... ?? (unless of course you manually rollback the driver... which I did) that's really really awful for a technology pushed by both MS and veeam as production ready (and I'm not intending to point any fingers here...)

great fun when you find out on monday morning that the weekend patching caused regression and half your backup jobs are still running...

then again... yeah... ReFS benefits are so huge that I will just deal with it but meh
Gostev
Chief Product Officer
Posts: 31428
Liked: 6633 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Well, to be fair to Microsoft, the initial ReFS issue was extremely complex (as we found out after every aspect of it was finally understood), so yes that fix took very long. But for example, with this latest performance regression, we had a private hotfix from Microsoft within 3 days of reporting the issue back in May. It was simply too late already to include one into their June updates - I get it, we have these situations ourselves when a hotfix is too late to make the immediate update (code freeze has already happened) - just bad timing. And the issue is not critical enough for them to ship an out of band patch - there's no data corruption problem.

Overall, bugs and teething issues around new technologies happen to every vendor. This is absolutely normal, and so early adopters will always struggle. I recommend those who are mad at Microsoft for ReFS teething issues to look no further than VVols, which VMware has been pushing hard for the last few years "as the best file system to be on", quoting Joseph. Well guess what, it is only now that we know that almost everyone who is using VVols has their VMs, backups and replicas corrupted. Which is arguably a 10x bigger issue than everything ReFS experienced to date - ironically, just about everything EXCEPT actual data corruption. Something we really want to see from a file system, by the way!

But somehow, this is considered "okay" - I mean, look at the VMware subforum. No one there blames Veeam for supporting VVols since day 1 and promoting this support in its release documentation, just like we did with our ReFS support. Nor does anybody feel the need to look at replacing Veeam because our backups and replicas were equally impacted by this terrible VVols bug. Double standards at their best?
Mgamerz
Expert
Posts: 159
Liked: 28 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mgamerz »

For replacing the file, for those above who asked, I rebooted server using shift + restart, and rebooted to command prompt in the options. Then I renamed refs.sys in C:\Windows\system32\drivers to something like refs_june2018.sys, and put in the one from february (pulled from april cumulative update using expand instructions above). Just rebooted it, and so far it seems to be working.
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by FrancWest »

No need, you can simply rename the original file to .old or something and copy the extracted refs.sys to the C:\Windows\System32\drivers folder.

For some reason, it's not locked even when the driver is loaded.

after replacing the driver, reboot the server to have it activated.

Franc.
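The rename-and-copy steps described above could be scripted roughly like this. It is only a sketch: the function name, the ".old" suffix default and the example paths are assumptions, it would need to run elevated, and the reboot is still a separate manual step.

```python
import shutil
from pathlib import Path


def swap_driver(drivers_dir, extracted_file, backup_suffix=".old"):
    """Rename the current driver to a backup name, then copy the extracted one in."""
    drivers_dir = Path(drivers_dir)
    extracted_file = Path(extracted_file)
    current = drivers_dir / extracted_file.name
    backup = current.with_name(current.name + backup_suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup of the driver.
        raise FileExistsError(f"backup already exists: {backup}")
    current.rename(backup)                 # e.g. refs.sys -> refs.sys.old
    shutil.copy2(extracted_file, current)  # put the extracted driver in place
    return backup


# Example call with assumed paths (reboot afterwards so the driver is loaded):
# swap_driver(r"C:\Windows\System32\drivers", r"C:\Temp\refs.sys")
```

Keeping the backup next to the original makes it easy to roll forward again once a fixed driver ships.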
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

I still have my server in test mode, still on the private fix released months ago. I knew this wasn't over. No crashes with the private fix, no speed problems. Geeez :roll:
KFM
Service Provider
Posts: 13
Liked: 2 times
Joined: May 14, 2013 1:46 am
Full Name: KFM
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by KFM »

SBarrett847 wrote:I see this hard lockup if my Repo VM hasn't been assigned enough memory - increase memory and the issue doesn't occur for me.
Hi Stephen,

Thanks for your suggestion but my repo VMs all have 8 vCPU and 32 GB RAM. It locks up even when it's not doing any ingestion of data and the server is simply idling. i.e. no backups/restores/jobs are running to it during the deletion of the files. A long time ago I made sure that if I was doing any bulk big deletes that there would be no jobs running to minimise the impact to them if I had to reboot the repos.
billcouper
Service Provider
Posts: 150
Liked: 30 times
Joined: Dec 18, 2017 8:58 am
Full Name: Bill Couper
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by billcouper »

@KFM
I also have issues deleting files causing repository server to freeze/lock up with REFS volume. But only if I use the Veeam console to do it!

Every time I have used the Veeam console to delete backup chains the associated repository server locks solid. 100% cpu for hours and hours if you let it before hitting reset.

I have found the only reliable way to delete files is through the operating system. I just log in to a repository server and delete the files/folders/whatever I need using Explorer, then run a rescan on the associated SOBR in Veeam. When I delete files the server runs at high CPU/RAM for a while, and if I keep refreshing Disk Management I can see the amount of free space going up slowly. This always works. I have never had a repo server freeze doing it through Explorer.
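Instead of refreshing by hand, the free space could be sampled with a few lines of Python. This is just a sketch: the function name, sample count and interval are arbitrary choices, and the drive letter in the usage comment is hypothetical.

```python
import shutil
import time


def watch_free_space(path, samples=5, interval_seconds=60):
    """Return a list of free-space readings (in GiB) taken interval_seconds apart."""
    readings = []
    for i in range(samples):
        free_gib = shutil.disk_usage(path).free / 2**30
        readings.append(free_gib)
        print(f"sample {i + 1}: {free_gib:.1f} GiB free")
        if i < samples - 1:
            time.sleep(interval_seconds)
    return readings


# Example with an assumed repository drive letter:
# watch_free_space("R:\\", samples=10, interval_seconds=30)
```

If the readings keep climbing after a delete, the space is still being released in the background.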
gm2783
Service Provider
Posts: 6
Liked: 2 times
Joined: Apr 10, 2017 12:42 pm
Full Name: Giuseppe Marchese
Location: Mägenwil
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by gm2783 » 1 person likes this post

I replaced the ReFS drivers yesterday:
refs.sys
refsv1.sys

Our B2D are now 10x faster during fast cloning. :D

REFS Driver 10.0.14393.2312
Fast Cloning Time:
06:19 h

REFS Driver 10.0.14393.2097
Fast Cloning Time:
00:31 h

This job covers 5 file servers with a total size of 5.3 TB.
The VIBs are about 80-90 GB.

Just for your information...
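Working the two fast clone times above through, the difference comes out a little over the quoted 10x. A quick check in Python (the helper name is just for illustration):

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' duration string to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


slow = to_minutes("06:19")   # driver 10.0.14393.2312
fast = to_minutes("00:31")   # driver 10.0.14393.2097
speedup = slow / fast
print(f"{slow} min vs {fast} min: about {speedup:.1f}x faster")
# prints "379 min vs 31 min: about 12.2x faster"
```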
Jo_Seph_B
Lurker
Posts: 2
Liked: 1 time
Joined: Jun 26, 2018 2:37 pm
Full Name: Joseph Baldwin
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Jo_Seph_B »

Gostev wrote:Wait, how is this particular issue a prime example when it is NOT a reliability issue? All this latest regression does is make ReFS as slow as NTFS (which you're migrating back to) :D actually, still significantly faster - especially on low-end backup repositories. Or, are you talking about some other issue?
Feel free to track my issue over the last 4-6 months (0306889 and 0274109) since go-live on a greenfield setup in Jan, following all of Veeam's best practice guides on ReFS. 6 months down the road and the server still crashes with 100% CPU once a week, with failed jobs constantly having to be rerun. Veeam support has really not been able to offer much at all.

We resell Veeam, multiple other customers I've not even needed to log into for 6 months on NTFS. I get the success email each day and move on.

With ReFS I'm working 12-hour weekends to get consistent backups: rescanning datastores, moving files around that have got removed, even doing a ReFS disk restore as it failed. A separate repository still on ReFS also resulted in server crashes; it just moved the issue.

Tried turning off the ReFS features and it still fails. Everything results in a fail. Veeam support just pointing the finger elsewhere and not even willing to really guide or recommend when we asked if we should switch back, constantly having to chase for information as slow to respond. In the end we decided to get shot of ReFS, needless to say we'll be in touch further down the road about this at a higher level than a forum or ticket.

As above, I know the underlying issue isn't Veeam's fault. My gripe is the fact Veeam are still using ReFS; if I deploy a repository on NTFS the software actually recommends I use ReFS!! Really! It's been unreliable since day 1, so why is Veeam software still suggesting I switch to it? That's my issue here. Veeam support for ReFS should have been pulled months ago when the issues started flooding in, until MS resolved it and you guys checked it worked properly. Take a look at this thread, the impact wasn't small!

Not interested in the VVol story, nice as it is. This thread is for Veeam and ReFS issues; stop pushing the focus onto other, unrelated issues.
Gostev
Chief Product Officer
Posts: 31428
Liked: 6633 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Jo_Seph_B wrote:Feel free to track my issue over the last 4-6 months (0306889 and 0274109) since go-live on a greenfield setup in Jan, following all of Veeam's best practice guides on ReFS. 6 months down the road and the server still crashes with 100% CPU once a week, with failed jobs constantly having to be rerun.
OK, so you're talking about the original issue fixed in February ReFS.sys update - not about the latest performance regression being discussed on the last few pages.
Jo_Seph_B wrote:My gripe is the fact Veeam are still using ReFS; if I deploy a repository on NTFS the software actually recommends I use ReFS!! Really! It's been unreliable since day 1, so why is Veeam software still suggesting I switch to it? That's my issue here. Veeam support for ReFS should have been pulled months ago when the issues started flooding in, until MS resolved it and you guys checked it worked properly.
Well, then you have to know that the majority of our customers actually had success with ReFS since day 1. I have noted this multiple times in the beginning of this thread, just like the fact that it was actually quite hard for us to reproduce the issue in our own lab to demonstrate it to Microsoft. This is because there were too many variables for running into the issue, which became apparent when the issue was fully understood - namely the usage of per-VM chains, backup modes with periodic fulls, 4KB ReFS cluster size, backup repository with low RAM size etc.

If ReFS was unreliable for the majority of users, of course we'd remove the recommendation from the UI. But instead, we opted to tweak it in U2 to suggest 64KB cluster size as this was clearly one of the culprits - and continued working with those customers for whom the integration did not work reliably. As you can imagine, simply removing the recommendation would be even easier for us to do, if there were good reasons to do this.

I realize these facts probably do not change anything for you specifically, but I wanted to provide the bigger picture behind this so that you understand why we did not pull ReFS support completely. If we did, we would not be able to iterate on this integration to get it usable for everyone. Just like, for example, VMware would not be able to keep iterating on VSAN and get it to where it is today, if it pulled one completely during the reliability chaos period of initial VSAN releases.
AlexL
Service Provider
Posts: 88
Liked: 5 times
Joined: Aug 24, 2010 8:55 am
Full Name: Alex
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by AlexL »

We've been using a 36TB ReFS repo, for Backup Copy jobs only, with much success for almost a year now. It holds about 500 VMs spread over 50+ jobs or so (5 day retention, since the copy is for disaster recovery purposes only). Using 4k cluster size btw; the Feb update fixed our slowdowns. No CPU and/or memory issues noticed even before the Feb update, just slowdowns of the fast clone part. Never experienced any freeze whatsoever as far as I can recall.
Earlier this month we added a new volume (4U60G2), this one 400TB using 64k blocks, and started moving large jobs here, we're talking about 10 jobs of 2,5TB with 100GB incrementals and another 5 jobs of 7TB and 200GB incrementals. Almost from the start we experienced slowdowns and freezes. Tried a lot: misc ReFS registry settings, lowering concurrency, reverting the refs.sys driver, but still freezes. Only when I limit the bandwidth in the repo setup do the freezes (mostly) seem to stop.

There is a lot posted, both here and around the net, but I am a little confused about the current state of affairs

a) are any ReFS registry settings recommended and/or needed?
b) could it possibly be that with 4k blocks I would have fewer freezes than with 64k blocks (disregarding any possible CPU/mem issues which should have been resolved with the Feb patch anyway)?
c) should I still expect any freezes, considering I have no throttling in place, any registry settings in place if needed and the 'correct' refs driver (Feb)?
d) would using per-vm files make a difference?
e) is the ingestion rate the culprit or the large files/deltas/deletions?

Any help would be greatly appreciated, we are in the process of buying a Cisco S3260 for our primary backup jobs and would hate to see the same problems on that box since that one will also be used for our largest jobs.

Regards,
Alex
Mgamerz
Expert
Posts: 159
Liked: 28 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mgamerz »

The 2018 may and june cumulative updates for 2016 server really broke down performance from what I can tell on this thread (both updated the refs driver). I don't think there were any registry tweaks recommended after the feb fix.
Gostev
Chief Product Officer
Posts: 31428
Liked: 6633 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

@Mgamerz that is correct, moreover any prior registry tweaks were recommended to be removed.

@Alex how much RAM do you have on your 400TB repository server? Probably not 10x more than on that server with 36TB repo, right? This could be the culprit as all complaints have largely stopped since Feb ReFS update, however I know Microsoft was still working to optimize ReFS memory consumption, and they told someone who was still having issues that those optimizations should help his case.
KFM
Service Provider
Posts: 13
Liked: 2 times
Joined: May 14, 2013 1:46 am
Full Name: KFM
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by KFM »

billcouper wrote:@KFM
I have found the only reliable way to delete files is through the operating system. I just log in to a repository server and delete the files/folders/whatever I need using Explorer, then run a rescan on the associated SOBR in Veeam. When I delete files the server runs at high CPU/RAM for a while, and if I keep refreshing Disk Management I can see the amount of free space going up slowly. This always works. I have never had a repo server freeze doing it through Explorer.
Hi Bill,

You're luckier than I! I see the same behaviour as you when deleting files through Explorer, except on most occasions where I'm deleting a large number of large files (3TB+) the system will eventually hang and a reset is the only way to recover.

A lot of the focus of this thread is on high memory or slow clone/transforms with not a lot on the server lockups, which leads me to ask if this is even the right forum or should I be opening a case with Microsoft?
billcouper
Service Provider
Posts: 150
Liked: 30 times
Joined: Dec 18, 2017 8:58 am
Full Name: Bill Couper
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by billcouper » 1 person likes this post

@KFM
Things that helped with server freezes during backup in our environment:
* Lower the limit of tasks per extent.
* Lower the limit of tasks per backup proxy.
* If you have 100% cpu usage (on the repo server) for an extended period during backup add more vCPU's.
* If you have a high memory pressure (on the repo server) for an extended period during backup add more GB's.
AlexL
Service Provider
Posts: 88
Liked: 5 times
Joined: Aug 24, 2010 8:55 am
Full Name: Alex
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by AlexL »

@Gostev:
It is the same server that got the extra volume so obviously the physical memory stayed the same, 64GB, ram usage went from 60-70% free down to 40-50% free over the last month. As stated, I experience freezes without cpu issues (2 sockets, 12 cores each, cpu usage hardly ever above 10%) and without memory pressure.

Last night I removed all registry settings except RefsEnableLargeWorkingSetTrim, also I had only (manually) replaced refs.sys, I also replaced the refsv1.sys driver and rebooted. Now 12 hours later it seems better.
Mgamerz
Expert
Posts: 159
Liked: 28 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mgamerz »

Is the refsv1 driver supposed to be replaced? Some of the earlier instructions didn't mention it, not sure I was supposed to also replace that one. (I only replaced refs.sys).
Raleigh
Novice
Posts: 7
Liked: never
Joined: Jun 26, 2018 11:33 pm
Full Name: Raleigh
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Raleigh »

KFM wrote:I certainly hope so! We're on 10.0.14393.2097 and I still have problems with high CPU causing to lock the server up. I can isolate this to outside of Veeam by simply deleting a large number of large (VBK) files in Windows File Explorer. The repository is passing down the UNMAPs to the underlying storage array (DisableDeleteNotify=0). An hour (or so) after the deletes the CPU on the repository servers goes to 100% and hangs the host. Reset is the only way to recover from it.

I'm assuming this is also what people are seeing? Just want to make sure we're on the same page with this refs problem else I might have to open a support case directly with Microsoft.
I'm very new to Veeam (since late March). Yes, what you describe above is more or less what we're experiencing. During certain backup jobs, the repository server CPU will jump up to 30-60% (it bounces around), memory usage climbs to almost 50%, and the server is essentially unresponsive. It still responds to ping over the network, and if I happened to have a Remote Desktop session open to it, that screen will update, and I can move the mouse around. However, I can't do much of anything else. I can't log into the server console. I can't gracefully restart the server. When the server enters this state it is essentially "crashed" for all practical purposes. I have to hard reset the server. I have had a case open with both Veeam Support and Microsoft support for almost three months now, but there has been no resolution.

--Raleigh
opg70
Influencer
Posts: 24
Liked: 3 times
Joined: Oct 06, 2013 8:48 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by opg70 »

Yes it should be from what I read
Gostev
Chief Product Officer
Posts: 31428
Liked: 6633 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

KFM wrote:The repository is passing down the UNMAPs to the underlying storage array
Please note that ReFS does not support thin provisioning, TRIM/UNMAP, or Offloaded Data Transfer (ODX) features enabled on the underlying storage array serving as the backup target.
Mgamerz
Expert
Posts: 159
Liked: 28 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mgamerz »

Aye, our offsite server just locked up, I assume due to this issue. I had not yet downgraded the refs driver. On the bright side now I get to learn how to use IP KVM.
DesertBlizzard
Lurker
Posts: 2
Liked: 1 time
Joined: Jun 19, 2015 5:23 pm
Full Name: Robert Downs
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by DesertBlizzard » 1 person likes this post

I can confirm that back-revving refs.sys and refsv1.sys from 2312 to 2097/2214 respectively on a fully patched Server 2016 (1607, build 14393.2339) returns the server to its former glory in my tests for the fast clone process. Next up, a production run.

Memory usage was much higher than with the 2312 version of the drivers, so I will be massaging this a little.

Want to also mention that none of the keys related to ReFS have been modified from their original settings on this server.
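To double-check that a server really is on the older driver after a rollback, the dotted version strings can be compared as tuples. A minimal sketch follows; the function names are made up, and reading the version off the .sys file itself (e.g. from its file properties) is left as a manual step.

```python
def parse_version(text):
    """Turn a dotted version such as '10.0.14393.2097' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_rolled_back(installed, problem_version):
    """True if the installed driver is older than the version being avoided."""
    return parse_version(installed) < parse_version(problem_version)


print(is_rolled_back("10.0.14393.2097", "10.0.14393.2312"))  # True
print(is_rolled_back("10.0.14393.2312", "10.0.14393.2312"))  # False
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "10.0.14393.999" would sort after "10.0.14393.2097".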
Raleigh
Novice
Posts: 7
Liked: never
Joined: Jun 26, 2018 11:33 pm
Full Name: Raleigh
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Raleigh »

OK, my repository server just locked up again this morning. And, I just received an email update from Microsoft Support on my open ticket: "the engineer has been documenting the analysis at the moment, however the analysis and action plan are not completed yet." They've been analyzing this issue since I opened the ticket with them in mid-April. This is really getting old. I've had this ticket open with MS since mid-April, and they don't yet seem to have a clue as to what is causing the problem. Or, they know, and they're just not sharing with me...

After weeks and weeks of troubleshooting this issue on my own, I narrowed it down to a particular backup job, and then to a particular file server being backed up. With Veeam Support help, we identified the operation that caused the problem: deleting a large (~5TB) vbk file from the repository. This causes a problem only on nights when retention policy calls for a deletion of the oldest vbk chain. It's definitely not a Veeam software issue causing the problem: I can crash our repository server just by trying to delete the 5TB .vbk file manually, using Windows Explorer. It doesn't do this on every job; only on the job that involves the large .vbk file. Thus, there exists some threshold file size that causes this problem. My jobs that have 1.3 and 2.4 TB vbk files seem to run just fine. It's only the job with a 4.6 TB vbk file that causes the server to become unresponsive when retention policy calls for the deletion of that file.

Is this what others are experiencing? Backup jobs involving smaller (<4TB) .vbk files don't seem to cause the repository server to become unresponsive, while jobs with large .vbk files do.

FWIW, our backup repository server's basic specs:
Dell PowerEdge R740XD
16GB RAM
29TB (ReFS 64K) storage volume

--Raleigh
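For anyone wanting to see which files in a repository sit above a size like this before retention removes them, here is a small Python sketch. The function name and the repository path are hypothetical, and the 4 TB figure is only the rough threshold observed in this post, not a confirmed limit.

```python
from pathlib import Path


def find_large_backups(repo_root, threshold_bytes, pattern="*.vbk"):
    """Return (path, size) pairs for files at or above the threshold, largest first."""
    hits = []
    for path in Path(repo_root).rglob(pattern):
        if path.is_file():
            size = path.stat().st_size
            if size >= threshold_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Example with an assumed repository path and a 4 TB threshold:
# for path, size in find_large_backups(r"E:\Backups", 4 * 10**12):
#     print(f"{size / 10**12:.1f} TB  {path}")
```

Passing pattern="*.vib" would list large incrementals instead.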
AlexL
Service Provider
Posts: 88
Liked: 5 times
Joined: Aug 24, 2010 8:55 am
Full Name: Alex
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by AlexL »

I have a feeling it is more the .vib size that is causing the trouble than the .vbk size, could that also be the case in your situation Raleigh?
jslic
Novice
Posts: 3
Liked: 4 times
Joined: Jun 20, 2016 8:30 am
Full Name: Jesper Sorensen
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by jslic » 1 person likes this post

Raleigh wrote:OK, my repository server just locked up again this morning. And, I just received an email update from Microsoft Support on my open ticket: "the engineer has been documenting the analysis at the moment, however the analysis and action plan are not completed yet." They've been analyzing this issue since I opened the ticket with them in mid-April. This is really getting old. I've had this ticket open with MS since mid-April, and they don't yet seem to have a clue as to what is causing the problem. Or, they know, and they're just not sharing with me...

After weeks and weeks of troubleshooting this issue on my own, I narrowed it down to a particular backup job, and then to a particular file server being backed up. With Veeam Support help, we identified the operation that caused the problem: deleting a large (~5TB) vbk file from the repository. This causes a problem only on nights when retention policy calls for a deletion of the oldest vbk chain. It's definitely not a Veeam software issue causing the problem: I can crash our repository server just by trying to delete the 5TB .vbk file manually, using Windows Explorer. It doesn't do this on every job; only on the job that involves the large .vbk file. Thus, there exists some threshold file size that causes this problem. My jobs that have 1.3 and 2.4 TB vbk files seem to run just fine. It's only the job with a 4.6 TB vbk file that causes the server to become unresponsive when retention policy calls for the deletion of that file.

Is this what others are experiencing? Backup jobs involving smaller (<4TB) .vbk files don't seem to cause the repository server to become unresponsive, while jobs with large .vbk files do.

FWIW, our backup repository server's basic specs:
Dell PowerEdge R740XD
16GB RAM
29TB (ReFS 64K) storage volume

--Raleigh
FWIW we had similar issues with large .vbk files (some of ours are in excess of 60TB) and we pretty much resolved it with refs.sys 2097 AND adding more RAM to the Veeam server.
Basically the 2097 driver would eliminate the performance issues and the added ram helped with the server crashes.