Host-based backup of VMware vSphere VMs.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

I have an open case, #05226147.

Since upgrading the Veeam server to Server 2022 and Veeam B&R to 11, and upgrading our local vSphere backup proxy to Server 2022 as well, we have been having backup jobs fail. Any job that needs to transfer data via the Server 2022 proxy to our target, a Synology DS2414rp+, ceases to transfer data after 5-20 minutes and subsequently times out the SMB connection and fails. Switching to a Server 2019 host for the vSphere backup proxy allows the backup to succeed. If the timeout period is increased on the Server 2022 backup proxy, the job simply takes longer to fail after dropping to 0KB/s. NetUseShareAccess has not changed this result.

We believe this is an issue with either the Veeam Transport component and Server 2022, or some interaction between Server 2022 and the Synology; however, I can find no similar issues reported. I have had several exchanges with support in which I provided them with logs, but this has thus far proven fruitless.

I thought it might be worthwhile to ask if anyone else has experienced a similar issue or, conversely, if anyone else is using a Server 2022 vSphere backup proxy and *not* experiencing any issues?
Mildur
Product Manager
Posts: 9848
Liked: 2607 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Mildur »

Hi Tanner

I'm using Linux proxies, so I cannot test the behavior in my environment right now.

One question about the Veeam version.
Are you on V11 or V11a?
Server 2022 is supported since V11a.
Product Management Analyst @ Veeam Software
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

Build: 11.0.1.1261 P20211211, which I'm fairly confident is "11a".
Mildur
Product Manager
Posts: 9848
Liked: 2607 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Mildur »

Yes, version looks good 👍
Product Management Analyst @ Veeam Software
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

Do you have another Server 2022 machine to try as a backup proxy, even if a VM? This will help rule out hardware or network connectivity issues with that specific Server 2022 machine, which I suspect is the problem here. If there were any real issues with the Veeam Transport component on Server 2022, we would've heard about them from many other customers by now. I mean, Server 2022 has been around for a while.

And the bigger question is, if you have Windows Server 2022 available for use in your backup infrastructure, then why do you back up to an SMB share in the first place, when you could get the performance and space efficiencies of ReFS block cloning while avoiding all the reliability and data integrity issues of the SMB stack altogether?
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

Yeah, I kinda figured that'd be the case with Server 2022 and Veeam Transport but I'm running out of ideas so I thought I'd ask. The current proxy is already a VM, basically cloned directly from the template with no configuration changes at all except the addition of the Veeam Transport component. I will attempt to use a different 2022 Server and see if anything changes.

We don't use ReFS and a Windows server because we do not have in our possession a server chassis that can support the 12 8TB disks we use in the Synology and, as I've unfortunately had to mention several times over the years on this forum in answer to similar questions, I do not have the luxury of a budget to replace hardware that is still doing its job just fine. Besides which, isn't dedupe kinda exactly the opposite of backup?

Anyways, this configuration was working fine with Veeam 10 and a 2019 proxy, and now it isn't. If other people using Veeam 11a with Server 2022 backup proxies haven't reported any issues, then I'm inclined to believe the change in interaction either has to do with 2022+Synology or Veeam Transport in 11a+Synology. Since a 2019 proxy hasn't exhibited the issue (yet), I'm leaning towards the former, but I'll do some tests to narrow it down further.
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

Honestly, I would never be so sure about the "doing just fine" part when it comes to low-end NAS. Our support statistics show that these devices are extremely prone to silent data corruption/loss, and sadly most folks only find out when they need to perform a restore.

Anyway, you don't need to replace your hardware to use ReFS backup repository. You just create a LUN on your existing Synology NAS and mount it to the Windows Server via iSCSI. However, I do recommend you get to the bottom of what's happening with your Server 2022 before making any changes.

No, deduplication is not the opposite of backup: it's completely perpendicular to backup. For example, many primary storage arrays also do inline deduplication these days. But in any case, a ReFS repository does NOT do deduplication. Rather, it prevents duplication from happening. This is best compared to incremental backup: theoretically you could have each backup run create a full backup file, but instead I'm sure you're doing incremental backups and these are only capturing changed blocks (without also writing all unchanged blocks along again and again). You don't call incremental backup deduplication, right? And ReFS just takes it further, allowing the same not only within a single full backup chain (full + dependent increments), but also across multiple full backup files of the same machine.
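To make the distinction above concrete, here is a minimal Python sketch of the idea. It is only a toy model and not how ReFS or Veeam is implemented: `BlockStore`, `synthetic_full` and the other names are hypothetical, and real block cloning works on filesystem clusters rather than Python lists. What it shows is that a second full backup file can be assembled by referencing blocks that were already written, so nothing is written twice and nothing has to be detected and removed afterwards.

```python
from collections import Counter


class BlockStore:
    """Toy volume: every written block is stored once and reference-counted."""

    def __init__(self):
        self.blocks = {}            # block id -> bytes physically stored
        self.refcount = Counter()   # block id -> number of files referencing it
        self._next_id = 0

    def write(self, data):
        """Physically write a new block and return its id."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def clone(self, block_id):
        """Reference an existing block from another file; no data is copied."""
        self.refcount[block_id] += 1
        return block_id

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())


def full_backup(store, disk):
    """A full backup file is a list of block ids, one per disk block."""
    return [store.write(block) for block in disk]


def incremental(store, disk, changed):
    """An increment writes only the changed blocks: index -> block id."""
    return {i: store.write(disk[i]) for i in changed}


def synthetic_full(store, previous_full, increment):
    """Build a new full by cloning: changed blocks come from the increment,
    unchanged blocks from the previous full. Nothing new is written."""
    return [store.clone(increment.get(i, old)) for i, old in enumerate(previous_full)]


def read_back(store, backup_file):
    return [store.blocks[block_id] for block_id in backup_file]


def demo():
    store = BlockStore()
    disk = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    full1 = full_backup(store, disk)
    disk[2] = b"XXXX"                                   # one block changes
    inc = incremental(store, disk, changed=[2])
    full2 = synthetic_full(store, full1, inc)
    files = [full1, list(inc.values()), full2]
    logical = sum(len(store.blocks[b]) for f in files for b in f)
    return {
        "restored": read_back(store, full2),
        "disk": disk,
        "physical_bytes": store.physical_bytes(),
        "logical_bytes": logical,
    }
```

In this toy run the two fulls plus the increment add up to more logical data than is physically stored, yet each backup file can still be read back completely on its own.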
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

If I use the same storage hardware that you claim is 'extremely prone to silent data corruption', I'm not sure there's any gain from using ReFS over iSCSI.

Backup is about having redundant copies, so despite your reasoning I am still not sure it is wise to intentionally eliminate redundant data when performing backups. Regardless, it is not relevant to the issue at hand.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

I have just updated my case with another job that is failing even with the Server 2019 proxy, leading me to suspect some interaction between 11a Transport and Synology's SMB implementation (which I'm pretty sure is just SAMBA). I will see if any changes in the Synology's SMB configuration alleviate the issue, which will hopefully help narrow down where the problem is.
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

Redundancy is good and is moreover required according to the 3-2-1 backup rule. You always want to have an independent copy of your backups on a different storage! It is redundancy within the same storage device that is questionable, because it always goes on top of another redundancy already provided by RAID. So you're just stacking redundancies on top of a single point of failure. I consider this unwise economically compared to spending the same resources to have 3 copies of your backups, for example (which is not an uncommon practice by the way). I personally always do two backup copies for important stuff, each on its own media.

Please note that our Transport does not talk directly to Synology in principle, instead we leverage Windows SMB and networking stacks. Although the error you're getting looks to be a networking error, as opposed to SMB stack specific. It's the OS failing to do some basic network operation and thus returning this error to Transport.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

I'm not convinced it is the OS. For one, this was not a problem prior to upgrading to Veeam B&R 11a. The only cause to doubt that is the roughly simultaneous upgrade of the proxy to Server 2022, but the issue also occurs with a 2019 proxy. My records indicate the proxy used to run on Server 2012R2, so I will be trying that next.

I have performed some more tests in the meantime. The most curious result is that by forcing SMB2 w/o Large MTU support on the Synology, the job does not fail, but instead continues with absurdly long gaps of 0KB/s. I am unsure why this setting doesn't trigger the same timeout as SMB3 seems to, but I think it highlights that these long drops to 0KB/s are the real issue.

I have yet to hear back from support, probably because the Veeam and Windows logs are as useless to their eyes as they are to mine on determining the real issue.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

The job still fails using Server 2012R2 as a proxy. Since the job seems to drop to 0KB/s and time out, and this only started happening with Veeam 11, I thought it might be related to "UseUnbufferedAccess" and/or "DisableHtAsyncIo" but neither seemed to have any effect. I have updated support with this information as well as a SAMBA debug log from the synology. Haven't heard from them since sending them the logs previously so I expect they still have nothing for me.

At this rate, I feel I will end up reverting to Veeam 10.
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

Up to you to do this of course. Just keep in mind that V11 was released almost a year ago and you're the first to have such an issue after 475K downloads. So the chances that it's a V11 bug are extremely slim. Even immediately following a major release, 9 out of 10 support cases of the "I upgraded Veeam and it broke" type turn out to be caused by an unrelated issue. There's almost always some other change involved that people are forgetting, or are simply unaware of...
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

Well as I mentioned I also changed Windows Server versions, however I've used multiple older editions of Windows Server as proxies to no avail. I couldn't care less which part of the infrastructure is at fault as long as I am able to determine what actually is at fault. I'm not going to go changing my entire backup infrastructure on the off chance that running the whole thing on iSCSI and ReFS will work. So far, the only obvious things I haven't eliminated are Veeam 11 and the repository itself. Since the issue only seems to occur with large backups, testing the latter is going to be difficult since I have to find alternative storage that is deep enough and isn't needlessly replicated or backed up itself. I'm not sure if reverting to Veeam B&R 10 is going to be easier or not, since it might require a total rebuild if I can't migrate the database backwards.

Support has scheduled a call today, we'll see if they are able to help troubleshoot any more effectively on the phone than they have in email.
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

Yeah, that's always the best way to go.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

During my discussion with Eduardo on the phone, it was pointed out that there are actually more servers involved in the process than I thought. I was under the impression that there would never be a server between the proxy and the repository, but there is the possibility that the *gateway* server may not be the same as the *proxy* server. Given this, I believe all the switching of proxy servers to rule out an issue there was in vain. The Server 2022 Proxy, having the most resources available to it, was probably being consistently chosen as the gateway regardless.

When I set the repository to force the Server 2022 proxy as the gateway, the job failed again. When I set it to force the B&R Server (also Server 2022) it succeeded. So it seems that whatever is wrong is limited to the proxy server after all, though I haven't yet nailed down precisely what is wrong.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

Curiouser and curiouser... New results suggest that the proxy and the gateway cannot be the same system. Since backups were able to succeed with the B&R server forced as the gateway, I also set it up as the vSphere proxy and disabled the other proxy. Unfortunately this resulted in one of the problem jobs failing in the same way as before. Reverting that change so that the B&R server would always be the gateway but the Server 2022 proxy would always be the proxy seems to have allowed the backup jobs to complete again. It is not clear to me why the gateway and proxy being the same server causes the jobs to fail.
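The reasoning in the last two posts can be written down as a small consistency check. This is only a sketch of the elimination logic: the server labels `proxy-2022` and `vbr-server` are hypothetical names, and the first observation assumes the Server 2022 proxy was also acting as the vSphere proxy when it was forced as the gateway, which the posts imply but do not state outright.

```python
# Each observation records which server held which role and whether the job finished.
# Labels are hypothetical; the first row's "proxy" value is an assumption (see above).
observations = [
    {"gateway": "proxy-2022", "proxy": "proxy-2022", "succeeded": False},
    {"gateway": "vbr-server", "proxy": "proxy-2022", "succeeded": True},
    {"gateway": "vbr-server", "proxy": "vbr-server", "succeeded": False},
]

# Each hypothesis maps an observation to the outcome it would predict (True = success).
hypotheses = {
    "gateway on proxy-2022 fails": lambda o: o["gateway"] != "proxy-2022",
    "proxy-2022 in any role fails": lambda o: "proxy-2022" not in (o["gateway"], o["proxy"]),
    "gateway == proxy fails": lambda o: o["gateway"] != o["proxy"],
}


def consistent(candidates, runs):
    """Return the names of the hypotheses that match every recorded run."""
    return [
        name
        for name, predicts_success in candidates.items()
        if all(predicts_success(run) == run["succeeded"] for run in runs)
    ]


print(consistent(hypotheses, observations))
```

Under those assumptions, only the "gateway and proxy on the same server" explanation survives all three runs, which matches the conclusion in the post without saying anything about why it happens.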
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

The issue continues. The backups consistently finish as long as I ensure that the gateway and proxy are different servers, but I am unaware of any documented reason a single server cannot or should not hold both roles. The last message from support asked me to increase the RAM allocation for the proxy to match recommendations, which I did to no avail. I am willing to perform more troubleshooting and diagnostics to suss out the underlying cause of the problem, but none have yet been suggested.
PetrM
Veeam Software
Posts: 3626
Liked: 608 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by PetrM »

Hi Tanner,

There is an option to request an escalation if you think that deeper technical analysis is needed, please refer to this KB. Please allow me to speak to our support team in order to escalate your case to a higher level for more precise troubleshooting.

Thanks!
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

Still working with support on this issue, however a new issue has cropped up: Veeam Agent jobs have begun failing in the same way as vSphere jobs but the same workaround doesn't alleviate the problem. Today, they even had a *new* error: 'Agent failed to process method {ReFs.SetFileIntegrity}.', which led me to this forum thread: microsoft-hyper-v-f25/error-incorrect-f ... 68754.html.

I'll keep working with support, but honestly I'm not particularly impressed by Veeam B&R lately and since we do not rely on it nearly as much as we once did I'm beginning to wonder if I shouldn't be looking for a replacement.
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev » 1 person likes this post

Honestly, if I were you, I would rather replace your backup storage. We have always recommended against using low-end NAS as backup targets, specifically for reliability reasons. Remember we have over 1 million active Veeam Backup & Replication installations and thus a huge amount of support statistics. These clearly show that customers using low-end NAS as backup targets are a few orders of magnitude more likely to experience issues, including data loss/corruption leading to failed recovery. And when this is further normalized by the amount of data protected, the picture becomes really terrible, as larger customers are obviously not using similar crappy hardware and yet have zero storage-related issues while protecting PBs of data.

Also, I think it is simply unfair to say you're not impressed by *Veeam* when ALL issues you're experiencing are caused by your backup storage device behaving unreliably. You basically gave a bus driver a broken bus to drive, and are now telling him "honestly I'm not particularly impressed lately". By all means, feel free to replace the bus driver and see if that helps.

If you do want Veeam to work reliably though, then start by letting us completely own the data, without having us send backups into some other vendor's unreliable software stack. This is exactly why we've been recommending general-purpose servers with internal or direct-attached storage for backup repositories all along. This is always the best choice until you have the money for proper enterprise-grade storage.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

That advice is perfectly fair, except that I'm not having a data loss or recovery issue. It's possible that this storage target has been working fine for years and has only, coincidentally, started failing in some mysterious way after upgrading Veeam and its underlying OS, but can't you see why I'm skeptical of that?
Gostev wrote:
Also, I think it is simply unfair to say you're not impressed by *Veeam* when ALL issues you're experiencing are caused by your backup storage device behaving unreliably.
I am not convinced this is the case. Try to understand where I'm coming from here:
  • The problems began after upgrading to Veeam B&R 11 and Server 2022 on the Veeam servers.
  • The storage has not changed, yet suddenly backups are having trouble completing.
  • Fiddling around with where the proxy and gateway roles are hosted has reliably worked around the issue.
Can you honestly tell me that these symptoms suggest a problem with the storage target? Again, I'm not saying it is impossible, but given the above I'll need to be convinced and support hasn't been able to do that. You're asking me to change my backup infrastructure on the off-chance it will work because support is unable to demonstrate that my current setup is actually the problem.

Altering that registry key, as well as pinning both gateway and proxy roles for Agents to one server (oddly the opposite of what works for vSphere), seems to have allowed the Agent backups to complete. If the storage target was the problem here, wouldn't you expect that this wouldn't have changed anything?
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

cbc-tgschultz wrote: Feb 04, 2022 4:25 pm
That advice is perfectly fair, except that I'm not having a data loss or recovery issue. It's possible that this storage target has been working fine for years and has only, coincidentally, started failing in some mysterious way after upgrading Veeam and its underlying OS, but can't you see why I'm skeptical of that?
Being a Synology user myself at home, I actually can't see why would I personally be skeptical of that. It's not like Synology does not have its own firmware updates all the time, fixing some things while breaking others. And it's not like it did not have compatibility issues with newly released platform versions before.
cbc-tgschultz wrote: Feb 04, 2022 4:25 pm
Can you honestly tell me that these symptoms suggest a problem with the storage target?
Yes, of course I can. This is based on the fact that you're the only customer out of literally thousands who is having such issues with Server 2022, and that your backup target is from the NAS vendor who is responsible for the largest number of storage-related support cases we get.
cbc-tgschultz wrote: Feb 04, 2022 4:25 pm
Altering that registry key, as well as pinning both gateway and proxy roles for Agents to one server (oddly the opposite of what works for vSphere), seems to have allowed the Agent backups to complete. If the storage target was the problem here, wouldn't you expect that this wouldn't have changed anything?
Why would implementing this not change anything, if for example the registry key was specifically added to work around a particular quirk of the SMB stack implementation of low-end NAS devices? Without going into much detail, this has to do with their mishandling of block cloning capability related calls, which are a part of the SMB specification.
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz »

Gostev wrote:
Being a Synology user myself at home, I actually can't see why would I personally be skeptical of that. It's not like Synology does not have its own firmware updates all the time, fixing some things while breaking others. And it's not like it did not have compatibility issues with newly released platform versions before.
I'll grant you that, but what did they break? Why has it apparently only affected Veeam? Why is it mysteriously worked around by altering where the proxy and gateway roles are hosted?
Gostev wrote:
Without going into much detail, this has to do with their mishandling of block cloning capability related calls, which are a part of the SMB specification.
Now we're getting somewhere. It would have been nice if someone had mentioned that in the thread about the issue so I'd know what was going on when I flipped some obscure registry flag. That's all I'm asking for here: if Synology is screwing up the SMB protocol then I will gladly blame the Synology, just give me a way to prove it to myself before I make radical alterations to my environment, please.
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev »

I'm sure support will give you something eventually. Just be prepared that it may take a while, as at this point they're troubleshooting your environment. So if you really just want to prove to yourself that the issue is not specific to your NAS, that's actually super easy to do by pointing a test job to ANY other backup target. Just pick any server you have with enough free disk space and make it a repository. Per your description above, you should not need much free space if it takes just 5 minutes for the data transport to cease. Even a basic external SSD would do for this test... I actually just bought a 1TB external NVMe SSD for like 100 bucks and it's been perfect for tests. I get over 800MB/s real write speed connecting over USB 3.2 8)
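For comparing targets outside of Veeam, a rough write probe along these lines may also help. It is only a sketch under stated assumptions: `probe_write_throughput` is a hypothetical helper, it does not reproduce the I/O pattern of the Veeam data mover, and the 30-second stall threshold is an arbitrary illustrative value. It writes fixed-size chunks to a directory (a local path or a mounted share), times each chunk, and reports the chunks that took suspiciously long.

```python
import os
import tempfile
import time


def probe_write_throughput(target_dir=None, total_mb=64, chunk_mb=4, stall_seconds=30.0):
    """Write total_mb of random data to target_dir in chunk_mb pieces.

    Returns (samples, stalls): samples is a list of (megabytes, seconds) per
    chunk, stalls is the list of chunk indexes that took stall_seconds or more.
    """
    target_dir = target_dir or tempfile.gettempdir()
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    path = os.path.join(target_dir, "throughput_probe.tmp")
    samples = []
    try:
        with open(path, "wb") as f:
            written = 0
            while written < total_mb:
                start = time.monotonic()
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())    # force the chunk out to the target
                samples.append((chunk_mb, time.monotonic() - start))
                written += chunk_mb
    finally:
        # Always remove the probe file, even if a write raised an error.
        if os.path.exists(path):
            os.remove(path)
    stalls = [i for i, (_, seconds) in enumerate(samples) if seconds >= stall_seconds]
    return samples, stalls
```

Running it once against the NAS share and once against an alternative target, with `total_mb` large enough to run past the 5-20 minute mark, gives two comparable lists of per-chunk timings. A clean result here would not clear the NAS, since the failing workload may behave differently from a plain sequential write.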
cbc-tgschultz
Enthusiast
Posts: 65
Liked: 11 times
Joined: May 13, 2016 1:48 pm
Full Name: Tanner Schultz
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by cbc-tgschultz » 1 person likes this post

Support was able to demonstrate that handling of the SMB protocol by SAMBA on the Synology was the cause of the failures. Specifically, SMB2_FIND_ID_BOTH_DIRECTORY_INFO with the "*" pattern will not return files recently written. This SAMBA behavior seems to have been known about since at least 2017, so it is still a mystery why it is only now causing us problems, but I am satisfied with this explanation. However, I find the inability and apparent disinterest in explaining why the issue only occurs under weirdly specific circumstances (Agent jobs: gateway and proxy differ, VMware jobs: gateway and proxy are the same) concerning. Regardless, I am willing to move on from this and will investigate switching everything over to iSCSI whenever I manage to find the time. For now the workaround holds.
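For anyone who wants to look for the same symptom on their own share, a crude check could look like the following sketch. It writes a small file and immediately lists the directory to see whether the new name shows up. `check_listing_visibility` is a hypothetical helper, and there is an important caveat: the script only asks the OS for a directory listing, so the exact search pattern and SMB2 information class that end up on the wire are decided by the SMB client, not by this code. A packet capture would be needed to confirm that the request matches the one support identified.

```python
import os
import tempfile
import uuid


def check_listing_visibility(directory=None, rounds=5):
    """Write a file, list the directory straight away, and record misses.

    Returns the names of probe files that were written successfully but did
    not appear in the directory listing taken immediately afterwards.
    """
    directory = directory or tempfile.gettempdir()
    missing = []
    for _ in range(rounds):
        name = f"listing_probe_{uuid.uuid4().hex}.tmp"
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(b"probe")
        try:
            if name not in os.listdir(directory):
                missing.append(name)
        finally:
            os.remove(path)     # clean up the probe file in every case
    return missing
```

An empty result on a local disk is the expected baseline. A non-empty result against the share (passed as `directory`, for example a mapped drive or UNC path) would be consistent with the behavior described above, though it would not prove it is the same code path.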
Gostev
Chief Product Officer
Posts: 31816
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2022 as a vSphere Backup Proxy, SMB time outs

Post by Gostev » 1 person likes this post

cbc-tgschultz wrote: Feb 15, 2022 7:46 pm
Support was able to demonstrate that handling of the SMB protocol by SAMBA on the Synology was the cause of the failures.
Told ya! Thanks for coming back to update the topic with the root cause.