Veeam suddenly having trouble talking with VMM

BrianBuchanan · Aug 19, 2020 12:59 pm

I have opened Case #04340894 but I'm also getting desperate to get our backups running again.

All 38 of our Hyper-V servers are managed by System Center Virtual Machine Manager (VMM). We have 26 remote sites plus our data center.

We also have two Veeam Servers (Veeam01 and Veeam02). Veeam01 is licensed by socket for our primary 4-node cluster, and Veeam02 is licensed per instance for the 2 VMs at each site.

On Monday the 17th, Veeam01 jobs started failing:

8/17/2020 10:08:18 PM :: Task failed. Failed to expand object Infrastructure. Error: An existing connection was forcibly closed by the remote host

To troubleshoot, I restarted our VMM server, the VMMSQL server, and Veeam01, but the problem persisted.

Then I went to Backup Infrastructure > Managed Servers > Microsoft Hyper-V, right-clicked my VMM server, chose Properties, and clicked Next through the wizard. It gets stuck on "Resolving Hyper-V hierarchy".

What's interesting is that Veeam02 is working fine. I can go to the VMM properties and next, next, next and it resolves the VMM hierarchy in a few moments.

The logs show an error:

Failed to call method from IPSService
Error    The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:10:00'. (System.ServiceModel.CommunicationException)

I think that's just when the connection is timing out.
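To sanity-check that reading, here is a minimal Python sketch that pulls the timeout value out of a log line like the one above. The function name and regular expression are my own illustration, not anything from Veeam's tooling, and it assumes the timeout is always logged in the `hh:mm:ss` form seen here.

```python
import re
from datetime import timedelta

# Fragment copied from the log excerpt above.
LOG_LINE = (
    "Local socket timeout was '00:10:00'. "
    "(System.ServiceModel.CommunicationException)"
)

# Assumption: the timeout is always written as hh:mm:ss inside single quotes.
TIMEOUT_RE = re.compile(r"Local socket timeout was '(\d+):(\d{2}):(\d{2})'")


def parse_socket_timeout(line):
    """Return the logged socket timeout as a timedelta, or None if absent."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


timeout = parse_socket_timeout(LOG_LINE)
```

If the gap between the start of the "Resolving Hyper-V hierarchy" step and the error in the log matches this value (10 minutes here), that would support the idea that the call simply ran into the local timeout rather than being rejected outright.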

Both Veeam servers are running the same version 10.0.0.4461 P2 on Windows Server 2019, and I installed and configured both.

In the background, our 4-node Storage Spaces Direct cluster (the systems Veeam01 licenses per socket) has been upgraded from Server 2016 to 2019 over the last 4 weeks. Monday was the day the last node was added back to the cluster. Our process was to evict a node, wipe and reload it, and rejoin it to the cluster, so for the last 4 weeks the cluster has alternated between 4 and 3 nodes as each node was upgraded in turn. The cluster functional level has not yet been upgraded, nor have we updated the StoragePool; we are addressing each issue identified by the Validate Cluster report before we do that. This is a Dell Storage Spaces Direct Ready Node with full support from Dell, being upgraded by a Dell partner, who has basically been following the Clean OS installation procedure.

Maybe this is the issue? Maybe Veeam is using 2019 APIs, but they are failing because the cluster is still at the 2016 functional level?

Thanks for any insights

PetrM · Aug 19, 2020 1:10 pm

Hi Brian,

I would avoid guessing at the root cause; it's better to let our support engineers work on the issue and analyze the debug logs.

Both error messages above point to potential connectivity issues between the nodes, or to SCVMM itself not replying fast enough to incoming requests for some reason. It might make sense to examine network traffic dumps taken from both nodes: the Veeam server and SCVMM.
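As a rough first check before capturing full traffic dumps, a minimal Python sketch like the following could time plain TCP connections from each Veeam server to SCVMM. The function names are illustrative, the host name in the note below is a placeholder, and port 8100 is an assumption (the usual default for VMM console connections), so substitute whatever your environment uses. It only shows whether a TCP handshake completes, not whether VMM answers the hierarchy request in time.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return None


def probe(host, port, attempts=5, timeout=5.0):
    """Try several connections in a row and return the list of timings."""
    return [time_tcp_connect(host, port, timeout) for _ in range(attempts)]
```

Calling something like `probe("vmm.example.local", 8100)` on both the failing and the working Veeam server and comparing the results would show whether basic reachability differs between them. If both connect quickly, the delay is more likely inside SCVMM's handling of the request than on the network path.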

Thanks!

BrianBuchanan · Aug 20, 2020 5:07 pm

Last night the cluster functional level was upgraded as the next-to-last step of our 2016 to 2019 upgrade, and today Veeam is working fine again. The problem coincided with the clean install of Server 2019 on the final node and ended when the cluster functional level was upgraded.
