All 38 our Hyper-V servers are managed by System Center Virtual Machine manager. We have 26 remote sites and our data center.
We also have two Veeam Servers (Veeam01 and Veeam02). Veeam01 is licensed by socket for our primary 4-node cluster, and Veeam02 is licensed per instance for the 2 VMs at each site.
On Monday the 17th, Veeam01 jobs started failing:
Code: Select all
8/17/2020 10:08:18 PM :: Task failed. Failed to expand object Infrastructure. Error: An existing connection was forcibly closed by the remote host
For troubleshooting I started with restarting our VMM server and VMMSQL Server and restarted Veeam01 but the problem persisted.
Then I tried going through Backup Infrastructure, Managed Servers, Microsoft Hyper-V, right-click on my VMM server, Properties and next, next, next and it gets stuck on "Resolving Hyper-V hierarchy".
What's interesting is that Veeam02 is working fine. I can go to the VMM properties and next, next, next and it resolves the VMM hierarchy in a few moments.
The logs show an error:
Code: Select all
Failed to call method from IPSService
Error The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:10:00'. (System.ServiceModel.CommunicationException)
Both Veeam servers are running the same version 10.0.0.4461 P2 on Windows Server 2019, and I installed and configured both.
Work that's been going on in the background is that our 4-node Storage Spaces Direct cluster, which are the systems licensed per socket by Veeam01, have been upgraded from Server 2016 to 2019 over the last 4 weeks. Monday was the day the last node was added back to the cluster. (our process was to evict the node, wipe and reload and rejoin the cluster). So for the last 4 weeks our cluster has been 4-node, 3-node, 4-node as each node was upgraded in turn. The cluster functional level has not yet been upgraded, nor have we updated the StoragePool, we are addressing each issue identified by the Validate cluster report before we do that. This is a Dell Storage Spaces Direct Ready Node with full support by Dell, being upgraded by a Dell partner. (They have been basically following Clean OS installation) Maybe this is the issue? Maybe Veeam is using 2019 apis but they are failing since the cluster is still at 2016 functional level?
Thanks for any insights