Since upgrading our Lab's vCenter from 6.5.0a to 6.5.0b, starting any backup or replication job in our Veeam B&R v126.96.36.1993 causes some sort of crash inside vCenter. After that, the URL "https://vcenter01.domain.local/sdk" stops working and returns the following error when accessed in a browser:
503 Service Unavailable (Failed to connect to endpoint: [class Vmacore::Http::LocalServiceSpec:0x0000008c78e46e70] _serverNamespace = /sdk action = Allow _port = 8085)
As a result, the Veeam B&R jobs fail with their own 503 errors ("cannot talk to vCenter"), which is entirely logical.
Other platforms such as vRealize Operations and Log Insight also lose their connection to vCenter, and the vCenter web clients stop working too (both the Flash-based client and the new HTML5 variant).
In short: anything that talks to vCenter loses its connection.
The problem is 100% reproducible. To kill vCenter, all I need to do is start a backup or replication job, and within 2 minutes vCenter is functionally dead.
To be clear: vCenter 6.5.0b has been perfectly stable since the upgrade as long as Veeam does not start a job. I discovered the issue because vCenter died every night after the upgrade, and I correlated the times of death (always around 02:02) with the nightly backups, which start at 02:00.
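For anyone who wants to check for the same correlation, here is a rough sketch of a small poller that probes the /sdk URL at a fixed interval and prints a timestamped status line, so the moment the endpoint starts returning 503 can be compared with the job start time. This is not what I used originally, just an illustration: the interval, the timeout and the helper names (probe, first_failure, watch) are made up for this example, and certificate verification is switched off on the assumption that a lab vCenter has a self-signed certificate.

```python
import ssl
import time
import urllib.error
import urllib.request
from datetime import datetime

# Assumed values for illustration: adjust to your own environment.
SDK_URL = "https://vcenter01.domain.local/sdk"
INTERVAL_SECONDS = 10


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    # Assumption: the lab vCenter uses a self-signed certificate, so
    # verification is disabled. Do not do this against untrusted systems.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTP-level errors (such as 503) still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None


def first_failure(samples, bad_status=503):
    """Return the timestamp of the first (timestamp, status) sample with bad_status."""
    for timestamp, status in samples:
        if status == bad_status:
            return timestamp
    return None


def watch(url=SDK_URL, interval=INTERVAL_SECONDS):
    """Poll url until interrupted and print one timestamped line per probe."""
    while True:
        status = probe(url)
        print(f"{datetime.now().isoformat(timespec='seconds')} status={status}")
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```

Leave it running from any machine that can reach vCenter, start a job, and look for the first line that shows status=503. If the output is collected as (timestamp, status) pairs instead, first_failure returns the timestamp of the first 503.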
I have not found a VMware KB article or anything else to help me discover why the vCenter SDK dies (besides going through miles of logfiles). The 503 error itself is too common to search for usefully: it is a standard web server error and merely a consequence of some web server or sub-component no longer working.
I have no idea whether this only happens in our Lab or whether other people are seeing it too.
Unfortunately, our Lab runs on NFR licenses of VMware and Veeam so we are not entitled to any support (even so, VMware cannot be bothered as it's not a production system). All I can do is post this here and hope for a response.