Veeam B&R causes vCenter 6.5.0b SDK webservices to crash

stevenrodenburg1 · Apr 05, 2017 12:41 pm

Up front: this only happens with vCenter for Windows 6.5.0b and not with 6.5.0a! We don't have the vCenter 6.5 appliance in this particular environment, so we don't know whether the issue also happens with the appliance.
______________________________________________
Hello,

Since the upgrade of our lab's vCenter from 6.5.0a to 6.5.0b, starting any backup or replication job in our Veeam B&R v9.5.0.823 causes some sort of crash inside vCenter. As a result, the URL "https://vcenter01.domain.local/sdk" stops working and returns the following error when accessed in a browser:

503 Service Unavailable (Failed to connect to endpoint: [class Vmacore::Http::LocalServiceSpec:0x0000008c78e46e70] _serverNamespace = /sdk action = Allow _port = 8085)

Anyway, it makes Veeam B&R jobs crash with their own 503 errors "cannot talk to vCenter" (which is entirely logical).
It also causes other platforms like vRealize Operations and Log Insight to lose their connection to vCenter. The vCenter web clients stop working too (both the Flash-based one and the new HTML5 variant).

In short: Anything speaking to vCenter will lose connection to vCenter.
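For anyone who wants to watch for this state without refreshing a browser, here is a minimal Python sketch that requests the /sdk URL and labels the result. The function names `probe` and `classify` are made up for illustration, and skipping certificate verification is an assumption that only suits a lab with self-signed certificates. It only distinguishes the 503 described above from any other answer; it does not tell you why the endpoint is down.

```python
import ssl
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return the HTTP status code for url, or None if nothing answered."""
    ctx = ssl.create_default_context()
    # Assumption: lab vCenter with a self-signed certificate, so skip verification.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code; keep it.
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def classify(status):
    """Map a status code from probe() to a coarse label."""
    if status is None:
        return "unreachable"
    if status == 503:
        return "sdk-down"
    return "responding"
```

Calling `classify(probe("https://vcenter01.domain.local/sdk"))` once a minute and logging the result with a timestamp would show roughly when the endpoint flips to "sdk-down".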

The problem is 100% reproducible. To kill vCenter, all I need to do is start a backup or replication job, and within 2 minutes vCenter is functionally dead.

To be clear: vCenter 6.5.0b runs perfectly stable since the upgrade as long as Veeam does not start a job. I discovered the issue because vCenter died every night since the upgrade and I correlated the times of death (always around 02:02) with the start of the nightly backups which start at 02:00.

I have not found a VMware KB article or anything else that explains why the vCenter SDK dies (short of going through miles of log files). Searching for the 503 error itself is of little use: it is a standard web server error and only a symptom of some web server or sub-component no longer working.

I have no idea whether this only happens in our lab or whether other people have it too.
Unfortunately, our Lab runs on NFR licenses of VMware and Veeam so we are not entitled to any support (even so, VMware cannot be bothered as it's not a production system). All I can do is post this here and hope for a response.

Dinyero.Johnson · Apr 06, 2017 2:49 am

I just joined the forums to share that I'm experiencing the same issue with a client who's using the 6.5.0b appliance. Rebooted the appliance and once it was back up, the vmware-vpxd service wouldn't start, and would present a scenario with errors similar to what is described here: http://www.desertpenguin.org/blog/could ... ne-or.html

Mike Resseler · Apr 06, 2017 5:41 am

Hi Dinyero, Steven,

Could you both create a support call so this can be investigated? Please post the case ID and any follow-up here.

PS: @DinYero: Welcome to the forums!

stevenrodenburg1 · Apr 06, 2017 7:22 am

Hello Mike,

We run an NFR license, so we don't have a Support ID. The "create new case" page does not allow that field to be empty, so I'm stuck. What I'll try now is to get a colleague on the phone to read out our production environment's license, simply to be able to create the case.

(As DinYero posted: it happens on the 6.5.0b appliance too).

stevenrodenburg1 · Apr 06, 2017 8:11 am

Case created: ID# 02121214

stevenrodenburg1 · Apr 06, 2017 8:49 am

Logbundles from Veeam and vCenter are being created and uploaded.
My gut feeling tells me Veeam is NOT at fault here.
Veeam simply does what it has always done: send API calls to vCenter. However, for some freak-reason only vCenter 6.5.0b (not 6.5.0a) goes zombie after Veeam sends "something" to vCenter causing the web-component to go belly up.

In the meantime I think it is wise to inform customers to postpone upgrades to vCenter 6.5.0b (mind the "b") as it will likely* cause their backups to stop working.

* Not many people will have done the upgrade to "b", and so far only two cases have been publicly reported.

stevenrodenburg1 · Apr 06, 2017 9:28 am

Suggestion: get Anton to read this and include it in the next Digest.

Apr 06, 2017 9:42 am

Steven,

Your gut feeling might be correct, and thanks for the case. I will inform Anton but he is most likely enjoying the good life now (since he is on holiday)

stevenrodenburg1 · Apr 06, 2017 9:47 am

Well, if Anton is on R&R, why don't you take over? Call it "The word from Resseler" (OK, not very imaginative, I know...)

Vlaamse Fritten rules!! (I'm Dutch)

stevenrodenburg1 · Apr 06, 2017 9:50 am

By the way, I think this will be a hell of a job finding out which API call turns vCenter 6.5.0b into a vegetable. I'd send it straight to engineering. This will not be easy. Why do vCenter 6.0.x and 6.5.0a have no issues at all but 6.5.0b goes zombie when Veeam talks to it...
I bet a crate of beer it's a bug in 6.5.0b

stevenrodenburg1 · Apr 06, 2017 10:08 am

To put this into perspective: with the information we have so far, this means that (probably) anyone who has upgraded to vCenter 6.5.0b cannot use Veeam Backup and Replication until this is fixed.

Apr 06, 2017 10:10 am

Re: Vlaamse Fritten rules: YEP!
Re: Crate of Beer: this better be Belgian Beer

Keep us updated and let us also know what support says.

jdubyak · Apr 06, 2017 1:09 pm

We updated to 6.5 build 5178943, hosts to 5224529 on Monday. Ever since the update we are having issues backing up our primary file server. Job kicks off, gets stuck on 'preparing guest for hot backup', VSS hangs inside the OS, and eventually times out. *sigh*

Support Case ID: 02121772

Dinyero.Johnson · Apr 06, 2017 2:12 pm

@Mike, Yes, I will get a support call created and requisite logs uploaded. Will update with case number soon.

Thanks for the welcome!

stevenrodenburg1 · Apr 06, 2017 8:27 pm

jdubyak wrote: We updated to 6.5 build 5178943, hosts to 5224529 on Monday. Ever since the update we are having issues backing up our primary file server. Job kicks off, gets stuck on 'preparing guest for hot backup', VSS hangs inside the OS, and eventually times out. *sigh*

Support Case ID: 02121772

Hello Jordan,

I don't think your issue is related to this topic. It feels more like a standard "VSS not working" issue, primarily because you have it with only one backup client. This topic is about Veeam causing vCenter to crash, which is quite different from VSS issues.
Anyway, I suggest going over the guest's own event logs to find out what makes VSS hang inside Windows, if you have not already done so.

stevenrodenburg1 · Apr 06, 2017 8:28 pm

Update: my case has been escalated.

Mike Resseler · Apr 07, 2017 5:27 am

@Jordan: I think the others are right and this is another issue. But that doesn't matter, that needs to be solved also so keep working with support!

@Steven: Thanks for the update. I assume it will go to R&D now for further investigation

SysAdmAcc · Apr 10, 2017 7:30 am

Just to inform you that we also updated to the latest version of the vCenter Appliance (6.5.0.5300, Build Number 5178943), are running Veeam 9.5.0.823, and have NO issues running backups.

stevenrodenburg1 · Apr 16, 2017 7:26 pm

Update: Veeam found the cause of vCenter dying in vCenter's own logs. It's a SQL statement trying to insert a duplicate key in an object.

I just sent this reply to Support (a snippet out of a larger email):
---------------------------------------------------
Support wrote:
“The error was caused by a primary key violation. Most probably removing of the row in question from table VPX_VM_VIRTUAL_DEVICE will resolve the issue.”

Me:
Those “cannot insert duplicate key row in object” errors are actually quite common.
If you simply google “vcenter cannot insert duplicate key row in object” you end up with a bunch of VMware KB articles. They talk about various versions of vCenter and about a wild variety of things causing such errors. None of which apply to this particular case by the way.

These errors only ever appear when Veeam talks to vCenter. I looked at the logs from the past week and these SQL errors do not appear. But as soon as Veeam starts talking because of backup-jobs (like today), I start seeing these SQL errors and vCenter dies.

My (Steven) conclusion so far: vCenter does not have these SQL errors when Veeam does not talk to it (is not running jobs). The vpxd.logs from the past week and today clearly prove this. As soon as I fired off those jobs today, the SQL errors started appearing.
I therefore have the feeling that Veeam is causing these SQL errors somehow. It could very well be a bug in vCenter 6.5.0b (not 6.5.0a) where, for example, Veeam fires off API calls to vCenter and vCenter mishandles them by issuing incorrect SQL statements, causing its own death shortly after.
-------------------------------------------------

I'm now waiting for the reply.
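For anyone who wants to repeat this check against their own vpxd logs, here is a rough Python sketch of the idea: collect the timestamps of lines mentioning "duplicate key" and see which ones fall shortly after a job start. The timestamp pattern is an assumption (a line starting with something like 2017-04-16T02:01:40), the helper names are made up, and the sample lines are invented for illustration rather than copied from real vCenter output.

```python
import re
from datetime import datetime

# Assumption: every log line starts with a timestamp like 2017-04-16T02:01:40.
# Adjust the pattern if your vpxd.log lines look different.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def duplicate_key_times(lines, needle="duplicate key"):
    """Return the timestamps of all lines that mention the needle."""
    hits = []
    for line in lines:
        if needle in line.lower():
            match = TS.match(line)
            if match:
                hits.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S"))
    return hits


def within_window(hits, job_start, minutes=5):
    """Keep only hits that occur 0..minutes after job_start."""
    limit = minutes * 60
    return [t for t in hits if 0 <= (t - job_start).total_seconds() <= limit]


# Invented sample lines, for illustration only (not real vCenter output).
SAMPLE = [
    "2017-04-16T02:00:05 info hypothetical: job session opened",
    "2017-04-16T02:01:40 error hypothetical: cannot insert duplicate key row in object",
    "2017-04-16T09:15:00 info hypothetical: unrelated line",
]
```

With a real log you would pass `open(path, errors="replace")` instead of `SAMPLE`, and use the scheduled job start (for example 02:00) as `job_start`. If all hits fall inside the window and none occur on days without jobs, that supports the correlation, although it does not prove which side is at fault.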

Apr 17, 2017 8:17 am

Hi Steven,
I've seen the same errors on a newly created VCSA 6.5b, even without any Veeam activity, and they ultimately crashed the VCSA every hour. I ended up doing a backup and restore of the appliance, and since then I've seen no more errors. Also, regardless of what "strange" call Veeam might make, the write operation is done by vCenter against its own database, so if that is the root cause, it feels to me that VMware should fix the problem. Veeam only uses the official vSphere API.
