Host-based backup of VMware vSphere VMs.
kjstech
Expert
Posts: 160
Liked: 16 times
Joined: Jan 17, 2014 4:12 pm
Full Name: Keith S

V9: random soap fault- vm backup failure

Post by kjstech »

Hi, I've been running v9 since last week, and today I experienced my first set of failed backups. A small number of VMs failed completely (including the retries) with messages like this:

Error: Soap fault. fault.InvalidArgument.summaryDetail: '<InvalidArgumentFault xmlns="urn:internalvim25" xsi:type="InvalidArgument"><invalidProperty>deviceKey</invalidProperty></InvalidArgumentFault>', endpoint: '' Failed to upload disk. Agent failed to process method {DataTransfer.SyncDisk}.
Any idea what this means? Thanks!
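The fault text embeds an XML fragment whose `invalidProperty` element names the argument vCenter rejected. Here is a minimal Python sketch for pulling that name out of a job log line, assuming the message format shown above. It uses a regex instead of an XML parser because the fragment uses `xsi:type` without binding the `xsi` prefix, and strict parsers such as `xml.etree.ElementTree` reject that. The function name `invalid_property` is hypothetical, not part of any Veeam tool.

```python
import re
from typing import Optional

# Matches the <invalidProperty>...</invalidProperty> element inside the SOAP fault detail.
_INVALID_PROP = re.compile(r"<invalidProperty>([^<]+)</invalidProperty>")


def invalid_property(log_line: str) -> Optional[str]:
    """Return the property named in an InvalidArgument SOAP fault, or None if absent."""
    match = _INVALID_PROP.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = (
        "Error: Soap fault. fault.InvalidArgument.summaryDetail: "
        "'<InvalidArgumentFault xmlns=\"urn:internalvim25\" xsi:type=\"InvalidArgument\">"
        "<invalidProperty>deviceKey</invalidProperty></InvalidArgumentFault>', endpoint: '' "
        "Failed to upload disk. Agent failed to process method {DataTransfer.SyncDisk}."
    )
    print(invalid_property(line))  # deviceKey
```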
PTide
Product Manager
Posts: 6408
Liked: 724 times
Joined: May 19, 2015 1:46 pm

Re: V9: random soap fault- vm backup failure

Post by PTide »

Hi,

A closer look at the logs is needed. Please open a case with the support team and post your case ID here. Do those VMs differ from the others in some way? Also, this KB is worth checking.

Thank you.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech »

Thanks, case 01671181 created.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech » 1 person likes this post

While it wasn't shown in the daily job log summaries, examining the full logs sent to support turned up this gem:

[18.01.2016 20:43:42] <  2824>      >>  |--tr:Failed to enumerate changed areas of the disk using CTK. Device key: [2001], size: [1319413952512]. VM ref: [vm-5231]. Change ID: [52 e2 b4 c9 f2 dc ad bf-a6 38 24 10 b0 ff ce e9/504]
Since it references CTK, we decided to run the VMware PowerShell script from Veeam KB1113 on this machine (and the few others). Last night the job completed fine for the machine in question.
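The log line carries the identifiers the CBT query used. Here is a minimal Python sketch that splits it into fields, assuming the bracketed layout shown above; the regex and the `parse_ctk_line` name are hypothetical, not a Veeam tool. The change ID appears to follow VMware's `<tracking UUID>/<sequence number>` form, and the size is in bytes (1319413952512 bytes is roughly 1.2 TiB).

```python
import re
from dataclasses import dataclass

# Field layout taken from the log line above; other log versions may differ.
_CTK_RE = re.compile(
    r"Device key: \[(?P<key>\d+)\], size: \[(?P<size>\d+)\]\. "
    r"VM ref: \[(?P<vm>[^\]]+)\]\. Change ID: \[(?P<change>[^\]]+)\]"
)


@dataclass
class CtkFailure:
    device_key: int
    size_bytes: int
    vm_ref: str
    tracking_id: str  # part of the change ID before the "/"
    sequence: int     # part of the change ID after the "/"

    @property
    def size_tib(self) -> float:
        return self.size_bytes / 2**40


def parse_ctk_line(line: str) -> CtkFailure:
    m = _CTK_RE.search(line)
    if m is None:
        raise ValueError("not a CTK enumeration failure line")
    tracking_id, _, seq = m.group("change").rpartition("/")
    return CtkFailure(
        device_key=int(m.group("key")),
        size_bytes=int(m.group("size")),
        vm_ref=m.group("vm"),
        tracking_id=tracking_id,
        sequence=int(seq),
    )


line = ("Failed to enumerate changed areas of the disk using CTK. Device key: [2001], "
        "size: [1319413952512]. VM ref: [vm-5231]. "
        "Change ID: [52 e2 b4 c9 f2 dc ad bf-a6 38 24 10 b0 ff ce e9/504]")
info = parse_ctk_line(line)
print(info.device_key, info.vm_ref, info.sequence, round(info.size_tib, 2))
```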

It's still odd that this started out of the blue when there had been no updates to vSphere or ESXi.
PTide

Re: V9: random soap fault- vm backup failure

Post by PTide »

Thank you for the feedback! As mentioned in the release notes, resetting CBT after new installs/updates is considered a best practice. Please see this thread and this KB for more info.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech »

It happened again tonight; I just got the failure. Here's what the email says:

Error: Soap fault. fault.InvalidArgument.summaryDetail: '<InvalidArgumentFault xmlns="urn:internalvim25" xsi:type="InvalidArgument"><invalidProperty>deviceKey</invalidProperty></InvalidArgumentFault>', endpoint: '' Failed to upload disk. Agent failed to process method {DataTransfer.SyncDisk}. Exception from server: Soap fault. fault.InvalidArgument.summaryDetail: '<InvalidArgumentFault xmlns="urn:internalvim25" xsi:type="InvalidArgument"><invalidProperty>deviceKey</invalidProperty></InvalidArgumentFault>

I will update the case tomorrow.

I don't know, do we have to reset CBT every night?
PTide

Re: V9: random soap fault- vm backup failure

Post by PTide »

kjstech wrote: "I don't know, do we have to reset CBT every night?"
Of course not. That's unexpected behaviour, so please keep working with the support team.

Also, did the failure happen to the very same VMs? If so, please check whether any snapshots are present.

Thank you.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech »

Another job showed two more VMs that failed with the same error:

Error: Soap fault. fault.InvalidArgument.summaryDetail: '<InvalidArgumentFault xmlns="urn:internalvim25" xsi:type="InvalidArgument"><invalidProperty>deviceKey</invalidProperty></InvalidArgumentFault>', endpoint: '' Failed to upload disk. Agent failed to process method {DataTransfer.SyncDisk}
There's no way they had snapshots: they are very active machines, and vCenter sends us alert emails when a snapshot delta disk exceeds 5 GB. That only takes a few hours to reach, and we didn't get a single such email yesterday.

I also have two newer VMs that have never been backed up. At first Veeam couldn't connect to them, so I opened up ping and SMB in their respective firewalls and successfully tested access to their c$ share from the Veeam server. Here's the error:

Error: Error code: 0x800706ba Failed to invoke func [RegisterIndexJob]: The RPC server is unavailable.. RPC function call failed. Function name: [BlobCall]. Target machine: [10.1.1.39]. RPC error:The RPC server is unavailable. Code: 1722 Error code: 0x800706ba Failed to invoke func [RegisterIndexJob]: The RPC server is unavailable.. RPC function call failed. Function name: [BlobCall]. Target machine: [10.1.1.39]. RPC error:The RPC server is unavailable. Code: 1722
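The two codes in that RPC error describe the same failure: 0x800706ba is a Win32 error wrapped as an HRESULT (facility 7, FACILITY_WIN32), and its low 16 bits are 0x6BA = 1722, RPC_S_SERVER_UNAVAILABLE. Since only ping and SMB were opened, a reachability check of the RPC endpoint mapper (TCP 135) alongside SMB (TCP 445) might help narrow it down. Here is a minimal Python sketch; which further ports Veeam guest processing needs (for example a dynamic RPC range) is an assumption to confirm against Veeam's port documentation, and `probe` is a hypothetical helper.

```python
import socket

FACILITY_WIN32 = 7


def hresult_from_win32(code: int) -> int:
    """Mirror of the Windows HRESULT_FROM_WIN32 macro for non-zero codes."""
    return 0x80000000 | (FACILITY_WIN32 << 16) | (code & 0xFFFF)


def win32_from_hresult(hr: int) -> int:
    """Return the Win32 code embedded in a FACILITY_WIN32 HRESULT."""
    if (hr >> 16) & 0x7FF != FACILITY_WIN32:  # facility field is 11 bits wide
        raise ValueError("not a FACILITY_WIN32 HRESULT")
    return hr & 0xFFFF


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(hex(hresult_from_win32(1722)))   # 0x800706ba
    print(win32_from_hresult(0x800706BA))  # 1722
```

Against the guest from the log, the check would be along the lines of `probe("10.1.1.39", 135)` and `probe("10.1.1.39", 445)` run from the Veeam server or proxy.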

I forwarded the job summary emails to support and will upload the logs when I get in to work.
PTide

Re: V9: random soap fault- vm backup failure

Post by PTide »

If you're not satisfied with the service you get from the tech team, please use the "Talk to Manager" button to escalate the case. Looking forward to hearing good news from you.

Thank you.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech »

Thank you, all the appropriate logs were uploaded to the case. I generated 16 days' worth of logs so they can go back and look at a time when the backups just worked. I have three backup jobs. In job 01, 4 of 34 VMs failed. In job 02, 3 of 13 VMs failed, but 2 of those succeeded in the retry. What's weird is that the retry reported the same SOAP error (urn:internalvim25) for all 3 failures, yet two recovered and one didn't. In job 03, the SOAP error hit 2 of 11 VMs, but both were backed up successfully on the first retry. If CBT were broken, I doubt a retry would have made a difference. I reset CBT on Monday anyway, and all backups on Monday evening worked; then, from Tuesday into Wednesday, the failures started creeping back.
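The retry argument is easier to follow with the counts side by side. Here is a small Python tally of the numbers reported above; the job names come from the post, while the `JobRun` structure is hypothetical. Job 01's retry outcome isn't stated, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRun:
    name: str
    total_vms: int
    initial_failures: int
    recovered_on_retry: Optional[int]  # None where the post does not say

    def final_failures(self) -> Optional[int]:
        if self.recovered_on_retry is None:
            return None
        return self.initial_failures - self.recovered_on_retry


runs = [
    JobRun("job 01", 34, 4, None),
    JobRun("job 02", 13, 3, 2),
    JobRun("job 03", 11, 2, 2),
]

for run in runs:
    rate = run.initial_failures / run.total_vms
    print(f"{run.name}: {run.initial_failures}/{run.total_vms} failed ({rate:.0%}), "
          f"still failing after retry: {run.final_failures()}")
```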
davecla
Enthusiast
Posts: 26
Liked: 4 times
Joined: Feb 03, 2016 9:40 pm
Full Name: Dave Clarke

Re: V9: random soap fault- vm backup failure

Post by davecla »

Did you get a fix for this?

I have the same error.
Since upgrading to v9 I get SOAP auth errors.

In my case I have a job with 10 servers in it. Some servers fail in each run, and which ones fail changes from run to run.
I'm also seeing many, many more CBT errors since upgrading to v9.

I have a call logged with Veeam support, but my experience with Veeam support to date has been "sub-optimal". I'm sure my time zone doesn't help, but still...
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: V9: random soap fault- vm backup failure

Post by foggy »

As far as I can see from the OP's case, it is still under investigation. I recommend working with support on your case; if you're not satisfied with how it is going, you can always use the Talk to Manager button on the support portal.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech »

So far I have been getting successful backups with Veeam's parallel processing disabled. In Veeam Backup & Replication, click the top-left icon (sometimes called the hamburger button), click Options, and uncheck "Enable parallel processing".

It's only been two days, but since then all the backups have run great. I only have two warnings on SQL servers, which are being looked at, and no more SOAP errors. I expected a real performance hit from disabling parallel processing, but that's not the case: our three nightly backup jobs still complete well within our backup window. I'm curious how long this weekend's synthetic full jobs will take, although we use the Veeam Accelerated Data Mover on an Exagrid appliance, which handles much of the synthetic full processing itself.
Alex
Novice
Posts: 4
Liked: never
Joined: Jun 23, 2010 11:12 am
Full Name: Alex Bos

Re: V9: random soap fault- vm backup failure

Post by Alex »

Hi,

I have the same error (I think):

02/08/2016 10:32:25 PM :: Error: Soap fault. A specified parameter was not correct. 
startOffsetDetail: '<InvalidArgumentFault xmlns="urn:internalvim25" xsi:type="InvalidArgument"><invalidProperty>startOffset</invalidProperty></InvalidArgumentFault>', endpoint: ''
Failed to upload disk.
Agent failed to process method {DataTransfer.SyncDisk}
This started happening after upgrading to V9.

I will try disabling parallel processing, just to get working backups again.

It only happens with incremental backups; full backups don't have this problem.
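Both fault variants in this thread name an argument of the vSphere `QueryChangedDiskAreas` call (`deviceKey` in the first posts, `startOffset` here), which is how a backup asks vCenter for the blocks changed since a previous change ID. VMware's sample usage calls it repeatedly, advancing `startOffset` past each returned window until the disk capacity is covered. Here is a minimal Python simulation of that loop; `fake_query` is a hypothetical stand-in for the real API, and no vCenter connection or pyVmomi is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Extent = Tuple[int, int]  # (start offset, length) in bytes


@dataclass
class DiskChangeInfo:
    """Modelled on the vSphere DiskChangeInfo object (startOffset, length, changedArea)."""
    start_offset: int
    length: int
    changed_area: List[Extent] = field(default_factory=list)


def all_changed_areas(capacity: int,
                      query: Callable[[int], DiskChangeInfo]) -> List[Extent]:
    """Call `query` repeatedly, advancing startOffset until the whole disk is covered."""
    areas: List[Extent] = []
    start = 0
    while start < capacity:
        info = query(start)
        if info.length <= 0:
            raise RuntimeError("query made no progress")  # guard against an endless loop
        areas.extend(info.changed_area)
        start = info.start_offset + info.length
    return areas


# Hypothetical stand-in: answers in 1 GiB windows and reports one changed
# 64 KiB block at the start of every window.
GIB, BLOCK = 2**30, 64 * 2**10


def fake_query(start: int) -> DiskChangeInfo:
    return DiskChangeInfo(start, GIB, [(start, BLOCK)])


areas = all_changed_areas(4 * GIB, fake_query)
print(len(areas), areas[-1])
```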
Alex

Re: V9: random soap fault- vm backup failure

Post by Alex »

Disabling parallel processing is not the solution; I just tried it. I also tried https://www.veeam.com/kb1113, but it didn't help.

It seems to happen only with large disks (2+ TB) in a VM.
PTide

Re: V9: random soap fault- vm backup failure

Post by PTide »

Hi Alex,

Kindly open a case with our support team and post your case ID here.

Thank you.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech » 2 people like this post

OK, the next thing support had me test was to re-enable parallel processing but choose a specific Veeam proxy for one job, and then, under Backup Infrastructure, set that proxy's transport mode to Network.

So I did that as a test, and last night this job completed successfully (about 45 minutes FASTER than usual, too).

This got me thinking... We are an NFS shop, so why not try the new Direct Storage Access? Both of my Veeam proxies have a second NIC on that storage network and can ping our EMC VNX5200 NFS interfaces. So I went into the VNX5200 storage array configuration and granted both Veeam proxies' IP addresses full access to our NFS file systems. I'm testing a job right now, and it has successfully backed up 11 VMs in 15 minutes 17 seconds. Note that the backup had already run a few hours earlier, so there was only 12.8 GB of changed data out of 745.9 GB processed and 28.5 GB read.
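Those figures show how little of the "processed" data the job actually had to read. A quick Python check of the ratios from the numbers in the post; it treats 1 GB as 1024 MB for the rate, which is an assumption about how Veeam reports sizes, and the rate is an average over the whole job run, not raw disk throughput.

```python
processed_gb = 745.9
read_gb = 28.5
changed_gb = 12.8
duration_s = 15 * 60 + 17  # 15 min 17 s

print(f"read:    {read_gb / processed_gb:.2%} of processed")
print(f"changed: {changed_gb / processed_gb:.2%} of processed")
print(f"average read rate: {read_gb * 1024 / duration_s:.1f} MB/s over {duration_s} s")
```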

Tonight I'm excited to see how Direct NFS access handles the jobs, and whether our alert-bot stops sending text messages about our website being inaccessible. With NFS and hot-add, if a VM is not on the same ESXi host as the Veeam proxy (and it won't always be, since we have 2 proxies but 6 hosts), releasing the hot-added disk stuns the VM and it loses ping / network connectivity. I'm hoping this mode will help, per https://www.veeam.com/kb1681
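With 2 proxies spread over 6 hosts, it helps to know which VMs are exposed to that cross-host hot-add case. Here is a minimal Python sketch that compares placements; every host, proxy and VM name is hypothetical, and in practice the placements would come from the vCenter inventory.

```python
from typing import Dict, List


def vms_at_risk(vm_host: Dict[str, str], proxy_host: Dict[str, str]) -> List[str]:
    """VMs whose ESXi host runs no backup proxy, so hot-add on NFS would cross hosts."""
    proxy_hosts = set(proxy_host.values())
    return sorted(vm for vm, host in vm_host.items() if host not in proxy_hosts)


# Hypothetical layout: 2 proxies on 2 of 6 hosts.
proxies = {"proxy1": "esx1", "proxy2": "esx4"}
vms = {"web01": "esx2", "sql01": "esx1", "file01": "esx4", "app01": "esx6"}
print(vms_at_risk(vms, proxies))
```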

I know this is drifting a little off topic, but maybe you can try the different transport modes. For me, Network mode worked, and Direct NFS appeared to work as well on the one job I tested it with.
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech » 2 people like this post

Wow, Direct NFS access is great. We didn't get any text message notifications of machines losing ping during the backup window, and we halved our backup window. The 10 Gb port now shows close to 10 Gb/s going to the Exagrid appliance; previously it was a little less than half that.

The only issue is that both Direct NFS and the NBD failover could not access one VM, even though it sits on the same NFS datastore as other VMs that backed up successfully. The path is vnxfs1\SolarWinds Log & Event Manager\SolarWinds Log +Jg-Event Manager.vmdk.
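The `+Jg-` in that path appears to be the ampersand in escaped form: `Jg` is the base64 encoding of the single byte 0x26 (`&`), wrapped in UTF-7-style `+` and `-` delimiters. Standard UTF-7 base64 runs carry UTF-16 code units (`&` would be `+ACY-`), so Python's `utf-7` codec does not accept `+Jg-`, and this sketch decodes the payload as raw bytes instead. It assumes every `+<base64>-` run in a path is such an escape, which a real file name containing `+` and `-` could violate; `unescape_name` is a hypothetical helper.

```python
import base64
import binascii
import re

# A "+<base64>-" run, as seen in "SolarWinds Log +Jg-Event Manager.vmdk".
_ESCAPE = re.compile(r"\+([A-Za-z0-9+/]+)-")


def _decode_run(match: "re.Match[str]") -> str:
    payload = match.group(1)
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    try:
        # Assumption: the payload is raw bytes (0x26 here), not UTF-16 as strict UTF-7 uses.
        return base64.b64decode(padded).decode("latin-1")
    except binascii.Error:
        return match.group(0)  # not valid base64: leave the text untouched


def unescape_name(name: str) -> str:
    return _ESCAPE.sub(_decode_run, name)


print(unescape_name(r"vnxfs1\SolarWinds Log & Event Manager\SolarWinds Log +Jg-Event Manager.vmdk"))
```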

Logs sent on support ticket.

No CBT or SOAP errors at all in this transport mode.
davecla

Re: V9: random soap fault- vm backup failure

Post by davecla »

So in my case the SOAP Auth errors just stopped after about a week.
As far as I can tell, nothing changed in the ESX or Veeam environments over that time that could have affected the backup process.

Strange...
Gostev
Chief Product Officer
Posts: 31457
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: V9: random soap fault- vm backup failure

Post by Gostev »

These issues are currently suspected to be caused by intermittent failures in the vCenter SSL certificate validation process, so the root cause is likely outside our code. We're trying to confirm that in one of the affected environments by running a temporary hotfix that disables certificate validation completely. I've asked the devs to keep me posted on their findings.
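In a TLS client, "disabling certificate validation completely" means skipping both chain verification and hostname checking. Here is a minimal Python sketch of the two client configurations using the standard `ssl` module, for anyone comparing connections to vCenter in a test; it is not Veeam's hotfix, and `make_context` is a hypothetical helper.

```python
import ssl


def make_context(validate: bool) -> ssl.SSLContext:
    """Client TLS context with or without certificate validation."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED plus hostname checking by default
    if not validate:
        # check_hostname must be cleared before verify_mode can be set to CERT_NONE.
        # Only suitable for a diagnostic comparison, never for normal operation.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict = make_context(True)
relaxed = make_context(False)
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)
print(relaxed.verify_mode == ssl.CERT_NONE, relaxed.check_hostname)
```

Either context can then wrap a connection to the vCenter HTTPS port with `ctx.wrap_socket(sock, server_hostname=host)`, so repeated handshakes with the strict context would show whether validation fails only intermittently.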
kjstech

Re: V9: random soap fault- vm backup failure

Post by kjstech » 2 people like this post

Thanks Gostev,

We were able to close the case today. Since forcing the transport mode to Direct Storage Access, we haven't had a single SOAP error.
We're an NFS shop, so we've also benefited hugely from no longer having the disruptive VM stuns at backup completion that the hot-add transport mode caused. We've also halved our backup window, as Direct Storage transport has proven to have almost twice the throughput.

We had one issue with a VM that had an ampersand (&) in its name; the fix was to rename it in vSphere, then Storage vMotion it to another file system, which takes care of renaming all the file paths. In Veeam we moved this VM to another job tied to that file system, and it backed up successfully.

So in our case, changing the transport mode worked. Disabling parallel processing helped initially, but support had us run some tests with it re-enabled and a different transport mode. I came across the new Direct NFS transport support in V9 and gave it a shot. THANK YOU!!!

For reference, we are on the following VMware builds:
ESXi 5.0.0, 2312428
vCenter Server 5.0.0, 2656067

Storage is NFS on an EMC VNX5200.
Backup is to an Exagrid appliance using the Veeam Accelerated Data Mover.