Repeated VDDK error 1 and 2

lux209 · Apr 28, 2016 10:13 am

Hello,

apparently the host reboot fully solved the problem for me. At least it has not come back since I did it 4 days ago.

campus-torsten · May 12, 2016 10:33 am

Any updates concerning the root cause or a bug fix for this?
Kr,
Torsten

chris_lalala · May 27, 2016 6:03 am

Same issue here. vSphere 5.5, Veeam V9.

The only workaround is moving the VM or datastore, or rebooting the host.

Ticket #01794584

picaroon · Jun 01, 2016 11:52 am

A case with VMware SDK support has been created. I sent them the exported logs from an affected host with 'trivia' mode enabled and I'm in the process of setting up the VixDiskLib sample program.

chris.collins · Jun 02, 2016 3:33 pm

We had the same issue - restarting the management agent on the host where the VM was housed resolved the issue (this only causes the host to be disconnected from vCenter for a minute or two but doesn't cause any downtime to the other VMs)

Backups have been fine since (2 days and counting)

Ice-Dog · Jun 06, 2016 8:12 am

Hi all,

OP here.

This has happened to me about 4-5 times since originally posting. I have narrowed it down to 4 of my 14 hosts that this happens to repeatedly.

I contacted my account manager at Veeam to put some weight on this, but the support answers were still poor and not escalated to the top tier: "try using another proxy", "try using hot-add". This is even though I am already using new transport servers, and hot-add with NFS v3 causes up to 60 seconds of stun time.

I dragged VMware onto a call, and their support guy was at first very adamant that this could never be a VMware issue if Veeam is able to contact the API to issue snapshot commands, and he refused to take any look at my vSphere settings. However, I sent them logs from the latest hosts that failed and from vCenter, with accurate details of the times when the backups errored.

VMware actually came back to me with a possible solution. They see nothing being logged about the backups, as the backup API calls are not reaching the ESXi host. They think that the VDDK may not have installed correctly during a patch, meaning the API calls are not being received.

To attempt to fix this I will update the host to the latest build. Doing this will install a fresh VDDK that should allow the API calls in.

I will update once that is completed and report 3 months down the line on whether the issue has cropped up again.

Regards,
Ice-Dog

StefanSpecht · Jun 07, 2016 9:19 am

Same here.
Some of our VMware hosts randomly get this issue. Restarting the management agents fixes it temporarily, but it happens again every few days.
We are running Veeam 8 latest update and ESXi 5.5.0 3568722. Only sites that use network mode are affected, especially those with a lot of VMs (= a lot of network traffic).
I suspect this is a bug somewhere in the ESXi network stack (full buffers etc.?) and will open a VMware support case in the next few days.

pidthepiper · Jun 07, 2016 10:36 am

OK, I have this issue now too.

ESXi 5.5u3b EP10 3568722 and Veeam 8 Update 3

It's NBD related too.

Support case ID# 01820005

pidthepiper · Jun 07, 2016 11:52 am

~ # esxcfg-advcfg -g /BufferCache/MaxCapacity
Value of MaxCapacity is 16384
~ # esxcfg-advcfg -g /BufferCache/FlushInterval
Value of FlushInterval is 30000

These are the host's current settings, which I assume are defaults.
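
The output above has a fixed "Value of NAME is NUMBER" shape, so the number can be pulled out to compare hosts side by side. A minimal sketch, assuming sed is available in the host shell; advcfg_value is a hypothetical helper name, not an ESXi command, and the sample line is the one quoted in this post:

```shell
#!/bin/sh
# advcfg_value: hypothetical helper, not part of ESXi.
# Reads `esxcfg-advcfg -g` output on stdin and prints only the numeric value.
advcfg_value() {
    sed -n 's/^Value of .* is \([0-9][0-9]*\)$/\1/p'
}

# Demonstration with the line quoted in this post; prints 16384.
printf 'Value of MaxCapacity is 16384\n' | advcfg_value
```

On a host this would be used as `esxcfg-advcfg -g /BufferCache/MaxCapacity | advcfg_value`, which may make it easier to confirm that affected and unaffected hosts really have the same settings.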

mgratla · Jun 12, 2016 12:46 pm

Adding my voice to this one, we've got the same issue. I don't have a case with Veeam as of yet. We're on the latest build of 5.5.

Interestingly if we let it go on for long enough the host itself eventually restarts its management adapters.

rayzhang · Jun 13, 2016 1:35 am

Thanks for the discussion. I have now solved the same problem. I found that the VMs failing in Veeam were all on the same host, so I thought it might be a host problem. Because my hosts are in an HA cluster, I migrated the affected VMs to another host, and then the Veeam backup succeeded. I don't know why. Maybe the host needs a reboot. I will choose a good time to test whether that fixes it.

If your hosts are in an HA cluster, you can try to solve it by migration too.

pidthepiper · Jun 13, 2016 1:52 pm

mgratla wrote:
Interestingly if we let it go on for long enough the host itself eventually restarts its management adapters.

Could you explain this a bit more?

pidthepiper · Jun 13, 2016 1:52 pm

Also, any update on this from Veeam? I have a ticket open with you, as it has happened to 2 hosts so far, and neither you nor VMware is really helping.

mgratla · Jun 13, 2016 4:34 pm

pidthepiper wrote:
Could you explain this a bit more?

Absolutely.

If left unchecked, after about 48 hours of receiving VDDK failures consistently through Veeam (we back up almost 400 machines and they all retry three times), we see the host lose connection in vCenter momentarily and come back up. From what I've been able to tell, the host restarted its vpxa and vpxd processes by itself. After this happens the jobs run fine for a while.

pidthepiper · Jun 13, 2016 4:48 pm

Ahhh, now that is interesting, so eventually the host sorts itself out, but it takes a few days heh. There has to be something that triggers it, but neither Veeam nor VMware seems to know. Both support teams are acting like it's the first time they have heard of this issue, even though I point them to this thread, showing people's support case numbers.

freedor · Jun 17, 2016 10:20 am

Hi guys,

we had the same problem with Veeam B&R 9 and vSphere 6. You don't need to restart your host. Just restarting the management services works.

run: services.sh restart

Hope this helps.
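
services.sh restart restarts every service on the host. Others in this thread restarted only the management agents instead. A minimal sketch of that narrower restart, assuming the host has the usual /etc/init.d/hostd and /etc/init.d/vpxa init scripts; restart_mgmt_agents and its --dry-run option are hypothetical names made up for this sketch:

```shell
#!/bin/sh
# restart_mgmt_agents: hypothetical wrapper for this sketch.
# Restarts only hostd and vpxa rather than every service on the host.
# With --dry-run (an option invented for this sketch) it prints the
# commands instead of running them.
restart_mgmt_agents() {
    for agent in hostd vpxa; do
        if [ "${1:-}" = "--dry-run" ]; then
            echo "/etc/init.d/$agent restart"
        else
            "/etc/init.d/$agent" restart
        fi
    done
}

# Show what would be run without touching the host.
restart_mgmt_agents --dry-run
```

Calling restart_mgmt_agents without the option on the affected host runs the two restarts for real. As reported above, the host drops out of vCenter for a minute or two while this happens, and the fix has only been temporary for several people here.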

pidthepiper · Jun 17, 2016 3:28 pm

freedor wrote:
Hi guys,

we had the same problem with Veeam B&R 9 and vSphere 6. You don't need to restart your host. Just restarting the management services works.

run: services.sh restart

Hope this helps.

Yep, or just restart the management agents or vMotion the impacted VMs to another host. The problem is that the issue recurs and you have to do it again.

VMware have asked me to tell Veeam support to contact VMware SDK support directly.

mgratla · Jun 20, 2016 1:32 pm

We ran into this issue this weekend again. This time it was a different host.

Just as before, plenty of jobs failed with VDDK error:1 repeatedly, and as before, after about two days of this the host disconnected from vCenter and came back within seconds at about 3:47am this morning. As soon as it did that (which is where I suspect the host eventually just crashed out and restarted its management processes), backups of machines on that box ran perfectly fine, without issue.

VMware has been updated, but unfortunately the person I've got working the case is not being particularly great. I was able to get logs this time at least.

Our Veeam case for this is 01824972.

pidthepiper · Jun 20, 2016 4:52 pm

I had the same problem. They have now asked for Veeam to open an SDK request or something with them to work on it further. Veeam escalated it to their 2nd level support, but I haven't heard anything yet.

m0ps · Jun 21, 2016 9:50 am

So the root cause is still not clarified?

mgratla · Jun 22, 2016 1:59 pm

Just started back up again for us since the last time. First couple of VDDK errors happened this morning, so it looks like we get 1-2 days of stability and then we're back in the thick of it.

pidthepiper · Jun 22, 2016 2:05 pm

m0ps wrote:
So the root cause is still not clarified?

Nope, nothing at all yet.

pidthepiper · Jun 23, 2016 5:58 pm

I had the same thing happen again today with 2 VMs on 2 different hosts.

I had a WebEx with level 1 support. He asked me to Storage vMotion one of the VMs; that succeeded and then the job ran fine. He agreed it needed to go to level 2 support. I provided him with every log I could get, and now I wait. I restarted the management agents on the other host and the VM on there backed up fine.

I kept getting the impression that this was a new issue for them, even though all of us on here have been experiencing it.

PLCHOFFIN · Jun 30, 2016 12:01 pm

Hello everyone,

Same issue here. Restarted the management services on my impacted ESX host.
Worked like a charm.

picaroon · Jun 30, 2016 8:00 pm

Veeam advised me to decrease the buffer size of Veeam B&R in the registry to 1MB, but the issue persists.

mwirz · Jul 04, 2016 7:46 am

Can you please post your case ID? I'm going to open a ticket as well and would like to reference it.

I get several hosts/VMs per week with the same problem. Most of the time I can fix it by restarting services, but sometimes it ends up with lots of snapshot files to be consolidated, and for that I have to migrate to another storage first. It works, but it can't be the solution...

Vitaliy S. · Jul 04, 2016 9:18 am

Mike, you can always refer to this topic as well, so that your engineer understands what buffersize you're talking about. Thanks!

pidthepiper · Jul 04, 2016 4:53 pm

Veeam Support level 2 have told me they are still awaiting more info from VMware as to the cause of this error.

mgratla · Jul 05, 2016 1:19 pm

VMware came back and told me that there was an issue because of a vpxa.cfg file on the ESXi hosts, but they don't have an idea of the cause. They did mention "too many delta disks or cloned disks depending on other disks which adversely affects the vpxa agent", which is interesting because there was a single snapshot on each of these machines, namely the one Veeam was trying to create.

They wanted me to edit the vpxa file on each of the hosts, change 128 to 256, restart vpxa, and then enable sfcdb temporarily for a core dump.

I'm not sure I want to proceed with this change, given that their reason for doing it doesn't seem to be correct.

I have also been told on my case to change the VddkPreReadBufferSize, with the wording that
"it doesn't appear that the root cause of the issue has been found - however, it does appear to be related to ESXi hosts that are added via vCenter: the error itself seems to be produced due to an issue with the datastore or connection between the host and the datastore"

I'm somewhat surprised at that statement, since we had no issue on v7 or v8 whatsoever for nearly two years with no changes to our environment.

I'll be adjusting the VddkPreReadBufferSize value today, but if picaroon's response is correct, I doubt we will see much difference.

mgratla · Jul 05, 2016 1:34 pm

The VddkPreReadBufferSize fix did not work for us at all either.
