Host-based backup of VMware vSphere VMs.
Matt@Work
Enthusiast
Posts: 51
Liked: 14 times
Joined: May 29, 2013 7:41 am
Full Name: Matt Collier
Location: South Coast
Contact:

Intermittent hang of guest DC's

Post by Matt@Work »

This is a frustrating problem: VMware has provided a workaround, but it hasn't worked. Apologies if anyone has come across this already. I thought I'd ask here before I log a ticket.

Intermittently, our domain controllers lock up. They won't respond to CTRL-ALT-DEL from the VM console, and we can't ping them or do anything else with them. We have to forcibly reset them. Sometimes it happens a couple of times a week; sometimes we don't get an occurrence for several weeks.

VMware notes a problem here https://kb.vmware.com/selfservice/micro ... Id=2079220 and we have implemented the workaround from that article. We thought it had resolved the problem, but a spurt of issues this last week has brought it to the fore again. It's happened on ESXi 5.1 and 5.5, so our hosts are above the patch level in which they say the issue is fixed.

We have application-aware image processing (AAIP) enabled, with guest file indexing and the proper credentials set. It's only affecting the DCs. Has anyone seen this before?
veremin
Product Manager
Posts: 20413
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Intermittent hang of guest DC's

Post by veremin »

Is there any correlation between the time the DC locks up and a specific event, such as backup start, backup end, or snapshot commit? Or is the issue sporadic in nature? Furthermore, I'm not sure the given article relates to this issue at all, as it talks about Linux-based VMs and VMware Tools quiescence. Thanks.
Matt@Work

Re: Intermittent hang of guest DC's

Post by Matt@Work »

I'm looking at the task log for the guest, and I suspect the hang happens around the time of this entry:

Code: Select all

[02.06.2016 12:11:42] <17> Info     [Oracle backup] Disposing backup performer
[02.06.2016 12:11:42] <17> Info     Auto snapshot: delete freezed snapshot.

When I check the DC, nothing specific jumps out except a lot of VSS errors with event ID 8230.

Code: Select all

Volume Shadow Copy Service error: Failed resolving account Username with status 1376. Check connection to domain controller and VssAccessControl registry key. 

Operation:
   Gathering Writer Data
   Executing Asynchronous Operation

Context:
   Execution Context: Requestor
   Current State: GatherWriterMetadata

Error-specific details:
   Error: NetLocalGroupGetMemebers(Username), 0x80070560, The specified local group does not exist.

Regarding the link above, it's the resolution rather than the scenario that I was referring to. Either way, it hasn't made a difference.
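
A side note on the VSS event above: the status 1376 in its first line and the 0x80070560 in the error details look like the same error written two ways. A quick Python sketch that splits the value into the standard HRESULT fields (this is only an illustration; `decode_hresult` is a made-up helper name, not part of any VMware or Veeam tooling):

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    return {
        "failure": bool((hresult >> 31) & 0x1),  # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,     # 11-bit facility; 7 is FACILITY_WIN32
        "code": hresult & 0xFFFF,                # low 16 bits carry the wrapped error code
    }


# Value copied from the "Error-specific details" line of the event.
fields = decode_hresult(0x80070560)
print(fields)
```

The low 16 bits decode to 1376, matching the status in the first line, and facility 7 is the Win32 facility. That suggests the event is reporting a single failed local group lookup rather than two separate problems.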

I'm trawling through the VMware log file, and the only thing I can find is:

2016-06-02T16:11:03.409Z| vcpu-0| I120: SNAPSHOT: Snapshot_ConsolidateWorkItemDone failed: A required file was not found (7)

2016-06-02T16:11:03.958Z| SnapshotVMXCombiner| I120: SnapshotVMXCombineFinalCb: Done with combine of 2 links, starting from 1 in 210431 usec with error 0x0: The operation completed successfully

2016-06-02T16:11:04.590Z| vcpu-0| I120: Vix: [38070053 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=3, newAppState=1881, success=1 additionalError=0

2016-06-03T06:04:01.493Z| mks| I120: SSL: syscall error 104: Connection reset by peer

2016-06-03T06:04:01.493Z| mks| I120: SOCKET 11 (129) recv error 104: Connection reset by peer

2016-06-03T06:04:13.026Z| vmx| I120: Vix: [36366631 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
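
To check for a correlation more methodically, the two logs could be lined up on one timeline. Below is a rough Python sketch, not a Veeam or VMware tool. The function names are made up, the timestamp patterns are taken only from the excerpts in this post, and the UTC offset of the Veeam log is an assumption that has to be supplied. The Veeam entry at 12:11:42 and the VMware entries at 16:11Z are four hours apart on the clock, which might just be a timezone difference, but that is a guess.

```python
import re
from datetime import datetime, timedelta, timezone

# Patterns inferred from the log excerpts in this thread only.
VMWARE_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|")
VEEAM_TS = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]")


def parse_vmware(line):
    """Return the UTC timestamp of a vmware.log line, or None if it has none."""
    m = VMWARE_TS.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc)


def parse_veeam(line, utc_offset_hours):
    """Return the UTC timestamp of a Veeam task log line (DD.MM.YYYY, local time)."""
    m = VEEAM_TS.match(line)
    if m is None:
        return None
    local = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
    return (local - timedelta(hours=utc_offset_hours)).replace(tzinfo=timezone.utc)


def events_near(lines, centre, window):
    """Keep the vmware.log lines whose timestamp is within `window` of `centre`."""
    hits = []
    for line in lines:
        stamp = parse_vmware(line)
        if stamp is not None and abs(stamp - centre) <= window:
            hits.append(line)
    return hits


veeam_line = "[02.06.2016 12:11:42] <17> Info     Auto snapshot: delete freezed snapshot."
vmware_lines = [
    "2016-06-02T16:11:03.409Z| vcpu-0| I120: SNAPSHOT: Snapshot_ConsolidateWorkItemDone failed: A required file was not found (7)",
    "2016-06-03T06:04:01.493Z| mks| I120: SSL: syscall error 104: Connection reset by peer",
]

ASSUMED_OFFSET_HOURS = -4  # placeholder guess, replace with the backup server's real offset
snapshot_delete = parse_veeam(veeam_line, ASSUMED_OFFSET_HOURS)
for hit in events_near(vmware_lines, snapshot_delete, timedelta(minutes=1)):
    print(hit)
```

If the hangs consistently fall within a minute or so of a snapshot removal, that would point at the snapshot commit rather than a sporadic fault.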