Intermittent hang of guest DC's


by Matt@Work » Wed Jun 01, 2016 9:26 pm

This is a frustrating problem: VMware have provided a workaround for it, but the workaround hasn't worked. Apologies if anyone has come across this already; I thought I'd ask here before I log a ticket.

Intermittently, our domain controllers lock up. They won't respond to Ctrl+Alt+Del from the VM console, we can't ping them, and we can't do anything else with them, so we have to forcibly reset them. Sometimes it happens a couple of times a week; sometimes we don't see an occurrence for several weeks.

VMware note a problem here https://kb.vmware.com/selfservice/micro ... Id=2079220 and we have implemented the workaround from it. We thought it had resolved the problem, but a spurt of issues this last week has brought it back to the fore. It's happened on ESXi 5.1 and 5.5, on builds above the patch level in which they say the issue is fixed.

We have AAIP (application-aware image processing) enabled, guest file indexing enabled, and the proper credentials set. It's only affecting the DCs. Has anyone seen this before?
Matt@Work
Enthusiast
 
Posts: 34
Liked: 13 times
Joined: Wed May 29, 2013 7:41 am
Location: North East UK
Full Name: Matt Collier

Re: Intermittent hang of guest DC's

by v.Eremin » Thu Jun 02, 2016 10:08 am

Is there any correlation between the time a DC locks up and a specific event, such as backup start, backup end, or snapshot commit? Or is the issue sporadic? Also, I'm not sure the linked article relates to this issue at all, as it covers Linux-based VMs and VMware Tools quiescence. Thanks.
v.Eremin
Veeam Software
 
Posts: 13266
Liked: 968 times
Joined: Fri Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Intermittent hang of guest DC's

by Matt@Work » Fri Jun 03, 2016 10:15 am

I'm looking at the task log for the guest, and I suspect it's something around the time of this entry:

[02.06.2016 12:11:42] <17> Info     [Oracle backup] Disposing backup performer
[02.06.2016 12:11:42] <17> Info     Auto snapshot: delete freezed snapshot.

When I check the DC, nothing specific jumps out except a lot of VSS errors with event ID 8230:

Volume Shadow Copy Service error: Failed resolving account Username with status 1376. Check connection to domain controller and VssAccessControl registry key.

Operation:
   Gathering Writer Data
   Executing Asynchronous Operation

Context:
   Execution Context: Requestor
   Current State: GatherWriterMetadata

Error-specific details:
   Error: NetLocalGroupGetMemebers(Username), 0x80070560, The specified local group does not exist.

Regarding the link above, I was referring to its resolution rather than its scenario. Either way, the workaround hasn't made a difference.

I've been trawling through the VMware log file, and the only entries I can find are:

2016-06-02T16:11:03.409Z| vcpu-0| I120: SNAPSHOT: Snapshot_ConsolidateWorkItemDone failed: A required file was not found (7)
2016-06-02T16:11:03.958Z| SnapshotVMXCombiner| I120: SnapshotVMXCombineFinalCb: Done with combine of 2 links, starting from 1 in 210431 usec with error 0x0: The operation completed successfully
2016-06-02T16:11:04.590Z| vcpu-0| I120: Vix: [38070053 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=3, newAppState=1881, success=1 additionalError=0
2016-06-03T06:04:01.493Z| mks| I120: SSL: syscall error 104: Connection reset by peer
2016-06-03T06:04:01.493Z| mks| I120: SOCKET 11 (129) recv error 104: Connection reset by peer
2016-06-03T06:04:13.026Z| vmx| I120: Vix: [36366631 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
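To follow up on the correlation question, here is a rough Python sketch for lining up the task log against the VMware log. It assumes the task log timestamps (dd.mm.yyyy hh:mm:ss) are the backup server's local time with no zone marker, so the UTC offset has to be supplied by hand (UTC_OFFSET_HOURS is a placeholder, not read from either log). The VMware log timestamps end in Z, meaning UTC. The function names and the five-minute window are my own choices for illustration, not anything from Veeam or VMware.

```python
from datetime import datetime, timedelta, timezone
import re

# Task log lines start like: [02.06.2016 12:11:42] <17> Info ...
VEEAM_RE = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]")
# VMware log lines start like: 2016-06-02T16:11:03.409Z| vcpu-0| ...
VMX_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|")

# Assumption: placeholder offset of the backup server's clock from UTC.
UTC_OFFSET_HOURS = 0


def parse_veeam(line, utc_offset_hours):
    """Hypothetical helper: UTC time of a task log line, or None."""
    m = VEEAM_RE.match(line)
    if m is None:
        return None
    local = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


def parse_vmx(line):
    """Hypothetical helper: UTC time of a VMware log line, or None."""
    m = VMX_RE.match(line)
    if m is None:
        return None
    t = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return t.replace(tzinfo=timezone.utc)  # trailing Z means UTC


def nearest(veeam_lines, vmx_lines, utc_offset_hours, window=timedelta(minutes=5)):
    """Pair each task log event with VMware log events within +/- window.

    Returns (task_log_line, vmware_log_line, seconds_from_task_to_vmware).
    """
    vmx_events = []
    for line in vmx_lines:
        t = parse_vmx(line)
        if t is not None:
            vmx_events.append((t, line))
    pairs = []
    for line in veeam_lines:
        t = parse_veeam(line, utc_offset_hours)
        if t is None:
            continue  # skip lines without a leading timestamp
        for vt, vline in vmx_events:
            delta = vt - t
            if abs(delta) <= window:
                pairs.append((line, vline, delta.total_seconds()))
    return pairs


# Sample lines copied from the logs in this post.
VEEAM_SAMPLE = [
    "[02.06.2016 12:11:42] <17> Info     [Oracle backup] Disposing backup performer",
    "[02.06.2016 12:11:42] <17> Info     Auto snapshot: delete freezed snapshot.",
]
VMX_SAMPLE = [
    "2016-06-02T16:11:03.409Z| vcpu-0| I120: SNAPSHOT: Snapshot_ConsolidateWorkItemDone failed: A required file was not found (7)",
    "2016-06-02T16:11:03.958Z| SnapshotVMXCombiner| I120: SnapshotVMXCombineFinalCb: Done with combine of 2 links, starting from 1 in 210431 usec with error 0x0: The operation completed successfully",
    "2016-06-02T16:11:04.590Z| vcpu-0| I120: Vix: [38070053 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=3, newAppState=1881, success=1 additionalError=0",
    "2016-06-03T06:04:01.493Z| mks| I120: SSL: syscall error 104: Connection reset by peer",
    "2016-06-03T06:04:01.493Z| mks| I120: SOCKET 11 (129) recv error 104: Connection reset by peer",
    "2016-06-03T06:04:13.026Z| vmx| I120: Vix: [36366631 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0",
]

if __name__ == "__main__":
    pairs = nearest(VEEAM_SAMPLE, VMX_SAMPLE, UTC_OFFSET_HOURS)
    if not pairs:
        print("no VMware log entries within the window")
    for task_line, vmx_line, secs in pairs:
        print(f"{secs:+8.1f}s  {task_line}\n           {vmx_line}")
```

With the offset left at 0, the sample lines from this post are about four hours apart and nothing pairs up, so the offset needs to match the backup server before reading anything into the result. Run over the full logs around each hang, it should show whether the lock-ups sit near snapshot delete/consolidate entries or are genuinely sporadic.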




