stfconsulting
Service Provider
Joined: Jan 31, 2015 9:17 pm
Full Name: S Furman

VMware Hosts Disconnecting

Post by stfconsulting » Jan 31, 2015 9:23 pm

Hey guys, I posted this on the VMware forum too. I am trying to get to the bottom of the issues we are having with a new VMware cluster. We are in the middle of implementing Veeam and are kind of stuck until we troubleshoot this. Full disclosure: I am not really sure whether Veeam has anything to do with what is happening, but I am interested in feedback from everyone. VMware is looking into this and has pulled all the logs. I have been assigned someone from Tier 2 at Veeam, but I have not spoken with him yet. Case # 00738466

Battling a tough problem with a new cluster and could use some feedback. One of the hosts (at random) goes into a disconnected state. If you let it sit long enough, it comes back by itself; restarting the vCenter services also seems to speed up the recovery. Last night I went to put one of the hosts into maintenance mode (to apply the latest patches) and all sorts of bad things happened. It got to the point where 2 hosts were disconnected. The situation got so bad that one of the hosts could not rejoin the cluster, and we had to shut VMs down and re-register them on the other cluster hosts. (Fun at 3:00 AM.) VMware was a little stumped by what was happening last night, so I have to re-engage them next week. Any help or ideas would be much appreciated.

Here are the hardware and other details:

-Running ESXi 5.5 Build 2302651 (Dell-specific ISO)
-3 x Dell PowerEdge R730 [boot from flash] (firmware completely up to date)
-1 x Dell PowerVault MD3420 with 12Gb SAS connectivity (dual controller) (firmware current)
-A Dell PowerEdge R730XD, direct-connected to the MD3420, runs Windows 2012 R2 / Veeam 8.0 Update 1 for backups
-We are not sure whether Veeam could be causing this. Trying to get them involved. For now we have Veeam completely disabled.
-A PuTTY session to one of the disconnected hosts, which had locked up during a management agent restart, came back magically when a Veeam replication job was cancelled (the job was replicating a machine out of the backup repository, so I have no idea why that would matter)
-Currently have DRS automation and application monitoring disabled to mitigate risk
-Starting to move workloads to another cluster to reduce risk
