rmancinelli
Influencer
Posts: 10
Liked: never
Joined: Sep 15, 2010 9:02 pm
Full Name: Rick Mancinelli

Massive Concurrent Heartbeat Lost

Post by rmancinelli »

Every now and then, and seemingly at random, we will get a massive blast of "heart beat lost" messages. Usually within 30 seconds or a minute, these are followed by an equally massive blast of reset messages. We have several hundred VMs, and the lost message is not for all of them. It isn't unique to a given physical host, either.

Any thoughts on what might cause this?
Vitaliy S.
VP, Product Management
Posts: 27364
Liked: 2794 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Massive Concurrent Heartbeat Lost

Post by Vitaliy S. »

The VM heartbeat status alarm depends on how reliably VMware Tools communicates with vCenter Server. If I were you, I would make sure you have the latest VMware Tools installed on the VMs. Besides, vCenter Server might be under heavy load, which can cause delays or gaps in API call processing.

If everything seems to be OK with your VMs, you may adjust the corresponding time delay parameter in the "Heartbeat is missing for VM" alarm properties.
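
To illustrate what such a delay does, here is a minimal sketch. It is not Veeam ONE's actual logic: it simply assumes each heartbeat outage is a (lost, reset) pair of timestamps in seconds and that the alarm fires only when the outage outlasts the delay. The function name and the sample values are hypothetical.

```python
def outages_that_would_alarm(outages, delay_seconds):
    """Return the outages that last longer than the alarm delay.

    outages: list of (lost_at, reset_at) timestamps in seconds (hypothetical data).
    delay_seconds: how long the heartbeat must stay missing before alarming.
    """
    return [
        (lost_at, reset_at)
        for lost_at, reset_at in outages
        if reset_at - lost_at > delay_seconds
    ]


# Hypothetical sample: two short blips (30 s and 60 s) and one 10-minute outage.
sample = [(0, 30), (100, 160), (500, 1100)]
print(outages_that_would_alarm(sample, delay_seconds=120))
```

With a delay longer than the 30 to 60 second gap described in the first post, the short lost/reset pairs would no longer raise an alarm, while a lasting outage still would.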

By the way, you can also check the heartbeat status in the Managed Object Browser (MOB) by connecting to your vCenter Server with a web browser. If you need instructions for this, just let me know.

Hope it helps!
mteamjpy
Enthusiast
Posts: 74
Liked: never
Joined: Aug 10, 2011 12:31 pm

Re: Massive Concurrent Heartbeat Lost

Post by mteamjpy »

Hi Vitaliy,

I am facing the same problem on roughly 200 VMs.

First, the time shown in the MOB seems incorrect.
Could that be an issue?

What would you like me to check in the MOB for the heartbeat issue?
Vitaliy S.
VP, Product Management
Posts: 27364
Liked: 2794 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Massive Concurrent Heartbeat Lost

Post by Vitaliy S. » 1 person likes this post

Hi Jean-pol,

Could you please tell me if you have a heartbeat alarm triggered on all 200 VMs at the same time? If this is the case, then this certainly indicates that there is a problem in vCenter Server, as all your VMs cannot just lose their heartbeat at the same time.
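
One way to check this is to group the heartbeat-lost events by time window and count how many distinct VMs and hosts each window covers. The following is only a rough sketch under stated assumptions: the event tuple format, the function name, the thresholds, and the sample data are all hypothetical, and you would need to export the real events from your own alarm history.

```python
from collections import defaultdict


def find_bursts(events, window_seconds=60, min_vms=50):
    """Group heartbeat-lost events into fixed time windows and flag big ones.

    events: list of (timestamp_seconds, vm_name, host_name) tuples (hypothetical format).
    Returns a list of (window_start, vm_count, host_count) for each window
    in which at least min_vms distinct VMs lost their heartbeat.
    """
    vms = defaultdict(set)
    hosts = defaultdict(set)
    for timestamp, vm_name, host_name in events:
        window_start = (timestamp // window_seconds) * window_seconds
        vms[window_start].add(vm_name)
        hosts[window_start].add(host_name)
    return [
        (start, len(vms[start]), len(hosts[start]))
        for start in sorted(vms)
        if len(vms[start]) >= min_vms
    ]


# Hypothetical data: 200 VMs on 4 hosts all alarm within 20 seconds,
# plus one unrelated alarm an hour later.
events = [(1000 + i % 20, f"vm-{i}", f"esx-{i % 4}") for i in range(200)]
events.append((4600, "vm-7", "esx-3"))
print(find_bursts(events))
```

A window that covers many VMs spread over several hosts points away from any single host or guest and towards something they share, such as vCenter Server.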

I'm fairly certain that the timing shouldn't be an issue, but it would still make sense to keep the time on your vCenter Server/Host/MOB in sync.

I was referring to Managed Object Browser as another source of VM Heartbeat loss status, as Veeam ONE uses the corresponding VM property in MOB to generate the alarm.
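
For reference, a small sketch of where to look. I am assuming here that the property in question is `guestHeartbeatStatus` on the VirtualMachine object, which is what the vSphere API exposes for heartbeat status; the host name and managed object ID below are hypothetical, and the status descriptions are paraphrased rather than quoted.

```python
# Meaning of each guestHeartbeatStatus value, paraphrased from the vSphere API reference.
HEARTBEAT_MEANING = {
    "gray": "VMware Tools not installed or not running",
    "red": "no heartbeat; guest OS may have stopped responding",
    "yellow": "intermittent heartbeat; guest may be under heavy load",
    "green": "guest OS is responding normally",
}


def mob_url(vcenter_host, vm_moid):
    """Build the MOB URL for one VM's managed object page."""
    return f"https://{vcenter_host}/mob/?moid={vm_moid}"


def describe_heartbeat(status):
    """Translate a guestHeartbeatStatus value into a short explanation."""
    return HEARTBEAT_MEANING.get(status, "unknown status")


# Hypothetical host name and managed object ID.
print(mob_url("vcenter.example.local", "vm-1234"))
print(describe_heartbeat("red"))
```

Open the resulting URL in a browser, log in, and look for the property on the VM's page.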

Thanks!
mteamjpy
Enthusiast
Posts: 74
Liked: never
Joined: Aug 10, 2011 12:31 pm

Re: Massive Concurrent Heartbeat Lost

Post by mteamjpy »

Hi Vitaliy,

Yes, that is correct: they are triggered almost at the same time.
But Veeam Monitor does not report anything wrong with my vCenter.

OK, how could I investigate this?

P.S. We have an SRM-like DRP solution.

Thanks in advance,

Jean-pol
Vitaliy S.
VP, Product Management
Posts: 27364
Liked: 2794 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Massive Concurrent Heartbeat Lost

Post by Vitaliy S. » 1 person likes this post

I'm afraid that the vCenter Server debug logs are the only source that can shed some light on this behavior. As you can see, this issue is not new; it has been observed with older versions of VMware vSphere:
http://communities.vmware.com/thread/231717
http://kb.vmware.com/selfservice/micros ... Id=1017091

Please look through the recommendations I gave in my first response to this thread; they might help.
mteamjpy
Enthusiast
Posts: 74
Liked: never
Joined: Aug 10, 2011 12:31 pm

Re: Massive Concurrent Heartbeat Lost

Post by mteamjpy »

Hi Vitaliy,
Thanks a lot. I know that we could do better at keeping VMware Tools updated,
and I hope this opens my team's eyes to it.
The VMs concerned were updated recently.
I will look at the KB and give feedback when I can.

Thanks,
Jean-pol