Massive Concurrent Heartbeat Lost


by rmancinelli » Wed Nov 10, 2010 1:11 pm

Every now and then, and seemingly at random, we will get a massive blast of "heart beat lost" messages. Usually within 30 seconds or a minute, these are followed by an equally massive blast of reset messages. We have several hundred VMs, and the lost message is not for all of them. It isn't unique to a given physical host, either.

Any thoughts on what might cause this?
rmancinelli
Influencer
 
Posts: 10
Liked: never
Joined: Wed Sep 15, 2010 9:02 pm
Full Name: Rick Mancinelli

Re: Massive Concurrent Heartbeat Lost

by Vitaliy S. » Wed Nov 10, 2010 1:36 pm

The VM heartbeat status alarm depends on how reliably VMware Tools communicates with vCenter Server. If I were you, I would make sure the latest tools are installed on the VMs. In addition, vCenter Server might be under heavy load, which can cause delays or interruptions in API call processing.

If everything seems to be OK with your VMs, you can adjust the corresponding time delay parameter in the "Heartbeat is missing for VM" alarm properties.
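To illustrate what a time delay on this kind of alarm does, here is a minimal sketch in Python. It is not Veeam's implementation: the function name `should_alarm` and the sample format are hypothetical, and it only assumes that the alarm should fire once the heartbeat has been missing continuously for at least the configured delay.

```python
from datetime import datetime, timedelta

def should_alarm(samples, delay):
    """Return True if the heartbeat was missing continuously for at least `delay`.

    samples: list of (timestamp, heartbeat_ok) tuples, sorted by time.
    delay:   timedelta, the hypothetical "time delay" alarm setting.
    """
    missing_since = None
    for ts, ok in samples:
        if ok:
            # Heartbeat is back, so any earlier gap no longer counts.
            missing_since = None
        elif missing_since is None:
            # First sample of a new gap.
            missing_since = ts
        if missing_since is not None and ts - missing_since >= delay:
            return True
    return False

t0 = datetime(2010, 11, 10, 13, 0, 0)
# A 30-second blip that recovers should not fire with a 2-minute delay.
blip = [(t0, True), (t0 + timedelta(seconds=30), False), (t0 + timedelta(seconds=60), True)]
print(should_alarm(blip, timedelta(minutes=2)))
```

With a longer delay, short gaps like the "lost, then reset within a minute" pattern described above are filtered out, at the cost of reacting later to a real outage.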

By the way, you can also check the heartbeat status in the Managed Object Browser (MOB) by connecting to your vCenter Server with a web browser. If you need instructions, just let me know.

Hope it helps!
Vitaliy S.
Veeam Software
 
Posts: 19558
Liked: 1102 times
Joined: Mon Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Massive Concurrent Heartbeat Lost

by mteamjpy » Thu Jul 05, 2012 11:39 am

Hi Vitaliy,

I am facing the same problem on roughly 200 VMs.

First, the time shown in the MOB seems incorrect.
Could that be an issue?

What would you like me to look at in the MOB for the heartbeat issues?
mteamjpy
Enthusiast
 
Posts: 74
Liked: never
Joined: Wed Aug 10, 2011 12:31 pm

Re: Massive Concurrent Heartbeat Lost

by Vitaliy S. » Thu Jul 05, 2012 4:57 pm

Hi Jean-pol,

Could you please tell me if you have a heartbeat alarm triggered on all 200 VMs at the same time? If this is the case, then this certainly indicates that there is a problem in vCenter Server, as all your VMs cannot just lose their heartbeat at the same time.
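As a rough way to answer that question from an exported alarm list, the following Python sketch groups "heartbeat lost" events into bursts and reports how many VMs and hosts each burst touches. The event format, the 60-second window, and the `min_vms` threshold are all assumptions made for illustration; nothing here comes from Veeam ONE or vCenter.

```python
from datetime import datetime, timedelta

def find_bursts(events, window=timedelta(seconds=60), min_vms=10):
    """Group heartbeat-lost events into bursts.

    events: iterable of (timestamp, vm_name, host_name) tuples (hypothetical format).
    A burst starts at the first event and collects everything within `window`
    of that start. Only bursts with at least `min_vms` distinct VMs are returned.
    """
    bursts = []
    current = None
    for ts, vm, host in sorted(events):
        if current is None or ts - current["start"] > window:
            current = {"start": ts, "vms": set(), "hosts": set()}
            bursts.append(current)
        current["vms"].add(vm)
        current["hosts"].add(host)
    return [b for b in bursts if len(b["vms"]) >= min_vms]

t0 = datetime(2012, 7, 5, 11, 0, 0)
# 12 VMs on 3 hosts lose their heartbeat within 12 seconds, plus one unrelated event.
events = [(t0 + timedelta(seconds=i), f"vm-{i}", f"esx-{i % 3}") for i in range(12)]
events.append((t0 + timedelta(hours=1), "vm-99", "esx-0"))
for burst in find_bursts(events):
    print(burst["start"], len(burst["vms"]), "VMs on", len(burst["hosts"]), "hosts")
```

A burst that spans many hosts at once points away from a single host or a single guest and toward something shared, such as vCenter Server itself.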

I'm fairly certain that the timing shouldn't be an issue, but it would still make sense to keep the time on your vCenter Server/Host/MOB in sync.

I was referring to Managed Object Browser as another source of VM Heartbeat loss status, as Veeam ONE uses the corresponding VM property in MOB to generate the alarm.
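For anyone who wants to jump straight to that property, here is a small Python sketch that builds a MOB link for one VM. The thread does not name the property or the URL layout, so treat both as assumptions: the sketch assumes the property is `guestHeartbeatStatus` on the VM object and that the MOB accepts `moid` and `doPath` query parameters. The host name and the managed object ID `vm-123` are placeholders.

```python
from urllib.parse import urlencode

def mob_heartbeat_url(vcenter_host, vm_moid):
    """Build a MOB URL pointing at a VM's heartbeat property.

    vcenter_host: vCenter Server host name (placeholder in the example below).
    vm_moid:      managed object ID of the VM, e.g. "vm-123" (placeholder).
    The `doPath` parameter and the property name are assumptions, see above.
    """
    query = urlencode({"moid": vm_moid, "doPath": "guestHeartbeatStatus"})
    return f"https://{vcenter_host}/mob/?{query}"

print(mob_heartbeat_url("vcenter.example.local", "vm-123"))
# https://vcenter.example.local/mob/?moid=vm-123&doPath=guestHeartbeatStatus
```

Opening the link in a browser prompts for vCenter credentials. If the direct path does not work in your version, browsing from the VM object in the MOB and looking for the heartbeat property by hand should lead to the same value.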

Thanks!

Re: Massive Concurrent Heartbeat Lost

by mteamjpy » Wed Jul 11, 2012 11:53 am

Hi Vitaliy,

Yes, that is correct: they are triggered at almost the same time.
However, Veeam Monitor does not report anything wrong with my vCenter.

OK, how could I investigate this?

P.S. We have an SRM-like DRP solution.

Thanks in advance,

Jean-pol

Re: Massive Concurrent Heartbeat Lost

by Vitaliy S. » Wed Jul 11, 2012 2:20 pm

I'm afraid that the vCenter Server debug logs are the only source that can shed some light on this behavior. As you can see, this issue is not a new one; it has been observed with older versions of VMware vSphere:
http://communities.vmware.com/thread/231717
http://kb.vmware.com/selfservice/micros ... Id=1017091

Please look through the recommendations I gave in my first response to this thread; they might help.

Re: Massive Concurrent Heartbeat Lost

by mteamjpy » Wed Jul 11, 2012 2:48 pm

Hi Vitaliy,
Thanks a lot. I know we could do better at keeping VMware Tools updated,
and I hope this opens my team's eyes about it.
The VM concerned has been updated recently.
I will look at the KB and give feedback when I can.

Thanks,
Jean-pol

