Unstable 5.7 Collectors

Post by **khitsgmbh** » Feb 21, 2012 7:57 am

Hello,

We have some serious stability issues with several of our collectors, and I want to know how we can tune the system to prevent such issues in the future:

The environment:
- 1 x VirtualCenter (around 200 cores, 700 VMs)
- 1 x Veeam EM
- 6 x Veeam Collectors, divided into 2 failover groups with 3 collectors each
- each collector is a VM with 4 vCPUs and 4 GB RAM; CPU and RAM usage are between 50% and 70%, and nworks load on each collector is between 30% and 50%
- Veeam Version 5.7
- VC event queue size: 350

The situation:
In one of the FGs, the VM guys have some serious problems with storage and NICs. As a result, those objects generate hundreds or thousands of events. This causes a lot of trouble for the SCOM agent on the collector: many WMI-related events such as 10376 and 10378 ("Module was unable to convert WMI setting") and constant heartbeat failures.

My question:
I cannot influence the stability of the VM hosts (i.e. the real root cause), nor whether the VM guys put their machines into maintenance mode. How can we reliably configure the collectors and the underlying SCOM agents so that they keep working even when event counts are high? IMHO, our nWorks design should easily handle the load...

Any hints and suggestions are highly appreciated,

Dirk

Post by **Alec King** » Feb 28, 2012 2:43 pm

Hi Dirk,

Well, that's an interesting problem. If you cannot immediately fix the root cause (the event storm in VC), then we can examine some options.

I'd like some more details:
How many events do you see from these hosts (approximately)?
What are the exact events/alerts?

We can perhaps tune the rules that respond to those events: either disable them (for the problem hosts), or introduce something like event correlation, where multiple events are rolled up into one alert.
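
As a rough illustration of the event-correlation idea, the Python sketch below rolls events that share the same source and type into a single alert per time window. The `Event`/`Alert` fields, the 300-second window, and the names `esx01` and `path-down` are hypothetical; this is not how the nworks management pack implements its rules, only a sketch of the technique under those assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical: host or component that raised the event
    kind: str         # hypothetical: event type, e.g. a vCenter event name
    timestamp: float  # seconds since an arbitrary epoch


@dataclass
class Alert:
    source: str
    kind: str
    first_seen: float
    last_seen: float
    count: int = 0


def correlate(events, window_seconds=300.0):
    """Roll events with the same (source, kind) into one alert per window.

    A new alert is opened for a key when an event arrives more than
    window_seconds after the first event of that key's open alert.
    """
    open_alerts = {}
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.kind)
        alert = open_alerts.get(key)
        if alert is None or ev.timestamp - alert.first_seen > window_seconds:
            alert = Alert(ev.source, ev.kind, ev.timestamp, ev.timestamp)
            open_alerts[key] = alert
            alerts.append(alert)
        alert.count += 1
        alert.last_seen = ev.timestamp
    return alerts


if __name__ == "__main__":
    # Hypothetical storm: one event every 10 s for 10 minutes from one host.
    storm = [Event("esx01", "path-down", float(t)) for t in range(0, 600, 10)]
    alerts = correlate(storm, window_seconds=300.0)
    print(len(alerts), [a.count for a in alerts])  # 2 [31, 29]
```

Here sixty raw events collapse into two alerts; a longer window means fewer alerts but slower notice of a new burst.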

I also wonder if these events are causing follow-on problems for our collector, such as discovery thrashing. If storage and/or networking components are going online and offline rapidly, this could cause us to constantly rediscover the hosts. That means very high CPU on the collector, and other problems follow from that.
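
To make the discovery-thrashing scenario more concrete, the following sketch flags a component as flapping when its state changes more than a set number of times inside a sliding window. The thresholds (more than 4 transitions in 600 seconds), the `FlapDetector` class, and component names such as `vmnic1` are hypothetical, and this is not the collector's actual discovery logic; it only shows one way to spot which components are going up and down often enough to cause repeated rediscovery.

```python
from collections import deque


class FlapDetector:
    """Hypothetical helper: flag components that change state too often.

    Keeps the timestamps of recent state transitions per component and
    reports flapping when more than max_transitions fall inside the window.
    """

    def __init__(self, max_transitions=4, window_seconds=600.0):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._last_state = {}
        self._transitions = {}

    def observe(self, component, state, timestamp):
        """Record a state sample; return True if the component is flapping."""
        previous = self._last_state.get(component)
        self._last_state[component] = state
        history = self._transitions.setdefault(component, deque())
        if previous is not None and previous != state:
            history.append(timestamp)
        # Drop transitions that have fallen out of the sliding window.
        while history and timestamp - history[0] > self.window_seconds:
            history.popleft()
        return len(history) > self.max_transitions


if __name__ == "__main__":
    det = FlapDetector(max_transitions=4, window_seconds=600.0)
    # A NIC toggling every 30 s: the fifth transition trips the detector.
    samples = ["up", "down"] * 4
    print([det.observe("vmnic1", s, 30.0 * i) for i, s in enumerate(samples)])
    # [False, False, False, False, False, True, True, True]
```

Counting transitions rather than raw events keeps a component that is simply "down" for a long time from being reported as flapping.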

It might be best to open a case with our support team. Send over the log files, and we can dive into a deeper analysis.

Thanks!
Alec

Post by **sepj12927** » May 31, 2012 9:51 am

Hi,

Did you resolve this issue?

I'm suffering from a similar issue.

/Per

Post by **Alec King** » Jun 01, 2012 8:12 am

Hi Per,

Can you share more details of your problem? Do you have an event flood from vCenter, or other issue?

Cheers,
Alec

Post by **sepj12927** » Jun 04, 2012 8:13 am

Hi,

We receive multiple events like this on our EMS.

=================================
Module was unable to convert WMI setting .\timestamp

One or more workflows were affected by this.

Workflow name: nworks.VMware.VEM.VMGUESTVIRTUALDISK.Collect.freePct
Instance name: C:\
Instance ID: {8579BA65-3D36-E5EC-5EFF-14749032E061}
Management group: MGMT_Group
=================================

At the time of the above event in the OpsMgr log, there are on average 5-6 events per minute in the nworks log.

/Per

Post by **vBPav** » Jun 05, 2012 1:13 am

Per,

I would recommend submitting this to our support team at http://cp.veeam.com. I also recommend exporting the OpsMgr event logs on each of your Veeam collectors, zipping them, and submitting them with the case. Please post back once you get an answer from support. Good luck!

Post by **thomaxx** » Jan 09, 2013 12:36 pm

Hi Guys,

Is there any solution here? I have the same problem with one of my nworks collectors.
In the nworks event log I see 10-15 entries like these:
[UserLoginSessionEvent] User username@Servername logged in
[UserLogoutSessionEvent] User username logged out
HealthService.exe is using 50% CPU, and in the Operations Manager event log on the collector server I see events like this:
Module was unable to convert WMI setting .\timestamp

One or more workflows were affected by this.

Workflow name: nworks.VMware.VEM.VMHOSTDISK.Monitor.totalWriteLatency
Instance name: vmhba2:C0:T1:L22
Instance ID: {CF745E74-DAE5-E9FA-F937-0B85DD4122E9}
Management group: BRZ

Cheers,
Thomas

Post by **Vitaliy S.** » Jan 10, 2013 10:57 am

Hi Thomas,

Unfortunately, none of the posters above mentioned their support ticket number, so I cannot check the resolution for you. Could you please open a support ticket and post your case number here, so I can update this topic with the resolution for future readers?

Thanks!
