-
- Novice
- Posts: 4
- Liked: never
- Joined: Nov 07, 2011 5:22 pm
Unstable 5.7 Collectors
Hello,
We have serious stability issues with some of our collectors, and I would like to know how we can tune the system to prevent them in the future.
The environment:
- 1 x VirtualCenter (around 200 cores, 700 VMs)
- 1 x Veeam EM
- 6 x Veeam Collectors, divided into 2 failover groups (FGs) with 3 Collectors each
- each Collector is a VM with 4 vCPU and 4 GB RAM. CPU and RAM usage between 50-70%. nworks load on collector between 30-50%
- Veeam Version 5.7
- VC event queue size: 350
The situation:
In one of the FGs, the VM guys have some serious problems with storage and NICs. As a result, those objects generate hundreds or even thousands of events. This causes a lot of trouble for the SCOM agent on the collector: many WMI events such as 10376 and 10378 (Module was unable to convert WMI setting) and persistent heartbeat failures.
My question:
I cannot influence the stability of the VM hosts (i.e. the real root cause), nor can I make the VM guys put their machines into maintenance mode. How can we reliably configure the collectors and the underlying SCOM agents so that they keep working even in situations with high event counts? IMHO our nWorks design should easily handle the load...
Any hints and suggestions are highly appreciated,
Dirk
-
- VP, Product Management
- Posts: 1495
- Liked: 382 times
- Joined: Jan 01, 2006 1:01 am
Re: Unstable 5.7 Collectors
Hi Dirk,
Well, that's an interesting problem. If you cannot immediately fix the root cause (the event storm in VC), then we can examine some options.
I'd like some more details:
How many events do you see from these hosts? (approx)
What are the exact events/alerts?
We can perhaps tune the rules that respond to those events: either disable them (for the problem Hosts), or maybe introduce something like event correlation, where multiple events are rolled up into one alert.
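To illustrate the roll-up idea in general terms, here is a minimal Python sketch. It is not the Veeam or SCOM implementation and does not use any product API; the host names, event type names, and the 300-second window are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    event_type: str
    first_seen: float
    last_seen: float
    count: int


def roll_up(events, window_seconds=300.0):
    """Collapse repeated (host, event_type) events into one alert per window.

    `events` is an iterable of (timestamp, host, event_type) tuples sorted by
    timestamp. A new alert is opened only when the same key reappears more
    than `window_seconds` after the first event of the currently open alert.
    """
    open_alerts = {}  # (host, event_type) -> most recent Alert for that key
    alerts = []
    for timestamp, host, event_type in events:
        key = (host, event_type)
        current = open_alerts.get(key)
        if current is not None and timestamp - current.first_seen <= window_seconds:
            # Same problem, still inside the window: just bump the counter.
            current.last_seen = timestamp
            current.count += 1
        else:
            current = Alert(host, event_type, timestamp, timestamp, 1)
            open_alerts[key] = current
            alerts.append(current)
    return alerts


# Hypothetical sample data: host and event names are invented for illustration.
sample = [
    (0.0, "esx01", "StorageLost"),
    (10.0, "esx01", "StorageLost"),
    (20.0, "esx02", "NicDown"),
    (400.0, "esx01", "StorageLost"),
]
for alert in roll_up(sample):
    print(alert.host, alert.event_type, alert.count)
```

With a scheme like this, an event storm from one host produces one alert with a rising counter per window instead of one alert per event.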
I also wonder if these events are causing follow-on problems for our Collector, such as discovery thrashing. If storage and/or networking components are going online and offline rapidly, this could cause us to constantly re-discover the Hosts. That means very high CPU on the Collector, and other problems follow from that.
Might be best if you could open a case with our support team - send over the logfiles, and we can dive into deeper analysis.
Thanks!
Alec
-
- Enthusiast
- Posts: 25
- Liked: never
- Joined: May 31, 2012 9:48 am
- Full Name: Per J
Re: Unstable 5.7 Collectors
Hi,
Did you resolve this issue?
I'm suffering from a similar issue.
/Per
-
- VP, Product Management
- Posts: 1495
- Liked: 382 times
- Joined: Jan 01, 2006 1:01 am
Re: Unstable 5.7 Collectors
Hi Per,
Can you share more details of your problem? Do you have an event flood from vCenter, or some other issue?
Cheers,
Alec
-
- Enthusiast
- Posts: 25
- Liked: never
- Joined: May 31, 2012 9:48 am
- Full Name: Per J
Re: Unstable 5.7 Collectors
Hi,
We receive multiple events like this on our EMS.
=================================
Module was unable to convert WMI setting .\timestamp
One or more workflows were affected by this.
Workflow name: nworks.VMware.VEM.VMGUESTVIRTUALDISK.Collect.freePct
Instance name: C:\
Instance ID: {8579BA65-3D36-E5EC-5EFF-14749032E061}
Management group: MGMT_Group
=================================
At the time of the above event in the OpsMgr log, there are on average 5-6 events per minute in the nworks log.
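To see which workflows are affected most often, the event descriptions can be tallied by workflow name. The following is a minimal Python sketch, assuming the OpsMgr event descriptions have been exported to plain text in the same layout as the event quoted above; the function name `count_workflows` and the export step itself are assumptions, not part of any Veeam or SCOM tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   Workflow name: nworks.VMware.VEM.VMGUESTVIRTUALDISK.Collect.freePct
WORKFLOW_RE = re.compile(r"^\s*Workflow name:\s*(\S+)\s*$", re.MULTILINE)


def count_workflows(log_text):
    """Return a Counter mapping each workflow name to its number of occurrences."""
    return Counter(WORKFLOW_RE.findall(log_text))


# Sample text copied from the event above, repeated to simulate two events.
sample = """
Module was unable to convert WMI setting .\\timestamp
One or more workflows were affected by this.
Workflow name: nworks.VMware.VEM.VMGUESTVIRTUALDISK.Collect.freePct
Instance name: C:\\
""" * 2

for name, hits in count_workflows(sample).most_common():
    print(hits, name)
```

A tally like this makes it easier to tell support whether the errors are spread across all workflows or concentrated in a few.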
/Per
-
- Expert
- Posts: 181
- Liked: 13 times
- Joined: Jan 13, 2010 6:08 pm
- Full Name: Brian Pavnick
Re: Unstable 5.7 Collectors
Per,
I would recommend submitting this to our support team @ http://cp.veeam.com. I recommend also exporting your OpsMgr event logs on each of your Veeam Collectors, zipping them, and submitting them with the case. Please post back once you get an answer from support. Good luck!
Brian Pavnick | Cireson | Solutions Architect
- Follow me on Twitter @ vbpav
- Reach me on e-mail @ brian.pavnick@cireson.com
-
- Novice
- Posts: 3
- Liked: never
- Joined: Nov 03, 2009 3:21 pm
- Full Name: Thomas Loicht
Re: Unstable 5.7 Collectors
Hi Guys,
Is there any solution here? I have the same problem with one of my nworks Collectors.
In the nworks event log I see 10-15 entries like these:
[UserLoginSessionEvent] User username@Servername logged in
[UserLogoutSessionEvent] User username logged out
HealthService.exe is using 50% CPU, and in the Operations Manager event log on the Collector server I see events like this:
Module was unable to convert WMI setting .\timestamp
One or more workflows were affected by this.
Workflow name: nworks.VMware.VEM.VMHOSTDISK.Monitor.totalWriteLatency
Instance name: vmhba2:C0:T1:L22
Instance ID: {CF745E74-DAE5-E9FA-F937-0B85DD4122E9}
Management group: BRZ
Cheers,
Thomas
-
- VP, Product Management
- Posts: 27347
- Liked: 2785 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
Re: Unstable 5.7 Collectors
Hi Thomas,
Unfortunately, none of the posters above mentioned their support ticket number, so I cannot check the resolution for you. Could you please open a support ticket and post your case number here, so that I can update this topic with the resolution for future readers?
Thanks!