We have some serious stability issues with some of our collectors, and I would like to know how we can tune the system to prevent them in the future. Our environment:
- 1 x VirtualCenter (around 200 cores, 700 VMs)
- 1 x Veeam EM
- 6 x Veeam Collectors, divided into 2 failover groups with 3 Collectors each
- each Collector is a VM with 4 vCPU and 4 GB RAM; CPU and RAM usage is between 50-70%, and the nWorks load on each Collector is between 30-50%
- Veeam Version 5.7
- VC event queue size: 350
In one of the failover groups, the VM guys have some serious problems with storage and NICs. As a result, the affected objects generate hundreds or even thousands of events. This causes a lot of trouble for the SCOM agent on the collector: many WMI events such as 10376 and 10378 (Module was unable to convert WMI setting) and constant heartbeat failures.
I cannot influence the stability of the VM hosts (i.e. the real root cause), nor can I make the VM guys put their machines into maintenance... How can we reliably configure the collectors and the underlying SCOM agents so that they keep working even in situations with high event counts? IMHO our nWorks design should easily handle the load...
Any hints and suggestions are highly appreciated,