Period of time when no data is collected

Post by **AdrianC** » Jan 22, 2012 7:13 pm

Hello,
I have nworks installed directly on the vCenter server, and I currently have some problems with the data collected from the server.
I have long periods of time when no data is collected. See the picture at this link: http://www.2shared.com/photo/cfcX7iKo/d ... ected.html
I have already run the "Configure OpsMgr Agent" task.

Also, I have seen a lot of events about unloaded monitors (here is an example):
Log Name: Operations Manager
Source: HealthService
Date: 22.1.2012 5:24:16
Event ID: 1103
Task Category: Health Service
Level: Warning
Keywords: Classic
User: N/A
Computer: <<<>>>
Description:
Summary: 11352 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "SCOM". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

Other events:
Log Name: Operations Manager
Source: Health Service Modules
Date: 22.1.2012 5:40:13
Event ID: 11052
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: <<<>>>>
Description:Module was unable to convert parameter to a double value
Original parameter: '$Data/Property[@Name='diskPressure']$'
Parameter after $Data replacement: ''
Error: 0x80020005
Details: Type mismatch.
One or more workflows were affected by this.
Workflow name: nworks.VMware.VEM.VMHOSTDATASTORE.Monitor.diskPressure
Instance name: helesx13-local
Instance ID: {CF956502-3288-1572-1A7D-8AAA25B0EC62}
Management group: SCOM
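
A minimal sketch of why event 11052 ends in "Type mismatch": the event shows the `diskPressure` parameter resolving to an empty string, and an empty string cannot be converted to a double. This is only an illustration in Python, not how the SCOM module is implemented; `resolve_property` and `to_double` are hypothetical helper names.

```python
def resolve_property(properties, name):
    """Mimic the $Data replacement: a missing property becomes ''."""
    return properties.get(name, "")


def to_double(raw):
    """Return raw as a float, or None when it is empty or not numeric."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # float('') raises ValueError, the analogue of the type mismatch.
        return None


if __name__ == "__main__":
    # A data item without diskPressure, as the event above suggests.
    print(to_double(resolve_property({}, "diskPressure")))
    # A data item that does carry a numeric value.
    print(to_double(resolve_property({"diskPressure": "12.5"}, "diskPressure")))
```

If the assumption holds, the warning means the collected data item carried no `diskPressure` value for that datastore, rather than a malformed one.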

Log Name: Operations Manager
Source: Health Service Modules
Date: 22.1.2012 5:41:43
Event ID: 26013
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: <<>>>
Description:
The nworksEventLog Event Log on computer <<>>> appears to have "wrapped" or been cleared while the Windows Event Log Provider was not active or behind in processing events. This error occurs when the provider is inactive for a period of time in which more events are logged than the event log can contain or the log is cleared. Some events were likely lost. To avoid this error in the future, make your event log larger or ensure that the agent service is not stopped for long periods.
One or more workflows were affected by this.
Workflow name: many
Instance name: many
Instance ID: many
Management group: SCOM
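
The "wrapped" condition in event 26013 can be sketched with a bounded buffer, assuming the event log overwrites its oldest records once it is full. The `BoundedLog` class below is a hypothetical model, not a Windows or SCOM API; it shows how a reader that falls behind loses records it never saw.

```python
from collections import deque


class BoundedLog:
    """Toy event log that keeps only the newest `capacity` records."""

    def __init__(self, capacity):
        self._records = deque(maxlen=capacity)  # oldest entries drop off
        self._next_id = 1

    def write(self, message):
        self._records.append((self._next_id, message))
        self._next_id += 1

    def read_after(self, last_seen_id):
        """Return (records newer than last_seen_id, count of lost records)."""
        available = [r for r in self._records if r[0] > last_seen_id]
        oldest = self._records[0][0] if self._records else self._next_id
        # Records between last_seen_id and the oldest survivor are gone.
        lost = max(0, oldest - last_seen_id - 1)
        return available, lost


if __name__ == "__main__":
    log = BoundedLog(capacity=3)
    for i in range(5):  # five events arrive while the reader is inactive
        log.write(f"event {i + 1}")
    records, lost = log.read_after(0)
    print([record_id for record_id, _ in records], "lost:", lost)
```

In this model a larger capacity or a reader that keeps up both bring the lost count to zero, which matches the advice in the event text.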

PS: I just increased the values of the registry keys Persistence Version Store Maximum and Persistence Cache Maximum; maybe that will have an effect. Any other ideas?

Post by **AdrianC** » Jan 23, 2012 10:14 am

I have seen that the data is not collected when CPU usage is high.
http://www.2shared.com/photo/MW-7zZg2/picture.html
- Light Blue: nworks Collector: Ops Mgr Agent HealthService CPU Usage; Object: Process; Counter: % Processor Time; Instance: HealthService
- Blue: Collect agent processor utilization; Object: Health Service; Counter: agent processor usage
- Dark Blue: nworks Collector: Collect vCenter Service % Processor Time; Object: Process; Counter: % Processor Time; Instance: vpxd
- Yellow: nworks Collector: Ops Mgr Monitoring process CPU Usage
- Green: Disk space

Please help!

Post by **AdrianC** » Jan 23, 2012 1:14 pm

A new update:
Date and Time: 23.1.2012 3:07:34
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 26017
Level: 2
Logging Computer: <<>>
User: N/A
Description:
The Windows Event Log Provider monitoring the nworksEventLog Event Log is <<>> minutes behind in processing events. This can occur when the provider is restarted after being offline for some time, or there are too many events to be handled by the workflow. One or more workflows were affected by this. Workflow name: many Instance name: many Instance ID: many Management group: <<>>

So, any hints?

Post by **Alec King** » Jan 24, 2012 7:43 am

Hi Adrian,

I would say that either -
1. the Configure Ops Mgr Agent task has not completed correctly, so the correct settings are still not applied
2. the Ops Mgr agent is overloaded anyway, because you have too many Hosts + VMs monitored by this Collector

How many Hosts and VMs are monitored?
If you look in nworks UI, what is the 'Object Count' for this system? (Enterprise Manager tab, click on Collector server)
Can you also confirm how much CPU and RAM this server has? It needs a minimum of two CPUs, and I would recommend at least 4 GB of RAM.

In general, we recommend dedicating a VM to the Collector role, not sharing it with other applications such as vCenter.

Have you engaged with our excellent Support team?

They can always assist with checking your configuration.

Cheers
Alec

Post by **Alec King** » Jan 24, 2012 7:46 am

One more point! Have you applied the OVERRIDES to the Ops Mgr agent?
In addition to the Configure Agent task, there are required overrides for built-in SCOM rules. These overrides stop the agent from auto-restarting when it uses more memory and CPU than usual. An agent restart could also cause the gaps in data.

In the nworks MP Resource Kit (available here http://www.veeam.com/vmware-microsoft-e ... urces.html) there is a pre-built MP of the required overrides.

Cheers
Alec

Post by **AdrianC** » Jan 25, 2012 7:10 am

Much better now regarding the performance.

I still have the problem with the nworksEventLog event log being processed, though. Any ideas on that as well?
I tried clearing the event log once; it did not work.

Post by **Alec King** » Jan 25, 2012 7:38 am

Hi Adrian,
So did you apply the overrides MP? Is that what helped?

You might still have an overloaded Ops Mgr agent. Does the server still show high CPU/memory use?

Post by **AdrianC** » Jan 27, 2012 7:11 am

Yes, importing the overrides MP did the trick for the collection of the performance counters; I do not see the spikes any more. But the alert processing backlog from the nworks event log still exists. Part of the state-change history is below:
27.1.2012 4:54 Error Warning Still Processing Backlogged Events (Warning)
27.1.2012 3:38 Warning Error Still Processing Backlogged Events (Error)
27.1.2012 3:28 Error Warning Still Processing Backlogged Events (Warning)
26.1.2012 17:10 Warning Error Still Processing Backlogged Events (Error)
26.1.2012 17:00 Error Warning Still Processing Backlogged Events (Warning)
26.1.2012 15:32 Warning Error Still Processing Backlogged Events (Error)
26.1.2012 15:22 Error Warning Still Processing Backlogged Events (Warning)
26.1.2012 5:02 Warning Error Still Processing Backlogged Events (Error)
26.1.2012 4:52 Error Warning Still Processing Backlogged Events (Warning)
26.1.2012 3:39 Warning Error Still Processing Backlogged Events (Error)
26.1.2012 3:19 Error Warning Still Processing Backlogged Events (Warning)
25.1.2012 17:00 Warning Error Still Processing Backlogged Events (Error)
25.1.2012 16:50 Error Warning Still Processing Backlogged Events (Warning)
25.1.2012 15:28 Warning Error Still Processing Backlogged Events (Error)
25.1.2012 15:18 Error Warning Still Processing Backlogged Events (Warning)
25.1.2012 9:50 Warning Error Still Processing Backlogged Events (Error)
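
A rough way to read a history like this is to total the time spent in each state. The sketch below assumes the columns are date, time, previous state, new state, and it uses only the five oldest rows from the list above as sample data; `parse_changes` and `time_in_state` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Five oldest rows copied from the history above.
SAMPLE = """\
25.1.2012 17:00 Warning Error Still Processing Backlogged Events (Error)
25.1.2012 16:50 Error Warning Still Processing Backlogged Events (Warning)
25.1.2012 15:28 Warning Error Still Processing Backlogged Events (Error)
25.1.2012 15:18 Error Warning Still Processing Backlogged Events (Warning)
25.1.2012 9:50 Warning Error Still Processing Backlogged Events (Error)
"""


def parse_changes(text):
    """Return (timestamp, old_state, new_state) tuples, oldest first."""
    changes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or incomplete rows
        when = datetime.strptime(f"{parts[0]} {parts[1]}", "%d.%m.%Y %H:%M")
        changes.append((when, parts[2], parts[3]))
    return sorted(changes)


def time_in_state(text):
    """Sum the time between each change and the next, keyed by new state."""
    changes = parse_changes(text)
    totals = defaultdict(timedelta)
    for (start, _old, new), (end, _, _) in zip(changes, changes[1:]):
        totals[new] += end - start
    return dict(totals)


if __name__ == "__main__":
    for state, total in time_in_state(SAMPLE).items():
        print(state, total)
```

The most recent change is not counted, because the history does not show when that state ended.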

Could it be due to a memory problem? The server has 8 GB of RAM. Available MBytes is between 1.5 GB and 500 MB. I also checked Resource Monitor, and it appears that Standby is 926 MB and Free is between 35 MB and 0 MB.

Regarding the CPU, I do not see big spikes; it looks normal.

Post by **Alec King** » Jan 27, 2012 7:47 am

Hi Adrian,

OK, let's review your architecture....

This server is also the vCenter server, correct? Is the SQL database also on this server?

Are there any errors on the Management Server that this SCOM agent reports to? And what is the CPU and RAM on the Mgt Server?

And same for the Root Management Server - are there any errors in the Operations Manager event log? And how much CPU + RAM does it have?

How many SCOM agents report to your management server(s), and how many hosts and VMs are in vCenter?

Thanks!
Alec

Post by **AdrianC** » Feb 01, 2012 10:43 am

The MS/RMS do not have any errors. (The RMS and MS have 2 CPUs and 4 GB of RAM; 340 SCOM agents.)
Connected servers in nworks: Total ESX: 30; Monitored ESX: 8 (2 clusters, each with 4 ESX); Unmonitored ESX: 22

CPU spikes: Of the Process / % Processor Time counters, the only one that spikes over 70% from time to time (2 times a day) is nworks Collector: OpsMgr Agent Health Service CPU Usage.

Regarding SQL, I am not sure exactly (what is the DB name? I did not install it and do not know how to find it). Do you think this could be a problem?
I think the DB is on the same SQL Server as all the rest of the DBs (including the SCOM DB). Don't ask me who thought of this, but there are 31 DBs on 1 SQL Server that has insufficient RAM.

Post by **Alec King** » Feb 09, 2012 7:05 am

Hi Adrian,

OK - if you have one SQL server, and it has 31 databases on it (!!) - especially databases with constant high disk activity, such as SCOM database (and data warehouse also?) - and you say it has insufficient RAM - then that could be your problem. SCOM will rely on good database performance (and of course, nworks will rely on SCOM!)

The first idea might be to try moving the SCOM database(s) to a server with more RAM, plenty of CPU and fast disks. FYI, the databases are called OperationsManager (the operational 'real-time' database, holding perf data for console charts, alerts, etc.) and OperationsManagerDW (the data warehouse, or reporting database, if you installed the reporting option).
Good article here - http://blogs.technet.com/b/kevinholman/ ... ience.aspx
Moving the databases is not easy! You might prefer to just re-install SCOM. But if you are still seeing these gaps in data, you should perhaps review the back-end SQL performance.

Have you opened a case with our support team? If you could do that, and send the logs over (especially the Operations Manager event logs from RMS/MS/Collectors) then we can analyse your issue better.

thanks!
Alec

Post by **AdrianC** » Mar 27, 2012 8:22 pm

Hi,
Updated all applications to 5.7 and imported the latest MP => problem fixed.

Post by **Alec King** » Mar 28, 2012 7:21 am

Thanks for the update Adrian! Good to know all is A-OK now. Happy monitoring!
