Period of time when no data is collected

by AdrianC » Sun Jan 22, 2012 7:13 pm

Hello,
I have nworks installed directly on the vCenter server, and currently I have some problems with the data collected from the server.
I have long periods of time when no data is collected. See the picture at this link: http://www.2shared.com/photo/cfcX7iKo/data_colected.html
I have already executed the task "Configure OpsMgr Agent".

Also, I have seen a lot of events about monitors being unloaded (here is an example):
Log Name: Operations Manager
Source: HealthService
Date: 22.1.2012 5:24:16
Event ID: 1103
Task Category: Health Service
Level: Warning
Keywords: Classic
User: N/A
Computer: <<<>>>
Description:
Summary: 11352 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "SCOM". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).


Other events:
Log Name: Operations Manager
Source: Health Service Modules
Date: 22.1.2012 5:40:13
Event ID: 11052
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: <<<>>>>
Description:Module was unable to convert parameter to a double value
Original parameter: '$Data/Property[@Name='diskPressure']$'
Parameter after $Data replacement: ''
Error: 0x80020005
Details: Type mismatch.
One or more workflows were affected by this.
Workflow name: nworks.VMware.VEM.VMHOSTDATASTORE.Monitor.diskPressure
Instance name: helesx13-local
Instance ID: {CF956502-3288-1572-1A7D-8AAA25B0EC62}
Management group: SCOM
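
Event 11052 above shows the pattern behind at least some of the unloaded monitors: the diskPressure property arrived empty, and an empty string cannot be converted to a double. The following is a minimal Python sketch of that failure mode and of a defensive conversion. The function name to_double and the sample dictionary are hypothetical and only illustrate the idea; they are not part of SCOM or of the nworks management pack.

```python
from typing import Optional


def to_double(raw: str) -> Optional[float]:
    """Convert a collected property value to a float, or None when it is missing or invalid."""
    text = raw.strip()
    if not text:
        # An empty value is what the event reports after $Data replacement.
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical sample: the collector returned the property with no value.
sample = {"diskPressure": ""}
value = to_double(sample.get("diskPressure", ""))
print("diskPressure:", "no data" if value is None else value)
```

Treating a missing value as "no data" instead of a conversion error is the design choice here: the workflow can skip the sample rather than fail.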


Log Name: Operations Manager
Source: Health Service Modules
Date: 22.1.2012 5:41:43
Event ID: 26013
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: <<>>>
Description:
The nworksEventLog Event Log on computer <<>>> appears to have "wrapped" or been cleared while the Windows Event Log Provider was not active or behind in processing events. This error occurs when the provider is inactive for a period of time in which more events are logged than the event log can contain or the log is cleared. Some events were likely lost. To avoid this error in the future, make your event log larger or ensure that the agent service is not stopped for long periods.
One or more workflows were affected by this.
Workflow name: many
Instance name: many
Instance ID: many
Management group: SCOM


PS: I just increased the values of the registry keys Persistence Version Store Maximum and Persistence Cache Maximum; maybe it will have an effect... Any other ideas?
AdrianC
Novice
 
Posts: 7
Liked: never
Joined: Sun Jan 22, 2012 6:53 pm
Full Name: Adrian Chirtoc

Re: Period of time when no data is collected

by AdrianC » Mon Jan 23, 2012 10:14 am

I have seen that the data is not collected when the CPU is highly used.
http://www.2shared.com/photo/MW-7zZg2/picture.html
- Light Blue: nworks Collector: Ops Mgr Agent HealthService CPU Usage; Object: Process; Counter: % Processor Time; Instance: HealthService
- Blue: Collect agent processor utilization; Object: Health Service; Counter: agent processor usage
- Dark Blue: nworks Collector: Collect vCenter Service % Processor Time; Object: Process; Counter: % Processor Time; Instance: vpxd
- Yellow: nworks Collector: Ops Mgr Monitoring process CPU Usage
- Green: Disk space

Please help!

Re: Period of time when no data is collected

by AdrianC » Mon Jan 23, 2012 1:14 pm

A new update
Date and Time: 23.1.2012 3:07:34
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 26017
Level: 2
Logging Computer: <<>>
User: N/A
Description:
The Windows Event Log Provider monitoring the nworksEventLog Event Log is <<>> minutes behind in processing events. This can occur when the provider is restarted after being offline for some time, or there are too many events to be handled by the workflow. One or more workflows were affected by this. Workflow name: many Instance name: many Instance ID: many Management group: <<>>

so.. any hints?

Re: Period of time when no data is collected

by Alec King » Tue Jan 24, 2012 7:43 am

Hi Adrian,

I would say that either -
1. the Configure Ops Mgr Agent task has not completed correctly, so the correct settings are still not applied
2. the Ops Mgr agent is overloaded anyway, because you have too many Hosts + VMs monitored by this Collector

How many Hosts and VMs are monitored?
If you look in nworks UI, what is the 'Object Count' for this system? (Enterprise Manager tab, click on Collector server)
Can you also confirm how much CPU and RAM this server has? It needs two CPU minimum, and I would recommend 4GB RAM minimum.
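
The sizing guidance above (two CPUs minimum, 4GB RAM recommended minimum) can be written down as a quick check. This is only a sketch: the function name sizing_problems and the example values are hypothetical, and the actual CPU count of the server is not stated in this thread.

```python
from typing import List

MIN_CPUS = 2     # minimum stated in the post above
MIN_RAM_GB = 4   # recommended minimum stated in the post above


def sizing_problems(cpu_count: int, ram_gb: float) -> List[str]:
    """Return a list of sizing problems for a Collector server; empty when it meets the minimums."""
    problems = []
    if cpu_count < MIN_CPUS:
        problems.append(f"only {cpu_count} CPU(s), need at least {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"only {ram_gb}GB RAM, recommend at least {MIN_RAM_GB}GB")
    return problems


# Hypothetical values, for illustration only.
print(sizing_problems(cpu_count=1, ram_gb=2))
```

Meeting these minimums does not rule out overload: the number of monitored hosts and VMs, and other applications sharing the server, still matter.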

In general we recommend dedicating a VM to the Collector Role, not sharing with other applications such as vCenter.....

Have you engaged with our excellent Support team? :) They can always assist with checking your configuration.

Cheers
Alec
Alec King
Veeam Software
 
Posts: 700
Liked: 116 times
Joined: Sun Jan 01, 2006 1:01 am

Re: Period of time when no data is collected

by Alec King » Tue Jan 24, 2012 7:46 am

One more point! Have you applied the OVERRIDES to the Ops Mgr agent?
In addition to the Configure Agent task, there are required overrides for built-in SCOM rules. These overrides stop the agent from auto-restarting when it uses higher than usual memory and CPU. The agent restarting could also cause the gaps in data.

In the nworks MP Resource Kit (available here http://www.veeam.com/vmware-microsoft-esx-monitoring/resources.html) there is a pre-built MP of the required overrides.

Cheers
Alec

Re: Period of time when no data is collected

by AdrianC » Wed Jan 25, 2012 7:10 am

Much better now :) regarding the performance. I still have the problem with the nworksEventLog Event Log being processed. Any ideas on that as well?
Clearing the event log: I tried it once, and it did not work.

Re: Period of time when no data is collected

by Alec King » Wed Jan 25, 2012 7:38 am

Hi Adrian,
So did you apply the overrides MP? Is that what helped?

You still might have an overloaded Ops Mgr agent....does the server still show high CPU/Memory use?

Re: Period of time when no data is collected

by AdrianC » Fri Jan 27, 2012 7:11 am

Yes, importing the overrides did the trick for the collection of the performance counters; I do not see the spikes any more. But the alert processing backlog from the nworks event log still exists. Part of the status history is shown below in this post.
Time             From     To       State
27.1.2012 4:54   Error    Warning  Still Processing Backlogged Events (Warning)
27.1.2012 3:38   Warning  Error    Still Processing Backlogged Events (Error)
27.1.2012 3:28   Error    Warning  Still Processing Backlogged Events (Warning)
26.1.2012 17:10  Warning  Error    Still Processing Backlogged Events (Error)
26.1.2012 17:00  Error    Warning  Still Processing Backlogged Events (Warning)
26.1.2012 15:32  Warning  Error    Still Processing Backlogged Events (Error)
26.1.2012 15:22  Error    Warning  Still Processing Backlogged Events (Warning)
26.1.2012 5:02   Warning  Error    Still Processing Backlogged Events (Error)
26.1.2012 4:52   Error    Warning  Still Processing Backlogged Events (Warning)
26.1.2012 3:39   Warning  Error    Still Processing Backlogged Events (Error)
26.1.2012 3:19   Error    Warning  Still Processing Backlogged Events (Warning)
25.1.2012 17:00  Warning  Error    Still Processing Backlogged Events (Error)
25.1.2012 16:50  Error    Warning  Still Processing Backlogged Events (Warning)
25.1.2012 15:28  Warning  Error    Still Processing Backlogged Events (Error)
25.1.2012 15:18  Error    Warning  Still Processing Backlogged Events (Warning)
25.1.2012 9:50   Warning  Error    Still Processing Backlogged Events (Error)
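
To see how long the monitor stays in each state, the state changes listed above can be summed up with a short script. This is a minimal Python sketch that uses a subset of those entries (26.1.2012 15:22 to 27.1.2012 4:54), reads the timestamps as day.month.year, and takes the second state on each line as the new state. The names STATE_CHANGES and minutes_per_state are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# (timestamp, new state) pairs copied from the state changes above, oldest first.
STATE_CHANGES = [
    ("26.1.2012 15:22", "Warning"),
    ("26.1.2012 15:32", "Error"),
    ("26.1.2012 17:00", "Warning"),
    ("26.1.2012 17:10", "Error"),
    ("27.1.2012 3:28", "Warning"),
    ("27.1.2012 3:38", "Error"),
    ("27.1.2012 4:54", "Warning"),
]


def minutes_per_state(changes):
    """Sum the minutes spent in each state between consecutive state changes."""
    parsed = [(datetime.strptime(ts, "%d.%m.%Y %H:%M"), state) for ts, state in changes]
    totals = defaultdict(int)
    # Each state lasts from its own timestamp until the next change; the last entry is open-ended.
    for (start, state), (end, _next_state) in zip(parsed, parsed[1:]):
        totals[state] += int((end - start).total_seconds() // 60)
    return dict(totals)


print(minutes_per_state(STATE_CHANGES))
```

For this subset the result is 30 minutes in Warning and 782 minutes in Error: each Warning period lasts 10 minutes before the monitor returns to Error.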


Could it be due to a memory problem? The server has 8GB of RAM. Available MBytes is between 1.5GB and 500MB. I also checked Resource Monitor, and it appears that Standby is 926MB and Free is between 35MB and 0MB :)
Regarding the CPU, I do not see big spikes; it looks normal.
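
As a rough aid for reading those memory numbers, the available memory can be expressed as a share of the 8GB total. This is a small Python sketch; the function name available_percent is hypothetical, and it assumes 1GB = 1024MB.

```python
TOTAL_RAM_MB = 8 * 1024  # 8GB server, as reported above (assuming 1GB = 1024MB)


def available_percent(available_mb: float, total_mb: float = TOTAL_RAM_MB) -> float:
    """Return available memory as a percentage of total RAM."""
    return 100.0 * available_mb / total_mb


# The two ends of the Available MBytes range reported above.
for label, mb in (("upper end (1.5GB)", 1.5 * 1024), ("lower end (500MB)", 500)):
    print(f"{label}: {available_percent(mb):.1f}% of RAM available")
```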

Re: Period of time when no data is collected

by Alec King » Fri Jan 27, 2012 7:47 am

Hi Adrian,

OK, let's review your architecture....

This server is also the vCenter server, correct? Is the SQL database also on this server?

Are there any errors on the Management Server that this SCOM agent reports to? And what is the CPU and RAM on the Mgt Server?

And same for the Root Management Server - are there any errors in the Operations Manager event log? And how much CPU + RAM does it have?

How many SCOM agents report to your management server(s), and how many hosts and VMs are in vCenter?

Thanks!
Alec

Re: Period of time when no data is collected

by AdrianC » Wed Feb 01, 2012 10:43 am

The MS/RMS do not have any errors. (The RMS and MS have 2 CPUs and 4GB of RAM. 340 SCOM agents.)
Connected Servers in nworks: Total ESX: 30; Monitored ESX: 8 (2 clusters, each one with 4 ESX); Unmonitored ESX: 22

CPU spikes: From Process / % Processor Time, the only one that spikes over 70% from time to time (2 times/day) is nworks Collector: OpsMgr Agent Health Service CPU Usage.

Regarding the SQL, I am not sure exactly (what is the DB name? I did not install it and do not know how to find it). Do you think this could be a problem?
I think the DB is on the same SQL Server as all the rest of its DBs, including the SCOM DB (don't ask me who thought of this, but it has 31 DBs on 1 SQL Server that has insufficient RAM).

Re: Period of time when no data is collected

by Alec King » Thu Feb 09, 2012 7:05 am

Hi Adrian,

OK - if you have one SQL server, and it has 31 databases on it (!!) - especially databases with constant high disk activity, such as SCOM database (and data warehouse also?) - and you say it has insufficient RAM - then that could be your problem. SCOM will rely on good database performance (and of course, nworks will rely on SCOM!)

The first idea might be to try moving the SCOM database(s) to a server with more RAM, plenty CPU and fast disks. FYI the databases are called OperationsManager (operational 'real-time' database, perf data for console charts, and alerts etc) and OperationsManagerDW (data warehouse, or reporting database, if you installed reporting option)
Good article here - http://blogs.technet.com/b/kevinholman/ ... ience.aspx
Moving the databases is not easy! You might prefer to just re-install SCOM ;-). But if you are still seeing these gaps in data, you should perhaps review the back-end SQL performance.

Have you opened a case with our support team? If you could do that, and send the logs over (especially the Operations Manager event logs from RMS/MS/Collectors) then we can analyse your issue better.

thanks!
Alec

Re: Period of time when no data is collected

by AdrianC » Tue Mar 27, 2012 8:22 pm

Hi,
I updated all applications to 5.7 and imported the latest MP => problem fixed :)

Re: Period of time when no data is collected

by Alec King » Wed Mar 28, 2012 7:21 am

Thanks for the update Adrian! Good to know all is A-OK now. Happy monitoring! :)

