Real-time performance monitoring and troubleshooting
Daveyd
Veteran
Posts: 283
Liked: 11 times
Joined: May 20, 2010 4:17 pm
Full Name: Dave DeLollis
Contact:

Disk command aborts

Post by Daveyd »

I am trying to figure out why we are getting sporadic disk command aborts on each of our ESX hosts. In Veeam Monitor, if I choose the Datacenter and look at the Disk tab, I see numerical values for the disk command abort counter for each of the Datastores. However, if I go into one of the Datastores and open the Disk Issues tab, all VMs are listed, but no aborts are shown for the same time period. Any reason for that?
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Disk command aborts

Post by Vitaliy S. »

Hi Dave, it would help if you could post some screenshots, as I don't have any disk command aborts in my lab. I am a lucky guy ;) Thanks!
Daveyd

Re: Disk command aborts

Post by Daveyd »

Here is the past day of disk aborts at the Host level...

Image



And at the Datastore level


Image
Daveyd

Re: Disk command aborts

Post by Daveyd »

Also, I think the numbers are a little off on this screenshot. My maximums range from 6600 to 9281 ms, but the axis numbers show tens of thousands?

Image
Vitaliy S.

Re: Disk command aborts

Post by Vitaliy S. »

Dave, the only reason I can think of is that this host was previously connected to this datastore and you have since removed the datastore from the host's storage inventory.

Basically, if you choose a particular datastore, Veeam Monitor scans its existing connections to hosts and displays the graph based on what is currently connected. In other words, historical data is retained within the host object, not the datastore object.

As for the Disk I/O tab, datastore latency is displayed as a stacked graph because an arithmetic mean cannot be used as an indicator of the average disk latency combined from multiple hosts.
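To illustrate why a plain average across hosts can mislead, here is a minimal Python sketch. It is not Veeam Monitor's actual calculation; the function names and the latency and IOPS figures are made up for the example. A nearly idle host with high latency drags the arithmetic mean far away from what most I/O requests actually experienced, while an I/O-weighted mean stays close to it.

```python
def arithmetic_mean(latencies_ms):
    """Plain average of per-host latencies, ignoring how busy each host is."""
    return sum(latencies_ms) / len(latencies_ms)


def io_weighted_mean(samples):
    """Average latency weighted by each host's I/O rate.

    `samples` is a list of (latency_ms, iops) pairs; this shape is an
    assumption for the sketch, not a Veeam Monitor data structure.
    """
    total_iops = sum(iops for _, iops in samples)
    if total_iops == 0:
        raise ValueError("no I/O recorded, latency is undefined")
    return sum(latency * iops for latency, iops in samples) / total_iops


# Hypothetical figures: a busy host at 2 ms and a nearly idle host at 50 ms.
hosts = [(2.0, 1000), (50.0, 10)]

plain = arithmetic_mean([latency for latency, _ in hosts])
weighted = io_weighted_mean(hosts)
print(f"arithmetic mean: {plain:.1f} ms, I/O-weighted mean: {weighted:.2f} ms")
```

With these made-up numbers the plain mean is 26 ms even though almost every request completed in about 2 ms, which is one reason a stacked per-host view can be easier to read than a single combined figure.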

Hope this makes sense.
Daveyd

Re: Disk command aborts

Post by Daveyd »

When I run a Trend Report to view all SCSI aborts that happened during the past week, the report only shows daily totals. I wanted to see at what times of day, over a one-week period, I was receiving the most aborts. Is that possible?

Also, any chance a real-time zoom feature will be incorporated? I am demoing other products, and I LOVE the ability some of them have to take a graph with a day's or week's worth of data, hold down the left mouse button, highlight a section of the graph, and zoom into that specific time frame.
Vitaliy S.

Re: Disk command aborts

Post by Vitaliy S. »

Hi Dave,

Currently, this is not possible. Could you please help me understand the use case? Would the corresponding alarm for SCSI aborts (with the exact time and number of aborts) be what you're looking for?

Real-time zoom would be possible only with a limited set of counters. As you may know, Veeam Monitor stores more than 60 performance metrics for each and every object, so keeping real-time data for every object and every counter might make your database unmanageable.

I would love to hear which counters you're most interested in, so we could incorporate this functionality into upcoming releases.

Thanks!
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: Disk command aborts

Post by joergr »

Wow, Dave, that topic seems interesting to me. Please give details about the ESXi version and patch level, the SAN system (vendor and firmware), the connection, the HBAs (it doesn't matter whether iSCSI or FC, just provide vendor, model, and firmware), and the switches used for the SAN (again, iSCSI or FC, just provide vendor, model, and firmware).

If there is no HBA and you are using software iSCSI or software FCoE, please also report the card and the driver (for example Intel X520, dual, driver 2.0.84.9).

Please also check whether the hosts were experiencing CPU peaks or very high network load during the storage latency/blackout phases.

If you don't have everything at hand, please at least tell me what you know off the top of your head.

best regards
Joerg
Daveyd

Re: Disk command aborts

Post by Daveyd »

We are currently running ESX 4.0U3. Each server is an HP DL380 G6 with 2x HP FC1142SR 4Gb FC HBAs. They are all on the latest BIOS (2.15) and the latest EFI (2.2.0). The HBAs are using VMware's driver 8.02.01-k1-vmw48-4vmw. We have a pair of EMC CX4s (one in Prod, one in DR) running FLARE30, I believe. We are also using EMC RecoverPoint appliances.

During the events there are no abnormal spikes in CPU or network utilization on the hosts. Working with EMC, we think we have isolated the issue to RecoverPoint. When I see SCSI aborts in Veeam Monitor, I see corresponding disconnect alerts on the RecoverPoint appliances.

Vitaliy, I was looking for a trend report that would show me hourly data for each day, for one week. I wanted a report that showed me that on Monday between 8-9am I had xxx SCSI aborts and between 7-8pm I had xxx aborts, on Tuesday between 4-5pm I had xxx aborts, and so on for an entire week. That would be a nice way to see whether issues are happening during specific hours every day, or specific hours every other day, etc. The trend report that the Monitor produces now just shows me that I had xxx aborts that day.
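As a rough idea of the hourly breakdown described above, here is a minimal Python sketch. It assumes the abort samples can be obtained as (timestamp, count) pairs; that input shape, the function names, and the sample values are all hypothetical and do not reflect any Veeam Monitor export format.

```python
from collections import Counter
from datetime import datetime


def hourly_abort_counts(samples):
    """Sum abort counts into (date, hour) buckets.

    `samples` is an iterable of (timestamp, aborts) pairs; this shape is
    an assumption for the sketch, not a Veeam Monitor export format.
    """
    buckets = Counter()
    for timestamp, aborts in samples:
        buckets[(timestamp.date(), timestamp.hour)] += aborts
    return buckets


def format_report(buckets):
    """Return one text line per non-empty hour, in chronological order."""
    lines = []
    for (day, hour), total in sorted(buckets.items()):
        end_hour = (hour + 1) % 24
        lines.append(f"{day:%a %Y-%m-%d} {hour:02d}:00-{end_hour:02d}:00  {total} aborts")
    return lines


# Made-up sample data for illustration only.
samples = [
    (datetime(2011, 6, 6, 8, 15), 3),
    (datetime(2011, 6, 6, 8, 40), 2),
    (datetime(2011, 6, 6, 19, 5), 7),
    (datetime(2011, 6, 7, 16, 30), 4),
]

for line in format_report(hourly_abort_counts(samples)):
    print(line)
```

Grouping on (date, hour) rather than hour alone keeps each day separate, so a pattern such as "every other day between 4-5pm" would still be visible.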

The zoom feature: on one product I am demoing, SolarWinds Virtualization Manager, I can take an active graph of, say, 7 days' worth of a specific metric such as CPU utilization and zoom in on that 7-day graph down to a specific hour on a specific day without having to open new graphs.
joergr

Re: Disk command aborts

Post by joergr »

Hi Dave,

Thanks. Do you see these aborts when replication takes place, or randomly? I vaguely remember a case years ago where aborts were occurring during replication, but on the DR site... let me google it...

...found, this is the old thread http://communities.vmware.com/thread/78 ... 5&tstart=0
Maybe not your scenario, but it's all that comes to mind, actually. I am much more familiar with EqualLogic ;-)

Best regards,
Joerg
Vitaliy S.

Re: Disk command aborts

Post by Vitaliy S. »

I see... but would you like to have this ability (real-time zoom) for all performance metrics we have or only for a few of them (like CPU Usage, Memory Usage etc.)?
Daveyd

Re: Disk command aborts

Post by Daveyd »

Vitaliy S. wrote: I see... but would you like to have this ability (real-time zoom) for all performance metrics we have or only for a few of them (like CPU Usage, Memory Usage etc.)?
As a customer, the more the better...as a developer, whatever is realistic :)
Daveyd

Re: Disk command aborts

Post by Daveyd »

Another feature request... I ran an HTML report on all my VMs to show their read and write rates during a specific time period. While it did produce a nice report, it would be nice if the report were sortable. I would have liked to see the VMs with the highest MBps listed first, then the rest in descending order. The report shows all the VMs but in no particular order, and it's hard to look at 50 VMs and see which ones hit the highest throughput during that timeframe, since it lists both KBps and MBps.
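The kind of sorting meant here could look like the following minimal Python sketch, which converts every row to a common unit before ordering. The row layout, the VM names, and the conversion factor of 1024 KBps per MBps are assumptions for illustration; the actual report may use a different layout or a factor of 1000.

```python
# Assumed conversion factors; the real report may use 1000 instead of 1024.
UNIT_TO_KBPS = {"KBps": 1, "MBps": 1024}


def to_kbps(value, unit):
    """Convert a throughput value to KBps so mixed-unit rows are comparable."""
    return value * UNIT_TO_KBPS[unit]


def sort_by_throughput(rows):
    """Return rows ordered from highest to lowest throughput.

    Each row is a (vm_name, value, unit) tuple; this layout is hypothetical.
    """
    return sorted(rows, key=lambda row: to_kbps(row[1], row[2]), reverse=True)


# Made-up rows for illustration only.
rows = [
    ("vm-a", 900, "KBps"),
    ("vm-b", 2.5, "MBps"),
    ("vm-c", 1.2, "MBps"),
]

for name, value, unit in sort_by_throughput(rows):
    print(f"{name}: {value} {unit}")
```

Sorting on the normalized value while printing the original value and unit keeps the report readable and still puts the busiest VMs at the top.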
Vitaliy S.

Re: Disk command aborts

Post by Vitaliy S. »

Makes perfect sense, I agree. I hope you will be really impressed with our new performance reports that will be shipped with v6. Anyway, thanks for the feedback!