Disk command aborts

Post by **Daveyd** » Oct 13, 2011 8:34 pm

I am trying to figure out why we are getting sporadic disk command aborts on each of our ESX hosts. In Veeam Monitor, if I choose the Datacenter and look at the Disk tab, I see numerical values for the disk command abort counter for each of the Datastores. However, if I go into one of the Datastores in Veeam Monitor and open the Disk Issues tab, all VMs are listed, but no aborts are shown for the same time period. Any reason for that?

Post by **Vitaliy S.** » Oct 13, 2011 9:13 pm

Hi Dave, it would help if you could post some screenshots, as I don't have any disk command aborts in my lab. I am a lucky guy.

Thanks!

Post by **Daveyd** » Oct 17, 2011 6:05 pm

Here is the past day viewing the Disk aborts at the Host level...

And at the Datastore level

Post by **Daveyd** » Oct 17, 2011 6:41 pm

Also, I think the numbers are a little off on this screenshot. My maximums range from 6600 to 9281 ms, but the axis numbers show tens of thousands?

Post by **Vitaliy S.** » Oct 18, 2011 10:31 am

Dave, the only reason I can think of is that this host was previously connected to this datastore and you have since removed the datastore from the host's storage inventory.

Basically, if you choose a particular datastore, Veeam Monitor scans its existing connections to the hosts and displays the graph based on the results you currently have. In other words, historical data is retained within the host object, not the datastore object.

As for the Disk I/O tab, datastore latency is displayed as a stacked graph, because an arithmetic mean cannot be used as an indicator of average disk latency combined from multiple hosts.

Hope this makes sense.

Post by **Daveyd** » Oct 21, 2011 6:53 pm

When I run a Trend Report to view all SCSI aborts that happened during the past week, the report only shows daily totals. I wanted to see at what times of day, over a one-week period, I was receiving the most aborts. Is that possible?

Also, any chance a real-time zoom feature will be incorporated? I am demoing other products and I LOVE the ability that some have to look at a graph that has a day's or week's worth of data, hold down my left mouse button, highlight a section of the graph, and have it zoom into that specific time frame.

Post by **Vitaliy S.** » Oct 22, 2011 12:24 pm

Hi Dave,

Currently, this is not possible. Could you please help me to understand the use case for that? Will the corresponding alarm for SCSI aborts (with the exact time and number of aborts) be what you're looking for?

Real-time zoom would be possible only for a limited set of counters. As you may know, Veeam Monitor stores more than 60 performance metrics for each and every object, so keeping real-time data for every object and every counter might make your database unmanageable.

I would love to hear which counters you're most interested in, so we could incorporate this functionality into upcoming releases.

Thanks!

Post by **joergr** » Oct 24, 2011 1:17 pm

Wow, Dave, that topic seems interesting to me. Please give details about the ESXi version and patch level, SAN system/vendor/firmware, the connection, the HBAs (doesn't matter if iSCSI or FC, just provide vendor, model and firmware), and the switches used for the SAN (doesn't matter if iSCSI or FC, just provide vendor, model and firmware).

If there is no HBA and you use SW iSCSI or SW FCoE, please also report the card and the driver (for example Intel X520, dual, driver 2.0.84.9).

Please also check whether the hosts were encountering CPU peaks or very high network load during the storage latency/blackout phases.

If you don't have everything at hand, please at least tell me what you know.

best regards
Joerg

Post by **Daveyd** » Oct 24, 2011 3:59 pm

We are currently running ESX 4.0U3. Each server is an HP DL380 G6 with 2x HP FC1142SR 4Gb FC HBAs. They are all on the latest BIOS (2.15) and latest EFI (2.2.0). The HBAs are using VMware's driver 8.02.01-k1-vmw48-4vmw. We have a pair of EMC CX4s (1 in Prod, 1 in DR) running FLARE 30, I believe. We are also using EMC RecoverPoint appliances.

During the events there are no abnormal spikes in CPU or network utilization on the hosts. Working with EMC, we think we have isolated the issue to RecoverPoint. When I see SCSI aborts in Veeam Monitor, I see corresponding disconnect alerts on the RecoverPoint appliances.

Vitaliy, I was looking for a trend report that would show me hourly data for each day of the week, for 1 week. I wanted to be able to look at a report that showed me that on Monday between 8-9am I had xxx SCSI aborts and between 7-8pm I had xxx aborts, on Tuesday between 4-5pm I had xxx aborts, and so on for an entire week. That would be a nice report to see whether issues are happening between specific hours every day, or specific hours every other day, etc. The trend report that the Monitor produces now just shows me that I had xxx aborts that day.
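As a rough illustration of the hourly breakdown described above, here is a minimal Python sketch. It assumes the abort counter has been exported as (timestamp, abort count) samples; the function name `hourly_abort_totals` and the sample data are hypothetical and are not part of Veeam Monitor.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def hourly_abort_totals(samples):
    """Sum abort counts into (weekday index, hour) buckets.

    `samples` is an iterable of (datetime, abort_count) pairs.
    """
    totals = Counter()
    for ts, aborts in samples:
        totals[(ts.weekday(), ts.hour)] += aborts
    return totals


# Hypothetical exported samples (not real data from this thread).
samples = [
    (datetime(2011, 10, 17, 8, 15), 3),
    (datetime(2011, 10, 17, 8, 40), 2),
    (datetime(2011, 10, 17, 19, 5), 4),
    (datetime(2011, 10, 18, 16, 30), 1),
]

totals = hourly_abort_totals(samples)
for (day, hour), count in sorted(totals.items()):
    print(f"{DAYS[day]} {hour:02d}:00-{hour + 1:02d}:00  {count} aborts")
```

Sorting the buckets by count instead of by time would show which hours of the week are worst.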

The zoom feature... On one product I am demoing, SolarWinds Virtualization Manager, I can take an active graph of, say, 7 days' worth of a specific metric, say CPU utilization, and zoom in on that 7-day graph down to a specific hour on a specific day without having to open new graphs.

Post by **joergr** » Oct 24, 2011 4:55 pm

Hi Dave,

Thanks. Do you see these aborts when replication takes place, or randomly? I vaguely remember a case years ago where aborts were occurring during replication, but on the DR site... let me google it...

...found, this is the old thread http://communities.vmware.com/thread/78 ... 5&tstart=0
Maybe not your scenario, but it is all that comes to mind right now. I am much more familiar with EqualLogic.

Best regards,
Joerg

Post by **Vitaliy S.** » Oct 24, 2011 7:14 pm

I see... but would you like to have this ability (real-time zoom) for all performance metrics we have or only for a few of them (like CPU Usage, Memory Usage etc.)?

Post by **Daveyd** » Oct 25, 2011 1:37 pm

Vitaliy S. wrote: I see... but would you like to have this ability (real-time zoom) for all performance metrics we have or only for a few of them (like CPU Usage, Memory Usage etc.)?

As a customer, the more the better... as a developer, whatever is realistic.

Post by **Daveyd** » Oct 26, 2011 9:32 pm

Another feature request... I ran an HTML report on all my VMs to show their read and write rates during a specific time period. While it did produce a nice report, it would be nice if it were sortable. I would have liked to see the VMs with the highest MBps listed first, then the rest in descending order. The report shows all the VMs but in no particular order, and it's hard to look at 50 VMs and see which ones hit the highest throughput during that timeframe, since it lists both KBps and MBps.
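To make the request concrete, here is a minimal Python sketch of the kind of sorting meant above: convert every rate to one unit, then list the VMs in descending order. The VM names, the row layout, and the 1 MB = 1024 KB conversion are assumptions for illustration only; this is not how the Veeam report is actually generated.

```python
# Assumed conversion factors to MBps (1 MB = 1024 KB is an assumption).
UNIT_TO_MBPS = {"KBps": 1 / 1024, "MBps": 1.0}


def to_mbps(value, unit):
    """Convert a rate given in KBps or MBps to MBps."""
    return value * UNIT_TO_MBPS[unit]


def sort_by_throughput(rows):
    """Return rows ordered from highest to lowest throughput.

    Each row is a (vm_name, value, unit) tuple.
    """
    return sorted(rows, key=lambda r: to_mbps(r[1], r[2]), reverse=True)


# Hypothetical report rows mixing KBps and MBps.
rows = [
    ("vm-a", 900.0, "KBps"),
    ("vm-b", 12.5, "MBps"),
    ("vm-c", 2.0, "MBps"),
]

ranked = sort_by_throughput(rows)
for name, value, unit in ranked:
    print(f"{name}: {to_mbps(value, unit):.2f} MBps")
```

Normalizing before sorting matters here because 900 KBps would otherwise look larger than 12.5 MBps when compared as raw numbers.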

Post by **Vitaliy S.** » Oct 26, 2011 9:52 pm

Makes perfect sense, I agree. I hope you will be really impressed with our new performance reports that will be shipped with v6. Anyway, thanks for the feedback!
