Discussions specific to the VMware vSphere hypervisor
lobo519
Expert
Posts: 298
Liked: 34 times
Joined: Sep 29, 2010 3:37 pm

Retransmit Errors

Post by lobo519 » Feb 27, 2012 1:08 am

I recently migrated to an Equallogic 6100. Since then we have been receiving retransmit alerts from SAN HQ. Usually it's just over the operational limit, at 1 - 1.5%. They only occur during our backup window. We are running vSphere 5 and Veeam v6. We run two backup jobs using hot add mode (one backs up locally, one to a remote site); both use the same proxy.

Has anyone seen this type of behavior before? Any suggestions?

I currently have a ticket open with Dell.

Jfmoots
Veeam Software
Posts: 214
Liked: 26 times
Joined: Oct 28, 2011 3:26 pm
Full Name: James Moots
Location: Ohio, United States

Re: Retransmit Errors

Post by Jfmoots » Feb 27, 2012 1:31 am

lobo519 wrote:I recently migrated to an Equallogic 6100. Since then we have been receiving retransmit alerts from SAN HQ. Usually it's just over the operational limit, at 1 - 1.5%. They only occur during our backup window. We are running vSphere 5 and Veeam v6. We run two backup jobs using hot add mode (one backs up locally, one to a remote site); both use the same proxy.

Has anyone seen this type of behavior before? Any suggestions?

I currently have a ticket open with Dell.

I had 1 - 2% retransmits on an Equallogic group in my prior position. I worked with Dell and they pointed at my network switches. Ultimately, they said the 1 - 2% was not bad enough to warrant replacing the switches, but that the only way to get rid of the retransmits was to invest in an iSCSI-optimized switch.

What kind of switches are you running? I know a company that had some really nice high dollar switches and tossed them because of iSCSI retransmits. They ended up buying some HP switches with really impressive port buffers and they're now forever sold on HP switches :) I'm switch agnostic, but there are different switches that perform better under certain circumstances.

I suspect your re-transmits are occurring because of the sustained traffic on those ports and the switches' inability to keep up. Without looking at the entire set-up I'm just guessing, but with the information you've provided it's my best guess.

Re: Retransmit Errors

Post by lobo519 » Feb 27, 2012 2:59 am

Interesting... We are using Force10 switches recommended by Dell for use with Equallogic. If they tell me it's the switches I am going to have some words with Dell. They also mentioned that they may want to turn off jumbo frames and see if the retransmits stop. Haven't gotten to that point yet.

Re: Retransmit Errors

Post by Jfmoots » Feb 27, 2012 3:14 am

Those are nice switches. I had Extreme switches. Maybe just some tweaking...

My retransmits were the same with/without Jumbo Frames on. Mine occurred in sync with my every 5 minute Equallogic replication traffic. I would liken that to Veeam replication traffic. Large amounts of traffic at full speed and the switch seemed to struggle to keep up. It kept up, but I must have been right on the verge of its capability.

One important thing to note... The re-transmits aren't causing a problem, right? I chased the problem for a long time and actually left my job before nailing it down. I wondered to myself if SAN HQ was complaining about something that wasn't really a problem. My Dell/Equallogic tech wouldn't commit to 1-2% retransmits being abnormal. Did yours say it was a problem, or did he ask if it was causing any trouble?

tsightler
VP, Product Management
Posts: 5421
Liked: 2243 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Retransmit Errors

Post by tsightler » Feb 27, 2012 4:39 am

I also had Dell Equallogic equipment at my previous job and SAN HQ would complain about >1% during periods of very high loads, such as during backups, although we could reproduce them with artificial loads such as a multi-threaded sequential read benchmark. Are you using software iSCSI? If so, you need to make sure that you have enabled flow control on the physical NICs that are handling the traffic.

That being said, I was never able to completely eliminate the warnings, even with full flow control throughout the network. The problem seemed to be more prominent with software iSCSI, as opposed to systems that had hardware iSCSI HBAs. The retransmit rate was almost always just barely over the threshold, typically 1.2% or so. We saw this at multiple sites with multiple switch models from Cisco and Brocade, mostly Cisco 3750 and 4948 switches, but two sites had Brocade FCX stackables and one used a Brocade SX chassis switch. I was never really convinced it was a serious issue and thought that SAN HQ was a little too sensitive in reporting this issue.
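
For reference, a retransmit percentage like the 1.2% mentioned above can be worked out from two counters. This is only a minimal sketch, assuming the rate is retransmitted packets divided by total packets sent and that the warning fires above 1%. How SAN HQ actually computes its figure is not stated in this thread, and the function and constant names are made up for illustration.

```python
# Assumption: the alert threshold is 1%, based on the ">1%" figures quoted in this thread.
WARN_THRESHOLD_PERCENT = 1.0


def retransmit_percent(retransmitted, total_sent):
    """Return retransmitted packets as a percentage of all packets sent."""
    if total_sent <= 0:
        # No traffic in the sample window, so there is nothing to report.
        return 0.0
    return 100.0 * retransmitted / total_sent


def exceeds_threshold(retransmitted, total_sent, threshold=WARN_THRESHOLD_PERCENT):
    """True when the retransmit percentage is strictly above the threshold."""
    return retransmit_percent(retransmitted, total_sent) > threshold


if __name__ == "__main__":
    # Hypothetical counters: 12 retransmits out of 1000 packets sent.
    pct = retransmit_percent(12, 1000)
    print(f"{pct:.1f}% retransmits, over threshold: {exceeds_threshold(12, 1000)}")
```

With counters like these, a link can sit just over the line for an entire backup window without anything else being visibly wrong, which matches what several people describe here.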

Re: Retransmit Errors

Post by lobo519 » Feb 27, 2012 1:37 pm

Thanks for the info!

It doesn't seem to be causing any issues that I can see. I was wondering if it wasn't just too sensitive myself. Any idea if you can adjust the threshold?

We are using software iSCSI and Flow control is enabled on the switches and I believe is the default for ESXi (will double check now).

Re: Retransmit Errors

Post by lobo519 » Mar 06, 2012 6:49 pm

I have been working with Dell on this problem. They insisted that the retransmit errors were caused by my having an MD3000i on the same network as my Equallogic. I went ahead and separated the MD onto another network and vnic/vswitch. We still got the retransmit errors, but again, only during backups.

Re: Retransmit Errors

Post by lobo519 » Mar 12, 2012 3:14 pm

Apparently level 2 and 3 at Equallogic support have never heard of virtual appliance mode backup.

I continue to see retransmit errors.

Again it only occurs during backup....

There were a few posts above but has anyone else seen this issue? As others have mentioned, I'm not so sure that this is actually a problem.

Thanks!

Gostev
SVP, Product Management
Posts: 24804
Liked: 3566 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Retransmit Errors

Post by Gostev » Mar 13, 2012 5:58 pm

The other, more technical name for the virtual appliance mode backup is "hot add" backup.

Re: Retransmit Errors

Post by lobo519 » Mar 13, 2012 6:01 pm

That's actually the term I used with them. :wink:

Re: Retransmit Errors

Post by Gostev » Mar 13, 2012 6:22 pm

Considering that hot add backup is nothing more than reading the data from the virtual disk from within the VM that this virtual disk is attached to, this does not present any special workload on the storage infrastructure whatsoever. Obviously, this is exactly what every VM is doing when it boots up and runs.

Re: Retransmit Errors

Post by tsightler » Mar 13, 2012 11:35 pm

If you really want to see if you can reproduce this without backups, you should run some type of "read-only" benchmark against the disks for 30 minutes or so. Stress the disks out and see what happens. In our case we only received these alarms during backup because it was the only time we had sustained read loads in the hundreds of MB/s, but if we faked up a benchmark that did the same thing, we could reproduce the error. Once again, you can certainly keep working this in the hope of eventually getting an answer from Dell, but its only possible impact is performance, and if you're happy with the performance you are seeing, then a 1% retry rate is unlikely to make much of a difference (perhaps 1% difference).
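
A read-only load of the kind described above could be roughed out like this. It is a minimal sketch and not a real benchmarking tool: it assumes you run it inside a VM against a large existing file on the datastore you want to stress, the function names and 1 MiB chunk size are arbitrary choices, and the guest OS page cache can serve repeat reads, so the file should be much larger than RAM for the array to see the load.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read call; an arbitrary choice


def read_range(path, start, length):
    """Sequentially read `length` bytes beginning at offset `start`; return bytes read."""
    total = 0
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        while total < length:
            data = f.read(min(CHUNK_SIZE, length - total))
            if not data:
                break  # reached end of file early
            total += len(data)
    return total


def sequential_read_benchmark(path, threads=4):
    """Split the file into one contiguous range per thread and read them in parallel.

    Returns (total_bytes_read, elapsed_seconds, mib_per_second).
    """
    size = os.path.getsize(path)
    per_thread = size // threads
    ranges = []
    for i in range(threads):
        start = i * per_thread
        # The last thread also takes any remainder left by integer division.
        length = per_thread if i < threads - 1 else size - start
        ranges.append((start, length))

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        totals = list(pool.map(lambda r: read_range(path, r[0], r[1]), ranges))
    elapsed = time.perf_counter() - t0

    total_bytes = sum(totals)
    rate = total_bytes / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total_bytes, elapsed, rate
```

Calling `sequential_read_benchmark("/path/to/large_test_file", threads=4)` in a loop for half an hour (the path is a placeholder) and watching SAN HQ at the same time would show whether the alerts follow sustained reads in general rather than the backup job specifically.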

Re: Retransmit Errors

Post by Jfmoots » Mar 14, 2012 12:12 am

Switch Port buffer too small. Throw away old switches. Get new switches with larger port buffers :mrgreen:

Re: Retransmit Errors

Post by lobo519 » Mar 14, 2012 3:30 pm

FYI for anyone that cares.

Dell support had me disable Delayed ACK on the ESXi servers. Seems to have resolved the issue.

Re: Retransmit Errors

Post by Jfmoots » Mar 14, 2012 4:47 pm

lobo519 wrote:FYI for anyone that cares.

Dell support had me disable Delayed ACK on the ESXi servers. Seems to have resolved the issue.

I tried that, too. I hope it actually works for you. It made no difference in my environment.

Re: Retransmit Errors

Post by lobo519 » Mar 14, 2012 4:58 pm

One other thing that I changed was enabling the Management Network in the group manager. We only have one array so I didn't think it was necessary, but they had me enable it because, according to support, the array will still try to send iSCSI traffic to the dedicated management port if the Management network isn't configured in the group manager.

Starman
Enthusiast
Posts: 44
Liked: 10 times
Joined: Sep 27, 2011 5:11 pm
Full Name: Todd Leavitt

Re: Retransmit Errors

Post by Starman » Oct 29, 2013 5:56 pm

Did anyone have any words on the follow-up to this? I actually have a different issue. My Veeam backups don't trigger TCP retransmits at all; however, my Equallogic array replication to my DR site over a 30 meg point-to-point fiber spikes it at 5%+ when it's running. So I don't think Veeam has anything to do with it. I've also spent a lot of time on the phone with Dell and they just tell me it's because I'm going from 3 bonded gigabit NICs to a 30 meg connection (lame answer). I also have Force10 switches and have configured my Equallogic (ACK) per their white papers.
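
To put Dell's explanation in numbers, here is a rough back-of-the-envelope sketch. It assumes "30 meg" means 30 Mbit/s and that the three bonded gigabit NICs can offer up to 3000 Mbit/s in aggregate; neither is confirmed in the post, and the function names are hypothetical.

```python
def oversubscription_ratio(ingress_mbps, egress_mbps):
    """How many times faster the sending side can offer data than the link can carry."""
    return ingress_mbps / egress_mbps


def drain_seconds(burst_megabytes, egress_mbps):
    """Seconds needed to push a burst through the slow link (megabytes * 8 = megabits)."""
    return burst_megabytes * 8 / egress_mbps


if __name__ == "__main__":
    # Assumed figures: 3 x 1000 Mbit/s on the array side, 30 Mbit/s on the WAN link.
    ratio = oversubscription_ratio(3 * 1000, 30)
    print(f"Array side can offer {ratio:.0f}x what the link carries")
    # Hypothetical 30 MB replication burst.
    print(f"A 30 MB burst takes {drain_seconds(30, 30):.0f} s to drain")
```

A mismatch that large means anything the array sends faster than the link rate has to be queued or dropped somewhere along the path, so retransmits during replication would not be surprising under these assumptions, whether or not that is the full story.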
