-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Retransmit Errors
I recently migrated to an Equallogic 6100. Since then we have been receiving retransmit alerts from SAN HQ. Usually it's just over the operational limit, at 1 - 1.5%. They only occur during our backup window. We are running vSphere 5 and Veeam v6. We run two backup jobs using hot add mode (one backing up locally, one to a remote site); both use the same proxy.
Has anyone seen this type of behavior before? Any suggestions?
I currently have a ticket open with Dell.
-
- Veeam Software
- Posts: 215
- Liked: 26 times
- Joined: Oct 28, 2011 3:26 pm
- Full Name: James Moots
- Location: Ohio, United States
- Contact:
Re: Retransmit Errors
lobo519 wrote: I recently migrated to an Equallogic 6100. Since then we have been receiving retransmit alerts from SAN HQ. Usually it's just over the operational limit, at 1 - 1.5%. They only occur during our backup window. We are running vSphere 5 and Veeam v6. We run two backup jobs using hot add mode (one backing up locally, one to a remote site); both use the same proxy.
Has anyone seen this type of behavior before? Any suggestions?
I currently have a ticket open with Dell.
I had 1 - 2% retransmits on an Equallogic group in my prior position. I worked with Dell and they pointed at my network switches. Ultimately, they said the 1 - 2% was not bad enough to warrant replacing the switches but that the only way to get rid of them was to invest in an iSCSI optimized switch.
What kind of switches are you running? I know a company that had some really nice high-dollar switches and tossed them because of iSCSI retransmits. They ended up buying some HP switches with really impressive port buffers and they're now forever sold on HP switches. I'm switch agnostic, but different switches perform better under certain circumstances.
I suspect your retransmits are occurring because of the sustained traffic on those ports and the switches' inability to keep up. Without looking at the entire setup I'm just guessing, but with the information you've provided it's my best guess.
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
Interesting... We are using Force10 switches recommended by Dell for use with Equallogic. If they tell me it's the switches I am going to have some words with Dell. They also mentioned that they may want to turn off jumbo frames and see if the retransmits stop. Haven't gotten to that point yet.
-
- Veeam Software
- Posts: 215
- Liked: 26 times
- Joined: Oct 28, 2011 3:26 pm
- Full Name: James Moots
- Location: Ohio, United States
- Contact:
Re: Retransmit Errors
Those are nice switches. I had Extreme switches. Maybe just some tweaking...
My retransmits were the same with or without jumbo frames on. Mine occurred in sync with my Equallogic replication traffic, which ran every 5 minutes. I would liken that to Veeam replication traffic: large amounts of traffic at full speed, and the switch seemed to struggle to keep up. It kept up, but I must have been right on the verge of its capability.
One important thing to note... The retransmits aren't causing a problem, right? I chased the problem for a long time and actually left my job before nailing it down. I wondered to myself if SAN HQ was complaining about something that wasn't really a problem. My Dell/Equallogic tech wouldn't commit to 1-2% retransmits being abnormal. Did yours say it was a problem, or did he ask if it was causing any trouble?
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Retransmit Errors
I also had Dell Equallogic equipment at my previous job and SAN HQ would complain about >1% retransmits during periods of very high load, such as during backups, although we could also reproduce them with artificial loads such as a multi-threaded sequential read benchmark. Are you using software iSCSI? If so, you need to make sure that you have enabled flow control on the physical NICs that are handling the traffic.
That being said, I was never able to completely eliminate the warnings, even with full flow control throughout the network. The problem seemed to be more prominent with software iSCSI than on systems that had hardware iSCSI HBAs. The retransmit rate was almost always just barely over the threshold, typically 1.2% or so. We saw this at multiple sites with multiple switch models from Cisco and Brocade: mostly Cisco 3750 and 4948 switches, but two sites had Brocade FCX stackables and one used a Brocade SX chassis switch. I was never really convinced it was a serious issue and thought that SAN HQ was a little too sensitive in reporting it.
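For readers who want to sanity-check a figure like the 1.2% mentioned above, here is a minimal sketch of the arithmetic. It assumes the retransmit rate is simply retransmitted segments divided by segments sent; how SAN HQ actually computes its number is not stated in this thread. The function names, the counter values, and the 1% default threshold (taken from the ">1%" figure in this post) are illustrative assumptions.

```python
def retransmit_percent(retransmitted_segments, sent_segments):
    """Retransmitted segments as a percentage of all segments sent."""
    if sent_segments <= 0:
        raise ValueError("sent_segments must be positive")
    return 100.0 * retransmitted_segments / sent_segments


def over_threshold(percent, threshold_percent=1.0):
    """True when the rate exceeds the alert threshold (1% assumed here)."""
    return percent > threshold_percent


# Hypothetical counters sampled over one backup window.
rate = retransmit_percent(12_000, 1_000_000)
print(f"{rate:.2f}% retransmits, alert: {over_threshold(rate)}")
```

With counters like these the alert fires even though only about one segment in eighty is being resent, which is the "just barely over the threshold" situation described in the post.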
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
Thanks for the info!
It doesn't seem to be causing any issues that I can see. I was wondering if it wasn't just too sensitive myself. Any idea if you can adjust the threshold?
We are using software iSCSI. Flow control is enabled on the switches, and I believe it is on by default for ESXi (will double-check now).
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
I have been working with Dell on this problem. They insisted that because I had an MD3000i on the same network as my Equallogic, this was causing the retransmit errors. I went ahead and separated the MD onto another network and vnic/vswitch. We still got the retransmit errors - but again, only during backups.
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
Apparently Level 2 and Level 3 at Equallogic support have never heard of virtual appliance mode backup.
I continue to see retransmit errors.
Again it only occurs during backup....
There were a few posts above but has anyone else seen this issue? As others have mentioned, I'm not so sure that this is actually a problem.
Thanks!
-
- Chief Product Officer
- Posts: 31803
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Retransmit Errors
The other, more technical name for the virtual appliance mode backup is "hot add" backup.
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
That's actually the term I used with them.
-
- Chief Product Officer
- Posts: 31803
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Retransmit Errors
Considering that hot add backup is nothing more than reading the data from the virtual disk from within the VM that this virtual disk is attached to, this does not present any special workload on the storage infrastructure whatsoever. Obviously, this is exactly what every VM is doing when it boots up and runs.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Retransmit Errors
If you really want to see if you can reproduce this without backups, you should run some type of "read-only" benchmark against the disks for 30 minutes or so. Stress the disks and see what happens. In our case we only received these alarms during backup because it was the only time we had sustained read loads in the hundreds of MB/s, but if we faked up a benchmark that did the same thing, we could reproduce the error. Once again, you can certainly keep working this in the hope of eventually getting an answer from Dell, but its only possible impact is performance, and if you're happy with the performance you are seeing, then a 1% retry rate is unlikely to make much of a difference (perhaps a 1% difference).
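A minimal sketch of the kind of read-only load described above, for anyone without a benchmark tool to hand. It assumes a large existing test file on a volume backed by the array; the function names, the thread count, and the block size are illustrative, not anything used in this thread. Note that the OS page cache can serve repeated reads from memory, so the file should be much larger than RAM (or a purpose-built benchmark used instead) for the load to actually reach the SAN.

```python
import os
import threading
import time


def sequential_read(path, offset, length, block_size, totals, index):
    """Read `length` bytes sequentially from `offset`, discarding the data."""
    done = 0
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        while done < length:
            chunk = f.read(min(block_size, length - done))
            if not chunk:  # unexpected end of file
                break
            done += len(chunk)
    totals[index] = done


def run_read_benchmark(path, threads=4, block_size=1024 * 1024):
    """Split the file into one slice per thread and read all slices in parallel.

    Returns (total_bytes_read, elapsed_seconds).
    """
    size = os.path.getsize(path)
    slice_len = size // threads
    totals = [0] * threads
    workers = []
    start = time.perf_counter()
    for i in range(threads):
        offset = i * slice_len
        # The last thread also picks up any remainder bytes.
        length = slice_len if i < threads - 1 else size - offset
        worker = threading.Thread(
            target=sequential_read,
            args=(path, offset, length, block_size, totals, i),
        )
        workers.append(worker)
        worker.start()
    for worker in workers:
        worker.join()
    return sum(totals), time.perf_counter() - start
```

Calling `run_read_benchmark` in a loop until 30 minutes have elapsed, while watching SAN HQ, would approximate the test suggested in the post.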
-
- Veeam Software
- Posts: 215
- Liked: 26 times
- Joined: Oct 28, 2011 3:26 pm
- Full Name: James Moots
- Location: Ohio, United States
- Contact:
Re: Retransmit Errors
Switch port buffers too small. Throw away old switches. Get new switches with larger port buffers.
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
FYI for anyone that cares.
Dell support had me disable Delayed ACK on the ESXi servers. Seems to have resolved the issue.
-
- Veeam Software
- Posts: 215
- Liked: 26 times
- Joined: Oct 28, 2011 3:26 pm
- Full Name: James Moots
- Location: Ohio, United States
- Contact:
Re: Retransmit Errors
lobo519 wrote: FYI for anyone that cares.
Dell support had me disable Delayed ACK on the ESXi servers. Seems to have resolved the issue.
I tried that, too. I hope it actually works for you. It made no difference in my environment.
-
- Veteran
- Posts: 315
- Liked: 38 times
- Joined: Sep 29, 2010 3:37 pm
- Contact:
Re: Retransmit Errors
One other thing that I changed was enabling the Management Network in the group manager. We only have one array so I didn't think it was necessary, but they had me enable it because, according to support, the array will still try to send iSCSI traffic to the dedicated management port if the Management Network isn't configured in the group manager.
-
- Enthusiast
- Posts: 44
- Liked: 10 times
- Joined: Sep 27, 2011 5:11 pm
- Full Name: Todd Leavitt
- Contact:
Re: Retransmit Errors
Did anyone have any word on the follow-up to this? I actually have a different issue. My Veeam backups don't trigger TCP retransmits at all; however, my Equallogic array replication to my DR site over a 30 meg point-to-point fiber link spikes them to 5%+ when it's running. So I don't think Veeam has anything to do with it. I've also spent a lot of time on the phone with Dell and they just tell me it's because I'm going from 3 bonded gigabit NICs to a 30 meg connection (lame answer). I also have Force10 switches and have configured my Equallogic (ACK) per their white papers.
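To put Dell's explanation in numbers, here is a rough sketch of the speed mismatch described above. It assumes "30 meg" means 30 Mbit/s and that each bonded NIC can send at a full 1000 Mbit/s; the function name is illustrative. It only shows how far the senders can outrun the link, not what the resulting retransmit percentage should be.

```python
def oversubscription_ratio(source_links, source_mbps_each, wan_mbps):
    """How many times faster the sending side can transmit than the link can carry."""
    if wan_mbps <= 0:
        raise ValueError("wan_mbps must be positive")
    return source_links * source_mbps_each / wan_mbps


# 3 bonded gigabit NICs feeding a link assumed to be 30 Mbit/s.
ratio = oversubscription_ratio(source_links=3, source_mbps_each=1000, wan_mbps=30)
print(f"Senders can burst at {ratio:.0f}x the WAN rate")
```

Any burst the array sends faster than the link rate has to be queued by a device in the path, and whatever does not fit in that queue is dropped and later retransmitted, which is consistent with the port-buffer discussion earlier in the thread.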