-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
- Contact:
The word from Gostev on network traffic corruption
Now I want to know which NIC vendor Gostev is talking about... Maybe Broadcom?
"Traffic corruption is typically caused by malfunctioning network equipment - for example, a router with a bad RAM module with a sticky bit. And the more hops network traffic has to pass – such as in case of backup or replication over WAN – the bigger the chance of corruption is. Now, as you probably already know, TCP specification does include a checksum to mitigate the risks of errors being introduced into a TCP segment during its travel across the network, and this is what should be catching any data corruption - at least in theory. However, during our investigation, we have found that NICs from one very well-known hardware vendor were passing TCP packets with corrupted payload onto the OS, instead of rejecting them - despite those packets having invalid checksums! This was absolutely shocking finding for the team (less shocking for me, as I have already heard all sorts of bad feedback on this vendor’s networking hardware before – including on Veeam forums)."
Marco
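For readers who want to see what the checksum mentioned in the quote does and does not catch, here is a minimal Python sketch of the 16-bit one's-complement sum that TCP uses (in the style of RFC 1071). It is an illustration only: the real TCP checksum also covers a pseudo-header and the TCP header, which are left out here, and the names `internet_checksum` and `is_intact` are made up for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def is_intact(data: bytes, expected: int) -> bool:
    """Hypothetical receiver-side check: recompute and compare."""
    return internet_checksum(data) == expected


payload = bytes.fromhex("0001f203f4f5f6f7")  # sample bytes, as in RFC 1071
checksum = internet_checksum(payload)

# A single flipped bit (the "sticky bit" case) changes the sum and is caught.
flipped = bytes([payload[0] ^ 0x01]) + payload[1:]

# Swapping two 16-bit words leaves the sum unchanged, so it goes unnoticed.
swapped = payload[2:4] + payload[0:2] + payload[4:]

print(hex(checksum))                  # 0x220d
print(is_intact(flipped, checksum))   # False
print(is_intact(swapped, checksum))   # True
```

The last case is one reason the checksum only helps "in theory": it is a weak sum, and it protects nothing at all if the receiving NIC or driver does not actually reject segments that fail it.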
-
- Enthusiast
- Posts: 68
- Liked: 2 times
- Joined: Jun 14, 2012 10:56 am
- Full Name: Jamie Pert
- Location: twitter.com/jam1epert
- Contact:
Re: THE WORD FROM GOSTEV
Name and shame, for the benefit of anyone in the Veeam community who may currently be diagnosing a problem caused by this!
@jam1epert on Twitter
-
- Chief Product Officer
- Posts: 31812
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: THE WORD FROM GOSTEV
Marco, yep - you guessed it. But wasn't it easy to guess?
-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
- Contact:
Re: The word from Gostev on network traffic corruption
That's a shame... we sell Dell hardware and all the servers are equipped with Broadcom NICs.
Some months ago we spent some time trying to get Broadcom NICs working with jumbo frames on an iSCSI network... without luck.
Enabling jumbo frames caused the ESXi hypervisor to freeze.
Marco
-
- Chief Product Officer
- Posts: 31812
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: The word from Gostev on network traffic corruption
Well, we cannot know whether all Broadcom NIC models are affected by this issue, or just certain ones - those which happened to be used in that specific environment. And I don't know which models those were anyway.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: The word from Gostev on network traffic corruption
Over the years there have been a myriad of known issues with various NICs and TCP offload features such as TSO, which move the segmentation (and thus checksumming) down to the hardware layer. It's likely that the problem was specific to some combination of NIC hardware/firmware, hypervisor version, and perhaps even a specific OS/driver combination.
For example, quite a few years ago on our Dell R610s we saw major issues with TCP checksums when using the VMXNET3 driver but not with E1000, and only on Linux systems. We could not reproduce the behavior on Windows boxes, and we could work around the issue by disabling TSO within the Linux OS using ethtool. That being said, the problem was eventually corrected with a firmware update to the onboard NIC.
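As a rough sketch of the ethtool workaround described above, the Python below parses the feature list printed by `ethtool -k <iface>` and builds the `ethtool -K <iface> tso off` command. The helper names, the interface name `eth0`, and the sample output are assumptions for illustration; real output varies by driver and ethtool version, and actually changing the setting needs root.

```python
import subprocess


def parse_offload_features(ethtool_k_output: str) -> dict:
    """Parse `ethtool -k <iface>` text into {feature_name: enabled}."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips the "Features for eth0:" header and blank lines
        state = value.split()[0]  # drops suffixes such as "[fixed]"
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def disable_tso_command(iface: str) -> list:
    """Build the command that turns TCP segmentation offload off."""
    return ["ethtool", "-K", iface, "tso", "off"]


def disable_tso(iface: str) -> None:
    """Run the command (requires root and an installed ethtool)."""
    subprocess.run(disable_tso_command(iface), check=True)


# Illustrative sample only, not captured from a real host.
SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
"""

features = parse_offload_features(SAMPLE)
if features.get("tcp-segmentation-offload"):
    print(" ".join(disable_tso_command("eth0")))  # ethtool -K eth0 tso off
```

A setting changed this way does not normally survive a reboot, so it is best treated as a diagnostic step while waiting for a firmware or driver fix.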
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: The word from Gostev on network traffic corruption
I've recently seen dramatic improvements between firmware releases on Broadcom NICs, and as Tom said, even weird problems literally disappear after applying the new firmware. There is no real way to narrow the results down to specific ESXi/VM/NIC versions. What I can say for sure is that Intel chipsets (in vSphere at least) are noticeably more stable and predictable.
Sadly, HP and SuperMicro (the servers we use in our datacenter) also come with Broadcom onboard...
Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1