m.novelli
Veeam ProPartner
Posts: 566
Liked: 103 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy

The word from Gostev on network traffic corruption

Post by m.novelli » 1 person likes this post

Now I want to know which NIC vendor Gostev is talking about... Maybe Broadcom? 8)

"Traffic corruption is typically caused by malfunctioning network equipment - for example, a router with a bad RAM module with a sticky bit. And the more hops network traffic has to pass – such as in the case of backup or replication over WAN – the bigger the chance of corruption. Now, as you probably already know, the TCP specification does include a checksum to mitigate the risk of errors being introduced into a TCP segment during its travel across the network, and this is what should be catching any data corruption - at least in theory. However, during our investigation, we found that NICs from one very well-known hardware vendor were passing TCP packets with corrupted payloads on to the OS instead of rejecting them - despite those packets having invalid checksums! This was an absolutely shocking finding for the team (less shocking for me, as I had already heard all sorts of bad feedback on this vendor’s networking hardware before – including on the Veeam forums)."

Marco
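
Editor's note: the checksum mentioned in the quote above is the 16-bit ones' complement "Internet checksum" described in RFC 1071. The sketch below is only an illustration of how a receiver is supposed to notice a flipped bit. It sums the payload alone (the real TCP checksum also covers the TCP header and a pseudo-header), and the names internet_checksum and flip_bit, as well as the sample payload, are made up for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated sticky bit)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


payload = b"backup block 0001"  # hypothetical payload
sent_checksum = internet_checksum(payload)

damaged = flip_bit(payload, 5)

# A well-behaved receiver recomputes the checksum and drops the segment
# when it does not match the value that was sent.
print("intact payload accepted: ", internet_checksum(payload) == sent_checksum)
print("damaged payload accepted:", internet_checksum(damaged) == sent_checksum)
```

If a NIC or driver with checksum offload hands the damaged payload to the OS anyway, as described in the quote, this comparison never gets a chance to reject it, and only a check at the application level would still catch the corruption.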
Jamie Pert
Enthusiast
Posts: 68
Liked: 2 times
Joined: Jun 14, 2012 10:56 am
Full Name: Jamie Pert
Location: twitter.com/jam1epert

Re: THE WORD FROM GOSTEV

Post by Jamie Pert »

Name and shame, for the benefit of anyone in the Veeam community who may currently be diagnosing a problem where this is the cause!
@jam1epert on Twitter
Gostev
Chief Product Officer
Posts: 31812
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: THE WORD FROM GOSTEV

Post by Gostev »

Marco, yep - you guessed it. But wasn't it easy to guess? ;)

Re: The word from Gostev on network traffic corruption

Post by m.novelli »

That's a shame... we sell Dell hardware and all the servers are equipped with Broadcom NICs.
Some months ago we spent some time trying to get Broadcom NICs working with jumbo frames on an iSCSI network... without luck.
Enabling jumbo frames caused the ESXi hypervisor to freeze.

Marco
Re: The word from Gostev on network traffic corruption

Post by Gostev »

Well, we cannot know whether all Broadcom NIC models are affected by this issue, or just certain ones - those which happened to be used in that specific environment. And I don't know which models those were anyway.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: The word from Gostev on network traffic corruption

Post by tsightler » 1 person likes this post

Over the years there have been myriad known issues with various NICs and TCP offload features such as TSO, which move segmentation (and thus checksumming) down to the hardware layer. The problem was likely specific to some combination of NIC hardware/firmware, hypervisor version, and perhaps even a specific OS/driver combination.

For example, quite a few years ago on our Dell R610s we saw major issues with TCP checksums when using the VMXNET3 driver but not with E1000, and only on Linux systems. We could not reproduce the behavior on Windows boxes, and we could work around the issue by disabling TSO within the Linux OS using ethtool. That said, the problem was eventually corrected with a firmware update to the onboard NIC.
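
Editor's note: the post above does not give the exact commands used, so the following is only a sketch of what disabling TSO on a Linux guest with ethtool commonly looks like. It relies on the standard ethtool options -k (show offload settings) and -K (change them); the interface name eth0 and the helper names are assumptions for illustration. The helper defaults to a dry run and only returns the command line, because actually changing offload settings needs root and a real interface.

```python
import shutil
import subprocess


def show_offload_command(interface: str) -> list[str]:
    """Build the ethtool argv that lists the current offload settings."""
    return ["ethtool", "-k", interface]


def tso_off_command(interface: str) -> list[str]:
    """Build the ethtool argv that turns TCP segmentation offload off."""
    return ["ethtool", "-K", interface, "tso", "off"]


def disable_tso(interface: str, dry_run: bool = True) -> list[str]:
    """Return the command, and run it only when dry_run is False."""
    cmd = tso_off_command(interface)
    if not dry_run:
        if shutil.which("ethtool") is None:
            raise RuntimeError("ethtool not found on PATH")
        subprocess.run(cmd, check=True)  # requires root privileges
    return cmd


# "eth0" is a placeholder; substitute the guest's actual interface name.
print(" ".join(show_offload_command("eth0")))
print(" ".join(disable_tso("eth0")))
```

A setting changed this way does not survive a reboot unless it is also added to the distribution's network configuration, which fits the thread's point that the firmware update was the lasting fix.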
dellock6
VeeaMVP
Posts: 6166
Liked: 1971 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: The word from Gostev on network traffic corruption

Post by dellock6 » 1 person likes this post

I've recently seen dramatic improvements between firmware releases on Broadcom NICs, and as Tom said, even weird problems literally disappear after applying those firmware updates. There is no real way to narrow the results down to specific ESXi/VM/NIC versions. What I can say for sure is that Intel chipsets (in vSphere at least) are much more stable and predictable.
Sadly, HP or SuperMicro servers (the ones we use in our datacenter) also come with Broadcom onboard...

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1