-
- Novice
- Posts: 8
- Liked: never
- Joined: Jun 11, 2010 9:28 pm
- Full Name: Peter Mott
- Location: New Zealand
- Contact:
Win2012 Proxy can cause backup corruption
We have identified that the following configuration causes backups to be corrupted, with Veeam indicating success for the jobs (no warnings either).
Veeam Standard Edition 7.0.0.871
Veeam proxy is Windows Server 2012 R2 running on ESXi 5.0 with the E1000E network adapter and Windows NIC settings at defaults (TCP offloading enabled).
The cause of and workaround for this issue are documented here:
http://kb.vmware.com/selfservice/micros ... Id=2058692
Specifically, to correct this issue we set the following on the proxy server's NIC:
IPv4 Checksum Offload = disabled
Large Send Offload (IPv4) = disabled
TCP Checksum Offload = disabled
We restarted the proxy server after making the above changes.
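For anyone checking several proxies, the three settings above can be turned into a small checklist. This is only a sketch: the setting names are copied from this post and may be worded differently by your NIC driver, and the example values are hypothetical ones typed in by hand from the adapter's Advanced tab.

```python
# Settings named in the post; each should read "Disabled" on the proxy NIC.
# The exact display names may differ between driver versions (assumption).
REQUIRED_DISABLED = (
    "IPv4 Checksum Offload",
    "Large Send Offload (IPv4)",
    "TCP Checksum Offload",
)


def offloads_still_enabled(nic_settings):
    """Return the required settings that are missing or not set to Disabled."""
    problems = []
    for name in REQUIRED_DISABLED:
        value = nic_settings.get(name)
        if value is None or value.strip().lower() != "disabled":
            problems.append(name)
    return problems


# Hypothetical values for one proxy, entered by hand.
example = {
    "IPv4 Checksum Offload": "Disabled",
    "Large Send Offload (IPv4)": "Enabled",
    "TCP Checksum Offload": "Disabled",
}
print(offloads_still_enabled(example))
```

An empty list means all three settings are disabled on that proxy. A setting that is missing from the input is reported too, so a renamed driver property is not silently treated as safe.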
We recommend checking the integrity of all backups that your Windows 2012 proxy could have participated in if you are running ESXi 5.0 or ESXi 5.1:
"c:\program files\veeam\backup and replication\backup\veeam.backup.validator.exe" /backup:"job-name-here"
If you find a corrupted backup, select the job under Backup Disk, right-click it and choose "remove from backups". This forces a break in the chain (isolating the corrupted backups). Start the backup, which will automatically be a full. Redo the verification check to ensure you have a good backup. Do this for every job. If you have many jobs, it's going to take a long time. The old retention points are still there, but you can no longer trust them.
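If you have many jobs, the validator run can be scripted. Below is a minimal sketch in Python, assuming the validator lives at the path shown above and using only the /backup: switch from this post. The job names are placeholders, and what the exit code means is not covered in this thread, so the script keeps the full output for you to read instead of deciding pass or fail by itself.

```python
import subprocess

# Path as given in the post; adjust it if your installation differs.
VALIDATOR = r"c:\program files\veeam\backup and replication\backup\veeam.backup.validator.exe"


def validator_command(job_name, exe=VALIDATOR):
    """Build the argument list for one job, using only the /backup: switch."""
    return [exe, f"/backup:{job_name}"]


def validate_jobs(job_names, runner=subprocess.run):
    """Run the validator once per job and collect (exit code, output) per job."""
    results = {}
    for job in job_names:
        completed = runner(validator_command(job), capture_output=True, text=True)
        results[job] = (completed.returncode, completed.stdout)
    return results
```

On the backup server you would call something like validate_jobs(["job-name-here"]) with your own job names and then review the collected output for each job.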
Our experience is that the nature of the corruption prevented a full virtual machine restore, but individual file restores and even instant restore did work. The fact that a VM boots is no guarantee that there is no corruption somewhere on its disk. The validator is likely the only diagnostic you can safely trust. We don't run Enterprise Edition, so we have not tested SureBackup. However, we understand it uses the same technology as instant restore. My suspicion is SureBackup would not alert you to this hidden and very significant problem.
We discovered this issue doing routine test restores. We did not have to wait for a genuine need for a restore to discover that the backup was unusable. For the sake of your career, I recommend that everybody who reads this post carefully checks their configuration and, if vulnerable, takes urgent action.
Thank you for your time.
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Win2012 Proxy can cause backup corruption
Peter, thanks for sharing, first of all. The data corruption issue involving Windows Server 2012 and vSphere 5.0/5.1 was widely discussed and highlighted in the weekly community digest almost a year ago. I will take the liberty of quoting it here for those who have joined later:
Gostev wrote:
POSSIBLE DATA CORRUPTION ISSUE, MUST READ! Sorry for starting this way, but this is a big deal, and I wanted to make sure you catch this when scanning through the weekend spam. Apparently, VMware issued a support KB article on this issue a few weeks ago, but it totally flew under the radar (I have not seen a single tweet or blog about this). In short, any data flowing through the VM network stack may get corrupted - including file copies, remote clients interactions with databases, any client-server or multi-tiered apps.
The scariest part is that the scope of this issue is very significant. In fact, we might as well be facing the biggest data corruption issue in the history of virtualization. The issue may occur on any Windows Server 2012 VM with the default (E1000E) vNIC adaptor running on ESXi 5.0 and 5.1, which makes it probably around 20% of all VMs in the world. The easiest workaround is to change the vNIC type to VMXNET3 or E1000 (you should be able to apply this change in bulk with a PowerCLI script), or disable TCP Segmentation Offload in the guest operating system. Keep in mind that changing vNIC type may result in change of DHCP address, because the OS will see that as the new network adapter, so this may affect some applications. As such, disabling TCP Segmentation Offload may sometimes be a better choice, however this increases VM CPU usage.
Specifically to backups, even if some of your backup infrastructure components are running in a Windows Server 2012 VM, you should be safe if you are using Veeam Backup & Replication 6.5 or later. This was the version when we added inline network traffic verification to work around some unrelated data corruption issues involving faulty network equipment that we have observed in support. I had a big story about this in a weekly digest over one year ago. However, unfortunately your actual production data may already be corrupted, and unless you still have backups going all the way back to your vSphere 5.x or Windows Server 2012 upgrade times, this might be one of those cases of unrecoverable data loss... and worst of all, without running a compare against a copy of data that is known to be "good", it is impossible to say which specific parts of data are corrupted...
As per VMware support KB, the investigation is still on-going, so I would not yet jump to a conclusion that this is a bug with VMware. For example, we did see one mysterious data corruption issue during weeks of automated stress testing of our Windows Server 2012 support. We call it "10 bad bits mystery" internally, and it was affecting network transfers on both physical and virtual hardware. Unfortunately, the issue was impossible to reproduce reliably, so our investigation with Microsoft went nowhere (and we already had the problem covered with our network traffic verification anyway). But, if anyone from VMware R&D or support are reading this, feel free to reach out to me to discuss the data corruption pattern, as well as factors facilitating the issue surfacing – as this could be the same issue.
All that said, could you please tell us what kind of target repository you have (the one that stored the corrupt backups) and also check whether the 'Use multiple upload streams per job' check box is selected in the Traffic Throttling dialog? This would help us identify why you are still affected by the said issue (i.e. whether the mentioned network traffic verification is enabled).
Due to other issues that the default network adapter has (on vSphere 5.5 as well), VMXNET3 looks like a safer choice these days.
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Win2012 Proxy can cause backup corruption
Additionally, regarding your suspicion:
swizzle wrote:
My suspicion is SureBackup would not alert you to this hidden and very significant problem.
A SureBackup job can be set to validate backup consistency for exactly this kind of case (the same engine will perform a CRC check to identify possible file corruption).
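To illustrate why a CRC check catches corruption that a successful boot does not, here is a generic sketch of per-block CRC verification. It is not Veeam's implementation; the block size and function names are chosen only for the example. Checksums are recorded while the data is known to be good, and any block that is later altered no longer matches its recorded value.

```python
import zlib

BLOCK_SIZE = 1024  # arbitrary block size chosen for this illustration


def block_checksums(data, block_size=BLOCK_SIZE):
    """CRC-32 of each fixed-size block, computed while the data is known good."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def corrupted_blocks(data, expected, block_size=BLOCK_SIZE):
    """Indexes of blocks whose CRC-32 no longer matches the recorded value."""
    actual = block_checksums(data, block_size)
    if len(actual) != len(expected):
        raise ValueError("block count differs from the recorded checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Four blocks of sample data, then one byte damaged inside the third block.
original = bytes(range(256)) * 16
recorded = block_checksums(original)
damaged = bytearray(original)
damaged[2500] ^= 0xFF
print(corrupted_blocks(bytes(damaged), recorded))  # [2]
```

A single damaged byte is enough to flag the block, even though most of the data, and possibly everything needed to boot the VM, is still intact.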
-
- Novice
- Posts: 8
- Liked: never
- Joined: Jun 11, 2010 9:28 pm
- Full Name: Peter Mott
- Location: New Zealand
- Contact:
Re: Win2012 Proxy can cause backup corruption
Hi Alexander,
Please refer to open case #00639712
'Use multiple upload streams per job' is unchecked. Our lab tests confirm that Gostev's comment "you should be safe if you are using Veeam Backup & Replication 6.5 or later. This was the version when we added inline network traffic verification to work around some unrelated data corruption issues involving faulty network equipment that we have observed in support." is incorrect, or does not apply to our configuration.
I had seen the above post and concluded that, because we had installed VBR 7.x, there was no need to disable TCP offloading or change the adapter from E1000E, thinking Veeam would detect the problem and throw warnings. This was a bad call on my part.
Perhaps Veeam could review how the network traffic verification works and include improvements to detect this condition?
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Win2012 Proxy can cause backup corruption
swizzle wrote:
'Use multiple upload streams per job' is unchecked. Our lab tests confirm that Gostev's comment "you should be safe if you are using Veeam Backup & Replication 6.5 or later. This was the version when we added inline network traffic verification to work around some unrelated data corruption issues involving faulty network equipment that we have observed in support." is incorrect, or does not apply to our configuration.
That's why I asked you about the check box. There's a statement in the v7 release notes document:
Network traffic verification does not function when the Use multiple upload streams per job option is disabled.
-
- Novice
- Posts: 8
- Liked: never
- Joined: Jun 11, 2010 9:28 pm
- Full Name: Peter Mott
- Location: New Zealand
- Contact:
Re: Win2012 Proxy can cause backup corruption
"Network traffic verification does not function when the Use multiple upload streams per job option is disabled."
It's listed as a known issue. Do you know when this will be corrected?
Interestingly, Veeam reports "Network traffic verification detected no corrupted blocks" even when the feature is apparently disabled. Very misleading. Can that be recorded as a known issue as well?
We had disabled "Use multiple upload streams per job" when trying to correct an earlier problem (a ticket was raised at the time). It could be that the original fault was due to the TCP offloading issue, and that disabling "Use multiple upload streams per job" had the effect of stopping the network verification, hiding the symptom without correcting anything. We will run some more tests to see if this is the case.
If it is Veeam's intention to permanently have network verification disabled when multiple upload streams per job is disabled, perhaps a warning message when selecting that option is appropriate?
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Win2012 Proxy can cause backup corruption
Currently I don't have any information regarding when it is going to be fixed. I'm passing your feedback to R&D, thanks.
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Win2012 Proxy can cause backup corruption
Btw, what you could do to work around this after upgrading to v8 is set the number of streams to 1 (v8 will allow you to explicitly specify the number of allowed connections). In this case, network traffic verification will be enabled, but just a single upload stream will be used.
-
- Novice
- Posts: 8
- Liked: never
- Joined: Jun 11, 2010 9:28 pm
- Full Name: Peter Mott
- Location: New Zealand
- Contact:
Re: Win2012 Proxy can cause backup corruption
Hi,
Ultimately this is a trust issue. Network verification is there to provide additional confidence that the backup is likely to be good. When the application reports that network verification succeeded when it is not even enabled, you have what in the bricks and mortar world would be considered deceptive behaviour.
I sure hope it can be fixed soon. In any event, we are now getting 100% good backups.
Thanks for your time.
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Win2012 Proxy can cause backup corruption
I agree, this definitely will be addressed further down the road.
-
- Veeam Vanguard
- Posts: 395
- Liked: 169 times
- Joined: Nov 17, 2010 11:42 am
- Full Name: Eric Machabert
- Location: France
- Contact:
Re: Win2012 Proxy can cause backup corruption
This is why it is important to read this forum, the weekly digest and VMware-related newsletters, as well as to follow best practices when configuring virtual machines.
Some people are still facing year-old issues and getting themselves into a bad situation.
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023