vSphere CBT bug with QueryChangedDiskAreas("*")

by ChrisDriver » Thu Apr 26, 2018 10:13 am

I am really surprised this issue is not getting more attention.

Unless I am reading this wrong, Veeam backup jobs with 'Use changed block tracking data (recommended)' selected can cause corrupted backups. The corruption is not easy to spot and test restores using instant VM recovery or SureBackup don't always reveal the corruption.

Given that using CBT data is the recommended default when configuring a Veeam backup job, why isn't anyone apart from a few people freaking out about this issue?!

Is Veeam planning to offer any advice regarding this issue? Are there any workarounds? Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)'?

by Perdesthai » Thu Apr 26, 2018 4:51 pm

We freaked out enough to disable CBT use in all our jobs, but until we have more info on the problem there isn't really anything to say.

by Gostev » Thu Apr 26, 2018 7:31 pm

ChrisDriver wrote: Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)'?

In addition to that, after having disabled CBT you must also run an Active Full backup.
Gostev
Veeam Software

by mloeckle » Mon Apr 30, 2018 5:08 pm

Gostev wrote: In addition to that, after having disabled CBT you must also run an Active Full backup.

Why is an active full backup required? Doesn't disabling CBT cause Veeam to read the entire VMDK and compare that to what's already been backed up? If something was missed because of CBT, would it not be corrected in this scenario?
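The reasoning in this question can be sketched with a toy model. This is only an illustration under stated assumptions: the disk is a list of blocks, the backup is a block-for-block copy, and the function names (`incremental_with_cbt`, `incremental_without_cbt`) are hypothetical, not Veeam or VADP APIs. It also assumes every block can be read reliably during the full scan, which is exactly the assumption the reply below calls into question.

```python
import hashlib


def digest(block: bytes) -> str:
    """Hash one block so the source and backup copies can be compared."""
    return hashlib.sha256(block).hexdigest()


def incremental_with_cbt(disk, backup, reported_changed):
    """Copy only the blocks the (possibly faulty) change tracker reports."""
    for i in reported_changed:
        backup[i] = disk[i]


def incremental_without_cbt(disk, backup):
    """Read every block and copy those whose hash differs from the backup."""
    repaired = []
    for i, block in enumerate(disk):
        if digest(block) != digest(backup[i]):
            backup[i] = block
            repaired.append(i)
    return repaired


# Toy scenario: an initial full backup, then two blocks change on the source.
disk = [b"A", b"B", b"C", b"D"]
backup = list(disk)
disk[1] = b"B2"
disk[3] = b"D2"

# Hypothetical faulty tracker: it reports block 1 but misses block 3.
incremental_with_cbt(disk, backup, reported_changed=[1])
stale = [i for i in range(len(disk)) if backup[i] != disk[i]]

# A later run with change tracking disabled scans everything and repairs it.
repaired = incremental_without_cbt(disk, backup)
```

In this model the block missed by the tracker is picked up by the next full scan, so the backup converges without an Active Full. The model says nothing about a case where the read path itself skips areas of the disk.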

by Gostev » Mon Apr 30, 2018 8:39 pm

You are right, in theory it should not be needed in a world where VADP can be trusted. However, since there is now a possibility that the bug is in VADP itself (or rather in VADP/CBT interop), it does make sense to perform an Active Full after CBT has been disabled. Otherwise, who knows - a "poisoned" VADP may not see the issues even when doing full scans, for example because it does not even attempt to read VMDK areas it "thinks" are unallocated. And I am just trying to give something as bulletproof as possible to those who are concerned and want to be on the safe side as much as possible.

Personally, I would not do anything beyond SureBackup with app integrity checker test scripts on a few of the most critical VMs. This would be enough for me to conclude that my deployment is unaffected by this issue, forget about it and move on. Why? Because many facts tell me the scope of the issue must be quite small. I have been wrong before, but this is what my intuition and experience tell me in this case.

By the way, I have not provided any updates for the past couple of weeks simply because there have been no significant ones. VMware continues troubleshooting and investigating, collecting lots of data from the affected VM during the longest Webexes... which makes me really thankful to the affected customer for all of his patience with this matter :D

by staskorz » Tue May 08, 2018 8:57 am

An update from Mr. Gostev:

Finally, I have a significant update on the QueryChangedDiskAreas API bug in vSphere CBT. Please do treat this information as a "work in progress" update - normally, I would have held off sharing this until the official VMware KB article. However, this issue is just too high profile and has too many people not sleeping well over it - so I could not pass up sharing this good news (good for the majority of you - those NOT using VVols). Besides, I think VMware engineers have nailed it anyway: intuitively, VVols has been the primary suspect for me because it is a relatively new technology, which makes such teething issues somewhat expected. Plus, I am convinced many more Veeam customers would have been reporting actual corruptions due to this issue if it were not limited to a not-so-common deployment scenario.

Long story short, in their testing the VMware VADP QC team was able to reproduce an issue which looks similar to the one being investigated. Essentially, they observed CBT stop tracking changes after performing a regular vMotion (host change only) for VMs located on a VVols datastore. And they've reproduced the issue on storage devices from two different vendors, meaning the issue is most likely not storage-specific (apparently the CBT kernel module simply stops recording any changes after vMotion). On the bright side, all other datastore types - VMFS, NFS and vSAN - were also tested and found to be NOT affected by the issue... did I just hear a worldwide sigh of relief? And VVols users - sorry for the bad news, I'll keep you updated as we learn more from the VMware VADP and VVols teams.
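The failure mode described above (tracking silently stops after a vMotion on VVols) can be modelled with a small simulation. This is a sketch under loud assumptions: `ToyDisk`, `vmotion` and `untracked_changes` are hypothetical names invented for illustration, not the vSphere API or the actual CBT kernel module logic. It only shows why a backup that trusts the reported change list would miss writes made after the migration.

```python
class ToyDisk:
    """Toy virtual disk with a change tracker that can silently stop."""

    def __init__(self, n_blocks):
        self.blocks = [b"\0"] * n_blocks
        self.changed = set()
        self.tracking = True

    def write(self, index, data):
        self.blocks[index] = data
        # Writes are recorded only while the tracker is still working.
        if self.tracking:
            self.changed.add(index)

    def vmotion(self, on_vvols):
        # Assumption taken from the report above: tracking stops after a
        # host-only vMotion when the VM lives on a VVols datastore.
        if on_vvols:
            self.tracking = False

    def query_changed(self):
        """Return the block indexes the tracker believes have changed."""
        return sorted(self.changed)


def untracked_changes(before, after, reported):
    """Blocks that really differ but are missing from the reported list."""
    actual = {i for i, (a, b) in enumerate(zip(before, after)) if a != b}
    return sorted(actual - set(reported))


disk = ToyDisk(8)
baseline = list(disk.blocks)      # state captured by the previous backup

disk.write(2, b"x")               # tracked normally
disk.vmotion(on_vvols=True)       # tracker stops recording from here on
disk.write(5, b"y")               # this write is never recorded

reported = disk.query_changed()
missed = untracked_changes(baseline, disk.blocks, reported)
```

Here `reported` contains only block 2 while block 5 also changed, so an incremental backup driven by `reported` would keep a stale copy of block 5.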


TL;DR:
  1. Only affects VVols
  2. Not specific to Nimble (reproducible with 2 different storage vendors)

Gostev wrote: which makes me really thankful to the affected customer for all of his patience with this matter :D

:wink:

by Gostev » Tue May 08, 2018 9:07 am

Seriously, thank you. And thanks for reposting an update from the digest in this thread - I totally forgot to do it since I've been on the road for the past few days.

by staskorz » Tue May 08, 2018 9:18 am

Thank you! It's really not a given for a top manager to be personally involved in a specific case to such a degree.

by ITP-Stan » Wed May 16, 2018 2:47 pm

That's why Anton is the best!

by staskorz » Thu May 17, 2018 1:31 pm

Just got an update from VMware: a fix is on the way, currently planned to be officially released at the end of July 2018. That's for ESXi 6.0.

Bear in mind that ESXi versions 6.5 and 6.7 are also affected - this fix will also be ported to their respective update releases.