Discussions specific to the VMware vSphere hypervisor
ChrisDriver
Novice
Posts: 7
Liked: 1 time
Joined: Feb 26, 2013 4:47 pm
Full Name: Chris Driver

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by ChrisDriver » Apr 26, 2018 10:13 am

I am really surprised this issue is not getting more attention.

Unless I am reading this wrong, Veeam backup jobs with 'Use changed block tracking data (recommended)' selected can cause corrupted backups. The corruption is not easy to spot and test restores using instant VM recovery or SureBackup don't always reveal the corruption.

Bearing in mind that using CBT data is the recommended default when configuring a Veeam backup job, why is hardly anyone apart from a few people freaking out about this issue?!

Are Veeam planning to offer any advice regarding this issue? Are there any workarounds? Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)'?

Perdesthai
Novice
Posts: 4
Liked: never
Joined: Oct 03, 2014 1:05 pm

Post by Perdesthai » Apr 26, 2018 4:51 pm

We freaked out enough to disable CBT use in all our jobs, but until we had more info on the problem there wasn't really anything to say.

Gostev
Veeam Software
Posts: 22995
Liked: 2890 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » Apr 26, 2018 7:31 pm

ChrisDriver wrote: Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)'?
In addition to that, after having disabled CBT you must also run an Active Full backup.

mloeckle
Novice
Posts: 7
Liked: 9 times
Joined: May 30, 2013 10:04 pm
Full Name: Michael Loeckle

Post by mloeckle » Apr 30, 2018 5:08 pm

Gostev wrote: In addition to that, after having disabled CBT you must also run an Active Full backup.
Why is an active full backup required? Doesn't disabling CBT cause Veeam to read the entire VMDK and compare that to what's already been backed up? If something was missed because of CBT, would it not be corrected in this scenario?
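To make the reasoning in this question concrete, here is a toy Python model (all names are hypothetical; this is not how Veeam implements it) comparing a CBT-driven incremental, which copies only the blocks CBT reports, with a full-scan incremental that hashes every block and compares it against the previous backup. It assumes the full scan itself reads the disk correctly, which is exactly the assumption questioned in the reply that follows.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size in bytes; real backup tools use much larger blocks


def split_blocks(disk: bytes) -> list[bytes]:
    """Cut a disk image into fixed-size blocks."""
    return [disk[i:i + BLOCK_SIZE] for i in range(0, len(disk), BLOCK_SIZE)]


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def full_scan_changes(disk: bytes, backup_digests: list[str]) -> list[int]:
    """Read every block and return the indices that differ from the last backup.

    Assumes the disk size is unchanged between backups.
    """
    return [
        index
        for index, (block, old) in enumerate(zip(split_blocks(disk), backup_digests))
        if digest(block) != old
    ]


def apply_incremental(backup: list[bytes], disk: bytes, changed: list[int]) -> list[bytes]:
    """Copy only the listed blocks from the live disk into a copy of the backup image."""
    blocks = split_blocks(disk)
    updated = list(backup)
    for index in changed:
        updated[index] = blocks[index]
    return updated


if __name__ == "__main__":
    backup = split_blocks(b"AAAABBBBCCCCDDDD")
    live = b"AAAAXXXXYYYYDDDD"          # blocks 1 and 2 were written
    cbt_reported = [1]                  # hypothetical: CBT missed block 2
    via_cbt = apply_incremental(backup, live, cbt_reported)
    scan = full_scan_changes(live, [digest(b) for b in backup])
    via_scan = apply_incremental(backup, live, scan)
    print(b"".join(via_cbt) == live)    # stale block 2 -> False
    print(b"".join(via_scan) == live)   # full scan caught it -> True
```

In this model the full scan does repair the chain, but only because `split_blocks` reads every byte honestly; if the reader skipped regions it believed unallocated, those stale blocks would survive.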

Gostev
Veeam Software
Posts: 22995
Liked: 2890 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » Apr 30, 2018 8:39 pm

You are right; in theory it should not be needed in a world where VADP can be trusted. However, since there's now a possibility that the bug is in VADP itself (or rather in VADP/CBT interop), it does make sense to perform an Active Full after CBT has been disabled. Otherwise, who knows: maybe a "poisoned" VADP would not see issues even when doing full scans, for example, due to not even attempting to read VMDK areas it "thinks" are unallocated. I am just trying to offer something as bulletproof as possible for those who are concerned and want to be as safe as possible.

Personally, I would not do anything beyond SureBackup with application integrity checker test scripts on a few of the most critical VMs. That would be enough for me to conclude that my deployment is unaffected by this issue, forget about it and move on. Why? Because many facts tell me the scope of the issue must be quite small. I have been wrong before, but this is what my intuition and experience tell me in this case.

By the way, I have not provided any updates for the past couple of weeks simply because there have been no significant ones. VMware continues troubleshooting and investigating, collecting lots of data from the affected VM during the longest WebEx sessions... which makes me really thankful to the affected customer for all of his patience with this matter :D

staskorz
Enthusiast
Posts: 26
Liked: 18 times
Joined: Aug 06, 2017 10:12 am
Full Name: Stas Korzovsky

Post by staskorz » May 08, 2018 8:57 am

An update from Mr. Gostev:
Finally, I have a significant update on the QueryChangedDiskAreas API bug in vSphere CBT. Please do treat this information as a "work in progress" update; normally, I would have held off sharing it until the official VMware KB article. However, this issue is just too high-profile and has too many people not sleeping well over it, so I could not pass up sharing this good news (good for the majority of you: those NOT using VVols). Besides, I think VMware engineers have nailed it anyway, as intuitively VVols had been the primary suspect for me due to being a relatively new technology, making such teething issues somewhat expected. Plus, I am convinced many more Veeam customers would have been reporting actual corruptions due to this issue if it were not limited to some less common deployment scenario.

Long story short, in their testing the VMware VADP QC team was able to reproduce an issue which looks similar to the one being investigated. Essentially, they observed CBT stop tracking changes after performing a regular vMotion (host change only) for VMs located on a VVols datastore. They've reproduced the issue on storage devices from two different vendors, meaning the issue is most likely not storage-specific (apparently the CBT kernel module simply stops recording any changes after vMotion). On the bright side, all other datastore types (VMFS, NFS and VSAN) were also tested and found NOT to be affected by the issue... did I just hear a worldwide sigh of relief? And VVols users, sorry for the bad news; I'll keep you updated as we learn more from the VMware VADP and VVols teams.
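A minimal toy model of the behaviour described above, assuming (as the update says) that tracking simply stops after a host-only vMotion on a VVols datastore. `ToyChangeTracker` and `simulate` are hypothetical names, not VMware APIs. The point it illustrates is that the change query still succeeds and simply under-reports, which is why the resulting backups look healthy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChangeTracker:
    """Hypothetical stand-in for CBT: remembers which blocks were written since the last query."""

    tracking: bool = True
    changed: set[int] = field(default_factory=set)

    def record_write(self, block: int) -> None:
        if self.tracking:
            self.changed.add(block)

    def host_only_vmotion_on_vvols(self) -> None:
        # Models the reported bug: after the vMotion, writes are no longer recorded,
        # and nothing signals the failure to the caller.
        self.tracking = False

    def query_changes(self) -> set[int]:
        """Return and clear the recorded changes (loosely like querying CBT between backups)."""
        result, self.changed = self.changed, set()
        return result


def simulate(writes_before: list[int], writes_after: list[int]) -> tuple[set[int], set[int]]:
    """Return (blocks actually written, blocks the tracker reports) across one vMotion."""
    tracker = ToyChangeTracker()
    for block in writes_before:
        tracker.record_write(block)
    tracker.host_only_vmotion_on_vvols()
    for block in writes_after:
        tracker.record_write(block)
    return set(writes_before) | set(writes_after), tracker.query_changes()


if __name__ == "__main__":
    actual, reported = simulate([1, 2], [5, 7])
    print("missed by incremental:", sorted(actual - reported))  # [5, 7]
```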
TL;DR:
  1. Only affects VVols
  2. Not specific to Nimble (reproducible with two different storage vendors)
Gostev wrote: ...which makes me really thankful to the affected customer for all of his patience with this matter :D
:wink:

Gostev
Veeam Software
Posts: 22995
Liked: 2890 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » May 08, 2018 9:07 am

Seriously, thank you. And thanks for reposting an update from the digest in this thread - I totally forgot to do it since I've been on the road for the past few days.

staskorz
Enthusiast
Posts: 26
Liked: 18 times
Joined: Aug 06, 2017 10:12 am
Full Name: Stas Korzovsky

Post by staskorz » May 08, 2018 9:18 am

Thank you! It's really not a given for a top manager to be personally involved in a specific case to such a degree.

ITP-Stan
Service Provider
Posts: 88
Liked: 10 times
Joined: Feb 18, 2013 10:45 am
Full Name: Stan (IF-IT4U)

Post by ITP-Stan » May 16, 2018 2:47 pm

That's why Anton is the best!

staskorz
Enthusiast
Posts: 26
Liked: 18 times
Joined: Aug 06, 2017 10:12 am
Full Name: Stas Korzovsky

Post by staskorz » May 17, 2018 1:31 pm

Just got an update from VMware: a fix is on the way, currently planned to be officially released at the end of July 2018. That's for ESXi 6.0.

Bear in mind that ESXi versions 6.5 and 6.7 are also affected; this fix will also be ported to their respective update releases.

jsprinkleisg
Service Provider
Posts: 18
Liked: 4 times
Joined: Dec 09, 2009 9:59 pm
Full Name: James Sprinkle

Post by jsprinkleisg » Jun 21, 2018 6:01 pm

For reference, here's the link to the VMware KB article:
https://kb.vmware.com/kb/55800

Until the fix is available, VBR users can work around the issue as follows:

For each VM on VVols:
  • Disable automatic vMotions
  • Reset CBT and perform a full backup
  • Reset CBT again after any manual vMotion of the VM
CBT can be reset on running VMs using VMware PowerCLI. See the example commands in Veeam's KB article, or search the web for other examples.
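The reset that such scripts perform generally follows this order: disable CBT, create and remove a snapshot so the running VM picks up the change, re-enable CBT, then create and remove a snapshot again. Below is a hedged Python sketch of that order against a hypothetical `VmClient` interface. A real client might wrap pyVmomi's `ReconfigVM_Task` with a `vim.vm.ConfigSpec(changeTrackingEnabled=...)` plus `CreateSnapshot_Task`/`RemoveSnapshot_Task`, but that wiring is an assumption and is not shown. The stun/unstun comment reflects our understanding of VMware's guidance on toggling CBT; `RecordingClient` is a fake used only for dry runs.

```python
from typing import Protocol


class VmClient(Protocol):
    """Hypothetical interface; a real client might wrap pyVmomi or another vSphere SDK."""

    def set_change_tracking(self, vm: str, enabled: bool) -> None: ...

    def create_snapshot(self, vm: str, name: str) -> str: ...

    def remove_snapshot(self, vm: str, snapshot_id: str) -> None: ...


def stun_unstun(client: VmClient, vm: str) -> None:
    # A running VM only picks up a CBT setting change after a stun/unstun cycle;
    # creating and then removing a snapshot is a common way to trigger one.
    snapshot_id = client.create_snapshot(vm, "cbt-reset")
    client.remove_snapshot(vm, snapshot_id)


def reset_cbt(client: VmClient, vm: str) -> None:
    """Disable CBT and apply it, then re-enable CBT and apply it again."""
    client.set_change_tracking(vm, False)
    stun_unstun(client, vm)
    client.set_change_tracking(vm, True)
    stun_unstun(client, vm)


class RecordingClient:
    """Fake client that only records calls, for dry runs and tests."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, object]] = []
        self._counter = 0

    def set_change_tracking(self, vm: str, enabled: bool) -> None:
        self.calls.append(("set_cbt", vm, enabled))

    def create_snapshot(self, vm: str, name: str) -> str:
        self._counter += 1
        snapshot_id = f"snap-{self._counter}"
        self.calls.append(("create_snapshot", vm, snapshot_id))
        return snapshot_id

    def remove_snapshot(self, vm: str, snapshot_id: str) -> None:
        self.calls.append(("remove_snapshot", vm, snapshot_id))


if __name__ == "__main__":
    client = RecordingClient()
    for vm in ["vvol-vm-01"]:  # hypothetical VM list: every VM on VVols
        reset_cbt(client, vm)
    for call in client.calls:
        print(call)
```

Per the steps above, the reset should be followed by a full backup of the VM and repeated after any manual vMotion.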

F182
Service Provider
Posts: 18
Liked: 2 times
Joined: Jun 03, 2018 3:13 pm
Full Name: Farzon David Almaneih

Post by F182 » Jun 22, 2018 7:24 am

So I have clients with a mix of 5.x and 6.x. If none of them have VVols, can we safely keep CBT enabled? We are moving from StorageCraft to Veeam, and this issue has been a major pucker factor for us.

Gostev
Veeam Software
Posts: 22995
Liked: 2890 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » Jun 22, 2018 9:33 am

Sure.

ctg49
Influencer
Posts: 21
Liked: 9 times
Joined: Feb 14, 2018 1:47 pm
Full Name: Chris Garlington

Post by ctg49 » Aug 28, 2018 6:10 pm

I thought I'd bump this, as VMware has updated their KB article on this issue, specifically citing a resolution in several upcoming vSphere ESXi versions:
https://kb.vmware.com/s/article/55800
See resolution: "This issue is resolved in VMware vSphere 6.0p07, 6.0p08, 6.5p03, 6.5u2, and 6.7u1."

Hopefully we'll see a fix come out in the coming weeks or months (I think 6.7 U1, at least, is slated for October). After some vetting, I'll be excited to start migrating everything to VVols.

jsprinkleisg
Service Provider
Posts: 18
Liked: 4 times
Joined: Dec 09, 2009 9:59 pm
Full Name: James Sprinkle

Post by jsprinkleisg » Oct 10, 2018 5:35 pm

ctg49 wrote (Aug 28, 2018 6:10 pm):
https://kb.vmware.com/s/article/55800
See resolution: "This issue is resolved in VMware vSphere 6.0p07, 6.0p08, 6.5p03, 6.5u2, and 6.7u1."
They've removed this resolved-in list from the KB article. Now the resolution section says this:
This issue is resolved in ESXi600-201807001, available at VMware Downloads.

Note: This is a known issue affecting VMware ESXi 6.5.x and 6.7.x.
So, fixed for 6.0, but apparently no fix yet for 6.5 or 6.7. I thought "6.5u2" being in the list was suspect anyway, because that version was released way back in May.
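For a quick inventory pass, here is a Python sketch of the exposure check implied by this thread as of October 2018: only VMs on VVols are affected, 6.0 is fixed by ESXi600-201807001, and 6.5/6.7 had no listed fix. The patch-bundle name parsing and the assumption that later 6.0 bundles are cumulative (and so include the fix) are assumptions, not statements from the KB. `exposure` and `parse_bundle` are hypothetical helpers, and how you obtain each host's installed bundle name is left open.

```python
import re

# Hypothetical parser: assumes bundle names look like "ESXi600-201807001",
# i.e. ESXi<major><minor><digit>-<yyyymm><3-digit sequence>.
BUNDLE_RE = re.compile(r"ESXi(\d)(\d)\d-(\d{6})(\d{3})")


def parse_bundle(name: str) -> tuple[str, tuple[int, int]]:
    """Return ("6.0", (201807, 1)) style data for a patch bundle name."""
    match = BUNDLE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised bundle name: {name!r}")
    major, minor, yyyymm, seq = match.groups()
    return f"{major}.{minor}", (int(yyyymm), int(seq))


# Per the KB text quoted above (Oct 2018): fixed in ESXi600-201807001;
# 6.5.x and 6.7.x listed as a known issue with no fix yet.
FIXED_IN = {"6.0": parse_bundle("ESXi600-201807001")[1]}


def exposure(datastore_type: str, installed_bundle: str) -> str:
    """Classify a VM's exposure to the VVols CBT/vMotion bug."""
    if datastore_type.lower() != "vvol":
        # VMFS, NFS and vSAN were reported as not affected.
        return "not affected"
    version, installed = parse_bundle(installed_bundle)
    fixed = FIXED_IN.get(version)
    if fixed is None:
        return "affected, no fix listed"
    # Assumes ESXi patch bundles are cumulative, so any later 6.0 bundle includes the fix.
    return "fixed" if installed >= fixed else "affected, patch available"


if __name__ == "__main__":
    for ds, bundle in [("VMFS", "ESXi650-201803001"), ("VVOL", "ESXi600-201803001"),
                       ("VVOL", "ESXi670-201806001")]:
        print(ds, bundle, "->", exposure(ds, bundle))
```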
