ChrisDriver
Novice
Posts: 7
Liked: 1 time
Joined: Feb 26, 2013 4:47 pm
Full Name: Chris Driver
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by ChrisDriver » 1 person likes this post

I am really surprised this issue is not getting more attention.

Unless I am reading this wrong, Veeam backup jobs with 'Use changed block tracking data (recommended)' selected can cause corrupted backups. The corruption is not easy to spot and test restores using instant VM recovery or SureBackup don't always reveal the corruption.

Bearing in mind when configuring a Veeam backup job, using CBT data is recommended by default, why isn't anyone apart from a few people freaking out about this issue?!

Are Veeam planning to offer any advice regarding this issue? Are there any workarounds? Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)' ?
Perdesthai
Novice
Posts: 4
Liked: never
Joined: Oct 03, 2014 1:05 pm

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Perdesthai »

We freaked out enough to disable CBT use in all our jobs, but until we have more info on the problem there isn't really anything to say.
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev »

ChrisDriver wrote:Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)' ?
In addition to that, after having disabled CBT you must also run an Active Full backup.
mloeckle
Service Provider
Posts: 7
Liked: 9 times
Joined: May 30, 2013 10:04 pm
Full Name: Michael Loeckle
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by mloeckle »

Gostev wrote:In addition to that, after having disabled CBT you must also run an Active Full backup.
Why is an active full backup required? Doesn't disabling CBT cause Veeam to read the entire VMDK and compare that to what's already been backed up? If something was missed because of CBT, would it not be corrected in this scenario?
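To make the reasoning in this question concrete, here is a minimal Python sketch of a "full scan" incremental: every block of the disk is read and compared, by digest, against what was stored from the previous backup. This is only a toy model and not how Veeam actually implements it; the block size, function names and sample data are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def split_blocks(disk: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split a disk image into fixed-size blocks."""
    return [disk[i:i + block_size] for i in range(0, len(disk), block_size)]


def digest(block: bytes) -> str:
    """Digest used to compare a block with its previously backed-up version."""
    return hashlib.sha256(block).hexdigest()


def full_scan_incremental(disk: bytes, stored_digests: List[str]) -> Dict[int, bytes]:
    """Read every block and return those whose digest differs from the stored one."""
    changed = {}
    for index, block in enumerate(split_blocks(disk)):
        if index >= len(stored_digests) or digest(block) != stored_digests[index]:
            changed[index] = block
    return changed


previous = b"AAAABBBBCCCCDDDD"   # disk contents at the time of the last backup
current = b"AAAAXXXXCCCCDDDD"    # block 1 has changed since then
stored = [digest(b) for b in split_blocks(previous)]
changed = full_scan_incremental(current, stored)
print(sorted(changed))
```

Because no change-tracking data is consulted, a block that CBT failed to report would still be caught here, provided the read itself returns the real contents of the disk.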
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev » 1 person likes this post

You are right: in theory it should not be needed, in a world where VADP can be trusted. However, since there is now a possibility that the bug is in VADP itself (or rather in VADP/CBT interop), it does make sense to perform an Active Full after CBT has been disabled. Otherwise, who knows, maybe a "poisoned" VADP would not see issues even when doing full scans, for example because it does not even attempt to read VMDK areas it "thinks" are unallocated. And I am just trying to give something as bulletproof as possible for those who are concerned and want to be on the safe side as much as possible.

Personally, I would not do anything beyond SureBackup with app integrity checker test scripts on some most critical VMs. This would be enough for me to draw the conclusion that my deployment is unaffected by this issue, forget about it and move on. Why - because many facts tell me the scope of the issue must be quite small. I have been wrong before, but this is what my intuition and experience tell me in this case.

By the way, I have not provided any updates for the past couple of weeks just because there're no significant ones. VMware continues troubleshooting and investigation, collecting lots of data from the affected VM during the longest Webexes... which makes me really thankful to the affected customer for all of his patience with this matter :D
staskorz
Enthusiast
Posts: 26
Liked: 18 times
Joined: Aug 06, 2017 10:12 am
Full Name: Stas Korzovsky
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by staskorz » 1 person likes this post

An update from Mr. Gostev:
Finally, I have a significant update on the QueryChangedDiskAreas API bug in vSphere CBT. Please do treat this information as a "work in progress" update – normally, I would have held off sharing this until the official VMware KB article. However, this issue is just too high profile and has too many people not sleeping well over it – so, I could not pass up sharing this good news (good for the majority of you – those NOT using VVols). Besides, I think VMware engineers have nailed it anyway, as intuitively VVols has been the primary suspect for me due to being a relatively new technology, making such teething issues somewhat expected. Plus, I am convinced many more Veeam customers would have been reporting actual corruptions due to this issue, if it was not limited to some not so common deployment scenario.

Long story short, in their testing the VMware VADP QC team was able to reproduce an issue which looks to be similar to the issue that is being investigated. Essentially, they observed CBT stop tracking changes after performing a regular vMotion (host change only) for the VMs located on a VVols datastore. And they've reproduced the issue on storage devices from two different vendors, meaning the issue is most likely not a storage-specific one (apparently the CBT kernel module simply stops recording any changes after vMotion). On the bright side, all other datastore types – VMFS, NFS and VSAN – were also tested and found to be NOT affected by the issue... did I just hear a worldwide sigh of relief? And VVols users - sorry for the bad news, I'll keep you updated as we learn more from VMware VADP and VVols teams.
TL;DR:
  1. Only affects VVOLs
  2. Not specific to Nimble (reproducible with 2 different storage vendors)
which makes me really thankful to the affected customer for all of his patience with this matter :D
:wink:
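For anyone wondering why "CBT stops tracking changes" translates into corrupted incrementals, here is a small Python sketch. It is a toy model only: `ToyDisk`, its `tracking` flag and `query_changed_areas` are hypothetical names loosely inspired by the behaviour described in the update, not VMware's or Veeam's actual code.

```python
class ToyDisk:
    """A disk with a changed-block tracker that can silently stop recording."""

    def __init__(self, num_blocks):
        self.blocks = [0] * num_blocks
        self.changed = set()   # block indexes recorded since the last query
        self.tracking = True   # False mimics the tracker going quiet

    def write(self, index, value):
        self.blocks[index] = value
        if self.tracking:
            self.changed.add(index)

    def query_changed_areas(self):
        """Return and clear the recorded changes (loosely modelled on a CBT query)."""
        result, self.changed = sorted(self.changed), set()
        return result


def incremental_backup(disk, backup):
    """Copy only the blocks the tracker reports as changed."""
    for index in disk.query_changed_areas():
        backup[index] = disk.blocks[index]


disk = ToyDisk(8)
backup = list(disk.blocks)          # initial full backup

disk.write(2, 7)
incremental_backup(disk, backup)    # tracked change is picked up
assert backup == disk.blocks

disk.tracking = False               # assumption: stand-in for "CBT stops tracking after vMotion"
disk.write(5, 9)
incremental_backup(disk, backup)    # nothing is reported, so nothing is copied

stale_blocks = [i for i in range(8) if backup[i] != disk.blocks[i]]
print(stale_blocks)
```

The incremental completes without any error, yet the backup no longer matches the disk, which is why this kind of problem is hard to spot from job results alone.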
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev » 1 person likes this post

Seriously, thank you. And thanks for reposting an update from the digest in this thread - I totally forgot to do it since I've been on the road for the past few days.
staskorz
Enthusiast
Posts: 26
Liked: 18 times
Joined: Aug 06, 2017 10:12 am
Full Name: Stas Korzovsky
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by staskorz » 1 person likes this post

Thank you! It's really not an obvious thing for a top manager to be personally involved in a specific case to such a degree.
ITP-Stan
Expert
Posts: 214
Liked: 61 times
Joined: Feb 18, 2013 10:45 am
Full Name: Stan G
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by ITP-Stan » 1 person likes this post

That's why Anton is the best!
staskorz
Enthusiast
Posts: 26
Liked: 18 times
Joined: Aug 06, 2017 10:12 am
Full Name: Stas Korzovsky
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by staskorz »

Just got an update from VMware: a fix is on the way, currently planned to be officially released at the end of July 2018. That's for ESXi 6.0.

Bear in mind ESXi versions 6.5 and 6.7 are also affected - this fix will also be ported to their respective update releases.
jsprinkleisg
Service Provider
Posts: 26
Liked: 4 times
Joined: Dec 09, 2009 9:59 pm
Full Name: James Sprinkle
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by jsprinkleisg » 1 person likes this post

For reference, here's the link to the VMware KB article:
https://kb.vmware.com/kb/55800

Until the fix is available, VBR users can work around the issue as follows:

For each VM on VVols
  • Disable automatic vMotions
  • Reset CBT and perform a full backup
  • Reset CBT again after any manual vMotion of the VM
CBT can be reset on running VMs using VMware PowerCLI. See example commands at Veeam's KB article here, or search the web for other examples.
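The bookkeeping behind the steps above can be sketched in a few lines of Python. This does not reset CBT itself (that is what the PowerCLI commands are for); it only decides which VMs are due for a reset. The `VmRecord` fields, the datastore type strings and the sample inventory are all hypothetical and would have to come from your own records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VmRecord:
    name: str
    datastore_type: str                  # assumed labels, e.g. "VVOL", "VMFS", "NFS", "VSAN"
    last_vmotion: Optional[datetime]     # None if the VM was never migrated
    last_cbt_reset: Optional[datetime]   # None if CBT was never reset


def needs_cbt_reset(vm: VmRecord) -> bool:
    """Apply the workaround rules: only VVols VMs, reset once, then after each vMotion."""
    if vm.datastore_type.upper() != "VVOL":
        return False
    if vm.last_cbt_reset is None:
        return True
    return vm.last_vmotion is not None and vm.last_vmotion > vm.last_cbt_reset


# Hypothetical inventory for illustration only.
vms = [
    VmRecord("app01", "VVOL", datetime(2018, 6, 10), datetime(2018, 6, 1)),
    VmRecord("db01", "VVOL", None, datetime(2018, 6, 1)),
    VmRecord("web01", "VMFS", datetime(2018, 6, 10), None),
    VmRecord("file01", "VVOL", None, None),
]
to_reset = [vm.name for vm in vms if needs_cbt_reset(vm)]
print(to_reset)
```

Each VM flagged here would then get a CBT reset followed by a full backup, as described in the list above.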
F182
Service Provider
Posts: 19
Liked: 3 times
Joined: Jun 03, 2018 3:13 pm
Full Name: Farzon David Almaneih
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by F182 »

So I have clients with a mix of 5.x and 6.x. If none of them have VVols, can we have CBT enabled safely? We are moving from StorageCraft to Veeam and this issue has been a major pucker factor for us.
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev »

Sure.
ctg49
Enthusiast
Posts: 65
Liked: 45 times
Joined: Feb 14, 2018 1:47 pm
Full Name: Chris Garlington
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by ctg49 » 4 people like this post

I thought I'd bump this, as VMware has updated their KB article related to this issue specifically citing a resolution available in misc versions of vSphere ESXi that are upcoming:
https://kb.vmware.com/s/article/55800
See resolution: "This issue is resolved in VMware vSphere 6.0p07, 6.0p08, 6.5p03, 6.5u2, and 6.7u1."

Hopefully in the coming weeks/months (I think 6.7U1 at least is slated for OCT) we'll be seeing a fix come out. After some vetting, I'll be excited to start migrating everything to VVOLs.
jsprinkleisg
Service Provider
Posts: 26
Liked: 4 times
Joined: Dec 09, 2009 9:59 pm
Full Name: James Sprinkle
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by jsprinkleisg »

ctg49 wrote: Aug 28, 2018 6:10 pm https://kb.vmware.com/s/article/55800
See resolution: "This issue is resolved in VMware vSphere 6.0p07, 6.0p08, 6.5p03, 6.5u2, and 6.7u1."
They've removed this resolved-in list from the KB article. Now the resolution section says this:
This issue is resolved in ESXi600-201807001, available at VMware Downloads.

Note: This is a known issue affecting VMware ESXi 6.5.x and 6.7.x.
So, fixed for 6.0, but apparently no fix yet for 6.5 or 6.7. I thought "6.5u2" being in the list was suspect anyway, because that version was released way back in May.
ctg49
Enthusiast
Posts: 65
Liked: 45 times
Joined: Feb 14, 2018 1:47 pm
Full Name: Chris Garlington
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by ctg49 » 1 person likes this post

So, 6.7U1 has now been released, and specifically cited in the ESXi release notes is this:

PR 2119610: Migration of a virtual machine with a Filesystem Device Switch (FDS) on a vSphere Virtual Volumes datastore by using VMware vSphere vMotion might cause multiple issues
If you use vSphere vMotion to migrate a virtual machine with file device filters from a vSphere Virtual Volumes datastore to another host, and the virtual machine has either of the Changed Block Tracking (CBT), VMware vSphere Flash Read Cache (VFRC) or I/O filters enabled, the migration might cause issues with any of the features. During the migration, the file device filters might not be correctly transferred to the host. As a result, you might see corrupted incremental backups in CBT, performance degradation of VFRC and cache I/O filters, corrupted replication I/O filters, and disk corruption, when cache I/O filters are configured in write-back mode. You might also see issues with the virtual machine encryption.

This issue is resolved in this release.

This looks like the fix we've been waiting for for VVOLs and CBT, which makes me thrilled to hear. I look forward to hearing how it tests out.

Unfortunately, it also sounds like they introduced something rather gamebreaking (based on the sticky), so I'm holding off on any upgrades until that's resolved.
rboynton
Enthusiast
Posts: 60
Liked: 14 times
Joined: Jun 25, 2015 12:59 am
Full Name: Rick Boynton
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by rboynton »

It certainly would have been good news if they had not broken something with vCenter. It seems a bit strange that their QA did not catch that before the release. I would have thought that the pre-release would have been sent to vendors like Veeam for testing before the public release. I'm glad VMware was focused on getting the CBT/VVol issue resolved, but they really need to slow down and make sure what they produce is vetted.
ctg49
Enthusiast
Posts: 65
Liked: 45 times
Joined: Feb 14, 2018 1:47 pm
Full Name: Chris Garlington
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by ctg49 » 1 person likes this post

I suspect that like many major vendors (looking at you, MS), they just don't particularly care if their updates break integration with other products. They're the 'rock' and expect others to be the 'river' working around them, as it were.
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

Could CBT be the culprit behind instant recovery from a pure array snapshot breaking? Any time I try to vMotion a VM guest off a snapshot without having powered it on first, it fails, and Veeam support is suggesting turning off CBT to fix it (or rather, to test it).
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev »

No, this is some completely unrelated issue to what is being discussed in this topic.
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

Side note: even though it is not related to this thread, there is a bug being reported to VMware development regarding CBT (this is on the VMware side, not the Veeam side).
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev » 1 person likes this post

Do tell more!
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

You cannot vMotion a VMware guest if the snapshot was taken while the VM was powered on. If you want to vMotion it, you must power it on (then off if you like), and at that point you are able to vMotion it off the snapshot.
Gostev
Chief Product Officer
Posts: 31798
Liked: 7297 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by Gostev »

Hmm... but you said the issue was with CBT, not with vMotion? Was it just a typo?
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

you can't vmotion the vm guest off unless you power it on first. vmware and veeam are going back and forth to figure out the issue.

"12/17/2019 11:16:55 AM :: Relocating VM Error: Error caused by file /vmfs/volumes/5df90e91-36276908-6519-a4badb1e0a94/DATASTORE/VM_NAME.vmdk"
"12/17/2019 11:17:23 AM :: Failed to process VM CIname-TEST at 2019-12-17T111723 Error: Error caused by file /vmfs/volumes/5df90e91-36276908-6519-a4badb1e0a94/DATASTORE/VM_NAME.vmdk"
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

lots of email going back and forth but wanted to post here in case anyone else was running into this....

VMware Engineering informed us that in vSphere 6.7 the ctk file is checked before opening. In their view, it is more appropriate to report the "ctk file unclean" error than not to check it at that point.
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

veeam support tech is spinning up his lab to assist vmware in this since they had some questions.
cfizz34
Expert
Posts: 128
Liked: 14 times
Joined: Jul 02, 2010 2:57 pm
Full Name: Chad
Contact:

Re: vSphere CBT bug with QueryChangedDiskAreas("*")

Post by cfizz34 »

VMware took so long, kept asking for more logs which they should already have had, and kept getting confused about the goal, so I gave up on them and requested to be removed from the email thread. I'm not sure if it will ever get resolved.

Here are my case numbers if ever needed: 04011970 & 04051725