-
- Novice
- Posts: 7
- Liked: 1 time
- Joined: Feb 26, 2013 4:47 pm
- Full Name: Chris Driver
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
I am really surprised this issue is not getting more attention.
Unless I am reading this wrong, Veeam backup jobs with 'Use changed block tracking data (recommended)' selected can cause corrupted backups. The corruption is not easy to spot and test restores using instant VM recovery or SureBackup don't always reveal the corruption.
Bearing in mind when configuring a Veeam backup job, using CBT data is recommended by default, why isn't anyone apart from a few people freaking out about this issue?!
Are Veeam planning to offer any advice regarding this issue? Are there any workarounds? Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)'?
-
- Novice
- Posts: 4
- Liked: never
- Joined: Oct 03, 2014 1:05 pm
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
We freaked out enough to disable CBT use in all our jobs, but until we had more info on the problem there wasn't really anything to say.
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
ChrisDriver wrote: Is it sufficient to edit Veeam backup jobs and uncheck 'Use changed block tracking data (recommended)'?
In addition to that, after having disabled CBT you must also run an Active Full backup.
-
- Service Provider
- Posts: 7
- Liked: 9 times
- Joined: May 30, 2013 10:04 pm
- Full Name: Michael Loeckle
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Gostev wrote: In addition to that, after having disabled CBT you must also run an Active Full backup.
Why is an active full backup required? Doesn't disabling CBT cause Veeam to read the entire VMDK and compare that to what's already been backed up? If something was missed because of CBT, would it not be corrected in this scenario?
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
You are right: in theory it should not be needed in a world where VADP can be trusted. However, since there is now a possibility that the bug is in VADP itself (or rather in VADP/CBT interop), it does make sense to perform an Active Full after CBT has been disabled. Otherwise, who knows: a "poisoned" VADP may not see the issues even when doing full scans, for example because it does not even attempt to read VMDK areas it "thinks" are unallocated. I am just trying to give something as bulletproof as possible to those who are concerned and want to be as much on the safe side as possible.
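To make the concern concrete, here is a minimal toy sketch in Python. It is not Veeam's or VADP's actual implementation: the block list, the digest comparison, and the `skip` parameter are all assumptions made for illustration. It shows that a full read-and-compare pass finds every changed block, but only if the reader really visits every block; a reader that trusts a wrong "unallocated" map would still miss changes.

```python
import hashlib


def block_digest(block: bytes) -> str:
    """Digest used to compare a disk block against the backed-up copy."""
    return hashlib.sha256(block).hexdigest()


def full_scan_changes(disk_blocks, stored_digests, skip=frozenset()):
    """Read every block not listed in `skip` and return the indexes whose
    content differs from the backup.

    `skip` is a hypothetical stand-in for areas the reader believes are
    unallocated and therefore never reads.
    """
    changed = []
    for index, block in enumerate(disk_blocks):
        if index in skip:
            continue  # never read, so a change here cannot be detected
        if block_digest(block) != stored_digests[index]:
            changed.append(index)
    return changed


# State captured by the previous backup.
original = [b"a", b"b", b"c", b"d"]
stored = [block_digest(block) for block in original]

# Current disk contents: blocks 1 and 3 have changed since that backup.
current = [b"a", b"B", b"c", b"D"]

found_all = full_scan_changes(current, stored)
found_partial = full_scan_changes(current, stored, skip={3})
print(found_all, found_partial)
```

In this model the unrestricted scan reports both changed blocks, while the scan that skips block 3 silently leaves it stale. An Active Full sidesteps the comparison altogether by rebuilding the backup from scratch.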
Personally, I would not do anything beyond SureBackup with app integrity checker test scripts on some of the most critical VMs. This would be enough for me to draw the conclusion that my deployment is unaffected by this issue, forget about it and move on. Why? Because many facts tell me the scope of the issue must be quite small. I have been wrong before, but this is what my intuition and experience tell me in this case.
By the way, I have not provided any updates for the past couple of weeks just because there are no significant ones. VMware continues troubleshooting and investigation, collecting lots of data from the affected VM during the longest Webexes... which makes me really thankful to the affected customer for all of his patience with this matter.
-
- Enthusiast
- Posts: 26
- Liked: 18 times
- Joined: Aug 06, 2017 10:12 am
- Full Name: Stas Korzovsky
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
An update from Mr. Gostev:
Finally, I have a significant update on the QueryChangedDiskAreas API bug in vSphere CBT. Please do treat this information as a "work in progress" update – normally, I would have held off sharing this until the official VMware KB article. However, this issue is just too high profile and has too many people not sleeping well over it – so, I could not pass up sharing this good news (good for the majority of you – those NOT using VVols). Besides, I think VMware engineers have nailed it anyway, as intuitively VVols has been the primary suspect for me due to being a relatively new technology, making such teething issues somewhat expected. Plus, I am convinced many more Veeam customers would have been reporting actual corruptions due to this issue if it were not limited to a not so common deployment scenario.
Long story short, in their testing the VMware VADP QC team was able to reproduce an issue which looks similar to the issue being investigated. Essentially, they observed CBT stop tracking changes after performing a regular vMotion (host change only) for VMs located on a VVols datastore. And they've reproduced the issue on storage devices from two different vendors, meaning the issue is most likely not a storage-specific one (apparently the CBT kernel module simply stops recording any changes after vMotion). On the bright side, all other datastore types – VMFS, NFS and VSAN – were also tested and found to be NOT affected by the issue... did I just hear a worldwide sigh of relief? And VVols users - sorry for the bad news, I'll keep you updated as we learn more from VMware VADP and VVols teams.
TL;DR:
- Only affects VVols
- Not specific to Nimble (reproducible with 2 different storage vendors)
Gostev wrote: ...which makes me really thankful to the affected customer for all of his patience with this matter
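For anyone wondering why this kind of failure produces silent corruption, here is a small toy model in Python. It is only a sketch under stated assumptions: the `TrackedDisk` class, its `tracking` flag, and `query_changed_areas` are hypothetical names invented for illustration and are not the vSphere API. Flipping `tracking` to `False` mimics the reported behaviour, where change tracking stops recording writes after a vMotion.

```python
class TrackedDisk:
    """Toy disk with a change map standing in for CBT."""

    def __init__(self, num_blocks):
        self.blocks = [0] * num_blocks
        self.changed = set()   # indexes written since the last query
        self.tracking = True   # set to False to mimic the reported bug

    def write(self, index, value):
        self.blocks[index] = value
        if self.tracking:
            self.changed.add(index)

    def query_changed_areas(self):
        """Return the recorded changes and clear the map (hypothetical helper)."""
        changed, self.changed = sorted(self.changed), set()
        return changed


def incremental_backup(disk, backup):
    """Copy only the blocks the change map reports as modified."""
    for index in disk.query_changed_areas():
        backup[index] = disk.blocks[index]


disk = TrackedDisk(8)
backup = list(disk.blocks)  # initial full backup

# Normal operation: the write is tracked and the incremental picks it up.
disk.write(1, 11)
incremental_backup(disk, backup)
consistent_before = backup == disk.blocks

# Simulated vMotion on a VVols datastore: tracking silently stops.
disk.tracking = False
disk.write(5, 55)
incremental_backup(disk, backup)  # finishes without error, copies nothing

stale_blocks = [i for i, (b, d) in enumerate(zip(backup, disk.blocks)) if b != d]
print(consistent_before, stale_blocks)
```

The second incremental completes successfully yet leaves block 5 stale, which is why the job status alone gives no hint that anything went wrong.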
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Seriously, thank you. And thanks for reposting an update from the digest in this thread - I totally forgot to do it since I've been on the road for the past few days.
-
- Enthusiast
- Posts: 26
- Liked: 18 times
- Joined: Aug 06, 2017 10:12 am
- Full Name: Stas Korzovsky
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Thank you! It's really not a given for a top manager to be personally involved in a specific case to such a degree.
-
- Expert
- Posts: 214
- Liked: 61 times
- Joined: Feb 18, 2013 10:45 am
- Full Name: Stan G
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
That's why Anton is the best!
-
- Enthusiast
- Posts: 26
- Liked: 18 times
- Joined: Aug 06, 2017 10:12 am
- Full Name: Stas Korzovsky
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Just got an update from VMware: a fix is on the way, currently planned to be officially released at the end of July 2018. That's for ESXi 6.0.
Bear in mind that ESXi versions 6.5 and 6.7 are also affected - this fix will also be ported to their respective update releases.
-
- Service Provider
- Posts: 26
- Liked: 4 times
- Joined: Dec 09, 2009 9:59 pm
- Full Name: James Sprinkle
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
For reference, here's the link to the VMware KB article:
https://kb.vmware.com/kb/55800
Until the fix is available, VBR users can work around the issue as follows.
For each VM on VVols:
- Disable automatic vMotions
- Reset CBT and perform a full backup
- Reset CBT again after any manual vMotion of the VM
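The steps above boil down to a simple rule that could be scripted against an inventory export. The following Python sketch shows only the decision logic; the `VmRecord` fields and the sample inventory are hypothetical, and actually resetting CBT or starting a full backup would still have to be done with your own tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VmRecord:
    """Hypothetical inventory record; the field names are illustrative."""
    name: str
    datastore_type: str                       # e.g. "VVOL", "VMFS", "NFS", "VSAN"
    last_cbt_reset: Optional[datetime] = None
    last_vmotion: Optional[datetime] = None


def needs_cbt_reset(vm: VmRecord) -> bool:
    """Apply the workaround rule: only VMs on VVols are in scope, and they
    need a reset if CBT was never reset or if a vMotion happened afterwards."""
    if vm.datastore_type.upper() != "VVOL":
        return False
    if vm.last_cbt_reset is None:
        return True
    return vm.last_vmotion is not None and vm.last_vmotion > vm.last_cbt_reset


# Made-up sample data for illustration only.
inventory = [
    VmRecord("db01", "VVOL", datetime(2018, 7, 1), datetime(2018, 7, 5)),
    VmRecord("web01", "VVOL", datetime(2018, 7, 6), datetime(2018, 7, 5)),
    VmRecord("app01", "VVOL"),
    VmRecord("file01", "VMFS", None, datetime(2018, 7, 5)),
]

to_reset = [vm.name for vm in inventory if needs_cbt_reset(vm)]
print(to_reset)
```

With this sample data, `db01` (migrated after its last reset) and `app01` (never reset) are flagged, while `web01` and the VMFS-based `file01` are left alone.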
-
- Service Provider
- Posts: 19
- Liked: 3 times
- Joined: Jun 03, 2018 3:13 pm
- Full Name: Farzon David Almaneih
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
So I have clients with a mix of 5.x and 6.x. If none of them have VVols, can we have CBT enabled safely? We are moving from StorageCraft to Veeam and this issue has been a major pucker factor for us.
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
-
- Enthusiast
- Posts: 65
- Liked: 45 times
- Joined: Feb 14, 2018 1:47 pm
- Full Name: Chris Garlington
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
I thought I'd bump this, as VMware has updated their KB article related to this issue specifically citing a resolution available in misc versions of vSphere ESXi that are upcoming:
https://kb.vmware.com/s/article/55800
See resolution: "This issue is resolved in VMware vSphere 6.0p07, 6.0p08, 6.5p03, 6.5u2, and 6.7u1."
Hopefully in the coming weeks/months (I think 6.7U1 at least is slated for OCT) we'll be seeing a fix come out. After some vetting, I'll be excited to start migrating everything to VVOLs.
-
- Service Provider
- Posts: 26
- Liked: 4 times
- Joined: Dec 09, 2009 9:59 pm
- Full Name: James Sprinkle
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
ctg49 wrote: Aug 28, 2018 6:10 pm
https://kb.vmware.com/s/article/55800
See resolution: "This issue is resolved in VMware vSphere 6.0p07, 6.0p08, 6.5p03, 6.5u2, and 6.7u1."
They've removed this resolved-in list from the KB article. Now the resolution section says this:
"This issue is resolved in ESXi600-201807001, available at VMware Downloads.
Note: This is a known issue affecting VMware ESXi 6.5.x and 6.7.x."
So, fixed for 6.0, but apparently no fix yet for 6.5 or 6.7. I thought "6.5u2" being in the list was suspect anyway, because that version was released way back in May.
-
- Enthusiast
- Posts: 65
- Liked: 45 times
- Joined: Feb 14, 2018 1:47 pm
- Full Name: Chris Garlington
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
So, 6.7U1 has now been released, and specifically cited in the ESXi release notes is this:
PR 2119610: Migration of a virtual machine with a Filesystem Device Switch (FDS) on a vSphere Virtual Volumes datastore by using VMware vSphere vMotion might cause multiple issues
If you use vSphere vMotion to migrate a virtual machine with file device filters from a vSphere Virtual Volumes datastore to another host, and the virtual machine has either of the Changed Block Tracking (CBT), VMware vSphere Flash Read Cache (VFRC) or I/O filters enabled, the migration might cause issues with any of the features. During the migration, the file device filters might not be correctly transferred to the host. As a result, you might see corrupted incremental backups in CBT, performance degradation of VFRC and cache I/O filters, corrupted replication I/O filters, and disk corruption, when cache I/O filters are configured in write-back mode. You might also see issues with the virtual machine encryption.
This issue is resolved in this release.
This looks like the fix we've been waiting for for VVOLs and CBT, which makes me thrilled to hear. I look forward to hearing how it tests out.
Unfortunately, it also sounds like they introduced something rather gamebreaking (based on the sticky) so holding off on any upgrades until that's resolved.
-
- Enthusiast
- Posts: 60
- Liked: 14 times
- Joined: Jun 25, 2015 12:59 am
- Full Name: Rick Boynton
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
It certainly would have been good news if they had not broken something with vCenter. Seems a bit strange that their QA did not catch that before the release. I would have thought that the pre-release would have been sent to vendors like Veeam for testing before the public release. I'm glad VMware was focused on getting the CBT/VVol issue resolved, but they really need to slow down and make sure what they produce is vetted.
-
- Enthusiast
- Posts: 65
- Liked: 45 times
- Joined: Feb 14, 2018 1:47 pm
- Full Name: Chris Garlington
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
I suspect that like many major vendors (looking at you, MS), they just don't particularly care if their updates break integration with other products. They're the 'rock' and expect others to be the 'river' working around them, as it were.
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Would CBT be a culprit to breaking instant recovery from a pure array snapshot? Any time I try to vmotion a VM guest off a snapshot and have not powered it on, it fails and veeam support is suggesting turning off CBT to fix it (or test it that is).
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
No, this is some completely unrelated issue to what is being discussed in this topic.
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Side note: even though it is not related to this thread, there is a bug being reported to VMware development in regards to CBT (this is on the VMware side, not the Veeam side).
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Do tell more!
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
You cannot vMotion a VMware guest if the snapshot was taken while the VM was powered on. If you want to vMotion it, you must power it on (then off if you like), and at that point you are able to vMotion it off the snapshot.
-
- Chief Product Officer
- Posts: 31798
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
Hmm... but you said the issue was with CBT, not with vMotion? Was it just a typo?
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
You can't vMotion the VM guest off unless you power it on first. VMware and Veeam are going back and forth to figure out the issue.
"12/17/2019 11:16:55 AM :: Relocating VM Error: Error caused by file /vmfs/volumes/5df90e91-36276908-6519-a4badb1e0a94/DATASTORE/VM_NAME.vmdk"
"12/17/2019 11:17:23 AM :: Failed to process VM CIname-TEST at 2019-12-17T111723 Error: Error caused by file /vmfs/volumes/5df90e91-36276908-6519-a4badb1e0a94/DATASTORE/VM_NAME.vmdk"
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
lots of email going back and forth but wanted to post here in case anyone else was running into this....
According to VMware Engineering, in vSphere 6.7 the ctk file is checked before it is opened. They consider it more appropriate to report the ctk file unclean error than not to check it at that point.
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
veeam support tech is spinning up his lab to assist vmware in this since they had some questions.
-
- Expert
- Posts: 128
- Liked: 14 times
- Joined: Jul 02, 2010 2:57 pm
- Full Name: Chad
- Contact:
Re: vSphere CBT bug with QueryChangedDiskAreas("*")
VMware took so long, kept asking for more logs that they should already have had, and kept getting confused about the goal, so I gave up on them and asked to be removed from the email thread. I'm not sure it will ever get resolved.
Here are my case numbers if ever needed: 04011970 & 04051725