-
- Veteran
- Posts: 381
- Liked: 38 times
- Joined: Oct 17, 2013 10:02 am
- Full Name: Mark
- Location: UK
- Contact:
Re: VMware CBT bug KB 2090639
Hi
I've regularly expanded data disks since using Veeam, and every time I do, the Job detects that the disk has changed so disables CBT for the job.
e.g.
14/10/2014 02:11:57 :: Disk [10Gblabla] BLA-SQL-PUB-01/BLA-SQL-PUB-01_3.vmdk size changed. Changed block tracking is disabled.
So if that happens, I don't need to worry about this bug?
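For anyone who wants to pull these warnings out of exported job logs, here is a minimal Python sketch. It assumes the message always looks like the single line quoted above, with a dd/mm/yyyy timestamp and a VMDK path without spaces; other Veeam versions or locales may format it differently. The name `find_size_changes` is made up for illustration.

```python
import re
from datetime import datetime

# Pattern modelled on the one log line quoted in this post (assumption:
# other versions or locales may word or format the message differently).
SIZE_CHANGED = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) :: "
    r"Disk \[(?P<datastore>[^\]]+)\] (?P<path>\S+\.vmdk) size changed\."
)

def find_size_changes(log_text):
    """Return (timestamp, datastore, vmdk_path) for each 'size changed' line."""
    hits = []
    for line in log_text.splitlines():
        m = SIZE_CHANGED.match(line.strip())
        if m:
            # Timestamp is assumed to be day/month/year, as in the sample.
            ts = datetime.strptime(m.group("ts"), "%d/%m/%Y %H:%M:%S")
            hits.append((ts, m.group("datastore"), m.group("path")))
    return hits

sample = (
    "14/10/2014 02:11:57 :: Disk [10Gblabla] "
    "BLA-SQL-PUB-01/BLA-SQL-PUB-01_3.vmdk size changed. "
    "Changed block tracking is disabled."
)
for ts, datastore, path in find_size_changes(sample):
    print(ts.isoformat(), datastore, path)
```

This only lists when a size change was logged; it says nothing about whether the backup taken afterwards is intact.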
-
- Expert
- Posts: 235
- Liked: 37 times
- Joined: Aug 06, 2013 10:40 am
- Full Name: Peter Jansen
- Contact:
Re: VMware CBT bug KB 2090639
So, to summarize what we know so far:
- expanding by more than 128GB can cause the bug, no matter the size of the vmdk
- you don't need to disable CBT (which requires a VM power off)
- a CBT reset for the VM is sufficient and can be accomplished by a storage vMotion (which up until vSphere 5.5 causes a CBT reset and thus a full backup)
- anyway, to get rid of the potential bug a full backup is inevitable so that the CBT tables are properly fixed
- and every time you expand by more than 128GB you need to do that (until the bug is fixed).
Correct?
-
- Veteran
- Posts: 381
- Liked: 38 times
- Joined: Oct 17, 2013 10:02 am
- Full Name: Mark
- Location: UK
- Contact:
Re: VMware CBT bug KB 2090639
Peejay62 wrote: CBT reset for the VM is sufficient and can be accomplished by a storage vMotion (which up until vSphere 5.5 causes a CBT reset and thus a full backup)
I'm not aware that storage vMotion on 5.0/5.1 resets CBT. We do this a lot and I'm sure it doesn't force a full backup in Veeam.
-
- Enthusiast
- Posts: 41
- Liked: 1 time
- Joined: Sep 07, 2009 11:58 am
- Full Name: Dirk Reimold
- Contact:
Re: VMware CBT bug KB 2090639
Gostev wrote: One other thing that we have confirmed by now is that the size of virtual disk before or after expansion does not seem to matter. What matters is whether the virtual disk was increased for more than 128GB in size at once. For example, 200GB>300GB expansion is fine, but 200GB>350GB will cause CBT bug.
That will take a little pressure off us, since we usually did not expand a disk by more than 50 GB in one step.
I think what would be helpful now is a timeframe for when a fix for that error could be available, so that everyone could decide between waiting for a fix and manually resetting CBT.
For me it is still hard to accept that I replicate a fileserver to our standby datacenter and back up that VM to an offsite repository, and on a failure of the original VM I could end up having lost everything - not to mention that all my historical backups may be corrupt.
Thanks
Dirk
-
- Expert
- Posts: 235
- Liked: 37 times
- Joined: Aug 06, 2013 10:40 am
- Full Name: Peter Jansen
- Contact:
Re: VMware CBT bug KB 2090639
lando_uk wrote: I'm not aware that storage vmotion on 5.0/5.1 resets CBT, we do this a lot and I'm sure it doesn't force a full backup in Veeam.
I stumbled into that after moving 1,6 TB of VM's...
kb.vmware.com/kb/2048201
-
- Influencer
- Posts: 23
- Liked: 4 times
- Joined: Jan 16, 2011 10:24 am
- Full Name: Wouter
- Contact:
Re: VMware CBT bug KB 2090639
isaako wrote: Just subscribing to this thread awaiting more info.
Me too.
-
- Influencer
- Posts: 23
- Liked: 7 times
- Joined: Jun 13, 2010 10:36 pm
- Contact:
Re: VMware CBT bug KB 2090639
Gostev wrote: What matters is whether the virtual disk was increased for more than 128GB in size at once. For example, 200GB>300GB expansion is fine, but 200GB>350GB will cause CBT bug.
Anton, a poster on reddit stated VMware support told him the following:
http://www.reddit.com/r/sysadmin/commen ... ?context=3
Just had this in from VMware support: "Yes the vmdk which is extended by 20 GB ten times will be affected with this issue as the expansion of disk is more than 128 GB when added together."
Can you please confirm if this is, or is not, the case.
Thanks
-
- Veteran
- Posts: 381
- Liked: 38 times
- Joined: Oct 17, 2013 10:02 am
- Full Name: Mark
- Location: UK
- Contact:
Re: VMware CBT bug KB 2090639
Peejay62 wrote: I stumbled into that after moving 1,6 TB of VM's...
kb.vmware.com/kb/2048201
I've not seen this myself, and I just backtracked through my logs to check a VM that I moved the other week - it didn't reset CBT with a storage vMotion.
The only time I've seen Veeam trigger a CBT reset is when you expand a vdisk, or when the VM has a new ID (if you do a quick migration from an old vCenter to a new vCenter).
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
Ratcha wrote: Can you please confirm if this is, or is not, the case.
Hi Ratcha, as I have noted earlier, it will take us some time to test all possible scenarios. Unlike VMware, we don't have access to the source code, and so can only find out things empirically through testing. I cannot comment on specific statements right now, but we will post the summary about this issue as soon as all the required testing is completed. Thanks!
-
- Veteran
- Posts: 391
- Liked: 39 times
- Joined: Jun 08, 2010 2:01 pm
- Full Name: Joerg Riether
- Contact:
Re: VMware CBT bug KB 2090639
Reimold wrote: I think what would be helpful now is a timeframe for when a fix for that error could be available, so that everyone could decide between waiting for a fix and manually resetting CBT.
For me it is still hard to accept that I replicate a fileserver to our standby datacenter and back up that VM to an offsite repository, and on a failure of the original VM I could end up having lost everything - not to mention that all my historical backups may be corrupt.
Dirk, CBT relies completely on VMware APIs and VMware technology. There is no "fix" Veeam can provide. Veeam can only provide tricks or workarounds - for example checking whether the size has changed and then forcing a CBT reset.
So - this is just to clarify - this is a VMware issue. VMware has to come up with a fix.
Best regards,
Joerg
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Nov 06, 2013 4:01 pm
- Full Name: Paul Bacon
- Contact:
Re: VMware CBT bug KB 2090639
Does anyone know if running an Instant Recovery job for a particular VM could be used as a way to verify if the issue is occurring on that particular server?
Paul
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
Paul, see previous page of this topic:
Reimold wrote:Vmware-Support has just updated my ticket:
- to check that the backup of a VM is OK they suggest doing a "Veeam Instant VM Recovery" and then running chkdsk / fsck on the expanded drive
- VMware is working on a fix - but no timeframe yet
-
- Enthusiast
- Posts: 41
- Liked: 1 time
- Joined: Sep 07, 2009 11:58 am
- Full Name: Dirk Reimold
- Contact:
Re: VMware CBT bug KB 2090639
joergr wrote: Dirk, CBT relies completely on vmware apis and vmware technology. There is no "fix" VEEAM can provide. VEEAM can only provide tricks or workarounds - for example checking if the size has changed and then do force a cbt reset.
So - this is just to clarify - this is a vmware issue. VMware has to come with a fix.
Best regards,
Joerg
Joerg,
I am completely aware that this is a VMware bug. But since Gostev talks about "a hot fix for both 7.0 Patch 4 and 8.0 code branches that will reset CBT automatically upon detecting source virtual disk size change", I picked that up to ask for more information.
Dirk
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
joergr wrote: There is no "fix" VEEAM can provide. VEEAM can only provide tricks or workarounds - for example checking if the size has changed and then do force a cbt reset.
Yes, that's exactly the plan and what we are building right now. We will need a few days to implement and test this.
-
- Veteran
- Posts: 381
- Liked: 38 times
- Joined: Oct 17, 2013 10:02 am
- Full Name: Mark
- Location: UK
- Contact:
Re: VMware CBT bug KB 2090639
Gostev wrote: Yes, that's exactly the plan and what we are building right now. We will need a few days to implement and test this.
I thought it already did this, it does for me anyway... (see earlier post)
-
- Veteran
- Posts: 391
- Liked: 39 times
- Joined: Jun 08, 2010 2:01 pm
- Full Name: Joerg Riether
- Contact:
Re: VMware CBT bug KB 2090639
@lando_uk: You have to differentiate between Veeam using or not using CBT, and VMware providing the CBT tracking APIs and technologies. At this time, when B&R recognizes a size change on a CBT-enabled vdisk, it prudently backs up that disk once without using CBT. BUT the original VMware CBT data (the CTK files) is still in place. So this is no real help here, at least not 100%, because we need to make sure the actual VMware CBT data undergoes a real reset.
-
- Veeam Vanguard
- Posts: 103
- Liked: 17 times
- Joined: Aug 05, 2014 1:13 pm
- Full Name: Dean lewis
- Contact:
Re: VMware CBT bug KB 2090639
I believe that Veeam took the right action to notify people via email.
This issue is clearly quite big, if you have it, but VMware say that you might match the criteria of the issue but not experience it.
Joy.
I've written a blog post about it here and am keeping it updated,
I've also contacted other backup vendors to get their comments if they are affected by it too.
http://www.educationalcentre.co.uk/majo ... r-backups/
Dean
Technical Architect
Veeam Certified Architect
Veeam Vanguard
- Personal Technical Blog - www.veducate.co.uk
- Twitter - @saintdle
-
- Veteran
- Posts: 338
- Liked: 35 times
- Joined: Jan 20, 2012 2:36 pm
- Full Name: Christensen Farms
- Contact:
Re: VMware CBT bug KB 2090639
So do I have this right?
The issue is a VMWare issue as we all know, but VEEAM is working on a workaround fix and we can expect a VEEAM patch for both version 7 and the upcoming v8 that will help us resolve this issue from a backup perspective? And this patch we can expect to be out in the near future such as a week or two?
Like others, if I can get the VEEAM patch within a week or two, I would choose not to go through all the other more intrusive suggestions that may or may not work. There have been a lot of posts about other ways that others suggest do or might fix the issue. I'd rather have something concrete from VEEAM or VMWare that states it will in fact resolve the issue.
Thanks VEEAM for working on this and taking care of your customers even though this isn't an issue due to VEEAM itself.
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
More news based on more testing.
Multiple small disk size increases exhibit confusing behavior. Extending a 175GB disk by 45GB resulted in no issues; however, increasing it by 45GB once again broke CBT scope, even though the total increase over the two actions was 90GB (which is less than 128GB). Having access to source code would help, but without it, the pattern is hard to understand. Thus, we have decided to stop guessing around data corruption (risky stuff) and will recommend a CBT reset after any disk size increase at all.
We have also confirmed that Active Full is not required, as the following job run fixes everything by identifying and transferring all the data that was missing previously on the target due to the incorrect CBT scope at source.
Our plan is to provide a patch for both v7 and v8 within 1-2 weeks from now. This patch will make jobs automatically reset CBT on a processed VM upon detecting a change of virtual disk configuration.
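To illustrate the "any size change means reset" rule described above, here is a small Python sketch. It is not how the planned patch works internally (that is not described in this thread); it only compares disk sizes recorded at two job runs. The function name, the disk names and the dictionaries are hypothetical.

```python
def disks_needing_cbt_reset(previous_sizes, current_sizes):
    """Return names of disks whose size differs from the previous job run.

    Both arguments map a disk name to its size in GB. This sketch looks at
    size only; disks that were not present in the previous run are skipped.
    """
    return sorted(
        name
        for name, size in current_sizes.items()
        if name in previous_sizes and previous_sizes[name] != size
    )

# The two-step test described above: 175GB -> 220GB -> 265GB.
# Disk names are made up for the example.
run1 = {"data.vmdk": 175, "os.vmdk": 40}
run2 = {"data.vmdk": 220, "os.vmdk": 40}
run3 = {"data.vmdk": 265, "os.vmdk": 40}

print(disks_needing_cbt_reset(run1, run2))
print(disks_needing_cbt_reset(run2, run3))
```

Under this rule both expansions are flagged, including the first one that showed no issue in testing, which is the conservative behavior being recommended.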
-
- VP, Product Management
- Posts: 6034
- Liked: 2859 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: VMware CBT bug KB 2090639
I have speculated from the beginning that the problem likely occurs when a VMDK crosses a 128GB boundary. For example, if you have a 100GB VMDK, and add 20GB to it, I'd bet it would be fine since the VMDK is still only 120GB, but if you add 20GB more, then it would be 140GB and cross a 128GB boundary. Obviously if you add more than 128GB at a time, you would always cross a 128GB boundary. That theory would continue to fit the above scenario as well: 175GB + 45GB is 220GB, so it still hasn't crossed the next 128GB boundary, which would be 256GB; however, add another 45GB, and then you're at 265GB, which would cross it and thus break CBT.
This is just my own personal guess based on the information I've seen and also my own inferred understanding of the CTK data structures, which is certainly incomplete. I also spent some time thinking from a code perspective about how/why 128GB might be important. Obviously having no access to code it's hard to know for sure, but I can see how the CTK structure might cross some internal boundary that requires creating a new structure at each 128GB point, and somehow this is failing to be triggered when VMDKs are expanded across those boundaries, leaving the CTK file unable to hold the new data.
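The boundary guess above can be written down as a one-line check. This is only a sketch of the speculation in this post, not a confirmed description of how CTK files behave; `crosses_boundary` is a made-up name and sizes are whole GB.

```python
BOUNDARY_GB = 128

def crosses_boundary(old_gb, new_gb, boundary_gb=BOUNDARY_GB):
    """True if growing from old_gb to new_gb reaches or passes a multiple of boundary_gb."""
    return new_gb // boundary_gb > old_gb // boundary_gb

# Cases mentioned in this thread (old size, new size in GB).
cases = [(100, 120), (120, 140), (175, 220), (220, 265), (200, 300)]
for old, new in cases:
    print(f"{old}GB -> {new}GB crosses a 128GB boundary: {crosses_boundary(old, new)}")
```

The last case, 200GB to 300GB, also crosses a boundary under this check, although that expansion was given earlier in the thread as an example that is fine.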
-
- Enthusiast
- Posts: 41
- Liked: 1 time
- Joined: Sep 07, 2009 11:58 am
- Full Name: Dirk Reimold
- Contact:
Re: VMware CBT bug KB 2090639
Gostev wrote: Our plan is to provide a patch for both v7 and v8 within 1-2 weeks from now. This patch will make jobs to automatically reset CBT on a processed VM upon detecting a change of virtual disk configuration.
Hello Gostev,
thank you for the additional information and the timeframe for the patch. Maybe it is possible to implement "reset CBT with the next job run" as a setting you could also select manually for a job - who knows when it will be needed?
Dirk
-
- Enthusiast
- Posts: 49
- Liked: 9 times
- Joined: Aug 16, 2013 1:34 pm
- Full Name: Martin Etheridge
- Contact:
Re: VMware CBT bug KB 2090639
Presumably the Veeam patch will only address any disk configuration changes made after the patch is applied. We will therefore still need to sort out VMs which may be affected by previously implemented disk changes? This makes Dirk's suggestion of a "reset CBT on next job run" option all the more useful.
Hopefully VMware will fix this properly at some stage. In the meantime, many thanks to Veeam for helping us to work around it.
-
- Veeam ProPartner
- Posts: 23
- Liked: 11 times
- Joined: Oct 24, 2011 12:55 pm
- Full Name: Brian Farrugia
- Location: Malta, Europe
- Contact:
Re: VMware CBT bug KB 2090639
Gostev wrote: This patch will make jobs to automatically reset CBT on a processed VM upon detecting a change of virtual disk configuration.
Hi Gostev,
May I suggest that this is clearly shown in the logs, i.e. the resetting of the CBT and the reason. It would help in explaining to the customer why the backup size has increased, without too much digging.
I would also like to suggest that the CBT is reset with every Active Full. My reasoning is that since it is an Active Full, we might as well reset it, since it will take the same capacity. It would also cover the case where Veeam does not detect that the vmdk has increased. Unlikely, but you never know.
Just my 2c.
Thanks for bringing up this issue in your newsletter.
-
- Enthusiast
- Posts: 82
- Liked: 33 times
- Joined: Mar 25, 2013 7:37 pm
- Full Name: Lars Pisanec
- Contact:
Re: VMware CBT bug KB 2090639
Reimold wrote: Hello Gostev,
thank you for the additional information and the timeframe for the patch. Maybe it is possible to implement "reset CBT with the next job run" as a setting you could also select manually for a job - who knows when it will be needed?
Dirk
+1 for a setting to "reset CBT during next job run".
-
- Enthusiast
- Posts: 30
- Liked: 2 times
- Joined: Nov 07, 2012 8:13 pm
- Contact:
Re: VMware CBT bug KB 2090639
All great ideas going forward, but looking backward at months and months of backups, I'm struggling to figure out the best way to verify those files. Mounting each of them and running a chkdsk really doesn't seem practical for 50+ VMs, each with more than 100 restore points. I'm open to any ideas anyone has about a way to automate this, or better yet, a way to tell from the backup metadata when a vmdk size change may have occurred, limiting the test set to only those backups after the size change event.
One other question about validation: Is it fair to say that if I can successfully restore a file from a backup using the Guest Files/Windows restore functionality that the backup is intact? Is that enough of a test or is it possible corruption wouldn't necessarily show up unless I happened to pick a file living in a corrupted block?
Thanks,
Geoff
-
- Enthusiast
- Posts: 30
- Liked: 2 times
- Joined: Nov 07, 2012 8:13 pm
- Contact:
Re: VMware CBT bug KB 2090639
Bad form replying to my own post, but I was looking through the job history logs and it does appear there is data there that would help identify every vmdk size change, and an indirect way to figure out whether that change crossed the problematic 128GB boundary.
Jobs that have experienced a size change have a "Warning" completion status, and looking at the details you can see which vmdk changed, and the new size. If you looked at the previous backup you could see the vmdk size before the change and determine if it crossed a boundary. I have my history set to store 52 weeks (maybe that's the default, I don't remember if I changed it on install), so I have a lot of info at least about the last year of backups. Any chance Veeam could supply a query that would run through the backup history and give us a report of what changed when, and whether it's a potentially corrupting event? (Looking at the DB now to see if I can do this myself. )
Geoff
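As a rough idea of what such a report could look like, here is a Python sketch that walks per-run disk sizes in date order and lists every size change, with a flag for whether it reached or passed a 128GB multiple. It does not query the Veeam database; the `DiskSample` records, the disk name and the sizes are hypothetical and would have to be filled in from the job history by hand or by your own export.

```python
from typing import NamedTuple

class DiskSample(NamedTuple):
    run_date: str   # ISO date (YYYY-MM-DD) of the job run, so strings sort by date
    disk: str       # vmdk name as shown in the job details
    size_gb: int    # disk size reported for that run

def size_change_report(samples, boundary_gb=128):
    """Return (run_date, disk, old_gb, new_gb, crossed) for each size change."""
    last_seen = {}
    report = []
    for s in sorted(samples, key=lambda s: s.run_date):
        old = last_seen.get(s.disk)
        if old is not None and old != s.size_gb:
            crossed = s.size_gb // boundary_gb > old // boundary_gb
            report.append((s.run_date, s.disk, old, s.size_gb, crossed))
        last_seen[s.disk] = s.size_gb
    return report

# Made-up history for one disk: 100GB -> 120GB -> 140GB.
history = [
    DiskSample("2014-03-01", "FS01_1.vmdk", 100),
    DiskSample("2014-03-08", "FS01_1.vmdk", 120),
    DiskSample("2014-03-15", "FS01_1.vmdk", 140),
    DiskSample("2014-03-22", "FS01_1.vmdk", 140),
]
for row in size_change_report(history):
    print(row)
```

Given the test results posted above, it is probably safer to treat every row in the report as a candidate for checking, and to use the boundary flag only for ordering the work.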
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
tsightler wrote: I have speculated from the beginning that the problem likely occurs when a VMDK crosses a 128GB boundary. For example, if you have a 100GB VMDK, and add 20GB to it, I'd bet it would be fine since the VMDK is still only 120GB, but if you add 20GB more, then it would be 140GB and cross a 128GB boundary. Obviously if you add more than 128GB at a time, you would always cross a 128GB boundary. That theory would continue to fit the above scenario as well: 175GB + 45GB is 220GB, so it still hasn't crossed the next 128GB boundary, which would be 256GB; however, add another 45GB, and then you're at 265GB, which would cross it and thus break CBT.
This is just my own personal guess based on the information I've seen and also my own inferred understanding of the CTK data structures, which is certainly incomplete. I also spent some time thinking from a code perspective about how/why 128GB might be important. Obviously having no access to code it's hard to know for sure, but I can see how the CTK structure might cross some internal boundary that requires creating a new structure at each 128GB point, and somehow this is failing to be triggered when VMDKs are expanded across those boundaries, leaving the CTK file unable to hold the new data.
I guess you have missed my first post in this thread, but increasing the disk size from 200GB to 300GB does not cause the issue, despite crossing the 128GB boundary at the 256GB mark.
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
geofftx wrote: corruption wouldn't necessarily show up unless I happened to pick a file living in a corrupted block?
This is correct.
-
- Chief Product Officer
- Posts: 31783
- Liked: 7283 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: VMware CBT bug KB 2090639
BriFar wrote: May I suggest that this is clearly shown in the logs i.e the resetting of the CBT and the reason. It would help explaining to the customer why the backup size has increased without too much digging.
Yes, it will be shown. However, backup size is not impacted by a CBT reset (in the case of our product at least); only the duration will be longer, because the job needs to read the entire VMDK.
BriFar wrote: I would also like to suggest that the CBT is reset with every Active Full. My reasoning is that since it is an Active Full, might as well reset it since it will take the same capacity. It would also cover should veeam not detect the the vmdk has increased. Unlikely but you never know.
This is not a good idea, because often there could be other jobs processing the same VM (for example, backing up and replicating the same VM). The other job will be impacted by the CBT reset, and will likely not meet its RTOs as a result.
-
- VP, Product Management
- Posts: 6034
- Liked: 2859 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: VMware CBT bug KB 2090639
Gostev wrote: I guess you have missed per my first post in this thread, but increasing size disk from 200GB to 300GB does not cause the issue, despite crossing 128GB boundary at 256GB mark.
Well, I did, but in that post you said "for example" so I didn't know if that was an actual performed and confirmed test, or if you just pulled those numbers from the air to use as an example of how expanding more than 128GB always causes the bug, but less may not. If that was an actual tested case, then yes, that blows up my theory, but hey it was fun to guess.