Host-based backup of VMware vSphere VMs.
lando_uk
Veteran
Posts: 381
Liked: 38 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by lando_uk »

Hi

I've regularly expanded data disks since using Veeam, and every time I do, the Job detects that the disk has changed so disables CBT for the job.
e.g.
14/10/2014 02:11:57 :: Disk [10Gblabla] BLA-SQL-PUB-01/BLA-SQL-PUB-01_3.vmdk size changed. Changed block tracking is disabled.

So if that happens, I don't need to worry about this bug?
Peejay62
Expert
Posts: 235
Liked: 37 times
Joined: Aug 06, 2013 10:40 am
Full Name: Peter Jansen
Contact:

Re: VMware CBT bug KB 2090639

Post by Peejay62 »

So, to summarize what we know so far:
- expanding by more than 128GB can cause the bug, no matter the size of the vmdk
- you don't need to disable CBT (which needs a VM power off)
- a CBT reset for the VM is sufficient and can be accomplished by a storage vMotion (which up until vSphere 5.5 causes a CBT reset and thus a full backup)
- anyway, to get rid of the potential bug a full backup is inevitable so the CBT tables are properly fixed
- and every time you expand by > 128GB you need to do that (until the bug is fixed).

Correct?
lando_uk
Veteran
Posts: 381
Liked: 38 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by lando_uk »

Peejay62 wrote: a CBT reset for the VM is sufficient and can be accomplished by a storage vMotion (which up until vSphere 5.5 causes a CBT reset and thus a full backup)
I'm not aware that storage vmotion on 5.0/5.1 resets CBT, we do this a lot and I'm sure it doesn't force a full backup in Veeam.
Reimold
Enthusiast
Posts: 41
Liked: 1 time
Joined: Sep 07, 2009 11:58 am
Full Name: Dirk Reimold
Contact:

Re: VMware CBT bug KB 2090639

Post by Reimold »

Gostev wrote: One other thing that we have confirmed by now is that the size of virtual disk before or after expansion does not seem to matter. What matters is whether the virtual disk was increased for more than 128GB in size at once. For example, 200GB>300GB expansion is fine, but 200GB>350GB will cause CBT bug.
That will take a little pressure off us, since we have usually not expanded a disk by more than 50 GB in one step.

I think what now would be helpful is a timeframe when a fix for that error could be available, so that everyone could make a decision between waiting for a fix and manually resetting CBT.

For me it is still hard to accept that I replicate a fileserver to our standby datacenter and back up that VM to an offsite repository, and yet on a failure of the original VM I could end up having lost everything - not to mention that all my historical backups may be corrupt.

Thanks

Dirk
Peejay62
Expert
Posts: 235
Liked: 37 times
Joined: Aug 06, 2013 10:40 am
Full Name: Peter Jansen
Contact:

Re: VMware CBT bug KB 2090639

Post by Peejay62 »

lando_uk wrote: I'm not aware that storage vmotion on 5.0/5.1 resets CBT, we do this a lot and I'm sure it doesn't force a full backup in Veeam.

I stumbled into that after moving 1,6 TB of VM's...
kb.vmware.com/kb/2048201
VER
Influencer
Posts: 23
Liked: 4 times
Joined: Jan 16, 2011 10:24 am
Full Name: Wouter
Contact:

Re: VMware CBT bug KB 2090639

Post by VER »

isaako wrote:Just subscribing to this thread awaiting more info.
me too
Ratcha
Influencer
Posts: 23
Liked: 7 times
Joined: Jun 13, 2010 10:36 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by Ratcha »

Gostev wrote:What matters is whether the virtual disk was increased for more than 128GB in size at once. For example, 200GB>300GB expansion is fine, but 200GB>350GB will cause CBT bug.
Anton, a poster on reddit stated VMware support told him the following
Just had this in from VMware support "Yes the vmdk which is extended by 20 GB ten times will be affected with this issue as the expansion of disk is more than 128 GB when added together."
http://www.reddit.com/r/sysadmin/commen ... ?context=3

Can you please confirm if this is, or is not, the case.

Thanks
lando_uk
Veteran
Posts: 381
Liked: 38 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by lando_uk »

Peejay62 wrote:I stumbled into that after moving 1,6 TB of VM's...
kb.vmware.com/kb/2048201
I've not seen this myself, and I just backtracked through my logs to check a VM that I moved the other week - it didn't reset CBT with a storage vMotion.

The only times I've seen Veeam trigger a CBT reset are when you expand a vdisk, or if the VM has a new ID (if you do an old vCenter to new vCenter quick migration)
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

Ratcha wrote:Can you please confirm if this is, or is not, the case.
Hi Ratcha, as I have noted earlier, it will take us some time to test all possible scenarios. Unlike VMware, we don't have access to the source code, and so can only find out things empirically through testing. I cannot comment on specific statements right now, but we will post the summary about this issue as soon as all the required testing is completed. Thanks!
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

Reimold wrote: I think what now would be helpful is a timeframe when a fix for that error could be available, so that everyone could make a decision between waiting for a fix and manually resetting CBT.

For me it is still hard to accept that I replicate a fileserver to our standby datacenter and back up that VM to an offsite repository, and yet on a failure of the original VM I could end up having lost everything - not to mention that all my historical backups may be corrupt.
Dirk, CBT relies completely on VMware APIs and VMware technology. There is no "fix" VEEAM can provide. VEEAM can only provide tricks or workarounds - for example, checking whether the size has changed and then forcing a CBT reset.

So - this is just to clarify - this is a VMware issue. VMware has to come up with a fix.

Best regards,
Joerg
edpaul
Lurker
Posts: 1
Liked: never
Joined: Nov 06, 2013 4:01 pm
Full Name: Paul Bacon
Contact:

Re: VMware CBT bug KB 2090639

Post by edpaul »

Does anyone know if running an Instant Recovery job for a particular VM could be used as a way to verify if the issue is occurring on that particular server?

Paul
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

Paul, see previous page of this topic:
Reimold wrote:Vmware-Support has just updated my ticket:
- to check the backup of a VM is OK they suggest to do a "Veeam Instant VM Recovery" and then do a chkdsk/fsck on the expanded drive
- VMware is working on a fix - but no timeframe yet
Reimold
Enthusiast
Posts: 41
Liked: 1 time
Joined: Sep 07, 2009 11:58 am
Full Name: Dirk Reimold
Contact:

Re: VMware CBT bug KB 2090639

Post by Reimold »

joergr wrote: Dirk, CBT relies completely on VMware APIs and VMware technology. There is no "fix" VEEAM can provide. VEEAM can only provide tricks or workarounds - for example, checking whether the size has changed and then forcing a CBT reset.

So - this is just to clarify - this is a VMware issue. VMware has to come up with a fix.

Best regards,
Joerg
Joerg,

I am completely aware that this is a VMware bug. But since Gostev talks about "a hot fix for both 7.0 Patch 4 and 8.0 code branches that will reset CBT automatically upon detecting source virtual disk size change", I have picked that up to ask for more information.

Dirk
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

joergr wrote:There is no "fix" VEEAM can provide. VEEAM can only provide tricks or workarounds - for example, checking whether the size has changed and then forcing a CBT reset.
Yes, that's exactly the plan and what we are building right now. We will need a few days to implement and test this.
lando_uk
Veteran
Posts: 381
Liked: 38 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by lando_uk »

Gostev wrote: Yes, that's exactly the plan and what we are building right now. We will need a few days to implement and test this.
I thought it already did this, it does for me anyway... ( see earlier post )
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr » 1 person likes this post

@lando_uk: You have to differentiate between Veeam using or not using CBT and VMware providing the CBT tracking APIs and technologies. At this time, when B&R recognizes a size change on a CBT-enabled vdisk, it prudently backs up this disk once without using CBT. BUT the original VMware CBT data (the CTK files) is still in place. Thus this is no real help here, at least not 100%, because we need to make sure the actual VMware CBT data undergoes a real reset.
saintdle
Veeam Vanguard
Posts: 103
Liked: 17 times
Joined: Aug 05, 2014 1:13 pm
Full Name: Dean lewis
Contact:

Re: VMware CBT bug KB 2090639

Post by saintdle » 1 person likes this post

I believe that Veeam took the right action to notify people via email.

This issue is clearly quite big, if you have it, but VMware say that you might match the criteria of the issue but not experience it.

Joy.

I've written a blog post about it (link below) and am keeping it updated.

I've also contacted other backup vendors to get their comments if they are affected by it too.

http://www.educationalcentre.co.uk/majo ... r-backups/

Dean
Technical Architect
Veeam Certified Architect
Veeam Vanguard
  • Personal Technical Blog - www.veducate.co.uk
  • Twitter - @saintdle
cffit
Veteran
Posts: 338
Liked: 35 times
Joined: Jan 20, 2012 2:36 pm
Full Name: Christensen Farms
Contact:

Re: VMware CBT bug KB 2090639

Post by cffit »

So do I have this right?

The issue is a VMWare issue as we all know, but VEEAM is working on a workaround fix and we can expect a VEEAM patch for both version 7 and the upcoming v8 that will help us resolve this issue from a backup perspective? And this patch we can expect to be out in the near future such as a week or two?

Like others, if I can get the VEEAM patch within a week or two, I would choose not to go through all the other more intrusive suggestions that may or may not work. There have been a lot of posts about other ways that others suggest do or might fix the issue. I'd rather have something concrete from VEEAM or VMWare that states it will in fact resolve the issue.

Thanks VEEAM for working on this and taking care of your customers even though this isn't an issue due to VEEAM itself.
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

More news based on more testing.

Multiple small disk size increases exhibit confusing behavior. Extending a 175GB disk by 45GB resulted in no issues; however, increasing it by 45GB once again broke CBT scope, even though the total increase over the two actions was 90GB (which is less than 128GB). Having access to the source code would help, but without it, the pattern is hard to understand. Thus, we have decided to stop guessing around data corruption (risky stuff) and will recommend a CBT reset after any disk size increase at all.

We have also confirmed that Active Full is not required, as the following job run fixes everything by identifying and transferring all the data that was missing previously on the target due to the incorrect CBT scope at source.

Our plan is to provide a patch for both v7 and v8 within 1-2 weeks from now. This patch will make jobs automatically reset CBT on a processed VM upon detecting a change of virtual disk configuration.
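
A rough illustration of the "reset after any size change" rule, as a minimal Python sketch. The function name, the disk names and the idea of keeping per-disk sizes from the previous job run are assumptions made for illustration only; this is not how the actual patch is implemented.

```python
GB = 1024 ** 3


def disks_needing_cbt_reset(previous_sizes, current_sizes):
    """Return the disks whose size differs from the previous job run.

    Both arguments map a (hypothetical) disk name to its size in bytes.
    Disks seen for the first time are skipped, because there is no
    earlier size to compare against.
    """
    changed = []
    for disk, size in current_sizes.items():
        old = previous_sizes.get(disk)
        if old is not None and old != size:
            changed.append(disk)
    return sorted(changed)


# Hypothetical example: only data_2.vmdk was grown (220GB -> 265GB).
previous = {"os.vmdk": 40 * GB, "data_1.vmdk": 175 * GB, "data_2.vmdk": 220 * GB}
current = {"os.vmdk": 40 * GB, "data_1.vmdk": 175 * GB, "data_2.vmdk": 265 * GB}

print(disks_needing_cbt_reset(previous, current))  # ['data_2.vmdk']
```

The sketch deliberately ignores how large the change is, since the testing above suggests that no size threshold can be trusted.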
tsightler
VP, Product Management
Posts: 6034
Liked: 2859 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: VMware CBT bug KB 2090639

Post by tsightler »

I have speculated from the beginning that the problem likely occurs when a VMDK crosses a 128GB boundary. For example, if you have a 100GB VMDK, and add 20GB to it, I'd bet it would be fine since the VMDK is still only 120GB, but if you add 20GB more, then it would be 140GB and cross a 128GB boundary. Obviously if you add more than 128GB at a time, you would always cross a 128GB boundary. That theory would continue to fit the above scenario as well: 175GB + 45GB is 220GB, so it still hasn't crossed the next 128GB boundary, which would be 256GB; however, add another 45GB, and then you're at 265GB, which would cross it and thus break CBT.

This is just my own personal guess based on the information I've seen and also my own inferred understanding of the CTK data structures, which is certainly incomplete. I also spent some time thinking from a code perspective how/why 128GB might be important. Obviously having no access to code it's hard to know for sure, but I can see how the CTK structure might cross some internal boundary that requires creating a new structure at each 128GB point, and somehow this is failing to be triggered when VMDKs are expanded across those boundaries, leaving the CTK file unable to hold the new data.
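
The boundary arithmetic in this guess can be written down as a small Python sketch. The function name is made up for illustration, and the rule itself is only the speculation described above, not confirmed VMware behavior.

```python
BOUNDARY_GB = 128


def crosses_boundary(old_gb, new_gb, boundary_gb=BOUNDARY_GB):
    """True if growing a disk from old_gb to new_gb passes a multiple of boundary_gb.

    Landing exactly on a multiple is counted as crossing here; that choice
    is an assumption, the thread does not say how that case behaves.
    """
    return new_gb // boundary_gb > old_gb // boundary_gb


# The examples from the post above.
print(crosses_boundary(100, 120))  # False: still below 128GB
print(crosses_boundary(120, 140))  # True: passes 128GB
print(crosses_boundary(175, 220))  # False: stays between 128GB and 256GB
print(crosses_boundary(220, 265))  # True: passes 256GB
```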
Reimold
Enthusiast
Posts: 41
Liked: 1 time
Joined: Sep 07, 2009 11:58 am
Full Name: Dirk Reimold
Contact:

Re: VMware CBT bug KB 2090639

Post by Reimold »

Gostev wrote:
Our plan is to provide a patch for both v7 and v8 within 1-2 weeks from now. This patch will make jobs automatically reset CBT on a processed VM upon detecting a change of virtual disk configuration.
Hello Gostev,

thank you for the additional information and the timeframe for the patch. Maybe it is possible to implement the "reset CBT with the next job run" as a setting you could also select manually for a job - who knows when it will be needed?

Dirk
martynuk
Enthusiast
Posts: 49
Liked: 9 times
Joined: Aug 16, 2013 1:34 pm
Full Name: Martin Etheridge
Contact:

Re: VMware CBT bug KB 2090639

Post by martynuk »

Presumably the Veeam patch will only address any disk configuration changes made after the patch is applied. We will therefore still need to sort out VMs which may be affected by previously implemented disk changes? This makes Dirk's suggestion of a "reset CBT on next job run" option all the more useful.

Hopefully VMware will fix this properly at some stage. In the meantime, many thanks to Veeam for helping us to work around it.
BriFar
Veeam ProPartner
Posts: 23
Liked: 11 times
Joined: Oct 24, 2011 12:55 pm
Full Name: Brian Farrugia
Location: Malta, Europe
Contact:

Re: VMware CBT bug KB 2090639

Post by BriFar »

Gostev wrote: This patch will make jobs automatically reset CBT on a processed VM upon detecting a change of virtual disk configuration.
Hi Gostev,
May I suggest that this is clearly shown in the logs, i.e. the resetting of the CBT and the reason. It would help explain to the customer why the backup size has increased without too much digging.

I would also like to suggest that the CBT is reset with every Active Full. My reasoning is that since it is an Active Full, might as well reset it since it will take the same capacity. It would also cover the case where Veeam does not detect that the vmdk has increased. Unlikely but you never know.
Just my 2c.
Thanks for bringing up this issue in your newsletter.
lp@albersdruck.de
Enthusiast
Posts: 82
Liked: 33 times
Joined: Mar 25, 2013 7:37 pm
Full Name: Lars Pisanec
Contact:

Re: VMware CBT bug KB 2090639

Post by lp@albersdruck.de »

Reimold wrote: Hello Gostev,

thank you for the additional information and the timeframe for the patch. Maybe it is possible to implement the "reset CBT with the next job run" as a setting you could also select manually for a job - who knows when it will be needed?

Dirk
+1 for a setting to "reset CBT during next job run".
geofftx
Enthusiast
Posts: 30
Liked: 2 times
Joined: Nov 07, 2012 8:13 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by geofftx »

All great ideas going forward, but looking backward at months and months of backups, I'm struggling to figure out the best way to verify those files. Mounting each of them and running a chkdsk really doesn't seem practical for 50+ VMs, each with more than 100 restore points. I'm open to any ideas anyone has about a way to automate this, or better yet, a way to tell from the backup metadata when a vmdk size change may have occurred, limiting the test set to only those backups after the size change event.

One other question about validation: Is it fair to say that if I can successfully restore a file from a backup using the Guest Files/Windows restore functionality that the backup is intact? Is that enough of a test or is it possible corruption wouldn't necessarily show up unless I happened to pick a file living in a corrupted block?

Thanks,

Geoff
geofftx
Enthusiast
Posts: 30
Liked: 2 times
Joined: Nov 07, 2012 8:13 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by geofftx »

Bad form replying to my own post, but I was looking through the job history logs and it does appear there is data there that would help identify every vmdk size change, and an indirect way to figure out whether that change crossed the problematic 128GB boundary.

Jobs that have experienced a size change have a "Warning" completion status, and looking at the details you can see which vmdk changed, and the new size. If you looked at the previous backup you could see the vmdk size before the change and determine if it crossed a boundary. I have my history set to store 52 weeks (maybe that's the default, I don't remember if I changed it on install), so I have a lot of info at least about the last year of backups. Any chance Veeam could supply a query that would run through the backup history and give us a report of what changed when, and whether it's a potentially corrupting event? (Looking at the DB now to see if I can do this myself.:) )
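
As a starting point for that kind of report, here is a minimal Python sketch that scans exported job log text for the "size changed" warning quoted earlier in this thread and records the earliest change per disk. It assumes the lines look exactly like that one sample (dd/mm/yyyy timestamps, datastore name in square brackets); the function name and the second sample line are made up for illustration. That sample does not include the new disk size, so sizes are not parsed.

```python
import re
from datetime import datetime

# Pattern modelled on the single warning line quoted earlier in the thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) :: "
    r"Disk \[(?P<datastore>[^\]]+)\] (?P<path>.+?) size changed\."
)


def first_size_changes(lines):
    """Map (datastore, vmdk path) to the earliest logged size change."""
    earliest = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a size-change warning
        when = datetime.strptime(match.group("ts"), "%d/%m/%Y %H:%M:%S")
        key = (match.group("datastore"), match.group("path"))
        if key not in earliest or when < earliest[key]:
            earliest[key] = when
    return earliest


sample = [
    # Hypothetical later event for the same disk.
    "21/10/2014 02:10:03 :: Disk [10Gblabla] BLA-SQL-PUB-01/BLA-SQL-PUB-01_3.vmdk size changed. Changed block tracking is disabled.",
    # The line quoted earlier in the thread.
    "14/10/2014 02:11:57 :: Disk [10Gblabla] BLA-SQL-PUB-01/BLA-SQL-PUB-01_3.vmdk size changed. Changed block tracking is disabled.",
    "some unrelated log line",
]

for (datastore, path), when in first_size_changes(sample).items():
    print(datastore, path, when.isoformat())
```

Restore points taken on or after the reported date for a disk would then be the ones worth checking first.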

Geoff
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

tsightler wrote:I have speculated from the beginning that the problem likely occurs when a VMDK crosses a 128GB boundary. For example, if you have a 100GB VMDK, and add 20GB to it, I'd bet it would be fine since the VMDK is still only 120GB, but if you add 20GB more, then it would be 140GB and cross a 128GB boundary. Obviously if you add more than 128GB at a time, you would always cross a 128GB boundary. That theory would continue to fit the above scenario as well: 175GB + 45GB is 220GB, so it still hasn't crossed the next 128GB boundary, which would be 256GB; however, add another 45GB, and then you're at 265GB, which would cross it and thus break CBT.

This is just my own personal guess based on the information I've seen and also my own inferred understanding of the CTK data structures, which is certainly incomplete. I also spent some time thinking from a code perspective how/why 128GB might be important. Obviously having no access to code it's hard to know for sure, but I can see how the CTK structure might cross some internal boundary that requires creating a new structure at each 128GB point, and somehow this is failing to be triggered when VMDKs are expanded across those boundaries, leaving the CTK file unable to hold the new data.
I guess you missed my first post in this thread, but increasing the disk size from 200GB to 300GB does not cause the issue, despite crossing the 128GB boundary at the 256GB mark.
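
To see why none of the simple explanations holds up, the three guesses raised in this thread can be checked against the results reported so far. This is only a sketch in Python: the rule names are made up, the cumulative rule is assumed to count from the first size in each sequence, and the observations are just the four expansions mentioned in the posts above.

```python
# Each rule takes the list of disk sizes (GB) and the index of the step to judge.
def single_step_over_128(sizes, i):
    return sizes[i] - sizes[i - 1] > 128


def cumulative_over_128(sizes, i):
    return sizes[i] - sizes[0] > 128


def crosses_128_boundary(sizes, i):
    return sizes[i] // 128 > sizes[i - 1] // 128


# (size sequence in GB, whether CBT broke after each expansion), as reported in the thread.
OBSERVED = [
    ([200, 300], [False]),
    ([200, 350], [True]),
    ([175, 220, 265], [False, True]),
]


def mismatches(rule):
    """Return the (old, new) expansions where the rule disagrees with the reports."""
    wrong = []
    for sizes, outcomes in OBSERVED:
        for i, broke in enumerate(outcomes, start=1):
            if rule(sizes, i) != broke:
                wrong.append((sizes[i - 1], sizes[i]))
    return wrong


for rule in (single_step_over_128, cumulative_over_128, crosses_128_boundary):
    print(rule.__name__, mismatches(rule))
# single_step_over_128 [(220, 265)]
# cumulative_over_128 [(220, 265)]
# crosses_128_boundary [(200, 300)]
```

Each rule gets at least one reported case wrong, which is consistent with the recommendation to reset CBT after any disk size increase.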
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

geofftx wrote:corruption wouldn't necessarily show up unless I happened to pick a file living in a corrupted block?
This is correct.
Gostev
Chief Product Officer
Posts: 31783
Liked: 7283 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

BriFar wrote:May I suggest that this is clearly shown in the logs, i.e. the resetting of the CBT and the reason. It would help explain to the customer why the backup size has increased without too much digging.
Yes, it will be shown. However, backup size is not impacted by a CBT reset (in the case of our product at least); only the duration will be longer, because the job needs to read the entire VMDK.
BriFar wrote:I would also like to suggest that the CBT is reset with every Active Full. My reasoning is that since it is an Active Full, might as well reset it since it will take the same capacity. It would also cover the case where Veeam does not detect that the vmdk has increased. Unlikely but you never know.
This is not a good idea, because often there could be other jobs processing the same VM (for example, backing up and replicating the same VM). The other job will be impacted by the CBT reset, and will likely not meet its RTOs as a result.
tsightler
VP, Product Management
Posts: 6034
Liked: 2859 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: VMware CBT bug KB 2090639

Post by tsightler »

Gostev wrote:I guess you missed my first post in this thread, but increasing the disk size from 200GB to 300GB does not cause the issue, despite crossing the 128GB boundary at the 256GB mark.
Well, I did, but in that post you said "for example" so I didn't know if that was an actual performed and confirmed test, or if you just pulled those numbers from the air to use as an example of how expanding more than 128GB always causes the bug, but less may not. If that was an actual tested case, then yes, that blows up my theory, but hey it was fun to guess. :mrgreen: