Host-based backup of VMware vSphere VMs.
Gostev
Chief Product Officer
Posts: 31695
Liked: 7207 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VMware CBT bug KB 2090639

Post by Gostev »

@Tom yes, it was the actual test.

All, sorry I cannot comment on every post (I am on the road this week and next). I will be updating you as news appears, and will try to do it at least daily.

To answer a common question about CBT reset handling: along with the patch, we will provide a script that resets CBT on all VMs (in fact, it was already published on Monday: http://www.veeam.com/kb1940).
chrisbarr35
Service Provider
Posts: 14
Liked: 4 times
Joined: Aug 22, 2011 3:28 pm
Full Name: Chris Barr
Contact:

Re: VMware CBT bug KB 2090639

Post by chrisbarr35 »

Hi
What is the implication of resetting CBT when using copy jobs? We already run "Active Full" backups weekly, but the intelligent copy still only copies the increments on the next copy run. If CBT is reset will the next copy job need to run a full copy of the VMs in the backup?
Chris
xadamz23
Influencer
Posts: 17
Liked: 3 times
Joined: Dec 08, 2009 10:24 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by xadamz23 »

I would like clarification on one thing that I haven't seen touched on yet. I wanted to test the solution of resetting CBT and running a backup job to get everything back in working order. I have a backup job with just one VM in it. I ran the PowerCLI code provided in http://www.veeam.com/kb1940 to reset CBT. I then kicked off my backup job as I normally would. The VM is 130 GB in size, but to my surprise the backup job created a .vib file that was only 300 MB in size. Obviously only 300 MB of data had changed since the last backup. At first I thought something was wrong because I expected a huge .vib file. But then I thought to myself: it reset CBT, so it only had to "scan" the entire vmdk for changed blocks; it didn't actually need to back the entire vmdk file up.

Is that correct?

Edit:
I guess I don't fully understand that either, though. Let's say this particular job was experiencing the corrupt CBT issue, and let's say the issue first arose weeks ago. Doesn't that mean that possibly every backup since the issue arose is bad, and not all blocks that needed to be backed up were being backed up? Then I fix CBT by doing the above, but it simply scans for changed blocks since the "last" backup. Wouldn't it still be missing some blocks?
BriFar
Veeam ProPartner
Posts: 23
Liked: 11 times
Joined: Oct 24, 2011 12:55 pm
Full Name: Brian Farrugia
Location: Malta, Europe
Contact:

Re: VMware CBT bug KB 2090639

Post by BriFar »

Gostev wrote: Yes, it will be shown. However, backup size is not impacted with CBT reset (in case of our product at least), only duration will be longer because the job needs to read the entire VMDK.
Yes, you are right. I meant to write time, not size.
Gostev wrote: This is not a good idea to do, because often there could be other jobs processing the same VM (for example, backing up and replicating the same VM). The other job will be impacted by CBT reset, and will likely not meet its RTOs as the result.
I know what you mean, but wouldn't replicas also be affected by this bug? If so, you might have a replica which meets its RTO but does not boot, or whose data is missing or corrupted.
Don't get me wrong, I know that there will not be a one-size-fits-all fix, and I appreciate the efforts being put in here.
Reimold
Enthusiast
Posts: 41
Liked: 1 time
Joined: Sep 07, 2009 11:58 am
Full Name: Dirk Reimold
Contact:

Re: VMware CBT bug KB 2090639

Post by Reimold »

I just saw that the VMware-KB was updated:

"This issue occurs on a virtual machine with Changed Block Tracking (CBT) enabled, when extending a virtual disk (vmdk) file to a size strictly above 128 GB, due to the block tracking information being recalculated incorrectly."
http://kb.vmware.com/selfservice/micros ... Id=2090639

So if I understand that right, every VM that was extended to a target size greater than 128 GB is at risk, no matter how big the extension was or what size it started from.

Dirk
foggy
Veeam Software
Posts: 21127
Liked: 2137 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: VMware CBT bug KB 2090639

Post by foggy »

chrisbarr35 wrote:What is the implication of resetting CBT when using copy jobs? We already run "Active Full" backups weekly, but the intelligent copy still only copies the increments on the next copy run. If CBT is reset will the next copy job need to run a full copy of the VMs in the backup?
Backup copy jobs do not rely on VMware CBT and will still copy only changed blocks from the new full.
xadamz23 wrote:But then I thought to myself: it reset CBT, so it only had to "scan" the entire vmdk for changed blocks; it didn't actually need to back the entire vmdk file up.

Is that correct?
Correct.
xadamz23 wrote:I guess I don't fully understand that either, though. Let's say this particular job was experiencing the corrupt CBT issue, and let's say the issue first arose weeks ago. Doesn't that mean that possibly every backup since the issue arose is bad, and not all blocks that needed to be backed up were being backed up? Then I fix CBT by doing the above, but it simply scans for changed blocks since the "last" backup. Wouldn't it still be missing some blocks?
No, all the missing blocks will be copied after scanning the entire VM image.
GabesVirtualWorld
Expert
Posts: 248
Liked: 38 times
Joined: Jun 15, 2009 10:49 am
Full Name: Gabrie van Zanten
Contact:

Re: VMware CBT bug KB 2090639

Post by GabesVirtualWorld »

I was wondering if there is a way to check which VMs are affected. Can we see "inside" the CBT file? Can we query the VM?

With close to 1500 VMs I would hate to force full scans of just any VM.

The KB mentions:
When you run the command QueryChangedDiskAreas("*") to return a list of allocated disk sectors, you experience these symptoms:
Allocated portions of the virtual machine disk vmdk file are not returned.
The list of allocated virtual machine disk sectors returned is incorrect
Now, how can I run that command? And how can I see if the returned info is incorrect?
tsightler
VP, Product Management
Posts: 6026
Liked: 2855 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: VMware CBT bug KB 2090639

Post by tsightler »

You can immediately narrow it down by removing any VMs that are smaller than 128GB as, at least based on the information so far, they should be immune to the problem. If you have 1500 VMs that are larger than 128GB, well, that doesn't help much!

I don't see how querying the API alone would be useful, as you'd need something to compare it to. The only way I could see to do that would be to scan the VMDK, at least the parts beyond 128GB. I don't see any practical way to do this that wouldn't be more painful than just resetting CBT.
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

@Gabe and Tom:

Personally, I scanned EVERY VM to see where CBT is active: get-vm | ?{$_.ExtensionData.Config.ChangeTrackingEnabled -eq $true}

After that I ran the script against every one of the VMs I found. At the end of the day 100% safety matters to me, and I also like to get a good night's sleep ;-)

In our datacenter this got through about 200 VMs per hour (but I did it during a working day, so the helper snapshots sometimes took a little longer to consolidate). I guess it will be much faster at the weekend. But then again, it depends very much on your VM size, purpose, storage and ESXi hosts.

But of course you can 'AND-filter' the CBT-enabled VMs with a vdisk size check (one was posted in this thread, too). Again, for me personally, I had a better gut feeling applying a CBT reset to every CBT-enabled VM, no matter what size.
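The 'AND-filter' idea can be sketched roughly as follows. This is only an illustration in Python over an already exported inventory, not the PowerCLI check posted in the thread: the `VmRecord` type, its field names and the sample VMs are all hypothetical, and the 128 GB threshold is simply the figure quoted from the VMware KB.

```python
from dataclasses import dataclass, field
from typing import List

THRESHOLD_GB = 128  # size quoted from VMware KB 2090639 in this thread


@dataclass
class VmRecord:
    """Hypothetical inventory row, e.g. built from an exported VM list."""
    name: str
    cbt_enabled: bool
    disk_sizes_gb: List[float] = field(default_factory=list)


def needs_cbt_reset(vm: VmRecord, threshold_gb: float = THRESHOLD_GB) -> bool:
    """True when CBT is enabled AND at least one disk is strictly above the threshold."""
    return vm.cbt_enabled and any(size > threshold_gb for size in vm.disk_sizes_gb)


# Made-up sample data, only to show the filter in action.
inventory = [
    VmRecord("fs01", True, [60, 500]),    # CBT on, one large disk
    VmRecord("web01", True, [40]),        # CBT on, small disk only
    VmRecord("sql01", False, [300]),      # large disk, but CBT off
    VmRecord("edge01", True, [128]),      # exactly 128 GB is not "strictly above"
]

candidates = [vm.name for vm in inventory if needs_cbt_reset(vm)]
print(candidates)
```

Lowering `threshold_gb` to 0 turns this back into the "reset every CBT-enabled VM" approach described above.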

Best regards,
Joerg
Reimold
Enthusiast
Posts: 41
Liked: 1 time
Joined: Sep 07, 2009 11:58 am
Full Name: Dirk Reimold
Contact:

Re: VMware CBT bug KB 2090639

Post by Reimold »

Joerg wrote: After that I ran the script against every one of the VMs I found. At the end of the day 100% safety matters to me, and I also like to get a good night's sleep ;-)
But if I reset CBT on a lot of VMs at the same time, all Veeam jobs will take much longer because the VMDKs are being read completely. At least in our environment the backup window will be exceeded (proxies and repositories will be overbooked), and I cannot let SAP or Exchange run with an active snapshot during production hours. So I have to reset them in small groups.

Dirk
dellock6
VeeaMVP
Posts: 6162
Liked: 1970 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: VMware CBT bug KB 2090639

Post by dellock6 »

Correct. Also, Joerg, resetting CBT "before" a VM crosses the 128GB threshold is useless: you force a disk scan with Veeam, but when the VM crosses the 128GB size in the future you will have to reset CBT again, at least until we release a workaround or VMware creates a fix... I honestly hope they will fix it quickly, but I still have another bug open in my list that is taking months (did we forget this one? http://kb.vmware.com/kb/2068424 has been open since last January, 10 months...) so I would probably wait for our workaround first :)
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

dellock6 wrote:Correct. Also, Joerg, resetting CBT "before" a VM crosses the 128GB threshold is useless: you force a disk scan with Veeam, but when the VM crosses the 128GB size in the future you will have to reset CBT again, at least until we release a workaround or VMware creates a fix... I honestly hope they will fix it quickly, but I still have another bug open in my list that is taking months (did we forget this one? http://kb.vmware.com/kb/2068424 has been open since last January, 10 months...) so I would probably wait for our workaround first :)
Of course you are right, Luca, always bearing in mind the assumption that VMware is 100% right about this 128GB "barrier".

Until the patch comes out, I have told my colleagues to report whenever a vmdk is enlarged so I can reset CBT for that one VM.

Best regards,
Joerg
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

Reimold wrote: But if I reset CBT on a lot of VMs at the same time, all Veeam jobs will take much longer because the VMDKs are being read completely. At least in our environment the backup window will be exceeded (proxies and repositories will be overbooked), and I cannot let SAP or Exchange run with an active snapshot during production hours. So I have to reset them in small groups.

Dirk
@Dirk:

Of course you have to respect the special characteristics of your environment. It could be possible to go step by step (you could, for example, write the VMs that have not been reset yet to a variable, keep it until the next day and then continue). Personally, I first did some test runs with 3-5 VMs just to be sure that a) it worked with the config and snapshots and b) the time is indeed exactly the time of an active full. Before I kicked off the reset script I calculated how far my active fulls would exceed the backup window. I could do this from the history of all backup jobs and all Veeam servers, taking all connected SAN infrastructure systems into account; I could count on these values because I do active fulls every week, so I knew the time windows fairly exactly. With this in mind I kicked off some selected jobs which would have caused the window to be exceeded and took active fulls before the actual backups ran. As a result, these selected jobs finished as fast as usual when the main backup window came. All this went well.

So, of course, and as you already mentioned, you have to adapt to the special characteristics of your environment.
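The step-by-step idea (keep the VMs that are not reset yet in a variable and continue the next day) could look roughly like this. It is a plain Python sketch of the bookkeeping only; the VM names and the batch size are made up, and the actual reset would still be done with the script from the Veeam KB.

```python
def batches(vm_names, batch_size):
    """Yield consecutive groups of at most batch_size VM names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(vm_names), batch_size):
        yield vm_names[start:start + batch_size]


# Hypothetical list of CBT-enabled VMs that still need a reset.
pending = ["vm01", "vm02", "vm03", "vm04", "vm05", "vm06", "vm07"]

# One group per backup window, sized so the window is not exceeded.
plan = list(batches(pending, 3))

tonight = plan[0]                                          # reset these before tonight's jobs
remaining = [name for group in plan[1:] for name in group]  # carry over to the next days
print(tonight, remaining)
```

The right batch size depends on how long a full read of each VM takes in your environment, which is why a test run with a handful of VMs first is a good idea.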

Best Regards,
Joerg
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

dellock6 wrote:have another bug open in my list that is taking months (did we forget this one? http://kb.vmware.com/kb/2068424 has been open since last January, 10 months...) so I would probably wait for our workaround first :)
Oh my ;-) Yes, this bug has been bugging me intensely for a long time ;-)
geofftx
Enthusiast
Posts: 30
Liked: 2 times
Joined: Nov 07, 2012 8:13 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by geofftx »

Still working on verification strategies, but is it fair to say that if you check your latest backups using Veeam Instant Recovery and run chkdsk on any VM volume larger than 128 GB, and chkdsk returns no errors, then you have not encountered the bug and all prior restore points will also be good? (Assuming, of course, that CBT has never been reset at any point in that machine's Veeam backup history.)

And if the volumes check clean is there any reason to perform a CBT reset?

Geoff
aaron_meza
Lurker
Posts: 2
Liked: never
Joined: Oct 30, 2014 7:33 pm
Full Name: Aaron Meza
Contact:

Re: VMware CBT bug KB 2090639

Post by aaron_meza »

It's not just the 128GB boundary. Crossing the 256GB, 512GB, and 1024GB boundaries results in corruption as well. In many cases the corruption is undetectable if you are restoring a full VM: chkdsk will pass. The only way to detect corruption is to know the checksum of the file on the original volume and compare it to the checksum of the recovered file.
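To make the "crossing a boundary" reading concrete, here is a small Python sketch. It assumes the boundaries are exactly the four sizes listed in this post and that "crossing" means growing from at or below a boundary to strictly above it; the function name is made up. Note that the KB wording quoted earlier in the thread was read more broadly there (any extension to a size above 128 GB), so an empty result here does not prove a disk is safe.

```python
BOUNDARIES_GB = (128, 256, 512, 1024)  # sizes named in the post above


def crossed_boundaries(old_gb, new_gb, boundaries=BOUNDARIES_GB):
    """Return the boundaries passed when a disk grows from old_gb to new_gb."""
    if new_gb <= old_gb:
        return []  # not an extension, nothing crossed
    return [b for b in boundaries if old_gb <= b < new_gb]


print(crossed_boundaries(100, 200))  # grows past 128 GB
print(crossed_boundaries(100, 600))  # grows past 128, 256 and 512 GB
print(crossed_boundaries(130, 200))  # stays between 128 and 256 GB
```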
aaron_meza
Lurker
Posts: 2
Liked: never
Joined: Oct 30, 2014 7:33 pm
Full Name: Aaron Meza
Contact:

Re: VMware CBT bug KB 2090639

Post by aaron_meza »

Stoo wrote:Has anyone actually come across this bug 'in the wild' yet and have personal experience of how it manifests?

Keen to know whether this will trash the entire 128GB+ disk's structure and file headers, making it effectively unusable/unmountable, or whether theoretically, if I'm able to use the Windows guest File-Level-Restore wizard which creates vmdk mountpoints in C:\veeamflr on my backup server, and it successfully enumerates the entire drive's contents and directory structure, I should be in the clear?
Unfortunately, no. I have produced a scenario where the entire disk structure is correct and chkdsk passes. There are simply zeroed-out areas of data in a file. There is no way to detect this without knowing the prior contents/checksum of the file.
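A checksum comparison of that kind can be sketched in Python as below. SHA-256 and the helper names are my own choice for illustration, and the two temporary files only simulate an original file and a restored copy with a zeroed-out region; in practice the original checksum has to be recorded while the source file is still known to be good.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restored_matches(original_path, restored_path):
    """True only when both files have identical contents."""
    return sha256_of(original_path) == sha256_of(restored_path)


# Simulated example: same file size, but half of the restored copy is zeroed.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.bin"
    restored = Path(tmp) / "restored.bin"
    original.write_bytes(b"A" * 4096)
    restored.write_bytes(b"A" * 2048 + b"\x00" * 2048)

    same_size = original.stat().st_size == restored.stat().st_size
    contents_match = restored_matches(original, restored)

print(same_size, contents_match)
```

The file size and the directory structure look fine in this simulation, which is the situation described above: only the content comparison shows the difference.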
MrSpock
Service Provider
Posts: 49
Liked: 3 times
Joined: Apr 24, 2009 10:16 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by MrSpock »

I have now done a CBT reset on all my VMs. On one host I stumbled on the error described in this article: http://www.veeam.com/kb1113

Let us take a look at step 7 to 9:
7. Set the "scsi0:x.ctkEnabled" value back to true for each disk of the VM in question
8. Power the VM on
9. Rerun Backup or Replication job to re-enable CBT
At step 7 I did also set "ctkEnabled" to true. That means that CBT will already be enabled when next backup session starts. Veeam Backup will not re-enable CBT as stated in step 9 (I suppose).

Question: The CBT data does not contain all changes since the last backup session. Will Veeam Backup detect that CBT has been reset and re-enabled since the last backup job and make a full scan of the disk?

Best regards,

Johan
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

@Johan: If you followed the steps in the Veeam KB you mentioned exactly, especially step 5, you are good, because the old ctk files are gone.

Joerg
MrSpock
Service Provider
Posts: 49
Liked: 3 times
Joined: Apr 24, 2009 10:16 pm
Contact:

Re: VMware CBT bug KB 2090639

Post by MrSpock »

joergr wrote:@Johan: If you followed the steps in the Veeam KB you mentioned exactly, especially step 5, you are good, because the old ctk files are gone.

Joerg
Joerg,

Yes, I am safe regarding the VMware CBT bug, but my question is about what Veeam Backup will do when CBT has been disabled and re-enabled since the last backup session. Will Veeam Backup use the CBT data or ignore it?

Best regards,

Johan
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

Johan: Even if you had re-enabled CBT manually by setting it according to the Veeam KB article you mentioned, there is actually no real ctk data on the volume, so Veeam cannot use any CBT data at this time, at least not for this first backup run. So there is nothing to ignore because there simply is nothing there. The CBT data is needed after your first backup following the CBT reset. That's how CBT works: it tells the backup provider which blocks have changed since the last backup.

Joerg
loelly
Enthusiast
Posts: 51
Liked: 10 times
Joined: Apr 17, 2014 8:25 am
Full Name: Jens Siegmann
Contact:

Re: VMware CBT bug KB 2090639

Post by loelly »

So, I happen to have some TBs of file servers and need a quick confirmation that I understood this correctly:

After disabling/enabling CBT, the following backup run will scan the VMDKs entirely but only copy the missing deltas to the backup repository? So I do not need to provide some more TBs to the backup repository. It will be a time consuming, but not space consuming run, right?
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

You got it 100% right. The run takes as long as an active full, BUT the space used stays the same as usual.
loelly
Enthusiast
Posts: 51
Liked: 10 times
Joined: Apr 17, 2014 8:25 am
Full Name: Jens Siegmann
Contact:

Re: VMware CBT bug KB 2090639

Post by loelly »

Thanks for the quick confirmation!
lando_uk
Veteran
Posts: 377
Liked: 32 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by lando_uk »

For any VMs that I know have had their disks expanded, I can just Storage vMotion them to sort this out, yes? This will reset CBT?
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: VMware CBT bug KB 2090639

Post by joergr »

Yes. http://kb.vmware.com/selfservice/micros ... Id=2048201

But only until ESXi 5.5U2. Be aware of that.
lando_uk
Veteran
Posts: 377
Liked: 32 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by lando_uk »

If I do the CBT reset (or a Storage vMotion), I presume the next synthetic full backup won't fix the issue, and I'd have to run an active full on the job that contains the VM I fixed?
loelly
Enthusiast
Posts: 51
Liked: 10 times
Joined: Apr 17, 2014 8:25 am
Full Name: Jens Siegmann
Contact:

Re: VMware CBT bug KB 2090639

Post by loelly » 1 person likes this post

No, you do not need to run an Active Full. Just reset CBT as proposed, Veeam B&R will do an integrity check on the next run, save just the deltas and you do not need to waste a whole bunch of storage in the backup repository (temporarily). Your backups will take longer, comparable to an Active Full. See posts above.
ian0x0r
Veeam Vanguard
Posts: 238
Liked: 55 times
Joined: Nov 11, 2010 11:53 am
Full Name: Ian Sanderson
Location: UK
Contact:

Re: VMware CBT bug KB 2090639

Post by ian0x0r »

Subscribing to thread.
Check out my blog at www.snurf.co.uk :D
kwells
Novice
Posts: 8
Liked: never
Joined: Jan 07, 2014 5:20 pm
Full Name: Kevin Wells
Contact:

Re: VMware CBT bug KB 2090639

Post by kwells »

Hi,

I would like to use the PowerCLI script supplied by Veeam, but I have never used PowerCLI before, so I have no idea how to run it. Is there a simple idiot's guide somewhere, please? Also, I see that some people say they are applying the script to a few VMs at a time, but the way I read the script comments, it seems to get a list of all the VMs that have CBT enabled and then makes the change against them all at once. Am I just completely misunderstanding how this works?

Thanks in advance for any help you can supply