kimhansen
Novice
Posts: 7
Liked: 2 times
Joined: Jul 20, 2009 7:05 pm
Full Name: Kim Alexander Hansen

Re: VMware CBT bug KB 2090639

Post by kimhansen » Nov 04, 2014 11:15 pm 1 person likes this post

For people asking about confirming that CBT has been reset, this is a way to do it:

1. Reset CBT using any of the previously posted methods: manually with a reboot, or by script with the snapshot method (both work). Then just browse the VM's data files and confirm that the <diskname>-ctk.vmdk files are gone. This should be enough, but if you really want to make sure, also do the next step.

2. Let Veeam run a backup job on the VM after the reset and look for the following in the Veeam logs (%programdata%\Veeam\Backup\VMNAME\Task.VMNAME.vm-nnn.log):

[04.11.2014 06:30:17] <60> Info VM information: name "VM NAME", ref "vm-nnn", uuid "564d6345-3311-21da-9f59-a7188eb2062e", host "vsphere.hostname.local", resourcePool "resgroup-73", connectionState "Connected", powerState "PoweredOn", template "False", changeTracking "False", configVersion "vmx-07"

As you can see, Veeam reports changeTracking as "False".

A few lines below, you can then find:

[04.11.2014 06:30:25] <60> Info [Soap] SetVmChangeTracking, vmRef 'vm-nnn', changeTrackingEnabled 'True'
[04.11.2014 06:30:25] <60> Info [VimApi] ReconfigVM, type "VirtualMachine", ref "vm-nnn"

So Veeam is turning CBT back on.
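
As a rough illustration of the log check in step 2, here is a minimal Python sketch that scans the lines of one task log for the two entries quoted above. The regular expressions are derived only from those two sample lines, so the assumption is that other Veeam versions write them in the same layout; the function name `cbt_status` is made up for this example.

```python
import re

# Patterns derived from the two sample log lines quoted in this post.
# Assumption: other Veeam versions use the same field layout.
VM_INFO = re.compile(r'VM information:.*?changeTracking "(True|False)"')
SET_CBT = re.compile(
    r"SetVmChangeTracking, vmRef '([^']+)', changeTrackingEnabled '(True|False)'"
)


def cbt_status(log_lines):
    """Return (reported, reenabled) for the lines of one task log.

    reported:  True/False as shown on the first "VM information" line,
               or None if no such line was found.
    reenabled: True if a later SetVmChangeTracking call sets 'True'.
    """
    reported = None
    reenabled = False
    for line in log_lines:
        match = VM_INFO.search(line)
        if match and reported is None:
            reported = match.group(1) == "True"
            continue
        match = SET_CBT.search(line)
        if match and reported is not None and match.group(2) == "True":
            reenabled = True
    return reported, reenabled


# The two log lines from this post, used as sample input.
sample = [
    '[04.11.2014 06:30:17] <60> Info VM information: name "VM NAME", '
    'ref "vm-nnn", uuid "564d6345-3311-21da-9f59-a7188eb2062e", '
    'host "vsphere.hostname.local", resourcePool "resgroup-73", '
    'connectionState "Connected", powerState "PoweredOn", template "False", '
    'changeTracking "False", configVersion "vmx-07"',
    "[04.11.2014 06:30:25] <60> Info [Soap] SetVmChangeTracking, "
    "vmRef 'vm-nnn', changeTrackingEnabled 'True'",
]

print(cbt_status(sample))  # (False, True)
```

A result of `(False, True)` corresponds to the situation described here: CBT was off when the job started and the job switched it back on. To check a real log, the lines of the file can be passed in instead of `sample`, for example `cbt_status(open(path, errors="replace"))`.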

That's all folks ;)

Brgs,
Kim Alexander Hansen

Gostev
SVP, Product Management
Posts: 24450
Liked: 3412 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: VMware CBT bug KB 2090639

Post by Gostev » Nov 05, 2014 12:34 am

@DrColonel this information from VMware support is incorrect.
@cliffm all your questions have already been answered earlier in this topic.

kwells
Novice
Posts: 8
Liked: never
Joined: Jan 07, 2014 5:20 pm
Full Name: Kevin Wells

Re: VMware CBT bug KB 2090639

Post by kwells » Nov 05, 2014 9:49 am

MrSpock wrote: I got the same error on one host. Solution: http://www.veeam.com/kb1113

Best regards,

Johan
Hi Johan,
The link you suggested was the one I used when I said I carried out the manual reset method. The interesting thing is that I set both "ctkEnabled" and "scsi0:x.ctkEnabled" to false and then delete the -ctk files, but when I check the options afterwards I find that ctkEnabled has set itself back to True.

Time to talk to Veeam support?
I tried this again last night, and again the backup gave the Soap error. I have also just checked the log files and cannot see changeTracking "False" anywhere. Does this mean that CBT is not being reset on this VM?
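
To see at a glance which of the two kinds of settings mentioned above are still switched on, something like the following Python sketch could be used. It assumes the parameters are available as plain `key = "value"` text lines, as in a typical .vmx file; the sample fragment and the function name `ctk_settings` are hypothetical and only mirror the situation described in this post.

```python
import re

# One configuration entry per line, e.g.  scsi0:0.ctkEnabled = "TRUE"
# Assumption: settings are stored as key = "value" text lines.
ENTRY = re.compile(r'^\s*([^#\s=]+)\s*=\s*"([^"]*)"\s*$')


def ctk_settings(vmx_text):
    """Map every key ending in ctkEnabled to True or False."""
    found = {}
    for line in vmx_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key.lower().endswith("ctkenabled"):
            found[key] = value.strip().lower() == "true"
    return found


# Hypothetical fragment: VM-level flag back on, per-disk flags off.
sample = "\n".join([
    'displayName = "VM NAME"',
    'ctkEnabled = "TRUE"',
    'scsi0:0.ctkEnabled = "FALSE"',
    'scsi0:1.ctkEnabled = "FALSE"',
])

state = ctk_settings(sample)
still_on = sorted(key for key, enabled in state.items() if enabled)
print(still_on)  # ['ctkEnabled']
```

This only reports what the text says; it does not change any setting and does not tell you whether the -ctk files still exist.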

MrSpock
Enthusiast
Posts: 34
Liked: 1 time
Joined: Apr 24, 2009 10:16 pm

Re: VMware CBT bug KB 2090639

Post by MrSpock » Nov 05, 2014 9:55 am

Hi, kwells.

Yes, I did also notice that ctkEnabled was set back to True automatically.

So you did set "scsi0:x.ctkEnabled" back to True manually as stated in step 7? That did the trick for me at least.

Can you see any "-ctk" files now?

Best regards,

Johan

DrColonel
Novice
Posts: 5
Liked: never
Joined: Apr 29, 2014 8:00 pm
Full Name: Kyle Morrow

Re: VMware CBT bug KB 2090639

Post by DrColonel » Nov 05, 2014 3:06 pm

Gostev wrote:@DrColonel this information from VMware support is incorrect.
@cliffm all your questions have already been answered earlier in this topic.
Their response did seem a bit suspect to me. I had already reset CBT on our affected VMs so it was no longer an immediate issue, but I thought their response was interesting. Can you elaborate or refer me to a past post showing why it's incorrect so that I can let them know that the info they're giving out about this issue is wrong?

tsightler
VP, Product Management
Posts: 5381
Liked: 2214 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: VMware CBT bug KB 2090639

Post by tsightler » Nov 05, 2014 3:43 pm 3 people like this post

DrColonel wrote: Their response did seem a bit suspect to me. I had already reset CBT on our affected VMs so it was no longer an immediate issue, but I thought their response was interesting. Can you elaborate or refer me to a past post showing why it's incorrect so that I can let them know that the info they're giving out about this issue is wrong?
The actual VMware KB article that this thread refers to is really all you need. Also, it's important to note that VMware KB2090639 has changed significantly since it was originally released. At first it claimed that the issue only affected one specific call to QueryChangedDiskAreas(): the call with "*", a special form that should return a list of all blocks that have been used/allocated in the entire VMDK. This call is useful for full backups because it saves time reading disk areas that were never written and thus obviously hold no data. This information led some vendors that didn't use this specific call to claim they were not impacted by this bug.

However, the KB article has since been updated and now notes that any call to this API can return inconsistent data if the VMDK has been expanded beyond the given thresholds. They have also added some information about those thresholds; specifically, the Q&A section now contains the following:
Are virtual machines grown in smaller increments affected?
The amount of space the virtual disk is extended is not relevant, the increment of space by which a virtual disk is extended is not relevant.
Virtual machine is affected when the disk is grown past the 128G boundary in absolute size. The issue is triggered at other sizes which are a power of 2 from 128G up. For example: 256G, 512G, and 1024G.
A full backup that didn't use the QueryChangedDiskAreas() API at all should not be impacted, but incremental backups using CBT from that point on could still be impacted, and thus invalid, since the API would fail to return changed blocks located beyond the extended thresholds. Veeam uses QueryChangedDiskAreas() even during full backups to identify "used" blocks in a VMDK, so the bug impacts both full and incremental backups until the CBT data is reset.

So to clarify, the only "safe" thing to do is to reset CBT for any VM that is over 128GB, unless you have a full history of change control information for a VM and know it was never expanded.
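
The threshold rule quoted from the KB can be written down as a few lines of Python. This is only a sketch of the rule as quoted here (128G and every power-of-two multiple above it), under the assumption that "grown past" means the old size was at or below the boundary and the new size is above it. The function name is made up, and since the KB is still changing, an empty result should not be read as proof that a VM is safe.

```python
def crossed_boundaries(old_gb, new_gb, first_gb=128):
    """List the boundaries (128, 256, 512, ...) a disk passes when it
    is grown from old_gb to new_gb.

    Assumption: a boundary counts as passed when old_gb <= boundary
    and new_gb > boundary.
    """
    hit = []
    boundary = first_gb
    while boundary < new_gb:
        if boundary >= old_gb:
            hit.append(boundary)
        boundary *= 2
    return hit


print(crossed_boundaries(100, 200))  # [128]
print(crossed_boundaries(100, 600))  # [128, 256, 512]
print(crossed_boundaries(200, 250))  # []
```

Note that the size of the increment plays no role, which matches the Q&A text: growing a disk from 127G to 129G passes the 128G boundary just as growing it from 100G to 200G does.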

I would continue to monitor this thread as well as the VMware KB article as it's obvious that information is still being discovered about this issue and we may not yet be at the final state.

joergr
Expert
Posts: 386
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether

Re: VMware CBT bug KB 2090639

Post by joergr » Nov 05, 2014 7:31 pm

tsightler wrote:The issue is triggered at other sizes which are a power of 2 from 128G up. For example: 256G, 512G, and 1024G.
Tom, i would not count on that 100%. As Anton verified 200G to 300G was fine in the VEEAM Lab. Thus, i think VMware has to do way more research regarding this issue.

Again, speaking personally: I reset CBT on all VMs, regardless of size. And for the time being, whenever we change a vdisk size, we trigger a CBT reset for that particular VM via script.

Best regards,
Joerg

jai64
Influencer
Posts: 23
Liked: never
Joined: Jun 20, 2009 10:43 am
Full Name: Joe Iovinelli

Re: VMware CBT bug KB 2090639

Post by jai64 » Nov 05, 2014 8:30 pm

Has anyone come across the CBT getting stuck after following the steps in KB 1113?

I followed KB 1113, and now every server I trigger a CBT reset on gives a "Cannot use CBT: Soap fault." on every backup.

Even stranger, I triggered a full backup and it ran OK with no warnings, but on the next job the "Cannot use CBT: Soap fault." came back.

xx/xx/xxxx xx:xx:xx PM :: Cannot use CBT: Soap fault. A specified parameter was not correct. . deviceKeyDetail: '<InvalidArgumentFault xmlns="urn:internalvim25" xsi:type="InvalidArgument"><invalidProperty>deviceKey</invalidProperty></InvalidArgumentFault>', endpoint: ''

Support suggested I redo the CBT reset, but I have a bunch of servers to do that run 24/7, and I am using the vCenter Appliance (no PowerShell).

So, additionally, does anyone know a way to trigger a CBT reset without PowerShell or the Windows version of vCenter?

tsightler
VP, Product Management
Posts: 5381
Liked: 2214 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: VMware CBT bug KB 2090639

Post by tsightler » Nov 05, 2014 8:42 pm 1 person likes this post

joergr wrote: Tom, i would not count on that 100%. As Anton verified 200G to 300G was fine in the VEEAM Lab. Thus, i think VMware has to do way more research regarding this issue.
I wasn't suggesting you trust it, so sorry if that was implied. I was attempting to make exactly your point: we don't have the final answer yet, as even VMware continues to change their information, and the best thing to do is to continue to monitor. I agree with you that the approach of resetting CBT on every disk size change is the most prudent for now.
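
The "reset CBT on every disk size change" approach needs some way to notice that a size changed. As one possible sketch in Python, the function below compares two inventories of disk sizes and lists the VMs where an existing disk has a different size. How the inventories are collected is left open; the dictionary layout, the VM names, and the function name are all hypothetical.

```python
def vms_with_resized_disks(before, after):
    """Return the sorted names of VMs where a disk present in both
    inventories has a different size.

    before / after: {vm_name: {disk_label: size_gb}}
    Disks or VMs that only exist in `after` are ignored, since they
    have no earlier size to compare against.
    """
    changed = set()
    for vm, disks in after.items():
        old_disks = before.get(vm, {})
        for label, size_gb in disks.items():
            if label in old_disks and old_disks[label] != size_gb:
                changed.add(vm)
    return sorted(changed)


# Hypothetical inventories taken at two points in time.
before = {
    "sql01": {"Hard disk 1": 100, "Hard disk 2": 200},
    "web01": {"Hard disk 1": 40},
}
after = {
    "sql01": {"Hard disk 1": 100, "Hard disk 2": 300},
    "web01": {"Hard disk 1": 40, "Hard disk 2": 20},
    "new01": {"Hard disk 1": 60},
}

print(vms_with_resized_disks(before, after))  # ['sql01']
```

The output is only a list of candidates for a CBT reset; the reset itself would still be done with one of the methods posted in this thread.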

Resqman
Novice
Posts: 3
Liked: never
Joined: Nov 05, 2014 10:50 pm
Full Name: FC

Re: VMware CBT bug KB 2090639

Post by Resqman » Nov 05, 2014 10:59 pm

foggy wrote: No, all the missing blocks will be copied after scanning the entire VM image.

@Foggy, in regard to that last part, I want to make sure I understand this correctly. If I disable CBT and then let my nightly backup jobs run tonight, the entire VM will be scanned, and any blocks that were missed by prior backups because of this CBT issue PLUS the blocks that have changed since my last backup (last night's) will be included in this new backup, correct? If so, that means yesterday's backups (and the day before, and the day before that ...) are technically useless, right? And since from this point forward I can only rely on tonight's incremental backup to confidently restore anything on that VM, am I not better off taking a full backup and getting rid of all the prior incrementals and fulls up to this point?

VeForum
Lurker
Posts: 1
Liked: 1 time
Joined: Jul 15, 2014 5:46 am

Re: VMware CBT bug KB 2090639

Post by VeForum » Nov 06, 2014 10:46 am 1 person likes this post

For all of us who want to reset CBT one VM at a time, I made a PowerCLI script to handle this easily.
To find out whether a VM has already had its CBT reset, I use a CustomAttribute, which needs to be set up first:

Code: Select all

New-CustomAttribute -Name "CBTReset" -TargetType VirtualMachine
As the value for that attribute I use a date, so the script can be used again later by changing $Marker (I really hope that won't be necessary).
Here is the script:

Code: Select all

$Marker = "2014-11-05"
$menu = Read-Host "[1] All Machines, [2] CBT reset, [3] CBT not reset"
switch ($menu) {
    1 {$vms=get-vm -Verbose | ?{($_.ExtensionData.Config.ChangeTrackingEnabled -eq $true)}}
    2 {$vms=get-vm -Verbose | ?{($_.ExtensionData.Config.ChangeTrackingEnabled -eq $true) -and ($_.CustomFields.Item("CBTReset") -eq $Marker)}}
    3 {$vms=get-vm -Verbose | ?{($_.ExtensionData.Config.ChangeTrackingEnabled -eq $true) -and ($_.CustomFields.Item("CBTReset") -ne $Marker)}}
    default {Write-Host "Invalid Option!"}
    }

$vmSelected=$null
$vmSelected=$vms | select Name -ExpandProperty CustomFields| Where{$_.key -eq "CBTReset"} | Out-GridView -OutputMode Single
if ($vmSelected -ne $null)
    {
    switch (Read-Host "Reset CBT on $($vmSelected.Name) (y/n)?")
        {
            y 
                {
                $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
                $spec.ChangeTrackingEnabled = $false
                $vm = Get-VM -Name $vmSelected.Name
                $vm.ExtensionData.ReconfigVM($spec)
                $snap=$vm | New-Snapshot -Name 'Disable CBT' 
                $snap | Remove-Snapshot -confirm:$false
                $vmReload = Get-VM -Name $vmSelected.Name | ?{($_.ExtensionData.Config.ChangeTrackingEnabled -eq $true)}
                if ($vmReload -eq $null)
                    {Set-Annotation -Entity $vm -CustomAttribute "CBTReset" -Value $Marker}
                else
                    {Write-Host "Something went wrong with $($vm.Name)"}
                }
            default {Write-Host "Bye"}
        }
    }
else
    {Write-Host "Bye Bye"}
Good luck
Herby

foggy
Veeam Software
Posts: 18029
Liked: 1533 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: VMware CBT bug KB 2090639

Post by foggy » Nov 06, 2014 5:28 pm

Resqman wrote: @Foggy, in regard to that last part, I want to make sure I understand this correctly. If I disable CBT and then let my nightly backup jobs run tonight, the entire VM will be scanned, and any blocks that were missed by prior backups because of this CBT issue PLUS the blocks that have changed since my last backup (last night's) will be included in this new backup, correct?
Correct.
Resqman wrote: If so, that means yesterday's backups (and the day before, and the day before that ...) are technically useless, right? And since from this point forward I can only rely on tonight's incremental backup to confidently restore anything on that VM, am I not better off taking a full backup and getting rid of all the prior incrementals and fulls up to this point?
Previous backups of VMs that have disks with size over 128GB are at risk, yes. So, as described earlier in this thread, creating new restore points for them is recommended, after resetting CBT. It is completely up to you though, whether to delete older backup files, since they might still be recoverable (at least FLR may work).

jai64
Influencer
Posts: 23
Liked: never
Joined: Jun 20, 2009 10:43 am
Full Name: Joe Iovinelli

Re: VMware CBT bug KB 2090639

Post by jai64 » Nov 06, 2014 5:34 pm

If the SureBackup job for a device ran successfully, will that guarantee no corruption and a good, recoverable VM?

nreutemann
Enthusiast
Posts: 47
Liked: 6 times
Joined: Mar 06, 2012 11:45 pm
Full Name: Nicolas Reutemann

Re: VMware CBT bug KB 2090639

Post by nreutemann » Nov 06, 2014 6:36 pm

How safe is it to run the script from Veeam KB1940?

The v8 release notes say this, under "Upgrade to v8":

"11. Reset CBT for all VMs in the environment. For more information, refer to Veeam support article KB1940."

I've been following this thread for the last few days and I'm a little scared to run the script.
I did some tests and didn't have any trouble, but I need to run it over all the VMs and I need some encouragement!

Thanks in advance!

cliffm
Enthusiast
Posts: 41
Liked: 4 times
Joined: Jun 03, 2011 12:41 am
Full Name: Cliff Meakin

Re: VMware CBT bug KB 2090639

Post by cliffm » Nov 06, 2014 7:33 pm

chrisdearden wrote: Sure Replica is in v7.
I have V7 but can't find any SureReplica in it?

tsightler
VP, Product Management
Posts: 5381
Liked: 2214 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: VMware CBT bug KB 2090639

Post by tsightler » Nov 06, 2014 7:57 pm

cliffm wrote: I have V7 but can't find any SureReplica in it?
It's there! There's nothing specifically called "SureReplica" in the GUI, but in v7 when you create an application group you'll notice that there's both an "Add Backup" and an "Add Replica" option for selecting VMs for the app group. Not only that, but when you create the "SureBackup" job itself, you can select both backup and replica jobs; you can even mix and match in the same job! Some links:

Veeam Helpcenter: SureReplica Documentation
Video: Put your replicas to work

Gostev
SVP, Product Management
Posts: 24450
Liked: 3412 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: VMware CBT bug KB 2090639

Post by Gostev » Nov 06, 2014 9:45 pm

nreutemann wrote: How safe is it to run the script from Veeam KB1940?
Super safe. Basically zero impact, except that the next run for all jobs will take longer than usual.

nreutemann
Enthusiast
Posts: 47
Liked: 6 times
Joined: Mar 06, 2012 11:45 pm
Full Name: Nicolas Reutemann

Re: VMware CBT bug KB 2090639

Post by nreutemann » Nov 06, 2014 11:09 pm

Gostev wrote: Super safe. Basically zero impact, except that the next run for all jobs will take longer than usual.
Excellent, thanks Gostev.

Tomorrow, before Friday's incremental + synthetic run, I will run the script.

After the full run, I will post the results here.

Again, thanks.

_richiix
Lurker
Posts: 1
Liked: never
Joined: Nov 07, 2014 10:09 am
Full Name: Richard Tracey

Re: VMware CBT bug KB 2090639

Post by _richiix » Nov 07, 2014 10:52 am

Hi guys,

Firstly thanks for all the information, it has been a fantastic help in diagnosing this problem on our infrastructure.

Quick question though: I am seeing this error appear on VMs that are far below the threshold.
I have seen this error crop up on VMs that only have a 25GB disk.

Anything else I should search for regarding this?

Cheers guys!

foggy
Veeam Software
Posts: 18029
Liked: 1533 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: VMware CBT bug KB 2090639

Post by foggy » Nov 07, 2014 11:44 am 1 person likes this post

Do you mean the "Cannot use CBT: Soap fault." error? There's a dedicated thread on it, but it is better to contact support directly.

ptcruisergt
Novice
Posts: 4
Liked: never
Joined: Aug 24, 2012 2:22 pm
Full Name: P. Cruiser

Re: VMware CBT bug KB 2090639

Post by ptcruisergt » Nov 07, 2014 4:19 pm

There is word over in the EMC forums (https://community.emc.com/thread/201841) that this VMware bug has been around since 2007. If that's true, I'm at a loss for words. There is also mention of a hotfix that can be obtained by calling support.

Apologies if this information was already posted here.

tsightler
VP, Product Management
Posts: 5381
Liked: 2214 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: VMware CBT bug KB 2090639

Post by tsightler » Nov 07, 2014 5:06 pm

ptcruisergt wrote:There is word over in the EMC forums (https://community.emc.com/thread/201841) that this VMware bug has been around since 2007.
Thanks for the info. CBT wasn't an available feature until ESX/ESXi 4.0, and I don't believe 4.0 was released publicly until 2009, but yes, this bug impacts every single version of ESX/ESXi that had the CBT feature. I guess it could have existed in 2007 in beta versions of ESX 4.0. Definitely would be good to confirm a hotfix and whether it addresses VMs that already have broken CBT or will it still require a manual CBT reset for those VMs and simply prevent the issue in the future?

lobo519
Expert
Posts: 297
Liked: 34 times
Joined: Sep 29, 2010 3:37 pm

Re: VMware CBT bug KB 2090639

Post by lobo519 » Nov 07, 2014 8:23 pm

Maybe I missed it in the 10 pages but - Veeam disables CBT when there is a disk size change does it not?

I just did one last week and I am looking at the log that says "Change Block tracking is disabled".

What am I missing?

Gostev
SVP, Product Management
Posts: 24450
Liked: 3412 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: VMware CBT bug KB 2090639

Post by Gostev » Nov 07, 2014 8:26 pm

That CBT tables in VMware get messed up and this affects all future use of CBT data.
Our message only means that the job will not use CBT data during this one specific run.

lobo519
Expert
Posts: 297
Liked: 34 times
Joined: Sep 29, 2010 3:37 pm

Re: VMware CBT bug KB 2090639

Post by lobo519 » Nov 07, 2014 8:27 pm

Got it - Thanks!

cffit
Expert
Posts: 338
Liked: 34 times
Joined: Jan 20, 2012 2:36 pm
Full Name: Christensen Farms

Re: VMware CBT bug KB 2090639

Post by cffit » Nov 07, 2014 8:53 pm

I opened a case with VMware on this to get the hotfix mentioned a few posts above. Here is their reply to me:

Thank you for your Support Request. Just tried to call but hit your voicemail. I'm not sure where you were given information that a hotfix is available, but that is not the case. At this point there is a workaround available in http://kb.vmware.com/kb/2090639, and a fix is scheduled for a future release. You can also subscribe to an RSS feed of the KB and you'll receive an update when the fix is released.

mrt
Enthusiast
Posts: 47
Liked: 2 times
Joined: Feb 10, 2011 7:27 pm

Re: VMware CBT bug KB 2090639

Post by mrt » Nov 07, 2014 9:37 pm

Can someone provide a modified version of the script at http://www.veeam.com/kb1940 that targets a single named VM instead of querying for every VM that has CBT enabled? I'd like to handle the few VMs in my environment that are potentially affected individually, not every single one. Also, can it be confirmed that the script touches all the disks in the VM? Thanks

xadamz23
Influencer
Posts: 17
Liked: 3 times
Joined: Dec 08, 2009 10:24 pm

Re: VMware CBT bug KB 2090639

Post by xadamz23 » Nov 07, 2014 10:28 pm 1 person likes this post

Code: Select all

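# Name of the single VM to reset (placeholder, replace with your own)
$myvm = "Your_VM_name_goes_here"
$vm = Get-VM -Name $myvm

# Turn CBT off at the VM level
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false
$vm.ExtensionData.ReconfigVM($spec)

# Create and immediately remove a snapshot to apply the change (the snapshot method)
$snap = $vm | New-Snapshot -Name 'Disable CBT'
$snap | Remove-Snapshot -Confirm:$false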
Yes, the script resets the CBT for the entire VM, not individual vmdk disks.

cliffm
Enthusiast
Posts: 41
Liked: 4 times
Joined: Jun 03, 2011 12:41 am
Full Name: Cliff Meakin

Re: VMware CBT bug KB 2090639

Post by cliffm » Nov 08, 2014 5:52 am

tsightler wrote: It's there! There's nothing specifically called "SureReplica" in the GUI, but in v7 when you create an application group you'll notice that there's both an "Add Backup" and an "Add Replica" option for selecting VMs for the app group. Not only that, but when you create the "SureBackup" job itself, you can select both backup and replica jobs; you can even mix and match in the same job! Some links:

Veeam Helpcenter: SureReplica Documentation
Video: Put your replicas to work
AAAAAHHHHHHH! I had no idea, that is truly wonderful news!
Thank you :)

nreutemann
Enthusiast
Posts: 47
Liked: 6 times
Joined: Mar 06, 2012 11:45 pm
Full Name: Nicolas Reutemann

Re: VMware CBT bug KB 2090639

Post by nreutemann » Nov 08, 2014 10:16 pm 1 person likes this post

Well, my five cents.

I ran the script from KB1940. After 30 minutes of work: no news, no problems.

After that, I upgraded to v8. My next round of backups started as planned, took more time, and everything went fine, both the incrementals and the synthetic.

So that's all. I hope I got this problem solved.

Cya.

(Sorry, I know, my english sucks)
