Host-based backup of VMware vSphere VMs.
ngagne
Novice
Posts: 9
Liked: never
Joined: Sep 22, 2015 4:01 pm
Contact:

[KB2136854] There's a new CBT bug in ESXi 6

Post by ngagne »

Source: https://miketabor.com/another-cbt-bug-f ... -esxi-6-0/ and http://kb.vmware.com/kb/2136854. I'm not affected by this, but wanted to make sure everyone knows about it!

[EDIT] Veeam KB > KB2075
nullifi
Influencer
Posts: 23
Liked: 5 times
Joined: Aug 05, 2015 1:41 pm
Full Name: Jason Taylor
Contact:

Re: There's a new CBT bug in ESXi 6

Post by nullifi »

Annoying, the VMware KB link doesn't appear to be working.

How are you not affected? Do you not use CBT?
ngagne
Novice
Posts: 9
Liked: never
Joined: Sep 22, 2015 4:01 pm
Contact:

Re: There's a new CBT bug in ESXi 6

Post by ngagne »

The VMware link took a while to load for me too; the first article says pretty much the same thing that the KB did though. We're not affected because we're still on 5.5 :)
nullifi
Influencer
Posts: 23
Liked: 5 times
Joined: Aug 05, 2015 1:41 pm
Full Name: Jason Taylor
Contact:

Re: There's a new CBT bug in ESXi 6

Post by nullifi » 2 people like this post

Ah. Wonderful. Hurray for embracing the future!
ngagne
Novice
Posts: 9
Liked: never
Joined: Sep 22, 2015 4:01 pm
Contact:

Re: There's a new CBT bug in ESXi 6

Post by ngagne »

Hooray for my predecessor not keeping VMware SnS active so I could upgrade. Between this and the plethora of other v6 bugs, I'm actually thankful that he didn't!
ashleyw
Service Provider
Posts: 207
Liked: 42 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

[MERGED] MAJOR: new CBT issue on vsphere6.

Post by ashleyw »

Hi, we've just been made aware of this shocker:

http://www.running-system.com/attention ... -esxi-6-0/

To what extent are Veeam customers exposed to this issue? (Looking at the problem, it impacts all products running snapshot backups.)
Oh deary me - a bad end to a good week!
ashleyw
Service Provider
Posts: 207
Liked: 42 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: MAJOR: new CBT issue on vsphere6.

Post by ashleyw »

and we've just brainstormed this through;-
it's our opinion that;
- surebackup won't save you either as just because the VM starts up - it doesn't mean that there hasn't been corruption on the image.
- Replication is dependent on CBT so that is potentially bad as well.
- Synthetic fulls merge the incrementals so they are potentially bad.
- If we have to run active fulls on our 70TB of primary storage every day, we'll need to move our Veeam host to Venus, because a day on Venus is 243 Earth days - it's the only way we could get through our backup cycle in a day!

I'm trying to see the positive side here - but unfortunately I can't - and the extent of the impact here is unknown.
andyg
Enthusiast
Posts: 58
Liked: 5 times
Joined: Apr 23, 2014 9:51 am
Full Name: Andy Goldschmidt
Contact:

Re: There's a new CBT bug in ESXi 6

Post by andyg »

This looks bad. Any comment from Veeam?
-= VMCE v9 certified =-
nullifi
Influencer
Posts: 23
Liked: 5 times
Joined: Aug 05, 2015 1:41 pm
Full Name: Jason Taylor
Contact:

Re: MAJOR: new CBT issue on vsphere6.

Post by nullifi »

I've decided to continue to use incrementals (Maybe because I tried fulls last night and I had 14 failures due to the backup window... ahem..) since I am mostly concerned about files, and no files are generally modified during the backup window, including the time the snapshots are being consolidated.

I'll just take the risk that some OS files may be damaged if I need to restore the entire VM.
nullifi
Influencer
Posts: 23
Liked: 5 times
Joined: Aug 05, 2015 1:41 pm
Full Name: Jason Taylor
Contact:

[MERGED] : Incremental without CBT?

Post by nullifi »

With the recent CBT bug, I was wondering if unchecking the "use CBT" option in job properties would alleviate the issue caused by that bug?

All my VMs have CBT enabled. What would happen if I uncheck that option? How are incrementals taken?
Gostev
Chief Product Officer
Posts: 31779
Liked: 7279 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: There's a new CBT bug in ESXi 6

Post by Gostev » 5 people like this post

Hi, all. There is very little info available from VMware on this issue, so we will put together some tests to try and reproduce it in-house. KB article wording prompted me to think that there is a chance B&R may not be affected, but to be sure we need to confirm where this bug actually sits.

For those who don't want to take chances, the easiest workaround is to disable the use of CBT data in the advanced backup job settings. Your backups will remain incremental, but they will take longer because the job will need to read the entire VM to determine changes.
Gostev
Chief Product Officer
Posts: 31779
Liked: 7279 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: There's a new CBT bug in ESXi 6

Post by Gostev » 1 person likes this post

ashleyw wrote:surebackup won't save you either as just because the VM starts up - it doesn't mean that there hasn't been corruption on the image
Actually, even a "stock" SureBackup job will immediately detect the impact of such an issue on at least some VMs in your environment, thanks to application test scripts. This is because applications like AD, Exchange or SQL will refuse to mount corrupted application databases.

And for general purpose VMs, you could set up a custom SureBackup test script to remotely execute chkdsk, and collect its results via an exit code.
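
For anyone wondering what such a test script could look like, here is a minimal sketch in Python. It is not a Veeam-supplied script. The `classify_chkdsk` and `run_check` names are made up for illustration, the command that actually runs chkdsk inside the guest is left to the caller (how you execute it remotely depends on your environment), and the convention "exit code 0 means the test passed" is an assumption about how the calling job interprets the result. The exit code meanings follow Microsoft's chkdsk documentation.

```python
import subprocess
import sys

# chkdsk exit codes as documented by Microsoft.
CHKDSK_MEANING = {
    0: "no errors found",
    1: "errors found and fixed",
    2: "cleanup performed, or skipped because /f was not specified",
    3: "disk could not be checked, or errors were not fixed",
}


def classify_chkdsk(exit_code):
    """Return (passed, description) for a chkdsk exit code.

    Only 0 counts as a pass; treating everything else as a failure is a
    deliberately conservative assumption.
    """
    description = CHKDSK_MEANING.get(exit_code, "unexpected exit code")
    return exit_code == 0, description


def run_check(command):
    """Run `command` (a list of arguments) and return 0 on pass, 1 on fail.

    `command` is whatever runs chkdsk against the restored guest; it is
    supplied by the caller and is not defined here.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    passed, description = classify_chkdsk(result.returncode)
    print(f"exit code {result.returncode}: {description}", file=sys.stderr)
    return 0 if passed else 1
```

A wrapper script would call `run_check([...])` with its own remote-execution command and hand the returned value to `sys.exit()`, so the calling job sees a single pass/fail exit code.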
larry
Veteran
Posts: 387
Liked: 97 times
Joined: Mar 24, 2010 5:47 pm
Full Name: Larry Walker
Contact:

Re: There's a new CBT bug in ESXi 6

Post by larry » 1 person likes this post

I think I will unread this until Monday, as 5 o'clock on Friday is almost here. :D
chrmol
Enthusiast
Posts: 38
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: There's a new CBT bug in ESXi 6

Post by chrmol »

Hi Veeam (Gostev :-) )
Would active full backups also be susceptible to corruption (as last time we saw CBT problems on 5.5)?
/Christian
Gostev
Chief Product Officer
Posts: 31779
Liked: 7279 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: There's a new CBT bug in ESXi 6

Post by Gostev »

Potentially yes, they may also be susceptible to corruption, because they leverage CBT data as well.
Instead of taking chances, at this point it is best to simply disable the use of CBT data completely.
NightBird
Expert
Posts: 245
Liked: 58 times
Joined: Apr 28, 2009 8:33 am
Location: Strasbourg, FRANCE
Contact:

Re: There's a new CBT bug in ESXi 6

Post by NightBird »

If we disable the use of CBT data on the backup job, does Veeam retrieve the missing blocks? Or do we need to do an active full backup? (I think Veeam will retrieve the missing disk blocks.)
Gostev
Chief Product Officer
Posts: 31779
Liked: 7279 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: There's a new CBT bug in ESXi 6

Post by Gostev » 3 people like this post

Yes, as without CBT we will physically compare the latest state of the disk in the backup with its actual state, and transfer any non-matching blocks over into the incremental backup file.
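
To picture the idea, here is a rough Python sketch of a block-by-block comparison. It is an illustration only and not how B&R is actually implemented: the `changed_blocks` name, the fixed block size, and the use of plain binary streams in place of a backup file and a live disk are all assumptions.

```python
import io

BLOCK_SIZE = 1024 * 1024  # hypothetical block size, chosen only for illustration


def changed_blocks(backup, live, block_size=BLOCK_SIZE):
    """Yield (offset, data) for every block of `live` that differs from `backup`.

    Both arguments are binary file-like objects. Every block of `live` is read
    in full, which is why this is slower than asking CBT for the changed list.
    """
    offset = 0
    while True:
        new = live.read(block_size)
        if not new:
            break  # end of the live disk
        old = backup.read(block_size)
        if new != old:
            yield offset, new  # this block would go into the incremental
        offset += len(new)


# Tiny usage example with in-memory "disks" and 4-byte blocks.
old_disk = io.BytesIO(b"AAAABBBBCCCC")
new_disk = io.BytesIO(b"AAAAXXXXCCCCDD")
print(list(changed_blocks(old_disk, new_disk, block_size=4)))
# prints [(4, b'XXXX'), (12, b'DD')]
```

The point of the sketch is the trade-off: nothing is trusted except the data itself, at the cost of reading the whole source disk on every run.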
jamerson
Veteran
Posts: 366
Liked: 24 times
Joined: May 01, 2013 9:54 pm
Full Name: Julien
Contact:

Re: There's a new CBT bug in ESXi 6

Post by jamerson »

Let's hope VMware comes out with new patches this week.
All I can say is it's not good!
Thank you guys for informing us.
alex1002
Enthusiast
Posts: 25
Liked: 1 time
Joined: Jan 27, 2015 6:17 pm
Full Name: Alex
Contact:

Re: There's a new CBT bug in ESXi 6

Post by alex1002 »

Is Microsoft or someone else paying them off to purposely cripple their products?
What's going on, VMware?
ashleyw
Service Provider
Posts: 207
Liked: 42 times
Joined: Oct 28, 2010 10:55 pm
Full Name: Ashley Watson
Contact:

Re: There's a new CBT bug in ESXi 6

Post by ashleyw »

so just so we can understand the risks here...
In the development space we often take snapshots of VMs while they are running and then revert back to the snapshots at a later stage - would this also be impacted?

If we disable CBT on our farm, then surely the amount of IOPS involved in comparing what has changed on a VM will have a huge impact on our storage layer (based on 400 VMs on approx. 60TB), to the point where backup cycles are no longer viable?

If we had a primary SAN capable of SAN snapshots and were able to ship the SAN snapshots to another device (like ZFS snapshot send) I'm guessing then at least we'd be in a safe position as the changed blocks would be handled at a SAN layer rather than at a VMware layer?

I really hope this issue gets fixed urgently as the risk of silent corruption is potentially significant.
gerdesj
Service Provider
Posts: 4
Liked: 1 time
Joined: Nov 21, 2012 3:02 pm
Full Name: Jon Gerdes
Contact:

Re: There's a new CBT bug in ESXi 6

Post by gerdesj »

I provided a little feedback to the TID:
----------------------------8<------------------------------
You give indications of a problem but no definitive descriptions of the implication of what will really happen. Your customers will be using various backup solutions - yours, BEX, Veeam, Acronis, etc etc - what are the implications?

Man up and describe exactly what symptoms might occur for all your major solution providers. Bugs happen but this TID is rubbish and not what I would expect from VMware.
----------------------------8<------------------------------

We know what to do here: Gostev has spelt it out already. The real losers are those who don't keep abreast of events ("hmm data, he no restore") and the big boys who depend on CBT to keep backups within a finely tuned window.
jelloir
Novice
Posts: 5
Liked: 1 time
Joined: May 07, 2015 11:45 pm
Contact:

Re: MAJOR: new CBT issue on vsphere6.

Post by jelloir »

ashleyw wrote:and we've just brainstormed this through;-
- Synthetic fulls merge the incrementals so they are potentially bad.
Can Veeam elaborate on this please.

We use forward incremental forever and a copy job to Cloud Connect. I presume that merged [CBT based] increments may be corrupt, so what impact does this have on the full VBK file? Will simply disabling CBT ensure consistent backups from that point forward? And what if any impact on copies of this data to Cloud Connect?

Regards

James
BLWL
Enthusiast
Posts: 35
Liked: 41 times
Joined: Jan 27, 2015 7:24 am
Full Name: Bjorn L
Contact:

Re: There's a new CBT bug in ESXi 6

Post by BLWL »

Thanks Gostev for the heads-up in your always awesome newsletter!

Just curious: does anyone know if disabling CBT per VM is a workaround?

http://kb.vmware.com/kb/1031873
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: There's a new CBT bug in ESXi 6

Post by joergr » 3 people like this post

No, CBT will be re-enabled at the next backup run, so I suggest disabling CBT in Veeam instead. That way you will be 100% safe. You don't need to do a full backup or change anything else, because Veeam will (when CBT is turned off in Veeam) compare every single block and replace every block it finds that doesn't match. So let's say you disable CBT in Veeam today and you have an incremental backup scheduled for tonight: tonight's backup will take longer but will be 100% OK.

Best,
Joerg
dkvello
Service Provider
Posts: 108
Liked: 14 times
Joined: Jan 01, 2006 1:01 am
Full Name: Dag Kvello
Location: Oslo, Norway
Contact:

Re: There's a new CBT bug in ESXi 6

Post by dkvello »

Is there any known way to figure out if a particular environment is affected?

What are the "requirements" for this bug to appear?

I have several R6 environments without seeing this problem, so there have to be some particular attributes in a vSphere 6 environment that cause this.
lrosa
Influencer
Posts: 17
Liked: 3 times
Joined: Dec 11, 2012 9:11 pm
Full Name: Luigi Rosa
Location: Italy
Contact:

Re: There's a new CBT bug in ESXi 6

Post by lrosa »

Thank you joergr, that's exactly what I needed to know.


Luigi
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: There's a new CBT bug in ESXi 6

Post by joergr »

dkvello: As per Anton's research, it looks like it could occur during heavy I/O write operations from within the guest, but I suggest waiting until we have way more intel before drawing any conclusions. Until then, I'd suggest either downgrading to ESXi 5.5 or disabling CBT from within Veeam.

Best,
Joerg
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: There's a new CBT bug in ESXi 6

Post by joergr »

lrosa wrote:Thank you joergr, that's exactly what I needed to know.
you are very welcome ;)
Gostev
Chief Product Officer
Posts: 31779
Liked: 7279 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: There's a new CBT bug in ESXi 6

Post by Gostev » 1 person likes this post

dkvello wrote:I have several R6 environments without seeing this problem, so there have to be some particular attributes in a vSphere 6 environment that cause this.
Do you know for sure there is no issue? Did you try to restore a VM and run chkdsk, at least? I am naturally curious, because it will help to understand the scope (whether corruption due to the issue happens every time, or only under certain conditions). chkdsk is typically the best of the readily available tools for quickly detecting corruption at the file system level, but it may not detect every corruption every time; that depends on the actual corruption pattern.

We used a more advanced approach based on one of our existing auto tests. This one constantly writes a very large file (100GB+) in the guest, and then compares the MD5 hash of the resulting file on the production VM with the hash of the same file in a VM restored from an incremental backup. Such a test will obviously detect even single-bit corruptions every time.
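
For those who want to try something similar by hand, the hash comparison part can be sketched in a few lines of Python. This is not our auto test, just a minimal illustration: the function names and the two file paths passed to `files_match` are hypothetical, and the file is read in chunks so that a 100GB+ file never has to fit in memory.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read chunk by chunk."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as stream:
        return md5_of_stream(stream)


def files_match(production_path, restored_path):
    """True if the production copy and the restored copy hash identically."""
    return md5_of_file(production_path) == md5_of_file(restored_path)
```

You would run `md5_of_file` on the test file in the production VM and again in the restored VM, then compare the two strings. The file must not change between the backup and the hashing, otherwise a mismatch proves nothing.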
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: There's a new CBT bug in ESXi 6

Post by joergr » 2 people like this post

I was thinking about that for a while, Anton ;-) Wouldn't it be a nice thing (yeah, here comes a feature idea) to have a very simple chkdsk test script with a return code built right into the SureBackup assistant? Let's say you just checkmark a SureBackup option, called something like "check disk integrity at guest level with chkdsk", and bang - a chkdsk script is executed whenever a win* OS is found in the VM.

Best,
Joerg