v8 - Extremely long backup times after power failure

Post by **sbbots** » Feb 03, 2015 1:09 am

Case# 00736443

A few weeks ago we had a power failure at a remote location and after the UPS batteries died the (2) vSphere hosts at the site shut down hard. Once power was restored the nightly incremental backup ran with the following error:

"CBT data is invalid, failing over to legacy incremental backup. No action is required, next job run should start using CBT again. If CBT data remains invalid, follow KB1113 to perform CBT reset. Usual cause is power loss."

So the "legacy incremental backup" ran that night, and incremental backups returned to normal completion times by the next night. The problem started with the next FULL backup and continues to affect only FULL backups. Incremental backups still run fine. Prior to the power outage a FULL backup would normally run in just over 2 hours, but now they are taking almost 8 hours to finish. The main culprit is host #2, which uses nbd for backups and reads the entire drives of its VMs on every FULL backup. Host #1 uses hotadd and has no issues.

I followed the advice of the error message and Veeam support: I performed a CBT reset (KB1113) on host #2 and ran several test backups. The initial full backup was long (as expected) and incremental backups run perfectly, but every subsequent FULL backup after the initial one still reads the entire drive of each VM. One VM has a 1TB drive that is 99% empty, yet the backup still reads the entire drive. Again, prior to the power failure these issues did not exist, and applying the CBT reset had no effect. Any suggestions about what the problem could be?

Post by **dellock6** » Feb 03, 2015 8:25 am

Hi Matt,
do you see the text [CBT] in the details of the affected VM in the job's realtime report? This would be a first hint as to whether CBT is effectively back and being leveraged by Veeam. If the answer is no, try resetting CBT again; otherwise I suggest opening a support ticket so we can investigate the issue. Feel free to post the case ID here afterwards.

Luca.

Post by **Vitaliy S.** » Feb 03, 2015 9:48 am

Since subsequent incremental job passes run fine according to the OP, I believe CBT should be working fine. Does the behavior you see happen on all VMs of that host? BTW, do you run active or synthetic full backups?

Post by **sbbots** » Feb 03, 2015 4:43 pm

dellock6 wrote: do you see the text [CBT] in the details of the affected VM in the job's realtime report? This would be a first hint as to whether CBT is effectively back and being leveraged by Veeam. If the answer is no, try resetting CBT again; otherwise I suggest opening a support ticket so we can investigate the issue. Feel free to post the case ID here afterwards.

I do see [CBT] in the text. I have reset CBT twice and it has no effect. The problem persists only on active FULL backups. Incremental backups run fine.

A support ticket is already open; the case number is in my original post.

Vitaliy S. wrote: Does the behavior you see happen on all VMs of that host? BTW, do you run active or synthetic full backups?

It is happening to all VMs on host #2 - (1) Windows 2008R2 and (1) Linux VM. I do active full backups.

Post by **Vitaliy S.** » Feb 04, 2015 11:56 am

What are the bottleneck statistics for the active full backup job run? Also, based on what you've said, the "processed" and "read" counters are always the same on active full backups, correct?
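To make the "processed" versus "read" comparison concrete, here is a minimal Python sketch. It assumes you copy the two counters out of the job statistics by hand; the function names, the 0.95 threshold, and the sample sizes are hypothetical and not anything Veeam exposes.

```python
def read_ratio(processed_gb: float, read_gb: float) -> float:
    """Fraction of the processed (provisioned) size that was actually read."""
    if processed_gb <= 0:
        raise ValueError("processed_gb must be positive")
    return read_gb / processed_gb


def looks_like_whole_disk_read(processed_gb: float, read_gb: float,
                               threshold: float = 0.95) -> bool:
    """Flag a disk whose 'read' counter is (almost) equal to 'processed'.

    The 0.95 threshold is an arbitrary illustrative choice.
    """
    return read_ratio(processed_gb, read_gb) >= threshold


# Hypothetical numbers: a 1 TB disk that is nearly empty.
healthy = looks_like_whole_disk_read(processed_gb=1024, read_gb=12)
suspect = looks_like_whole_disk_read(processed_gb=1024, read_gb=1024)
```

A ratio close to 1 on a mostly empty disk suggests the full backup is not getting a usable list of allocated blocks and is reading everything instead.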

Post by **sbbots** » Feb 04, 2015 7:48 pm

Here are some before/after screenshots. Notice that the (2) VMs located on Host #2 (GMELSW01, GMELSW02) are having their entire drives read after the power outage. This isn't just a one-time event; every active FULL backup (not incremental) since the incident has produced the same result. Host #1 (GMELDC01, GMELFP01) is running backups perfectly with no issues.

[BEFORE and AFTER screenshots not preserved]

Looking at one VM specifically (GMELSW01)...

[BEFORE and AFTER screenshots for GMELSW01 not preserved]

Again, I have reset CBT on host #2 twice now following the instructions in KB1113. The host is running vSphere 5.1 Update 3 (build 2323236) if that helps.

Post by **Gostev** » Feb 04, 2015 10:41 pm

This issue is quite typical, as the changeId "*" query often does not work reliably in VMware. You can try digging through these forums for the full CBT reset procedure, which was posted a few years ago. From what I remember, it involves power cycling the VM at certain steps, and multiple CBT resets. But this procedure is not scientific, and does not always help.
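For anyone wondering why a misbehaving changeId "*" query makes a full backup read the whole drive, here is a rough Python sketch. It assumes the query behaves like vSphere's QueryChangedDiskAreas, which returns a covered range plus a list of extents; the `query` callable, the dataclasses, and the two fake responses are stand-ins invented for illustration, not the real SDK objects.

```python
from dataclasses import dataclass
from typing import Callable, List

GIB = 1024 ** 3


@dataclass
class Extent:
    start: int   # byte offset of the extent
    length: int  # extent size in bytes


@dataclass
class ChangeInfo:
    start_offset: int          # where the covered range begins
    length: int                # how many bytes the answer covers
    changed_area: List[Extent]


def bytes_to_read(query: Callable[[int], ChangeInfo], capacity: int) -> int:
    """Sum the extents a changeId "*"-style query reports for one disk."""
    total = 0
    offset = 0
    while offset < capacity:
        info = query(offset)
        total += sum(e.length for e in info.changed_area)
        next_offset = info.start_offset + info.length
        if next_offset <= offset:
            raise RuntimeError("query made no progress")
        offset = next_offset
    return total


CAPACITY = 1024 * GIB  # hypothetical 1 TB disk


def healthy_query(offset: int) -> ChangeInfo:
    # Only ~10 GiB in use: one small extent is reported.
    return ChangeInfo(offset, CAPACITY - offset, [Extent(0, 10 * GIB)])


def broken_query(offset: int) -> ChangeInfo:
    # Misbehaving case: the whole disk comes back as a single extent.
    return ChangeInfo(offset, CAPACITY - offset,
                      [Extent(offset, CAPACITY - offset)])


healthy_total = bytes_to_read(healthy_query, CAPACITY)
broken_total = bytes_to_read(broken_query, CAPACITY)
```

With the healthy response the backup only needs about 10 GiB; with the broken one it has to read the full 1 TiB, which matches the symptom described in this thread. Incrementals use a real changeId rather than "*", so they can stay fast even when the "*" query is unreliable.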

Post by **sbbots** » Feb 05, 2015 12:55 am

Thanks Gostev. I am just going to blow out the second host (i.e. format and reinstall vSphere) and restore from backup. One of the best things about Veeam is how easy it is to restore a VM after something like this (2-3 clicks of the mouse).

Post by **sbbots** » Feb 07, 2015 9:13 pm

FYI - I talked to the person who set up our 2nd host and he confirmed that he had increased the size of the thick-provisioned VMDKs of both VMs at some point, and in both cases the new size was over 128GB. Hmmmm. Sounds like the CBT bug?

I am going to try a restore of the VMs and change the drives to thin provisioning to see what happens.
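For checking other disks against the same pattern, here is a small Python sketch. It assumes the CBT bug in question is triggered when a CBT-enabled disk is extended past a 128 GB boundary; that criterion is my reading and should be verified against the vendor KB. The helper name and sample sizes are made up for illustration.

```python
GIB = 1024 ** 3
BOUNDARY = 128 * GIB  # assumed trigger size, verify against the vendor KB


def crosses_boundary(old_size: int, new_size: int,
                     boundary: int = BOUNDARY) -> bool:
    """True if some multiple k*boundary (k >= 1) satisfies
    old_size <= k*boundary < new_size, i.e. the disk grew past it."""
    # Index of the smallest multiple of `boundary` that is >= old_size.
    k = max(1, -(-old_size // boundary))
    return k * boundary < new_size


# Hypothetical resize histories (sizes in bytes).
grew_past_128 = crosses_boundary(100 * GIB, 200 * GIB)
stayed_below = crosses_boundary(60 * GIB, 120 * GIB)
within_band = crosses_boundary(130 * GIB, 200 * GIB)
```

Only the first case grows past a boundary under this assumption; the other two stay within the same 128 GB band.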

Post by **sbbots** » Feb 09, 2015 8:18 pm

Just FYI - restoring the VMs and changing the drives to thin provisioning worked. Backups ran perfectly.

Still, I notice a lot of strange things with v8 that I never saw with v7. Backups on 3 different hosts show that some drives seem to read only CBT changes, while other drives are read in their entirety. It doesn't affect the backup time, but the log shows this. Just something I noticed.
