Host-based backup of VMware vSphere VMs.
pgitdept
Influencer
Posts: 17
Liked: 14 times
Joined: Feb 03, 2011 10:29 am
Full Name: PGITDept
Contact:

ESX 5.5: An error occurred while consolidating disks

Post by pgitdept » 7 people like this post

Hi All,

Case: #00510773

I thought I'd share some information regarding an issue we've been experiencing since moving to ESX 5.5 and Veeam 7 R2a. Background: We've been running Veeam for just shy of 4 years. Our Veeam server was a 2008R2 box and it backed up a 3-host ESX 5.1 cluster. Now, some of our VMs are rather large... in the region of 8TB+ large (although using a bunch of 2TB VMDKs). The size of these VMs has never been an issue and things have been running really well.

So, we now have some servers that just cannot fit within the 2TB limit. In fact, we have to allow for an 11TB drive on one of our servers (this is the lowest granularity we can achieve). So, we built an ESXi 5.5 cluster on new hosts as soon as it was available, then built and tested some large VMs (Windows and RHEL) along with two new Veeam 7 servers (Windows 2012R2). Initial testing was good: backups and restores from both disk and tape, good. We're good to go!

In addition to these new servers we needed to accommodate our existing VMs and so needed to upgrade our 5.1 hosts. To do this we moved them into the new 5.5 vCenter and upgraded the hosts without much incident. This new cluster is served by the new Veeam servers and so we knew/accepted that we'd have to do full backups to begin with. This wasn't a concern as we probably don't run active fulls as often as we'd like because our VMs are so big.

On the whole this has been going well, but we did find that two of our older servers had the vSphere yellow warning triangle on them: one after the full seed, the other a few days after its initial seed. Each of these servers is about 4TB in size and contains 3 or 4 disks. That size is usual here, and many other servers of that size or larger worked correctly. The yellow triangle was the 'VM needs its disks consolidating' message with no snaps present in Snapshot Manager. When trying to consolidate we got the following message:

Error: An error occurred while consolidating disks: msg.snapshot.error-FAILED. The maximum consolidate retries was exceeded for scsix:x.

We have spoken to VMware and had the logs dug into, and they couldn't understand why this was happening. We could take and remove snaps afterwards, but couldn't stop one disk on each server from running on a snapshot. Next, on one of the servers, which has little change, we shut down, manually deleted the snap and connected to the flat disk. Once we took and removed another snapshot... consolidation was needed again. VMware seemed to indicate that we might need to shut down and clone the disk to remedy this. This wasn't something we could entertain, as this issue may present itself on any of our disks and these are production servers that cannot afford days of cloning time. Imagine cloning an 11TB disk. :shock:

So it was starting to look fairly hopeless, until Veeam identified the following in the log (VMware did highlight this, but didn't spend too much time on it):

2014-01-16T04:17:27.495Z| vcpu-0| I120: DISKLIB-LIB : Free disk space is less than imprecise space neeeded for combine (0x96a3b800 < 0x9b351000, in sectors). Getting precise space needed for combine...
2014-01-16T04:17:40.173Z| vcpu-0| I120: SnapshotVMXConsolidateHelperProgress: Stunned for 13 secs (max = 12 secs). Aborting consolidate.
2014-01-16T04:17:40.173Z| vcpu-0| I120: DISKLIB-LIB :DiskLibSpaceNeededForCombineInt: Cancelling space needed for combine calculation
2014-01-16T04:17:40.174Z| vcpu-0| I120: DISKLIB-LIB : DiskLib_SpaceNeededForCombine: failed to get space for combine operation: Operation was canceled (33).
2014-01-16T04:17:40.174Z| vcpu-0| I120: DISKLIB-LIB : Combine: Failed to get (precise) space requirements.
2014-01-16T04:17:40.174Z| vcpu-0| I120: DISKLIB-LIB : Failed to combine : Operation was canceled (33).
2014-01-16T04:17:40.174Z| vcpu-0| I120: SNAPSHOT: SnapshotCombineDisks: Failed to combine: Operation was canceled (33).
2014-01-16T04:17:40.178Z| vcpu-0| I120: DISKLIB-CBT : Shutting down change tracking for untracked fid 9428050.
2014-01-16T04:17:40.178Z| vcpu-0| I120: DISKLIB-CBT : Successfully disconnected CBT node.
2014-01-16T04:17:40.211Z| vcpu-0| I120: DISKLIB-VMFS : "/vmfs/volumes/50939781-d365a9b0-5523-001b21badd94/<SERVERNAME>/<SERVERNAME>-000002-delta.vmdk" : closed.
2014-01-16T04:17:40.213Z| vcpu-0| I120: DISKLIB-VMFS : "/vmfs/volumes/50939781-d365a9b0-5523-001b21badd94/<SERVERNAME>/<SERVERNAME>-flat.vmdk" : closed.
2014-01-16T04:17:40.213Z| vcpu-0| I120: SNAPSHOT: Snapshot_ConsolidateWorkItem failed: Operation was canceled (5)
2014-01-16T04:17:40.213Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: Synchronous consolidate failed for disk node: scsi0:2. Adding it to skip list.

We can see that there is a process that calculates whether there is enough free space for consolidation, and if this process does not complete in 12 seconds or less, it aborts the consolidate operation. After speaking to VMware we found that we couldn't extend this timer: it's hardcoded. Veeam suggested that this precise calculation is only needed if there is less free space available in the datastore than the size of the disk that needs consolidation. So we extended the LUN and datastore so we had enough free space and ran the consolidate task again. This time it worked instantly and has continued to work for the last few days.
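
For anyone who wants to check their own vmware.log, the two hex values in the "imprecise space" line can be turned into sizes. This is only a minimal Python sketch: the helper name is made up, and the 512-byte sector size is an assumption (the log only says "in sectors").

```python
import re

SECTOR_BYTES = 512  # assumption: the log says "in sectors" but not the sector size

# Matches the "(0x... < 0x..., in sectors)" part of the DISKLIB-LIB line.
PATTERN = re.compile(r"\(0x([0-9a-fA-F]+) < 0x([0-9a-fA-F]+), in sectors\)")


def parse_combine_space(line):
    """Return (free_sectors, needed_sectors) from the log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = (
    "2014-01-16T04:17:27.495Z| vcpu-0| I120: DISKLIB-LIB : Free disk space is "
    "less than imprecise space neeeded for combine (0x96a3b800 < 0x9b351000, "
    "in sectors). Getting precise space needed for combine..."
)

free, needed = parse_combine_space(line)
shortfall_gib = (needed - free) * SECTOR_BYTES / 2**30
print(f"free:      {free * SECTOR_BYTES / 2**40:.2f} TiB")
print(f"needed:    {needed * SECTOR_BYTES / 2**40:.2f} TiB")
print(f"shortfall: {shortfall_gib:.1f} GiB")
```

Under that sector-size assumption, the line from the log above works out to a shortfall of roughly 36.5 GiB, which is the condition that sends the host into the slower precise calculation.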

So I guess we have a work-around for the issue: extend the datastore. Obviously this isn't ideal when we might be speaking about 11TB VMDKs (22TB datastore!!).

We still don't know why this happened, or should I say why it didn't happen before... The VM hasn't changed in size for months and we never had this issue on 5.1 or Veeam 6.5. It's on the same backend storage and the datastore latency hasn't really risen. I'm guessing something must have changed in connection with vSphere 5.5 snapshotting and this has made it less tolerant of our environment.

Anyway, I just thought I'd post this in case anyone else experiences this or similar issues or has any further insight into this issue.

Thanks
Adrian
m1m1n0
Novice
Posts: 5
Liked: 3 times
Joined: Nov 18, 2013 9:13 am
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by m1m1n0 » 3 people like this post

Hello!

Fire a support request to VMware. They consolidate deltas differently in 5.5, in a way that is supposed to be more gentle on VMs. Your VM is too write-intensive, and the consolidation process cannot keep up when the -delta file grows faster than the changes can be committed to the base. It is not Veeam's problem, it's the way ESXi 5.5 behaves compared to 5.1.

VMware will most likely recommend increasing the time for which the VM is allowed to be paused during delta consolidation. Don't be afraid of longer downtime when you remove a snapshot; 5.1 was doing that silently anyway.

And do this ASAP. The process of removing snapshots for VMs like yours is very stressful on disk subsystem.

Source: gone through this myself

EDIT: in ESX 5.1 consolidation could keep failing the same way for some 30 attempts, but after that the host would pause your VM for up to 30 minutes to consolidate the delta. ESX 5.5 does not fall back to this mechanism, and the allowed pause time is 5 seconds IIRC.
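
The "delta grows faster than it can commit" argument can be illustrated with a toy model. This is not how ESXi is implemented; it is a rough Python sketch under simple assumptions (constant write and commit rates, a fixed pause budget, and a made-up function name) that only shows why a write-heavy VM may never reach a delta small enough to commit during the allowed pause.

```python
def online_passes_needed(delta_mb, write_mb_s, commit_mb_s, max_pause_s, max_passes=30):
    """Toy model: count the online commit passes before the final paused commit fits.

    Each online pass commits the current delta while the VM keeps writing into a
    new delta. Returns None if the remaining delta never fits in max_pause_s
    within max_passes attempts. All parameters are hypothetical inputs.
    """
    for passes_done in range(max_passes + 1):
        commit_time_s = delta_mb / commit_mb_s
        if commit_time_s <= max_pause_s:
            return passes_done  # small enough to finish while the VM is paused
        # Writes that arrive during this pass become the next delta.
        delta_mb = write_mb_s * commit_time_s
    return None


# A 100 GB delta, storage committing at 200 MB/s, 12 second pause budget.
print(online_passes_needed(100 * 1024, write_mb_s=50, commit_mb_s=200, max_pause_s=12))
print(online_passes_needed(100 * 1024, write_mb_s=250, commit_mb_s=200, max_pause_s=12))
```

In this model the first VM converges after a few passes because each pass leaves a delta a quarter the size of the previous one, while the second never converges because it writes faster than the storage can commit.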
dahdco
Novice
Posts: 3
Liked: never
Joined: Oct 07, 2011 2:31 pm
Full Name: Doug Heckman
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by dahdco »

We had the exact same error. It didn't occur until we upgraded our host to 5.5 (vCenter was upgraded about a week prior).

In our first instance we changed the SCSI controller from Paravirtual to LSI Logic SAS and were then able to consolidate, and the problem didn't return on that VM.

On our second occurrence we weren't able to take downtime to switch the controller from Paravirtual to LSI. We were able to consolidate by first taking a snapshot, then consolidating, then removing all snapshots. The consolidation was failing every night on this VM and this fix would work every time in the morning. We ended up moving the VM config files (VMX etc.) to a new LUN (off of 3PAR to Compellent). Since doing this we haven't had the consolidation fail on this VM. Both the old and new LUN locations had less free space than the actual disk size (1.95TB) so the precise check is still occurring. This VM has five 1.9TB disks. The consolidation was always failing on the last one. I just checked the logs for this VM and see the following events. They almost all show a response time of around 11 seconds. Looks like I'm barely missing the 12 second failure, repeatedly. Kind of scary.

2014-02-08T05:03:57.988Z| SnapshotVMXCombiner| I120: DISKLIB-LIB : Free disk space is less than imprecise space neeeded for combine (0xa0ffb800 < 0xe7db5800, in sectors). Getting precise space needed for combine...

2014-02-08T05:04:08.604Z| SnapshotVMXCombiner| I120: DISKLIB-LIB : Upward Combine 2 links at 1. Need 4 MB of free space (1318903 MB available)
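
To see how close a given run came to the limit, the gap between those two log lines can be computed from their timestamps. A minimal Python sketch follows; the helper name is made up, and the 12 second limit is taken from the "(max = 12 secs)" message in the first post's log rather than from any documented setting.

```python
from datetime import datetime

MAX_STUN_SECONDS = 12  # taken from "(max = 12 secs)" in the earlier log excerpt


def log_time(line):
    """Parse the leading UTC timestamp of a vmware.log line."""
    stamp = line.split("|", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


start = "2014-02-08T05:03:57.988Z| SnapshotVMXCombiner| I120: DISKLIB-LIB : Free disk space is less than imprecise space neeeded for combine"
end = "2014-02-08T05:04:08.604Z| SnapshotVMXCombiner| I120: DISKLIB-LIB : Upward Combine 2 links at 1."

elapsed = (log_time(end) - log_time(start)).total_seconds()
print(f"precise space check took {elapsed:.3f} s, "
      f"{MAX_STUN_SECONDS - elapsed:.3f} s under the limit")
```

For the pair of lines above this gives about 10.6 seconds, which matches the "around 11 seconds" observation.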
dahdco
Novice
Posts: 3
Liked: never
Joined: Oct 07, 2011 2:31 pm
Full Name: Doug Heckman
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by dahdco »

Spoke too soon. The VM where we moved the config file to faster disk had another occurrence. VMware went with "it's a locking issue" and wanted me to reboot the backup server and try again. I did and it worked, but I think it's unrelated and things were just "faster" at the time and didn't trigger the 12 second timeout. I have logs where consolidation fails due to locking which look very different from these failures, which I uploaded to VMware today. Hopefully they'll actually look at the logs this time (pretty sure they didn't before contacting me).

So far I've had the error on 3 different VMs, across two different backup servers (we have 4). Two of the VMs had a single occurrence and the other has had 5+.
munklarsen
Influencer
Posts: 23
Liked: 9 times
Joined: Nov 15, 2012 11:02 pm
Full Name: Michael Munk Larsen
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by munklarsen » 5 people like this post

Just wanted to mention this so that people don't blame Veeam for this :) We run TSM for VE in our environment and we have the same issues. Sometimes we can consolidate just fine, other times it doesn't work. What always works, however, is:

1) make snapshot (without snapping the memory)
2) consolidate
3) remove snap
4) consolidate
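
For anyone wanting to script the four steps, here is a rough Python sketch. It assumes a pyVmomi-style VirtualMachine object (the method names are the vSphere API task methods as pyVmomi exposes them) and takes the task-waiting function as a parameter; the function name and snapshot name are made up, and none of this has been run against the environments discussed in this thread.

```python
def snapshot_and_consolidate(vm, wait, name="consolidate-helper"):
    """Run the four steps against a pyVmomi-style VirtualMachine object.

    `vm` is assumed to expose CreateSnapshot_Task and ConsolidateVMDisks_Task.
    `wait` is any callable that blocks until the given task has finished.
    """
    # 1) make snapshot (without snapping the memory)
    task = vm.CreateSnapshot_Task(
        name=name, description="temporary helper snapshot",
        memory=False, quiesce=False,
    )
    wait(task)
    snapshot = task.info.result  # the new snapshot object, once the task is done

    # 2) consolidate
    wait(vm.ConsolidateVMDisks_Task())

    # 3) remove snap
    wait(snapshot.RemoveSnapshot_Task(removeChildren=False))

    # 4) consolidate
    wait(vm.ConsolidateVMDisks_Task())
```

With pyVmomi installed, `pyVim.task.WaitForTask` would be a natural choice for `wait`; passing it in keeps the sketch free of any connection or credential handling.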
teknomage
Service Provider
Posts: 25
Liked: 2 times
Joined: Jul 21, 2010 8:55 pm
Full Name: Mike
Location: Fargo, ND
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by teknomage »

I was running into the same issue (Error: An error occurred while consolidating disks: msg.snapshot.error-FAILED. The maximum consolidate retries was exceeded for scsix:x) and your steps fixed me right up. Thanks munklarsen.
cdickerson
Enthusiast
Posts: 25
Liked: 4 times
Joined: Nov 23, 2010 2:39 am
Full Name: Craig Dickerson
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by cdickerson » 3 people like this post

Had this same problem. At first the VMware engineer said there was no change to the way snapshots were removed in vSphere 5.5; I quickly told him he was wrong. Here is the fix VMware provided to me.

Pick the VM(s) most affected. You need to add a parameter to the virtual machine configuration.

You can do this by:

• Shut down the virtual machine
• Right-click the virtual machine and click Edit Settings.
• Click the Options tab.
• Under Advanced, click General.
• Click Configuration Parameters and add snapshot.maxConsolidateTime = 30
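
As far as I understand it, a configuration parameter added this way ends up as a `key = "value"` line in the VM's .vmx file. The Python sketch below only shows what that line would look like by editing a sample string; the helper name and sample contents are made up, and it is not a recommendation to hand-edit the .vmx of a registered VM.

```python
def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with `key = "value"` set, replacing any existing entry."""
    new_line = f'{key} = "{value}"'
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        # Compare only the key part on the left of the first "=".
        if line.split("=", 1)[0].strip().lower() == key.lower():
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical .vmx fragment, for illustration only.
sample = 'displayName = "example-vm"\nscsi0.virtualDev = "pvscsi"\n'
print(set_vmx_option(sample, "snapshot.maxConsolidateTime", 30))
```

Running it twice on the same text leaves a single entry, so the value can be adjusted later without duplicating the key.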
award@kahnlitwin.com
Lurker
Posts: 2
Liked: 1 time
Joined: Jun 19, 2014 10:07 am
Full Name: Andrew Ward
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by award@kahnlitwin.com » 1 person likes this post

I had this problem and found that the Veeam server still had the affected VM's disk mounted. I removed the disk from the Veeam server and consolidated without problems.
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Vitaliy S. »

Yes, that could be one of the reasons for this issue as well. On top of that, if you happen to see disks not being unmounted from the proxy server again, contact our technical team so they can investigate this behavior. Thanks!
TBone
Novice
Posts: 7
Liked: never
Joined: Feb 12, 2014 4:14 pm
Full Name: Brian
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by TBone »

I will add that we have been having a similar problem with ESX 5.1. After the backup of the server completes, Veeam gets a notice that the snapshot has been removed. However, vCenter starts throwing errors that consolidation is required. Examining the config of the server in question, it is still running off a snapshot, despite the fact that Snapshot Manager doesn't show anything.

I have just opened a ticket with VMware so we'll see what they make of it. This problem does not happen every day, but it is generally the same server (a SQL 2008R2). It did not start happening until we moved from a Lefthand SAN to Nimble Storage.
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Vitaliy S. »

Could it be an issue with the datastore not having enough IOPS to commit the snapshot? When you see this issue happening, can you please check the performance graphs of the source datastore?
Peejay62
Expert
Posts: 235
Liked: 37 times
Joined: Aug 06, 2013 10:40 am
Full Name: Peter Jansen
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Peejay62 »

TBone wrote: I will add that we have been having a similar problem with ESX 5.1. After the backup of the server completes, Veeam gets a notice that the snapshot has been removed. However, vCenter starts throwing errors that consolidation is required. Examining the config of the server in question, it is still running off a snapshot, despite the fact that Snapshot Manager doesn't show anything.
I am curious, did you ever solve this or find out the cause? I am seeing some similar behaviour.

Thanks, Peter
veremin
Product Manager
Posts: 20415
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by veremin »

Hi Peter, we've provided some explanation of that issue previously; it should clarify the nature of the behavior you're experiencing. Thanks.
Peejay62
Expert
Posts: 235
Liked: 37 times
Joined: Aug 06, 2013 10:40 am
Full Name: Peter Jansen
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Peejay62 »

Vladimir, thank you. More and more I am convinced that the issue indeed lies within vSphere. Just to illustrate: I defined a new job yesterday containing 17 VMs. They have never been snapshotted before. I ended up with 6 VMs needing consolidation afterwards. The common factor is that within this job all VMs in need of consolidation ran on the same host (not the same datastore). VMware reports that snapshot removal is complete, but browsing the datastore shows it is not. Seems like the actual running of the cleanup gets lost, host overloaded or some kind of a command queue loss??
Anyway, I am trying to find a good way to avoid the consolidations being needed; it's getting kind of annoying.

Thanks, Peter
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Vitaliy S. »

Peejay62 wrote: VMware reports that snapshot removal is complete, but browsing the datastore shows it is not. Seems like the actual running of the cleanup gets lost, host overloaded or some kind of a command queue loss??
Do you run your jobs using a vCenter Server connection or a direct ESXi host connection? Try switching between the two and see whether you observe the same snapshot consolidation error message.
Peejay62
Expert
Posts: 235
Liked: 37 times
Joined: Aug 06, 2013 10:40 am
Full Name: Peter Jansen
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Peejay62 »

We run using a vCenter connection. I am afraid it will be too much effort to connect to the ESXi hosts directly, because I don't know when or on which host it will happen... I can probably make a new job, picking a specific host and selecting all the VMs running on that one. Then I have to sit and wait to see if the consolidation problem occurs. That might just be the way to do it. You gave me a good idea for digging in and finding the root cause of this, thanks.

Peter
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by Vitaliy S. »

Yes, I was referring to adding the "problematic" host as a standalone one via IP address and then running a couple of test jobs to see if this issue can be reproduced or not.
seadave
Enthusiast
Posts: 36
Liked: 6 times
Joined: Oct 30, 2014 12:43 am
Location: Seattle
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by seadave »

I've also had this error. We've been running Veeam 8.0.0.917 for about 3 months without issue. This week we used a mail archiving tool to export a large number of email messages, ~1.5M (150G), to a file share. A few days after this, when Veeam snapshotted our mailbox server, the snapshot filled up the LUN it was on. The LUN had quite a bit of free space, ~500G, but the VM is 2.2TB so it's possible it needed more than that. Awoke to find the VM hung, wanting to retry a file operation. I first expanded free space and then selected "Retry" in vCenter. That wouldn't work so I chose Continue. The VM was still hung so it needed a reboot. It came back up fine. Realized later in the day the VM still needed snapshot consolidation. I attempted one and I got the error:

An error occurred while consolidating disks: 9 (Bad file descriptor).

Realized later that the consolidation process kept cycling during the day. I've never seen that happen before. It appeared to know there was a problem and just kept running, sometimes failing and sometimes succeeding, until all deltas were processed. It was weird. It finally resolved itself right after I had paid for a $1200 after-hours VMware support credit. I couldn't believe it.

Later that night I manually ran Veeam against that VM again, and again I got the error after the backup had completed successfully and vCenter was attempting to consolidate the snapshot. This time it only took four attempts before it was resolved. I think I might be getting hit by the 12s bug. Not sure. Wondering what kind of risk this involves moving forward. I think it might be time to build a clean mail server and migrate mailboxes to free up the whitespace in the old one.
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by foggy »

Looks like Snapshot Hunter is trying to consolidate the snapshots. You can ask support for a log review to confirm whether everything works as expected.
dellock6
VeeaMVP
Posts: 6166
Liked: 1971 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by dellock6 »

You can quickly look in history as explained in the blog post linked by Alexander and search for snapshot hunter activities, just to be sure.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
ken.wilson
Service Provider
Posts: 35
Liked: 3 times
Joined: Sep 26, 2011 2:28 pm
Full Name: Ken Wilson
Location: Toronto, Canada
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by ken.wilson »

award@kahnlitwin.com wrote: I had this problem and found that the Veeam server still had the affected VM's disk mounted. I removed the disk from the Veeam server and consolidated without problems.
This was my issue as well with one of my VMs. One of my proxies still had the disk added, but strangely the job log didn't show an issue and the job was successful. Either way, once the disk was removed from the proxy I was able to consolidate the disk.
rafael.gonzalez
Novice
Posts: 3
Liked: 2 times
Joined: Oct 08, 2014 12:51 am
Full Name: Rafael Gonzalez
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by rafael.gonzalez » 2 people like this post

Ok, so I was hitting the same issue - a particularly busy test Exchange VM was successfully backing up, but would bomb out on the consolidation step - so I always had the yellow bang on the VM. The only way to remedy that was to power off the VM and delete snap/consolidate, then power back on. Not good.

I spoke to Veeam (support case ID 00835299) and VMware support over several days, and the word I got was that in our scenario, the situation just could not be dealt with using vanilla settings, so they sent me this article:

http://kb.vmware.com/selfservice/micros ... Id=2082886

(Article number is 2082886 in case forum hoses the link.)

The short version is that I had to change a setting for our problem VM that allowed it to remain in an "unresponsive state" for the duration needed to consolidate the disks. (I used the PowerCLI option to target the one VM but didn't alter the time value under "Additional Info").

Well - I'm happy to report that the initial results look good, and the backups appear to be completing.

It's not a perfect solution, but until then - it increased joy.
iwik
Influencer
Posts: 20
Liked: 4 times
Joined: Apr 02, 2014 11:10 am
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by iwik » 1 person likes this post

Hi, I hit the same problem. My solution was to shut down the VM and run consolidation. It was successful when the VM was not running.
dharmon
Lurker
Posts: 1
Liked: never
Joined: Jun 25, 2015 2:05 pm
Full Name: David Harmon
Contact:

Re: ESX 5.5: An error occurred while consolidating disks

Post by dharmon »

My fix: Even though none of my proxy servers, including the backup server, showed that they had a drive mounted for a backup or replication, once I stopped the backup server and proxy servers, the consolidation proceeded without error. I turned the backup server and proxy servers back on afterward.