Large VM crashed and corrupted during backup

Post by **bdoe** » Dec 02, 2014 7:40 pm

I have two virtual file servers running Windows Server 2008 R2. They started life on ESXi 5.1 hosts, so each has ten 2TB disks spanned into a 20TB share, and both are nearly full. It works well, but since they were running 5.1, Veeam couldn't back them up. It would try and then give me the classic snapshot size error. The VM wasn't affected while I was trying this. The hosts use local storage, and generally the datastores aren't too busy. The file servers are the only things that would be active; every other VM is a DC, DHCP, or otherwise pretty idle.

Our most recent host came this year and thus is running ESXi 5.5. It likewise has a large server running 2012 R2. We were trialing Veeam at the time, and it was recommended to use several disks for parallel processing, so it's using 3TB .vmdks. Veeam's never had a problem with this one. This file server is so far around half filled.

Recently, I upgraded one of the 5.1 hosts to 5.5 and got the VMware Tools updated as well. I was hopeful Veeam would have better luck. Instead, it was far worse. Within a few minutes, the VM was stopped. I spent several minutes trying to get Veeam to stop the job. Once Veeam finally stopped, I saw the snapshot was removed, but the VM showed a redo log error. After some poking around I wound up getting a consolidation error. An attempt to consolidate disks gave me the following error: "An error occurred while consolidating disks: 9 (Bad file descriptor)." Eventually, what I had to do was remove the VM from inventory, move everything but *.nvram, *.vmdk, and *.vmx to a temp directory, and edit the .vmx so that it was using the correct disks (instead of fileserver_3-000001.vmdk). Then I added it back to inventory, and it booted successfully.
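For anyone repeating the .vmx edit described above, here is a minimal sketch of that one step in Python. It assumes the usual `scsiX:Y.fileName = "name.vmdk"` layout of a .vmx file and a six-digit snapshot suffix like the `fileserver_3-000001.vmdk` mentioned in the post. The function name is made up for illustration. Pointing the VM back at the base disks discards anything written to the snapshot delta, so treat this as a last resort and work on a copy of the .vmx.

```python
import re

# Matches a snapshot descriptor name such as "fileserver_3-000001.vmdk"
# and captures the base name ("fileserver_3"). The six-digit suffix is an
# assumption based on the file name quoted in the post.
DELTA_RE = re.compile(r'^(?P<base>.+)-\d{6}\.vmdk$')

def point_vmx_at_base_disks(vmx_text: str) -> str:
    """Return .vmx text with every *.fileName entry that references a
    snapshot delta rewritten to reference the base disk instead."""
    out = []
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower().endswith(".filename"):
            name = value.strip().strip('"')
            match = DELTA_RE.match(name)
            if match:
                line = f'{key.rstrip()} = "{match.group("base")}.vmdk"'
        out.append(line)
    return "\n".join(out)

if __name__ == "__main__":
    sample = (
        'scsi0:0.fileName = "fileserver.vmdk"\n'
        'scsi0:1.fileName = "fileserver_3-000001.vmdk"'
    )
    print(point_vmx_at_base_disks(sample))
```

Lines that already point at a base disk, and keys other than `*.fileName`, pass through unchanged.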

Unfortunately, it happened again later that evening. I use one Veeam job to back up all of vCenter except the file servers, and then individual jobs for the file servers. However, even though Veeam still knew the exception by VM name, I guess some sort of identifier changed, and Veeam decided to back up the VM, causing another failure. Fortunately, since I knew how to fix it, I had it repaired soon, and edited the vCenter job to exempt it.

So, now I'm stuck wondering a few things. First, what exactly happened? Both 5.5 hosts have roughly 1.3T available, which is a small percentage but should be more than enough. Next, why would the new 5.5 host be working fine while the upgraded one catastrophically failed? And is there any hope of backing up these two using Veeam? I'm currently having to use two backup products, and I'd prefer to consolidate.

Since I think this is a VMware issue, I haven't opened a case yet, but can if needed.

Post by **foggy** » Dec 03, 2014 11:26 am

Yes, opening a case is recommended.

bdoe wrote:It would try and then give me the classic snapshot size error.

Could you please elaborate on that error? Do you mean the "File is larger than the maximum size supported by datastore" error?

Post by **bdoe** » Dec 03, 2014 4:38 pm

foggy wrote:Yes, opening a case is recommended.

Okay, will do. edit: case is 00694194.

foggy wrote:Could you please elaborate on that error? Do you mean the "File is larger than the maximum size supported by datastore" error?

Yes, that's the one. The 2TB .vmdks were created at the maximum size vCenter would allow; I didn't reduce the size at all for headroom. At one point I was looking into moving the snapshots to an iSCSI datastore, but since I'm using local storage for everything I didn't want to make things more complex.
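A rough way to see why a disk created at the maximum size trips that error: in the worst case a snapshot's delta file has to hold every block of the base disk plus some metadata, so the delta can end up larger than the disk itself. The sketch below only illustrates that arithmetic. The 2TB minus 512 bytes file limit and the 16GB overhead are assumptions for illustration, not values taken from this thread; check VMware's documentation for the real figures for your VMFS version.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3

def snapshot_fits(disk_bytes: int, max_file_bytes: int, overhead_bytes: int) -> bool:
    """Worst case: every block changes, so the delta file must hold the
    whole disk plus snapshot metadata and still fit under the file limit."""
    return disk_bytes + overhead_bytes <= max_file_bytes

# Assumed values, for illustration only.
MAX_FILE = 2 * TIB - 512   # assumed per-file limit on the datastore
OVERHEAD = 16 * GIB        # assumed snapshot metadata overhead

if __name__ == "__main__":
    # A disk created at the limit leaves no room for the overhead.
    print(snapshot_fits(MAX_FILE, MAX_FILE, OVERHEAD))
    # The same disk shrunk by the overhead would fit.
    print(snapshot_fits(MAX_FILE - OVERHEAD, MAX_FILE, OVERHEAD))
```

Under these assumptions, leaving headroom equal to the overhead when sizing the .vmdk is what keeps the snapshot file under the limit.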

Post by **foggy** » Dec 04, 2014 2:10 pm

Was the datastore that the affected VM resides on originally created in vSphere 5.1, or was it created under an earlier version, with the hosts upgraded at some point since?

Post by **bdoe** » Dec 04, 2014 2:21 pm

Yes, the datastore was created with 5.1. Filesystem on it says VMFS 5.58, where the new host running 5.5 says VMFS 5.60.

I did run into an issue with VAAI this week. The two hosts use Areca controllers. I had a long-standing issue with deploying various VMware .ovf templates to those two, such as the vCenter Appliance, or more recently after the upgrade, a vMA. I eventually found that although VAAI was unsupported, the Areca driver, firmware, or both resulted in those .ovf templates failing to deploy. vCenter could not configure the database, and vMA failed fsck on the first boot. The same templates worked fine on Dell hosts. I disabled VAAI and the templates work fine on the other hosts using Areca. Is there any chance that could be related?

Also, just this week the group using the large VM in question cleared it out in preparation for a new project. I could potentially upgrade the VM to larger .vmdk's (say 5 x 4TB) and Server 2012. Would this possibly help?

Post by **foggy** » Dec 04, 2014 3:39 pm

bdoe wrote:I disabled VAAI and the templates work fine on the other hosts using Areca. Is there any chance that could be related?

VMware support seems to be the best place for this kind of question.

Post by **bdoe** » Dec 10, 2014 7:13 pm

Yesterday morning, I shut down the VM, removed it from inventory, and connected to the host via SSH. From there, I moved all of the -00000x.vmdk and associated -sesparse.vmdk files to a separate location. After that, I re-added it to inventory and booted it up, with no problems. Next, I removed and re-added it to Veeam, and then launched a backup. It transferred 17.8TB and took 21 hours, but it completed without an issue, and the VM stayed running while the job ran.
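The file selection in that cleanup can be sketched as a dry run in Python. This only lists candidates and does not move anything. It assumes the leftover files follow the `-00000x.vmdk` and `-00000x-sesparse.vmdk` naming described above, with a six-digit number. The function name and the example path are hypothetical.

```python
import re
from pathlib import Path

# Snapshot descriptors ("name-000001.vmdk") and their sparse extents
# ("name-000001-sesparse.vmdk"). The six-digit suffix is an assumption
# based on the file names mentioned in the post.
SNAPSHOT_FILE_RE = re.compile(r'-\d{6}(-sesparse)?\.vmdk$')

def find_snapshot_leftovers(vm_dir):
    """List files in vm_dir that look like snapshot delta files.
    Base disks, the .vmx, and the .nvram are never matched."""
    return sorted(
        p for p in Path(vm_dir).iterdir()
        if p.is_file() and SNAPSHOT_FILE_RE.search(p.name)
    )

if __name__ == "__main__":
    # Hypothetical VM directory; review the list before moving anything.
    vm_dir = Path("/vmfs/volumes/datastore1/fileserver")
    if vm_dir.is_dir():
        for path in find_snapshot_leftovers(vm_dir):
            print(path)
```

Reviewing the printed list with the VM powered off and removed from inventory, as in the post, is safer than moving files by wildcard.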

Since the only thing that's changed was VAAI, it looks like that was the source of the problem.
