pufferdude
Expert
Posts: 222
Liked: 15 times
Joined: Jul 02, 2009 8:26 pm
Full Name: Jim
Contact:

Backup went bad, now VM is a mess

Post by pufferdude »

I've been using Veeam 4.x successfully for at least 9 months, so I'm pretty comfortable with it. Today I created a new backup job to back up a new Exchange 2010 server. The first (full, obviously) backup went just fine, and four hours later the second backup started (per the schedule). The VM lives on an iSCSI SAN and I'm using vStorage API to back it up. The second backup started OK (about 40MB/sec), but when it got about 60GB into the 260GB VM, progress pretty much halted and the MB/sec started dropping, with no more data showing as being backed up.

So, I told Veeam to stop the job and it said it would, but I let it sit there for an hour and it never stopped. So I rebooted the Veeam server... this of course stopped the job, but now my VM is in some weird state where it appears to be running just fine, but I have all sorts of HUGE vmdk files in the server's VMFS directory on the SAN. My server is called "xchange" and has two drives: 60GB for OS and 200GB for data. These are the vmdks I see in the directory:

XCHANGE.vmdk 14.8GB
xchange.vmdk 116GB (yes, there are two different vmdks with the same name, differing only in capitalization!)
XCHANGE.ctk.vmdk 3.8GB
xchange-ctk.vmdk 6.4GB
xchange-000001-ctk.vmdk 6.4GB
XCHANGE-000001-ctk.vmdk 3.8GB
xchange-000001.vmdk 198GB
XCHANGE-000002.vmdk 67GB
XCHANGE-000002-ctk.vmdk 3.8GB
xchange-000002.vmdk 247GB
xchange-000002-ctk.vmdk 6.4GB

Does ANYONE have any idea how I fix this mess? After I restarted the Veeam server, it tried to delete the snapshot it had created, but it said "file not found" or similar. So I gracefully shut down the xchange server and brought it back up. Now vCenter says there are no snapshots for the server, yet all of these huge files exist and I'm not at all sure what to do.

Any help appreciated.
Gostev
Chief Product Officer
Posts: 31522
Liked: 6700 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Backup went bad, now VM is a mess

Post by Gostev »

Hi Jim,

1. Two different VMDK files - this is normal as your Exchange server has 2 disks. Looks like you named them in the same way, just different capitalization.

2. The 2 big files with 00000x in name are snapshot files. Basically, when you hard reset the backup server during the job, you are not giving Veeam Backup a chance to remove the snapshot (this happens automatically at the end of backup job). So, their presence is expected.

However, in all cases, these snapshots should be reflected as existing snapshots in VMware Infrastructure Client. If you are not seeing them there, this is NOT normal, and it means something went wrong on the ESX/vCenter side during your attempts to remove the snapshots. To resolve this, it would be best to open a support case with VMware and have them help you with removing (committing) those snapshots manually.
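For anyone trying to make sense of a similar directory listing, here is a minimal Python sketch that sorts VMDK file names into base disks, snapshot files, and change tracking (CTK) files. The naming patterns are assumptions inferred from the listing earlier in this thread (a six-digit `-00000x` suffix for snapshot files, a `ctk` marker for change tracking files); `classify_vmdk` is a hypothetical helper, not part of any Veeam or VMware tool, and it only looks at names, never at the files themselves.

```python
import re

# Patterns are assumptions based on the file names posted in this thread.
_CTK = re.compile(r"^.+?(?:-\d{6})?[-.]ctk\.vmdk$")      # change tracking files
_SNAP = re.compile(r"^.+-\d{6}(?:-delta)?\.vmdk$")       # snapshot (00000x) files
_FLAT = re.compile(r"^.+-flat\.vmdk$")                   # data extent of a base disk


def classify_vmdk(name: str) -> str:
    """Return a rough category for a VMDK file name (hypothetical helper)."""
    if _CTK.match(name):
        return "ctk"
    if _SNAP.match(name):
        return "snapshot"
    if _FLAT.match(name):
        return "base-extent"
    if name.endswith(".vmdk"):
        return "base"
    return "other"


if __name__ == "__main__":
    listing = [
        "XCHANGE.vmdk", "xchange.vmdk", "xchange-ctk.vmdk",
        "xchange-000001.vmdk", "XCHANGE-000002.vmdk",
        "xchange-000002-ctk.vmdk",
    ]
    for file_name in listing:
        print(f"{file_name:28} {classify_vmdk(file_name)}")
```

If files in the "snapshot" category exist on the datastore while the client shows no snapshots for the VM, that is the mismatch described above.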

Hope this helps!
cby
Expert
Posts: 109
Liked: 6 times
Joined: Feb 24, 2009 5:02 pm
Contact:

Re: Backup went bad, now VM is a mess

Post by cby »

In the past when a VM has got itself in a mess with snapshots (more vmdks than expected in one instance) I've resorted to cloning the VM then deploying a new VM from the clone. This has the effect of 'rolling up' the snapshots and producing consolidated vmdks. It's tedious but has worked on more than one occasion.
TrevorBell
Veteran
Posts: 364
Liked: 17 times
Joined: Feb 13, 2009 10:13 am
Full Name: Trevor Bell
Location: Worcester UK
Contact:

Re: Backup went bad, now VM is a mess

Post by TrevorBell »

Yes, I agree. I also just make a clone, then delete the original and power on the clone as the new live VM.
pufferdude
Expert
Posts: 222
Liked: 15 times
Joined: Jul 02, 2009 8:26 pm
Full Name: Jim
Contact:

Re: Backup went bad, now VM is a mess

Post by pufferdude »

Hey guys, thanks for all the good info. Gostev... the snapshots were indeed "there" on disk, but not reflected in the vCenter interface at all... according to that, there were no snaps on the VM at all. I did a little googling and found a KB article that describes my situation. The simple fix was to TAKE another snapshot on the VM via vCenter, which then caused some phantom "consolidate helper" snaps to ALSO show up, and then simply delete/roll up all of them. Seemed to work fine for me, and after 20 min or so the large snap files were all gone and the server was running just fine.

Second issue, and I think this might be a Veeam bug, is that the vmdk files were SOMEHOW named the same, differing only in capitalization (xchange.vmdk & xchange-flat.vmdk for one disk, XCHANGE.vmdk & XCHANGE-flat.vmdk for the second). I certainly didn't do this intentionally, and think something went weird when I initially named the VM "XCHANGE" then renamed it (via VC) to "xchange", and THEN moved it to a new datastore and THEN added the second disk. Somehow in all of that, the new disk ended up being xchange.vmdk instead of the expected XCHANGE_1.vmdk.

In any case, Veeam didn't seem to like this situation at all. It would get through the 60GB (first disk) and then just hang on the second (200GB) disk. It's as if ESX has no trouble distinguishing the two similarly named disks, but Veeam somehow thought they were one and the same (ignoring case), and freaked out. My fix for this was to just use vmkfstools to properly rename the second disk to xchange_1.vmdk and edit the vmx file to recognize this disk. After that, Veeam took a good first backup and the subsequent backups overnight at 4 hr. intervals were all over 1GB/sec (and this is an active Exchange server!)
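In case it helps anyone check their own VMs for the same situation, here is a rough Python sketch that flags file names differing only in capitalization. It is just an illustration of the check, under the assumption that you have already copied the directory listing into a list of names; `case_collisions` is a hypothetical helper and has nothing to do with how Veeam or ESX compare names internally.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that are identical when case is ignored.

    Returns a dict mapping the case-folded name to the sorted list of
    distinct spellings, for groups with more than one spelling only.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return {
        folded: sorted(variants)
        for folded, variants in groups.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    # Names taken from the listing earlier in this thread.
    listing = ["XCHANGE.vmdk", "xchange.vmdk", "xchange-000001.vmdk"]
    for folded, variants in case_collisions(listing).items():
        print(f"{folded}: {', '.join(variants)}")
```

An empty result means no two names in the listing collide when case is ignored, which is the state I ended up in after renaming the second disk.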

So, all is good at this point... I think the root of the entire problem was somehow getting the two disks named the same, but different case.
Gostev
Chief Product Officer
Posts: 31522
Liked: 6700 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Backup went bad, now VM is a mess

Post by Gostev »

This possible bug with Veeam handling two disks named the same, but with different case, is definitely worth investigating from our side. We would appreciate it if you could send us the full logs from the failed backup job. With the logs, we will be able to confirm whether the job hung because of this unusual VM configuration, with disks whose names differ only in case, or whether the cause was something else.