AJ83
Enthusiast
Posts: 60
Liked: 2 times
Joined: Oct 06, 2009 2:32 pm
Contact:

Problem with full VMFS

Post by AJ83 »

This weekend we had big problems caused by Veeam Backup & Replication 4.1. What happened is the following:
Saturday 5:55 AM - A certain SQL VM with a 20GB C and a 500GB D disk had a failed backup and wouldn't come out of snapshot mode.
On the first backup attempt, Veeam reported the following error message after it had processed the complete VM:
Total VM size: 1,00 TB
Processed size: 1,00 TB
Processing rate: 133 MB/s
Backup mode: VCB SAN
Start time: 26-6-2010 3:58:55
End time: 26-6-2010 6:10:05
Duration: 2:11:10

Backing up VM files
Could not find file 'C:\Documents and Settings\adm_monitor\Local Settings\Temp\veeamvcb0d3f7ea9-a43a-41d8-932f-c10fa9eb8f5c\NED-INAV01.vmx'.
(Strangely, this machine is reported as being 1TB, while it is actually 520GB in size.)

The retry reported this:
Backing up VM files
VCB backup failed: VCB error: Error: Other error encountered: Snapshot creation failed: The operation is not allowed in the current state. An error occurred, cleaning up...

The third and last retry reported this:
Constructing rollback file name
RemoveSnapshot failed, snapshotRef "snapshot-12223", timeout "3600000"
A general system error occurred: Invalid fault


We did not notice this at first, but the following night we did. Three machines were frozen because Veeam B&R tried to back them up while the VMFS was already completely full: the snapshot of this SQL machine had grown to 105GB!
VMware was displaying:
There is no more space for the redo log of XXXXXXX-00001.vmdk. You may be able to continue this session by freeing disk space on the relevant partition, and clicking retry. Otherwise click Abort to terminate this session

I turned off some working machines and migrated their storage to another VMFS to make room for the commit.
I answered the question in VIC to retry the write to disk, and the machines seemed to be working perfectly.

But there was still a problem. A certain application VM with a 20GB C, 200GB D, 200GB E and 50GB F disk was in the process of being backed up at the moment the VMFS filled up. This particular machine had only the 20GB C disk on the full VMFS; the rest of its disks were on a different VMFS. Veeam B&R, however, was not letting the backup job fail. I tried stopping it manually a couple of times, but got no response. Finally I saw no alternative but to kill the Veeam B&R services and processes.
The moment I did that, I saw the following in VIC:
Remove snapshot: "Doing an online commit, cannot power off"
The result was strange: the VM was instantly powered off.

When I turned it on, there was a problem: in Windows Disk Management, the data disks showed up as uninitialized. Hmm... now what? I shut down the machine and made a copy of its folder on the VMFS, with its elaborate structure of snapshot files, just to be safe.
Then I tried committing the snapshots. WHAM, the machine went down again.
After a couple of tries, my last resort was to delete the disks from the VM and reattach the original vmdk files. Luckily, this worked perfectly.

My question is: is this a tested scenario? Is there something I could have done to avoid the problem with the uninitialized disks?
Could Veeam check whether there is free space on a VMFS before putting a machine into snapshot mode, to prevent this kind of issue?
Gostev
Chief Product Officer
Posts: 32761
Liked: 7971 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Problem with full VMFS

Post by Gostev »

Hello, this looks like a hung VCB process issue. You can open a support case and provide full logs to confirm if needed.

I definitely recommend that you use vStorage API backup mode instead of VCB. VCB provides no benefits over vStorage API; however, by using VCB you are putting an extra layer of third-party software between our software and your storage. If anything goes wrong with VCB, this will affect our product and your storage as well. You would not get this issue in vStorage API mode, where our product accesses the storage directly and thus can properly manage all exceptions.

A free disk space check would not help in this specific case, because your datastore had plenty of free space before the backup started (so it would pass this check). The datastore filled up with snapshot data later, due to the extended snapshot presence while the backup process was hanging.

Re: Problem with full VMFS

Post by AJ83 »

Dear Gostev,

I would love to use vStorage API; however, we use ESX 3.5 at the moment.
It's not true that the backup started before the VMFS was full. While the VMFS was full, Veeam B&R was putting 2 other machines on that VMFS in snapshot mode. If the software had checked, only the machine that caused it all would have been affected, and the problem with the lost disks on the other VM wouldn't have happened. That would have saved me hours of night work.
Vitaliy S.
VP, Product Management
Posts: 27700
Liked: 2909 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Problem with full VMFS

Post by Vitaliy S. »

Actually, you can use vStorage API mode with ESX 3.5 hosts. The only limitation is that CBT will not be available, as this is a feature of ESX 4. For more details on vStorage API and supported ESX versions, please refer to the sticky FAQ thread of this forum.

Re: Problem with full VMFS

Post by AJ83 »

I was not aware of this. I thought that setting it to vStorage API in an ESX 3.5 environment would let it fall back to VCB anyway.
I've changed it to vStorage API now and will see how it goes. Thanks!

Re: Problem with full VMFS

Post by Gostev »

We have added the source disk space check before snapshot creation to the upcoming v5 release. Thank you for your feedback.
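
For readers who want to see what such a check amounts to, here is a minimal sketch in Python. It is not Veeam's implementation: the `Datastore` class, the `can_snapshot` and `plan_job` functions, and the 20 GB / 10% thresholds are all hypothetical and chosen only for illustration. The idea is to run the check right before each VM's snapshot rather than once at job start, and to look at every datastore that holds one of the VM's disks, so that VMs queued later in the job are skipped once a datastore has filled up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Datastore:
    """Hypothetical stand-in for a VMFS datastore; not a real API object."""
    name: str
    capacity_gb: float
    free_gb: float


def can_snapshot(ds: Datastore, min_free_gb: float = 20.0,
                 min_free_ratio: float = 0.10) -> bool:
    """Return True only if the datastore has headroom for snapshot growth.

    Both thresholds are illustrative assumptions: the datastore must have
    at least min_free_gb free AND at least min_free_ratio of capacity free.
    """
    if ds.capacity_gb <= 0:
        return False
    ratio = ds.free_gb / ds.capacity_gb
    return ds.free_gb >= min_free_gb and ratio >= min_free_ratio


def plan_job(vms: dict[str, list[Datastore]]) -> tuple[list[str], list[str]]:
    """Split VMs into (to_back_up, skipped).

    A VM is backed up only if every datastore holding its disks passes.
    """
    to_back_up: list[str] = []
    skipped: list[str] = []
    for vm_name, datastores in vms.items():
        if all(can_snapshot(ds) for ds in datastores):
            to_back_up.append(vm_name)
        else:
            skipped.append(vm_name)
    return to_back_up, skipped


if __name__ == "__main__":
    full = Datastore("vmfs-01", capacity_gb=1000, free_gb=0)
    roomy = Datastore("vmfs-02", capacity_gb=1000, free_gb=400)
    # "app-vm" has one disk on the full datastore, as in the incident above.
    ok, skipped = plan_job({"app-vm": [full, roomy], "file-vm": [roomy]})
    print("back up:", ok)
    print("skipped:", skipped)
```

A check like this would not have protected the first SQL VM, whose snapshot grew only after the backup hung, but under these assumptions it would have kept the later VMs out of snapshot mode on the already full VMFS.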

Re: Problem with full VMFS

Post by AJ83 »

Nice! I hope it will save someone a lot of trouble in the future.
This is what makes Veeam such a great product: listening to the users. Thanks for your response!

Re: Problem with full VMFS

Post by AJ83 »

Well, last night I let the backup jobs on one site run on vStorage API, and I'm very disappointed:
VCB:
Status: Success
Total VMs: 19
Processed VMs: 19
Successful VMs: 19
Failed VMs: 0
VMs in progress: 0
Start time: 28-6-2010 21:00:35
End time: 29-6-2010 1:09:19
Duration: 4:08:44
Total size: 2,65 TB
Processed size: 2,65 TB
Processing rate: 186 MB/s
Details: Checking retention policy... 1 rollback point(s) have been deleted.

vStorage API:
Status: Success
Total VMs: 19
Processed VMs: 19
Successful VMs: 19
Failed VMs: 0
VMs in progress: 0
Start time: 29-6-2010 21:00:41
End time: 30-6-2010 6:15:31
Duration: 9:14:49
Total size: 1,32 TB
Processed size: 1,32 TB
Processing rate: 42 MB/s
Details: Checking retention policy... 1 rollback point(s) have been deleted.

Shouldn't performance be better with vStorage API? The machine Veeam runs on is also the VCB proxy, so it is the same machine accessing the data.

Re: Problem with full VMFS

Post by Vitaliy S. »

I see that you have different VMs in those two jobs. I would try switching to vStorage API mode for your 2.65 TB job and see the results, as the backup speed may vary from VM to VM. Also, could you tell us if you have used the same destination target and source LUN for both jobs?

Re: Problem with full VMFS

Post by AJ83 »

Those are exactly the same VMs. Actually, it is the same job; I just changed the backup mode. I don't know why there is a difference in size. I know the 1.32TB is the correct total size. But if you look at the time, you know why I'm not satisfied. There are 2 jobs that run at night, 1 with VSS and 1 without VSS. The VSS-enabled job is still running right now, affecting our VM performance in production.

I will put it back to VCB and post the results tomorrow.
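
A quick sanity check on the figures posted so far, written as a small Python sketch. It assumes the reported processing rate is simply the reported size divided by the job duration, with 1 TB taken as 1024 × 1024 MB. That is an assumption, not something confirmed in this thread, although it does reproduce the 186 MB/s and 42 MB/s figures. The helper names are hypothetical. If the 1.32 TB total is the correct one, the inflated 2.65 TB figure also inflates the VCB rate, so the last line recomputes the VCB run against 1.32 TB.

```python
def to_seconds(duration: str) -> int:
    """Convert an 'H:MM:SS' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rate_mb_per_s(size_tb: float, duration: str) -> float:
    """Average throughput in MB/s, assuming 1 TB = 1024 * 1024 MB."""
    return size_tb * 1024 * 1024 / to_seconds(duration)


if __name__ == "__main__":
    # Sizes and durations as reported in the two job summaries above.
    print(f"VCB, reported 2.65 TB: {rate_mb_per_s(2.65, '4:08:44'):.0f} MB/s")
    print(f"vStorage API, 1.32 TB: {rate_mb_per_s(1.32, '9:14:49'):.0f} MB/s")
    # Same VCB duration, but with the 1.32 TB the poster believes is correct.
    print(f"VCB, assuming 1.32 TB: {rate_mb_per_s(1.32, '4:08:44'):.0f} MB/s")
```

Under these assumptions the VCB run works out to roughly 93 MB/s of real throughput rather than 186 MB/s, which is still about twice the rate of the vStorage API run.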

Re: Problem with full VMFS

Post by Vitaliy S. »

Could you specify if you've chosen vStorage API SAN with failover or just SAN mode? I've just consulted with our QA team and we haven't seen such a big difference in using VCB and vStorage API SAN modes.

As for the 2,65 TB processed VM size, could you tell us if you have VMs that are running from snapshots in this job? I believe VM snapshot size might be the reason for the extra space reported in the statistics view.

Re: Problem with full VMFS

Post by AJ83 »

There are no VMs in snapshot mode other than the snapshot that will be taken by Veeam.
I was running with vStorage API in SAN with failover mode.

Here are the results now that I have changed it back to VCB:
Status: Success
Total VMs: 19
Processed VMs: 19
Successful VMs: 19
Failed VMs: 0
VMs in progress: 0
Start time: 30-6-2010 21:00:37
End time: 1-7-2010 1:04:41
Duration: 4:04:04
Total size: 2,65 TB
Processed size: 2,65 TB
Processing rate: 190 MB/s
Details: Checking retention policy... 1 rollback point(s) have been deleted.

The total size is affected by 1 machine that has a total of 470GB in disks, but is displayed as:
7 of 17 files processed

Total VM size: 1,20 TB
Processed size: 1,20 TB
Processing rate: 279 MB/s
Backup mode: VCB SAN
Start time: 30-6-2010 22:47:03
End time: 1-7-2010 0:01:59
Duration: 1:14:56

This is only visible in the job statistics, not in the calculated size of the job.

Re: Problem with full VMFS

Post by Vitaliy S. »

Hm... do you have anything special in that VM's configuration? I guess there could have been a failover to network mode in your previous job run, which would explain why that run was much slower than VCB SAN mode. If you'd like to investigate further, please contact our support team so they can take a look at the log files.

Re: Problem with full VMFS

Post by AJ83 »

Nothing special with this VM.

Could I try setting it to SAN only to see if it is failing over to NBD?

Re: Problem with full VMFS

Post by Vitaliy S. »

Yes, you could try setting the job to use SAN mode only to see the real stats for SAN mode and check whether it tried to fail over last time.
Arnold
Enthusiast
Posts: 35
Liked: never
Joined: May 14, 2010 9:33 am
Full Name: Arnold
Contact:

Re: Problem with full VMFS

Post by Arnold »

Just a thought: if you open up the Veeam admin console, can you not find the suspect job under "Sessions" and then view the report? This should tell you on the right-hand side if it failed over to network at any time.... :)

Re: Problem with full VMFS

Post by AJ83 »

Arnold wrote: Just a thought: if you open up the Veeam admin console, can you not find the suspect job under "Sessions" and then view the report? This should tell you on the right-hand side if it failed over to network at any time.... :)
Well, it does not state it had a failure, so I suspect there was no failover. The backup mode is stated as: SAN/NBD without changed block tracking.

I think I will contact support for this issue.