MadDog2K
Influencer
Posts: 19
Liked: never
Joined: May 04, 2009 2:02 pm
Full Name: Wouter de Jong
Contact:

Backup failures

Post by MadDog2K »

Hi,

We have an ESXi server, an Openfiler server and a server running Veeam Backup.
With a few different scenarios concerning subnets/VLANs, I get backup failures.
Note that all this is just a test environment, and after making some changes in the setup
to try to improve backup speeds, I accidentally came to the point of testing the scenarios below.

Setup is as follows :

10.128.20.x = Vlan A
10.128.30.x = Vlan B

Scenario 1:
ESXi: VMkernel 10.128.20.x (+ default gw)
Openfiler: 10.128.20.x (+ default gw)
Backup-server: 10.128.30.x
Backup ESXi@10.128.20.x to Openfiler@10.128.20.x == OK

Scenario 2:
ESXi: VMkernel 10.128.20.x (+ default gw) + 10.128.30.x
Openfiler: 10.128.20.x (+ default gw) + 10.128.30.x
Backup-server: 10.128.30.x
Backup ESXi@10.128.20.x to Openfiler@10.128.20.x == FAIL

Scenario 3:
ESXi: VMkernel 10.128.20.x (+ default gw) + 10.128.30.x
Openfiler: 10.128.20.x (+ default gw) + 10.128.30.x
Backup-server: 10.128.30.x
Backup ESXi@10.128.20.x to Openfiler@10.128.30.x == OK

Scenario 4:
ESXi: VMkernel 10.128.20.x (+ default gw) + 10.128.30.x
Openfiler: 10.128.20.x (+ default gw) + 10.128.30.x
Backup-server: 10.128.30.x
Backup ESXi@10.128.30.x to Openfiler@10.128.20.x == FAIL

Scenario 5:
ESXi: VMkernel 10.128.20.x (+ default gw) + 10.128.30.x
Openfiler: 10.128.20.x (+ default gw) + 10.128.30.x
Backup-server: 10.128.30.x
Backup ESXi@10.128.30.x to Openfiler@10.128.30.x == OK
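
One way to look for a pattern in the five scenarios above is to tabulate them. The sketch below is only an illustration: the /24 prefix length is an assumption (the post gives no netmasks), and the `Scenario` class, its field names, and the helper function are made up for this example. For each scenario it reports whether the job writes to an Openfiler address outside the backup server's subnet while Openfiler also has an address inside that subnet.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network

# Assumption: both VLANs are /24 networks; the post does not state the netmask.
VLAN_A = ip_network("10.128.20.0/24")
VLAN_B = ip_network("10.128.30.0/24")
BACKUP_SERVER_NET = VLAN_B  # the backup server is in 10.128.30.x in every scenario


@dataclass(frozen=True)
class Scenario:
    number: int
    openfiler_nets: tuple      # subnets in which Openfiler has an address
    source_net: IPv4Network    # subnet of the ESXi address used by the job
    target_net: IPv4Network    # subnet of the Openfiler address used by the job
    ok: bool                   # result as first reported in the post


SCENARIOS = [
    Scenario(1, (VLAN_A,), VLAN_A, VLAN_A, True),
    Scenario(2, (VLAN_A, VLAN_B), VLAN_A, VLAN_A, False),
    Scenario(3, (VLAN_A, VLAN_B), VLAN_A, VLAN_B, True),
    Scenario(4, (VLAN_A, VLAN_B), VLAN_B, VLAN_A, False),
    Scenario(5, (VLAN_A, VLAN_B), VLAN_B, VLAN_B, True),
]


def routed_target_on_multihomed_filer(s: Scenario) -> bool:
    """True when the target address is outside the backup server's subnet
    while Openfiler also has an address inside that subnet."""
    return s.target_net != BACKUP_SERVER_NET and BACKUP_SERVER_NET in s.openfiler_nets


for s in SCENARIOS:
    flag = routed_target_on_multihomed_filer(s)
    result = "OK" if s.ok else "FAIL"
    print(f"Scenario {s.number}: result={result}, routed target on multi-homed filer={flag}")
```

In this tabulation the flag is set exactly for Scenarios 2 and 4. That is consistent with, but does not prove, a routing asymmetry: requests would reach the Openfiler 10.128.20.x address through the gateway, while replies could leave directly through its 10.128.30.x interface.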



Scenario 2:
--------------------------------
6 of 15 files processed

Total VM size: 350.00 GB
Processed size: 47.23 KB
Avg. performance rate: 1 KB/s
Backup mode: agentless

Start time: 05/10/2009 11:42:45
End time: 05/10/2009 11:44:05

Backing file "nfc://conn:esxi01.almetaal.local,nfchost:ha-host,stg:49a1a058-b0c4e0fc-4dbd-0015179fb0ee@sbs.almetaal.local/vmware-2.log"
Backup failed
Client error: The semaphore timeout period has expired
Failed to write data to the file [\\10.128.20.100\storage1.vm01.blub\sbs.almetaal.local\Backup Job 3.vbk].

Server error: End of file
--------------------------------

Scenario 4:
--------------------------------
6 of 15 files processed

Total VM size: 350.00 GB
Processed size: 47.23 KB
Avg. performance rate: 1 KB/s
Backup mode: agentless

Start time: 05/10/2009 12:13:33
End time: 05/10/2009 12:14:47

Backing file "nfc://conn:10.128.30.70,nfchost:ha-host,stg:49a1a058-b0c4e0fc-4dbd-0015179fb0ee@sbs.almetaal.local/vmware-2.log"
Backup failed
Client error: The specified network name is no longer available
Failed to write data to the file [\\10.128.20.100\storage1.vm01.blub\sbs.almetaal.local\Backup Job 4.vbk].

Server error: End of file
--------------------------------

I'm just wondering why Scenarios 2 and 4 fail.
Could it be ESXi that's bugging, could it be Openfiler that's bugging ... or does Veeam Backup get confused somehow?
I myself don't see any technical reason why any of these scenarios should fail.

The scenario not listed here is where I give the backup server an extra NIC in 10.128.20.x, and remove the 10.128.30.x addresses from ESXi and Openfiler.
That's probably what I'll go for, if it runs successfully, because it makes more sense :>

MadDog2K

Re: Backup failures

Post by MadDog2K »

After some time, Scenario 3 fails as well...

--------------------------------
13 of 15 files processed

Total VM size: 350.00 GB
Processed size: 28.00 GB
Avg. performance rate: 27 MB/s
Backup mode: agentless

Start time: 05/10/2009 12:33:16
End time: 05/10/2009 12:51:02

Backing file "nfc://conn:esxi01.almetaal.local,nfchost:ha-host,stg:49a1a058-b0c4e0fc-4dbd-0015179fb0ee@sbs.almetaal.local/sbs.almetaal.local-flat.vmdk"
Backup failed
Client error: % connection timeout. Receive operation has failed. Top level protocol: [TCP]. Socket: [Host:esxi01.almetaal.local,port:902].
Failed to download data chunk from the NFC channel. Size of the chunk: [262144].
Failed to retrieve next FILE_PUT message. File path: [[ESXi-VM] sbs.almetaal.local/sbs.almetaal.local-flat.vmdk]. File pointer: [30791958528]. File size: [107374182400].
Unable to read file block. File: [[ESXi-VM] sbs.almetaal.local/sbs.almetaal.local-flat.vmdk]. Offset: [30791434240]. Block granularity: [1048576].

Server error: End of file
--------------------------------
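
The figures in the log above can be cross-checked with a little arithmetic. This is only a sketch: the values are copied from the log, the variable names are made up, and the assumption that the report counts 1 GB as 1024 MB is mine (it is the reading under which the reported rate matches).

```python
from datetime import datetime

# Values copied from the Scenario 3 log.
file_pointer = 30_791_958_528      # "File pointer"
file_size = 107_374_182_400        # "File size"
block_offset = 30_791_434_240      # "Offset"
block_granularity = 1_048_576      # "Block granularity"
chunk_size = 262_144               # "Size of the chunk"
processed_gb = 28.00               # "Processed size"

# How far into the flat VMDK the transfer had progressed.
percent_done = 100 * file_pointer / file_size

# Is the failing block aligned to the block granularity, and how many
# whole chunks of that block had already been received?
block_index, remainder = divmod(block_offset, block_granularity)
chunks_into_block = (file_pointer - block_offset) // chunk_size

# Elapsed time; start and end fall on the same day, so the day/month
# order of the date does not affect the result.
fmt = "%m/%d/%Y %H:%M:%S"
start = datetime.strptime("05/10/2009 12:33:16", fmt)
end = datetime.strptime("05/10/2009 12:51:02", fmt)
elapsed_s = int((end - start).total_seconds())

# Assumption: 1 GB = 1024 MB in the report.
rate_mb_s = processed_gb * 1024 / elapsed_s

print(f"progress in file: {percent_done:.2f} %")
print(f"block index: {block_index}, remainder: {remainder}")
print(f"chunks received in failing block: {chunks_into_block}")
print(f"elapsed: {elapsed_s} s, average rate: {rate_mb_s:.1f} MB/s")
```

The offset is an exact multiple of the block granularity, and the file pointer sits two 256 KB chunks past it, so the receive timeout hit partway through one 1 MB block, a bit under 29 % into the flat VMDK. The computed average rate rounds to the 27 MB/s shown in the report.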

Which side is the "Client error", and which is the "Server error"?
If the client error comes from Veeam Backup, then maybe ESXi screws up?

Gostev
SVP, Product Management
Posts: 26715
Liked: 4279 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Backup failures

Post by Gostev »

Wouter, all 3 issues above look like storage-related (OpenFiler) issues to me.
1. Storage timed out while writing the backup file to 10.128.20.100.
2. 10.128.20.100 unavailable? Could be either a networking issue, or an OpenFiler issue with such a network configuration.
3. Storage connection timed out when ESXi was retrieving VM file data from the storage in order to pass it to Veeam Backup.

I don't think the issues are scenario-specific (except maybe 4). I am pretty sure that you will see similar timeouts in all other scenarios too, if you set up a stress test for your storage (like multiple VMs running disk I/O intensive workloads).

MadDog2K

Re: Backup failures

Post by MadDog2K »

Hi Gostev,

Thanks for your reaction.
It could very well be OpenFiler that gets confused :)

However :
Gostev wrote: 3. Storage connection timed out when ESXi was retrieving VM file data from the storage in order to pass it to Veeam Backup.
The VMs are stored on local storage, and only backups are written to the Openfiler storage.
Does the error really mean that ESXi got a timeout fetching the VM from its storage, in this case from its local RAID array?

Thanks :)

Gostev

Re: Backup failures

Post by Gostev »

Wouter, sorry - I did not realize that it is local storage in the last case. I would not expect timeouts from local storage, so maybe the reason is different in this case. It is hard to say more from just the error description, but debug logs can usually give our devs more details on the issue. I have not seen this error reported before, so I don't have any other ideas...

MadDog2K

Re: Backup failures

Post by MadDog2K »

Hi Gostev,

I've sent an email with the log files :)

Are there other (easy) methods to retrieve the VMs over NFC like Veeam Backup does,
so I could check if they fail as well?

Gostev

Re: Backup failures

Post by Gostev »

Hi - yes, you could do that using the Datastore Browser in the VMware Infrastructure Client from the same computer where Veeam Backup is installed. But you will need to shut down the VM before doing this test.
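
As a quicker preliminary check before the Datastore Browser test, one could verify from the backup server that a plain TCP connection to the ESXi host opens at all. The sketch below is a generic reachability probe, not anything Veeam-specific; `can_connect` is a hypothetical helper name, and port 902 is simply the port named in the Scenario 3 error message.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

It could be called as `can_connect("esxi01.almetaal.local", 902)` on the machine running Veeam Backup. A successful connect only shows that the port is reachable; it does not exercise an NFC transfer, so it would not reproduce a timeout that appears partway through copying a large file.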
