Veeam 8 possible CBT affecting issues

Re: Veeam 8 possible CBT affecting issues

by SGalbincea » Mon Jan 26, 2015 3:46 pm

I am in a similar boat here. Ticket# 00713193

The setup:

VMware vSphere 5.5U2b Virtual Environment
2x Nimble CS460s via Brocade VDX 6740s
1x Veeam B&R Virtual Server
2x Physical Veeam Proxies, one 2008R2 and one 2012R2 (SAN Direct), each with 2x 10Gb LAN, 2x 10Gb SAN connectivity
1x DataDomain DD2500 w/DDBoost (2 repos configured, with each proxy as a GW for each repo)

I am having issues with larger servers. The backups will run for some time, and then fail with the following message:

"1/25/2015 10:25:28 PM :: Processing AMHOUGEO01 Error: Thread not finished within [1800000] milliseconds.
Failed to upload disk.
Agent failed to process method {DataTransfer.SyncDisk}."
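
For scale, the bracketed timeout in the message above is in milliseconds. A minimal Python sketch that pulls the value out of such a line and converts it to minutes (the function name and the regular expression are illustrative only, not anything Veeam ships):

```python
import re


def timeout_minutes(message: str) -> float:
    """Extract the bracketed millisecond timeout from an error line, in minutes."""
    match = re.search(r"within \[(\d+)\] milliseconds", message)
    if match is None:
        raise ValueError("no timeout found in message")
    return int(match.group(1)) / 1000 / 60


line = "Processing AMHOUGEO01 Error: Thread not finished within [1800000] milliseconds."
print(timeout_minutes(line))  # 30.0
```

So the job is giving up after a thread has made no progress for 30 minutes, which points at a stalled transfer rather than a slow one.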


I can't see any reason at all why the data flows would time out - the environment is very much over-spec'd for this application. I cannot find any clues in the logs anywhere as to what is happening. I have disabled parallel processing with some degree of success, but it is simply not practical to do so with our data amounts (30+TB nightly).

I have tried the suggestions here regarding disabling IPv4 offload and the registry key, and will report back on what happens.
Senior Network Engineer, VCP5-DCV, VMSP, VMTSP, NIOP
Senior Network Engineer
BEMA Information Technologies
Houston, TX
SGalbincea
Enthusiast
 
Posts: 55
Liked: 6 times
Joined: Fri May 25, 2012 2:09 pm
Location: Houston, TX
Full Name: Steve Galbincea

Re: Veeam 8 possible CBT affecting issues

by Codea » Mon Jan 26, 2015 9:52 pm

You have probably already checked this, but I'll mention it anyway: check that the time on all your servers and sites is synchronised. We've had similar issues and found that to be a common cause of our connections closing.
Codea
Lurker
 
Posts: 2
Liked: 1 time
Joined: Tue Aug 02, 2011 2:04 am
Full Name: Tony Velarde

Re: Veeam 8 possible CBT affecting issues

by SGalbincea » Tue Jan 27, 2015 9:32 pm

Quick update: job is still running (12.5TB of 30TB), which gives me some hope. Only changes made were to disable IPv4 Offload on all 10Gb NICs in the proxy servers. I have not tried the registry key fix as of yet.

Re: Veeam 8 possible CBT affecting issues

by KevinK » Mon Feb 02, 2015 3:31 pm

Below is a list of the changes implemented so far:

From Veeam
Create the DataMoverLocalFastPath (DWORD) registry value under HKLM\SOFTWARE\Veeam\Veeam Backup and Replication, and set it to the following value: 2
This will use shared memory instead of the loopback NIC to transfer the data between the agents.


Create the SESSTIMEOUT (DWORD) registry value under HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\SERVICES\LANMANWORKSTATION\PARAMETERS\ and set it to the following decimal value: 60000

Set target repositories to use Data Domain Local accounts instead of Domain accounts.
Set target repositories to use IP addressing instead of DNS.
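
The two registry values in the list above could also be written out as a .reg fragment. The sketch below is only an illustration: the helper names are made up, and the one thing it is meant to show is that .reg files store DWORDs as eight hex digits, so the decimal 60000 becomes 0000ea60.

```python
def reg_dword_line(name: str, value: int) -> str:
    """Format one DWORD value the way a .reg file expects (8 hex digits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f'"{name}"=dword:{value:08x}'


def reg_file(sections: dict) -> str:
    """Build .reg file text from {key path: {value name: integer}}."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in sections.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            lines.append(reg_dword_line(name, value))
        lines.append("")
    return "\n".join(lines)


# Key paths and values as given in the post; registry paths are case-insensitive.
settings = {
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication": {
        "DataMoverLocalFastPath": 2,
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters": {
        "SESSTIMEOUT": 60000,
    },
}
print(reg_file(settings))
```

Check the generated text before importing it, and raise a ticket first as suggested further down.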


>>> Site 1 backups now work <<<

Disable TCP checksum offload on Veeam Backup server 10Gbps NICs.


From EMC
Increase the CIFS request timeout on the client (Windows 2008 and Windows 2012 server) by changing the registry parameters under the key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanWorkStation\Parameters.
•Set the registry parameter ExtendedSessTimeout to 3600 seconds. If the parameter is not present, add it with type REG_DWORD.
•Set the registry parameter ServersWithExtendedSessTimeout to the list of one or more Data Domain servers, where each server is either an IP address or a name of the Data Domain system. If the client uses different names or uses both IP address and name for a given Data Domain system, all of them must be included here. If the parameter is not present, add it with type REG_MULTI_SZ.
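
As a rough sketch of the two EMC parameters above, the Python below builds the matching `reg add` command lines. The server name and address are hypothetical placeholders, and the helper is illustrative; the commands are printed, not executed.

```python
def reg_add(key: str, name: str, reg_type: str, data: str) -> str:
    """Build a `reg add` command line; /f overwrites an existing value without prompting."""
    return f'reg add "{key}" /v {name} /t {reg_type} /d "{data}" /f'


KEY = r"HKLM\System\CurrentControlSet\Services\LanmanWorkStation\Parameters"

# Hypothetical address and name of one Data Domain system. List every form
# the client actually uses to reach it.
dd_servers = ["192.0.2.10", "dd2500.example.local"]

commands = [
    reg_add(KEY, "ExtendedSessTimeout", "REG_DWORD", "3600"),
    # reg.exe separates REG_MULTI_SZ entries with the two literal characters \0 by default.
    reg_add(KEY, "ServersWithExtendedSessTimeout", "REG_MULTI_SZ", r"\0".join(dd_servers)),
]
for command in commands:
    print(command)
```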

Tune the Data Domain CIFS
1) # cifs option set "socket options" "TCP_NODELAY SO_RCVBUF=3146268 SO_SNDBUF=3146268 IPTOS_THROUGHPUT SO_KEEPALIVE"
This command sets CIFS socket receive/send buffer sizes to 3MB. The default value for this is 1MB.
2) # cifs option set maxlogsize 204800
This command sets the CIFS client log file sizes to 200MB. The default size is only 50MB.
3) # cifs option set "dd aio count" 32
This parameter sets the number of pending I/O requests between CIFS server and DD File System.
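
A quick sanity check of the sizes quoted in the steps above. The byte unit for the socket buffers is the usual one for SO_RCVBUF/SO_SNDBUF; the KB unit for maxlogsize is only inferred from the "200MB" in the description, so treat it as an assumption.

```python
SOCKET_BUF_BYTES = 3146268  # SO_RCVBUF / SO_SNDBUF value from step 1
MAX_LOG_SIZE_KB = 204800    # maxlogsize from step 2 (unit assumed to be KB)

buf_mib = SOCKET_BUF_BYTES / (1024 * 1024)
log_mb = MAX_LOG_SIZE_KB / 1024

print(f"socket buffers: {buf_mib:.2f} MiB")  # socket buffers: 3.00 MiB
print(f"max log size: {log_mb:.0f} MB")      # max log size: 200 MB
```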


From Veeam
•Create the NetUseShareAccess (DWORD) registry value under HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\ and set the value to 1
This will use the Windows wrapper for traffic to the CIFS share.


>>> Site 2 backups now work <<<

We removed the NetUseShareAccess setting so we could troubleshoot the write issues to the DD CIFS share with EMC, but magically it started working without that setting.

I've asked Veeam and EMC to close the tickets with a summary of the changes completed thus far; however, we were still unable to pinpoint the issue.

I suggest anyone experiencing similar issues raises a ticket before applying the modifications I've listed above, that way Veeam can correlate the information for a better overview of customers affected. At the moment I feel like Veeam think I'm the only person with these issues :)
KevinK
Enthusiast
 
Posts: 28
Liked: 10 times
Joined: Wed Apr 24, 2013 9:18 am
Full Name: Kevin Kissack

Re: Veeam 8 possible CBT affecting issues

by jabroni » Wed Feb 04, 2015 6:49 pm

Have you tried running a restore? I have the same network errors during backup, and now that I actually have to restore some files I get the same network errors every single time I attempt an Instant VM Recovery, a full VM restore or a VM files restore.

Code:
Error when attempting Instant VM Recovery:

2015-02-04 12:41:32          Starting VM XXX recovery
2015-02-04 12:42:27          Connecting to host XXX
2015-02-04 12:42:49          Checking if vPower NFS datastore is mounted on host
2015-02-04 12:42:49          Locking backup file
2015-02-04 12:47:23 Error    Publishing VM
                             The specified network name is no longer available.
                             Failed to read data from the file [\\XXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Tier 1 VM Backup2015-01-19T210127.vib].
                             Cannot process [restore] command.
                             
2015-02-04 12:47:23          Canceling backup file lock
2015-02-04 12:47:23 Error    Failed to publish VM XXX Error: The specified network name is no longer available.
                             Failed to read data from the file [\\XXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Tier 1 VM Backup2015-01-19T210127.vib].
                             Cannot process [restore] command.


Error when attempting full VM restore:

2015-02-04 12:56:48          Starting restore job
2015-02-04 12:56:48          Locking required backup files
2015-02-04 12:57:58          Queued for processing at 04/02/2015 12:57:58 PM
2015-02-04 12:58:03          Preparing next VM for processing
2015-02-04 12:58:02          Required backup infrastructure resources have been assigned
2015-02-04 12:58:03          Using source proxy XXX [hotadd]
2015-02-04 12:59:18          9 files to restore (42.0 GB)
2015-02-04 13:05:02 Error    Restoring [XXX] XXX.vmx
                             The specified network name is no longer available.
                             Failed to read data from the file [\\XXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Tier 1 VM Backup2015-01-28T200319.vib].
                             Failed to restore file from local backup. VFS link: [XXX.vmx]. Target file: [MemFs://frontend::CDataTransferCommandSet::RestoreText_{387d0f21-ee00-4fea-9cfa-73c49b2743d9}]. CHMOD mask: [0].
                             Agent failed to process method {DataTransfer.RestoreText}.
                             Agent failed to process method {DataTransfer.RestoreText}.
                             
2015-02-04 13:05:02 Error    Restore job failed Error: The specified network name is no longer available.
                             Failed to read data from the file [\\XXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Tier 1 VM Backup2015-01-28T200319.vib].
                             Failed to restore file from local backup. VFS link: [XXX.vmx]. Target file: [MemFs://frontend::CDataTransferCommandSet::RestoreText_{387d0f21-ee00-4fea-9cfa-73c49b2743d9}]. CHMOD mask: [0].
                             Agent failed to process method {DataTransfer.RestoreText}.
                             Agent failed to process method {DataTransfer

                             
Error when attempting VM File Restore:

2015-02-04 13:09:31          VM files restore started
2015-02-04 13:15:28 Error    Failed to restore file: XXX.vmx Error: The specified network name is no longer available.
                             Failed to read data from the file [\\XXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Tier 1 VM Backup2015-01-28T200319.vib].
                             Unable to retrieve next block transmission command. Number of already processed blocks: [0].
                             Unable to receive file.
                             Exception from server: The specified network name is no longer available.
                             Failed to read data from the file [\\XXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Ti
2015-02-04 13:15:28 Error    Failed to restore file: XXX.vmxf Error: Exception of type 'Veeam.Backup.AgentProvider.AgentClosedException' was thrown.
2015-02-04 13:15:28 Error    Failed to restore file: XXX.nvram Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:28 Error    Failed to restore file: XXX_1.vmdk Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:28 Error    Failed to restore file: XXX_1-flat.vmdk Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:28 Error    Failed to restore file: XXX_2.vmdk Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:28 Error    Failed to restore file: XXX_2-flat.vmdk Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:28 Error    Failed to restore file: XXX.vmdk Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:28 Error    Failed to restore file: XXX-flat.vmdk Error: An established connection was aborted by the software in your host machine
2015-02-04 13:15:30          VM files restore completed


I opened a case regarding the restore though I highly suspect the (intermittent) issue when the backup is running has the same root cause as the issue I'm having with a restore.

Veeam case ID: 00743859
jabroni
Novice
 
Posts: 4
Liked: never
Joined: Tue Jan 20, 2015 4:08 pm

Re: Veeam 8 possible CBT affecting issues

by jabroni » Tue Feb 17, 2015 8:47 pm

Alright I think I've found the root cause of the issue with our CIFS backup target.

Code:
2015-02-17 03:10:12 :: Processing XXXXXX Error: The specified network name is no longer available.
Failed to write data to the file [\\XXXXXX\vol_VeeamBackup\backups\Tier 1 VM Backup\Tier 1 VM Backup2015-02-16T200259.vib].
Failed to download disk.
An existing connection was forcibly closed by the remote host
Failed to upload disk.
Agent failed to process method {DataTransfer.SyncDisk}.


That was the type of error during backup. There were also errors with VM recovery like:

Code:
2015-02-05 18:08:53          Starting VM XXXXXX Restore Test recovery
2015-02-05 18:09:41          Connecting to host XXXXXX
2015-02-05 18:10:08          Checking if vPower NFS datastore is mounted on host
2015-02-05 18:10:08          Locking backup file
2015-02-05 18:40:11 Error    Publishing VM
                             A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
2015-02-05 18:40:11          Canceling backup file lock
2015-02-05 18:40:11 Error    Failed to publish VM XXXXXX Restore Test Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond



Both issues seem to have been resolved by upgrading my NetApp OnTap version to 8.2.3P1 which has a fix for:

Code:
Bug ID   798842
Title   Service disruption caused by issue related to SMB 2.1 leasing


I guess V8 uses SMB2.1 but V7 didn't?

If the backups all complete successfully tonight without this issue occurring and it doesn't happen the next cycle, I'll consider it resolved. So far my tests have been successful, though I want to see this error not come up on a few other jobs for a couple of days too.

Re: Veeam 8 possible CBT affecting issues

by connelp » Mon Mar 02, 2015 8:12 am

Interesting. I am getting exactly the same problem on my file server (around 1TB).

Last week (the first set of full backups following the upgrade from 7 to 8) the backup failed twice but worked on the third attempt.

I have opened a case (00819540) so will see what happens.
connelp
Influencer
 
Posts: 22
Liked: never
Joined: Mon Oct 08, 2012 7:48 am

Re: Veeam 8 possible CBT affecting issues

by KevinK » Fri Mar 06, 2015 9:47 am

The fix for us was to increase the memory allocation on the physical backup servers or reduce the stream count each could handle. I suspect your retry is completing when other jobs have finished and memory is lower.

This must be something to do with v8 - we used to run 16 streams on a 16GB-RAM machine without issue. Apparently the new Veeam documentation states you should have 2GB/stream available.
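
To put numbers on that, here is a small sizing sketch using the 2GB/stream figure quoted above. The figure comes from this post, not from a verified reading of the Veeam documentation, and the calculation ignores whatever the OS and other services need.

```python
GB_PER_STREAM = 2  # figure quoted in the post; confirm against the docs for your version


def ram_needed_gb(streams: int) -> int:
    """RAM the data movers alone would need for this many concurrent streams."""
    return streams * GB_PER_STREAM


def max_streams(ram_gb: int) -> int:
    """Largest stream count that fits in ram_gb, with no OS overhead reserved."""
    return ram_gb // GB_PER_STREAM


print(ram_needed_gb(16))  # 32
print(max_streams(16))    # 8
```

By that rule, 16 streams want 32GB, and a 16GB machine should be limited to 8 streams at most.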

According to a post on LinkedIn, I'm not the only one under this impression.
[Image not preserved]

Re: Veeam 8 possible CBT affecting issues

by jabroni » Mon Mar 09, 2015 3:47 pm

Once I sorted out the issues with our NetApp backup target, I ran into the same issue of needing to double the RAM allocations on my backup infrastructure (16GB Veeam server and 8GB proxies), as we did have a couple of jobs fail due to insufficient resources. CPU use doesn't seem to be any different.

Re: Veeam 8 possible CBT affecting issues

by jgq85 » Wed Sep 30, 2015 2:36 am

We're getting the error:

Code:
9/29/2015 2:07:28 AM :: Processing Mail9 Error: calling system(), returns nonzero. Err: 5040
Failed to write data to the file '[hqdd2200] DDBoost:/Mail9/Mail92015-09-29T020712.vbk ( 2049626114 )'. Offset: '1131770236928'.
Failed to download disk.
Shared memory connection was closed.
Failed to upload disk.
Agent failed to process method {DataTransfer.SyncDisk}.


Is the fix supposed to be on the Windows server, or on the Data Domain? I don't understand.

It seems to be happening on only one of our servers, which is the most critical, heh.
It's a full backup, ran for like 16 hours before failing at that error.
In the job, one hard disk had a red X next to it but no details on what went wrong (just a red X plus "9/29/2015 2:10:50 AM :: Hard disk 5 (1000.0 GB) 1000.0 GB read at 17 MB/s [CBT]"). I still have Veeam looking into the logs, but we're not getting anywhere.
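
As a back-of-the-envelope check, the read rate on that disk line roughly accounts for the run time by itself. The sketch assumes decimal units (1 GB = 1000 MB) and a steady average rate; how Veeam rounds these figures is not something I have verified.

```python
def hours_to_read(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to read size_gb at a steady rate (decimal units assumed)."""
    return size_gb * 1000 / rate_mb_per_s / 3600


hours = hours_to_read(1000.0, 17)
print(f"{hours:.1f} hours")  # 16.3 hours
```

In other words, a 1000 GB disk at 17 MB/s takes about 16 hours, so the slow read speed is worth chasing alongside the final write error.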
jgq85
Influencer
 
Posts: 13
Liked: 2 times
Joined: Mon Jul 13, 2015 1:50 pm
Full Name: JGQ85

Re: Veeam 8 possible CBT affecting issues

by amitrahat » Wed Oct 07, 2015 1:34 pm

Hi,

I've had the same issue and found that the guests that failed to back up reside on a datastore to which the host running the Veeam VM does not have direct access.
Once I moved the Veeam VM to a VMware host connected directly to that datastore - issues were resolved.

Further explanation:
Veeam is running on VMware Host A
VMware host A is directly connected to datastore1
Veeam is failing to back up guest1 running on VMware Host B and residing on datastore2
VMware Host B is directly connected to datastore1 and datastore2
Once Veeam was moved to VMware Host B issue was resolved
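
The layout above can be modelled in a few lines of Python to spot the mismatch. The dictionary just restates the example; in a real environment the mapping would have to come from your vSphere inventory, and the function names are illustrative.

```python
# Which datastores each host is directly connected to, as in the example above.
host_datastores = {
    "Host A": {"datastore1"},
    "Host B": {"datastore1", "datastore2"},
}


def has_direct_access(host: str, datastore: str, topology: dict) -> bool:
    """True if the host is directly connected to the datastore."""
    return datastore in topology.get(host, set())


def hosts_with_direct_access(datastore: str, topology: dict) -> list:
    """All hosts that could run the Veeam VM with direct access to the datastore."""
    return sorted(host for host, stores in topology.items() if datastore in stores)


print(has_direct_access("Host A", "datastore2", host_datastores))  # False
print(hosts_with_direct_access("datastore2", host_datastores))     # ['Host B']
```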

Best Regards,

Amit
amitrahat
Lurker
 
Posts: 1
Liked: 1 time
Joined: Wed Oct 07, 2015 1:27 pm
Full Name: Amit Rahat

Re: Veeam 8 possible CBT affecting issues

by larry » Wed Nov 18, 2015 4:24 pm

I had the same error:

Code:
An existing connection was forcibly closed by the remote host
Unable to retrieve next block transmission command. Number of already processed blocks: [3417].
Failed to download disk.
An existing connection was forcibly closed by the remote host
Failed to upload disk.
Agent failed to process method {DataTransfer.SyncDisk}.


It was very intermittent on some jobs but happened every time on one VM in one job.
On the Veeam server I set both TCP and UDP Checksum Offload to Disabled, per page one of this thread.

Job ran successfully for the first time, second and third time. Now will wait and see.

My setup is 10Gig copper, ESX6, Veeam 8, NetApp SAN, reading data at 400ms
larry
Expert
 
Posts: 372
Liked: 89 times
Joined: Wed Mar 24, 2010 5:47 pm
Full Name: Larry Walker

Re: Veeam 8 possible CBT affecting issues

by tommy_p » Mon Aug 22, 2016 1:22 pm

I had those problems too. After disabling TCP checksum offload, the problem was solved. Thank you! :D
tommy_p
Veeam ProPartner
 
Posts: 1
Liked: never
Joined: Mon Aug 22, 2016 1:15 pm

Re: Veeam 8 possible CBT affecting issues

by johndrago » Tue Sep 27, 2016 10:48 pm

Where can I find the TCP checksum offload setting?

Thanks,

John
johndrago
Lurker
 
Posts: 1
Liked: never
Joined: Wed Oct 05, 2011 7:49 pm
Full Name: John Drago

Re: Veeam 8 possible CBT affecting issues

by foggy » Wed Sep 28, 2016 7:40 am

Hi John, it's in the network adapter properties.
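
For anyone who prefers scripting it: on Windows Server 2012 and later, the NetAdapter PowerShell module has a Disable-NetAdapterChecksumOffload cmdlet. The Python sketch below only builds the command text. The adapter name is a hypothetical placeholder, and on 2008 R2 the setting has to be changed in the adapter's advanced properties, where the exact property names vary by driver.

```python
# Switches accepted by Disable-NetAdapterChecksumOffload.
VALID_SWITCHES = {"IpIPv4", "TcpIPv4", "TcpIPv6", "UdpIPv4", "UdpIPv6"}


def disable_checksum_offload_cmd(adapter: str, switches: list) -> str:
    """Build the PowerShell command that disables the given checksum offloads."""
    unknown = set(switches) - VALID_SWITCHES
    if unknown:
        raise ValueError(f"unknown switches: {sorted(unknown)}")
    flags = " ".join(f"-{switch}" for switch in switches)
    return f'Disable-NetAdapterChecksumOffload -Name "{adapter}" {flags}'


# "SAN 10Gb 1" is a made-up adapter name; Get-NetAdapter lists the real ones.
cmd = disable_checksum_offload_cmd("SAN 10Gb 1", ["TcpIPv4", "UdpIPv4"])
print(cmd)
```

Run the printed command in an elevated PowerShell session on the proxy or backup server, and expect a brief network interruption on that adapter.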
foggy
Veeam Software
 
Posts: 14716
Liked: 1075 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
