Comprehensive data protection for all workloads
ian0x0r
Veeam Vanguard
Posts: 238
Liked: 55 times
Joined: Nov 11, 2010 11:53 am
Full Name: Ian Sanderson
Location: UK
Contact:

Slow job initialisation since moving to 9.5

Post by ian0x0r »

There is already an open support case for this: case ID 02008794.

I have noticed that there is a real delay in jobs starting when utilising multiple NICs and preferred networks in Veeam. Let me give you a breakdown of my current setup and what the issue is.

VEEAM01 (Management Server). IP address 172.16.10.15
VEEAMREPO01 (Proxy / ReFS repository). IP address 172.16.10.23, 10.0.99.25, 10.0.99.26
VEEAMVSAN1 (Proxy). IP address 172.16.10.14, 10.0.99.27

The 172.16.10.x network is a /16 acting as the management network. The 10.0.99.x /24 network is the data network used for backup traffic; it is not routable.

My assumption is that the VBR management server should be able to coordinate jobs on the proxy and repository servers so that they use the 10.0.99.x network for data, without the management server itself needing to be able to talk to the 10.0.99.x network. Is this assumption correct?
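A quick way to test this assumption from the management server is to attempt a TCP connection to an agent endpoint directly. A minimal sketch in Python (the address and port in the commented example are taken from the task log in this thread; nothing here is Veeam-specific):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: from the management server, check whether a proxy's agent
# endpoint on the data network is reachable (values from the log below):
# can_connect("10.0.99.27", 2505)
```

If this returns False only for the data-network addresses, the management server is indeed trying (and failing) to reach endpoints it has no route to.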

What I have found is a ton of errors in the task log similar to the following:

Code:

.2016 18:33:27] <52> Error    Failed to connect to agent's endpoint '10.0.99.27:2505'. Host: 'VeeamVSAN1'.
[19.12.2016 18:33:27] <52> Error    A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.0.99.27:2505 (System.Net.Sockets.SocketException)
[19.12.2016 18:33:27] <52> Error       at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
[19.12.2016 18:33:27] <52> Error       at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
[19.12.2016 18:33:27] <52> Error       at Veeam.Backup.Common.CNetSocket.Connect(IPEndPoint remoteEp)
[19.12.2016 18:33:27] <52> Error       at Veeam.Backup.AgentProvider.CAgentEndpointConnecter.ConnectToAgentEndpoint(ISocket socket, IAgentEndPoint endPoint
This results in the job taking an absolute age to start.

As a test I added an additional NIC on the 10.0.99.x subnet to VEEAM01 (the management server) and re-ran a job. Lo and behold, the job started almost instantly. A couple of test jobs I re-ran finished 80% faster, because the task log no longer contains errors like the above.

So the question is: WHY does the management server need to establish a connection to the proxy on the preferred network that is assigned for moving data?

A colleague of mine is having a very similar issue, case ID 02002661.

My environment is VMware; his is Hyper-V.

Thanks,

Ian
Check out my blog at www.snurf.co.uk :D

Re: Slow job initialisation since moving to 9.5

Post by ian0x0r »

Just to add to this quickly: this has nothing to do with the NIC binding order as defined in this article https://technet.microsoft.com/en-us/lib ... 3eedb0322f, and binding order is not even applicable in Server 2016, as discussed in this article https://blogs.technet.microsoft.com/net ... indows-10/

Ian
Cragdoo
Veeam Vanguard
Posts: 629
Liked: 251 times
Joined: Sep 27, 2011 12:17 pm
Full Name: Craig Dalrymple
Location: Scotland
Contact:

Re: Slow job initialisation since moving to 9.5

Post by Cragdoo »

Hello, case ID 02002661 is my case, and I thought I'd add a little detail.

Image

VBR1 is located on the 172.16.236.x subnet, and the HYPV hosts are all in the 172.21.80.x subnet. VBR1 only knows the HYPV hosts on 172.21.80.x (defined in DNS), and 172.21.84.x is not routable from 172.16.236.x.

What we are seeing, similar to Ian above, are entries in the logs where the VBR server appears to be trying to establish connections on the non-routable subnet:

Code:

Failed to connect to agent's endpoint '172.21.84.x:2503'. Host: 'hypv4'.
[12.12.2016 06:52:07] <65> Error    A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 172.21.84.104:2503 (System.Net.Sockets.SocketException)
[12.12.2016 06:52:07] <65> Error       at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
[12.12.2016 06:52:07] <65> Error       at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
[12.12.2016 06:52:07] <65> Error       at Veeam.Backup.Common.CNetSocket.Connect(IPEndPoint remoteEp)
[12.12.2016 06:52:07] <65> Error       at Veeam.Backup.AgentProvider.CAgentEndpointConnecter.ConnectToAgentEndpoint(ISocket socket, IAgentEndPoint endPoint)
[12.12.2016 06:52:07] <65> Info     [NetSocket] Connect
We do have network traffic rules defined in the VBR console, but I would have thought these only apply to data traffic, not management traffic?

Hope the extra info helps
gsroute
Novice
Posts: 6
Liked: 1 time
Joined: Dec 05, 2012 12:01 pm
Full Name: Graeme Snee
Contact:

Re: Slow job initialisation since moving to 9.5

Post by gsroute »

Hi Ian,

Case ID 02002661 is one I've opened; thanks for including it, as we are having similar problems.

In my own words, what we've seen since updating to 9.5 is:

Backup jobs are taking considerably longer to run. A job can be broken down into a few sections:
1. Initialisation, where the backup server connects to everything to set up all connections between hosts, proxies and repositories.
2. Data transfer.
3. Cleanup.

Sections 1 and 3 are now taking a very long time, but the data transfer times are normal. As an example, one job used to take around 10 minutes to run from start to finish, backing up 3 VMs incrementally; now that job runs for 55 minutes, of which only 6 minutes is data processing time. That is 10 minutes on v9 versus 55 minutes on 9.5, which we have been running since 2nd December.

During the initialisation period I can see in the logs that the backup server is trying to connect to the Hyper-V hosts to move the change block tracking data to another host acting as a proxy, and then repeats this every minute until it either works or fails. It is also picking up all IP addresses on each host; two of these IPs are not reachable from the backup server, as they sit on private, non-routable networks used for SMB. The repositories piggyback on that same network, yet that part works OK.
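To put rough numbers on how those per-endpoint connection timeouts could compound into the delays described above, a back-of-envelope sketch (the timeout, address count and retry count here are assumptions for illustration, not measured values):

```python
# Back-of-envelope model of the stalled initialisation phase:
# every unreachable IP costs one full TCP connect timeout per attempt.
CONNECT_TIMEOUT_S = 20   # assumed connect timeout (not a measured value)
UNREACHABLE_IPS = 2      # assumed non-routable SMB addresses per host
ATTEMPTS = 30            # assumed one retry per minute over ~30 minutes

stall_s = CONNECT_TIMEOUT_S * UNREACHABLE_IPS * ATTEMPTS
print(f"roughly {stall_s // 60} minutes lost to silent connect timeouts")
```

Even with modest numbers the wasted time quickly reaches tens of minutes, which is in the same ballpark as a 10-minute job blowing out to 55 minutes.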

There is a second issue where I have a few VMs that will not back up at all. I've moved these to other volumes and they are starting to work better, although slowly.
Thanks
Graeme.

Re: Slow job initialisation since moving to 9.5

Post by ian0x0r »

This may be the proof that the backup manager is using the data mover network.

Image

Re: Slow job initialisation since moving to 9.5

Post by Cragdoo »

Anyone care to comment?

The latest update on either case is a discussion with QA about this behaviour.
PTide
Product Manager
Posts: 6535
Liked: 762 times
Joined: May 19, 2015 1:46 pm
Contact:

Re: Slow job initialisation since moving to 9.5

Post by PTide »

Hi Craig,
VBR1 is located on the 172.16.236.x subnet, and the HYPV hosts are all in the 172.21.80.x subnet. VBR1 only knows the HYPV hosts on 172.21.80.x (defined in DNS), and 172.21.84.x is not routable from 172.16.236.x.
I'm a little confused by your setup. As far as I can see from the picture, the repository is located in the 172.21.84.x subnet, which is not routable from the VBR subnet. This might be a dumb question, but how did you manage to add a repo from that network? The repo should be accessible from the VBR server.

Thanks
tsightler
VP, Product Management
Posts: 6027
Liked: 2855 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Slow job initialisation since moving to 9.5

Post by tsightler » 2 people like this post

The biggest part I'm struggling with is what changed with 9.5, because the preferred network settings have, as far as I know, always prioritised connections for any communications with the Veeam agent, and that included communications from the VBR server to the agents running on proxies/repos, even in earlier versions. Curious: what version were you running previously? Do you have logs of backups from when this was working? It would be interesting to compare.

I'm guessing that the delay is being exacerbated by the fact that the firewall is probably configured to silently drop TCP traffic rather than actually reject it (i.e. send a proper ICMP response that the destination is unreachable). This causes every attempt to wait for the full connection timeout instead of failing immediately. Perhaps you could tweak the firewall rules to refuse, rather than silently drop, the traffic from the VBR server to the SMB network. We would still try and fail a lot of times, but instead of waiting 20 seconds (or whatever the default timeout is), each attempt should fail almost instantly. If you can't tweak the firewall rules themselves, you should be able to add some outbound rules to the Windows firewall to immediately reject all traffic to the unreachable network. This hides the problem rather than fixing it, but it might be workable.
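The drop-versus-reject difference is easy to demonstrate in a few lines: a connection that is actively refused (RST, like a firewall REJECT) fails almost instantly, while one whose packets are silently dropped blocks for the full connect timeout. A minimal sketch, using a closed local port to stand in for a rejecting firewall:

```python
import socket
import time

def failed_connect_seconds(host: str, port: int, timeout: float) -> float:
    """Time how long a failing TCP connect takes to report its failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        pass
    return time.monotonic() - start

# A closed local port actively refuses the connection, so even with a
# 20-second timeout the failure comes back in milliseconds.
elapsed = failed_connect_seconds("127.0.0.1", 1, timeout=20.0)
print(f"refused after {elapsed:.3f}s")

# A firewall that silently DROPs the SYN would instead make the same
# call block for the full 20 seconds on every single attempt.
```

Multiply that 20-second difference by every unreachable endpoint and every retry, and the slow job initialisation follows directly.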

Re: Slow job initialisation since moving to 9.5

Post by ian0x0r »

Thanks guys,

Just wanted to confirm, for my use case at least, that setting an outbound firewall rule on the management server to reject traffic rather than silently drop it has worked around the issue. There is no longer any delay in job initialisation.
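For anyone wanting to try the same workaround, a rule of this sort can be added with netsh on the management server. This is only a sketch based on my setup above: the rule name is arbitrary, and 10.0.99.0/24 is my unreachable data network, so substitute your own. Note that an outbound block in Windows Firewall makes the local connect attempt fail immediately rather than wait out the TCP timeout, which is what removes the delay:

```shell
netsh advfirewall firewall add rule name="Block unreachable data net" dir=out action=block remoteip=10.0.99.0/24
```

Remember to remove the rule if you later give the management server a leg on that network.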

Thanks for your replies all, and thanks to the guys in support for putting in the time and effort to look for a resolution.

Ian