jveerd1
Service Provider
Posts: 41
Liked: 9 times
Joined: Mar 12, 2013 9:12 am
Full Name: Joeri van Eerd

v9 Veeam agent communication

Post by jveerd1 » Feb 03, 2016 9:23 am

support case ID 01679519

Our customer has deployed a twin datacenter solution (Site A and Site B). Both datacenters are protected by Veeam. Veeam is used for backup and for replication.

For backup, one proxy and one repository are deployed in the datacenter where the virtual machines run. The backup jobs are managed by a backup server running in the remote datacenter. Both datacenters use a remote repository in a third site for off-site backup.
For replication, one proxy is deployed in the datacenter where the virtual machines run, and the backup server in the remote datacenter acts as the proxy at the target datacenter. This is set up to enable a simple failover scenario.
In total there are 4 proxies (2 dedicated for backup, 2 dedicated for replication), 3 repositories and 2 backup servers used for backup and replication of both datacenters.

Last week, Veeam v8 was upgraded to Veeam v9 in the first datacenter (Site A). Immediately after the upgrade, some backup jobs started hanging randomly. The backup logs show connection errors. When a connection error occurs, the backup job tries to recover from it. It looks like the job recovers, but it never proceeds and hangs. The only way to stop the job is to kill the process.

Because the setup is mirrored in the other datacenter (Site B, currently still running v8), we investigated the logs from the other backup server. The connection errors do not appear in the backup logs of jobs managed by the v8 backup server.

The connection between both datacenters has not been reliable recently. To eliminate connection issues between the sites, the backup server (v9) of Site A was moved from Site B to Site A. Backup jobs have not hung since. So it seems the unreliable connection between the sites might be the cause of the hanging jobs.

To the best of our knowledge (please correct us if we are wrong), a backup job should never hang. It should fail in case of connection issues, or it should resume after reconnecting. In v8, a VM would fail and be retried in case of a connection error, if we remember correctly. We would really like Veeam to explain what changed in v9 with agent communication, or to help us find a way to work around this issue, because for simple failover the backup server should run in the opposite site.
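
The "fail or resume, but never hang" expectation described above can be illustrated with a small sketch. This is not Veeam's implementation; `ConnectionLost`, `run_with_bounded_retry`, and all parameters are hypothetical names chosen for illustration. It only shows the general pattern of bounding both the number of retries and the total time spent.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when the link to a remote component drops."""


def run_with_bounded_retry(task, max_attempts=3, deadline_s=30.0, delay_s=1.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Run task(), retrying on ConnectionLost within fixed limits.

    Returns task()'s result on success. Otherwise raises ConnectionLost
    (attempts exhausted) or TimeoutError (deadline passed), so the caller
    can mark the item as failed instead of waiting forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Stop retrying once the overall time budget is used up.
        if clock() - start > deadline_s:
            raise TimeoutError(f"gave up after {deadline_s}s") from last_error
        try:
            return task()
        except ConnectionLost as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_s)
    raise last_error
```

Note that the deadline is only checked between attempts, so `task` itself still needs its own I/O timeout; otherwise a single blocked call could stall the loop, which is the kind of hang described in this post.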

foggy
Veeam Software
Posts: 18037
Liked: 1533 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: v9 Veeam agent communication

Post by foggy » Feb 03, 2016 10:58 am

Joeri, just to eliminate the simplest thing (though it doesn't look like this is the issue here), are any of the components shared between the two Veeam B&R instances? Components with different versions or patch levels cannot communicate with each other.

How about your replication jobs? Do they experience any issues?

If the issue is indeed caused by the connection errors, then logs should be closely investigated, so please continue working with support on this.


Re: v9 Veeam agent communication

Post by jveerd1 » Feb 03, 2016 11:48 am

There are two components that are shared:
- Enterprise Manager is running v9 on the backup server responsible for Site A. It manages the backup servers for both Site A and Site B.
- The offsite repository (backup copies) is used by proxies in both sites, running v8 and v9. But the proxies access different shares, and there are no problems with running backup copies.

We don't experience any issues with replication.

Hopefully support is able to find/fix the issue soon. We will continue working with them.

Just curious, did anything change in the agent communication regarding reconnection events?

skrause
Expert
Posts: 431
Liked: 90 times
Joined: Dec 08, 2014 2:58 pm
Full Name: Steve Krause

Re: v9 Veeam agent communication

Post by skrause » Feb 03, 2016 2:37 pm

If your repositories are on the same server, even if they are on separate volumes, it will cause a version mismatch for one of your B&R servers because the Veeam services installed on the repository server are only installed once. So your Repository is running one version of the services while two different versions of the services are connecting to it from the management and proxy servers.
Steve Krause
Veeam Certified Architect
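
The mismatch Steve describes can be checked on paper by listing which backup server versions reach each shared host. The sketch below is only an inventory check under assumed inputs; the function name, host names, and version strings are hypothetical, and it does not query any Veeam API.

```python
from collections import defaultdict


def find_version_conflicts(connections):
    """Return shared hosts that are reached by more than one server version.

    connections: iterable of (backup_server, server_version, shared_host).
    Result: {shared_host: sorted list of distinct versions} for conflicts only.
    """
    versions_by_host = defaultdict(set)
    for _server, version, host in connections:
        versions_by_host[host].add(version)
    return {host: sorted(versions)
            for host, versions in versions_by_host.items()
            if len(versions) > 1}


# Hypothetical inventory resembling the setup discussed in this thread.
connections = [
    ("backup-server-a", "9.0", "offsite-repo"),
    ("backup-server-b", "8.0", "offsite-repo"),
    ("backup-server-a", "9.0", "repo-site-a"),
]
print(find_version_conflicts(connections))
# {'offsite-repo': ['8.0', '9.0']}
```

A host flagged here is only a candidate for the problem: whether it actually breaks depends on whether the services are installed on that host itself.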


Re: v9 Veeam agent communication

Post by foggy » Feb 03, 2016 3:01 pm

Steve, you're generally right. However, repositories of different types (CIFS and Linux, for example) can coexist on the same server even if they belong to different Veeam B&R versions (since the data mover for CIFS runs on another server). I'm not sure whether that is the case here, but since the OP doesn't see any issues with the backup copy jobs that target them, a version mismatch doesn't seem to be the cause of the issues.
jveerd1 wrote: Just curious, did anything change in the agent communication regarding reconnection events?
I'm not aware of any changes in this area (except some improvements made for WAN accelerated jobs) and, anyway, you're referring to the connection between the backup server and data movers, not between data movers (btw, we do not call them agents anymore).


Re: v9 Veeam agent communication

Post by jveerd1 » Feb 03, 2016 3:57 pm

The customer is running an ExaGrid system as an off-site repository, leveraging the ExaGrid-Veeam Data Mover shares. No issues at all with backup copy jobs from Site A and from Site B. The Veeam component is uploaded to the system when the job starts.

I would love to elaborate on the technical details of this case, but I have to comply with the forum rules. I will keep you updated as the case evolves. It has now been escalated to the next tier.
