llogan6906
Novice
Posts: 9
Liked: never
Joined: Feb 07, 2011 9:02 pm
Full Name: Luke Logan
Contact:

Network applications interrupted during replication

Post by llogan6906 »

Trying to figure out a frustrating problem, so I am probably over-sharing in this post. Just wanted to provide as much information as possible.

We are trying to implement replication with B&R 6.5 from a vSphere 5.1 host to a 5.1 destination. The equipment is brand-new Lenovo RD630 servers (dual 8-core Xeon), with eight 10k SAS drives in a RAID 10 array on the primary server and eight 10k SAS drives in RAID 5 on the destination server (to leave room for more restore points). Our primary goal is a replica (eventually at a remote office) of two of our primary servers (Win Server 2008 R2). We'd like this replication job to run every hour to achieve the RPO we want. For seeding and testing, the job is currently set up with both servers in the same rack. It takes about 13-16 minutes to run and transfers a fairly small amount of data on average, so it appears to fit the window we wanted.

However, every time this replication job runs, our users get kicked out of programs they are running, such as QuickBooks, UltraTax CS, and Creative Solutions Accounting. These programs all have one thing in common: their data files and/or program files are stored on one of the two servers being replicated. I have tried turning off VSS and other items, but users are still kicked out every hour while replication is turned on. I even ran a constant ping from one of these servers to the workstations of a couple of the users who were having trouble. During the replication, the response times changed from <1 ms to about 100-150 ms. I don't think that is a long enough delay to cause the errors directly.

I have had to turn off this replication job during operating hours until we get it fixed. Restricting the job to run only outside of working hours obviously defeats part of our intent with this project, as the RPO becomes substantially larger.

My understanding was that this replication process is supposed to be nearly "invisible" to end users. Did I miss something? Or do I have something configured incorrectly?

We have used Veeam B&R for a number of years (for backups) and I would love to get this replication working correctly. Thanks for any help/advice you can give!
veremin
Product Manager
Posts: 20415
Liked: 2302 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Network applications interrupted during replication

Post by veremin »

Hi.

During a replication job run, a snapshot of the VM is taken, and it is removed right after the job finishes. Creating and removing the snapshot is handled solely by VMware (specifically by the ESXi host).

So, may I ask you to take a snapshot of your production VMs manually and keep it open for about as long as it takes to back up the VM before deleting it? Then trigger the snapshot commit operation and check whether you see similar behavior.

This information should make it clear whether the issue needs to be addressed by Veeam or by VMware.

Thanks.

Re: Network applications interrupted during replication

Post by llogan6906 »

Finally had a chance to try a snapshot through vSphere today. It completed and was deleted with no interruption to users. Another thought: I have run Veeam backups while users were in the system and never had a complaint of them being "kicked out" of a network file.

Does this then point to an issue with the configuration of Veeam B&R or the replication job itself, or with Veeam B&R 6.5 and vSphere 5.1?

Thanks for any help you can offer!
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Network applications interrupted during replication

Post by Gostev »

Did you wait before deleting the snapshot (for about as long as a replication cycle runs, roughly 15 minutes)?

Re: Network applications interrupted during replication

Post by llogan6906 »

Yes, it was over 30 minutes from creation to removal. I can remove it sooner if you'd like me to test that as well. Thanks!!!
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Network applications interrupted during replication

Post by tsightler »

Assuming your job is using hotadd mode, I'd try forcing the proxy to network mode and see if the problem goes away. I've seen quite a few issues with longer-than-expected stuns in hotadd mode. If you're not seeing the issue when taking and removing a snapshot manually, this is certainly a possibility.

Re: Network applications interrupted during replication

Post by llogan6906 »

I am new to Replication...where exactly would the hotadd or network mode selection be? In the setup of the replication job or some other global setting? Thanks!
ZachW
Enthusiast
Posts: 68
Liked: 10 times
Joined: Aug 02, 2011 6:09 pm
Full Name: Zach Weed
Contact:

Re: Network applications interrupted during replication

Post by ZachW »

This option is in the properties of the proxy used by this job. Go to Backup Infrastructure > Backup Proxies, right-click the proxy, and change the transport mode to use only Network mode.

Re: Network applications interrupted during replication

Post by llogan6906 »

Well, I have tested and still had a user "kicked out" of a QuickBooks file she had open (stored on a shared drive of one of the servers being replicated). It seems to happen around the time of "Deleting helper snapshot" or "Removing VM snapshot", if that helps any.

I will try running a backup job that includes this server and see whether it has the same problem. I know it did not with earlier versions of Veeam, but I don't think I have tested it since upgrading to 6.5. I will probably not be able to post those results until sometime on Monday.

Thanks for everybody's help and input!

Re: Network applications interrupted during replication

Post by ZachW »

Keep in mind that an extended stun during snapshot removal is not unheard of. Whenever a snapshot is removed, it is consolidated back into the parent disk, and to get this done VMware performs a "stun" on the VM. This always happens, but in environments with higher storage latency the stun can be extended, which in turn usually drops network connectivity to the box.

It is usually recommended to test by creating a snapshot on the VM and removing it after roughly the time the job took to complete. For whatever reason, though, I have not always found those results to be identical to what is seen while the job is processing.

My recommendation would be to look at the storage latency for the last time this occurred, if you have access to historical performance stats. If not, I would recommend monitoring the storage with either a third-party tool or esxtop and looking at the peaks during the snapshot removal task. This will not resolve the issue for you, but it should give you a better understanding of what is happening during that period and help point you in the right direction.
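
As a rough illustration of the "look at the peaks" step: one way to do it, assuming esxtop was captured in batch mode to a CSV file (for example with `esxtop -b -d 5 -n 120 > capture.csv`), is to scan the latency columns for their highest value and the time it occurred. The sketch below is only that, a sketch. The function name `column_peaks` and the sample data are hypothetical, and the column names in the sample are simplified; real esxtop headers are longer and include host and device names, so the `name_contains` filter would need adjusting.

```python
import csv
import io

def column_peaks(csv_text, name_contains):
    """Return {column_name: (peak_value, timestamp)} for matching columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_contains in name]
    peaks = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or malformed cells
            if header[i] not in peaks or value > peaks[header[i]][0]:
                peaks[header[i]] = (value, row[0])  # first column is the timestamp
    return peaks

# Made-up sample; not real esxtop output.
SAMPLE_CSV = '''"Timestamp","vmhba1 Average Guest MilliSec/Command","vmhba1 Commands/sec"
"03/01/2013 14:05:00","6.2","310"
"03/01/2013 14:05:05","41.7","95"
"03/01/2013 14:05:10","7.0","298"
'''

for name, (value, when) in column_peaks(SAMPLE_CSV, "MilliSec/Command").items():
    print(f"{name}: peak {value} ms at {when}")
```

Comparing the timestamp of each peak with the time the job logged the snapshot removal shows whether the latency spike and the dropped connections line up.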

Re: Network applications interrupted during replication

Post by llogan6906 »

Looking at the Datastore Performance Log, I see that write latency increased to a maximum of about 10 milliseconds (it never exceeded that). During normal operations, it looks like it maxes out around 6-7 milliseconds.

Again, those figures are for the entire datastore from which the replication originates (and therefore the datastore where snapshots would be created and removed). Storage path and disk latency times appeared similar or lower. Are those latency times large enough to be causing this issue?

Re: Network applications interrupted during replication

Post by Gostev »

No, this latency is very low.

What typically causes the issue is VM stun during snapshot commit. When the issue occurs again, check the corresponding VM log files; the stun times are labeled pretty clearly there. Note that a snapshot commit in VMware may involve multiple stuns, so read the full log around that time.

Generally, with your job duration I would not expect any commit issues (unless there is some storage problem), so we might just be on a wild goose chase here. If nothing stands out in the VM logs, I would recommend opening a support case, as troubleshooting over forum posts is extremely inefficient.
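
For anyone who wants to script the log check described above, here is a minimal sketch. It assumes the VM's vmware.log records each stun with a line of the form `Checkpoint_Unstun: vm stopped for N us`; that format is an assumption, not something confirmed in this thread, so verify it against your own logs first. The function name `find_stuns` and the sample lines are made up for illustration.

```python
import re

# Assumed vmware.log line shape (check your own log before relying on it):
# 2013-03-01T14:05:19.874Z| vcpu-0| Checkpoint_Unstun: vm stopped for 8350000 us
STUN_RE = re.compile(
    r"^(?P<ts>\S+?)\|.*Checkpoint_Unstun: vm stopped for (?P<us>\d+) us"
)

def find_stuns(lines, min_seconds=0.0):
    """Return (timestamp, seconds) for each stun lasting at least min_seconds."""
    stuns = []
    for line in lines:
        match = STUN_RE.search(line)
        if match:
            seconds = int(match.group("us")) / 1_000_000  # microseconds to seconds
            if seconds >= min_seconds:
                stuns.append((match.group("ts"), seconds))
    return stuns

# Made-up sample lines; not taken from a real log.
SAMPLE_LOG = """\
2013-03-01T14:05:10.001Z| vcpu-0| Checkpoint_Unstun: vm stopped for 48211 us
2013-03-01T14:05:11.120Z| vmx| some unrelated message
2013-03-01T14:05:19.874Z| vcpu-0| Checkpoint_Unstun: vm stopped for 8350000 us
""".splitlines()

for ts, seconds in find_stuns(SAMPLE_LOG, min_seconds=1.0):
    print(f"{ts}  stunned for {seconds:.2f} s")
```

Because a commit may involve several stuns, it is worth running this with `min_seconds=0` as well and looking at how many short stuns cluster around the snapshot removal.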
rchew
Influencer
Posts: 20
Liked: never
Joined: Dec 16, 2009 7:02 pm
Full Name: Raymond Chew
Contact:

Re: Network applications interrupted during replication

Post by rchew »

I was wondering whether a solution was found for this issue. We experience the same thing. Our Virtual Center resides at our corporate head office, and the ESX hosts are in our branches, connected via an E-10 (10 Mbps) link. Any time the snapshots are deleted, the guest VM loses network connectivity for up to 30 seconds, but usually for 10-15 seconds.
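
To put numbers on drops like these, one option is to log a timestamped reply/no-reply result for each probe (from a continuous ping, for instance) and group the failures into outage windows that can be compared with the snapshot removal time in the job log. The sketch below assumes such a list of `(timestamp, reply_received)` samples already exists; `outage_windows` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

def outage_windows(samples):
    """Group consecutive failed probes into (start, end, duration) windows.

    samples: iterable of (datetime, bool) in time order; True means a reply
    was received. A window ends at the first successful probe after it.
    """
    windows = []
    start = last_fail = None
    for when, ok in samples:
        if not ok:
            if start is None:
                start = when
            last_fail = when
        elif start is not None:
            windows.append((start, when, when - start))
            start = None
    if start is not None:  # still down when the capture ended
        windows.append((start, last_fail, last_fail - start))
    return windows

# Made-up data: one probe per second, no replies from second 5 through 16.
t0 = datetime(2013, 3, 1, 14, 5, 0)
samples = [(t0 + timedelta(seconds=s), not (5 <= s <= 16)) for s in range(30)]

for start, end, duration in outage_windows(samples):
    print(f"down from {start:%H:%M:%S} to {end:%H:%M:%S} "
          f"({duration.total_seconds():.0f} s)")
```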
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Network applications interrupted during replication

Post by foggy »

Raymond, generally there's nothing to fix here, as short stuns of the guest VM are typical of the snapshot commit process (which is controlled solely by VMware, not Veeam). You can also check this topic for additional info. Thanks.