gopher_49
Enthusiast
Posts: 39
Liked: never
Joined: Jun 26, 2010 2:27 pm
Full Name: chris h
Contact:

recovering from network outage during a replication job

Post by gopher_49 »

I'll be performing the first full replica locally, then changing the IP address on the ESXi host and moving the host offsite. I already have a site-to-site VPN tunnel established between the two sites. The local site uses the 192.168.1.0 range and the remote site uses the 192.168.3.0 range. Replicas from then on will run over a 2.25 Mbps connection (I actually get 2.25 Mbps). Now, what happens if the remote side drops during a replica job? I've seen this happen in the past, and the VM disks show as running on snapshots. They keep running on snapshot files until I run a batch file to stop the hung job. I get nervous when my VM disks are running on snapshot files during production hours. How can we simplify recovery from a network failure such as a DSL connection dropping? What steps can I take to get my VMs back to a normal state? As I understand it, having the disks point to snapshot files is not a good thing, because if I reboot the VM it will run off the snapshot, right? My remote site will be using a DSL connection, which will have issues every once in a while.
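For a rough feel of how long a job stays exposed to a link drop, the arithmetic for the 2.25 Mbps link can be sketched as below. This is only an estimate under stated assumptions: the amount of changed data per run is hypothetical, and the `efficiency` factor (VPN and protocol overhead, compression) is a placeholder, not a measured value.

```python
def transfer_hours(changed_gb: float, link_mbps: float = 2.25,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to send changed_gb (decimal GB) over the link.

    link_mbps is the usable line rate in megabits per second.
    efficiency (0 < efficiency <= 1) is an assumed fraction of that rate
    left after overhead; 1.0 means the full rate is available.
    """
    if changed_gb < 0:
        raise ValueError("changed_gb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits_to_send = changed_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical nightly change sizes, in GB.
    for gb in (1, 5, 10):
        print(f"{gb} GB -> {transfer_hours(gb):.1f} h")
```

The longer the estimated window, the more likely a DSL drop lands inside it, which is a reason to keep an eye on how much data changes between runs.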
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: recovering from network outage during a replication job

Post by Gostev »

Hello Chris, I am not 100% sure what you are asking, so I will try to answer to my best understanding of your question.

If you terminate the running job in any way, then you are not giving Veeam Backup a chance to delete its snapshot. In that case, the snapshot must be deleted manually. To do that, open the vSphere Client and delete the Veeam snapshot. All data from the snapshot will be committed back to the VMDK.

Alternatively, you can wait for the next incremental job run - Veeam Backup will detect that the snapshot is still present, and will clean it up before proceeding further. Does this answer your question?
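If you would rather check for a leftover snapshot than wait for the next run, the decision logic can be sketched roughly as follows. This is not Veeam's own cleanup code. The `Snapshot` records are assumed to come from whatever inventory tool you already use, and the `marker` string is an assumption: confirm the actual snapshot name in the vSphere Client before matching on it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record for one snapshot, filled in by your own tooling."""
    vm: str
    name: str


def leftover_snapshots(snapshots: Iterable[Snapshot], job_running: bool,
                       marker: str = "veeam") -> List[Snapshot]:
    """Return snapshots that look like they were left behind by a dead job.

    While a job is still running, its snapshot is expected, so nothing is
    reported. The marker is matched case-insensitively against the name.
    """
    if job_running:
        return []
    needle = marker.lower()
    return [s for s in snapshots if needle in s.name.lower()]


if __name__ == "__main__":
    inventory = [
        Snapshot("fileserver", "Veeam temporary snapshot"),  # made-up name
        Snapshot("fileserver", "before-patching"),
    ]
    for snap in leftover_snapshots(inventory, job_running=False):
        print(f"Review and delete manually: {snap.vm} / {snap.name}")
```

The sketch only reports candidates. Deleting the snapshot is still done in the vSphere Client as described above, so a manually created snapshot is never removed by accident.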
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: recovering from network outage during a replication job

Post by Vitaliy S. »

Chris,

I've just thought of an idea: why not use Veeam Monitor 5.0 to alert on the growth of open/orphaned snapshots? Besides, you can trigger a custom script that automatically removes/consolidates a VM's existing snapshots as soon as the alert is raised. This post-alert script option can also be configured in Veeam Monitor.

Just my two cents.
gopher_49

Re: recovering from network outage during a replication job

Post by gopher_49 »

Both of your replies help me a lot. What is the impact on a VM if its virtual disks have snapshots assigned to them? I'll be replicating across a DSL connection, and it will fail sometimes. When it fails, the VM being replicated will have snapshots assigned to its virtual disks. How does this affect the operation of the VM? Also, what if the VM reboots? Will the VM mount the snapshot?
Gostev

Re: recovering from network outage during a replication job

Post by Gostev »

No impact whatsoever - it is completely transparent. Just watch the disk space on the VM datastore. A snapshot can potentially grow to the same size as the original VMDK (that is, when every single VMDK block is changed by the running VM, which is of course not realistic).
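To put numbers on "watch the disk space", here is a minimal sketch of the worst-case reasoning. It assumes the snapshot is capped at roughly the size of the original VMDK, that every write lands on a previously unchanged block (the pessimistic case), and that nothing else consumes the datastore. The change rate is a figure you would have to measure yourself.

```python
import math


def hours_until_datastore_full(vmdk_gb: float, free_gb: float,
                               change_gb_per_hour: float) -> float:
    """Estimate hours until an open snapshot exhausts datastore free space.

    Returns math.inf when the snapshot cannot fill the datastore: either
    the free space already covers the worst case (snapshot as large as the
    VMDK) or the VM is not changing any data.
    """
    if vmdk_gb < 0 or free_gb < 0 or change_gb_per_hour < 0:
        raise ValueError("sizes and rates must not be negative")
    if free_gb >= vmdk_gb or change_gb_per_hour == 0:
        return math.inf
    return free_gb / change_gb_per_hour


if __name__ == "__main__":
    # Hypothetical VM: 100 GB disk, 40 GB free on the datastore, 2 GB/h of changes.
    print(hours_until_datastore_full(100, 40, 2))
```

If the result is finite, that is roughly how long a failed job's snapshot can sit before it becomes a space problem.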
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: recovering from network outage during a replication job

Post by tsightler »

Well, it's not quite fair to say there is "no impact whatsoever". VMware snapshots do have a measurable impact on the performance of I/O in a VM, and, if you're using SAN storage, can actually impact the performance of the entire cluster because of the additional SCSI reservations and the significant increase in IOPS that are required when there is an active snapshot on a VM. This is probably not a major issue except for VMs with moderate to high I/O requirements. See http://www.vmdamentals.com/?p=332 for a more detailed description than I can provide. My real-world observations are pretty much identical to their test results.
gopher_49

Re: recovering from network outage during a replication job

Post by gopher_49 »

So, say my replica job starts at 11:59 PM on Monday and the DSL connection drops at 2:00 AM Tuesday. At that moment my virtual disks are set to run on snapshots: the .vmx file shows each virtual disk pointing to a snapshot file, and in the vSphere Client each virtual disk shows the snapshot file associated with it. Even so, I have all the time in the world to simply resume the replica job OR cancel the job and then consolidate/delete the remaining snapshots, correct?
Vitaliy S.

Re: recovering from network outage during a replication job

Post by Vitaliy S. »

Chris,

If your replication job fails due to a DSL connection drop and you have an open snapshot, you will need to remove it manually or wait until the next job run removes it. There is no resume option for backup/replication jobs.
cag
Enthusiast
Posts: 74
Liked: never
Joined: Mar 26, 2011 4:02 am
Full Name: Conrad Gotzmann
Contact:

[MERGED] Replication should continue after failed replicatio

Post by cag »

Feature request !!!!

I find the recovery of replication jobs very annoying. First, the replication job fails after 13.5 hours. Second, it cannot pick up from where it left off. Third, I need to clean up the mess it left behind. I have added a new disk to a VM and I am replicating to the DR site. It's an additional 80GB VM. After 3 attempts to add this disk I give up. The error is a simple client timeout message. Please recover and continue where you left off. Don't give me an annoying

Cannot complete the operation because the file or folder
2011_6.vmdk already exists.

Of course it does. If the program checked the last replication, it would know the file was from a failed attempt. Continue on! Or at least check the logs and clean up.

Is this asking too much?

Are there any settings that make Veeam better at recovering from errors? I would like a set-it-and-forget-it config.
Vitaliy S.

Re: recovering from network outage during a replication job

Post by Vitaliy S. »

There is no such setting, but thanks for the feedback!
bkc
Influencer
Posts: 22
Liked: 3 times
Joined: Dec 21, 2010 10:31 pm
Full Name: brad clements

Re: recovering from network outage during a replication job

Post by bkc »

We have a similar issue: a temporary internet outage drops the VPN and the replication job gets killed.

On the next retry, the job fails with:

Preparing replica VM Error: Detected an invalid snapshot configuration.
Error: Detected an invalid snapshot configuration.	
So far, manually removing a snapshot on the replica VM hasn't fixed this problem.

I am opening a support case for this issue. It's a pain to clean up after replication failures.

I would also like replication jobs to have a 'network retries' option. I realize it's not as simple as that, since some VMware commands are atomic (e.g. create snapshot), but the software needs to be a lot more resilient to 1-minute network outages in a 15-hour replica job.
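To illustrate what such a 'network retries' option could look like, here is a minimal sketch of retrying a single step with exponential backoff. It is not how Veeam behaves, and `run_with_retries` and its parameters are hypothetical names. It also only makes sense for steps that are safe to repeat, such as re-sending a block of data, not for atomic operations like creating a snapshot.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[], T], retries: int = 5,
                     base_delay: float = 1.0, max_delay: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run step(), retrying on ConnectionError with exponential backoff.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and is
    capped at max_delay. After `retries` failed retries the last error is
    re-raised so the caller can fail the job and clean up.
    """
    attempt = 0
    while True:
        try:
            return step()
        except ConnectionError:
            if attempt >= retries:
                raise
            sleep(min(max_delay, base_delay * 2 ** attempt))
            attempt += 1


if __name__ == "__main__":
    state = {"calls": 0}

    def send_block() -> str:
        # Simulated transfer step that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("VPN tunnel down")
        return "block sent"

    print(run_with_retries(send_block, base_delay=0.1))
```

With a 60-second cap and a handful of retries, a wrapper like this would ride out a one-minute outage instead of failing the whole job.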