fconil
Influencer
Posts: 13
Liked: never
Joined: Nov 24, 2009 12:43 am
Full Name: Francois Conil
Contact:

Replica hung after network connectivity issues

Post by fconil » Jul 21, 2010 5:17 am

Hi,

We are running a replica over a WAN (2 Mbps dedicated link). It usually runs fine (completes in roughly 3-4 hours), but last week it failed midway due to a timeout.

I didn't notice it earlier and the retry kicked in.

Now the replica job is stuck at 40%, showing "checking for previous backups".
I can't stop the job, and I have up to 10 delta files for a given disk (although vSphere is only showing two chained snapshots, consolidate helper and Veeam Backup), which makes me think more than twice before attempting anything rash like restarting the backup server or the target.

Is there a way to stop the job and consolidate my disks without panning my virtual server completely? As the need for offsite replicas might suggest, this is quite a critical server. There has been no activity on the target server (replica site) since the connection died midway.
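
For anyone in the same spot who wants to see how many delta files each disk really has before touching anything, here is a minimal sketch. It assumes the file names have already been collected (for example from the datastore browser) and that snapshot disks follow the `<base>-NNNNNN.vmdk` / `<base>-NNNNNN-delta.vmdk` naming seen in the job log later in this thread (`server-000007.vmdk`). The function name `count_delta_disks` is made up for illustration; it only reads a list of names and changes nothing.

```python
import re
from collections import defaultdict

# Assumption: snapshot (delta) disks are named <base>-NNNNNN.vmdk, with an
# optional "-delta" data file, e.g. "server-000007.vmdk" from the job log.
DELTA_RE = re.compile(r"^(?P<base>.+)-(?P<seq>\d{6})(?:-delta)?\.vmdk$")


def count_delta_disks(filenames):
    """Return {base disk name: sorted list of snapshot sequence numbers}."""
    seqs = defaultdict(set)
    for name in filenames:
        m = DELTA_RE.match(name)
        if m:
            # Descriptor and -delta file share a sequence number; the set
            # keeps each snapshot level counted once.
            seqs[m.group("base")].add(int(m.group("seq")))
    return {base: sorted(s) for base, s in seqs.items()}


if __name__ == "__main__":
    listing = [
        "server.vmdk",
        "server-flat.vmdk",
        "server-000001.vmdk",
        "server-000001-delta.vmdk",
        "server-000007.vmdk",
    ]
    for base, levels in count_delta_disks(listing).items():
        print(f"{base}: {len(levels)} delta disk(s), sequence numbers {levels}")
```

Comparing that count with the number of snapshots vSphere shows makes it easier to tell support exactly how many delta files are orphaned.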

Gostev
SVP, Product Management
Posts: 24939
Liked: 3622 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replica hung after network connectivity issues

Post by Gostev » Jul 21, 2010 8:45 am

Hello Francois, it would be best to work with our technical support directly on issues like this. They will be able to understand what the job is doing after reviewing the logs, and recommend the best course of action.

Generally, restarting the Veeam server in any situation should not cause any issues to the original VMs, as Veeam Backup performs read-only access to production storage, reading the snapshot data. Of course, this may leave the snapshots behind as Veeam does not get a chance to issue the command to remove them - but you should be able to remove them manually.

Also, can you let me know if you are running the latest Veeam Backup version? I believe that the issue with hung replication due to a network drop/timeout, while it existed before, was fixed in one of the more recent releases.

Re: Replica hung after network connectivity issues

Post by fconil » Jul 21, 2010 9:29 am

We're running 4.0.

There has been no write access to the destination server since the network failure.

No creation or deletion of snapshots since the incident.

Re: Replica hung after network connectivity issues

Post by Gostev » Jul 21, 2010 9:32 am

That's what I thought... this issue is listed as resolved in version 4.1 released October last year.
http://www.veeam.com/files/release_note ... _notes.pdf
• Replication and backup jobs hang if SSH connection to the target server drops.

Re: Replica hung after network connectivity issues

Post by fconil » Jul 21, 2010 9:52 am

That seems spot on.

Here are the last lines of my job log (no modification since the job failure):

[16.07.2010 01:02:07] <17> Info    (Server) Service output: [2010-07-16 01:02:07.348 04772 info 'App'] Successfully released all resources.\n
[16.07.2010 01:02:07] <17> Info    (Server) Service output: [2010-07-16 01:02:07.364 04772 trivia 'SOAP'] Sending soap request to [TCP:127.0.0.1:443]: logout\n
[16.07.2010 01:02:07] <05> Info    (Server) Service error: An existing connection was forcibly closed by the remote host
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:Cannot write data to the socket. Data size: [1048576].
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:Failed to serialize data area. ID: [289]. Offset: [2336227328].
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:Failed to send next file block. Block identity: [Data block. Start offset: [2336227328], Length: [1048576], Area ID: [289].].
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:Unable to asynchronously write data block. Block identity: [Data block. Start offset: [2336227328], Length: [1048576], Area ID: [289].].
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:Processing of asynchronous write requests has failed. Output file: [File blocks transmission channel (sender).].
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:Failed to process conveyored task.
[16.07.2010 01:02:07] <05> Info    (Server) Service error: --tr:FIB uploader: Unable to upload FIB. FIB path: [BLOCKS_READER: DISK=VDDK:[disk] server/server.vmx, CTK=VSPHERE_CTK://viConn=127.0.0.1/VM=vm-2031/Snapshot=snapshot-3163].
[16.07.2010 01:02:07] <05> Info    (Server) Service error: Failed to process VM disk backup. VMDK path: [vddk://<vddkConnSpec><viConn name="127.0.0.1" authdPort="443" vicPort="443" /><vmxPath vmRef="vm-2031" datacenterRef="datacenter-2" datacenterInventoryPath="Datacenter" snapshotRef="snapshot-3163" datastoreName="disk" path="server/server.vmx" /><vmdkPath datastoreName="disk" path="server/server-000007.vmdk" /><transports seq="san;nbd" /><readBuffer size="2097152" /></vddkConnSpec>].
[16.07.2010 01:02:08] <04> Info    (Server) Service: closed
[16.07.2010 01:02:25] <04> Info  [Ssh] Connection::Error, Error: An existing connection was forcibly closed by the remote host, Message: An existing connection was forcibly closed by the remote host
[16.07.2010 01:02:28] <32> Warning  SSH2 WatchDog is stopped: An existing connection was forcibly closed by the remote host
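
To find where a job log like the one above first went wrong without scrolling through it, a rough sketch along these lines may help. It assumes every entry follows the `[DD.MM.YYYY HH:MM:SS] <thread> Level message` layout seen in the excerpt; the helper names `parse_line` and `first_failure` are made up for illustration and are not part of any Veeam tooling.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
#   [16.07.2010 01:02:07] <05> Info    (Server) Service error: ...
# The leading "[" is optional because it is easily lost when copy-pasting.
LINE_RE = re.compile(
    r"^\[?(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]\s+"
    r"<(?P<thread>\d+)>\s+(?P<level>\w+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%d.%m.%Y %H:%M:%S"),
        "thread": int(m.group("thread")),
        "level": m.group("level"),
        "message": m.group("msg"),
    }


def first_failure(lines):
    """Return the first entry that is not plain Info or that mentions an error."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry["level"] != "Info" or "error" in entry["message"].lower():
            return entry
    return None
```

Run over the excerpt above, the first hit would be the "An existing connection was forcibly closed by the remote host" entry on thread 05, which is the same symptom as the 4.1 release note about dropped SSH connections.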
