pkuczynski
Enthusiast
Posts: 37
Liked: never
Joined: May 17, 2009 7:55 pm
Full Name: Peter Kuczynski

replication overwhelms our backup SAN

Post by pkuczynski »

Hi everyone,

We just confirmed that this has now happened twice since we started using replication on our LAN: the backup SAN, which is the replication target, becomes unavailable.
My first guess was that it was simply overwhelmed by the sheer amount of replication traffic.

Here's our setup:
2 ESXi 4 hosts running vSphere 4
1 production iSCSI SAN
1 backup SAN used for replication
All connected over a 1 Gbit switch
1 Windows Server 2003 machine used for backups only, not replicas

When we schedule our replication to run, the second SAN eventually becomes unavailable: you can ping it, but you can't SSH into it (it's a SUSE box). After a reboot, all functionality returns. Last time it took 2 months for this to happen; this time I was able to trigger it in one day.
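For anyone who wants to catch this state as it happens, here is a minimal sketch of a probe that checks whether the box still accepts TCP connections even though it answers ping. The host name is a placeholder, and the port numbers are assumptions: 22 is the usual SSH port and 3260 is the standard iSCSI target port, but your SAN may be configured differently.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_san(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Probe each named service port on the SAN and report which ones answer."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Calling `check_san("backup-san.example", {"ssh": 22, "iscsi": 3260})` from a scheduled task during the replication window and logging the result with a timestamp should show roughly when the services stop responding, which can then be lined up against the job start times.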

So, what can we do here to still take advantage of replicas? Replicate less often? Or is there something else I need or am missing?

Thanks!

Peter
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2799 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: replication overwhelms our backup SAN

Post by Vitaliy S. »

Peter,

Could you please clarify what your replication job schedule looks like? Also, how many replication jobs are you running at the same time? If the issue is confirmed, I would recommend uploading large files to the same SAN using the vSphere Client (Datastore Browser) to see if you get the same behavior.

On top of that, have you had a chance to investigate the SAN box log files for any clues that might be useful for further troubleshooting? Thanks!
pkuczynski

Re: replication overwhelms our backup SAN

Post by pkuczynski »

We sequence all our servers to replicate nightly. Obviously, since this last outage caused by replication, we have stopped this.
We are thinking of doing very conservative replicas only for those servers that have the most changes; the rest would be backups only.

When scheduling them, we may have had one replica job overlap another in terms of timing.
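A quick way to check a schedule for this kind of overlap is sketched below. The job names, start times and durations are made up for illustration; the real values would have to come from the job history.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class JobWindow(NamedTuple):
    name: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def find_overlaps(jobs):
    """Return (earlier, later) name pairs for jobs whose run windows intersect."""
    ordered = sorted(jobs, key=lambda j: j.start)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # sorted by start, so no later job can overlap `first`
            overlaps.append((first.name, second.name))
    return overlaps


# Hypothetical nightly schedule, not taken from the actual jobs.
night = datetime(2010, 1, 1, 22, 0)
schedule = [
    JobWindow("mail", night, timedelta(minutes=90)),
    JobWindow("files", night + timedelta(hours=1), timedelta(minutes=60)),
    JobWindow("db", night + timedelta(hours=2, minutes=30), timedelta(minutes=30)),
]
print(find_overlaps(schedule))  # [('mail', 'files')]
```

In this made-up schedule the "mail" job runs until 23:30 while "files" starts at 23:00, so the two would hit the target SAN at the same time.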

After the outage, once the SAN had been rebooted, the SAN vendor said they did not find anything wrong with it.
pkuczynski

Re: replication overwhelms our backup SAN

Post by pkuczynski »

I'm also seeing this in each VM:
Replicating file "[Sullego-datastore-01] Axigen email_replica/Axigen email replica-flat.vmdk"
Unable to establish direct connection to the shared storage (SAN).
Please ensure that:
- HBA is properly installed in the Veeam Backup server computer, or software iSCSI initiator is configured correctly.
- SAN volume can be seen by operating system in the Windows Disk Management snap-in on the Veeam Backup server.
- Read access is allowed for the Veeam Backup server computer on the corresponding LUN (refer to your SAN documentation).

Direct SAN connection is not available, failing over to network mode...
pkuczynski

Re: replication overwhelms our backup SAN

Post by pkuczynski »

Please disregard my previous entry above; I figured out the issue I had with the iSCSI initiator.

I'm trying replication with LOW compression, which is using fewer resources on both SANs.

What if I used no compression?
pkuczynski

Re: replication overwhelms our backup SAN

Post by pkuczynski »

So when I resolved my iSCSI initiator issue and replication started working in SAN mode rather than network mode, the resource utilization just dropped: each SAN is now working at about .20 or .24, not 4.5 like it was.
I think this issue is solved!
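To keep watching this during future runs, something like the sketch below could be left running on the SAN box. It assumes the figures quoted above are Unix load averages (the post does not name the metric) and that Python is available on the box; the sample count, interval and threshold are arbitrary placeholders.

```python
import os
import time


def sample_load(samples: int = 5, interval: float = 60.0) -> list[float]:
    """Collect the 1-minute load average `samples` times, `interval` seconds apart."""
    readings = []
    for i in range(samples):
        one_min, _five_min, _fifteen_min = os.getloadavg()  # Unix only
        readings.append(one_min)
        if i < samples - 1:
            time.sleep(interval)
    return readings


def exceeds(readings: list[float], threshold: float) -> bool:
    """Return True if any reading is above the given threshold."""
    return any(r > threshold for r in readings)
```

With a threshold of, say, 1.0, readings around .20 to .24 would pass quietly while a value like 4.5 would be flagged.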
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether

Re: replication overwhelms our backup SAN

Post by joergr »

I think the compression level will have NO impact on the SAN at all; it will only affect the B&R machine (CPU power!). If your backup SAN became unresponsive under high load, your backup SAN is a bad one. Sounds harsh, but it's true.
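A rough illustration of the point: compression is work done by the machine that compresses, and the target only receives more or fewer bytes. The sketch below uses Python's zlib purely as a stand-in (the algorithm the product actually uses is not stated in this thread), and the payload is synthetic.

```python
import time
import zlib


def compress_cost(data: bytes, level: int) -> tuple[int, float]:
    """Compress `data` at the given zlib level; return (output size, seconds spent)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


# Synthetic, highly repetitive payload; real disk blocks will compress differently.
payload = b"replica block " * 200_000

for level in (0, 1, 9):  # 0 = store only, 1 = fastest, 9 = best compression
    size, secs = compress_cost(payload, level)
    print(f"level {level}: {size} bytes out, {secs:.4f} s of local CPU time")
```

Whatever the level, the time is spent on the machine running the compression. With no compression the target simply has more bytes to receive and write.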

best regards,
Joerg
Vitaliy S.

Re: replication overwhelms our backup SAN

Post by Vitaliy S. »

Peter,

First of all, great troubleshooting on your side! I thought the issue might be connected to the iSCSI initiator or the SAN itself, so let's keep an eye on your SAN while replication jobs are running.

Joerg is correct: the compression level affects the Veeam backup server (CPU), not the target SAN.

Thank you!