-
- Enthusiast
- Posts: 37
- Liked: never
- Joined: May 17, 2009 7:55 pm
- Full Name: Peter Kuczynski
replication overwhelms our backup SAN
Hi everyone,
We just confirmed that this has now happened twice since we started using replication on our LAN: the backup SAN, which is the replication target, becomes unavailable.
Intuitively, I suspect it is simply overwhelmed by the sheer amount of replication traffic.
Here's our setup:
2 ESXi 4 hosts running vSphere 4
1 production iSCSI SAN
1 backup SAN used for replication
All connected over a 1 Gigabit switch
1 Windows Server 2003 box for backups only, not replicas
When our scheduled replication runs, the second SAN eventually just becomes unavailable. You can ping it, but you can't SSH into it (it's a SUSE box). After a reboot, all functionality returns. Last time it took 2 months for this to happen. Today I was able to trigger it within one day.
So, what can we do here to still take advantage of replicas? Replicate less often? Or is there something else I need or am missing?
Thanks!
Peter
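The "pings but won't accept SSH" symptom described above can be watched for with a small probe. This is only a minimal sketch: the address `192.0.2.10` is a hypothetical placeholder, and a successful TCP connect only shows that the port accepts connections, not that a login would actually work.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    san_host = "192.0.2.10"  # hypothetical address; replace with the backup SAN
    print("SSH port reachable:", tcp_port_open(san_host, 22))
```

Running this on a schedule from the backup server while replication is active would at least timestamp the moment the SAN stops answering on port 22.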
-
- VP, Product Management
- Posts: 27377
- Liked: 2799 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
Re: replication overwhelms our backup SAN
Peter,
Could you please clarify what your replication job schedule looks like? Also, how many replication jobs are you running at the same time? If the issue is confirmed, I would recommend uploading large files to the same SAN using the vSphere Client (Datastore Browser) to see whether you get the same behavior.
On top of that, have you had a chance to check the SAN box's log files for any clues that might help with further troubleshooting? Thanks!
-
- Enthusiast
- Posts: 37
- Liked: never
- Joined: May 17, 2009 7:55 pm
- Full Name: Peter Kuczynski
Re: replication overwhelms our backup SAN
We sequence all our servers to replicate nightly. Obviously, since this last outage caused by replication, we have stopped this.
We are thinking of doing very conservative replication only for the servers that have the most changes; the rest would be backups only.
We may have had one replica job overlap another in terms of timing when scheduling them.
After the outage, and after the SAN was rebooted, the SAN vendor said they did not find anything wrong with the SAN.
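Whether two replica jobs overlapped can be checked mechanically once the job windows are written down. Below is a rough sketch, not anything Veeam provides: the job names and times are made up for illustration, and the real start and end times would have to be copied by hand from the job history.

```python
from datetime import datetime
from itertools import combinations


def find_overlaps(jobs):
    """Return pairs of job names whose (start, end) windows overlap."""
    overlaps = []
    for (name_a, (start_a, end_a)), (name_b, (start_b, end_b)) in combinations(
        jobs.items(), 2
    ):
        # Two windows overlap when each one starts before the other ends
        if start_a < end_b and start_b < end_a:
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M"
    # Hypothetical job windows, for illustration only
    jobs = {
        "mail-replica": (
            datetime.strptime("2010-06-01 22:00", fmt),
            datetime.strptime("2010-06-01 23:30", fmt),
        ),
        "file-replica": (
            datetime.strptime("2010-06-01 23:00", fmt),
            datetime.strptime("2010-06-02 00:15", fmt),
        ),
    }
    for a, b in find_overlaps(jobs):
        print(f"{a} overlaps {b}")
```

Any pair it reports is a candidate for moving to a later start time so that only one job hits the target SAN at once.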
-
- Enthusiast
- Posts: 37
- Liked: never
- Joined: May 17, 2009 7:55 pm
- Full Name: Peter Kuczynski
Re: replication overwhelms our backup SAN
I'm also seeing this in each VM:
Replicating file "[Sullego-datastore-01] Axigen email_replica/Axigen email replica-flat.vmdk"
Unable to establish direct connection to the shared storage (SAN).
Please ensure that:
- HBA is properly installed in the Veeam Backup server computer, or software iSCSI initiator is configured correctly.
- SAN volume can be seen by operating system in the Windows Disk Management snap-in on the Veeam Backup server.
- Read access is allowed for the Veeam Backup server computer on the corresponding LUN (refer to your SAN documentation).
Direct SAN connection is not available, failing over to network mode...
-
- Enthusiast
- Posts: 37
- Liked: never
- Joined: May 17, 2009 7:55 pm
- Full Name: Peter Kuczynski
Re: replication overwhelms our backup SAN
Please disregard my previous post above; I figured out the issue I had with the iSCSI initiator.
I'm trying replication with LOW compression, which is using fewer resources on both SANs.
What if I used no compression?
-
- Enthusiast
- Posts: 37
- Liked: never
- Joined: May 17, 2009 7:55 pm
- Full Name: Peter Kuczynski
Re: replication overwhelms our backup SAN
So once I resolved my iSCSI initiator issue, and replication now works in SAN mode instead of network mode, the resource utilization just dropped: each SAN is working at about 0.20 or 0.24, not 4.5 like it was.
I think this issue is solved!
-
- Veteran
- Posts: 391
- Liked: 39 times
- Joined: Jun 08, 2010 2:01 pm
- Full Name: Joerg Riether
Re: replication overwhelms our backup SAN
I think the compression level will have NO impact on the SAN at all; it will only affect the B&R machine (CPU power!). If your backup SAN became unresponsive under high load, your backup SAN is a bad one. Sounds harsh, but it's true.
best regards,
Joerg
-
- VP, Product Management
- Posts: 27377
- Liked: 2799 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
Re: replication overwhelms our backup SAN
Peter,
First of all, great troubleshooting on your side! I thought the issue could be connected to the iSCSI initiator or the SAN itself, so let's keep an eye on your SAN while replication jobs are running.
Joerg is correct: the compression level affects the Veeam backup server (CPU), not the target SAN.
Thank you!
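For keeping an eye on the SAN during replication, one option is to log its load at intervals. The sketch below rests on two assumptions that the thread does not confirm: that the 0.20 and 4.5 figures quoted earlier are Unix load averages, and that Python can be run on the SUSE-based SAN itself. The threshold of one runnable task per CPU is likewise just a rule of thumb.

```python
import os
import time


def load_is_high(load1: float, cpu_count: int, factor: float = 1.0) -> bool:
    """Return True if the 1-minute load exceeds cpu_count * factor."""
    return load1 > cpu_count * factor


def watch_load(samples: int = 5, interval: float = 60.0) -> None:
    """Print the 1-minute load average a few times, flagging high values."""
    cpus = os.cpu_count() or 1
    for _ in range(samples):
        load1, _load5, _load15 = os.getloadavg()  # Unix-only call
        flag = "HIGH" if load_is_high(load1, cpus) else "ok"
        print(f"{time.strftime('%H:%M:%S')} load1={load1:.2f} cpus={cpus} {flag}")
        time.sleep(interval)


if __name__ == "__main__":
    watch_load()
```

Left running while a replication job is active, this gives a simple timeline to compare against the job start and end times.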