Discussions specific to the VMware vSphere hypervisor
B.F.
Expert
Posts: 138
Liked: 7 times
Joined: Jan 28, 2014 5:41 pm
Contact:

New Dell Compellent = slower replication?

Post by B.F. » Aug 25, 2017 4:26 pm

We recently installed new Dell Compellent disk storage systems in both our main and secondary sites. Prior to Compellent, we had been using an old HP MSA2000 where each of the ESXi hosts was directly connected via SAS cables. Now with Compellent, we have a dedicated 10 Gb switch at each site where all the local ESXi hosts communicate with the site's new Dell storage via iSCSI. We still have the HP MSA installed as well for the time being.

One of the main VMs that we replicate from the main to the secondary site is our file server. The file server is around 3 TB total. Shortly after the Compellent installation, we migrated the file server's replica in the secondary site from the HP MSA to the Dell storage.

Ever since we did this, we have noticed that the time replication takes to complete has increased. Looking at the bottleneck history, I can see that the target percentage has increased by 13 to 25 percentage points. All the other pieces of the bottleneck chain remain relatively the same. The communication between the sites has not changed, which is reflected by the consistent 27% (+-2%) in the bottleneck logs. The performance rate logs from EM show that before Dell, it was around 13 MB/s. Now we are getting around 6 MB/s.
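
To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 13 MB/s and 6 MB/s figures come from the logs above; the 100 GB of changed data per run is a purely hypothetical value (the thread does not say how much data changes between runs), and 1 GB is taken as 1024 MB.

```python
def transfer_hours(changed_gb, rate_mb_per_s):
    """Hours needed to move changed_gb of data at a sustained rate in MB/s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (changed_gb * 1024) / rate_mb_per_s
    return seconds / 3600


# Hypothetical amount of changed data per run; not a figure from the thread.
CHANGED_GB = 100

before = transfer_hours(CHANGED_GB, 13)  # rate reported before the move to Dell
after = transfer_hours(CHANGED_GB, 6)    # rate reported after the move

print(f"before: {before:.1f} h, after: {after:.1f} h, ratio: {after / before:.2f}x")
```

Whatever the real amount of changed data is, the ratio depends only on the two rates, so the same job would take a little over twice as long at 6 MB/s as at 13 MB/s.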

We have confirmed that the replication does not overlap with the Compellent Data Progression or its built-in snapshot schedule. I did discover that the Server OS preference setting on each of the Dell systems was set to Other Single Path. I have since changed them to VMware ESXi 6.0. No change. As far as I'm aware, Compellent does not have deduplication either.

Please advise and thanks!

foggy
Veeam Software
Posts: 16822
Liked: 1359 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: New Dell Compellent = slower replication?

Post by foggy » Aug 25, 2017 5:22 pm

Am I getting it right that the primary bottleneck for this replication job is the target? What transport modes are used by the source and target proxy servers?

Re: New Dell Compellent = slower replication?

Post by B.F. » Aug 25, 2017 7:15 pm

foggy wrote:...the primary bottleneck for this replication job is target?
Actually the bottleneck seems to ping pong between the source and the target now. The source has always been 77% (+- of course) and the target was 50% - 55%. Now the target is 68% - 80%.
foggy wrote:What transport modes are used by the source and target proxy servers?
Not certain I understand what you are asking.
Compression level = Optimal (recommended)
Exclude swap file blocks = checked
Enable VMware Tools quiescence = unchecked
Use changed block tracking data = checked
Enable CBT for all protected VMs automatically = checked
Data Transfer = Direct (only option available)
Target Proxy = Automatic selection (we do have a proxy setup at each site)

Does the above answer your question?

Thanks

DaveWatkins
Expert
Posts: 320
Liked: 85 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: New Dell Compellent = slower replication?

Post by DaveWatkins » Aug 27, 2017 3:41 am

You talk about pathing. Have you set the ESX hosts to use round robin for the new iSCSI LUNs?

Are jumbo frames set up for the iSCSI traffic?

Re: New Dell Compellent = slower replication?

Post by foggy » Aug 28, 2017 3:13 pm

I was talking about the transport mode used to populate the target datastore. You can look it up in the job session window, if you select the particular VM in the left pane and locate the proxy server selected for processing to the right.

Re: New Dell Compellent = slower replication?

Post by B.F. » Aug 29, 2017 1:54 pm

DaveWatkins wrote:You talk about pathing, have you set the ESX hosts to use round robin for the new iSCSI LUN's?

Jumbo frames setup for the iSCSI traffic?
Looking at the "Path Selection Policy", it looks like they are set for "Most Recently Used" and not "Round Robin".

I'm also seeing that the vSwitches are set for 9000 MTU. However, we must have forgotten to adjust the NICs themselves, since I see they are still set to 1500 MTU.
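
Jumbo frames only help if every hop on the iSCSI path agrees on the MTU, because the smallest value along the path is the one that effectively applies. A minimal Python sketch of that reasoning follows. The hop names are illustrative labels, and only the 9000 and 1500 values are taken from this post. The 28 bytes subtracted are the standard 20-byte IPv4 header plus the 8-byte ICMP header, which gives the largest payload a don't-fragment ping can carry.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def effective_mtu(hop_mtus):
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    return min(hop_mtus.values())


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


# Illustrative hop labels; the values are the ones reported in the post.
path = {"vswitch": 9000, "nic": 1500}

mtu = effective_mtu(path)
print(f"effective MTU: {mtu}")
print(f"largest unfragmented ping payload now: {max_ping_payload(mtu)}")
print(f"payload to test once every hop is at 9000: {max_ping_payload(9000)}")
```

A don't-fragment ping with the larger payload should only succeed once every hop, including the switch and the storage ports, is configured for jumbo frames.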

Is it OK to make these types of changes on a live system, or do I need to vMotion VMs off, reboot hosts, etc.?

Thanks


PS: We are still pretty new to iSCSI, coming from a direct SAS connection environment.

Re: New Dell Compellent = slower replication?

Post by B.F. » Aug 29, 2017 4:52 pm

foggy wrote:I was talking about the transport mode used to populate the target datastore. You can look it up in the job session window, if you select the particular VM in the left pane and locate the proxy server selected for processing to the right.
The Transport Mode for the Proxy is "Automatic Selection"

Should it be something else?

Thanks

Re: New Dell Compellent = slower replication?

Post by foggy » Aug 29, 2017 4:59 pm

I mean the transport mode effectively selected by the proxy server during VM processing. You can look it up in the job session log, as I've described above.

Re: New Dell Compellent = slower replication?

Post by B.F. » Aug 29, 2017 6:19 pm

Sorry foggy, I'm not sure I'm finding what you are asking.

Here's where I'm going
  • Backup and Replication
  • Jobs
  • Then select the job on the right
  • I then select the VM on the left, as described, in the bottom pane.
I then see the following Actions show up
  • Replicating restore point....
  • Queued for processing....
  • Required Backup infrastructure resources have been assigned
  • VM processing started...
  • VM size...
  • Discovering replica VM
  • Preparing replica VM
  • Processing configuration
  • Creating helper snapshot
  • Using target proxy <name> for disk Hard disk 2 [hotadd]
  • Hard disk 2 ...read at 5 MB/s
  • Using target proxy <name> for disk Hard disk 1 [hotadd]
  • --- Continues this for all the disks for the VM ---
  • Deleting helper snapshot
  • Finalizing
  • Busy: Source 76% > Proxy.....
  • Primary bottle neck....
  • Network traffic verification detected no corrupted blocks
  • Process finished...
Not seeing a Transport method so I must be in the wrong place. :(

Re: New Dell Compellent = slower replication?

Post by foggy » Aug 29, 2017 9:24 pm

Hotadd is the transport mode in this case; it is the value shown in square brackets at the end of the "Using target proxy" lines. If the target is the primary bottleneck, it looks like the target storage is the issue.
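
For anyone checking many disks or sessions, the mode can be pulled out of copied log text with a short script. This is a minimal Python sketch that assumes the line format exactly as pasted in the post above ("Using target proxy <name> for disk Hard disk 2 [hotadd]"); the function name and the sample text are illustrative, and other product versions may word the line differently.

```python
import re

# Assumed line format, copied from the session log pasted in this thread:
#   Using target proxy <name> for disk Hard disk 2 [hotadd]
PATTERN = re.compile(
    r"Using target proxy (?P<proxy>.+?) for disk (?P<disk>.+?) \[(?P<mode>\w+)\]"
)


def transport_modes(log_text):
    """Map each disk label to the transport mode shown in square brackets."""
    modes = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            modes[match.group("disk")] = match.group("mode")
    return modes


# Sample text assembled from the log lines quoted in the thread.
sample = """\
Creating helper snapshot
Using target proxy <name> for disk Hard disk 2 [hotadd]
Hard disk 2 ...read at 5 MB/s
Using target proxy <name> for disk Hard disk 1 [hotadd]
Deleting helper snapshot
"""

for disk, mode in transport_modes(sample).items():
    print(f"{disk}: {mode}")
```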

Re: New Dell Compellent = slower replication?

Post by B.F. » Oct 30, 2017 6:45 pm

I have a couple of follow-up questions on this topic.

1. What does "Processing rate" actually mean? I see that the Throughput Speed can be quite a bit faster than the processing rate.

2. When there is an incremental replication going on, is there a lot of random searches and writes at the destination? Mostly curious if there is a lot of comparing going on between the latest source and what's at the destination.

3. What is the path of the data during offsite replication? Does it go through vCenter at all? My assumption is Source -> local Veeam Server -> WAN -> Offsite Proxy -> Destination.

Thanks
Brendan

DGrinev
Veeam Software
Posts: 1219
Liked: 128 times
Joined: Dec 01, 2016 3:49 pm
Full Name: Dmitry Grinev
Location: St.Petersburg
Contact:

Re: New Dell Compellent = slower replication?

Post by DGrinev » Oct 31, 2017 5:20 pm

Hi,

1. I'd recommend you review a thread called "Interpreting real-time statistics", where you'll find detailed descriptions of every stat used by jobs.
2. All data changes since the last job run are written to the snapshot delta file, and the snapshot delta file acts as a restore point. You'll find more details in "Replication chain".
3. The off-site replication data flow looks like this: Source -> Source Proxy -> WAN -> Offsite Proxy -> Destination. You can also see it in the UG article "Replication Scenarios". Thanks!

Re: New Dell Compellent = slower replication?

Post by B.F. » Dec 06, 2017 3:15 pm

After a long process of speaking with Dell and Veeam on this issue, we have FINALLY got the problem resolved.

The target proxy transport mode was set to "Automatic". When the replication ran, it would choose to use HotAdd. When we changed the transport mode to "Network", it showed NBD instead of HotAdd. Result? HotAdd was getting between 4 and 10 MB/s. NBD is getting 34 to 50 MB/s! Huge improvement!
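
Since both measurements are ranges, the improvement is a range too. A small Python sketch of the bounds, using only the four figures reported above (the function name is illustrative, and the numbers describe this one environment, not HotAdd versus NBD in general):

```python
def speedup_range(old, new):
    """Return (most conservative, most optimistic) speedup between two rate ranges."""
    old_low, old_high = old
    new_low, new_high = new
    conservative = new_low / old_high  # slowest new rate vs fastest old rate
    optimistic = new_high / old_low    # fastest new rate vs slowest old rate
    return conservative, optimistic


hotadd_mb_s = (4, 10)  # observed with HotAdd
nbd_mb_s = (34, 50)    # observed with NBD

low, high = speedup_range(hotadd_mb_s, nbd_mb_s)
print(f"NBD was between {low:.1f}x and {high:.1f}x faster than HotAdd here")
```

Even the conservative bound, comparing the slowest NBD run with the fastest HotAdd run, is more than a threefold improvement.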

Thought I'd share our findings in case others are noticing similar throughput issues with replication.

Thanks

Re: New Dell Compellent = slower replication?

Post by DGrinev » Dec 07, 2017 10:05 am

Hi B.F.,

Thank you for following up; that could be useful for future readers.
Also, I would recommend reading this post by Tom, which explains in depth the difference between the Hotadd and Network mode processes. Thanks!

Mawdo@LMH
Lurker
Posts: 1
Liked: never
Joined: Feb 06, 2018 9:32 pm
Full Name: Paul Mawdsley
Contact:

Re: New Dell Compellent = slower replication?

Post by Mawdo@LMH » Feb 06, 2018 9:47 pm

Looking at the "Path Selection Policy", it looks like they are set for "Most Recently Used" and not "Round Robin".
Just a note (and an apology for resurrecting a thread...)

We have experienced some VERY serious outages with the MPIO set to "Most Recently Used", as is the default. I would recommend, if not already done, that you set this to "Round Robin" ASAP. We have ... NOW. Please contact Dell support if unsure.

It was a very annoying day off for me when an MRU pathed volume went offline with ~50% of our servers on it... Took 3 days to get SharePoint behaving again.
