Comprehensive data protection for all workloads
sbk
Enthusiast
Posts: 32
Liked: 2 times
Joined: Apr 06, 2017 2:03 pm
Full Name: Shawn Krawczyk

Advice on achieving optimal replication performance

Post by sbk »

I have a 2 Gbps fiber WAN connection at both my primary and DR site and I am attempting to utilize it to its full potential (I need to move more data, faster). Until recently, it was 1 Gbps on each end. I am able to achieve around 600-700 Mbps on a nightly basis; however, many jobs are still running during business hours, and when larger datasets need to be replicated it can take a week+ to complete (for a 6-10 TB server). Replications are sourced from backups. The backups are on physical repository servers connected to Nimble storage arrays (HF60). In the destination, there are physical proxy servers that connect to another Nimble storage array (also an HF60). The internet lines are dedicated to backups, connected via an IPsec site-to-site VPN tunnel with all advanced security features disabled (basically an any-any rule, so no traffic is inspected or stepped on in any way).
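
As a rough sanity check on those numbers (simple arithmetic, assuming the observed ~650 Mbps were sustained end to end for a single server):

Code: Select all

# Rough sanity check: how long 10 TB *should* take at the observed rate.
# 650 Mbps is the midpoint of the 600-700 Mbps we see nightly.
size_bits = 10e12 * 8   # 10 TB in bits
rate_bps = 650e6        # ~650 Mbps observed

hours = size_bits / rate_bps / 3600
print(f"{hours:.0f} hours")  # ~34 hours, so a week+ suggests per-job limits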

I have a global network traffic rule between the sites, but I currently have it set at 2600 Mbps (I don't want the traffic throttled right now). I have 89 replication jobs currently; some are 1-for-1 (one server per job) and others are many servers per job. The 3 physical proxy servers in the destination (DR) are capable of doing 32 tasks per proxy, but I currently have them set at 16 each. I keep trying to change the "Use multiple upload streams per job" setting and can't seem to find a sweet spot, whether increasing it, decreasing it, or turning it off. There is a message that says to disable this option if you are running a large number of concurrent jobs; what is considered a large number?

Most replication jobs report Target or Network as the bottleneck. Network makes more sense to me than Target, but in either case neither is being taxed. The WAN is well below its available bandwidth, and the target storage array is not reporting latency or anything else (it's just about idle). The target storage array in our DR site is the same model as our primary production storage at the main site.

We are not using any WAN accelerators. We tried them years ago and ended up abandoning them due to the amount of data we replicate on a nightly basis; the amount of cache we needed didn't make sense. We have MANY servers in the 5-10 TB range and some much bigger than that, so I'm still not sure we could make a WAN accelerator work, but I would be willing to give it a try again if that's what's needed.

Any advice on how to achieve better performance?
Has anyone come up with a setup that works really well?
Is a virtual proxy in the destination better than a physical one, so that HotAdd can be utilized?
Budget and resources aren't an issue; I can deploy whatever is needed. Going into 2024 I need to replicate even more data, but I can hardly replicate what I have now.

Case # 06352917
HannesK
Product Manager
Posts: 14322
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Advice on achieving optimal replication performance

Post by HannesK »

Hello,
what performance do you get outside Veeam? iperf results would be helpful, along with the settings that were used. What latency do you have? Latency is usually the biggest challenge.
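
For example, something along these lines (just a sketch; the hostname is a placeholder, and it assumes iperf3 is installed on both proxies with "iperf3 -s" running on the destination side):

Code: Select all

# Sketch: collect RTT plus single- and multi-stream iperf3 throughput
# between the proxies. "dr-proxy" is a placeholder hostname.
import subprocess

DEST = "dr-proxy"

# Latency first -- per-stream TCP throughput is bounded by the RTT.
subprocess.run(["ping", "-c", "10", DEST], check=True)

# Throughput with 1, 4, 8 and 16 parallel streams (-P) to see how much
# of the 2 Gbps link a single stream can actually fill.
for streams in (1, 4, 8, 16):
    subprocess.run(["iperf3", "-c", DEST, "-P", str(streams), "-t", "30"],
                   check=True)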

At the destination, HotAdd is usually the fastest. Having one job per machine sounds like overkill; for 89 machines I would probably go for 1-2 replication jobs. Adding more tasks can help.

WAN accelerator in high bandwidth mode (which probably did not exist when you tried it): the wizard says "up to 1 Gbit/s". I have heard of customers using multiple WAN accelerators in parallel, but that was for low bandwidth mode even more years ago. Maybe worth a try. Changing the "use multiple upload streams per job" setting can have a positive impact depending on the latency.
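
To illustrate why latency matters so much (illustrative numbers only, not measurements from your environment):

Code: Select all

import math

# A single TCP stream cannot move more than window / RTT, no matter how
# fast the link is. RTT and window size below are assumptions.
link_bps = 2e9                  # 2 Gbps WAN
rtt_s = 0.020                   # assumed 20 ms round trip
window_bytes = 1024 * 1024      # assumed 1 MiB effective TCP window

per_stream_bps = window_bytes * 8 / rtt_s
print(f"per-stream ceiling: {per_stream_bps / 1e6:.0f} Mbps")  # ~419 Mbps

# Bandwidth-delay product: bytes that must be in flight to fill the pipe.
bdp_bytes = link_bps / 8 * rtt_s
print(f"BDP: {bdp_bytes / 1e6:.0f} MB")  # 5 MB

# Parallel streams needed to saturate the link at this window size --
# which is what "multiple upload streams per job" buys you.
print(f"streams needed: {math.ceil(link_bps / per_stream_bps)}")  # 5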

Bottleneck: is the network always at 98%+, or just showing random values between 60-90%, or even lower?

Summary: there is no "golden rule" that fits for everything.

Best regards,
Hannes
sbk
Enthusiast
Posts: 32
Liked: 2 times
Joined: Apr 06, 2017 2:03 pm
Full Name: Shawn Krawczyk

Re: Advice on achieving optimal replication performance

Post by sbk »

I did some iperf testing from a source proxy to a destination proxy while a lot of replication jobs were already running, and between the running jobs plus the iperf test it was peaking around 1.2 Gbps. If the WAN were the bottleneck, I would expect "network" to be listed as the bottleneck in the job performance stats, but I am mostly seeing "target".

Thank you for the HotAdd suggestion; I may try that out, at least to compare. We moved away from virtual proxies due to the load they ended up putting on the VMware environment, and using a physical server with direct storage access has been a night-and-day difference, at least for backups.

One job per machine isn't overkill when the servers are anywhere from 5-10 TB in size and have a high change rate. Having many large servers in a single job creates numerous problems, at least in my experience and current environment. If we are talking about smaller servers (2 TB or less), a many-servers-per-job setup works excellently.

You are correct, there was no high bandwidth mode before. It's certainly worth looking into; I would just need to run the numbers on the needed cache size compared to our change rate to see what kind of proxy we would need.
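
A back-of-envelope version of that math (placeholder numbers only; the real figures would come from our change-rate reports and Veeam's sizing guidance):

Code: Select all

# Placeholder numbers -- real values would come from our change-rate
# reports. Rough feel for how much changed data a WAN accelerator cache
# would have to absorb per night.
protected_tb = 300    # assumed total replicated capacity
change_rate = 0.05    # assumed 5% nightly change

print(f"~{protected_tb * change_rate:.0f} TB changed per night")  # ~15 TB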

Regarding your summary, I do understand that. There are just too many variables between environments, so perhaps this is what I should be asking instead:
If anyone out there is moving replicas over a WAN at rates greater than 1 Gbps, I would be interested in hearing the high-level details of your setup!