By way of introduction, I'm new to both Veeam and ESX. I've been a Hyper-V guy from the beginning and am still learning my way around vSphere and Veeam.
I've been working on replicating a large (40 TB) VM and am having trouble determining either a) where my performance problems are coming from or b) why the performance I'm seeing is what I should expect.
Primary
3 x vSphere 5.5.0 servers (Essentials Plus) - 1 Gb connected to LAN
1 x Coraid SRX6300 - 10 Gb connected to servers (we'll call this CoraidA)

DR - 1 Gb connected via private fiber at around 25 ms
Same as above (we'll call this storage CoraidB)
Veeam server running here
Veeam proxy running here
Veeam replication job pulling the data from Primary to DR
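
For reference, here is a rough sketch of what a link like the one to DR could carry in theory. It assumes the "1GB" link is 1 gigabit per second and that the 25 ms figure is a round-trip time; neither is stated outright above. The 64 KiB window is purely illustrative (a classic TCP window without scaling), not a measured or Veeam-specific value, and the real behaviour depends on window scaling and how many parallel streams are in use.

```python
def link_ceiling_mb_s(gigabits_per_s):
    """Raw line rate in MB/s (decimal), ignoring all protocol overhead."""
    return gigabits_per_s * 1e9 / 8 / 1e6


def bdp_bytes(gigabits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return gigabits_per_s * 1e9 / 8 * rtt_ms / 1000


def single_stream_mb_s(window_bytes, rtt_ms):
    """Upper bound for one TCP stream limited to a fixed window size."""
    return window_bytes * 1000 / rtt_ms / 1e6


# Assumed values: 1 Gb/s link, 25 ms round trip, hypothetical 64 KiB window.
print(link_ceiling_mb_s(1))                          # 125.0
print(bdp_bytes(1, 25))                              # 3125000.0
print(round(single_stream_mb_s(64 * 1024, 25), 2))   # 2.62
```

Under those assumptions the link tops out at about 125 MB/s, and a single stream with a small window would sit in the low single digits of MB/s, so latency is worth ruling out alongside the storage itself.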
We have two physical sites, Primary and DR. DR has had the equipment, but we are just starting to make it functional. We had an issue with CoraidA at Primary, so we brought CoraidB over with the intention of migrating to it and moving CoraidA back to DR. I won't go into the reasoning, as it is a bit of a rabbit trail.
With this VM being so large, I decided I'd simply use replication and cut over. It turns out the initial replication is about a 120-hour job. The thing that surprised me, though, was that we were only achieving an average of around 280MB/s, and the job showed the bottleneck as the target (CoraidB), which had nothing else running on it. The replication was using all default settings (Optimal compression and Local target storage optimization), so Veeam recommended we switch to the No compression level and Local Target (16TB+ backup files), since a) the data in this VM does not dedupe well and b) the VM is so large. Still, having the target be the bottleneck seemed bizarre. The network or the source would have made far more sense.
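
As a sanity check on those numbers, the arithmetic can be sketched as below. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the full 40 TB is actually transferred; both are assumptions, and the helper names are only for illustration.

```python
TB = 10**12  # decimal terabyte (assumption; the figures above may be TiB)
MB = 10**6   # decimal megabyte


def implied_rate_mb_s(size_bytes, hours):
    """Average rate in MB/s needed to move size_bytes in the given hours."""
    return size_bytes / (hours * 3600) / MB


def hours_at_rate(size_bytes, rate_mb_s):
    """Hours needed to move size_bytes at a steady rate in MB/s."""
    return size_bytes / (rate_mb_s * MB) / 3600


vm_size = 40 * TB
print(round(implied_rate_mb_s(vm_size, 120), 1))  # 92.6
print(round(hours_at_rate(vm_size, 280), 1))      # 39.7
```

A 120-hour run over 40 TB works out to roughly 93 MB/s end to end, while a steady 280 MB/s would finish in about 40 hours. The two figures do not line up, so it is worth checking which rate the job statistics actually report and over what period it is averaged.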
Now we've got CoraidA at DR, and I've set up replication again. Only now it shows the source (CoraidB) as the bottleneck, and it is moving at a whopping 5MB/s. Obviously, I have an open support ticket on the CoraidB storage, since the issues look to be following it. However, it does make me wonder:
1) Do I have the replication set up as I should?
2) How do I go about pinpointing bottlenecks? What tools should I be using?
3) What do I not know here that I don't know I don't know?