-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Bottleneck Confusion
Here's the layout.
Veeam Server has a large RAID 5 Disk via SAS physical connection.
There are 2 jobs in this Veeam Server scenario:
Job 1 Backs up a large VM to the RAID listed above
Job 2 Replicates the recently backed up VM to a secondary site based off of the backup in Job 1
Job 1 Bottleneck shows that the Target (the SAS RAID 5) is at 0%
Job 2 Bottleneck shows the Source (same SAS RAID 5) is at 75%
Based on this data, Veeam has to work much harder to read the data than to write it on the same media?
Is this normal?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
Bottleneck distribution depends on the other parts of the processing chain. In your case, backup job target spends most of the time waiting for other components in the source-proxy-network-target chain, while the same array is the weakest part in the backup copy job processing chain.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Bottleneck Confusion
It's also very important to note that this is almost 100% to be expected. Veeam is measuring how much time it spent waiting for reads or writes, and writes are typically lower latency than reads due to write caching on the RAID controller: a write can be acknowledged almost immediately (as soon as the data is in cache), while a read takes however long is required to actually retrieve the blocks from the spinning disks.
On top of that, as foggy correctly points out, the backup chain most likely had its bottleneck somewhere else, which kept the writes to the RAID disk from being the point where Veeam was waiting.
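The idea behind these percentages can be sketched in a few lines: each stage's "bottleneck" figure is roughly the share of the job's duration that the stage spent busy. The function and the numbers below are my own illustrative model, not Veeam's actual accounting.

```python
# Illustrative model (not Veeam internals): express each pipeline stage's
# busy time as a percentage of the total job duration.
def bottleneck_percentages(busy_seconds, total_seconds):
    """Return each stage's busy time as a whole-number percentage of the job."""
    return {stage: round(100 * t / total_seconds)
            for stage, t in busy_seconds.items()}

# A write acknowledged from controller cache completes almost instantly,
# so the target accumulates nearly zero busy time even though the same
# array shows high busy time when it has to service reads.
stats = bottleneck_percentages(
    {"Source": 68.0, "Proxy": 82.0, "Network": 39.0, "Target": 0.4},
    total_seconds=100.0,
)
print(stats)
```

This is why the same array can legitimately report 0% as a backup target and 75% as a copy source: the percentage reflects waiting, not raw capability.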
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Does such a vast difference between writing and reading on the same array seem normal?
Perhaps I should have listed the full chains for each
Job 1: Source 68% > Proxy 82% > Network 39% > Target 0%
Job 2: Source 75% > Proxy 5% > Network 34% > Target 64%
Job 2 is the one where I'm trying to figure out whether there are any performance gains to be made. I have a couple of things I'm trying for the Target. I'm guessing that since the array is directly connected to the Veeam server as the source, there isn't much I can do?
Thanks!
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Bottleneck Confusion
Yes, totally normal. In the first job the data is being read from the source, so there's a fair amount of wait there, but then it's compressed and deduped, so most of the wait is for CPU on the proxy. By the time the data stream is written to the target it's already compressed/deduped, so there's a lot less data being written to the target than is being read and processed by the source/proxy.
For a backup copy the pattern is completely different: the amount of data read equals the amount of data written, so I'd expect the source to be the bottleneck. The proxy is barely used (the data is already compressed/deduped), so the bottlenecks are simply the read/write speeds of the two sides, with reads almost always slightly slower than writes. Those numbers look exactly like what I would expect.
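The difference between the two pipelines can be put in concrete numbers. This is a toy model with made-up figures (a 60% reduction ratio is an assumption, not a measurement): a backup job shrinks the stream before it reaches the target, while a copy job moves already-reduced data, so its read and write volumes match.

```python
# Toy model (my own numbers, not from Veeam): data volume at each end of a
# backup job vs. a copy/replication job that consumes the backup.
def pipeline_volumes(source_gb, reduction_ratio):
    reduced_gb = source_gb * reduction_ratio  # after compression + dedup
    return {
        "backup_read_gb": source_gb,    # full stream read from production storage
        "backup_write_gb": reduced_gb,  # shrunken stream written to the repository
        "copy_read_gb": reduced_gb,     # copy job reads the already-reduced data...
        "copy_write_gb": reduced_gb,    # ...and writes exactly the same amount
    }

vols = pipeline_volumes(source_gb=1000, reduction_ratio=0.4)
print(vols)
```

With equal read and write volumes in the copy job, whichever side has the slower I/O (here, the reads) naturally shows up as the bottleneck.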
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Wow, very insightful information!
If one wanted to try to improve performance in Job 1, it looks like I'd need to beef up the Proxy?
The big thing I'm trying to improve is Job 2. In the past we would get a "Processing Rate" in the mid-teens. Ever since we upgraded to a new Dell Compellent disk system, we only get around a 7 MB/s processing rate. The bottleneck stats for the Source hadn't changed, but the Target was in the 50s range and then jumped up to almost equal the Source. I have been tweaking things here and there and have at least gotten the Target into the 60s now.
If anyone might have some suggestions on how to improve Job 2, would be greatly appreciated.
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
B.F. wrote: If one were to want to try and improve performance in Job 1, looks like I need to beef up the Proxy?
Correct.
B.F. wrote: The big thing I'm trying to improve on is Job 2. In the past we would get "Processing Rate" in the mid teens. Ever since we upgraded to a new Dell Compellent disk system, we are only getting around 7 MB/s processing rate.
Have you played with the block size (storage optimization settings in the job)?
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
foggy wrote: Have you played with the block size (storage optimization settings in the job)?
Hmm, I'm not sure where these job settings you speak of are.
Tell me more.
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
I'm talking about these settings.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Ahh, these settings look like they are for backups, not replication, and Job 2 is replication. Job 1, which is a backup, has all the same settings listed in your link, except it's not encrypted since it's on premises.
Would changing the Transport Mode on the Target Proxy for Job 2 help at all? I stumbled on an older article about how changing it from Automatic selection to Virtual appliance helped their throughput tremendously. However, it looks like their scenario was also backup, not replication.
It was also suggested on Reddit that Automatic picks what works, not necessarily what is fastest?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
You can find the same setting in the replication job. Changing the transport mode for the target proxy will not help in your case, since the bottleneck is on the source.
B.F. wrote: It was also suggested on Reddit that Automatic looks at what works and not necessarily what is fastest?
This is not correct; automatic selects the most optimal of the available transport modes.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
foggy wrote: You can find the same setting in the replication job.
Well, this is interesting. When I go into the Replication Job Settings > Advanced Settings, I do not have all the options shown in the picture at the above link. What I see under the Traffic tab:
Data Reduction
-Exclude swap file blocks
Compression Level
-<drop down>
That is it. We are on v9.5.0.1038.
I also don't see "Guest Processing" as an option in my replication job's left column like I do in the pic, either.
Please advise.
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
OK, now I see that you're replicating from a backup rather than running a backup copy job, as I initially assumed. Replication from backup doesn't have these settings, since it uses the same block size as the source backup job.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
From your standpoint, is there nothing else we can do to improve Job 2's processing rate?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
It's all about storage read performance, so anything you can tweak in that regard could help.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
OK, not trying to beat a dead horse, just trying to fully understand everything that's going on behind the scenes.
I did another test with a test VM at the DR site. I created a VMDK on the same iSCSI storage where the replica lands, and mounted that VMDK in my test VM as E:\.
I ran 11 LAN Speed Test runs against E:\ using 1 GB of data, running the test from the same Veeam server at the main site that the replication job runs from. The averages were 63 MB/s read and 46 MB/s write. That is way more than what I'm seeing in Veeam. As I type this, there is a replication job 87% complete that is only getting a 5 MB/s processing rate.
Why such a large discrepancy?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
Veeam B&R doesn't just read data sequentially from the source repository (like the test does); it needs to copy only the changed blocks, which turns the reads into (slower) random reads.
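The access-pattern difference is easy to demonstrate. The sketch below (my own demo, with made-up file name and sizes, not anything Veeam-specific) reads the same set of blocks twice, once in order and once shuffled. On spinning disk, the shuffled pass pays a seek penalty per block; on an SSD, or once the OS page cache is warm, the two times may be close, so treat this as an illustration of the pattern rather than a benchmark.

```python
# Demo: same bytes read sequentially vs. in random order.
# File name and sizes are made up; run on spinning disk to see seek cost.
import os
import random
import time

BLOCK = 1024 * 1024          # 1 MiB per read
N = 64                       # 64 MiB test file
path = "seek_demo.bin"

with open(path, "wb") as f:  # create a self-contained test file
    f.write(os.urandom(N * BLOCK))

def read_blocks(offsets):
    """Time reading one block at each given byte offset."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

offsets = [i * BLOCK for i in range(N)]
seq_time = read_blocks(offsets)        # in-order pass, like a plain copy test
shuffled = offsets[:]
random.shuffle(shuffled)
rand_time = read_blocks(shuffled)      # same bytes, scattered order, like changed-block reads
print(f"sequential: {seq_time:.3f}s  random: {rand_time:.3f}s")
os.remove(path)
```

A tool like LAN Speed Test exercises the first pattern; a job that picks scattered changed blocks out of a backup file exercises the second.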
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Ah, that makes sense. So if only the changed blocks are identified and read from the source, writing the replica at the destination should only need to write what was pulled from the source, correct? There really shouldn't be much write overhead then. Wouldn't the replica then be written sequentially?
Thanks!
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
That's correct, but you should pay attention to your source, since it is the bottleneck in this case.
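Applying a set of changed blocks to a replica disk can be sketched as below. The helper and file names are hypothetical; the point is that writing the changed blocks in ascending offset order keeps the target I/O as close to sequential as the change pattern allows, while the volume written never exceeds what was read from the source.

```python
# Hypothetical sketch: patch a replica image with a set of changed blocks.
import os

def apply_changed_blocks(replica_path, changed_blocks):
    """changed_blocks: dict mapping byte offset -> replacement bytes."""
    with open(replica_path, "r+b") as f:
        for offset in sorted(changed_blocks):  # ascending offsets: forward seeks only
            f.seek(offset)
            f.write(changed_blocks[offset])

# Demo: a 4 KiB "replica" with two changed 512-byte blocks.
path = "replica.img"
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)
apply_changed_blocks(path, {512: b"\xaa" * 512, 2048: b"\xbb" * 512})
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
```

Only the two changed regions are rewritten; the untouched blocks are never read or written on the target side.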