Host-based backup of VMware vSphere VMs.
B.F.
Expert
Posts: 160
Liked: 9 times
Joined: Jan 28, 2014 5:41 pm
Contact:

Bottleneck Confusion

Post by B.F. »

Here's the layout.

Veeam Server has a large RAID 5 Disk via SAS physical connection.

There are 2 jobs in this Veeam Server scenario:
Job 1 backs up a large VM to the RAID array listed above.
Job 2 replicates the recently backed-up VM to a secondary site, based on the backup from Job 1.


Job 1 Bottleneck shows that the Target (the SAS RAID 5) is at 0%
Job 2 Bottleneck shows the Source (same SAS RAID 5) is at 75%

Based on this data, does Veeam have to work much harder to read the data than to write it on the same media?

Is this normal?

Thanks
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Bottleneck Confusion

Post by foggy »

Bottleneck distribution depends on the other parts of the processing chain. In your case, the backup job's target spends most of its time waiting for the other components in the source-proxy-network-target chain, while the same array is the weakest part of the backup copy job's processing chain.
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Bottleneck Confusion

Post by tsightler »

It's also very important to note that this is almost entirely to be expected. Veeam measures how much time it spends "waiting" for reads or writes, and writes typically have lower latency than reads because of write caching on the RAID controller. Writes can be acknowledged nearly immediately (as soon as the data is in cache), while reads take whatever time is required to actually retrieve the blocks from the spinning disks.

Not only that, but as foggy properly points out, most likely the backup chain had the bottleneck somewhere else, which kept the writes to the RAID disk from being the point where Veeam was waiting.
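
The "waiting on other components" idea in the two replies above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how Veeam computes its bottleneck statistics: the pipeline is assumed to run at the speed of its slowest stage, and each stage is treated as busy for the fraction throughput / capacity of the time. The function name and all capacity figures are hypothetical.

```python
def stage_busy_percent(capacity_mb_s):
    """Toy model: the chain runs at the speed of its slowest stage, and
    each stage is 'busy' for (throughput / its own capacity) of the time."""
    throughput = min(capacity_mb_s.values())
    return {
        stage: round(100 * throughput / capacity)
        for stage, capacity in capacity_mb_s.items()
    }


# Hypothetical capacities in MB/s, chosen only for illustration.
# The target is given a very high capacity to mimic a write-cached array.
backup_chain = {"Source": 120, "Proxy": 100, "Network": 250, "Target": 2000}

print(stage_busy_percent(backup_chain))
# {'Source': 83, 'Proxy': 100, 'Network': 40, 'Target': 5}
```

In this model a fast target shows a very low percentage simply because it spends most of its time idle behind slower stages.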

Re: Bottleneck Confusion

Post by B.F. »

Is such a vast difference normal for the same array when writing versus reading?

Perhaps I should have listed the full chains for each job:

Job 1: Source 68% > Proxy 82% > Network 39% > Target 0%
Job 2: Source 75% > Proxy 5% > Network 34% > Target 64%
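
For readers comparing chains like these, a small sketch of how the two lines can be parsed to find the busiest stage in each job. The function names are made up for illustration; only the two input strings come from the post.

```python
def parse_bottleneck(line):
    """Turn 'Source 68% > Proxy 82% > ...' into {'Source': 68, 'Proxy': 82, ...}."""
    stages = {}
    for part in line.split(">"):
        name, percent = part.split()
        stages[name] = int(percent.rstrip("%"))
    return stages


def busiest_stage(stages):
    """Return the name of the stage with the highest percentage."""
    return max(stages, key=stages.get)


job1 = parse_bottleneck("Source 68% > Proxy 82% > Network 39% > Target 0%")
job2 = parse_bottleneck("Source 75% > Proxy 5% > Network 34% > Target 64%")

print(busiest_stage(job1))  # Proxy
print(busiest_stage(job2))  # Source
```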

Job 2 is the one I'm trying to figure out if there are any performance gains to be made. I have a couple things I'm trying for the Target. I'm guessing since the array is directly connected to the Veeam server as the source, there isn't much I can do?

Thanks!

Re: Bottleneck Confusion

Post by tsightler » 2 people like this post

Yes, totally normal. In the first instance the data is being read from the source, so there's a good amount of wait there, but then it's compressed and deduped, thus most of the wait is for CPU on the proxy. By the time the data stream is being written to the target it's compressed/deduped, so there's a lot less data being written to the target than is being read and processed by the source/proxy.

For a backup copy the pattern is completely different. The amount of data read is the same as the amount of data written, so I'd expect the source to be the bottleneck, while the proxy is barely used (the data is already compressed/deduped). The bottlenecks are then just the read/write speeds of the two sides, with reads almost always being slightly slower than writes. Those numbers look exactly like what would be expected.
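
To make the data-volume argument concrete, here is a rough arithmetic sketch. The 100 GB VM size and the 2:1 reduction ratio are assumptions picked purely for illustration; real ratios depend on the data and the job settings.

```python
def data_written_gb(data_read_gb, reduction_ratio):
    """Data written after compression/dedupe with the given reduction ratio."""
    return data_read_gb / reduction_ratio


vm_data_gb = 100.0     # hypothetical amount of VM data read by the backup job
reduction_ratio = 2.0  # hypothetical 2:1 compression/dedupe ratio

# Backup job: reads raw VM data, writes the reduced stream to the target.
backup_read = vm_data_gb
backup_written = data_written_gb(backup_read, reduction_ratio)

# Backup copy: reads the already reduced data and writes the same amount.
copy_read = backup_written
copy_written = data_written_gb(copy_read, 1.0)

print(f"backup: read {backup_read:.0f} GB, wrote {backup_written:.0f} GB")
print(f"copy:   read {copy_read:.0f} GB, wrote {copy_written:.0f} GB")
```

Under these assumptions the backup target handles half the data its source does, while the copy job's source and target handle equal amounts, so neither side gets a head start.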

Re: Bottleneck Confusion

Post by B.F. »

Wow, very insightful information!

If one were to want to try and improve performance in Job 1, looks like I need to beef up the Proxy?

The big thing I'm trying to improve on is Job 2. In the past we would get "Processing Rate" in the mid teens. Ever since we upgraded to a new Dell Compellent disk system, we are only getting around 7 MB/s processing rate. The bottleneck stats for the Source had not changed but the Target was in the 50's range and then jumped up to almost equal to the Source. I have been tweaking things here and there and got the Target at least in the 60's now.

If anyone might have some suggestions on how to improve Job 2, would be greatly appreciated.

Thanks

Re: Bottleneck Confusion

Post by foggy »

B.F. wrote: If one were to want to try and improve performance in Job 1, looks like I need to beef up the Proxy?
Correct.
B.F. wrote: The big thing I'm trying to improve on is Job 2. In the past we would get "Processing Rate" in the mid teens. Ever since we upgraded to a new Dell Compellent disk system, we are only getting around 7 MB/s processing rate. The bottleneck stats for the Source had not changed but the Target was in the 50's range and then jumped up to almost equal to the Source. I have been tweaking things here and there and got the Target at least in the 60's now.
Have you played with the block size (storage optimization settings in the job)?

Re: Bottleneck Confusion

Post by B.F. »

foggy wrote: Have you played with the block size (storage optimization settings in the job)?
Hmm, not sure where these job settings are that you speak of.

Tell me more. :)

Thanks

Re: Bottleneck Confusion

Post by foggy »

I'm talking about these settings.

Re: Bottleneck Confusion

Post by B.F. »

Ahh, these settings look like they are for backups and not replication. Job 2 is replication. Job 1, which is a backup, has all the same settings as listed in your link, except it's not encrypted since it's on premises.

Would changing the Transport Mode on the Target Proxy for Job 2 help at all? I stumbled on an older article about how changing it from Automatic selection to Virtual appliance helped throughput tremendously. However, it looks like that scenario was for backup as well, not replication.

It was also suggested on Reddit that Automatic looks at what works and not necessarily what is fastest?

Thanks

Re: Bottleneck Confusion

Post by foggy »

You can find the same setting in the replication job. Changing the transport mode for the target proxy will not help in your case, because the bottleneck is on the source.
B.F. wrote: It was also suggested on Reddit that Automatic looks at what works and not necessarily what is fastest?
This is not correct. Automatic selects the optimal one among the available transport modes.

Re: Bottleneck Confusion

Post by B.F. »

foggy wrote: You can find the same setting in the replication job.
Well this is interesting. When I go into the Replication Job Settings, Advanced Settings, I do not have all the options that it shows in the above link's pic. What I see under the Traffic tab:

Data Reduction
-Exclude swap file blocks

Compression Level
-<drop down>

That is it. We are on v9.5.0.1038

I also don't see "Guest Processing" as an option on my Replication setting's left column like I do on the pic either.

Please advise.
Thanks

Re: Bottleneck Confusion

Post by foggy »

OK, now I see that you're replicating from backup rather than running a backup copy job as I initially assumed. A replication-from-backup job doesn't have these settings, since it uses the same block size as the source backup job.

Re: Bottleneck Confusion

Post by B.F. »

From your standpoint, there is nothing else we can do to improve Job 2 performance rate?

Thanks

Re: Bottleneck Confusion

Post by foggy »

It's all about the storage read performance, so anything you could tweak in this regard could help.

Re: Bottleneck Confusion

Post by B.F. »

OK, not trying to beat a dead horse, just trying to fully understand all the behind-the-scenes things that are going on.

I did another test where I have a test VM at the DR site. I then created a VMDK that is on the same iSCSI storage where the replica is dumped. The VMDK is mounted in my test VM as E:\

I ran 11 LAN Speed Test runs to E:\ using 1 GB of data. I ran the test from the same Veeam server at the main site that the replication job runs from. The average read was 63 MB/s and the average write was 46 MB/s. That is way more than what I'm seeing in Veeam. As I type this, there is a replication job running that is 87% complete and only getting a 5 MB/s processing rate.

Why such a large discrepancy?

Thanks

Re: Bottleneck Confusion

Post by foggy »

Veeam B&R doesn't just read data sequentially from the source repository (as the test does). It needs to copy only the changed blocks, which results in (slower) random reads.
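
A minimal sketch of how the sequential-versus-random difference could be measured on a file. It is a generic illustration, not Veeam's read pattern: the 1 MiB block size, the function names, and the small temporary demo file are all assumptions. On a small, freshly written file the operating system's cache will hide most of the difference, so a meaningful comparison would point `compare` at a large file on the actual repository storage.

```python
import os
import random
import tempfile
import time

BLOCK = 1024 * 1024  # 1 MiB per read; an arbitrary size for illustration


def read_blocks(path, offsets):
    """Read one BLOCK-sized chunk at each offset; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(BLOCK))
    return total, time.perf_counter() - start


def compare(path):
    """Read every full block of the file in order, then again in shuffled order."""
    n_blocks = os.path.getsize(path) // BLOCK
    sequential = [i * BLOCK for i in range(n_blocks)]
    scattered = sequential[:]
    random.shuffle(scattered)
    return read_blocks(path, sequential), read_blocks(path, scattered)


# Demo on a small temporary file (4 MiB) so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * BLOCK))
    demo_path = tmp.name

try:
    (seq_bytes, seq_secs), (rnd_bytes, rnd_secs) = compare(demo_path)
    print(f"sequential: {seq_bytes} bytes in {seq_secs:.4f} s")
    print(f"random:     {rnd_bytes} bytes in {rnd_secs:.4f} s")
finally:
    os.remove(demo_path)
```

Both passes read the same amount of data; only the order of the offsets differs, which is what makes the comparison a fair one.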

Re: Bottleneck Confusion

Post by B.F. »

Ah, that makes sense. So if only the changed blocks are identified and read from the source, writing the replica to the destination should only need to write what was pulled from the source, correct? There really shouldn't be much write overhead then. Wouldn't the replica then be written sequentially?

Thanks!

Re: Bottleneck Confusion

Post by foggy »

That's correct, but you should pay attention to your source, since it is the bottleneck in this case.