-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Bottleneck Confusion
Here's the layout.
Veeam Server has a large RAID 5 Disk via SAS physical connection.
There are 2 jobs in this Veeam Server scenario:
Job 1 Backs up a large VM to the RAID listed above
Job 2 Replicates the recently backed up VM to a secondary site based off of the backup in Job 1
Job 1 Bottleneck shows that the Target (the SAS RAID 5) is at 0%
Job 2 Bottleneck shows the Source (same SAS RAID 5) is at 75%
Based on this data, Veeam has to work much harder to read the data than to write it on the same media?
Is this normal?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
Bottleneck distribution depends on the other parts of the processing chain. In your case, backup job target spends most of the time waiting for other components in the source-proxy-network-target chain, while the same array is the weakest part in the backup copy job processing chain.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Bottleneck Confusion
It's also very important to note that this is almost 100% to be expected. Veeam is measuring how much time it spent waiting for reads or writes, and writes are typically lower latency than reads due to write caching on the RAID controller: a write can be acknowledged almost immediately (as soon as the data is in cache), while a read takes however long is required to actually retrieve the blocks from the spinning disks.
On top of that, as foggy correctly points out, the backup chain most likely had its bottleneck somewhere else, which kept the writes to the RAID disk from being the point where Veeam was waiting.
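The idea behind these percentages can be sketched in a few lines: each stage's "bottleneck" figure is roughly the share of the job's duration that the stage spent busy. The function and the numbers below are my own illustrative model, not Veeam's actual accounting.

```python
# Illustrative model (not Veeam internals): express each pipeline stage's
# busy time as a percentage of the total job duration.
def bottleneck_percentages(busy_seconds, total_seconds):
    """Return each stage's busy time as a whole-number percentage of the job."""
    return {stage: round(100 * t / total_seconds)
            for stage, t in busy_seconds.items()}

# A write acknowledged from controller cache completes almost instantly,
# so the target accumulates nearly zero busy time even though the same
# array shows high busy time when it has to service reads.
stats = bottleneck_percentages(
    {"Source": 68.0, "Proxy": 82.0, "Network": 39.0, "Target": 0.4},
    total_seconds=100.0,
)
print(stats)
```

This is why the same array can legitimately report 0% as a backup target and 75% as a copy source: the percentage reflects waiting, not raw capability.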
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Does such a vast difference between writing and reading on the same array seem normal?
Perhaps I should have listed the full chains for each
Job 1: Source 68% > Proxy 82% > Network 39% > Target 0%
Job 2: Source 75% > Proxy 5% > Network 34% > Target 64%
Job 2 is the one where I'm trying to figure out whether there are any performance gains to be made. I have a couple of things I'm trying for the Target. I'm guessing that since the array is directly connected to the Veeam server as the source, there isn't much I can do?
Thanks!
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Bottleneck Confusion
Yes, totally normal. In the first job the data is being read from the source, so there's a fair amount of wait there, but then it's compressed and deduped, so most of the wait is for CPU on the proxy. By the time the data stream is written to the target it's already compressed/deduped, so there's a lot less data being written to the target than is being read and processed by the source/proxy.
For a backup copy the pattern is completely different: the amount of data read equals the amount of data written, so I'd expect the source to be the bottleneck. The proxy is barely used (the data is already compressed/deduped), so the bottlenecks are simply the read/write speeds of the two sides, with reads almost always slightly slower than writes. Those numbers look exactly like what I would expect.
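The difference between the two pipelines can be put in concrete numbers. This is a toy model with made-up figures (a 60% reduction ratio is an assumption, not a measurement): a backup job shrinks the stream before it reaches the target, while a copy job moves already-reduced data, so its read and write volumes match.

```python
# Toy model (my own numbers, not from Veeam): data volume at each end of a
# backup job vs. a copy/replication job that consumes the backup.
def pipeline_volumes(source_gb, reduction_ratio):
    reduced_gb = source_gb * reduction_ratio  # after compression + dedup
    return {
        "backup_read_gb": source_gb,    # full stream read from production storage
        "backup_write_gb": reduced_gb,  # shrunken stream written to the repository
        "copy_read_gb": reduced_gb,     # copy job reads the already-reduced data...
        "copy_write_gb": reduced_gb,    # ...and writes exactly the same amount
    }

vols = pipeline_volumes(source_gb=1000, reduction_ratio=0.4)
print(vols)
```

With equal read and write volumes in the copy job, whichever side has the slower I/O (here, the reads) naturally shows up as the bottleneck.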
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Wow, very insightful information!
If one wanted to try to improve performance in Job 1, it looks like I'd need to beef up the Proxy?
The big thing I'm trying to improve is Job 2. In the past we would get a "Processing Rate" in the mid-teens. Ever since we upgraded to a new Dell Compellent disk system, we only get around a 7 MB/s processing rate. The bottleneck stats for the Source hadn't changed, but the Target was in the 50s range and then jumped up to almost equal the Source. I have been tweaking things here and there and have at least gotten the Target into the 60s now.
If anyone might have some suggestions on how to improve Job 2, would be greatly appreciated.
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
B.F. wrote: If one were to want to try and improve performance in Job 1, looks like I need to beef up the Proxy?
Correct.
B.F. wrote: The big thing I'm trying to improve on is Job 2. In the past we would get "Processing Rate" in the mid teens. Ever since we upgraded to a new Dell Compellent disk system, we are only getting around 7 MB/s processing rate.
Have you played with the block size (storage optimization settings in the job)?
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
foggy wrote: Have you played with the block size (storage optimization settings in the job)?
Hmm, I'm not sure where these job settings you speak of are.
Tell me more.
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
I'm talking about these settings.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Ahh, these settings look like they are for backups, not replication, and Job 2 is replication. Job 1, which is a backup, has all the same settings listed in your link, except it's not encrypted since it's on premises.
Would changing the Transport Mode on the Target Proxy for Job 2 help at all? I stumbled on an older article about how changing it from Automatic selection to Virtual appliance helped their throughput tremendously. However, it looks like their scenario was also backup, not replication.
It was also suggested on Reddit that Automatic picks what works, not necessarily what is fastest?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
You can find the same setting in the replication job. Changing the transport mode for the target proxy will not help in your case, since the bottleneck is on the source.
B.F. wrote: It was also suggested on Reddit that Automatic looks at what works and not necessarily what is fastest?
This is not correct; automatic selects the most optimal of the available transport modes.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
foggy wrote: You can find the same setting in the replication job.
Well, this is interesting. When I go into the Replication Job Settings > Advanced Settings, I do not have all the options shown in the picture at the above link. What I see under the Traffic tab:
Data Reduction
-Exclude swap file blocks
Compression Level
-<drop down>
That is it. We are on v9.5.0.1038.
I also don't see "Guest Processing" as an option in my replication job's left column like I do in the pic, either.
Please advise.
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
OK, now I see that you're replicating from a backup rather than running a backup copy job, as I initially assumed. Replication from backup doesn't have these settings, since it uses the same block size as the source backup job.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
From your standpoint, is there nothing else we can do to improve Job 2's processing rate?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
It's all about storage read performance, so anything you can tweak in that regard could help.
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
OK, not trying to beat a dead horse, just trying to fully understand everything that's going on behind the scenes.
I did another test with a test VM at the DR site. I created a VMDK on the same iSCSI storage where the replica lands, and mounted that VMDK in my test VM as E:\.
I ran 11 LAN Speed Test runs against E:\ using 1 GB of data, running the test from the same Veeam server at the main site that the replication job runs from. The averages were 63 MB/s read and 46 MB/s write. That is way more than what I'm seeing in Veeam. As I type this, there is a replication job 87% complete that is only getting a 5 MB/s processing rate.
Why such a large discrepancy?
Thanks
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
Veeam B&R doesn't just read data sequentially from the source repository (like the test does); it needs to copy only the changed blocks, which turns the reads into (slower) random reads.
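The access-pattern difference is easy to demonstrate. The sketch below (my own demo, with made-up file name and sizes, not anything Veeam-specific) reads the same set of blocks twice, once in order and once shuffled. On spinning disk, the shuffled pass pays a seek penalty per block; on an SSD, or once the OS page cache is warm, the two times may be close, so treat this as an illustration of the pattern rather than a benchmark.

```python
# Demo: same bytes read sequentially vs. in random order.
# File name and sizes are made up; run on spinning disk to see seek cost.
import os
import random
import time

BLOCK = 1024 * 1024          # 1 MiB per read
N = 64                       # 64 MiB test file
path = "seek_demo.bin"

with open(path, "wb") as f:  # create a self-contained test file
    f.write(os.urandom(N * BLOCK))

def read_blocks(offsets):
    """Time reading one block at each given byte offset."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

offsets = [i * BLOCK for i in range(N)]
seq_time = read_blocks(offsets)        # in-order pass, like a plain copy test
shuffled = offsets[:]
random.shuffle(shuffled)
rand_time = read_blocks(shuffled)      # same bytes, scattered order, like changed-block reads
print(f"sequential: {seq_time:.3f}s  random: {rand_time:.3f}s")
os.remove(path)
```

A tool like LAN Speed Test exercises the first pattern; a job that picks scattered changed blocks out of a backup file exercises the second.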
-
- Expert
- Posts: 164
- Liked: 9 times
- Joined: Jan 28, 2014 5:41 pm
- Contact:
Re: Bottleneck Confusion
Ah, that makes sense. So if only the changed blocks are identified and read from the source, writing the replica at the destination should only need to write what was pulled from the source, correct? There really shouldn't be much write overhead then. Wouldn't the replica then be written sequentially?
Thanks!
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Bottleneck Confusion
That's correct, but you should pay attention to your source, since it is the bottleneck in this case.
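Applying a set of changed blocks to a replica disk can be sketched as below. The helper and file names are hypothetical; the point is that writing the changed blocks in ascending offset order keeps the target I/O as close to sequential as the change pattern allows, while the volume written never exceeds what was read from the source.

```python
# Hypothetical sketch: patch a replica image with a set of changed blocks.
import os

def apply_changed_blocks(replica_path, changed_blocks):
    """changed_blocks: dict mapping byte offset -> replacement bytes."""
    with open(replica_path, "r+b") as f:
        for offset in sorted(changed_blocks):  # ascending offsets: forward seeks only
            f.seek(offset)
            f.write(changed_blocks[offset])

# Demo: a 4 KiB "replica" with two changed 512-byte blocks.
path = "replica.img"
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)
apply_changed_blocks(path, {512: b"\xaa" * 512, 2048: b"\xbb" * 512})
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
```

Only the two changed regions are rewritten; the untouched blocks are never read or written on the target side.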