TargetWAN Bottleneck, but not the SSD's

obroni · Post by **obroni** » Jun 23, 2015 11:01 am this post

All of our copy jobs that use WAN Accelerators always list TargetWAN as the bottleneck. The target WAN Accelerator sits on reasonably fast SSDs, and their utilisation is minimal. The WAN link is 50 Mbps, but I rarely see utilisation go over 10-20 Mbps.

However, I can see that during the job there are lots of reads on the target datastore. Am I correct in thinking that the "TargetWAN bottleneck" also covers cache misses, where the Accelerator reads from the target repository and uses it as a secondary cache?

If that assumption is correct, I will make another one: if I increase the global cache size, it should improve performance by shifting more work from the repository to the cache SSDs?

Post by **PTide** » Jun 23, 2015 11:42 am this post

Hi Nick,

Did I understand correctly: the copy jobs show that the bottleneck is on the target side, whereas the drive and network on the target accelerator are not fully utilized?

obroni · Post by **obroni** » Jun 23, 2015 11:45 am this post

Copy jobs are showing TargetWAN as the bottleneck, but the SSDs used for the Accelerator are hardly being used. The target repository is under reasonable load, but not from writes.

My question was mainly about what the TargetWAN bottleneck includes, and whether the behaviour I am seeing is a result of too many cache misses.

Post by **PTide** » Jun 23, 2015 12:04 pm this post

Well, first of all, I'd check the Svc.VeeamWANSvc.log file. There you may find some information about which part of the copy takes most of the time.

Thank you.

obroni · Post by **obroni** » Jun 23, 2015 12:12 pm this post

Thanks for the hint. Are you referring to something like this line?

Timings: PushData 3546 Read 8018 ReadImpl 2641 WaitForData 410

Post by **PTide** » Jun 23, 2015 12:27 pm this post

I'm referring to something like

Code:

[05.10.2013 12:29:33] <  9292> srct| Performance statistics:
[05.10.2013 12:29:33] <  9292> srct| SourceDiskRead           58 sec     
[05.10.2013 12:29:33] <  9292> srct| SourceWriteCmdFile       16 sec     
[05.10.2013 12:29:33] <  9292> srct| SourceCompressCmdFile    194 sec   
[05.10.2013 12:29:33] <  9292> srct| SourceDeleteCmdFile      0 sec     
[05.10.2013 12:29:33] <  9292> srct| SendCmdFile              39 sec     
[05.10.2013 12:29:33] <  9292> srct| SendWAIT                 2792 sec   
[05.10.2013 12:29:33] <  9292> srct| TargetDiskRead           287 sec   
[05.10.2013 12:29:33] <  9292> srct| TargetDiskWrite          97 sec     
[05.10.2013 12:29:33] <  9292> srct| TargetReadCmdFile        24 sec     
[05.10.2013 12:29:33] <  9292> srct| TargetDecompressCmdFile  15 sec     
[05.10.2013 12:29:33] <  9292> srct| TargetDeleteCmdFile      0 sec     
[05.10.2013 12:29:33] <  9292> srct| TargetWAITCmdFile        1871 sec   
[05.10.2013 12:29:33] <  9292> srct| TargetTotalCmdFileProcessing 920 sec
[05.10.2013 12:29:33] <  9292> srct| TargetGlobalInit         0 sec     
[05.10.2013 12:29:33] <  9292> srct| TargetGlobalWrite        417 sec   
[05.10.2013 12:29:33] <  9292> srct| TargetGlobalRead         0 sec     
[05.10.2013 12:29:33] <  9292> srct| TargetGlobalCommit       0 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgDiskProcessing        2614 sec   
[05.10.2013 12:29:33] <  9292> srct| AlgInitialization        37 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgInitNonZeroExtents    0 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgInitCBT               0 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgInitDigests           13 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgInitGlobalDedup       15 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgFinalization          35 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgCommitGlobalDedup     15 sec     
[05.10.2013 12:29:33] <  9292> srct| AlgWAIT                  136 sec   
[05.10.2013 12:29:33] <  9292> srct| AlgBlockProcessing       2416 sec   
[05.10.2013 12:29:33] <  9292> srct| TASK COMPLETED SUCCESSFULLY, elapsed: 2841.4560 sec

As far as I remember, it should be located in

C:\Users\All Users\Veeam\Backup\
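For anyone who wants to rank these counters rather than eyeball them, here is a minimal Python sketch. It assumes each counter line has the form `| Name  N sec`, as in the excerpt above; the helper names `parse_timers` and `rank` are made up for illustration and are not part of any Veeam tooling.

```python
import re

# A few lines taken from the excerpt above. The raw log leaves no space
# after the longest counter name, so that line is kept fused on purpose.
SAMPLE = """\
[05.10.2013 12:29:33] <  9292> srct| SourceDiskRead           58 sec
[05.10.2013 12:29:33] <  9292> srct| SendWAIT                 2792 sec
[05.10.2013 12:29:33] <  9292> srct| TargetDiskRead           287 sec
[05.10.2013 12:29:33] <  9292> srct| TargetWAITCmdFile        1871 sec
[05.10.2013 12:29:33] <  9292> srct| TargetTotalCmdFileProcessing920 sec
[05.10.2013 12:29:33] <  9292> srct| AlgBlockProcessing       2416 sec
[05.10.2013 12:29:33] <  9292> srct| TASK COMPLETED SUCCESSFULLY, elapsed: 2841.4560 sec
"""

# Counter name (letters only), optional spaces, whole seconds, then "sec".
# \s* rather than \s+ so a name that runs into its number still matches.
TIMER = re.compile(r"\|\s*([A-Za-z]+)\s*(\d+)\s+sec\b")


def parse_timers(text):
    """Return {counter name: seconds} for every counter line in text."""
    timers = {}
    for line in text.splitlines():
        match = TIMER.search(line)
        if match:
            timers[match.group(1)] = int(match.group(2))
    return timers


def rank(timers):
    """Sort counters from the largest to the smallest number of seconds."""
    return sorted(timers.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, secs in rank(parse_timers(SAMPLE)):
        print(f"{name:<30}{secs:>6} sec")
```

The header line and the final "TASK COMPLETED" line are skipped because they do not match the counter pattern. The ranking only shows where the seconds were booked; what each counter actually measures is a question for the support team.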

obroni · Post by **obroni** » Jun 23, 2015 12:33 pm this post

Ah, gotcha. I was looking in the target WAN log file. Here is that section from the source:

Code:

[23.06.2015 12:43:12] < 16484> wan| ___________________________________________________________________
[23.06.2015 12:43:12] < 16484> wan| Dedup statistics:
[23.06.2015 12:43:12] < 16484> wan| Total handled size       15.3438 GB 100%
[23.06.2015 12:43:12] < 16484> wan| New data size            0.9189 GB  5%  
[23.06.2015 12:43:12] < 16484> wan| Cur disk dedup           0.1742 GB  1%  
[23.06.2015 12:43:12] < 16484> wan| Prev disk dedup          1.1997 GB  7%  
[23.06.2015 12:43:12] < 16484> wan| Prev disk no write dedup 11.9447 GB 77% 
[23.06.2015 12:43:12] < 16484> wan| Global disk dedup        1.1063 GB  7%  
[23.06.2015 12:43:12] < 16484> wan| Zero data size           0.0000 GB  0%  
[23.06.2015 12:43:12] < 16484> wan| ___________________________________________________________________
[23.06.2015 12:43:12] < 16484> wan| Network statistics:
[23.06.2015 12:43:12] < 16484> wan| Received cmd:            0.0002 GB  
[23.06.2015 12:43:12] < 16484> wan| Send cmd:                0.0002 GB  
[23.06.2015 12:43:12] < 16484> wan| Received transport:      0.0000 GB  
[23.06.2015 12:43:12] < 16484> wan| Send transport:          0.3734 GB  
[23.06.2015 12:43:12] < 16484> wan| Corrupted blocks:        0          
[23.06.2015 12:43:12] < 16484> srct| Performance statistics:
[23.06.2015 12:43:12] < 16484> srct| SourceDiskRead           142 sec    
[23.06.2015 12:43:12] < 16484> srct| SourceDiskDirectRead     21 sec     
[23.06.2015 12:43:12] < 16484> srct| SourceCompression        15 sec     
[23.06.2015 12:43:12] < 16484> srct| SendWanData              40 sec     
[23.06.2015 12:43:12] < 16484> srct| SourceCommandsWrite      3 sec      
[23.06.2015 12:43:12] < 16484> srct| TargetDiskRead           210 sec    
[23.06.2015 12:43:12] < 16484> srct| TargetDiskWrite          10 sec     
[23.06.2015 12:43:12] < 16484> srct| TargetWAITCmdFile        0 sec      
[23.06.2015 12:43:12] < 16484> srct| TargetTotalCmdFileProcessing 364 sec
[23.06.2015 12:43:12] < 16484> srct| TargetGlobalInit         0 sec      
[23.06.2015 12:43:12] < 16484> srct| TargetGlobalWrite        30 sec     
[23.06.2015 12:43:12] < 16484> srct| TargetGlobalRead         99 sec     
[23.06.2015 12:43:12] < 16484> srct| TargetGlobalCommit       0 sec      
[23.06.2015 12:43:12] < 16484> srct| TargetDecompression      3 sec      
[23.06.2015 12:43:12] < 16484> srct| TargetGlobalAgentRead    11 sec     
[23.06.2015 12:43:12] < 16484> srct| AlgDiskProcessing        299 sec    
[23.06.2015 12:43:12] < 16484> srct| AlgInitialization        24 sec     
[23.06.2015 12:43:12] < 16484> srct| AlgInitNonZeroExtents    0 sec      
[23.06.2015 12:43:12] < 16484> srct| AlgInitCBT               0 sec      
[23.06.2015 12:43:12] < 16484> srct| AlgInitDigests           8 sec      
[23.06.2015 12:43:12] < 16484> srct| AlgInitGlobalDedup       8 sec      
[23.06.2015 12:43:12] < 16484> srct| AlgFinalization          15 sec     
[23.06.2015 12:43:12] < 16484> srct| AlgCommitGlobalDedup     0 sec      
[23.06.2015 12:43:12] < 16484> srct| AlgBlockProcessing       110 sec    
[23.06.2015 12:43:12] < 16484> srct| AlgManifestProcessing    51 sec     
[23.06.2015 12:43:12] < 16484> srct| AlgWaitManifest          54 sec     
[23.06.2015 12:43:12] < 16484> srct| AlgWaitTrgApply          85 sec     
[23.06.2015 12:43:12] < 16484> srct| TASK COMPLETED SUCCESSFULLY, elapsed: 431.6880 sec

I'm guessing the TargetDiskRead relates to the high number of reads I am seeing on the repository?
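To put rough numbers on that guess, the sketch below expresses a few of the target-side counters from this log as a share of the elapsed time. Two assumptions are mine, inferred from the counter names and not from documentation: that TargetDiskRead is time spent reading the repository and TargetGlobalRead is time spent reading the global cache, and that the counters can overlap, so the shares need not add up to 100%.

```python
# Values copied from the source-side log above (23.06.2015 run).
ELAPSED_SEC = 431.6880
TIMERS_SEC = {
    "TargetDiskRead": 210,
    "TargetGlobalRead": 99,
    "TargetDiskWrite": 10,
    "TargetGlobalWrite": 30,
    "SendWanData": 40,
}


def share_of_elapsed(timers, elapsed):
    """Express each counter as a percentage of the elapsed task time."""
    return {name: 100.0 * secs / elapsed for name, secs in timers.items()}


shares = share_of_elapsed(TIMERS_SEC, ELAPSED_SEC)
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20}{pct:5.1f}% of elapsed")
```

On these figures TargetDiskRead comes to just under half of the elapsed time, well ahead of TargetGlobalRead and SendWanData, which would fit a target that spends more time reading the repository than reading its cache or waiting on the WAN.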

Post by **PTide** » Jun 23, 2015 12:39 pm this post

Nick,

Please contact the support team so they can take a look at the logs.

alanbolte · Post by **alanbolte** » Jun 23, 2015 5:00 pm this post

Nick,

I believe your interpretation is accurate, but I'm not aware of any guidelines established yet for the effectiveness of a larger cache on this type of bottleneck. That is, I'm sure it'll help, but I can't say whether doubling your cache size will lead to a 1% or 100% improvement.

obroni · Post by **obroni** » Jun 23, 2015 5:04 pm this post

OK, it's good to hear I'm on the right line of thinking. I assume increasing the cache on the target won't require all the digests to be recalculated, and it will just start using the extra space?

Post by **PTide** » Jun 23, 2015 5:54 pm this post

Nick,

If the target WAN accelerator does not find a block in its cache, it may issue read requests to the repository:

Additionally, Veeam Backup & Replication analyzes restore points that have been previously copied to the target side. If duplicates are found, Veeam Backup & Replication does not copy such blocks over the WAN but takes them from the global cache.

I'd like to ask you a few questions regarding your backup infrastructure, if you don't mind.

How many backup jobs/backups do you have on average per backup copy job?

How many restore points (RPs) does your target repository contain in total?

Also, please specify what kind of data you have inside your backups.

Which OSes are backed up?

Thank you!

obroni · Post by **obroni** » Jun 23, 2015 7:54 pm this post

Sure,

It ranges from around 4-15 VMs per backup copy job
Keeping 2 restore points per job
A real mixture of OSes (Server 2003, 2008, 2012), including normal SME stuff: SQL, Exchange, file server, etc.

Post by **PTide** » Jun 24, 2015 11:35 am this post

The systems and data you've mentioned should cache well, due to a fairly high rate of duplicate data.

OK, then what do the job statistics show? They usually tell you how much data has been obtained from cache.

For example:

Hard disk 1 (4.9 GB) 215.0 MB read at 14 MB/s
5.4 MB transferred over network, 415,7 MB obtained from WAN Accelerator cache

Also, what is your WAN accelerator SSD capacity in relation to the total size of all backups that go through it?

Thank you.

obroni · Post by **obroni** » Jun 24, 2015 11:44 am this post

A few examples:

24/06/2015 04:17:16 :: 781.7 MB transferred over network, 199.2 MB obtained from WAN Accelerator cache
24/06/2015 04:36:38 :: 1.2 GB transferred over network, 633.2 MB obtained from WAN Accelerator cache
24/06/2015 03:55:23 :: 1.2 GB transferred over network, 909.8 MB obtained from WAN Accelerator cache
24/06/2015 03:30:48 :: 1.2 GB transferred over network, 550.9 MB obtained from WAN Accelerator cache

The target global cache size is set to 70GB, there is around 4TB of raw VM's and the backup copy total size is around 2TB.
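As a rough way to compare runs, the four statistics lines above can be turned into a "share served from cache" figure. This is only a sketch: the function name `cache_share` is made up, the units are assumed to be binary (1 GB = 1024 MB), and values such as "1.2 GB" are already rounded in the job statistics, so the percentages are approximate. The ratio also says nothing about blocks read from the repository, which these lines do not report.

```python
import re

# Job statistics lines copied from the post above.
LINES = [
    "24/06/2015 04:17:16 :: 781.7 MB transferred over network, 199.2 MB obtained from WAN Accelerator cache",
    "24/06/2015 04:36:38 :: 1.2 GB transferred over network, 633.2 MB obtained from WAN Accelerator cache",
    "24/06/2015 03:55:23 :: 1.2 GB transferred over network, 909.8 MB obtained from WAN Accelerator cache",
    "24/06/2015 03:30:48 :: 1.2 GB transferred over network, 550.9 MB obtained from WAN Accelerator cache",
]

# Assumption: binary units.
UNIT_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}

PATTERN = re.compile(
    r"([\d.]+) (KB|MB|GB|TB) transferred over network, "
    r"([\d.]+) (KB|MB|GB|TB) obtained from WAN Accelerator cache"
)


def cache_share(line):
    """Fraction of (network + cache) data that came from the cache, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    sent_mb = float(match.group(1)) * UNIT_MB[match.group(2)]
    cached_mb = float(match.group(3)) * UNIT_MB[match.group(4)]
    return cached_mb / (sent_mb + cached_mb)


if __name__ == "__main__":
    for line in LINES:
        print(f"{line[:19]}  {100 * cache_share(line):.0f}% from cache")
```

On these four lines the cache supplies somewhere between about a fifth and a little over two fifths of the data, with the rest crossing the WAN.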

Post by **PTide** » Jun 24, 2015 11:57 am this post

obroni wrote: The target global cache size is set to 70GB, there is around 4TB of raw VM's and the backup copy total size is around 2TB.

Are you using the many-to-one WAN acceleration model?

A 70 GB cache seems small for that amount of data. I'd try increasing the cache size, as Alan has already suggested.

Please refer to this guide.

P.S. After tweaking, please share your results here, if you don't mind.

Thank you.

obroni · Post by **obroni** » Jun 24, 2015 12:03 pm this post

OK, I've increased it to 150GB for now. I will let the jobs run overnight again and let you know in a couple of days if it has made a difference.

I have read that guide, but unless I'm mistaken there are no guidelines on actual cache sizing, apart from the 10GB per OS in the WAN accelerator config wizard.
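For scale, here is the arithmetic behind the sizes mentioned in this thread, as a small Python sketch. It assumes binary units (1 TB = 1024 GB) and treats the "around 2TB" and "around 4TB" figures as exact. It also applies the wizard's 10GB-per-OS hint to the three Windows Server versions named earlier. None of this is an official sizing formula.

```python
GB_PER_TB = 1024  # assumption: binary units


def cache_pct(cache_gb, data_tb):
    """Cache size as a percentage of a data set given in TB."""
    return 100.0 * cache_gb / (data_tb * GB_PER_TB)


# Figures quoted in the thread.
DATA_SETS_TB = {"backup copy data (~2 TB)": 2, "raw VM data (~4 TB)": 4}

for cache_gb in (70, 150):
    for label, data_tb in DATA_SETS_TB.items():
        print(f"{cache_gb:>3} GB cache = {cache_pct(cache_gb, data_tb):.1f}% of {label}")

# The wizard hint of 10 GB per OS, applied to the OS versions listed earlier.
OS_TYPES = ["Server 2003", "Server 2008", "Server 2012"]
wizard_hint_gb = 10 * len(OS_TYPES)
print(f"Wizard hint for {len(OS_TYPES)} OS types: {wizard_hint_gb} GB")
```

Both the old 70GB and the new 150GB settings sit above the 30GB that the per-OS hint would give, yet each is only a few percent of the backup copy data.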
