rpop
Enthusiast
Posts: 48
Liked: 4 times
Joined: Feb 11, 2011 2:39 pm
Full Name: rpveeam
Contact:

Veeam job replication very long with HP Storeonce as source

Post by rpop »

Hello,

I need some advice / best practices on using replication jobs with HP StoreOnce as the source.

Before:
80 production VMs backed up to a "landing zone" (repository on SAS 10k disks in the same physical Veeam proxy server) --> good speed, good performance!
80 production VMs replicated from that landing zone over the WAN (10 Gbps, ~1 ms latency) to the DR site (NetApp, VMware datastore) --> acceptable performance (3-4 hours)


Today:
80 production VMs backed up to HP StoreOnce Catalyst (8 Gbps FC to the Veeam proxy) --> very poor performance! (6-7 hours)
Of course, compression and deduplication are disabled in the Veeam job settings.
80 production VMs replicated from that same StoreOnce Catalyst store to the DR site (NetApp, VMware datastore) --> catastrophic performance (10-12 hours)

Do you think the best solution is still to use a "landing zone" as the primary backup, and the HP StoreOnce for secondary backups (medium/long-term retention)?

Thanks for your help.

Best regards

Daniel
nielsengelen
Product Manager
Posts: 5635
Liked: 1181 times
Joined: Jul 15, 2013 11:09 am
Full Name: Niels Engelen
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by nielsengelen »

Using a landing zone is advised; this is also what we define as the ultimate backup architecture. The problem with using dedupe appliances as the primary backup target (and as the source for anything) is that we need to rehydrate all the data, which can take a long time.
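
For a purely illustrative back-of-envelope sketch of what rehydration can cost, take the ~6 TB data set from this thread and two hypothetical read rates (placeholders for the comparison, not StoreOnce measurements):

Code:

# Illustrative only: how long reading the same data set takes when the
# source must rehydrate deduplicated blocks vs. a plain-disk landing zone.
def hours_to_read(data_tb: float, read_mb_s: float) -> float:
    # TB -> MB (binary), divide by the read rate, convert seconds to hours.
    return data_tb * 1024 * 1024 / read_mb_s / 3600

data_tb = 6.0   # total size of the 3 jobs discussed in this thread
for label, rate in [("plain-disk landing zone", 500),
                    ("dedupe appliance (rehydrated reads)", 100)]:
    print(f"{label:36s} ~{hours_to_read(data_tb, rate):.1f} h at {rate} MB/s")

With the appliance as the source, the effective read rate (and therefore the job duration) is dominated by rehydration speed rather than by the network or FC links.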

HPE StoreOnce is a great target for long term archiving with Backup Copy Jobs and GFS.
Personal blog: https://foonet.be
GitHub: https://github.com/nielsengelen
foggy
Veeam Software
Posts: 21070
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by foggy »

Niels is spot on: using a dedupe appliance as the primary target for your backups goes against our reference architecture.
rpop wrote: 80 production VMs backed up to HP StoreOnce Catalyst (8 Gbps FC to the Veeam proxy) --> very poor performance! (6-7 hours)
Daniel, could you please share some performance numbers for this job (or jobs)? I'm mostly interested in the job processing rate, bottleneck stats, and gateway server location/specs as well. Thanks!
rpop
Enthusiast
Posts: 48
Liked: 4 times
Joined: Feb 11, 2011 2:39 pm
Full Name: rpveeam
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by rpop »

Thanks for your advice and recommendations.

Regarding the performance we encountered, here are some numbers:

Code:

The scope:
3 Veeam jobs started at roughly the same time (~9:00 PM - 11:00 PM): 1 VM in the first job, 27 VMs in the second and 50 in the third.

Total VMs = 1 + 27 + 50 = 78 VMs
(The first job, with only one VM, is one of our two Exchange DAG servers; we don't want to back up both DAG members at the same time because of cluster timeout issues and so on  :wink:  big debate on the web!)
Total volume for the 3 jobs: 6 TB (raw data size as seen in VMware)

Veeam job parameters:
20 restore points
Veeam backup repository:  one Catalyst store created on the HP StoreOnce (FC 2x 8 Gbps) via a Cisco MDS switch
                          per-VM backup files, "Decompress backup data blocks before storing" enabled
                          Limit max concurrent tasks to: 4
                          No data rate limit

StoreOnce housekeeping:   blackout window --> every day 8:00 PM - 11:59 PM
Catalyst config:          64 devices per initiator port
Catalyst store:           Primary (Default) Transfer Policy --> Low Bandwidth
                          No quota
                          No bandwidth limit
                          No encryption
                          Actual dedupe ratio after only 1 month: 10.6 (excellent  :) )

Veeam backup proxy:       one physical server with 16 CPU cores, 48 GB of RAM and 17 TB (SAS 10k in RAID 5) --> also acted as the previous landing zone,
                          connected of course to the MDS switch with dual 8 Gbps FC ports

Backup mode: incremental every business night + synthetic full every weekend
No maintenance settings
Storage: "inline data deduplication" unchecked
Compression: none
Storage optimization: Local target (16 TB + backup files) (mandatory with StoreOnce  :wink: )
No encryption
CBT enabled, of course
Storage integration with NetApp enabled
----------------------------------------------------------
The results (between 9:00 PM and 11:00 PM):
NetApp
Dedicated shelves for production VMs:  latency --> 1.5 ms
                                       network throughput: outbound --> 175-200 Mbps, inbound --> 15-20 Mbps
                                       global IOPS --> ~3500-3700
----------------------------------------------------------
HP StoreOnce:  throughput --> one write peak at 100 MB/s, average 70-75 MB/s

-----------------------------------------------------------

VEEAM

Backup job: PROD_1
Warning
27 of 27 VMs processed
Tuesday, 24 May 2016, 20:45:08
Success 25 Start time 20:45:08 Total size 2.4 TB Backup size 225.1 GB 
Warning 2 End time 23:13:23 Data read 224.5 GB Dedupe 1.1x
Error 0 Duration 2:28:14 Transferred 224.3 GB Compression 1.0x

Load: Source 92% > Proxy 15% > Network 24% > Target 32% 


Backup job: PROD2
Success
50 of 50 VMs processed
Tuesday, 24 May 2016, 20:11:13
Success 50 Start time 20:11:13 Total size 5.1 TB Backup size 309.0 GB 
Warning 0 End time 22:10:32 Data read 347.4 GB Dedupe 1.2x
Error 0 Duration 1:59:18 Transferred 307.6 GB Compression 1.0x

Load: Source 95% > Proxy 11% > Network 16% > Target 22% 


Backup job: PROD3
Success
1 of 1 VMs processed
Tuesday, 24 May 2016, 20:45:07
Success 1 Start time 20:45:07 Total size 1.1 TB Backup size 227.8 GB 
Warning 0 End time 23:34:59 Data read 228.2 GB Dedupe 1.0x
Error 0 Duration 2:49:52 Transferred 227.8 GB Compression 1.0x

Load: Source 95% > Proxy 11% > Network 16% > Target 22% 
Well! Ask me if you need any other information.

Best regards
Daniel
foggy
Veeam Software
Posts: 21070
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by foggy »

Seems like the bottleneck is source storage, not StoreOnce. What transport mode is being used? Are storage snapshots effectively utilized?
tsightler
VP, Product Management
Posts: 6011
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by tsightler »

rpop wrote: Catalyst store: Primary (Default) Transfer Policy --> Low Bandwidth
Curious if you have tried the High Bandwidth setting. Low Bandwidth is really optimal when you run many streams, to keep from saturating the bandwidth to the StoreOnce. In your case you're only running 4 streams and have 16 Gb of FC connectivity, which represents ~20x the available bandwidth compared to your current throughput, so obviously that's not a concern in your environment for now. You may see better throughput on the jobs with high bandwidth mode. You could also consider increasing your task count, since you are only running 4 but your setup should be able to support more. That likely wouldn't help individual stream throughput, but it would give you more streams. Both of those should at least be easy to try and revert if they don't help.
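
As a quick back-of-envelope check on that ~20x figure (a rough sketch only; dividing by 8 bits per byte deliberately ignores FC line encoding overhead):

Code:

fc_links = 2
fc_link_gbps = 8                                      # 2x 8 Gb FC to the StoreOnce
available_mb_s = fc_links * fc_link_gbps * 1000 / 8   # ~2000 MB/s nominal

observed_mb_s = 100                                   # write rate reported earlier in this thread
print(f"available ~{available_mb_s:.0f} MB/s, observed ~{observed_mb_s} MB/s")
print(f"headroom  ~{available_mb_s / observed_mb_s:.0f}x")

That headroom is exactly why saturating the links is not the limiting factor here.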
rpop
Enthusiast
Posts: 48
Liked: 4 times
Joined: Feb 11, 2011 2:39 pm
Full Name: rpveeam
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by rpop »

Hi,

@foggy: Yes, storage snapshots are utilized with NetApp on an NFS volume (10 Gbps between the NetApp and the Veeam proxy on a dedicated VLAN).

Below is an extract of the Veeam log for one VM in the PROD_1 job.

Code:

25.05.2016 21:00:54 :: Using guest interaction proxy xxxxxxxx.lesrp.ch (Different subnet) 
25.05.2016 21:00:59 :: Inventorying guest system 
25.05.2016 21:01:03 :: Preparing guest for hot backup 
25.05.2016 21:01:08 :: Creating snapshot 
25.05.2016 21:01:17 :: Releasing guest 
25.05.2016 21:01:18 :: Getting list of guest file system local users 
25.05.2016 21:01:18 :: Collecting disk files location data 
25.05.2016 21:08:25 :: Removing VM snapshot 
25.05.2016 21:12:22 :: Queued for processing at 25.05.2016 21:12:22 
25.05.2016 21:12:23 :: Required backup infrastructure resources have been assigned 
25.05.2016 21:42:54 :: VM processing started at 25.05.2016 21:42:54 
25.05.2016 21:42:54 :: VM size: 180,0 GB (105,3 GB used) 
25.05.2016 21:44:08 :: Saving [dc-nfs-sas-datastore] xxxxxxxx/xxxxxxxx.vmx 
25.05.2016 21:44:11 :: Saving [dc-nfs-sas-datastore] xxxxxxxx/xxxxxxxx.vmxf 
25.05.2016 21:44:16 :: Saving [dc-nfs-sas-datastore] xxxxxxxx/xxxxxxxx.nvram 
25.05.2016 21:44:20 :: Using backup proxy xxxxxxxx.lesrp.ch for retrieving Hard disk 1 data from storage snapshot 
25.05.2016 21:44:20 :: Using backup proxy xxxxxxxx.lesrp.ch for retrieving Hard disk 2 data from storage snapshot 
25.05.2016 21:44:32 :: Hard disk 2 (120,0 GB) 10,8 GB read at 46 MB/s [CBT]
25.05.2016 21:44:35 :: Hard disk 1 (60,0 GB) 4,7 GB read at 33 MB/s [CBT]
25.05.2016 21:48:53 :: Saving GuestMembers.xml 
25.05.2016 21:49:07 :: Finalizing 
25.05.2016 21:49:36 :: Truncating transaction logs 
25.05.2016 21:49:45 :: Busy: Source 52% > Proxy 15% > Network 30% > Target 84% 
25.05.2016 21:49:45 :: Primary bottleneck: Target 
25.05.2016 21:49:45 :: Network traffic verification detected no corrupted blocks 
25.05.2016 21:49:45 :: Processing finished at 25.05.2016 21:49:46 
@tsightler: Thanks for your comment, I will test your advice. But if you read the "Veeam with HPE StoreOnce Catalyst Configuration Guide",
I think "Low Bandwidth" is the best option for my configuration. What do you think about it?

http://www8.hp.com/h20195/v2/GetPDF.asp ... 336ENW.pdf (page 4)

Thanks

Best regards
foggy
Veeam Software
Posts: 21070
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by foggy »

This particular VM shows target as the bottleneck...
tsightler
VP, Product Management
Posts: 6011
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by tsightler »

rpop wrote: @tsightler: Thanks for your comment, I will test your advice. But if you read the "Veeam with HPE StoreOnce Catalyst Configuration Guide",
I think "Low Bandwidth" is the best option for my configuration. What do you think about it?

http://www8.hp.com/h20195/v2/GetPDF.asp ... 336ENW.pdf (page 4)
Yes, I'm aware of the recommendations, but my point is that you're not really in the neighborhood where low bandwidth is likely to provide an improvement, since you are nowhere near saturating your actual bandwidth. The recommendations in the guide are good baselines, but they assume that, if you are running many streams, you are likely to see a performance benefit from reducing the bandwidth sent to the StoreOnce, because otherwise the bandwidth into the StoreOnce would become the bottleneck. In your case, though, you are nowhere near using all of the available bandwidth on the StoreOnce side.

Low-bandwidth mode tends to limit the performance of a single stream due to the overhead of performing extra data reduction before sending the data to the StoreOnce device.

To explain this in more detail, imagine a scenario where your StoreOnce has only a single 1 GbE link. A single job running at 120 MB/s would saturate that link, so of course low-bandwidth mode makes tons of sense there. However, you have 2x 8 Gb FC links, so that's enough bandwidth to support 2000 MB/s of throughput, yet your jobs are way down in the 100 MB/s range. Obviously data reduction isn't an issue in this case. In a few cases I've seen the change from low to high bandwidth take the performance of a single stream from 50 MB/s to 150 MB/s. With 4 concurrent tasks that's still only 600 MB/s, nowhere near the bandwidth you have available.
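
To put some rough numbers on that stream math, here is a small illustrative sketch; the 50-150 MB/s rates and the 4-task count come from the paragraph above, while the 8-task line is just a hypothetical bump, not a measurement from your environment:

Code:

def aggregate_mb_s(tasks: int, per_stream_mb_s: float) -> float:
    # Best case: assumes each stream scales independently.
    return tasks * per_stream_mb_s

one_gbe_mb_s = 120      # roughly what a single 1 GbE link can carry
fc_mb_s = 2000          # 2x 8 Gb FC, same rough approximation as above

for tasks, rate in [(4, 50), (4, 150), (8, 150)]:
    agg = aggregate_mb_s(tasks, rate)
    print(f"{tasks} tasks x {rate} MB/s = {agg:4.0f} MB/s "
          f"({agg / fc_mb_s:.0%} of the FC ceiling, {agg / one_gbe_mb_s:.1f}x a 1 GbE link)")

Even the optimistic 8 x 150 MB/s case stays well under the FC ceiling, which is why the extra data reduction of low-bandwidth mode buys you nothing here.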

That being said, I'm not sure it's the problem in your case, since I see the source storage also accounts for a significant amount of time. Then again, you indicated that performance was much better prior to the StoreOnce, and I'm assuming the source storage was the same, so I'm just trying to suggest things that are really easy to change and "might" help.
rpop
Enthusiast
Posts: 48
Liked: 4 times
Joined: Feb 11, 2011 2:39 pm
Full Name: rpveeam
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by rpop »

Thanks a lot for your explanation; that confirms what I suspected.

You're right, the source storage is still the same.

I tested with the "High Bandwidth" setting on the HP StoreOnce, but the average throughput during the backup window is still 120-140 MB/s, no more!

I don't understand why my bottleneck is at the source level (source --> NetApp storage snapshot) for almost all Veeam backup jobs:
NetApp (NFS volumes & iSCSI LUNs) --> 2x 10 Gbps Ethernet (very low latency) --> Veeam proxy --> 2x 8 Gbps FC --> HP StoreOnce 4500

For this week, I have prepared the landing zone (SAS 10k on the Veeam proxy server) as the primary production backup; from there I will be able to back up to the HP StoreOnce, copy to the other HP StoreOnce at the DR site, and replicate the VMs to the NetApp at the DR site.

We wanted to test the HP StoreOnce as the primary backup, but we expected this would be difficult in terms of performance :)

What do you think about compression on the Veeam proxy when the target is an appliance like the HP StoreOnce, for example for backup copy jobs? Should we keep compression enabled at the Veeam proxy level?

Thanks for your time

Best regards
foggy
Veeam Software
Posts: 21070
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam job replication very long with HP Storeonce as source

Post by foggy »

You can keep compression enabled if the StoreOnce repository is configured to decompress data. However, in your case the repository gateway seems to reside on the same server as the proxy, so it will not have any effect.