
I've been using Veeam since its infancy (roughly 8 years), but over the past week I've come to realize how little I still know about Veeam and how best to configure it. Before diving into my question, here's a brief rundown of our environment: vSphere 5.5 U2, three main clusters of VMs that I back up and replicate using Veeam 8 U2, Nimble for production (VM/source) storage, and a VNX5300 for target (repository) storage.

How I've configured Veeam in years past:

- Proxy VMs (one per cluster), each with 4GB RAM and 4 cores, running 4 concurrent tasks.
- One Veeam VM for backup and one Veeam VM for replication at the remote site (the replication jobs are configured much like the backup jobs: same VMs, same data/size).
- VNX storage volumes added to vSphere as datastores; on each I create a VMDK consuming the whole volume, present it to the Veeam VM as an OS volume, and use those volumes as repositories.
- The proxies use NBD transport specifically because of occasional 'permissions' errors on a VM or two within a job, or snapshots not releasing from the proxies, when they were configured for hotadd.
- My replica Veeam VM has a similar configuration (we have a separate VNX for replica data).

Performance runs from 3MB/s to 30-40MB/s. Based on the numbers I've heard other folks get, that's horrible throughput. Still, jobs generally finish in under 2 hours, though sometimes longer. I use reverse incremental to minimize the storage footprint of the backup data. Jobs contain anywhere from 8 to 30 VMs and anywhere from 0.5TB to just under 5TB of total processed data. I have no doubt my throughput, speed, and job completion times can be improved, probably significantly. So, here I am.
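For context, this is the back-of-the-envelope math I use to judge those numbers (my own arithmetic, nothing Veeam-specific; it just divides processed data by a sustained rate, so the rates below are illustrative, not measured):

```python
# Rough backup-window arithmetic (illustrative only; assumes the reported
# processing rate is simply total processed data divided by job duration).

def hours_to_move(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes at rate_mb_s megabytes/second."""
    data_mb = data_tb * 1024 * 1024          # TB -> MB
    return data_mb / rate_mb_s / 3600        # seconds -> hours

for rate in (30, 200, 400):                  # MB/s figures mentioned above
    print(f"5 TB at {rate} MB/s   -> {hours_to_move(5, rate):5.1f} h")
    print(f"0.5 TB at {rate} MB/s -> {hours_to_move(0.5, rate):5.1f} h")
```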

Our goal is to move away from the VNX as a target and use Nimble for everything, and to get more than 7 restore points (i.e., more than 7 days' worth of data). That said, I don't mind starting from scratch, so I'll ask this: given a vSphere environment with 3 clusters of VMs and Nimble storage, how would you EXPLICITLY (details please) set up Veeam for best performance? Assume a blank slate: how to configure the proxies, how many, how many concurrent tasks, how to set up the repositories and where (on a proxy or on the Veeam VM) and their concurrent task limits, which Veeam features to turn on or off, and so on. I don't think there's any real performance difference between Direct SAN and hotadd, but I'm open to either.
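For the longer retention, this is the simple sizing estimate I've been working from (my own assumption: one full plus one rollback/increment per additional restore point, which is roughly how reverse incremental sits on disk; the change rate is a guess, not measured, and compression/dedupe are ignored):

```python
# Rough repository sizing for N restore points: one full backup plus
# (N - 1) incremental/rollback files. 10% daily change is a placeholder guess.

def repo_tb(full_tb: float, restore_points: int, daily_change: float = 0.10) -> float:
    """Estimated on-disk repository size in TB, ignoring compression/dedupe."""
    return full_tb + (restore_points - 1) * full_tb * daily_change

for points in (7, 14, 30):
    print(f"{points:2d} restore points of a 5 TB job -> ~{repo_tb(5, points):.1f} TB")
```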
I did a test configuration like this: a proxy VM with 4GB RAM and 4 cores, with 2 additional vNICs attached to the iSCSI port groups on a vSphere vDS (and 9000 MTU enabled in the guest OS NIC settings). I'm using Nimble Connection Manager (NCM), which, for those who don't know, is proprietary software you install in Windows (i.e., on the proxy and/or the Veeam server) that manages the guest OS software iSCSI initiator connections. I connected all my datastore volumes to the proxy via NCM but did not initialize them as Windows volumes (doing that would wipe out all my VMs!). I then carved out a new volume on the Nimble, presented it to the proxy, DID bring that one online as a Windows volume, and configured it in Veeam as a repository hosted on that proxy VM. The backup job was set to use that single proxy (transport mode set to automatic selection) and that repository as the target. Since the proxy has 4 cores, I set it to 4 concurrent tasks, and set the repository to 4 concurrent tasks as well.

On the initial seed run I hit around 400MB/s; my original job configuration only maxed out at just under 200MB/s on its initial run, if I recall correctly. But subsequent runs of the test job only hit anywhere from 20MB/s to 60MB/s. Why such a drop-off on the incremental runs? I asked a Veeam engineer last week and he said the backup speed simply hadn't had a chance to ramp up to multiple hundreds of MB/s before the VMs finished backing up. Hmm... I don't think I believe that, but I'm open to suggestions. And I believe this test job, as configured, is a true Direct SAN setup.

I got a suggestion from someone on Twitter who gets around 200MB/s on subsequent (incremental) runs using hotadd. His configuration is a bit like my test job: one proxy per host in the cluster; proxies connected to the datastores via NCM; each proxy with 4 cores and therefore 4 concurrent tasks; the repository configured directly on the Veeam VM (with the proxy role on the Veeam VM disabled), also connected via NCM; and the backup jobs set to automatic proxy selection, targeting the repository on the Veeam VM. The only latency I would expect with that setup is the network hop moving data from the proxies to the Veeam VM and its repository, but according to him that isn't a bottleneck at all.

So, given all this (sorry for the long post), what do you all suggest for a really fast backup (and replica) job configuration? One other note: our LAN and WAN are 10Gb, so the network "shouldn't" be a bottleneck, although I suspect there's a 1Gb link somewhere in the path slowing things down; I just haven't found where yet (the Windows guest OS on the proxy or the Veeam VM? I'm not sure what speed the vmxnet3 adapter runs at). The problem I see with my test job is that it can only really use the one proxy, at least for best performance. Sure, I can configure the job to use all my proxies (automatic selection), but only that one proxy is connected to the repository volume (I'd expect some kind of conflict/corruption if all the proxies were connected to the same volume at once?). So a 1-to-1 proxy-to-repository configuration doesn't seem very efficient to me.
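On the incremental drop-off, here's a rough model I've been toying with (my own assumption, not something Veeam confirmed): if the reported rate is changed data divided by total job time, and each VM carries a fixed chunk of overhead for snapshot create/remove and change tracking, then the reported MB/s on an incremental can look terrible even when the actual copy speed is fast. All the numbers in this sketch are guesses for illustration:

```python
# Toy model: reported rate = data moved / (transfer time + fixed per-VM overhead).
# Overhead, change rate, and copy speed are illustrative guesses, not measurements.

def reported_rate_mb_s(vms: int, data_per_vm_gb: float, change_rate: float,
                       copy_rate_mb_s: float, overhead_per_vm_s: float) -> float:
    changed_mb = vms * data_per_vm_gb * 1024 * change_rate   # data actually moved
    transfer_s = changed_mb / copy_rate_mb_s                  # time spent copying
    total_s = transfer_s + vms * overhead_per_vm_s            # plus snapshot/metadata time
    return changed_mb / total_s

# 20 VMs, 100 GB each, 2% daily change, 400 MB/s raw copy speed,
# and ~120 s of snapshot/metadata overhead per VM:
print(round(reported_rate_mb_s(20, 100, 0.02, 400, 120), 1), "MB/s reported")
```

With those made-up inputs the job "reports" roughly 16 MB/s even though the actual copy runs at 400 MB/s, which looks a lot like what I'm seeing; happy to be corrected if that's not how the rate is calculated.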
So, I think that's it - I think that's all my questions.

Thanks much all for any input/insight you can provide.
@coolsport00