mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

My small or less big hot add proxies

Post by mkretzer »

Hello,

We want to start using Linux hotadd proxies after using NBD for a year because of the data inconsistency bug that was discussed in the other thread today.

I remember that with hotadd, every disk added to a proxy required a reconfiguration of the proxy VM, which took some time and was basically done sequentially. Back then we had 3 hotadd proxies with 8 CPUs each, so 24 concurrent tasks in total.
In theory, would it help to have 24 VMs with one CPU each instead? Would that help parallelize the hot-add operations?

Markus
HannesK
Product Manager
Posts: 14844
Liked: 3086 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: My small or less big hot add proxies

Post by HannesK »

Hello Markus,
8 vCPUs sounds good. I'm not sure what you mean by "sequentially" (which source are you referring to?), and I would not go with 24 proxies :-)

Best regards,
Hannes
mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: My small or less big hot add proxies

Post by mkretzer »

I mean that for every disk that must be read, the config of the proxy VM must be changed, which takes some seconds. Since we have >3600 VMs with >~4000 disks, this will take some time, and only one change can be done at a time!

In theory, with 24 proxies these changes can all be done in parallel, correct?
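
As a rough illustration of the question above, here is a minimal Python sketch of how much wall-clock time the reconfigurations alone would cost. It assumes one attach and one detach per disk, that each proxy VM handles only one reconfiguration at a time, and that disks are spread evenly across proxies. The function name and the 10-second figure are hypothetical placeholders for "some seconds", not measured values.

```python
import math


def hotadd_overhead_hours(disks, seconds_per_reconfig, proxies):
    """Estimate hours spent purely on proxy VM reconfiguration.

    Assumptions (not from any vendor documentation):
    - every disk needs two reconfigurations (attach + detach)
    - a proxy VM processes its reconfigurations one after another
    - disks are distributed evenly across all proxies
    """
    if disks < 0 or seconds_per_reconfig < 0 or proxies <= 0:
        raise ValueError("disks and seconds must be >= 0, proxies must be > 0")
    reconfigs_per_proxy = math.ceil(disks / proxies) * 2
    return reconfigs_per_proxy * seconds_per_reconfig / 3600


# Compare 3 large proxies with 24 small ones for ~4000 disks,
# using a hypothetical 10 seconds per reconfiguration.
for proxy_count in (3, 24):
    hours = hotadd_overhead_hours(4000, 10, proxy_count)
    print(f"{proxy_count} proxies: {hours:.2f} h of reconfiguration per proxy")
```

In this simplified model the overhead per proxy shrinks roughly in proportion to the number of proxies, which is the effect being asked about. It ignores any limits on the vCenter side.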
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: My small or less big hot add proxies

Post by Gostev »

Correct.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: My small or less big hot add proxies

Post by tsightler » 1 person likes this post

Hey Markus. In the field we've generally found that 8 vCPU hotadd proxies are a really good balance of performance and manageability for larger scale environments that wish to use hotadd. I work with some customers that use 4 or 8 vCPU hotadd proxies to back up >10,000 VMs (the largest are in the 20,000 VM ballpark) with no issues overall.

IMO, the biggest thing to keep in mind is not the small delay during attach/detach, but that the host has enough resources to schedule all of the CPUs without spiking CPU ready/co-stop. Remember that having a bunch of proxies does move significant additional work to the VBR resource scheduler, which evaluates each proxy's eligibility for each disk before attach/detach. I've watched customers build very large hotadd proxies only to see them have sub-par performance, mostly due to high CPU ready. In general, as long as your proxy doesn't exceed 1/2 of the host CPU resources, fits within a single NUMA node, and the host isn't overloaded with CPU-heavy tasks, an 8 vCPU proxy will work.
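
The rule of thumb in the last sentence can be written down as a quick check. This is only a sketch: the `Host` type and `proxy_fits` function are made up for illustration, and it assumes one NUMA node per CPU socket, which is not true of every host. It says nothing about CPU ready, which still has to be watched on the real host.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of an ESXi host's physical CPU layout."""
    sockets: int
    cores_per_socket: int


def proxy_fits(host, proxy_vcpus):
    """Return True if the proxy respects both sizing rules of thumb.

    Rule 1: the proxy uses no more than half of the host's physical cores.
    Rule 2: the proxy fits in one NUMA node (assumed here: one per socket).
    """
    total_cores = host.sockets * host.cores_per_socket
    within_half_of_host = proxy_vcpus <= total_cores / 2
    within_numa_node = proxy_vcpus <= host.cores_per_socket
    return within_half_of_host and within_numa_node


# Example hosts are invented for illustration only.
print(proxy_fits(Host(sockets=2, cores_per_socket=16), 8))  # 8 of 32 cores
print(proxy_fits(Host(sockets=2, cores_per_socket=6), 8))   # 8 of 12 cores
```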
mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: My small or less big hot add proxies

Post by mkretzer »

Strange. Our experience is that the backup itself goes extremely fast (many VMs, not so many changes), but the hot-add process itself often takes more time than the data movement (at least under vSphere 6). How many proxies are "normal" for customers with ~4000 VMs?
CPU is not our issue, and I really do not see why the limit should be 1/2 of our host resources. In our newer hosts that would mean 162 GHz are unused per host.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: My small or less big hot add proxies

Post by tsightler »

It's hard to say what is "normal" because a lot has to do with infrastructure: how many different clusters, what repos are being used, size of VMs, total change rate, and what backup window you are trying to achieve. For example, you say you have lots of VMs with not so many changes, but I work with a lot of enterprise customers where 20-30% of their VMs are multi-TB database servers with a large number of disks and overall high change rates (>10%), so the workload is quite different from what you describe. In these cases hotadd operations are, overall, a small percentage of the time, but not totally inconsequential. That being said, there are generally enough parallel operations to keep the overall workload pretty busy.

If I were to just say what is "normal" for the typical deployment, I'd say somewhere between 8-16 proxies with 4-8 vCPU for 4000 VMs, assuming the goal was to get backups done in ~8 hours and that there were no cluster-specific requirements. That being said, we usually find that we need more proxies because there are a lot of clusters that don't share datastores in such large environments, not to mention that many times you may have 8 hours, but you may want to get the backup done in half that time so that backup copies/offload/whatever can finish. It's not at all uncommon for me to see 100-200 virtual proxies in cases where customers have 10-20K VMs, but in many cases they could get by with fewer, especially if they had fewer clusters. Customers generally want to optimize for minimum proxies as they don't like managing many VMs, so we try to find the best balance between number of proxies and performance.
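
To put the "8-16 proxies with 4-8 vCPU" range in perspective, the following sketch computes the average time budget per disk in a given backup window. It assumes one concurrent task per vCPU (as in the 3 x 8 = 24 example earlier in the thread) and uses the ~4000 disks mentioned above. The function name is hypothetical, and a real scheduler will not distribute work this evenly.

```python
def seconds_per_disk(proxies, vcpus_per_proxy, disks, window_hours):
    """Average seconds available per disk, including attach and detach.

    Assumes one task slot per proxy vCPU and a perfectly even
    distribution of disks over all slots (an idealization).
    """
    slots = proxies * vcpus_per_proxy
    if slots <= 0 or disks <= 0:
        raise ValueError("need at least one task slot and one disk")
    return window_hours * 3600 * slots / disks


# Low and high end of the quoted range, for ~4000 disks in an 8 hour window.
for proxies, vcpus in ((8, 4), (16, 8)):
    budget = seconds_per_disk(proxies, vcpus, disks=4000, window_hours=8)
    print(f"{proxies} proxies x {vcpus} vCPU: {budget:.1f} s per disk")
```

If the measured attach/detach time is a large share of that budget, the overhead matters; if it is a small share, the data transfer dominates.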

If your overall change rate is very low, then I wonder if it even makes sense to go with hotadd, as it's difficult to overcome the add/remove overhead compared to NBD if the backup itself takes less than a couple of minutes per VM. Linux hotadd proxies in v10 aren't as fast as Windows proxies, especially for incremental backup (no advanced data fetcher), so the gap between NBD and hotadd is even larger there for now.
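
The trade-off described here can be sketched as a simple break-even calculation: hotadd pays a fixed add/remove overhead per disk, and that only pays off if the changed data is large enough. The model and the function name are assumptions for illustration, and the throughput and overhead numbers in the example are invented, so plug in values measured in your own environment.

```python
def break_even_mb(overhead_seconds, nbd_mb_per_s, hotadd_mb_per_s):
    """Changed data (MB) per disk above which hotadd finishes sooner than NBD.

    Simplified model:
        nbd_time    = data / nbd_rate
        hotadd_time = overhead + data / hotadd_rate
    Returns infinity when hotadd is not faster than NBD at moving data,
    because the overhead can then never be recovered.
    """
    if nbd_mb_per_s <= 0 or hotadd_mb_per_s <= 0:
        raise ValueError("throughput values must be positive")
    if hotadd_mb_per_s <= nbd_mb_per_s:
        return float("inf")
    return overhead_seconds / (1 / nbd_mb_per_s - 1 / hotadd_mb_per_s)


# Invented example values: 60 s add/remove overhead,
# 100 MB/s over NBD, 400 MB/s over hotadd.
threshold = break_even_mb(60, 100, 400)
print(f"hotadd wins above roughly {threshold:.0f} MB of changed data per disk")
```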

Regarding 1/2 of host resources, this is just a general recommendation because, in most environments, customer workloads aren't just sitting there doing nothing at night; many times there are batch processes or other functions, and night is actually a busy time. If a customer already has a 10-20% base workload and you build a proxy with more than 1/2 of the host resources (basically, needing to schedule cores on both CPUs mostly concurrently), it's easy to see CPU ready spike quite significantly during high backup load, adding more risk of impacting existing workloads.

We've found that configuring proxies to use fewer resources per host (i.e. no more than 50% of host cores for a single proxy), and spreading the proxy load out across hosts, provides better overall performance with less impact on existing loads vs trying to load up a large proxy or a number of proxies on a single host. As long as you keep an eye on CPU ready, and you're happy with it, then it really doesn't matter.
mkretzer
Veeam Legend
Posts: 1203
Liked: 417 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: My small or less big hot add proxies

Post by mkretzer »

Sounds good, thank you. We want to use hotadd because, even with thousands of small VMs, we have about a hundred very heavy VMs. Perhaps we will have to split them into two groups again....
But from the information about performance, it seems like Windows would really be the better option for right now. Would you recommend going with 2016 (because VDDK for 6.7 is not officially compatible with 2019 according to https://vdc-download.vmware.com/vmwb-re ... notes.html)?