Largest environments backed up by Veeam, thousands VMs

Post by **razorvines** » May 21, 2012 6:17 pm

I have used B&R for all of my clients to back up their virtual environments as well as replicate to an alternate site. They typically have a 3-host environment with 20-30 virtual machines. Now I am involved in a much larger internal environment build that will be approximately 10 hosts, 200 VMs, and about 100 TB of data.

What is the largest environment out there being backed up and replicated using Veeam B&R? How many proxies are you using and what kind of bandwidth do you have to DR?

May 21, 2012 6:44 pm

I work with environments every day that have thousands of VMs. Rarely do I work with environments that have fewer than 500 VMs. What I'm a little surprised about with your setup is that you have only 200 VMs but 100TB of data. That's a very high ratio, far higher than what I typically see. It means that your average VM size is 500GB, which is pretty huge for an average. More typically I see environments that average 100GB/VM, with actual used sizes much smaller than that. Is your data evenly spread across your VMs, or do you have a small percentage of very large VMs that make up the bulk of the data?
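For reference, the 500GB average follows directly from the numbers in the original question. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 1000 GB); the variable names are illustrative only:

```python
# Average VM size implied by the figures in the original question.
# Assumption: decimal units, 1 TB = 1000 GB.
total_data_tb = 100
vm_count = 200

avg_vm_gb = total_data_tb * 1000 / vm_count
print(f"Average VM size: {avg_vm_gb:.0f} GB")

# The more typical average cited in the reply above.
typical_avg_gb = 100
ratio_to_typical = avg_vm_gb / typical_avg_gb
print(f"Ratio to the typical average: {ratio_to_typical:.0f}x")
```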

Post by **razorvines** » May 21, 2012 8:52 pm

And you manage all of those backups with Veeam and nothing else? 50 to 60GB of that is actual production data, the rest are archives so pretty static.

May 21, 2012 9:13 pm

Sure, all data managed with Veeam (fair notice, I work for Veeam as the Solutions Architect for large customers). Veeam scales out pretty well, just add enough proxies to handle your data by running enough jobs concurrently. 200 VMs really aren't all that much, though I now see in your initial question that you asked about both backups and replication (sorry, I missed that in the first read). Replication can be a little bit more of a challenge, at least if you are attempting to replicate more often than once a day. Still, it's all about having an infrastructure that can support the required snaps and enough bandwidth to get the data to the remote site. Veeam itself wouldn't have any issues with that many VMs or that much data in general, but there might be some specific things that could trip it up, such as if you have a single server that is very, very large. How big is your largest server?

Post by **razorvines** » May 21, 2012 9:21 pm

2TB is the largest we have. So are you on named accounts or assigned to a territory?

Post by **Berniebgf** » May 22, 2012 10:29 am

Tsightler,

Can I presume the majority of your backup targets for proxies at these large sites are NFS targets? Or CIFS targets? I would be interested in how you structure the solution (around storage targets/devices) for these larger multi-proxy sites (specifically around backup) when it comes to storage performance, and then movement to tape (if at all).
Do you lean more towards multiple proxies (CPU grunt) writing to centralised NFS storage, or a more distributed model for processing and storage?

Obviously this will depend on site/location and so on, but let's say for a single LARGE site that requires MANY "Data Movers".

Bernie.

May 22, 2012 12:26 pm

I'd say I don't typically see NFS targets, but they can be used via a Linux repository. I actually like that approach, but most clients are more comfortable with Windows, and thus it is more common to see CIFS, or NTFS volumes attached to Windows.

It really boils down to what the customer wants to do and what infrastructure they have in place. If the client is using a dedupe appliance (probably the majority), then it's either CIFS directly from multiple proxies, or NFS via a Linux repository. If a client is simply going to disk (probably the next biggest group), then SAN-attached disk or locally attached disk on a system used as a repository is the next best method.

From a performance/cost perspective I personally prefer using dedicated physical servers as proxies with locally attached disks for repositories. This provides a self-contained device with a single maintenance contract and a fixed performance ceiling that can be easily defined. When you need more storage you add another repository, so you get more processing horses as well. Potentially these can be proxies as well, so you effectively get SAN offload and scale forever simply by adding dedicated proxy nodes. The disadvantage of this approach is of course that it requires manual balancing of jobs across the available storage/proxies, somewhat negating the smart load balancing built into the V6 product. That being said, from a scale-out performance perspective, it's a hard architecture to beat since it guarantees that traffic does not cross the network (direct SAN to local disk).
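To illustrate what the manual balancing of jobs across proxy/repository nodes can look like, here is a minimal sketch of a greedy "largest job first, onto the least-loaded node" assignment. This is not a Veeam feature or API; the job names, sizes, and repository names are hypothetical, and a real plan would also need to account for change rate, backup windows, and free space.

```python
import heapq

def balance_jobs(job_sizes_tb, repo_names):
    """Greedy assignment: take jobs largest-first and place each one
    on the repository that currently holds the least data."""
    # Min-heap of (current load in TB, repository name).
    heap = [(0.0, name) for name in repo_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in repo_names}

    largest_first = sorted(job_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for job, size_tb in largest_first:
        load, name = heapq.heappop(heap)
        assignment[name].append(job)
        heapq.heappush(heap, (load + size_tb, name))

    loads = {name: sum(job_sizes_tb[j] for j in jobs)
             for name, jobs in assignment.items()}
    return assignment, loads

# Hypothetical job sizes (TB) and repository names, for illustration only.
jobs = {"job-a": 12.0, "job-b": 9.0, "job-c": 7.5,
        "job-d": 6.0, "job-e": 4.0, "job-f": 3.5}
assignment, loads = balance_jobs(jobs, ["repo-1", "repo-2", "repo-3"])

for name in sorted(loads):
    print(f"{name}: {loads[name]:.1f} TB -> {assignment[name]}")
```

The greedy approach does not guarantee an optimal split, but it is easy to rerun whenever a job grows or a node is added.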

Many large clients have attempted to build single massive repositories using SAN-attached disk. This simplifies job management since there is simply one massive pool of target storage, but it has significant performance side effects. Because all of the I/O is targeted at a single large pool of disks, it only takes a few reverse incremental jobs to build long request queues and cause tremendous I/O latency, significantly degrading backup performance. The lure of this "single repository" is strong, but it is much more difficult to build at scale with reasonable random I/O performance, especially because clients are generally attempting to use low-end SAN hardware to do so (small caches that are easy to saturate).

So in other words, just as you said, it varies a lot from site to site, customer to customer, based on their goals and budget.

Post by **Berniebgf** » May 22, 2012 12:45 pm

Agree 100% with NOT having backups on the same SAN array that you are backing up, even if on separate disks, drawers or controllers.
With the workload that Veeam pushes you do not want to double up your I/O on the array (especially if it's a NetApp array with iSCSI)!

Can't beat internal disk to the proxy or DAS or Dedicated Fibre Array, biggest issues I see are IO throughput issues with source and/or destinations.

Must admit I have changed my mindset over the last few installs between reverse incremental and incremental, doing a lot more incremental solutions to reduce the workload on the back-end disk systems...

Thanks for the info, I was wondering how you best leverage the "Load Balancing" which I LOVE with the Replica Proxies. But hard to Architect with the backups.....($$$).

Bernie

May 22, 2012 1:35 pm

Certainly incremental will reduce the workload, but the cost in storage space is significant and thus it is difficult to scale (assuming no dedupe appliance). Basically you effectively need at least 2x the space for forward incremental, and in many cases (based on retention) 3x or more. Reverse incremental can be designed to perform well for the vast majority of systems; only systems with a high change rate are usually an issue. Of course the problem is that most clients don't really know which systems will have a high change rate and which ones will not. They want a "single best mode to rule them all", but unfortunately for now it's about tradeoffs between the modes.
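As a rough illustration of where the 2x or more figure can come from, the sketch below compares on-disk space for the two modes under a deliberately simplified model. All inputs are assumptions, not measurements: a 10 TB full backup, a 5% daily change rate, 14 restore points, weekly fulls for forward incremental, and no compression or dedupe. The forward incremental case is modelled at its worst point, just before the oldest chain can be deleted.

```python
import math

def reverse_incremental_tb(full_tb, change_rate, restore_points):
    # One full backup plus one rollback file per older restore point.
    return full_tb + (restore_points - 1) * full_tb * change_rate

def forward_incremental_tb(full_tb, change_rate, restore_points, full_interval):
    # Worst case: the oldest chain cannot be deleted until its last
    # increment leaves retention, so up to (full_interval - 1) extra
    # restore points remain on disk.
    retained = restore_points + full_interval - 1
    fulls = math.ceil(retained / full_interval)
    increments = retained - fulls
    return fulls * full_tb + increments * full_tb * change_rate

# Assumed inputs for illustration only.
full_tb = 10.0
change_rate = 0.05
restore_points = 14

reverse = reverse_incremental_tb(full_tb, change_rate, restore_points)
forward = forward_incremental_tb(full_tb, change_rate, restore_points, full_interval=7)

print(f"Reverse incremental: {reverse:.1f} TB")
print(f"Forward incremental: {forward:.1f} TB")
print(f"Ratio: {forward / reverse:.2f}x")
```

Under these assumptions the ratio lands a little above 2x; longer retention relative to the full interval, or a lower change rate, shifts it.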

Getting the best performance out of reverse incremental as well as synthetic fulls is all about building target storage that is well optimized for the random I/O workload that it will get. So many times I find customers that simply take the defaults of their RAID controller or attempt to stripe massive bundles of 12-16 disks together, creating stripes that are far too small (thus creating too many IOPS), or way too large (thus creating wasted I/O on every block read/write). Not only that, but having controllers with a decent-sized, battery-backed write cache is critical.

Also, remember that RAID levels 5/6 have a significant IOPS write penalty, which really comes into play with reverse incremental. RAID1 has a write penalty that is half that of RAID5, and thus can generally deliver more IOPS for a mixed R/W workload. Because of this, it can many times be beneficial to use RAID1. Many customers look at this as "wasting" space, but the reality is that this may be the difference between being able to use reverse incremental or not, and reverse incremental will easily provide the most space-efficient and easiest to manage backup option, assuming the IOPS are available to support it.
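The effect of the write penalty can be estimated with the common rule-of-thumb formula `effective = raw / (read_fraction + write_fraction * penalty)`. The sketch below is a back-of-the-envelope calculation, not a benchmark. Assumptions: 12 disks at 150 IOPS each, the usual textbook penalties (2 for RAID1, 4 for RAID5, 6 for RAID6), and a workload of one read per two writes as a stand-in for reverse incremental; controller cache effects are ignored.

```python
# Textbook write penalties (back-end I/Os per front-end write).
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}

def effective_iops(raw_iops, read_fraction, write_penalty):
    """Front-end IOPS an array can sustain for a given read/write mix."""
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)

# Assumed hardware and workload, for illustration only.
raw_iops = 12 * 150        # 12 disks at an assumed 150 IOPS each
read_fraction = 1 / 3      # assumed mix: 1 read for every 2 writes

for level, penalty in WRITE_PENALTY.items():
    iops = effective_iops(raw_iops, read_fraction, penalty)
    print(f"{level}: ~{iops:.0f} front-end IOPS")
```

With the same spindles, the RAID1 layout sustains noticeably more front-end IOPS than RAID5 or RAID6 for a write-heavy mix, which is the tradeoff against usable capacity described above.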

Post by **bhwong** » May 28, 2012 5:47 am

Berniebgf wrote:Agree 100% with NOT having backups on the same SAN array that you are backing up, even if on separate disks, drawers or controllers.
With the workload that Veeam pushes you do not want to double up your I/O on the array (especially if it's a NetApp array with iSCSI)!

For backing up on the same SAN array, some SAN vendors provide an API for vCenter so that the data transfer happens internally within the SAN storage without consuming the iSCSI network. I wonder whether Veeam can make use of this to improve performance?

May 28, 2012 6:50 am

You are missing the fact that the Veeam backup process involves on-the-fly data deduplication and compression, not just moving unmodified source data to the backup repository.

Post by **bhwong** » May 30, 2012 5:46 am

Gostev wrote:You are missing the fact that Veeam backup process involves on-the-fly data deduplication and compression, not just moving unmodified source data to the backup repository.

I see. Will Veeam be able to install an agent on this: http://www.dell.com/us/enterprise/p/pow ... -nx3100/pd, which comes with Windows® Storage Server 2008 R2 built in, so that it can act as proxy and repository server without assigning a VM to do so?

Post by **Vitaliy S.** » May 30, 2012 8:17 am

Yes, it should be possible. For additional info on supported OS for proxy and repository servers, please review system requirements section in the Release Notes document included with your download. Thanks!

Post by **jpeake** » May 31, 2013 3:39 pm

I'm curious how large some of your environments are. Is anyone backing up 1,000+ VMs at a single site?

Post by **veremin** » May 31, 2013 3:48 pm

One of the largest clients that I've heard about (from Tom Sightler) backs up over 2000 VMs on a regular basis. This is just one example; there are others as well.

Thanks.

Post by **tsightler** » May 31, 2013 4:16 pm

The clients that I work with are, on average, all over 1000 VMs, and many are larger, and quite a few of those are single site. I'm not sure of the largest "single site" customer out there, but I've worked with a few that are in the 2000-2500 range for a single site.

I know of quite a few customers that in total back up 4000+ VMs, but those are almost always multiple sites, or at least multiple "environments" within a single site. It really doesn't matter much from a scaling perspective if they're all in one site since we use a distributed architecture, although if you're trying to back up to a single repository (like a dedupe appliance or something) then having enough bandwidth to this single point is critical.

Cokovic · Jun 03, 2013 9:40 am

Backing up 1900 VMs atm on a daily basis.

Post by **kte** » Jan 14, 2014 8:15 pm

What type of destination storage do you use, and what RAID configuration and proxies, for 1000 VMs?

Post by **dellock6** » Jan 14, 2014 10:28 pm

I can only speak for myself, and we are a service provider, so we have different requirements: short retention of 7 days, no SureBackup or other checks, reverse incremental for super fast restores.
We use several physical proxies running in Direct SAN mode against our iSCSI storage. The repository is separate, and is a 10-node Ceph cluster. No RAID at all on Ceph, but each object (Ceph is an object store) is saved twice in two different locations of the cluster. Think of it like RAID10, even if it's not.

Post by **chrisBrindley** » Jan 17, 2014 1:31 pm

I am interested to hear from companies that have around 2000 or more VMs in VMware and how they have their environment set up.

Here is our setup:
VMware 5.5, 30 hosts, 40 datastores, clustered.
Veeam Backup & Replication server: 50GB RAM, 16 processors; SQL housed on a separate SQL 2012 server.
10 proxy servers, each with 16GB RAM and 1 socket with 16 cores.
EMC DD860 with 2 x 10Gb uplinks for backup data only.
Total VM backup count just over 1008, 250TB total space.

We run daily backups with 14-day retention and an active full every 2 weeks to keep the retention pool consistent. We run no compression or dedupe in Veeam; we let the Data Domain handle it. We have throttling off, the repository is set to 50 connections max, and each proxy can process 16 jobs (x 10 proxies). CBT is enabled.
As you can see, this is no slouch, and I am concerned that if this environment doubles we could not get all the servers backed up in a 24-hour window.
We are pushing 16 hours now to get this all done with this setup.
I don't see any bottlenecks; the only issue I see is when hot add fails and we have to rely on the network.
I have heard from Veeam that some people are backing up 10,000 VMs, and I hope one of you is on here to discuss your setup.
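The concern about doubling can be put into numbers with a quick estimate based on the figures in this post. This is a sketch under two labelled assumptions: that the backup window scales linearly with VM count, and that the repository connection limit caps how many proxy task slots can write at once. Neither is confirmed in the thread.

```python
# Figures taken from the setup described above.
vm_count = 1008
window_hours = 16
proxies = 10
tasks_per_proxy = 16
repo_max_connections = 50

# Assumption: the window scales linearly with the number of VMs.
vms_per_hour = vm_count / window_hours
projected_hours = (2 * vm_count) / vms_per_hour
print(f"Current rate: {vms_per_hour:.0f} VMs/hour")
print(f"Projected window at 2x VMs: {projected_hours:.0f} hours")

# Assumption: the repository connection limit caps concurrent writers.
proxy_slots = proxies * tasks_per_proxy
effective_slots = min(proxy_slots, repo_max_connections)
print(f"Proxy task slots: {proxy_slots}, usable at once: {effective_slots}")
```

If both assumptions hold, a doubled environment would not fit in 24 hours at the current rate, and the repository limit rather than the proxy count would be the first place to look for headroom.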

Post by **foggy** » Jan 17, 2014 2:15 pm

Chris, you could try enabling dedupe-friendly compression in your jobs; this should reduce the backup window without significantly impacting the Data Domain data reduction ratio.

Post by **prolix21** » Jan 26, 2016 3:32 pm

Wanted to re-visit this post and get some comments from the 1000+ community.

We're a service provider building out a Veeam environment that needs to support around 1000 VMs today and be able to scale beyond that eventually.

One issue we are having is the structure of our repositories. We've tried a single repository server, which isn't working well; it becomes a bottleneck very quickly. I've seen posts where people recommend using your proxy as a repository server and spreading the load that way. So we are considering some 16-core, 16GB RAM proxies and moving repositories there. All our storage is 10Gb iSCSI that we mount to a Windows 2012 R2 server.

Everything is 10gb and we can't identify bottlenecks anywhere else, the resource contention is always at the repository server (runs out of mem, cpu, etc). I'd appreciate any examples you're willing to share of your setup.

We built up a small POC for a couple hundred VMs and that all was blazing fast, but around 200vms in we began seeing issues with things.

We've also been using scale-out with our repositories, and I'm wondering how well that plays with repositories spread out over a handful of proxies. Since it is a new feature, support didn't have enough info to comment on that yet.

pirx · Jan 26, 2016 6:04 pm

I'm interested in the results too. We are considering Veeam as a replacement for our legacy backup application. We will also have 1000+ VMs (200 TB used data, 400 TB VMDKs) and thought about 2x StoreOnce 6500 as backup targets. I have now learned that this might not be the best idea, so I'm trying to find out what other people are using. JBODs are not popular here; we have a large FC SAN environment, so additional space there might be one option (separate array, of course).

Post by **prolix21** » Jan 26, 2016 8:16 pm

pirx wrote:I'm interested in the results too. We are considering Veeam as a replacement for our legacy backup application. We will also have 1000+ VMs (200 TB used data, 400 TB VMDKs) and thought about 2x StoreOnce 6500 as backup targets. I have now learned that this might not be the best idea, so I'm trying to find out what other people are using. JBODs are not popular here; we have a large FC SAN environment, so additional space there might be one option (separate array, of course).

Our repository storage is all FreeNAS-based ZFS volumes. We buy large chassis that hold 45 drives and we just stack them to get cheap backup performance over 10Gb iSCSI. Those are working well, they take a beating, but how to structure it all so that Veeam itself can scale and take advantage of it has not been so clear, and we're struggling. This kind of setup was working fine in Commvault, our previous platform, but that's a whole other beast.

After talking to Veeam support they think doing some sort of 1-1 proxy to repository is the way to go, which some others have mentioned here, but then do you make those a scale-out repository? I can't imagine how you'd manually manage the job to repository allocation without scale-out on this number of jobs/VMs. Seems crazy to do that manually and try to balance. We also discussed the idea of a single repository server with all our storage mounted, but I feel like we'd see the same bottlenecks with that with the OS of the server we use. At least the dual role proxy/repository spreads the risk a bit.

Jan 27, 2016 12:39 am

All, just wanted to make sure everyone realizes that this is an almost 4-year-old topic, and even the last post before today's bump by Dan is 2 years old. Too many things have changed since the original discussion. These days we have clients with 10000+ VMs protected, while 1000+ VM environments are countless.

Unfortunately, our largest customers carry the world's top brands and legally require that we do not talk publicly about them being our clients, with just a few exceptions. For the same reason, you will not see them commenting on this thread.

pirx wrote:I'm interested in the results too. We are considering Veeam as a replacement for our legacy backup application. We will also have 1000+ VMs (200 TB used data, 400 TB VMDKs) and thought about 2x StoreOnce 6500 as backup targets. I have now learned that this might not be the best idea, so I'm trying to find out what other people are using.

Actually, one of our largest clients stores petabytes of data across a few high-end StoreOnce units, and they have been extremely happy (and we did not even have any special integration with StoreOnce until v9). They've been our customer for a few years now. Maybe Tom will be able to provide some more anonymous details, because he was the one who told me about this client.

prolix21 wrote:After talking to Veeam support they think

I highly recommend working closely with our Solution Architect instead to design your Veeam deployment correctly. With thousands of VMs, many small things start to matter (even though v9 simplified many things for this kind of environment). You definitely don't want to design your 1000+ VM deployment based on advice from our support engineers, as they simply don't have expertise in architecting Veeam at scale. Even I myself don't have sufficient expertise in this, and often come to Tom for advice on these matters.

Thing is, you gotta do this for a living every day for many years to become a good architect... learn by observing what's working and what isn't in the real world, how this or that feature behaves at scale, have a good understanding of tons of adjacent subjects (networking, storage fabric etc.) and so on. Our support is great, but just knowing all the ins and outs of our product is never enough to create the best architecture for a given environment (otherwise, I would make the best Solution Architect at Veeam).

Jan 27, 2016 7:04 am

Also, I'd add two additional pieces of information that may be useful to others:
- we have released a Best Practice free e-book, written in fact by our Solutions Architect: https://veeampdf.s3.amazonaws.com/guide ... vmware.pdf
- I see from your email domain you are a partner. This year we will release a VMCE Advanced class, that is specifically designed for architects. I've not seen the content yet, but knowing who is working on it, I'm pretty sure the quality is going to be great

Luca
