Comprehensive data protection for all workloads
rcarstens
Service Provider
Posts: 5
Liked: never
Joined: Jan 20, 2014 6:31 pm
Full Name: Robert Carstens
Contact:

Design for large number of VM's

Post by rcarstens » Mar 07, 2016 6:49 pm

We are currently running into time limitations when backing up clients with lots of VMs, and I am curious what others are doing to make Veeam scale when backing up 200+ VMs and 20TB a night. The environment in question is currently about 175 VMs with roughly 300-400GB of changed data nightly, running to a Synology 3400 with WD Red Pro drives. There are 8 jobs, one per host, all standard incremental. The backups themselves are fairly quick, but the merges are painfully slow. Once there is more than one job running and another merging, performance is poor. All in all, it takes 14 hours most nights for everything to complete, which is too long in most cases and leaves no room for growth. The jobs take about 3 hours to merge once the backup is finished. It seems like the merge just kills the Synology in terms of performance.

How are others handling situations like this? Multiple Veeam servers, multiple repositories, better storage, etc?

nmdange
Expert
Posts: 469
Liked: 113 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Design for large number of VM's

Post by nmdange » Mar 07, 2016 7:25 pm 1 person likes this post

If the merging is taking a long time, it sounds like a performance issue on the target storage. How is this storage accessed by Veeam? Is it a Windows repository or a CIFS share? I would suspect a CIFS share would not provide the same performance as storage directly attached to a Windows computer. Also, how many physical disks are in this storage device? What RAID level, etc.?

rcarstens
Service Provider
Posts: 5
Liked: never
Joined: Jan 20, 2014 6:31 pm
Full Name: Robert Carstens
Contact:

Re: Design for large number of VM's

Post by rcarstens » Mar 07, 2016 10:18 pm

It is a Synology 3400 series with 12 drives in RAID6. The repository is an SMB share served directly from the Synology. Is there a known performance improvement from attaching the Synology via iSCSI to a physical Windows server and then creating repositories on that Windows machine, rather than using CIFS directly from the NAS?

DaveWatkins
Expert
Posts: 349
Liked: 93 times
Joined: Dec 13, 2015 11:33 pm
Contact:

Re: Design for large number of VM's

Post by DaveWatkins » Mar 07, 2016 11:50 pm

I'd think it's more likely you're running up against an IOPS issue with RAID6 or the drives themselves. If you can rebuild it as RAID10 you'd probably be better off, although you'll lose storage space. Ultimately, it might be time to look at some faster drives.

rcarstens
Service Provider
Posts: 5
Liked: never
Joined: Jan 20, 2014 6:31 pm
Full Name: Robert Carstens
Contact:

Re: Design for large number of VM's

Post by rcarstens » Mar 08, 2016 5:25 am

I agree, DaveWatkins, RAID10 would definitely help if we are up against an IO issue on the drives. The environment was just recently moved from a 4-drive Synology to the 12-drive unit and performance was not significantly better, which is what got me wondering whether it was a limitation of SMB on the Synology.

I am curious what others are seeing in terms of performance when doing 300GB+ merges on a Synology over iSCSI. Does anyone have any data on this?
When calculating the number of IO operations needed to complete a merge of this size and factoring in the expected IOPS of the RAID6, I am getting about half of what is expected. Assuming 250 IOPS for a 12-drive RAID6, a 300GB merge should take about 3 hours based on 4 IO operations per block with 512KB blocks, yet we are seeing roughly 6 hours in practice. I would imagine moving this to RAID10 would greatly help; I am just curious whether something else is at play here.
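For what it's worth, that back-of-envelope calculation can be written out as a quick script (the 250 IOPS, 4 IOs per block, and 512KB block size are the assumptions stated in this post, not measured values):

```python
# Rough merge-time model using the figures assumed in this post:
# 512KB backup blocks, 4 IO operations per merged block, and an
# estimated 250 random IOPS for the 12-drive RAID6 array.

def merge_hours(data_gb, block_kb=512, ios_per_block=4, array_iops=250):
    """Estimate hours to merge `data_gb` of incremental data."""
    blocks = data_gb * 1024 * 1024 / block_kb  # data size -> number of blocks
    total_ios = blocks * ios_per_block         # total random IOs for the merge
    return total_ios / array_iops / 3600       # IOs / IOPS = seconds -> hours

print(round(merge_hours(300), 1))                  # ~2.7 h at the assumed 250 IOPS
print(round(merge_hours(300, array_iops=125), 1))  # ~5.5 h at half that rate
```

The observed 6-hour merges would be consistent with the array delivering only ~125 effective random IOPS rather than the assumed 250.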

If you have an environment of this size or larger, what are you using for your repository?

slos
Influencer
Posts: 20
Liked: 3 times
Joined: Jan 21, 2014 3:53 am
Full Name: Steven Los
Contact:

Re: Design for large number of VM's

Post by slos » Mar 08, 2016 7:46 am

The environment below is not as large, but the goal was to create as many restore points per day as possible: a three-host ESXi cluster, a physical VCS server, a physical Veeam server, a Synology NAS, and Direct SAN Access backup mode.

Previously this was configured in network mode with the Veeam server as the proxy and a CIFS share on the NAS as the target. One long job per evening was not a problem, but running a job during production hours was very noticeable.

We modified the Veeam server from a single two-disk RAID1 to one two-disk RAID1 plus one six-disk RAID5, and also moved Veeam to Direct SAN Access. The data drive on the Veeam server became a short-term repository holding a small number of backups; a copy job then moves the data to the NAS, which has a significantly longer retention span.

You'll have to do your own planning to ensure your data drive and copy job target have sufficient space to perform all the actions required to keep the backup chain moving, and that performance meets your time goal.
VMCE, MCSE

nmdange
Expert
Posts: 469
Liked: 113 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Design for large number of VM's

Post by nmdange » Mar 08, 2016 5:10 pm

What is the network interface on this NAS? If it's 1Gbps that could also be a bottleneck.

I use RAID50 in my environment, with local SAS disks in a server (and a SAS JBOD) directly attached to a SAS controller. The backup repository server is also my off-host proxy, and the connection between this server and the virtual environment (mostly Hyper-V but also some VMware) is 10Gbps. I do have a lot more drives (84 vs 12), though. Each RAID50 is a grouping of 3 sets of 7-disk RAID5 arrays, all 4TB 7.2k drives. I prefer to stick with SAS-attached storage because you get a lot more bandwidth compared with SAN storage, be it iSCSI or Fibre Channel.
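As a side note on the 1Gbps question: moving the nightly data alone should not take long at line rate, so if merges dominate the window, the disks rather than the NIC are the likely culprit. A quick sketch (the ~118MB/s usable figure is an assumed practical ceiling for 1GbE after protocol overhead; the 300-400GB nightly change rate comes from the first post):

```python
# How long would the nightly changed data take to cross a 1GbE link
# at an assumed ~118MB/s of usable throughput?

def wire_hours(gb, link_mb_s=118):
    """Hours to move `gb` gigabytes at `link_mb_s` megabytes per second."""
    return gb * 1024 / link_mb_s / 3600

for gb in (300, 400):
    print(f"{gb} GB -> {wire_hours(gb):.1f} h at line rate")
```

Under an hour either way, so hours-long merges point at random IO on the spindles rather than the network.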

meilicke
Influencer
Posts: 22
Liked: 4 times
Joined: Sep 02, 2014 2:51 pm
Full Name: Scott Meilicke
Contact:

Re: Design for large number of VM's

Post by meilicke » Mar 10, 2016 12:45 am

rcarstens wrote: I agree, DaveWatkins, RAID10 would definitely help if we are up against an IO issue on the drives. The environment was just recently moved from a 4-drive Synology to the 12-drive unit and performance was not significantly better, which is what got me wondering whether it was a limitation of SMB on the Synology.

I am curious what others are seeing in terms of performance when doing 300GB+ merges on a Synology over iSCSI. Does anyone have any data on this?
When calculating the number of IO operations needed to complete a merge of this size and factoring in the expected IOPS of the RAID6, I am getting about half of what is expected. Assuming 250 IOPS for a 12-drive RAID6, a 300GB merge should take about 3 hours based on 4 IO operations per block with 512KB blocks, yet we are seeing roughly 6 hours in practice. I would imagine moving this to RAID10 would greatly help, just curious if something else is at play here.

If you have an environment of this size or larger, what are you using for your repository?
I think your 250 IOPS is generous. RAID6 will scale to faster reads as you add disks, but generally acts as a single disk for writes, so I am not surprised you are only seeing ~130 IOPS. If we assume 90 IOPS per disk (I used to assume 180 per 15k FC disk, back in my EMC days), six striped mirrors would give you 540 IOPS; at the ~130 IOPS you are actually seeing per spindle, six mirrors is nearly 800 IOPS, so maybe an hour to complete the merge? However, at that point you start to bump into the 1Gb limits.
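The RAID6-versus-RAID10 gap in that estimate follows the standard write-penalty rule of thumb, which can be sketched like this (the 90 IOPS per disk is the assumption used above; real arrays deviate with controller caching and workload):

```python
# Effective random-write IOPS under the classic RAID write penalties:
# RAID6 costs 6 backend IOs per logical write, RAID10 costs 2.

def write_iops(n_disks, per_disk_iops, write_penalty):
    return n_disks * per_disk_iops / write_penalty

raid6 = write_iops(12, 90, 6)    # 180 IOPS for the 12-drive RAID6
raid10 = write_iops(12, 90, 2)   # 540 IOPS for the same disks as RAID10
print(raid6, raid10)             # RAID10 gives ~3x the write capacity
```

Since a merge mixes reads and writes, the real-world gain would land somewhere below the 3x write-side improvement.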

The other consideration is that in my testing with a queue depth of 1, the Synologys are fast. As soon as you start to pile up the requests, i.e. increase the queue depth, they start to fall over.

foggy
Veeam Software
Posts: 18263
Liked: 1561 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Design for large number of VM's

Post by foggy » Mar 10, 2016 4:33 pm

A Synology NAS is typically not the best at providing the random IOPS that the merge process is all about.

Gostev
SVP, Product Management
Posts: 24793
Liked: 3524 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Design for large number of VM's

Post by Gostev » Mar 10, 2016 7:57 pm

NAS brand makes zero difference to random IOPS capacity... it's all about the number of spindles and their speed (and, much more rarely, the NAS CPU).

csinetops
Expert
Posts: 113
Liked: 15 times
Joined: Jun 06, 2014 2:45 pm
Full Name: csinetops
Contact:

Re: Design for large number of VM's

Post by csinetops » Mar 10, 2016 10:03 pm

I'd have to agree: while your backup target has the capacity to ingest the backups, it sounds like it lacks the power to roll up the data. I had the same issue when I first installed Veeam 3 years ago; I under-spec'd the repository and exceeded my windows. I ended up just getting an HP server and a DAS shelf full of disk, and had no issues after that.

rcarstens
Service Provider
Posts: 5
Liked: never
Joined: Jan 20, 2014 6:31 pm
Full Name: Robert Carstens
Contact:

Re: Design for large number of VM's

Post by rcarstens » Mar 11, 2016 4:53 pm

The Synology has a 1Gbps NIC on it; however, I hardly ever see it saturated, so it does not seem to be the bottleneck.

Meilicke, I have noticed the same with regard to queue depth. Running just 2-3 tasks I can saturate the network heading to the Synology; change this to 6 concurrent tasks and the performance of the Synology drops drastically. However, I am still not clear whether this bottleneck is from CIFS running on the Synology or from the queue depth itself on the box. I am hoping this will be answered when I rebuild the box as an iSCSI target only.

csinetops
Expert
Posts: 113
Liked: 15 times
Joined: Jun 06, 2014 2:45 pm
Full Name: csinetops
Contact:

Re: Design for large number of VM's

Post by csinetops » Mar 11, 2016 8:48 pm

It will be interesting to see if that helps; I bet it will. When I tried to use my NetApp FAS-2240 as a CIFS target for Veeam, performance was abysmal, and I couldn't get jobs to roll up in a 12-hour window. I changed it to an iSCSI RDM LUN (12TB) mounted on the Veeam server and performance has been great.

justyjusty123
Novice
Posts: 7
Liked: 2 times
Joined: Oct 21, 2015 10:16 am
Full Name: Christoph Leitl
Contact:

Re: Design for large number of VM's

Post by justyjusty123 » Mar 14, 2016 5:56 am

We are seeing long merge times and have an open support call, ID# 01709298.

*140 VMs, about 7.5 TB, on a production 4-host ESX cluster.
*Read and write of the incremental completes within 3.5 hours. The merge takes another 9 hours, so total job time is 12-14 hours.
*Using a virtual machine as a backup proxy on the production ESX cluster.
*All in one big job.
*The backup server (repository) is a physical Windows server with 5+1 drives in RAID6 (SATA drives) with hot-spare. We are considering redoing the setup with RAID10 (8 drives) or moving to a different machine because of the long merge times. The server was not built for the high amount of random IOPS Veeam needs for the merge, and does not have a BBU.
*10G NICs both on production and on the backup server.

What I found out, correct me if I'm wrong, is that the incremental backup is fully written to disk before the merge starts (looking at the job details, I can see that the last machine is read after 3.5 hours).
So the merge does not affect my production: it uses only the disk resources of the backup repository to merge the oldest backup into the second oldest.
That has two consequences for me:
1. I am unable to meet an 8-hour window to complete the backup, but:
2. I am still able to meet the requirement that production is not impacted within business hours.

The problems with the merge started when we changed from "forward incremental" to "forward incremental forever", meaning we stopped doing the additional weekly fulls. When we have enough space, we will switch back to forward incremental again.

kryptoem
Influencer
Posts: 11
Liked: 5 times
Joined: Jan 28, 2016 6:36 am
Full Name: Etienne Munnich
Contact:

Re: Design for large number of VM's

Post by kryptoem » Mar 14, 2016 8:55 am

I have a similar issue, however with more VMs and hosts. I've configured more proxies, which improved performance.

Merging of backups is a killer; my solution is to run 7-day retention with an active full on one of the days. Not space efficient, but it has less of a hit on the NAS (target). I've also now specced a 1x SSD upgrade for our Synology. I will update once I've tested synthetic fulls and always-incremental.

lando_uk
Expert
Posts: 306
Liked: 22 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK
Contact:

Re: Design for large number of VM's

Post by lando_uk » Mar 14, 2016 10:34 am

Our solution is to not do any transforms during the week and save the pain for the weekends.

We manage to protect 350 VMs with about 70TB of front-end data, but we've also hit a wall, as some of the transforms are leaking into Monday. Thankfully they finish by Monday afternoon, but it won't be long until we have to have a rethink. Scrapping RAID6 and going RAID10 for everything is the answer, but that's costly...

As an example, a typical job takes 15 minutes each night, but 9 hours on a weekend to do the synthetic full with rollbacks.

If we were a 7-days-a-week operation, having to run Saturday and Sunday backups would really screw us up; we'd need many more, faster repositories.

ferrus
Veeam ProPartner
Posts: 246
Liked: 31 times
Joined: Dec 03, 2015 3:41 pm
Location: UK
Contact:

Re: Design for large number of VM's

Post by ferrus » Mar 14, 2016 10:45 am 1 person likes this post

Similar size. We have 375 VMs and over 55TB of data, and that should shortly grow even bigger.
Without the guest indexing on the file servers, the whole estate would be backed up in just over a couple of hours.

The merge adds a few more hours onto that, but on local storage - away from the production SAN.
The longest tasks we've run into are the consistency checks, which on the file servers stretch over a couple of days.

Overall though, two hours a night easily beats the window of our previous backup solution - which stretched to almost a full day :roll:

ITP-Stan
Service Provider
Posts: 97
Liked: 11 times
Joined: Feb 18, 2013 10:45 am
Full Name: Stan (IF-IT4U)
Contact:

Re: Design for large number of VM's

Post by ITP-Stan » Mar 14, 2016 10:59 am

I had a similar issue with a smaller number of VMs and a smaller Synology NAS.
We have the Synology connected using iSCSI instead of CIFS/SMB.
Our Synology is a 4-bay system (DS412+) with 4 WD Red (not Pro) 4TB disks in RAID5.
We had about 50 VMs or so and the merge was taking 8-10 hours; keep in mind that we have only 4 disks in RAID5, and they are 5400rpm drives!

We had Veeam support investigate this issue, and after some escalation and analysis they supplied us with registry parameters that tune the merge engine for faster cooperation with Synology NAS devices. This helped reduce the merge time by a couple of hours.

Another option is to avoid the merge altogether by using periodic fulls. If you use active fulls, the source storage and systems take the load. If you use synthetic fulls, the target storage takes the load (similar to a merge), but you can schedule this weekly instead. Of course, this will increase your backup storage capacity requirements.
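To put rough numbers on that capacity trade-off (all figures are illustrative assumptions except the ~20TB full from the first post; the ~0.35TB nightly increment and 14-day retention are guesses):

```python
import math

# Capacity needed for a given retention: forever incremental (one full
# plus a rolling chain, kept short by nightly merges) versus periodic
# weekly fulls (no merges, but several full copies on disk).

def forever_incremental_tb(full_tb, inc_tb, retention_days):
    return full_tb + inc_tb * (retention_days - 1)

def weekly_fulls_tb(full_tb, inc_tb, retention_days):
    fulls = math.ceil(retention_days / 7) + 1  # +1 keeps the oldest chain restorable
    return fulls * full_tb + inc_tb * retention_days

print(forever_incremental_tb(20, 0.35, 14))  # ~24.6 TB, nightly merge load
print(weekly_fulls_tb(20, 0.35, 14))         # ~64.9 TB, merge-free
```

Roughly 2.5x the capacity in this example, which is the "trade storage for performance" point made elsewhere in this thread.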

To support growth we are going to use a recently decommissioned SAN (HP P2000 G3) as our main backup repository, with the Synology perhaps for backup copies.

Gostev
SVP, Product Management
Posts: 24793
Liked: 3524 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Design for large number of VM's

Post by Gostev » Mar 14, 2016 12:48 pm

Transform allows for trading storage system performance for backup size. If there's no performance, there's nothing to trade; simple as that. Just don't use transforms and store multiple full backups instead (which is also a far more reliable approach with low-end storage).

@Stan I am finding DS412+ with LFF hard drives waaay too slow even for my home use (at least after getting used to SSDs in my PC).

JailBreak
Veeam Vanguard
Posts: 22
Liked: 1 time
Joined: Jan 01, 2006 1:01 am
Full Name: Luciano Patrao
Contact:

Re: Design for large number of VM's

Post by JailBreak » Mar 14, 2016 4:14 pm

Hi

We back up around 800 VMs (with several jobs, only 2 concurrent), and the full data set is around 25TB with a huge amount of read and transferred data. Backups start at 7:00 PM and all are finished by around 7 AM.

But yes, we gave up on merging backups because it takes ages to finish. I prefer to do a full backup at the end of each week.
