Comprehensive data protection for all workloads
theflakes
Enthusiast
Posts: 33
Liked: never
Joined: Jun 09, 2010 11:16 pm
Full Name: Brian Kellogg
Contact:

slow full backup of large file server

Post by theflakes »

We have a file server with ~3.25TB of space and only ~1TB used. When using HOTADD mode the backup starts at around 55MB/s, but over the first two hours throughput gradually drops to around 7MB/s to 12MB/s, and stays there until it hits free space, at which point it speeds back up considerably. The same thing happens when using NBD mode for the backup. The first full backup via the NBD method took 18 1/2 hours. The new full backup currently running has processed ~500GB in 7 1/2 hours.

I've tested throughput to the CIFS share we back up to, and we can sustain 51MB/s write performance; not great, but it is an older NAS. The read performance from our iSCSI Equallogic VM storage is easily over 100MB/s. Nothing else is competing for performance on either our VM iSCSI storage or our backup NAS during the times I've tested the full backup of the file server VM.

We are using deduplication and optimal compression. I'm confused as to why the sustained throughput is not much higher. They are thick disks on thin-provisioned volumes on the two clustered Equallogic units. CBT backups after the initial full are very fast, anywhere from 1GB/s to 2GB/s.

We are running ESXi version 4, build 219382.

What am I missing? Thanks...
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: slow full backup of large file server

Post by tsightler »

How many CPUs and how much memory are you allocating to the Veeam VM? My guess is you're CPU-starved. Veeam's dedupe/compression engine can use quite a bit of CPU. What type of processors do the physical ESX hosts use?
theflakes
Enthusiast
Posts: 33
Liked: never
Joined: Jun 09, 2010 11:16 pm
Full Name: Brian Kellogg
Contact:

Re: slow full backup of large file server

Post by theflakes »

Veeam is running on a Windows 2008 64-bit server with 4 vCPUs and 8GB of RAM. The CPUs in each vSphere server are two quad-core Intel Xeon X5550s at 2.67GHz.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: slow full backup of large file server

Post by tsightler »

That sounds like a lot of horsepower, so maybe that's not the problem. I've seen several posts on the forum describing similar issues; maybe there's some issue with how the dedupe engine scales as the source system approaches a given size. We have backup jobs in the 1-2TB range, including single servers with 1.4TB of data, so while they're not as big as yours in total size, they're similar as far as the amount of data on a single server, and we've never seen this problem.
theflakes
Enthusiast
Posts: 33
Liked: never
Joined: Jun 09, 2010 11:16 pm
Full Name: Brian Kellogg
Contact:

Re: slow full backup of large file server

Post by theflakes »

Well, from the testing I've done, I've ruled out as best I can the CIFS backup share, the Equallogic storage cluster, and the Windows 2008 64-bit server VM that Veeam runs on. The throughput issue seems to be with Veeam itself. I've copied very large files to and from the Veeam VM using the CIFS backup share while the file server backup is running, and those copies are as fast as they should be. I'm at a loss.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: slow full backup of large file server

Post by Gostev »

I would shut down the VM in question and do two incremental backup passes, one after another. Before performing the second pass, disable changed block tracking in the job settings. Since the VM is shut down, there will be no disk changes for the second pass to pick up, so nothing will be sent or written to the target storage. Thus, this experiment would show the raw speed of source VM data retrieval, and the bottleneck should become more clear.
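
To put a number on it, here is a quick back-of-the-envelope calculation (the elapsed time below is just a placeholder; plug in however long the second pass actually takes):

# Rough estimate of raw source read speed from the CBT-off pass.
# The pass reads every provisioned block but writes nothing, so
# provisioned size / elapsed time approximates the raw retrieval rate.
provisioned_gb = 3.25 * 1024   # ~3.25TB provisioned on the file server
elapsed_hours = 7.5            # placeholder: actual duration of the pass
mb_per_sec = provisioned_gb * 1024 / (elapsed_hours * 3600)
print("raw source read speed: %.0f MB/s" % mb_per_sec)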
nguyent
Enthusiast
Posts: 65
Liked: never
Joined: May 19, 2010 6:53 pm
Full Name: Tan Nguyen
Contact:

Re: slow full backup of large file server

Post by nguyent »

Here are some general performance-improvement tips from Tech Support.

One thing you can do is check the compression level. If you increase the compression, you trade CPU cycles against the amount of data to push to the target location. You can increase the compression level by:

1. Open the Backup and Replication console
2. Open the Jobs view
3. Right-click the slow job and select Properties
4. Click Next three times until you reach the "Backup Destination" screen
5. Click the "Advanced" button
6. Open the "Compression" tab
7. Select a higher compression level (while here, make sure the "Enable inline deduplication" checkbox is selected)
8. Press OK and click Next through the rest of the properties

You can also defragment the VM from within the operating system.

After defragmenting the VM you can run sdelete on the OS (you can find sdelete here > http://technet.microsoft.com/en-us/sysi ... 97443.aspx); sdelete writes zeros to the disk blocks where "deleted" data resides. Also, make sure that inline deduplication has been enabled (inline deduplication will not copy all-zero blocks over the network). After sdelete has been run, perform a full backup and track its speed. The incremental and full backups should benefit greatly from these actions, especially if you have a heavily fragmented disk.

If you do not do a full backup after running sdelete, the next incremental will be extremely large (it would be tracking all the unused blocks that sdelete just changed from "1"s to "0"s); subsequent backups should be faster.
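
Conceptually, this works because every block sdelete frees becomes identical, so inline dedupe collapses them; a toy Python illustration (not Veeam's actual algorithm or block size):

import hashlib

# Two zeroed 1MB blocks fingerprint identically, so a dedupe engine
# stores the data once and the second block as a mere reference.
a = hashlib.md5(b"\x00" * 1024 * 1024).hexdigest()
b = hashlib.md5(b"\x00" * 1024 * 1024).hexdigest()
print(a == b)   # True: duplicate detected, nothing extra to store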

The command you want to run would be:
sdelete -c DIRECTORY

Where DIRECTORY is a drive letter or a folder that has data removed from it frequently.
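
If you have several data drives, a small wrapper can run it across all of them; here is a minimal Python sketch (it assumes sdelete.exe is in your PATH, and the drive letters are just examples):

import subprocess

# Zero free space on each data drive so the next full backup can
# skip/dedupe the zeroed blocks. Drive letters are examples only.
for drive in ("E:", "F:", "G:"):
    subprocess.call(["sdelete", "-c", drive])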

If the performance is still not up to your expectations read on.

Here is a way to test where the bottleneck lies.

First, disable changed block tracking, under the vSphere tab in the advanced options of the job. It is best to choose a smaller guest so the test is faster and more accurate. After that, run a full backup, and then immediately after it has completed, start the job once more to perform an incremental backup.
Monitor the data throughput/performance in the summary of the job during both the full and the incremental. If the speeds are about the same for both runs, then data retrieval from your datastores is likely the bottleneck. If the incremental is significantly faster than the full, then write speed is the likely culprit.
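
In other words, the comparison boils down to something like this (a toy illustration with a made-up threshold, not a Veeam tool):

# Toy version of the comparison. With CBT disabled, the incremental
# still reads every block but writes only what changed, so comparing
# it to the full separates source read speed from target write speed.
def likely_bottleneck(full_mbps, incr_mbps):
    if incr_mbps > full_mbps * 1.5:   # made-up threshold
        return "target write speed"
    return "source data retrieval"

print(likely_bottleneck(full_mbps=12, incr_mbps=55))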

We can delve a little deeper once we get the test results back, because then we can focus on the area where the bottleneck resides.

Also, if you are using SAN-based data retrieval over fibre via VCB or the vStorage API, make sure that your HBA drivers are up to date.

If the backup server is a VM, you may also want to try using the vStorage API in "Virtual Appliance" mode, which does not use the Virtual Infrastructure LAN to retrieve data.
theflakes
Enthusiast
Posts: 33
Liked: never
Joined: Jun 09, 2010 11:16 pm
Full Name: Brian Kellogg
Contact:

Re: slow full backup of large file server

Post by theflakes »

Our SAN is iSCSI over 1Gb copper. There are three 1Gb connections from each of the three vSphere ESXi servers, using MPIO and jumbo frames.

I ran an analysis with the Windows defrag tool on all three data hard drives, and there is very little fragmentation on any of them. I have two data drives that are 1.5TB in size and one that is 250GB. Each of the 1.5TB drives is located on its own 2TB VMFS store. I see the same poor throughput with all three drives.

The last full backup was done via Virtual Appliance mode yesterday and took over 24 hours to complete. Total data size was 3.26TB; the backup size is 555.47GB; the dedupe ratio is 21% and the compression ratio was 77%.

When backing up the individual files with Backup Exec we saw throughput of around 1.4GB to 2GB per minute. Being a file server, it has a lot of small files, which hurts the throughput of in-host backups. I wouldn't think Veeam would be affected by this, though, since it backs up at the image level rather than file by file.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: slow full backup of large file server

Post by tsightler »

I noticed that you said you tested the NAS storage and that it's an older device. When you tested its performance, did you test it with a block-based tool on a ~500GB working set, or did you just write some largish files to it? I've seen some NAS devices that perform very poorly with large files, especially large files with a fairly random I/O pattern. Do you have any other target you can test?
theflakes
Enthusiast
Posts: 33
Liked: never
Joined: Jun 09, 2010 11:16 pm
Full Name: Brian Kellogg
Contact:

Re: slow full backup of large file server

Post by theflakes »

That's a good point. I tested by taking one of the Veeam backups that was considerably larger than the host memory size and copying it back and forth several times. I do have IOMeter installed on the device as well, but I've only tested either 100% read or 100% write performance. I'll run some more tests with IOMeter and measure a mix of read and write performance. When "Enable automatic backup integrity checks" is on, does Veeam read and write during the backup job, or does it write the backups and then verify them later? Grasping at straws here.

We have two NAS units. The one the backups are initially written to is an old Overland NAS. The other, which copies of each night's backups are made to, is a new Windows 2008 Storage Server NAS.

Shutting the VM down and running the two suggested incrementals can't happen for a while, unfortunately.
theflakes
Enthusiast
Posts: 33
Liked: never
Joined: Jun 09, 2010 11:16 pm
Full Name: Brian Kellogg
Contact:

Re: slow full backup of large file server

Post by theflakes »

After monitoring performance counters on both my NAS units and on the Veeam appliance, I think I have some of the mystery sorted out; thank you tsightler! I'm used to thinking about backups as streams, where it's 100% write and then 100% read when the verify kicks in. I see now that Veeam obviously does not work like that. When an incremental kicks off, Veeam reads as much as it writes in order to update the synthetic full backup and create the .vrb file. Unfortunately, the Overland NAS is optimized for streaming backups.

Running IOmeter against the Overland NAS using 64k blocks, an 80/20 sequential/random split, and a 50/50 read/write split showed how horribly the Overland NAS performs. That test got very close to the throughput I'm seeing with Veeam backups. My other NAS showed about three times better throughput; still not amazing, but definitely better. Both NAS units have SATA drives in them. My guess is that with incrementals there are not enough changes in the VMs for the backups to be affected in any significant way by the poor throughput.
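
For anyone without IOmeter handy, a rough equivalent of that mixed test can be scripted; here is a minimal Python sketch (the share path is just an example, and it uses all-random I/O rather than the 80/20 split for simplicity):

import os, random, time

PATH = r"\\nas\backups\iotest.bin"   # example path on the NAS share
BLOCK = 64 * 1024                    # 64k blocks, matching the IOmeter run
FILE_SIZE = 2 * 1024**3              # 2GB test file; exceed the NAS cache
OPS = 4000                           # number of 64k operations to time

buf = os.urandom(BLOCK)

# Pre-fill the test file so reads hit real data, not sparse zeros.
with open(PATH, "wb") as f:
    for _ in range(FILE_SIZE // BLOCK):
        f.write(buf)

start = time.time()
with open(PATH, "r+b") as f:
    for i in range(OPS):
        f.seek(random.randrange(0, FILE_SIZE - BLOCK))
        if i % 2:                    # 50/50 read/write mix
            f.read(BLOCK)
        else:
            f.write(buf)
elapsed = time.time() - start
print("%.1f MB/s" % (OPS * BLOCK / 1024.0**2 / elapsed))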

This doesn't completely explain the poor full backup performance though as I would think that would be a stream of writes as there is no .vrb file to build. Correct?

Are there any tuning optimizations I can do to help with the throughput? For example, in IOmeter the larger the read/write block, the greater the throughput (up to a point), since the heads don't have to reposition as often. I also run an automated disk defrag of the NAS volumes every morning.

What are others seeing for throughput on fulls and incrementals, and what is the hardware config on the NAS (RAID level, number of hard drives, etc.)?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: slow full backup of large file server

Post by tsightler »

theflakes wrote: This doesn't completely explain the poor full backup performance though as I would think that would be a stream of writes as there is no .vrb file to build. Correct?
While my knowledge is based exclusively on observing the I/O patterns from a storage perspective, I believe that Veeam has a relatively large "random" component even with initial full backups. I think this is due to the transactional nature of their datastore, as they attempt to protect the integrity of the data if a backup is unexpectedly interrupted: incomplete transactions can be "rolled back" so that the VBK is left in a consistent state. Also, the dedupe is performed on the fly, which means that blocks are marked as "duplicate" and metadata is updated throughout the backup process.
theflakes wrote: What are others seeing for throughput on fulls and incrementals, and what is the hardware config on the NAS (RAID level, number of hard drives, etc.)?
My backup hardware is not the best, older IBM servers from about 5 years ago (x336), but backup performance for fulls is generally 35-50MB/sec per job, and I can run multiple jobs and hit 100MB/sec pretty easily, with each job running in the 30-40MB/sec range using optimal compression.

Incrementals are far faster for most systems, with nearly 90% of my systems recording speeds of 100MB/sec or more. Because Veeam includes the entire job time in its calculation of throughput, smaller systems (<50GB) generally show speeds in the 100-200MB/sec range, while larger systems (500GB-1.5TB) show speeds of 500MB/sec to 1GB/sec or more (the fastest reported speed I've ever seen is something like 3GB/sec, but that was on a 1.2TB system that had experienced very few changes).
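
The arithmetic behind those big numbers is simple; presumably the reported figure is just total data size divided by total job time, e.g. (the job duration below is made up for illustration):

# Why a mostly-unchanged incremental reports huge speeds: the figure is
# (presumably) total VM size / total job time, not bytes moved / time.
vm_size_gb = 1200.0     # the 1.2TB system mentioned above
job_minutes = 7         # made-up duration, just for illustration
print("reported: %.1f GB/s" % (vm_size_gb / (job_minutes * 60)))
# ~2.9 GB/s "reported", even if only a few GB actually changed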

I do have a few systems where incrementals are still slow, mostly our Exchange and busy SQL servers. These systems have a high number of randomly changed blocks and thus put a lot of overhead on Veeam, since it has to perform a lot of single-block reads and updates.

My hardware is modest: Equallogic arrays mostly purchased back in 2006, all iSCSI. The targets are cheap iSCSI arrays from Enhance Technology with 1TB drives in RAID6. Nothing very special.