tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Veeam Restore Speed

Post by tsightler » Mar 29, 2010 9:42 pm

OK, so we recently had a storage array that decided to eat 4TB of VMs sitting on a group of VMFS volumes, so Veeam got a good workout. We were able to restore our VMs without serious difficulty; however, it took far too long to get our VMs restored. The average transfer speed was around 30-40MB/sec.

Today, just playing around, I decided to try restoring a VMDK file to one of our Linux hosts rather than to an ESX server. I was amazed at the speed difference, easily hitting 110MB/sec and faster. Why is restoring to VMFS volumes via the ESX console so slow? I understand the VMware COS is not optimized for this operation, but it still seems exceptionally slow, 30-40MB/sec vs 110-120MB/sec. Are there any tools that can copy to the VMFS volume faster?

I think if I ever find myself in a hurry to restore some VMs in the future, I'd restore them to my Linux host and share them out via NFS to the VMware servers. Then I could SVMotion them to the VMFS volumes while they're running.
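The NFS side of that is pretty simple to set up; roughly something like this (the hostnames, export path, and datastore name below are just examples):

    # On the Linux restore target: export the restore area over NFS
    echo "/restore  esx01.example.com(rw,no_root_squash,sync)" >> /etc/exports
    exportfs -ra

    # On the ESX host: mount that export as an NFS datastore
    esxcfg-nas -a -o linuxhost.example.com -s /restore restore_nfs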

Anybody else have any hints/tricks to improve the restore performance of Veeam?

Gostev
SVP, Product Management
Posts: 24092
Liked: 3278 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Veeam Restore Speed

Post by Gostev » Mar 29, 2010 9:59 pm

I don't currently believe the ESX COS is the bottleneck, because our experiments on restoring directly to SAN do not show significant improvement either. I currently believe that the real reason is the VMFS design, specifically how it handles writes.

As for tips and tricks, I have heard a few times already that having a battery-backed cache improves the speed quite significantly.

JLaaij
Novice
Posts: 8
Liked: never
Joined: Mar 09, 2010 8:58 am
Full Name: Jaap Laaij
Contact:

Re: Veeam Restore Speed

Post by JLaaij » Mar 30, 2010 6:29 am

Hi Gostev,

"As for tips and tricks, I have heard a few times already that having battery backed cache improves the speed quite significantly."

Using ESXi v4.0.x with latest patches etc.

I'm running StarWind HA as a SAN.
HP 150 G6, 5GB memory (test)
RAID 1 on 2x WD RE3 500GB disks
RAID 0 on 4x WD RE3 500GB disks

StarWind has a caching option.
Have you ever tested the speed with StarWind caching enabled, or heard any results about it?

Greetz Jaap

Gostev
SVP, Product Management
Posts: 24092
Liked: 3278 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Veeam Restore Speed

Post by Gostev » Mar 30, 2010 10:34 am

Jaap, the battery-backed cache I am talking about relates to the I/O controller in the ESX servers, not to storage-side caching.

fredbloggs
Service Provider
Posts: 47
Liked: never
Joined: Mar 18, 2009 1:05 am
Contact:

Re: Veeam Restore Speed

Post by fredbloggs » Mar 30, 2010 8:33 pm

tsightler wrote:I think if I ever find myself in a hurry to restore some VMs in the future, I'd restore them to my Linux host and share them out via NFS to the VMware servers. Then I could SVMotion them to the VMFS volumes while they're running.
Just as a query: what performance do you get if you run a Linux host as a VM on the same storage? That way you may be able to find out a little more about what performance the SAN is offering and confirm whether it's VMFS. I imagine you'd be limited by the 1Gb connection to the SAN LUN.

I'm interested, as I have a SAN from the same vendor as you.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Mar 31, 2010 10:07 am

FYI, we get about 150 MB/sec during LAN restores on our infrastructure.

The backed-up data is on low-perf SAN LUNs (SVC with a SATA storage backend) and restored to high-perf SAN LUNs (SVC with a 15k Fibre Channel disk storage backend). The Veeam server OS is Windows 2008 32-bit, and the Veeam server is using 2x Gbps ports grouped in an LACP etherchannel team (useful only when restoring to multiple ESX servers).

Our ESX servers are running ESX 4U1 on IBM System x 3850 M2 hardware and have 2 dedicated active/standby Gbps adapters on the vSwitch with the Service Console port (which has no other port group).

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Mar 31, 2010 1:02 pm

stephaneb wrote:FYI, we get about 150 MB/sec during LAN restores on our infrastructure.
Is that with large VMs with lots of compressed data? I get similar speeds on some VMs, mainly VMs that have lots of "unused" space or zeroed disk space, or data that's reasonably compressible. Also, are you running Veeam 4.1.1? We saw much better restore speeds with previous versions, but those versions had issues restoring Linux volumes without corruption.

I guess my point here is that our storage is obviously capable of much better restore speeds; the restore to the Linux box goes to the same disks as the restore to the ESX console, so something else has to be the factor. I'll try restoring to a Linux VM later today, which would make the restore go to the actual same volume.

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Mar 31, 2010 2:36 pm

OK, here are my results for restoring to the various platforms:

Restore to Linux Physical Host: 213MB/sec -- 1min 36sec

Restore to Linux VM: 155MB/sec -- 2min 12sec

Restore to ESX Console: 41MB/sec -- 8min 19sec

The restore to the Linux VM and the restore to the ESX Console were both to the very same VMFS LUN. Obviously VMFS is optimized for VMDK operations (most operations within a VMDK file don't require a SCSI reservation), but I didn't realize that the overhead when writing via the service console was so high.

I can envision a feature here: when Veeam is performing a restore, it could create the VMDK files and then run a small appliance that mounts the empty images and lays the blocks down from within this "restore VM".

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Mar 31, 2010 3:48 pm

BTW, I forgot to include that the above numbers were for restoring a 20GB VM with only about 7.5GB of actual data; the final 12.5GB restores in just a few seconds no matter the platform.

I'm currently running a test restore of a 350GB VMDK file that previously took 4 hours to restore via the ESX console (it's very full of lots of good data). Currently the restore to the Linux server is estimated to take 45 minutes, and I'm at 135MB/sec and still climbing.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Apr 01, 2010 4:58 am

I'll be running a test restore of a 200 GB VM full of data later today and will post the results; that way we'll see if I can replicate the poor ESX console performance you are getting.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Apr 02, 2010 8:15 am

OK, I have run that test and it averaged about 67MB/sec, which matched the bytes sent/sec I could see in the performance monitor on my Veeam box.
However, the 240 GB I backed up were de-duped @11% and compressed @60%. The COS was running at near 85% CPU, and top revealed it was busy with the veeamagent processes.
The data was actually written at >100MB/sec onto the VMFS volume, so overall, I believe it was the decompression in the COS that was the bottleneck.

When you say you are restoring the data to a Linux box or a VM, what do you mean exactly? What do you do in the Veeam B&R console?

When I have the chance, I will run a backup job with dedup & compression turned off; then we'll see if going through the COS to write the data really is the bottleneck.
It may also show that letting the Veeam server handle the decompression and sending the data as-is over the LAN link may be a better strategy when restoring large amounts of heavily compressed data.

vbussiro
Enthusiast
Posts: 64
Liked: never
Joined: Feb 18, 2009 10:05 pm
Contact:

Re: Veeam Restore Speed

Post by vbussiro » Apr 02, 2010 9:48 am

You might achieve this scenario by forcing "agentless mode" when connecting to the ESX host for restore (in the properties of the ESX host in VB&R), thus forcing the Veeam server to send the data uncompressed. Am I right?

Gostev
SVP, Product Management
Posts: 24092
Liked: 3278 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Veeam Restore Speed

Post by Gostev » Apr 02, 2010 10:49 am

@Stephane CPU only becomes a bottleneck when its load is 100%; I don't believe there are issues with CPU in your case. In any case, decompression is not a CPU-intensive operation, unlike compression.

@vbussiro That's right, but in the case of "agentless mode" the restore will be done through the VMware file management API, and this is typically much slower than what you can get with agent mode.

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Apr 02, 2010 1:47 pm

When I say "restored the data to a Linux server" it's pretty simple: we have a lot of Linux servers, both VMs and a few physical machines. Veeam Backup supports adding Linux systems as targets just like adding ESX servers. Once these servers are added to the list, you can restore the VM files directly to them. I set up one of our physical Linux servers as an NFS server using some of our low-cost, tier 2 storage, and configured a couple of our ESX servers to use the NFS destination as a datastore (Veeam supports NFS datastores, and the NFS server in RHEL5 U2 is even a certified storage option). I then ran the restore with Veeam normally, but told it to restore the files to the Linux server, then used the command-line "vmware-cmd" to register the restored VM in vCenter and fire it up.
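For anyone wanting to do the same thing, the last step from the COS looks roughly like this (the datastore and VM names below are just examples):

    # Register the restored VM and power it on from the ESX service console
    vmware-cmd -s register /vmfs/volumes/restore_nfs/myvm/myvm.vmx
    vmware-cmd /vmfs/volumes/restore_nfs/myvm/myvm.vmx start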

For the test of restoring to a VM, I just picked one of our virtual Linux systems that had enough free space, added it as a target in the Veeam console, and told the system to run the restore of the VM files. This VM was running on the very same ESX host, and the very same VMFS LUN, to which I restored using the COS.

I have an idea on how to test whether the COS is the problem. For the NFS restore I restored directly to the Linux host that was hosting the datastore; however, since this NFS share is also now a datastore mounted on two of my ESX servers, it's also a target for a "normal" restore via the ESX COS. In other words, I can restore to the ESX server but pick either the VMFS datastore or the NFS datastore. This would use the exact same process, but simply write to two different filesystems. If the restore speeds are the same, then the bottleneck is likely the COS; if the restore speeds are different, then the bottleneck is VMFS. I'm pretty sure VMFS has a lot of overhead when writing via the COS because it has to obtain a SCSI reservation for every write, something that normally doesn't have to happen with writes from within a VM. This is a lot of overhead for the storage systems, although I'm sure some storage handles this better than others.
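A quick and dirty way to compare the two write paths from the COS, independent of Veeam, would be a simple dd against each datastore (the datastore names and 2GB test size are arbitrary, and zeroes from dd aren't a perfect stand-in for the Veeam agent's write pattern):

    # Write 2GB of zeroes to the VMFS datastore, then to the NFS datastore
    dd if=/dev/zero of=/vmfs/volumes/vmfs_datastore/ddtest bs=1M count=2048
    dd if=/dev/zero of=/vmfs/volumes/nfs_datastore/ddtest bs=1M count=2048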

I'll perform my restore to the NFS datastore via the ESX COS soon and update the numbers above.

Also, I don't understand how you can say that your array was transferring 100MB/sec but Veeam only reported 67MB/sec. For a 250GB VM that would be a lot of overhead. In our scenario Veeam's performance reports seem pretty accurate. We typically see write speeds of 40-60MB/sec, and Veeam reports average speeds within that range.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Apr 02, 2010 3:29 pm

The difference between the actual disk write rate and the Veeam restore rate (which matched the NIC sent bytes/sec) is, as I said, I believe due to the fact that my backed-up data was compressed at 60%+.
If I understand correctly, when the Veeam agent is used, the compressed data is sent over the wire; the Veeam agent then decompresses it and writes it to disk, so it is expected that the disk write rate > network transfer rate, no?
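As a rough sanity check (assuming "compressed @60%" means the data on the wire is about 60% of its original size, and ignoring dedup):

    67 MB/sec on the wire / 0.6  ≈  112 MB/sec of decompressed data written to disk

which is consistent with the >100MB/sec I saw being written to the VMFS volume.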

I am running a series of tests using the same data with & without dedup/compression and with & without the Veeam agent, but so far I'm not seeing a big difference... I also average around 55 MB/sec, so I guess your hunch about the overhead coming from VMFS being accessed through the COS is likely a good one : )
What I fail to see is why writing to the VMFS volume through the COS requires SCSI reservations when doing it from within a VM does not.

I thought SCSI reservations were only required when writing metadata (such as when creating a new file or expanding an existing one).

It is also a bit of a shame that Veeam restore jobs are not logged in the session screen, and that the actual transfer rate is not differentiated from the aggregate rate (which would include the processing time used for dedup/decompression), as is done in other traditional backup products.
